Statistics in Public Administration Study Guide

Average

"typical" or "middle" -> Central tendency

Six Steps Involved in Evaluation

(1) Define goals and activities. (2) Identify the key relationships you will study. (3) Determine which research designs you will use. (4) Define and measure study concepts. (5) Collect and analyze the data needed. (6) Present findings honestly.

Identify four specific ways in which data that have not been thoroughly cleaned may be problematic.

(1) If a respondent enters a birth date incorrectly in a survey, the result can be an implausible age (such as 200 years old), which can badly distort the results. (2) If the number of completed surveys is recorded inaccurately, the conclusions drawn from the data can be wrong.

Arithmetic mean

(4 + 3 + 1 + 6 + 1 + 7)/6 = 22/6 = 3 4/6 = 3 2/3 ≈ 3.667

Median

Ordered values: 1 1 3 4 6 7. Median = (3 + 4)/2 = 3.5

Mode

Values: 4 3 1 6 1 7. Mode = 1 (the only value that appears more than once)
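
The mean, median, and mode cards above all use the same six values, so the arithmetic can be checked in one place. This is only a sketch that assumes Python's standard statistics module is an acceptable stand-in for the hand calculation; the variable names are illustrative.

```python
from statistics import mean, median, mode

# The six values used on the mean, median, and mode cards.
values = [4, 3, 1, 6, 1, 7]

average = mean(values)    # sum of the values divided by how many there are (22/6)
middle = median(values)   # mean of the two middle values once ordered
typical = mode(values)    # most frequent value

print(f"mean   = {average:.3f}")
print(f"median = {middle}")
print(f"mode   = {typical}")
```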

Give examples of basic and applied research questions that might be raised in the context of (1) a program to reduce adult illiteracy and (2) a program that fights international terrorism. (SI)

A basic research question that may be asked in a program to reduce adult illiteracy may include: What are the outcomes of this program? An applied research question that may be asked in a program to reduce adult illiteracy may include: What can be done to encourage adult literacy? A basic research question that may be asked by a program that fights international terrorism may include: What causes terrorism? An applied research question that may be asked by a program that fights international terrorism may include: What can be done to minimize international terrorist attacks?

What is a boxplot? For what purpose is it used? (Chapter 7 appendix)

A boxplot is a graphical device that shows various measures of dispersion. Boxplots are useful for obtaining a quick, visual, preliminary understanding of data; they are also useful tools for data cleaning. Statistics associated with boxplots are calculated based on the location of data.

What is a census?

A census is a survey or count (tally) of an entire group or population.

Histograms

A histogram displays numerical data by grouping data into "bins" of equal width. Each bin is plotted as a bar whose height corresponds to how many data points are in that bin. Bins are also sometimes called "intervals", "classes", or "buckets".
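
The binning step can be sketched in a few lines. The function name, the bin settings, and the pumpkin masses below are hypothetical and chosen only to illustrate equal-width bins; they are not data from this guide.

```python
def histogram_counts(data, start, width, n_bins):
    """Count how many values fall into each equal-width bin.

    Bin i covers the half-open interval [start + i*width, start + (i+1)*width).
    """
    counts = [0] * n_bins
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < n_bins:   # ignore values outside the binned range
            counts[index] += 1
    return counts


# Hypothetical pumpkin masses in kilograms.
masses = [2.5, 4.0, 6.5, 7.0, 8.9, 10.2, 11.0]
counts = histogram_counts(masses, start=0, width=3, n_bins=4)

# Text version of the bars: one '#' per data point in the bin.
for i, count in enumerate(counts):
    print(f"{i * 3}-{(i + 1) * 3} kg: {'#' * count} ({count})")
```

Changing the width changes the shape of the histogram, which is why analysts get to choose the number or width of the bins.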

What is a histogram? How is it different from a bar chart?

A histogram shows the number of observations in different categories (or values) of the variable. Analysts can define the number (or widths) of categories that are used to group the different values of the variable. Bar charts are similar, but they show the number of observations for each different value of the variable. By convention, histograms are used for continuous variables, and bar charts are used for categorical variables.

What are administrative data, and for what purposes are they used?

Administrative data are generated in the course of managing programs and activities. Traditionally, administrative data are used to (1) ensure that resources are not misused, (2) monitor the status of activities, and (3) provide a record of what has been completed and accomplished. Today, administrative data are also collected to (4) meet the needs of performance measurement. These purposes may also be necessary for grant or contract compliance.

There are several practices known to further the integrity and analysis of communication.

Analysis should aim to be honest, objective, accurate and complete. You should never hide facts, alter data, create false results, or create biased results. You should honestly present the whole picture of your analysis. A failure to follow these practices could result in scientific misconduct. It is important to always be honest, forthcoming, and ethical.

Discuss the use of bar charts, pie charts, and line graphs.

Bar charts show the frequency of occurrences through stacks, which can be used to highlight the importance of categories (values). Bar charts are used with ordinal- and nominal-level variables. Pie charts typically are used to focus on equality: Who gets most (or least) of what? Pie charts are used with nominal-level variables. Line graphs are usually used for continuous variables, partly to avoid displaying a large number of bars.

Why are both quantitative and qualitative methods indispensable in addressing questions of basic and applied research? (SI)

Both are indispensable because neither alone can fully answer the questions raised in basic and applied research. Quantitative and qualitative methods gather different kinds of information, so each has its place and both are needed.

Explain the following statement: "The median should always be used when a few very large or very small values affect estimates of the mean." Give some examples of variables for which the median is typically used.

The statement means that the median is often the better measure of central location when a few extreme values pull the mean away from the bulk of the data. Because the median is the middle of the ordered values, a handful of very large or very small observations does not move it. For example, the median should be used for things like tax returns, class registration, and prison statistics.

Discuss why customer comment cards do not constitute a generalizable customer satisfaction survey.

Customer cards are more often than not used to bring attention to an error that needs addressing. Most people do not leave comments over a positive experience, and instead only tend to express issues they faced. This is still an important tool for managers to identify issues they may not know are occurring.

Explain the limitations and uses of customer comment cards.

Customer comment cards generate samples that are typically not representative of all customers, and therefore they are not generalizable. They are useful, however, for obtaining feedback about problems that might need attention.

Explain data coding, data input, and data cleaning.

Data coding is the process of preparing data (from pencil-and-paper surveys or electronic or other sources) for input into statistical software programs. Data input (also, data entry) is the activity of recording these data in statistical software programs. Data cleaning is the process of identifying and removing reporting and recording errors. Errors include mistyped values, errors that arise in the process of uploading, and other implausible values that have been recorded. It is common practice to assume that unexamined data usually contain various errors that must be identified and removed.
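
One simple data-cleaning check is to flag values that fall outside a plausible range, such as the 200-year-old respondent mentioned earlier in this guide. The sketch below assumes an age range of 0 to 120 and uses made-up ages; both are illustrative choices, not rules from the text.

```python
def flag_implausible(values, low, high):
    """Return (position, value) pairs for entries outside [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]


# Hypothetical survey ages; 200 and -3 are recording errors.
ages = [34, 200, 51, -3, 67]
flagged = flag_implausible(ages, low=0, high=120)

for position, value in flagged:
    print(f"row {position}: implausible age {value}")
```

Flagged rows still need a human decision: correct the value from the original source if possible, otherwise treat it as missing.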

What are descriptive statistics? Give some examples. (SI)

Descriptive statistics provide summary information about variables, such as their average and frequency distribution.

What are frequency distributions? What are they used for?

Frequency distributions describe the range and frequency of values of a variable. They are used for nominal- and ordinal-level data. Frequency distributions often are a prelude for generating data tables and attractive graphics and are also used for data cleaning.
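
A frequency distribution for an ordinal survey item can be tallied as follows. The responses are hypothetical, and collections.Counter is just one convenient way to count them.

```python
from collections import Counter

# Hypothetical answers to one ordinal-level survey item.
responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]

freq = Counter(responses)
total = sum(freq.values())

# Print categories in their natural order with counts and percentages.
for category in ["disagree", "neutral", "agree"]:
    share = 100 * freq[category] / total
    print(f"{category:<10}{freq[category]:>3}{share:>7.1f}%")
```

A category that should not exist (for example, a misspelled "agre") would show up as its own row, which is how frequency tables help with data cleaning.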

It is said that in Sweden an empirical association exists between the presence of storks and the incidence of new babies. Explain what is necessary to establish a claim of causation. Do storks really bring babies?

In order to prove a causal relationship between storks and new babies you would need to prove that one variable affects another (a stork being nearby results in a baby being born). Since there is no cause-and-effect relationship between storks and babies, there is no causal relationship between the two.

Discuss the pros and cons of different types of surveys. Why are phone surveys used increasingly?

Mail surveys allow for the most survey items, but the need for follow-up mailings increases the duration of data collection. Internet surveys have few survey items, and the lack of a sampling frame can pose problems. Phone surveys have important speed advantages but may have low response rates. In-person surveys offer the highest response rate but also carry the highest cost. Phone surveys are used increasingly because they can be completed in a short time and can be used to ask many questions. Cell phones are increasingly called in surveys, and U.S. regulations require that phone numbers be manually dialed by interviewers.

Why is the mean frequently used?

The mean is frequently used because most people consider it to be the "average" (even though the median and mode are also measures of central tendency). It is defined as "the sum of a series of observations, divided by the number of observations in the series." (pg. 106) The mean is the measure most commonly used to describe the central tendency of variables, and analysts speak of the mean "number of crimes, public safety inspections, wars, welfare recipients, firewall breaches, roads under repair, voter turnout, and so on." (pg. 107) It is appropriate for many reports, although the median should be reported alongside it when a few very large or very small values are present.

Find the mean, median, and mode of the following set of numbers: 23, 29, 20, 32, 23, 21, 33, 25

Mean: 25.75 Median: 24 Mode: 23

What are measures of dispersion?

Measures of dispersion provide information about how the values of a variable are distributed.

What is nonresponse bias?

Nonresponse bias occurs when the views of nonrespondents are different from those of respondents, thus affecting the generalizability of the sample.

Why are observations with missing values typically removed before calculating specific statistics?

Observations with missing values are often removed before calculation in order to prevent unrepresentative or biased results from the incomplete data.

Why is obtaining a representative sample important? How is it different from a purposive sample?

Only representative samples allow for generalization to the population. Representative samples have a mix of characteristics similar to that of the population from which they are drawn, whereas purposive samples have an unrepresentative mix of characteristics (for example, "exemplary practices" surveys are often purposive samples). Some threats to validity for surveys are inadequate sampling frames and unrepresentative samples.

What are outliers? How are they dealt with?

Outliers are analyst-defined observations with unusual values relative to other values in the data. Outliers are defined as observations whose values are either less than the inner fence or greater than the outer fence. Outliers may be the result of data-coding errors or reflect actual but unusual values in the sample. The textbook suggests that observations that are flagged as outliers generally should be retained when they are not coding errors, when they are plausible values of the variable in question, and when they do not greatly affect the value of the mean (of continuous variables).
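
Fence-based flagging can be sketched as below. The guide does not say where the fences sit, so this example assumes the common boxplot convention of 1.5 interquartile ranges below the first quartile (the lower cutoff) and above the third quartile (the upper cutoff). The quartile method is whatever statistics.quantiles uses by default, which may differ slightly from the textbook's, and the response times are hypothetical.

```python
from statistics import quantiles


def flag_outliers(values, multiplier=1.5):
    """Return values beyond the assumed fences: Q1 - m*IQR and Q3 + m*IQR."""
    q1, _, q3 = quantiles(values, n=4)   # first quartile, median, third quartile
    iqr = q3 - q1                        # interquartile range
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical response times in minutes; 58 is far from the rest.
response_times = [12, 14, 15, 15, 16, 17, 18, 19, 20, 58]
outliers = flag_outliers(response_times)
print(outliers)
```

Whether a flagged value is dropped is still the analyst's call: first check that it is not a coding error, then whether it is plausible and how much it moves the mean.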

The developers of the adult literacy program mentioned in question 4 claim that the program is effective. By what measures might this effectiveness be demonstrated?

Program evaluations are used in social science research to determine whether and in what ways the literacy program may work. Both qualitative and quantitative observations are used to determine its effectiveness.

What is random sampling, and why is it important?

Random sampling is a sampling method whereby each member of the population has an equal chance of being selected for the sample. Random sampling is the most valid way of making representative samples.
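
A simple random sample without replacement can be drawn as follows. The population of 100 numbered members and the seed are hypothetical; the seed is fixed only so the draw can be repeated.

```python
import random

# Hypothetical sampling frame: 100 population members identified by number.
population = list(range(1, 101))

rng = random.Random(42)              # fixed seed so the draw is reproducible
sample = rng.sample(population, 10)  # every member has the same chance of selection

print(sorted(sample))
```

Note that the draw is only as good as the sampling frame: members missing from the list have no chance of being selected.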

What are secondary data, and for what purposes are they used?

Secondary data are data that have been collected by other agencies for their own purposes but that are available to managers and may be relevant for their purposes. Secondary data can provide important information about communities and can be useful for needs assessment, benchmarking, and outcome measurement. Managers and analysts are expected to be familiar with the secondary data in their fields.

How do analysts determine whether a variable is normally distributed?

Some analysts rely on a visual inspection, aided by a computer-generated curve that is superimposed over the histogram. Analysts also use measures of skewness and kurtosis to determine whether the shape of the observed curve is consistent with a normal distribution. Sample data are not expected to match a theoretical bell-shaped curve perfectly because of deviations due to chance selection.

What are some important tasks of analysts engaged in statistics?

Some important tasks of analysts engaged in statistics are as follows: (1) understanding the definition and purpose of a statistic, (2) ensuring that a statistic is appropriate to the data and problem at hand, (3) understanding the test assumptions of a statistic, (4) applying a statistic to the problem at hand in ways that are mindful of the preceding points, (5) drawing correct conclusions, and (6) communicating results in ways that are appropriate for both professional and general audiences.

What are standardized variables?

Standardized variables (also called z-scores) are variables that have been transformed such that their means are exactly 0 and their standard deviations are exactly 1 (or unity).
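
Standardizing subtracts the mean from each value and divides by the standard deviation. The sketch below assumes the sample standard deviation (n - 1 in the denominator) and uses hypothetical scores; a text that uses the population formula would give slightly different z-scores.

```python
from statistics import mean, stdev


def standardize(values):
    """Convert values to z-scores: (value - mean) / standard deviation."""
    m = mean(values)
    s = stdev(values)   # sample standard deviation (assumption)
    return [(v - m) / s for v in values]


# Hypothetical test scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
z_scores = standardize(scores)

print([round(z, 2) for z in z_scores])
print(round(mean(z_scores), 6), round(stdev(z_scores), 6))  # mean 0, sd 1 up to rounding
```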

List some standards for writing survey questions.

Survey questions should be clear (unambiguous and specific) and easy to answer. They should avoid double-barreled and leading questions, and they should avoid negative statements.

What types of surveys are mentioned in the text, and for what purposes are surveys used?

The four types of surveys are mail, Internet, phone, and in-person surveys. Surveys are commonly used in program evaluation research and, increasingly, performance measurement. Surveys are increasingly used when such knowledge needs to be quantitative, comprehensive, and systematic.

Reading a histogram

The heights of the bars tell us how many data points are in each bin. For example, this histogram says that Leonard's patch has 8 pumpkins whose mass is between 6 and 9 kilograms.

What is the formula for determining the location of the median?

The location of the median is determined by the formula (n + 1)/2. For example, if there are 97 observations, the median is the value of the 49th observation, when observations have been ordered. When there are 98 observations, the median is the mean of the 49th and 50th observations.
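
The location rule can be turned into a short sketch. The function names are illustrative; the logic is only the (n + 1)/2 rule stated above, averaging the two neighboring observations when the location falls halfway between them.

```python
import math


def median_location(n):
    """Position of the median among n ordered observations (1-based)."""
    return (n + 1) / 2


def median_by_location(values):
    """Find the median by locating position (n + 1)/2 in the ordered values."""
    ordered = sorted(values)
    location = median_location(len(ordered))
    lower = ordered[math.floor(location) - 1]   # convert 1-based position to index
    upper = ordered[math.ceil(location) - 1]
    return (lower + upper) / 2                  # same value twice when n is odd


print(median_location(97))   # 49.0, so the 49th observation
print(median_location(98))   # 49.5, so the mean of the 49th and 50th
print(median_by_location([4, 3, 1, 6, 1, 7]))
```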

When should both the mean and median be used? When should the mode be used?

The median should be reported along with the mean when a few very large or very small observations affect the value of the mean. The mode is used infrequently, but an advantage of the mode is that it can be used with nominal-level data, which is not possible for calculating the mean or median.

How should analysts deal with the problem of missing data in calculating statistics?

The most common approach is to exclude such observations from calculations.
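
Excluding missing observations before computing a mean might look like this. The ratings are hypothetical, and None is assumed to be the marker for a missing value.

```python
from statistics import mean


def mean_excluding_missing(values):
    """Return the mean of the non-missing values and how many were used."""
    present = [v for v in values if v is not None]
    return mean(present), len(present)


# Hypothetical satisfaction ratings; None marks a skipped question.
ratings = [3, None, 5, 4, None]
average, n_used = mean_excluding_missing(ratings)

print(f"mean = {average} based on {n_used} of {len(ratings)} observations")
```

Reporting how many observations were actually used lets readers judge how much was lost to missing data.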

Consider the following statement: "Calculating the mean is straightforward, but managers and analysts may encounter some practical issues that, for the most part, concern the data rather than the formula itself." Give examples of these practical issues.

The most common issue analysts and managers face when calculating the mean is missing data. One or two missing observations in a large dataset may not matter much, but missing observations in a small dataset can change the results dramatically and can make them unrepresentative or biased. It is best to avoid relying on incomplete data.

Define sampling error. Do small or large samples have small sampling errors? Why?

The sampling error is the percentage by which sample findings vary in 95 of 100 repeated samples. Large samples better reflect population characteristics and thus have smaller sampling errors.
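
The link between sample size and sampling error can be illustrated with a formula the guide does not give: the usual 95 percent margin of error for a proportion from a simple random sample, 1.96 * sqrt(p(1 - p)/n). Treat this as an assumption made for illustration, with p = 0.5 as the most conservative case.

```python
import math


def sampling_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion (simple random sample)."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (100, 400, 1600):
    print(f"n = {n:>4}: +/- {100 * sampling_error(n):.1f} percentage points")
```

Under this formula the sample must be four times as large to cut the sampling error in half.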

What is the standard deviation?

The standard deviation is a measure of dispersion that is calculated based on the values of the data.

Name the three measures of central tendency. How is each defined?

The three measures of central tendency are mean, median, and mode. The mean is the sum of a series of observations, divided by the number of observations in the series. The median is the middle value in a series (or array) of values that have been ordered from low to high. The mode is the most frequent (typical) value(s) of a variable.

What role does the measurement level play in univariate analysis?

The type of univariate statistics that should be used depends on the level of measurement.

What is a weighted mean? For what purposes is it sometimes used?

The weighted mean is defined as a mean for which the observations have been given variable weights. Weighted means are commonly used to adjust for over- and undersampling in surveys, for example.
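
A weighted mean multiplies each observation by its weight, sums the products, and divides by the sum of the weights. In the hypothetical example below, a survey sampled urban and rural residents in equal numbers even though the population is assumed to be 75 percent urban; all figures are invented for illustration.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical group means: urban satisfaction 60, rural satisfaction 80.
group_means = [60, 80]
population_shares = [0.75, 0.25]   # assumed population mix

unweighted = sum(group_means) / len(group_means)
weighted = weighted_mean(group_means, population_shares)

print(f"unweighted = {unweighted}, weighted = {weighted}")
```

The weighted result sits closer to the urban mean because urban residents were undersampled relative to their share of the population.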

Identify and discuss problems in the quality of administrative data.

There are six main issues managers struggle with when it comes to administrative data: (1) missing and incomplete data; (2) inaccurately reported data; (3) definitions that have changed over time, making it difficult or impossible to compare data; (4) not being able to disaggregate data in necessary ways; (5) confidential data that are unavailable; and (6) technological deficiencies.

Identify threats to validity arising from biased questions and sampling in surveys.

There are two main threats to validity when it comes to surveys: "(1) inadequate sampling frames and (2) unrepresentative samples." (pg. 90) Inadequate sampling frames and unrepresentative samples can both leave the analyst with too small a sample. The smaller the sample, the less likely it is to represent the general public's opinions.

What is the distinction between univariate and bivariate analysis? (SI)

Univariate analysis describes single variables, whereas bivariate analysis examines the relationship between two variables.

Give some examples of univariate and bivariate analyses. (SI)

Univariate data describes singular variables. For example, the average score on a final, compared to the grade you received, would be an example of univariate analyses. Bivariate data is the comparison of two variables. For example, if you are comparing your grade on the final with your friend, and the different ages you both are, that is a bivariate analysis since you're comparing grades and ages.

What are variables? What are scales?

Variables are succinctly defined as empirically observable phenomena that vary (see Chapter 2 in the text). Scales are the collection of specific attributes (or values) used to measure a specific variable (see Chapter 3 in the text). There are four levels of measurement scales: nominal, ordinal, interval, and ratio. You need to be familiar with these concepts, as they are key to choosing the correct statistic.

What statistical property makes the standard deviation a desirable statistic?

When data are normally distributed, 68.3 percent of the observations lie within ±1 standard deviation of the mean, 95.4 percent lie within ±2 standard deviations of the mean, and 99.7 percent lie within ±3 standard deviations of the mean.
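
These percentages can be reproduced from the normal distribution itself. The sketch uses statistics.NormalDist from the Python standard library (available from Python 3.8); the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution within k standard deviations of the mean.
shares = {}
for k in (1, 2, 3):
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within +/-{k} sd: {100 * shares[k]:.1f}%")
```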

You need to prioritize welfare for human subjects when conducting research.

You should minimize any harm they may experience. Most research on people is now subject to oversight by the institutional review board.

Hypothesis

a prediction about the state of the world (see experimental hypothesis and null hypothesis).

Independent variable

a variable (often denoted by x ) whose variation does not depend on that of another.

Dependent variable

a variable (often denoted by y ) whose value depends on that of another.

Benchmarks

are a standard measurement of performance. Performance standards equate to "managing by the numbers." (pg. 69)

Variables

are empirically observable phenomena that can vary, for example, gender and income.

Quasi experiments

are imperfect research designs that lack baselines, comparison groups, or randomization. In classic experiments, subjects are randomly assigned to control and experimental groups.

The six competencies

are similar to forming a hypothesis for science class. The six steps include: familiarizing yourself with the information you are analyzing, collecting the data needed, analyzing this data, communicating the findings you have found with others, using the data you've analyzed to create a theory, and bearing in mind the ethics behind the analysis you have made.

Outcomes

are specific changes in behaviors/conditions that are measured from various aspects of program goals. Outputs are the immediate, direct results of program activities. The logic model acknowledges that there are many public and nonprofit programs with long-term goals. These long term goals are supported by the immediate program information. Outputs are key indicators and measure what is considered to be successful/desired results from activities. Outputs are sometimes also used in order to highlight and track problem areas. "Outputs are the direct result of program activities, and outcomes measure goal attainment. Both are needed." (pg. 63)

Attributes

are the characteristics of a variable and the specific ways that the variable can vary; for example, the attributes of gender are male and female.

Quantitative research methods and

are the data collected in order to use statistical methods. The reason behind this is to provide statistical evidence about various impacts and programs.

Quantitative Research Method

are used to collect the data that will be analyzed with statistical methods. These data are collected from surveys and administrative records in order to produce the numbers used to express societal issues, monitor problems, and determine the efficiency and effectiveness of programs. The data are then used to hypothesize ways to improve these programs. Quantitative research methods are known to suffer from a lack of detail.

Statistical methods

are used to forecast by using past and present data. Techniques used range from simple to extremely complex.

Control variables

are used when trying to evaluate rival hypotheses in empirical research. A rival hypothesis threatens the credibility of a conclusion.

Causal relationships

exist when one variable causes another. A causal relationship differs from an association, in which two variables are connected but no claim is made that one causes the other.

There are specific characteristics of a variable called

attributes, and every variable has them. These attributes describe the variable.

Independent variables

cause effects on other variables, but they themselves are independent and not affected by other variables. Dependent variables are the variables that are affected by independent variables.

Qualitative Research Methods

collect data from words, symbols, and artifacts and are often nonstatistical in nature. These data are often collected through interviews, focus groups, and observations. This method is traditionally used to describe new phenomena. The data are known to be detailed but can lack generalizability and quantification.

There are six steps for program evaluation:

defining goals and activities, identifying key relationships, determining research designs, defining study concepts, collecting data, and presenting findings.

Study of Relationships

describes the relationships between variables.

Descriptive analysis is different from the study of relationships because

descriptive analysis provides information about individual variables, whereas the study of relationships examines how variables relate to one another.

A study examines the impact of gender and drug use on school performance and political orientations. Identify the dependent and independent variables.

The dependent variable is the variable that is affected by other variables; in this case the dependent variables are school performance and political orientation. The independent variables are the variables that affect others but are not themselves shaped by other variables. In this case, the independent variables are drug use and gender.

There are two problems with dual purposes:

improving programs/policies and revealing truth about the success of a program.

Forecasting

is a prediction about the future. Forecasting specifically discusses what the future will look like, whereas planning provides a normative model of what the future should look like. Planning often begins with forecasting in order to establish a common goal of what is to be accomplished. Alternative and future developments can be adjusted along the way.

A hypothesis

is a theory that hasn't been proven yet.

Performance measurement

is an analytical process used to assess progress toward achieving program goals. It is designed to provide information on an ongoing basis in order to determine whether a policy or program is improving, and it gives managers a way to improve, monitor, and measure results.

Descriptive Analysis

is information about the nature or common features of individual variables, such as the mean and the frequency distribution.

Performance management

is the assessment and management of work processes and employees toward a chosen goal. There is often ongoing feedback and review of effectiveness and efficiency. Performance management measures accountability, service delivery, and managerial decision making.

Qualitative research methods

collect data for nonstatistical analysis. These data include words, symbols, and artifacts, and are used to describe phenomena and relationships.

Statistics

is the knowledge and standard for drawing conclusions from data. There are specific tools utilized in order to reach these conclusions.

Effectiveness

is the level or results of a program or policy and points to different key results. Effectiveness forces managers to analyze which key points are and aren't working. Effectiveness is usually measured using different output and outcome measurements.

Research Methodology

is the methodology used to investigate phenomena. These can be utilized for many different problems including public administration.

Efficiency

is the unit cost to produce a good or service. It is calculated as outputs or outcomes over inputs (O/I). Efficiency can be calculated in various ways and should be calculated based on program management concerns.

The logic model

is used by public and nonprofit organizations to conceptualize program performance. The model describes the relationships between resources and results: Inputs > Activities > Outputs > Outcomes > Goals

Control Variable

is used in empirical research to evaluate the rival hypotheses.

Applied Research

is used to solve practical problems. Basic research is used to develop new knowledge, such as solving new issues in public management.

Threats to validity

jeopardize the conclusions of a study by calling into question the logic behind them.

Equity

measures compare performances across different groups. Equity measures can be analyzed for different population groups, organizations, programs, etc. Since equity measures are so broad, it compares service performance across districts and populations that can be extra salient.

Performance measurement is used for

program evaluation because performance measurement is able to produce ongoing information which provides thorough information about a program or policy. This gives managers an easy way to evaluate this information. These measurements use key indicators and are measured based on systematic and quantitative information. Performance measurement should not be the only tool used to evaluate and understand programs/policies, but gives a great summary of important and frequent information measured.

Workload ratios

refer to the ratios of activities over inputs (A/I). Many managers equate workload ratios with efficiency measures, but the two are not the same: workload ratios do not include accomplishments.
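
The difference between a workload ratio (A/I) and an efficiency measure (O/I) is easiest to see with numbers. The inspection program and all figures below are hypothetical.

```python
# Hypothetical inspection program for one year.
inspectors = 10                # input (I): staff assigned to the program
inspections_conducted = 500    # activity (A): work performed
violations_corrected = 420     # output (O): what the work accomplished

workload_ratio = inspections_conducted / inspectors   # A/I
efficiency = violations_corrected / inspectors        # O/I

print(f"workload ratio: {workload_ratio:.0f} inspections per inspector")
print(f"efficiency:     {efficiency:.0f} corrected violations per inspector")
```

A program could raise its workload ratio by conducting more inspections without correcting any more violations, which is why the two measures should not be treated as interchangeable.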

Causal Relationships

relationships in which a condition or variable leads to a certain consequence.

Performance measurements

should always be valid. This means they should avoid inaccurate and incomplete data. Clearly conveying all information and data is the only way to accurately and validly report data and measurements.

Scientific misconduct is

the violation of the rules set in place for scientific research and statistics. It may be a breach of ethical standards and/or a failure to follow the reporting and documentation process. Scientific misconduct can have serious consequences, such as ruining someone's career, which could result in you being sued.

Establish a claim of Causality

there must be both an empirical correlation and a plausible theory to explain the relationship.

To establish a claim of causality

there must be empirical correlation and a theory that explains this relationship.

Rival Hypothesis

threatens the credibility of a concluded study.

External validity

threats can jeopardize the conclusion of the study.

Association

two situations that are connected, but neither causes the other.

Judgement-based methods

use experts to evaluate the likelihood of future events occurring.

Forecasts

use past and present data to predict what will happen 'tomorrow' with no guarantee.

Scientific research

utilizes different steps, such as the six competencies, in order to discover new species, theories, and gain other scientific knowledge. Scientific research uses different tools and sources of knowledge in order to reach a conclusion.

There is a need to be ethically conscious

when analyzing data. The three main areas include: making the motivation of your analysis known, upholding open communication, and considering the impact your analysis could have.

