POLS 2072 Q
A survey of 50 students asks the number of drinks they had in the last 48 hours. There are 20 responses of 0, 13 of 1, 10 responses of 2, 6 responses of 3 and 1 response of 6. What is the mean?
1.14
A survey of 50 students asks the number of drinks they had in the last 48 hours. There are 20 responses of 0, 13 of 1, 10 responses of 2, 6 responses of 3 and 1 response of 6.
3.86
A researcher studies presidential approval polls and finds the following values: 46, 46, 48, 49, 51, and 52. What is the median approval rating?
48.5
A researcher studies presidential approval polls and finds the following values: 46, 46, 48, 49, 51, and 52. What is the mean approval rating?
48.67
We have a measure that uses a 0-10 point scale. We asked 100 people to rate their support for a policy on the scale. The variance would be maximized under which condition?
50 people answer 0 and 50 people answer 10.
All of the following are interval-level variables except ... a. types of identification accepted to vote b. number of days a state allows for early voting c. percentage of whites, Hispanics, African-Americans, and Asian-Americans in a state d. age of individual voters in the last election
A
box plot
A boxplot is a horizontal or vertical graph of the five number summary. A central box spans from the 1st to the 3rd quartile, showing the interquartile range. A line in the box marks the median. Lines extend from the box out to the smallest and largest observations, showing the range. For interval/ratio variables, markings are sometimes added for the mean and outliers.
z score
A z score is a statistical measurement of a value's relationship to the mean of a group of values. A z score of zero indicates the score is the same as the mean. A positive or negative z score indicates how many standard deviations a particular score is above or below the mean. Z scores allow scores from different data sets to be accurately compared to one another.
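A minimal sketch of the z score calculation, using the approval ratings from the earlier card. The function name `z_score` is hypothetical, and the population standard deviation (dividing by N) is an assumption; a course may use the sample version (dividing by N - 1) instead.

```python
from statistics import mean, pstdev

# Approval ratings from the presidential approval card.
approval = [46, 46, 48, 49, 51, 52]

def z_score(x, values):
    """Number of standard deviations that x lies above (+) or below (-) the mean.

    Assumption: uses the population standard deviation (divide by N).
    """
    return (x - mean(values)) / pstdev(values)

print(z_score(52, approval))  # positive: 52 is above the mean
print(z_score(46, approval))  # negative: 46 is below the mean
```

A value equal to the mean gets a z score of zero, which matches the definition on the card.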
A conceptual definition meets two criteria
Criterion 1: It identifies the subjects or groups to which the concept applies. Criterion 2: There is variation within a measurable characteristic for the subjects being examined. The first criterion requires that we choose the unit of analysis for which the characteristic will be considered. The second criterion will be met if the unit of analysis can take on different values of the measurable characteristic.
advantages and disadvantages of the mean
Advantages: The mean fully considers the metric information embedded in the variable. It is more stable than other possible measures of central tendency (i.e., it has less variation over repeated samples). Disadvantages: It can have fractional values, even when the variable only can take on whole values. It cannot be computed when extreme categories of a variable are open-ended. It is strongly affected by extreme cases or outliers.
Test-retest method
An assessment of the external reliability of a measure, performed by administering a measure to the same group of subjects at two points in time and comparing the results
Inter-rater method
An assessment of the external reliability of a measure, performed by administering a measure with different observers and comparing the results.
split half method
An assessment of the internal reliability of a measure using multiple items, performed by dividing the test items into two halves and comparing the results from each half.
example of ecological fallacy
EXAMPLE: In recent presidential elections, wealthier states (e.g. HI, NJ, CT, MA, CA) tended to vote Democratic; whereas poorer states (MS, AR, WV, AL, KY) tended to vote Republican. However, wealthier voters tended to vote Republican, while poorer voters tended to vote Democratic. We would have fallen prey to the ecological fallacy if we had incorrectly assumed that because wealthier states tended to vote more Democratic that wealthier voters would have tended to vote Democratic.
True or False: A measure is said to be reliable when it accurately measures the concept being studied.
FALSE
two common approaches to evaluating the validity of a measure
Face validity and construct validity
Face validity
Face validity is simply whether the test appears (at face value) to measure what it claims to measure. It is obtained by relying on informed judgment or asking others to rate whether the measure is suitable for its purpose.
To measure the concept of religiosity, begin by identifying characteristics of a religious person:
Attends religious services; prays; reads sacred writings; opposes abortion rights; demonstrates closeness to God; could think about others more than themselves
Several spread measures have been devised to summarize the size of these deviations
Average deviation, variance, and standard deviation
Calculating the Mean for Grouped Interval/Ratio Data
To compute the mean from grouped data: (1) Multiply the value of the category by the frequency of cases in it; (2) Sum all the products; and (3) Divide by the number of cases overall.
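The three steps above can be sketched in a few lines, using the drinks survey from the first card as the grouped data. The names `drink_counts` and `grouped_mean` are hypothetical, chosen only for this illustration.

```python
# Grouped data from the drinks survey card: value -> frequency of cases.
drink_counts = {0: 20, 1: 13, 2: 10, 3: 6, 6: 1}

def grouped_mean(freq):
    """Mean of grouped data: sum(value * frequency) / total number of cases."""
    total_cases = sum(freq.values())                      # step 3 denominator
    weighted_sum = sum(v * f for v, f in freq.items())    # steps 1 and 2
    return weighted_sum / total_cases

print(grouped_mean(drink_counts))  # 1.14
```

The products sum to 57 across 50 cases, which gives the 1.14 reported on the first card.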
mutually exclusive
Cases should fit into only one category
Parsimonious
Categories should be constrained to fewest needed to meet goal
A _____________________________ describes clearly the concept's measurable properties and specifies the units of analysis.
Conceptual definition
A question that may be answered empirically using tangible properties is called a ________________
Concrete question
A researcher develops a survey to measure political ideology on a 7-point scale from strong liberal to strong conservative. The result provides an accurate measure of the ideology of the individuals in the study. This is an example of ________________.
Construct validity
Construct validity
Construct validity assesses the association between the measure of a concept and another concept to which it is related. It is obtained by examining the relationship between our measure and the related construct.
Interval Measures
Interval measures classify and rank cases by using known and equal distances between values. The interval level of measurement allows researcher to assess the degree of difference between values, but not the ratio between them. Examples of interval measures: Temperature on Celsius or Fahrenheit scale Date when measured from an arbitrary epoch
Before we can be confident that a characteristic is useful to understanding a concept, we must ask three questions:
Question 1: Can we conceive of the concept without the characteristic? Question 2: Can the characteristic be measured? Question 3: Can the characteristic vary between subjects?
A measure that provides inconsistent readings of a concept probably suffers from _________________.
Random error
Average Deviation
The average deviation equals zero because the mean is precisely the point about which all distances are balanced.
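A quick numerical check of this claim, using the approval ratings from the earlier card. The variable names are hypothetical, and floating-point arithmetic may leave a tiny remainder rather than an exact zero.

```python
# Approval ratings from the presidential approval card.
approval = [46, 46, 48, 49, 51, 52]

mean_approval = sum(approval) / len(approval)

# Signed distance of each case from the mean.
deviations = [x - mean_approval for x in approval]

# Positive and negative deviations cancel, so their average is (numerically) zero.
average_deviation = sum(deviations) / len(deviations)
print(average_deviation)
```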
Variables are typically organized in spreadsheets, whereby
The column heading is the variable name (e.g., the concept or trait). The row heading is each individual subject or unit of analysis. Cell entries indicate the value of the variable for a subject.
A template for writing a conceptual definition takes the following form:
The concept of ______ is defined as the extent to which ______ exhibit the characteristic of _________.
5 Number Summary
The five number summary of a variable consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation written in order from the smallest to largest.
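A hedged sketch of a five number summary for the approval ratings from the earlier card. Quartile conventions differ across textbooks and software; the `"inclusive"` method of Python's `statistics.quantiles` is an assumption here, so the first and third quartiles may not match a hand calculation done with another rule. The function name `five_number_summary` is hypothetical.

```python
from statistics import median, quantiles

# Approval ratings from the presidential approval card.
approval = [46, 46, 48, 49, 51, 52]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: quartiles use the "inclusive" interpolation method.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)

print(five_number_summary(approval))
```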
Frequency Distributions
The frequency of a particular category or interval is the number of times the category or interval is observed in the data. Frequency distributions can show either the actual number of observations falling in each range or the percentage of observations. Frequency distributions are portrayed in tables or graphs
Measuring Dispersion Within Interval Variables
The major measures of spread for interval variables are based on deviations from the mean value.
operational definition should explain
Who will provide the measure? (e.g., respondent, employer, researcher) What will be the measure? (e.g., poll question, behavioral record) When will the measure be collected? (i.e., time frame of data collection) Where will the measure be collected? (e.g., in-person, by phone, online) How will the measure be acquired? (e.g., opinion poll, record inspection)
A normal curve resembles:
a bell
A level of measurement is
a classification that describes the nature of information within the values assigned to a variable
bimodal distribution
a frequency distribution having two different values that are heavily populated with cases
A key feature of an ordinal level variable is that the values are _______________.
able to be ranked
A researcher wants to measure individual support of internationalism so
additive index or index
exhaustive
all cases fall into a category
Internal reliability
assesses the consistency of results across items within a test.
The cumulative percentage records the percentage of cases ...
at or below a given level
why is validity harder to establish than reliability
because there are no set procedures for establishing the validity of a measurement.
A percentile reports the percentage of cases in a distribution ...
below a given value
Suppose a researcher studying attitudes on gun control finds 40% of respondents in favor and 40% of respondents opposed and the remaining 20% uncertain. The distribution of responses would be referred to as ...
bimodal
Standard Deviation
bypasses the variance's problem of squared units by taking the square root of the variance, which returns the measure to the original units of measurement
A researcher creates a list of attributes she expects will be present in individuals who describe themselves as conservative. She then narrows the list to include only the essential attributes in order to _______________ her research
clarify
Liberalism is an example of a(n) _________________ in political research.
concept
The primary goals of political research are to describe ___________ and ____________ the relationship between them.
concepts/analyze
operational definition
describes explicitly how the concept is to be measured
conceptual definition
describes the meaning of a concept using an empirical characteristic
variation ratio
describes the percentage of all cases that do not fall into the modal category. The measure varies between 0 and 1. A value of 0 indicates no dispersion in the data. The maximal value depends on the number of categories of the variable, but converges on 1 as the variable achieves uniformity.
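A minimal sketch of the variation ratio, expressed as a proportion between 0 and 1 and applied to the drinks survey from the first card. The names `drink_counts` and `variation_ratio` are hypothetical.

```python
# Frequencies from the drinks survey card: value -> number of cases.
drink_counts = {0: 20, 1: 13, 2: 10, 3: 6, 6: 1}

def variation_ratio(freq):
    """Share of cases that fall outside the modal (most common) category."""
    total_cases = sum(freq.values())
    modal_count = max(freq.values())
    return 1 - modal_count / total_cases

print(variation_ratio(drink_counts))  # 20 of 50 cases are modal, so 30 of 50 are not
```

If every case fell in one category, the function would return 0, matching the "no dispersion" reading on the card.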
A description of the amount of variation in a variable is called ...
dispersion
characteristics to describe a single variable
distributions, central tendency, dispersion
J-shaped distribution
a curve that starts low and rises sharply toward one end (down to up)
A researcher is studying the attitudes of liberals and determines that one characteristic of liberals is support for abortion rights. Joe describes himself as a liberal so the researcher concludes that Joe supports abortion rights. This is an example of ______________________.
ecological fallacy
Variance
eliminates the tendency of opposing distances to cancel one another out by averaging the squares of the distances around the mean. The variance is always a non-negative number. The variance is sensitive to outliers. The squaring operation means the variance is not in the original units of measurement but in squared units.
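A sketch of the variance as the average squared distance from the mean, computed on the drinks survey from the first card, with the standard deviation as its square root. Dividing by N (the population form) is an assumption that follows the "averaging" wording on the card; many courses divide by N - 1 for sample data. Variable names are hypothetical.

```python
import math

# Frequencies from the drinks survey card: value -> number of cases.
drink_counts = {0: 20, 1: 13, 2: 10, 3: 6, 6: 1}

n_cases = sum(drink_counts.values())
mean_drinks = sum(v * f for v, f in drink_counts.items()) / n_cases

# Average of the squared distances around the mean (population form, divide by N).
variance = sum(f * (v - mean_drinks) ** 2 for v, f in drink_counts.items()) / n_cases

# The square root returns the measure to the original units (drinks).
std_dev = math.sqrt(variance)

print(round(variance, 4), round(std_dev, 4))
```

The single response of 6 contributes about 23.6 of the 78.02 total squared distance, which illustrates the sensitivity to outliers noted on the card.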
The categories for a variable should be
exhaustive, mutually exclusive, parsimonious
external validity
extent that the results of the measure can be generalized beyond the immediate study to other settings, other subjects, or times.
True or False: A method of describing the dispersion of a variable that includes the minimum value, lower quartile, median, upper quartile, and maximum value is called the interquartile range.
false
True or False: Dispersion describes the number of categories in a variable.
false
A table listing how many respondents to a survey reside in each state is called a ...
frequency distribution
types of distributions
frequency distributions, cumulative frequency distributions, percentage distribution, cumulative percentage distribution
coefficient of variation
gives a better sense of how large a standard deviation is. The coefficient of variation measures the spread of a set of data as a proportion of its mean. It should only be used when the variable just contains positive values.
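A hedged sketch of the coefficient of variation as the standard deviation divided by the mean, applied to the approval ratings from the earlier card. The function name is hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

# Approval ratings from the presidential approval card.
approval = [46, 46, 48, 49, 51, 52]

def coefficient_of_variation(values):
    """Standard deviation as a proportion of the mean.

    Only defined here for strictly positive data, as the card advises.
    Assumption: population standard deviation (divide by N).
    """
    if min(values) <= 0:
        raise ValueError("coefficient of variation requires positive values")
    return pstdev(values) / mean(values)

print(coefficient_of_variation(approval))
```

Because the measure is a proportion of the mean, rescaling every value by the same positive factor leaves it unchanged.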
A positively skewed distribution
has cases with extremely high values on the right hand side of the distribution, thereby pulling a "tail" of the distribution in a positive direction
negatively skewed distribution
has cases with extremely low values on the left side of the distribution pulling a "tail" in a negative direction (i.e. skewness < 0).
A normal distribution ...
has no skewness
Distributions with negative kurtosis or Platykurtic distributions
have less extreme data values than a normal distribution or "lighter" tails
Distributions with positive kurtosis or Leptokurtic distributions
have more extreme data values (outliers) than a normal distribution or "heavier" tails.
Distributions with zero kurtosis or Mesokurtic distributions
have roughly the same outlier character as a normal distribution.
A z-score "standardizes" values of a variable based on:
the value's standard deviations from the mean
central tendency
identify the most representative value of the distribution.
For a nominal variable, we should use the ______ to summarize the extent of variation.
index of qualitative variation
dispersion
indicate the variation around the measure of central tendency
There are 2 types of validity
internal and external
There are 2 types of reliability
internal and external reliability
A researcher who wants to learn the precise differences between her units of analysis will likely use a(n) ______________ level variable to code the characteristics.
interval
Year of birth is an example of a(n) ______________ level variable.
interval
percentile
is a category or value below which a given percentage of the observations fall. The percentile is the category of the (N + 1) * Pi case, where Pi is the percentile expressed as a decimal and the values are organized in numerical order from smallest to largest. It gives context to a value within a distribution.
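A sketch of the (N + 1) * Pi rule applied to the approval ratings from the earlier card. The card does not say what to do when the position is fractional; interpolating between the two neighboring cases is an assumption made here, and both function names are hypothetical.

```python
# Approval ratings from the presidential approval card.
approval = [46, 46, 48, 49, 51, 52]

def percentile_position(n_cases, p):
    """1-based position of the case at percentile p (p given as a decimal)."""
    return (n_cases + 1) * p

def percentile_value(values, p):
    """Value at percentile p, using the (N + 1) * p position rule.

    Assumption: a fractional position is resolved by linear interpolation.
    """
    ordered = sorted(values)
    pos = percentile_position(len(ordered), p)
    pos = min(max(pos, 1), len(ordered))   # keep the position inside the data
    lower = int(pos)                       # whole part of the position
    frac = pos - lower
    if frac == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

print(percentile_position(len(approval), 0.5))  # 3.5
print(percentile_value(approval, 0.5))          # 48.5
```

With six cases, the 50th percentile sits at position 3.5, halfway between 48 and 49, which agrees with the median of 48.5 on the earlier card.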
Likert scale
is a special type of the more general class of summated rating scales constructed from multiple ordered-category rating items. Each item uses a set of symmetrically balanced bipolar response categories indicating varying levels of agreement or disagreement with a specific stimulus statement expressing an attitude or opinion. Responses are summed to create an index indicating the degree of support for or opposition to that attitude or opinion.
The reductionist fallacy
is an error in the interpretation of data, whereby inferences about a group are based solely upon statistics collected about individual members of the group.
ecological fallacy
is an error in the interpretation of data, whereby inferences about the nature of specific individuals are based solely upon aggregate statistics collected for the group to which those individuals belong.
Index of Qualitative Variation
is based on the proportion of cases in each category. IQV is normed so that the value is 0 when all the cases fall into a single category and 1 when the cases are evenly spread across k categories. The formula only requires two types of information: the number of overall categories and the percentage of cases in each category
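The card does not spell out the formula. A commonly used form that satisfies the stated norming is IQV = K(1 - sum of squared proportions) / (K - 1), where K is the number of categories; treating this as the intended formula is an assumption. The sketch below applies it to the gun control percentages from another card, and the function name `iqv` is hypothetical.

```python
def iqv(percentages):
    """Index of qualitative variation from the share of cases in each category.

    Assumption: IQV = K * (1 - sum(p_i ** 2)) / (K - 1), with p_i as proportions.
    Categories with zero cases must still be listed so that K is correct.
    """
    k = len(percentages)
    if k < 2:
        raise ValueError("IQV needs at least two categories")
    total = sum(percentages)
    proportions = [pct / total for pct in percentages]
    return k * (1 - sum(p ** 2 for p in proportions)) / (k - 1)

# Gun control card: 40% in favor, 40% opposed, 20% uncertain.
print(round(iqv([40, 40, 20]), 2))  # 0.96
```

An even split such as `[25, 25, 25, 25]` gives 1, and `[100, 0, 0]` gives 0, matching the norming described on the card.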
Interquartile Range
is less sensitive to extreme cases than the range. It describes how tightly the cases cluster around the median.
variable
is simply an empirical measurement of a concept. It is a numerical representation of a concept whose categories and corresponding values vary within the unit of analysis.
dispersion
is the amount of spread in the frequency of values across categories of a variable. Maximum dispersion in nominal variables occurs when there is an even distribution of cases across categories (i.e., uniformity) or when each category occurs just once (individuality). Maximum dispersion in ordinal, interval, or ratio variables occurs when the cases are evenly split between two extreme categories (i.e., polarization).
Skewness
is the degree of departure from symmetry of a distribution
unit of analysis
is the entity (be it a person, organization, jurisdiction, etc), we want to describe and/or assess.
A variable that codes each state by the region of the country it is located in is called a _________________ level variable.
nominal
Gender is an example of a(n) _____________ level variable.
nominal
We typically use a pie chart to graph a _______ variable.
nominal
Range
of a variable is the difference between the lowest and highest values among the observations in the data set.
quantitative analysis is...
one approach to describing and analyzing concepts; it transforms concepts into numerical variables that can be examined using statistical tests. To transform concepts into numerical variables requires researchers to undertake a specific measurement process. Following the prescribed steps of the measurement process reduces the likelihood that the statistical inferences made about the concepts examined are biased or misleading.
A variable that communicates relative differences between units of analysis is called a(n) _________________ level variable.
ordinal
A variable that measures per-capita income by state as 'low', 'medium', or 'high' is a(n) ________________ level variable.
ordinal
Education coded as 'high school diploma', 'some college', 'college graduate', and 'advanced degree' is an example of a(n) _______________ level variable.
ordinal
Which of the following is a nominal level variable?
party affiliation
Two prominent measures of dispersion for ordinal variables
range and interquartile range
A variable measuring the number of times a person voted in the past ten years is a(n) _______________ level variable.
ratio
A variable measuring the number of times an individual voted in the last decade is most likely categorized at the _______________ level.
ratio
Bar charts are typically used to visualize all types of variable, except ___
ratio
Standard deviation is a typical measure of the variation in a ________ variable.
ratio
Reliability
refers to the consistent measure of a concept, meaning it gives the same readings every time it is taken. Reliability assesses the extent of random error in the measure. A measure need not be free of systematic error to be reliable.
external reliability
refers to the extent to which a measure varies from one use to another.
Internal validity
refers to whether the effects observed in a study are due to the concept of interest and not some other factor
Two criteria are used to gauge the extent of measurement error in operationalizations
reliability and validity
A recent public opinion poll found that 47% of respondents approve of the job the president is doing. Recent and Subsequent polls found approval ratings of 46%, 48%, and 49%. Based upon this we can say the questions in the poll are what kind of measurement of presidential job approval?
reliable
Mean Absolute Deviation
is another way to circumvent the problem of opposing distances canceling each other out. It uses absolute values instead of squares. The mean absolute deviation is used less often than the variance because the use of absolute values makes further calculations more complicated and it does not possess some of the attractive mathematical properties of the variance.
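A minimal sketch of the mean absolute deviation for the approval ratings from the earlier card. The function name is hypothetical.

```python
# Approval ratings from the presidential approval card.
approval = [46, 46, 48, 49, 51, 52]

def mean_absolute_deviation(values):
    """Average of the absolute distances between each case and the mean."""
    center = sum(values) / len(values)
    return sum(abs(x - center) for x in values) / len(values)

print(round(mean_absolute_deviation(approval), 4))  # 2.0
```

Taking absolute values keeps the distances from canceling, and the result stays in the original units (percentage points of approval).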
frequency distribution
shows the number of times a particular category or value appears
Percentage distribution
shows the proportion of cases reporting each value
Cumulative frequency distribution
shows the total number of accumulated cases up to a certain value
Cumulative percentage distribution
shows the total proportion of accumulated cases up to a certain value
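The four kinds of distribution can be built from one set of counts. The sketch below uses the drinks survey from the first card; all variable names are hypothetical.

```python
from itertools import accumulate

# Frequencies from the drinks survey card: value -> number of cases.
drink_counts = {0: 20, 1: 13, 2: 10, 3: 6, 6: 1}

values = sorted(drink_counts)                 # values in ascending order
freqs = [drink_counts[v] for v in values]     # frequency distribution
total = sum(freqs)

pcts = [100 * f / total for f in freqs]       # percentage distribution
cum_freqs = list(accumulate(freqs))           # cumulative frequency distribution
cum_pcts = [100 * c / total for c in cum_freqs]  # cumulative percentage distribution

print(f"{'Value':>5} {'Freq':>5} {'Pct':>6} {'CumFreq':>8} {'CumPct':>7}")
for row in zip(values, freqs, pcts, cum_freqs, cum_pcts):
    print(f"{row[0]:>5} {row[1]:>5} {row[2]:>6.1f} {row[3]:>8} {row[4]:>7.1f}")
```

For example, 33 of the 50 students (66%) reported one drink or fewer.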
One problem with using the mean value of an interval level variable is that it is sensitive to ...
skewness
There are 3 common methods used to evaluate the reliability.
Split-half method, test-retest method, and inter-rater method
distributions
summarize the frequency of individual values or ranges of values for a variable
validity assesses the extent of...
systematic error
validity
the extent to which it records the true value of the conceptual trait the researcher intends to measure.
The median value of a variable is ...
the middle value
If we can answer yes to all of them...
then the characteristic is suitable to be considered for a conceptual definition.
what is the primary goal of political research
to describe concepts and analyze the relationships between them
True or False: A Likert scale is an additive index of five or seven point value ordinal variables.
true
True or False: A distribution with a skinnier left-hand tail is said to have a negative skew.
true
True or False: A frequency distribution table is the best way to describe the dispersion of a nominal level variable.
true
True or False: A survey instrument intended to measure presidential approval ratings that produce drastically different numbers each time it is administered lacks reliability.
true
True or False: An interviewer who records a respondent's answer incorrectly has introduced random measurement error into the research
true
True or False: Every concept must have two essential properties, concreteness and variability.
true
True or False: Frequency of church attendance is one popular way to conceptualize how religious Americans are in political surveys
true
True or False: Mean, median, and mode are all measures of central tendency.
true
True or False: We have information on the number of times people voted in the last ten years. The percentage of people who report voting two times or less is known as the cumulative percentage.
true
Inverted U-shaped curve
a curve that rises to a peak in the middle and then falls, like an upside-down half circle
histograms or polygons
used to graph interval/ratio variables
Pie Charts
used to graph nominal variables
bar graphs
used to graph ordinal variables
A measure that records the true value of an intended characteristic is said to be?
valid
Levels of measurement help to identify
what statistical techniques can be performed with our data
Which of the following measures best allows us to compare values from different datasets more directly
z score
When studying the passage of Civil Rights legislation by Congress each bill is an example of a(n) ___________________________.
Individual unit of analysis
Units of analysis can either be at the individual-level or the aggregate level.
Individual-level describes units at its lowest possible rank (e.g., voter). Aggregate-level is a collection of individual entities (e.g., state electorate).
Potential Problems with Using the Mode
It may not be unique. Bimodal: if the largest two categories each contain the same number of cases. Multimodal: if three or more categories are the most common. The mode can be overly affected by sampling variation. The mode is very sensitive to how categories are combined.
Kurtosis
Kurtosis provides information on the tails (the extremes, or outliers) of a distribution.
Two or more distinct groups of empirical characteristics are known as a ________________________.
Multidimensional concept
Researchers distinguish between 4 levels by which a variable measure can assess an empirical characteristic
Nominal measures, ordinal measures, interval measures, and ratio measures
Nominal Measures
Nominal measures simply arrange cases into categories. The categories of a nominal measure cannot be ordered. The numbers assigned to the categories have no quantitative meaning and are simply used to classify the cases. Examples of nominal measures: Religious denomination (Christian, Jewish, Muslim, Hindu, etc.); State residence (Alabama, Alaska, Arkansas, Arizona, etc.). Dichotomous variables, or variables with two values, are typically considered nominal measures: Sex (Male, Female); College Degree (Yes, No).
normal distribution
Normal distributions are symmetric and bell shaped graphs, with most scores in the middle centered around the mean, median, and mode.
Ordinal Measures
Ordinal measures classify cases into categories that can be ranked or ordered, but the relative degree of difference between the rankings cannot be determined. Examples of ordinal measures: Presidential Approval (0 = Strongly Approve; 1 = Somewhat Approve; 2 = Neither Approve nor Disapprove; 3 = Somewhat Disapprove; 4 = Strongly Disapprove); Educational Attainment (0 = Less than high school degree; 1 = High School Degree; 2 = Some College; 3 = College Degree; 4 = Post Graduate Work)
applying the three tests to religiosity
Split-half method: Our measure only has one component, so the split-half method is not applicable. Test-retest method: We conduct an identical poll one week after the first. Respondents should not have changed their churchgoing habits in 7 days. If the operationalization is perfectly reliable, then respondents will give identical answers at both points in time. Too many discrepancies signal a problem with our approach that needs to be corrected to ensure our results are not skewed by random error. Inter-rater method: We examine whether different interviewers get the same results for our measure. Too many discrepancies suggest interviewer training needs to be changed.
Ratio Measures
Ratio measures have all the properties of interval variables plus a real absolute zero, meaning you can construct a meaningful fraction (or ratio) with a ratio variable. To differentiate between ratio and interval measures, ask: (1) Does 'zero' mean 'none'? (2) Does 'multiplied by 2' have the exact meaning of 'twice as much as'? If the answers to both questions are 'yes', the measure is ratio; if either is 'no', the measure is interval. Many ratio scales can be described as specifying "how much" of something (i.e., an amount or magnitude) or "how many" (a count). Examples include age, income, or the number of political activities. Because the numerical values of both interval and ratio measures are meaningful, the statistical procedures that can be applied to them are very similar. As a result, some statistical packages differentiate between just three categories: nominal, ordinal, interval/ratio.
The ______________ of a measurement is the extent to which it consistently measures the concept.
Reliability
Additive Indices
Sometimes researchers try to obtain a more precise and reliable measure of a concept by creating an additive index from the summation of responses across multiple indicators.
Measuring Dispersion in Nominal Variables
Spread measures for nominal variables are based on the frequencies of the categories.
Constructing a Simple Frequency Distribution Table
Step 1: Divide the results into categories or intervals, and then count the number of results in each interval. Step 2: Make a table with separate columns for the categories or intervals and the frequency of results in each interval. Step 3: Add up the number of cases for each category or interval and record them in the frequency column.
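The steps above can be sketched with a counter over raw responses. The party identification responses below are hypothetical data invented for this illustration, not results from any survey.

```python
from collections import Counter

# Hypothetical raw survey responses, used only to illustrate the steps.
responses = ["Democrat", "Republican", "Independent",
             "Democrat", "Democrat", "Republican"]

# Steps 1 and 3: sort the results into categories and count the cases in each.
frequency = Counter(responses)

# Step 2: lay out a table with one column for categories and one for frequencies.
print(f"{'Category':<12} {'Frequency':>9}")
for category, count in frequency.most_common():
    print(f"{category:<12} {count:>9}")
```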
what are the steps to the measurement process
Step 1: Identify the empirical characteristics of a concept. Step 2: Write a conceptual definition based on the empirical properties. Step 3: Operationalize the conceptual definition. Step 4: Use the data collected from the operationalization of the concept to construct a variable.
example of reductionist fallacy
Studies have shown that individuals who save more of their earnings are less likely to experience financial instability over time. In the economy overall, too much saving and too little spending can cause financial instability for the country, resulting in recessions. We would have fallen prey to the reductionist fallacy if we had incorrectly assumed that just because saving promotes financial stability on the individual level that it also does so on the aggregate level.
Two sorts of error can distort the fit between a conceptual definition and an operational definition.
Systematic measurement error is a consistent, chronic distortion of an empirical measurement. Examples: mathematical word problems or the Hawthorne effect. Random measurement error is a temporary or haphazard distortion of an empirical measurement. Examples: unavoidable distractions or lying.
T/F . The relationship between aggregate-level concepts CANNOT be used to make inferences about the relationship between the concepts at the individual level or vice-versa.
T
T/F A measure need not be free of random error to be valid
T
One way to test the reliability of a measure is the ________________ method.
Test/re-test
Advantageous Properties of the Variance
The variance calculated around the mean is smaller than the average squared deviation around any other value. The variance is proportional to the average squared difference between all the pairs of observations. The variance reaches its maximum when the data are polarized with half the observations at the minimum value and half at the maximum value. If two variables are strictly independent of one another, then the variance of their sum equals the sum of their variances. It permits the decomposition of the variance of a variable (i.e., a dependent variable) into separate parts that are due to other variables (i.e., independent variables).
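A numerical check of the first three properties, using the approval ratings from the earlier card and a hypothetical 0-10 rating scale with 100 respondents. The population variance (divide by N) is assumed throughout, and the helper `avg_sq_dev` and the `moderate` comparison data are hypothetical. The independence and decomposition properties are not demonstrated here.

```python
from statistics import mean, pvariance

def avg_sq_dev(values, center):
    """Average squared deviation of the values around an arbitrary center."""
    return sum((x - center) ** 2 for x in values) / len(values)

approval = [46, 46, 48, 49, 51, 52]

# Property 1: the average squared deviation is smallest around the mean.
around_mean = avg_sq_dev(approval, mean(approval))
around_median = avg_sq_dev(approval, 48.5)

# Property 2: the variance equals half the average squared difference
# across all ordered pairs of observations.
n = len(approval)
pairwise = sum((a - b) ** 2 for a in approval for b in approval) / (2 * n * n)

# Property 3: on a 0-10 scale, polarized answers give the largest variance.
polarized = [0] * 50 + [10] * 50
moderate = [0] * 25 + [5] * 50 + [10] * 25   # hypothetical comparison case

print(around_mean < around_median)
print(abs(pairwise - pvariance(approval)) < 1e-9)
print(pvariance(polarized), pvariance(moderate))
```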
Two prominent measures of dispersion for nominal variables
Variation ratio and index of qualitative variation
Uses of Variation
Variation is important to understanding the nature of a concept. Variation can be decomposed into different parts to help us understand the cause of the variation. Variation enables us to assess how unusual any particular value on the variable is. The measure employed for this is called a Z score (or a standard score).
The measure of distribution that provides more information about extreme (outlier) values of a variable is:
kurtosis
negatively skewed order of central tendency
mean < median < mode
normal distribution order of central tendency
mean = median = mode (all together)
Which of the following is the most resistant measure of central tendency to skew?
median
Suppose you knew the day of the month on which each of your classmates was born. The most frequently occurring day of birth is called the ____________.
mode
The only measure of central tendency that may be used with a two-category variable such as gender is ...
mode
There are 3 major statistics used to summarize the central tendency of a variable.
mode median and mean
positively skewed order of central tendency
mode < median < mean