Research Methods


Skew

a measure of asymmetry in a distribution

Types of Probability Samples

1. Simple random sample 2. Systematic sample 3. Stratified sample 4. Cluster sample

Case

: a single "unit", observed at some point in time Has several dimensions ("variables") Structured, "focused" comparison Each built upon observations What is the larger class of units?

Sample

: a subset of cases/observations drawn from a specified population Why use samples? We can almost never observe the complete population (and, therefore, its parameters) too costly/time-consuming/outright impossible

Standard Error

"Standard error": special name given to the SD of a sampling distribution Estimate of the dispersion of the sampling distribution: how much 𝑥 ̅ departs by chance from 𝜇 Critical for statistical inference a measure of uncertainty

How to sample

1. Define target population - Population to which you would like to generalize your results - If not available in its entirety, use a study population 2. Construct sampling frame - List from which the potential cases/observations are drawn - Operational definition of the population that provides the basis for sampling 3. Devise a sampling design* 4. Determine sample size 5. Draw sample

Case Selection Techniques

1. Typical Cases 2. Deviant Cases 3. Influential Cases 4. Diverse Cases 5. Crucial/Critical Cases 6. Mill's Method: A. Most-Similar Cases B. Most-Different Cases

Influential Cases

A subtype of deviant cases But a different purpose: aims to provide a rationale for disregarding certain problematic cases Is there a reason to believe that one or a few cases are driving the results? e.g. countries like Kuwait, Qatar, and Saudi Arabia are outliers in the relationship between GDP/capita and democracy Presence of oil (or Islam, or both) could be a key variable We may need to "control for" such variables in order to get the true relationship between GDP/capita and democracy Exception that proves the rule

Statistical Inference and Uncertainty

Sample statistics are point estimates of the parameters ("our best guess"). Uncertainty of the point estimates can be represented by the standard error (σ/√n), and also by confidence intervals, which are easier to interpret.

Mill's Method Most Similar Cases

A.k.a. "the comparative method" J.S. Mill's A System of Logic (1843) Most-Similar Cases a.k.a. The Method of Difference Compare and contrast cases (minimum 2) with similar attributes (Xs) but different outcomes (Y) Goal: find one attribute (IV) that is present when an outcome occurs, but absent when the outcome does not occur, to determine "causality" Logic of controlled comparison: ceteris paribus (all else equal) "Matching": cases similar in all respect except for the variable of interest

Measures of Dispersion

A measure of central tendency is not enough Can be misleading Example: Average weight on airplane: 155 pounds Average weight in a marathon: 155 pounds Example: The average US income is not equal to the income of the average American Dispersion around the "midpoint" matters a lot Measures of dispersion: summary measures to describe how spread out data are in a distribution A neglected aspect of description In politics, we often use dispersion to describe variables e.g. "polarization," "consensus," "equality"

t-distributions and degrees of freedom

As the number of degrees of freedom increases, the t-distribution looks increasingly normal, i.e. with large sample sizes it approximates the normal distribution. For n > 1000 the t- and normal distributions are virtually identical. Most of our statistical procedures from now on will rely on the t-distribution.

Semi- structured Interview

Based on a questionnaire/interview guide Questions written (and ordered) beforehand Clear objectives, but flexibility to adapt to direction of conversation Open-ended questions No response alternatives provided Respondents answer in their own words Facilitates comparability Allows for quantitative analysis

Normal Estimation

Based on the CLT As long as the sample size (𝒏) is sufficiently large, the sampling distribution of a statistic will follow a normal distribution Inference using the normal distribution When we know the standard deviation of the population (𝝈), we can calculate z-scores (and therefore evaluate probabilities using the normal distribution) And do things like construct a confidence interval But what if we don't know 𝜎 (which is usually the case) or if 𝑛 is small?

95% Confidence Intervals

Based on what we know about normal distributions: the 68-95-99.7 rule and z-scores. 95% of all data lie within ±2 standard deviations (well, actually, 95.45% lies within 2 SDs; exactly 95% lies within 1.96 SDs, i.e. z-scores (z*) of 1.96 and -1.96). So, the 95% CI for μ is x̄ ± z*(σ/√n) = x̄ ± 1.96(σ/√n). Upper boundary: x̄ + 1.96(σ/√n). Lower boundary: x̄ - 1.96(σ/√n).
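
A minimal Python sketch of this formula, using made-up values for the sample mean, the (known) population SD, and the sample size:

```python
import math

x_bar = 59.0   # hypothetical sample mean
sigma = 12.0   # hypothetical (known) population standard deviation
n = 100        # hypothetical sample size

se = sigma / math.sqrt(n)      # standard error of the mean
margin = 1.96 * se             # z* = 1.96 for a 95% CI
print(round(x_bar - margin, 2), round(x_bar + margin, 2))  # lower and upper boundaries
```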

Boxplots

Boxplot (or "box-and-whiskers" plot): a graph of the 5-number summary A central box spanning the IQR A line in the box marks the median M Lines (whiskers) extend from the box out to the smallest (minimum) and largest observation (maximum)

Case Selection Techniques for Case Studies

Case studies: purpose and techniques: Purely descriptive (atheoretical) case studies: no inference Plausibility probes: does the phenomenon exist? Hypothesis-testing case studies Typical cases Influential cases Diverse cases Crucial/critical cases Most-similar cases Hypothesis-generating (only) case studies Deviant cases Most-different cases Extreme case (not covered, selection on the DV)

Population and Sample Issues

Can the population be enumerated? e.g. homeless population Is the population literate? Are there language issues? Will the population cooperate? e.g. undocumented immigrants How well do the sample subjects represent the population? Sample-population congruence Target population: undergrads Sample frame: questionnaires left in library lobby (good/bad?) Are response/completion rates likely to be a problem? Completion rates: % of initially contacted people who actually participate Can drastically influence results

Strengths of Case Studies

Case studies are better suited for: Descriptive rather than causal inference In-depth (intensive) analysis rather than breadth (extensive) Case comparability (within-case subunits) rather than representativeness Identifying causal mechanisms rather than causal effects Next week process-tracing Describing invariant causal relationships, rather than probabilistic ones Theory-generating "exploratory " studies rather than hypothesis-testing "confirmatory" studies "Plausibility probes" Rare or unique events (small number of cases)

Central Limit Theorem

Truly remarkable result. Draw any random sample of size n from any population with mean μ and finite standard deviation σ. When n is large, the sampling distribution of the sample mean x̄ is approximately normal: x̄ ~ N(μ, σ/√n). The CLT works for any underlying population distribution.
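
A small simulation sketch of the CLT: repeated sample means from a deliberately skewed (exponential) population still pile up in a roughly normal shape around the population mean. The population, sample size, and number of replications are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)
n = 50          # sample size
sample_means = [
    statistics.mean(random.expovariate(0.5) for _ in range(n))  # population mean = 2.0
    for _ in range(5000)
]
print(round(statistics.mean(sample_means), 2))   # close to mu = 2.0
print(round(statistics.stdev(sample_means), 2))  # close to sigma/sqrt(n) = 2/sqrt(50) ≈ 0.28
```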

Measures of Central Tendency

Central tendency: the "center" or "middle" of a distribution, or a "typical" case e.g. How would you describe the "central position" of a collection of midterm grades? e.g. How about a collection of data about region of residence? Mean vs. median vs. mode

The Data Matrix

Columns: One for each variable (sex, race, height, etc.) Values: different levels of measurement (nominal, ordinal, interval, ratio) Rows: One for each case/observation (e.g. Subject 1, Subject 2... etc.) Reading across rows = a case's specific values of each variable

Correlations

A measure of association between two continuous (i.e. interval) variables. Pearson's r (correlation coefficient): r = Σ[((x_i - x̄)/s_x)((y_i - ȳ)/s_y)] / (n - 1), for two variables (x, y) with sample standard deviations (s_x, s_y) and n observations; x_i - x̄ is the deviation from the mean value. r measures the direction (+ or -) and strength of the linear association between two variables.
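
A short Python sketch computing Pearson's r directly from this definition, with made-up data:

```python
import statistics

x = [2, 4, 6, 8, 10]   # hypothetical IV values
y = [1, 3, 2, 5, 4]    # hypothetical DV values
n = len(x)
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)   # sample SDs (n - 1 denominator)

# Sum of the products of the standardized deviations, divided by n - 1
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)) / (n - 1)
print(round(r, 2))   # 0.8: a fairly strong positive linear association
```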

DATA = FIT + RESIDUAL

So we can now think of the data as being, on average, on the line (fit) ± some random error: DATA = FIT + RESIDUAL. Our population regression model thus becomes: Y = β_0 + β_1(X) + ε (or: Y_i = β_0 + β_1(X_i) + ε_i). Residuals are assumed to be independent and normally distributed with mean 0 and std. dev. σ, i.e. the sum of all the errors = 0. There shouldn't be any pattern in the residuals (homoscedasticity).

Standard Deviation

Def.: A measure of dispersion about the mean; the most common measure of spread. To measure the SD, you first need to calculate the deviations and the variance. Deviation: distance from the mean (x̄), i.e. an observation's (x_i) deviation from the mean: x_i - x̄. Example: a student got 85% and the midterm average (mean) is 87%; deviation = 85 - 87 = -2. Note: the sum of all deviations should equal zero.
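
A minimal Python sketch of the sequence deviations → variance → SD, using hypothetical midterm scores:

```python
import math

scores = [85, 87, 90, 82, 91]            # hypothetical midterm scores
mean = sum(scores) / len(scores)         # 87.0
deviations = [x - mean for x in scores]  # distances from the mean
print(sum(deviations))                   # 0.0: deviations always sum to zero

variance = sum(d ** 2 for d in deviations) / (len(scores) - 1)  # sample variance
sd = math.sqrt(variance)                 # standard deviation
print(round(sd, 2))
```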

Case Studies

Def.: An intensive study of a single unit for the purpose of understanding a larger class of (similar) units. A case study is (implicitly) comparative: not just a "study" but a "case" study. Always ask yourself: what is my case an instance of? What is the broader "population" or "sample" of which the case is a part? What am I trying to generalize to?

Probability

Def.: formal quantification of uncertainty. Uncertainty/randomness is important in statistical inference. Frequentist approach to probability (used here): the probability of a particular outcome is the proportion of times that outcome would occur in a long run of repeated observations, e.g. coin-flipping. Law of large numbers: if the probability of "heads" is p, then as n → ∞ the proportion of "heads" (i.e. # heads / # trials (n)) → p.

Interviews

Def: asking individuals a series of questions and recording their responses. Can be face to face or by phone. Types: Unstructured, in-depth interviews: "qualitative" research; open-ended questions, a "conversation"; "soaking and poking"; good for insight, less for hypothesis testing. Structured interviews: surveys. Semi-structured interviews.

Degrees of Freedom

Degrees of freedom (df): the number of pieces of information we have beyond the minimum that we would need to make a particular inference. df = n - (# of parameters being estimated). e.g. a sample size of 100 (n = 100) used to estimate the mean (1 parameter, μ): degrees of freedom = n - 1 = 100 - 1 = 99 df.

Density Curves

Density curves: describe overall pattern of distribution Area under the curve = relative frequency of all observations that fall in that range i.e. a probability curve Area under the curve = 1 (100% of all probabilities)

Descriptive Statistics

Def.: Method of describing in a meaningful way a large collection of data by summarizing variables; helps to better know your data. Such numerical summaries = "statistics", e.g. baseball's "batting average" = Hits / At Bats. Any variable can be described by its: central tendency (measures: mean, median, mode) and its dispersion or spread (measures: variance, standard deviation, range, quartiles, percentiles). Descriptive statistics don't allow us to make conclusions (i.e. inferences) beyond the analyzed data, i.e. ≠ inferential statistics (e.g. using a sample statistic to infer a population's parameter). But descriptive statistics are the building blocks of statistical inference (next week). They concern single variables, not relationships between variables, and cannot test hypotheses.

Difference of Means

H0: null hypothesis, e.g. men and women do not differ in their views of gender roles. HA: alternative hypothesis, e.g. men and women differ in those views. Each group is considered to be a sample from a distinct population. Samples must be independent (i.e. scores in one group must not depend on scores in the other). The outcome (or response) variable = dependent variable, i.e. the variable that we compare across groups. The grouping variable (e.g. gender) = independent variable. Group 1 (men) has population mean μ_1; group 2 (women) has population mean μ_2. Do group 1 and group 2 differ? Parameter to be estimated: μ_2 - μ_1, the difference between group means.

Comparing two means

Do Republicans and Democrats differ in their attitudes toward national health insurance? Do ethnic groups differ from one another in terms of income? Do atheists and religious people have different political preferences? Do men and women differ in their attitudes toward gun control?

Crucial/Critical Cases

Either a "least likely" and "most likely" case Often selection of cases with extreme values on the IV Least likely case: Used to confirm a theory The "Sinatra" inference: If I can make it there, I can make it anywhere If a "hard" case (unlikely to be predicted by a theory) can nonetheless be predicted by a theory, our confidence in the soundness of that theory is increased e.g. role of bureaucratic politics during Cuban Missile Crisis Most likely case: Used to disconfirm a theory Almost certainly true if the theory is true If a theory cannot even adequately predict an "easy" case, then our confidence in the soundness of a theory drops substantially e.g. theories of war tested on WWI

Power of Sampling

Even samples that seem small can yield accurate information about much larger groups Uncertainty might remain, but we can estimate that uncertainty e.g. "margin of error" in surveys

Questions: wording

Even small wording differences can substantially affect the answers people provide Example: Pew poll (Jan. 2003) "Do you favor or oppose taking military action in Iraq to end Saddam Hussein's rule?" 68% favored military action 25% opposed it "Do you favor or oppose taking military action in Iraq to end Saddam Hussein's rule even if it meant that U.S. forces might suffer thousands of casualties?" 43% favored military action 48% opposed it The introduction of U.S. casualties altered the context of the question and influenced people's answers

Simple Random Sample

Every case (and every possible combination of cases) has an equal chance of being included in the sample. Requires a complete list of the population. Example: Vietnam draft (random lottery). Every date of the year was drawn randomly, and all birthdays were picked in order (e.g. Sept. 14 = 1st date drawn). Problem: people noticed that early numbers (more likely to be called) tended to fall in the latter months of the year (e.g. Oct-Dec) and late ones in early months (Jan-Mar). The capsules had been insufficiently mixed in the drum!
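
A minimal sketch of drawing a simple random sample in Python; the sampling frame and sample size are made up:

```python
import random

# Hypothetical sampling frame: a complete list of the population
population = [f"subject_{i}" for i in range(1, 1001)]

random.seed(42)
sample = random.sample(population, k=50)  # each case has an equal chance of selection
print(sample[:5])
```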

Obtrusive Research

Experiments (quantitative) Ethnography/Field Research (qualitative) Interviews Unstructured (qualitative) Semi-structured interviews (qualitative and quantitative) Surveys (quantitative)

How to conduct a Semi-structured interview

Explain the study Ask all questions as written and in order Follow with probes/prompts Nudging probes, elaborations, clarifications, etc. Do not re-interpret or finish sentences

Survey research

Features of survey research (i.e. polling): Administered via structured interviews (face-to-face or by phone) or questionnaires (by mail, or online) A large number of respondents (i.e. a "sample") chosen to represent a population of interest Usually through probability sampling to ensure accurate estimates of the population characteristics (Week 9) A predetermined set of closed-ended questions i.e. respondents must choose responses from those provided Quantitative analysis of responses Either for descriptive or causal inference

Frequency Distribution v. Sampling Distributions

Frequency distribution: distribution of the actual scores in a sample; the "average" of such a distribution = the sample mean (x̄). Sampling distribution: hypothetical distribution of a sample statistic (e.g. x̄) under repeated sampling. In repeated sampling, we would obtain different values of a sample statistic (e.g. x̄). Tells us the relative frequency (probability) of each value of a statistic (e.g. x̄).

Sample Statistic

From a sample, we calculate a sample statistic: an estimate of a population parameter based on a sample drawn from that population. We can then make an inference about the population parameters using these sample statistics. Not "perfect": uncertain (random sampling error).

Problems with Case Studies

Hard to assess how "representative" a case is Are you looking at an exception or a typical case? Matters a lot for making valid causal inferences Another common problem (also true for large-N analysis): It is not legitimate to derive a theory/hypothesis from a case (or a set of data) and then claim to "test" it on the same case (or data)! Example: Observation: Sanders, an anti-establishment candidate, is doing better than Clinton, an establishment candidate, in the early polls Hypothesis: Anti-establishment candidates tend to do better early in a political campaign than establishment candidates Test: You then "test" your hypothesis using Sanders and Clinton as cases You can't do that, that's "cheating" (i.e. you already know what you are going to find!) Instead: test hypothesis on different data e.g. look at other political campaigns/candidates (where you don't know the answer)

Hypothesis Testing

Hypothesis testing: can a hypothesis about a population be supported? e.g. is there an ideological gender gap in US politics? Null hypothesis (H0): women's and men's liberalism scores are the same. Hypothesis (HA): women are more liberal than men. H0 vs. HA is assessed via tests of significance. Purpose: evaluating empirical evidence in order to determine the level of support for some claim about the population. H0: null hypothesis; typically no effect, no relationship between 2 variables (i.e. parameter = 0). HA: alternative hypothesis; the parameter differs from its null value (parameter ≠ 0). Can H0 be rejected in favor of HA?

Tests of Significance

Hypothesis testing: comparing the means of two samples; are they statistically different? i.e. can we reject H0 that the means of both samples are the same? t-test: used when the DV is continuous. Example: Are democracies significantly wealthier, on average, than non-democracies? Democracy/non-democracy: discrete (categorical); GDP/capita: continuous. Chi-square (χ²): used when the DV is discrete. Example: Are majority-Muslim countries less likely to be democratic than non-Muslim countries? Muslim/non-Muslim: discrete (categorical); democracy/non-democracy: discrete (categorical).

Case Selection Techniques

If the population of cases is small (e.g. revolutions), case study analysis (rather than large-N analysis) may be most appropriate. Involves purposive (i.e. nonrandom) sampling. The following techniques can involve one or several case studies. Note: some techniques require a minimum of 2; all can employ additional cases, if desired.

Linear Additive relationships v. Interaction Effects

If there exists an interaction relationship (e.g. if the effect of x_1 on y is not the same across values of x_2), we must add an interaction term (or "interaction variable") to the model: Gun Control = α + β_1(Party) + β_2(Gender) + β_3(Party × Gender). β_1: effect of party on gun control, controlling for gender. β_2: effect of gender on gun control, controlling for party. β_3: effect of the multiplicative interaction term between party and gender.

Randomization

In a large-N sample, we can avoid selection bias if observations are selected randomly Why? Random rule is uncorrelated with all possible explanatory (Xs) or dependent variables (Ys) But random selection is not always feasible or desirable The universe of cases may be unspecified e.g. sampling US foreign policy experts Randomization may leave out "important" cases e.g. studying revolutions without looking at the French revolution

Interpretation of the Confidence Interval

Interpretation: There is a 95% probability that the population parameter lies within the 95% confidence interval around the sample statistic 95% of the confidence intervals constructed in repeated sampling contain the population parameter Statistical significance: values falling outside of this 95% range can be said to be "significantly (statistically)" different from our mean

Nonprobability Sampling

Nonrandom selection Each case has unknown probability of being selected e.g. convenience sampling, snowball sampling Typically the method used (often problematically) in "case studies"

Interpreting regression Coefficients

Regression coefficients (β, β̂, b̂, b; in SPSS: B) report the amount (and direction) of change in the DV associated with a one-unit change in the IV; so β is the change in y given a one-unit change in x. The sign of β determines the type of relationship between y and x: β > 0: positive relationship; β < 0: negative relationship; β = 0: independence (no relationship). When interpreting regression coefficients, make sure you are clear about the units in which the IV and DV are measured, and remember that β is expressed in the units of the DV. Example: Income = 4,000 + 3,500(Years of Education), so β = $3,500 for each additional year of education.

Sample Size

Larger samples will lead to more accurate inferences than smaller samples Note: size of population does not change the required sample size for a given level of accuracy There are diminishing returns to accuracy as sample size increases (+ costs) Example: is my coin a fair one? Parameter: fair coin? (50/50?) Sample statistic: results from coin flips e.g. how many heads? Sample size: number of coin flips n = 8 vs. n= 16 vs. n= 32 vs. n = 64 Increasing sample size reduces the variance of the sampling distribution, making our estimates of the population parameter more accurate
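
A small simulation sketch of the coin example: the spread of the estimated proportion of heads shrinks as the number of flips (n) grows, with diminishing returns. The sample sizes and number of repetitions are arbitrary illustrations.

```python
import random
import statistics

random.seed(1)
for n in (8, 16, 32, 64):
    # Repeat the "flip a fair coin n times" experiment many times
    estimates = [sum(random.random() < 0.5 for _ in range(n)) / n
                 for _ in range(2000)]
    # Spread of the estimates around the true p = 0.5 shrinks roughly as 1/sqrt(n)
    print(n, round(statistics.stdev(estimates), 3))
```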

Frequency Distributions

Listing of intervals of possible values for a variable, and the number of observations in each interval. Why important? Allows us to visualize important information about our data: central tendency, dispersion, and skew. Visual representation: tables and graphs. Graphs used vary by level of measurement: bar charts for nominal/categorical and ordinal data; histograms for interval and ratio (continuous) data.

Questions: what to avoid

Loaded/biased terms and questions e.g. Do you agree with Barack Obama and Nancy Pelosi's efforts to impose massive tax hikes on the American people? e.g. What do you see as the benefits of a tax cut? (without asking about potential disadvantages) "Motherhood and apple pie" questions e.g. Do you feel it is very important to institute common-sense legal reform to stop excessive legal claims, frivolous lawsuits and overzealous lawyers? Social desirability questions Respondent unlikely to tell the truth e.g. Did you drink alcohol on a daily basis during your pregnancy? Subjective self-evaluations e.g. Are you a good student? Instead: What is your GPA? e.g. Are you racist? Instead: Agree or disagree? People of ethnicity X don't get ahead in life because they don't work as hard as people in other ethnic groups. This would be an example of implicit stereotyping, BTW. The type of questions pollsters ask to determine someone's views on race. Long questions (and long surveys!) Compound ("double-barreled") questions Questions that are really 2 questions e.g. Do you believe that Social Security and Medicare should be targets for major budget cuts? Questions that subjects are unlikely to know much about

Diverse Cases

Look for cases representing the full range of variation along relevant dimensions. Requires a minimum of 2 cases with different values on either X or Y (or X/Y), preferably including extreme values ("high" and "low," sometimes with intermediary values added). e.g. 1 democracy and 1 non-democracy; 1 low-income and 1 upper-middle-income country; 1 Muslim-majority and 1 non-Muslim-majority country; for a categorical variable (Jewish/Protestant/Catholic), a minimum of 3. Not necessarily representative of the population as a whole (but representative of its range).

Typical Cases

Look for examples of a causal relationship with "typical" values i.e. average values of the relationship between X and Y But not necessarily average value of X or Y! Cases with low residuals (i.e. close to the regression line) Residuals = distance between data point (actual value) and regression line (predicted value) No "outliers" Representative, by definition Useful for hypothesis-testing

Median

Median The "middle" value Midpoint in a ordered distribution Point below and above which 50% of the values fall Applicable to interval/ratio and to some extent ordinal measures (nonsensical for nominal/categorical) Arrange values in order from smallest to largest what is the middle value? If an even number average the 2 middle values Main advantage over the mean: much more resistant to outliers Example: When discussing income, why is the median more meaningful than the mean?

Mill's Method Most Different Cases

Most-Different Cases a.k.a. The Method of Agreement Compare and contrast cases with different attributes (Xs) but shared outcomes (Y) Reverse image of most-similar-case method Goal: find the one attribute (IV) these cases share in common to determine a possible cause Key problem: selection on the DV Like Pape's suicide terrorism study So: better for hypothesis-generating than testing e.g. does foreign occupation "cause" suicide terrorism? But can help eliminate "necessary" causes e.g. religious extremism

Multiple regression

Linear equation, but with more than one IV (which could also be understood as control variables): y = b_0 + b_1(x_1) + b_2(x_2) + ... + b_k(x_k). y: dependent variable (DV); x_1: first independent variable (IV); x_2: second IV; x_k: kth IV; b_0: constant; b_1: regression coefficient of the first IV; b_2: regression coefficient of the second IV; b_k: regression coefficient of the kth IV.
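
A minimal numpy sketch of estimating such a model by ordinary least squares; the two IVs and the DV are made-up data:

```python
import numpy as np

x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)     # hypothetical first IV
x2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)     # hypothetical second IV
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])     # hypothetical DV

X = np.column_stack([np.ones_like(x1), x1, x2])    # constant column plus the IVs
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)      # least-squares fit
b0, b1, b2 = coefs
print(round(b0, 2), round(b1, 2), round(b2, 2))    # b_0, b_1, b_2
```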

Confirmation Bias

Need to avoid confirmation bias: the tendency to only look for evidence that "proves" your point. Instead: always seek evidence that could disprove your hypothesis; this is how you create good arguments in social science. Cherry-picking (the selective use of evidence) can be intentional (intellectual dishonesty) or unintentional (selection bias, i.e. a selection effect).

T-Distributions

Remember z-scores? z_i = (x_i - μ)/σ works for the standard normal distribution: z_i ~ N(0,1). The standardized sample mean (or one-sample z statistic), z = (x̄ - μ)/(σ/√n), is the basis for the z procedure for inference about μ when σ is known. If σ is unknown, however, we must use the standard error SE_x̄ = s/√n, where s = √(Σ(x_i - x̄)²/(n - 1)). When we substitute the standard error s/√n for σ/√n, the distribution of the statistic is no longer normal but has a t-distribution (a.k.a. Student's t-distribution). Why? We introduce extra variability into the sampling distribution of the mean. If a population is normally distributed (N(μ, σ)), then for a random sample of size n the sampling distribution of the one-sample t statistic, t = (x̄ - μ)/(s/√n), is called a t-distribution with n - 1 degrees of freedom.* t-distributions are very similar to the normal distribution, but with "fatter" tails to reflect a larger degree of uncertainty. t-distributions are a family of distributions: the exact shape of the curve depends on sample size and degrees of freedom.*

Unobtrusive Research

Observational studies: Analysis of existing statistical datasets (quantitative); Case studies (qualitative,* but with a "quantitative" logic of inference). Document analysis: Content analysis (quantitative); Historical research (qualitative). Computer simulations.

Boxplots with Outliers

Observations falling more than 1.5*IQR above Q3 or below Q1 can be considered "outliers", and be plotted outside the box as individual points

Parameter

Parameters: characteristics of the population we are specifically interested in e.g. Americans' views on the death penalty e.g. Percentage of US adults who voted

Percentiles and Quartiles

Percentiles: the median is the 50th percentile; the pth percentile is a value such that p% of the observations fall below it and (100 - p)% fall above it. Quartiles: the most commonly used percentiles. Q1 (1st quartile): 25th percentile (lower quartile). M or Q2 (2nd quartile): 50th percentile (median). Q3 (3rd quartile): 75th percentile (upper quartile).
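
A quick numpy sketch of quartiles and the IQR for a small made-up dataset:

```python
import numpy as np

data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 15]      # hypothetical observations
q1, med, q3 = np.percentile(data, [25, 50, 75])
print(q1, med, q3, q3 - q1)                   # Q1, median, Q3, and the IQR
```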

Sampling Error

Population parameter = sample statistic + random sampling error Sampling error: the difference between the sample estimate (statistic) and a corresponding population parameter that arises because only a portion of a population is observed Decreases as sample size increases Sample need not be enormous to yield very precise results That's why polling only 1,500 US adults can yield a precise estimate of, say, voting intentions of millions of Americans

Population, Sample, Statistical Inference

Population: data for every possible case; characteristics of the population = parameters (e.g. μ, σ). Sample: subset of cases that is drawn from an underlying population; characteristics of the sample = statistics (e.g. x̄, s). Statistical inference: the process of using what we know about a sample to make probabilistic statements about the broader population, i.e. using a known sample statistic (e.g. sample mean, x̄) to infer an unknown population parameter (e.g. population mean, μ). "Probabilistic": an estimate, with an amount of uncertainty.

Population

Population: universe of things we are interested in studying Set of units of analysis Units of analysis contain what you want Defines the N (number of cases) For causal inference, usually very broad, sometimes theoretical e.g. all countries that exist now and will ever exist

Probability Sampling

Probability (i.e. random) sampling Based on random selection Gives each case in the population a known chance of being included in the sample Preferred method allows you to make inferences from sample to population

Cluster Sample

Probability sample in which the population is broken down into "natural" groupings or areas, called clusters, and a random sample of clusters is drawn Clusters (geographic/natural units) ≠ strata (variable categories) In multistage cluster sampling, a random sample within each selected cluster is then drawn

Stratified Sample

Probability sample in which the population is divided into strata (or variable categories) and independent random samples are drawn from each stratum, e.g. by age group, gender, political party, etc. Proportionate stratified sample: strata are sampled proportionately to population composition. Disproportionate stratified sample: strata are sampled disproportionately to population composition; a less representative sample, but sometimes useful, e.g. if sampling proportionately would leave you with too few cases in one category.

Regression Analysis

Produces a statistic, called a regression coefficient, that estimates the size of the effect of the IV on the DV. The most widely used tool in the social sciences. Linear regression only requires that the DV be continuous (interval or ratio); for categorical DVs, other types of regression are used, e.g. logit/probit regression (not covered in this course). Equation of a line: y = a + b(x); another common notation: y = b_0 + b_1(x).

R square

R² is always between 0 and 1; the higher the value, the better the fit of the regression line to the data. Interpretation: the % of variance in Y that is explained or accounted for by our model (i.e. by our IVs). Note: R² is only equal to the square of the correlation coefficient (r) when there is a single IV (i.e. in a simple regression); otherwise R² ≠ r². In multiple regression analysis (i.e. with more than one IV), the value of R² is obtained via the formula (FYI): R² = RSS/TSS, where RSS = Σ(ŷ_i - ȳ)² and TSS = Σ(y_i - ȳ)². When additional IVs are included in the model, the adjusted R² is preferable: 1 - (1 - R²)((n - 1)/(n - k - 1)), where n is the sample size and k is the number of IVs in the model.
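
A small Python sketch of these two formulas, using hypothetical observed values, fitted values (which in practice would come from an estimated regression), and a hypothetical number of IVs:

```python
import numpy as np

y = np.array([3.0, 4.0, 6.0, 7.0, 10.0])       # hypothetical observed DV values
y_hat = np.array([2.8, 4.5, 5.9, 7.4, 9.4])    # hypothetical fitted values from a model
n, k = len(y), 2                               # hypothetical sample size and number of IVs

rss = np.sum((y_hat - y.mean()) ** 2)          # explained sum of squares, as defined above
tss = np.sum((y - y.mean()) ** 2)              # total sum of squares
r2 = rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2, 3), round(adj_r2, 3))
```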

Simple Linear regression

Recap: simple linear regression (one IV only): y = a + b(x), or: y = b_0 + b_1(x). y: dependent variable (DV); x: independent variable (IV); b_0: constant; b_1: regression coefficient, the amount of change in y for a one-unit change in x (the "slope"). Caveat: the value of a coefficient cannot tell us whether the relationship between x and y is "statistically significant".

Normal Distribution

Remember z-scores? z_i = (x_i - μ)/σ works for the standard normal distribution: z_i ~ N(0,1). The standardized sample mean (or one-sample z statistic), z = (x̄ - μ)/(σ/√n), is the basis for the z procedure for inference about μ when σ is known.

Sample regression Model

When trying to estimate the regression parameters described by μ_y = β_0 + β_1(x) + ε, our sample regression model becomes: ŷ = β̂_0 + β̂_1(x) + e (or: ŷ = b̂_0 + b̂_1(x) + e). ŷ: estimated mean value of our DV (μ_y). b̂_0: estimated average value of the DV when the IV = 0. b̂_1: estimated average change in the DV for each one-unit change in the IV. e: error term.

Sampling

Sampling = case selection: how to choose cases that will allow you to make valid (descriptive or causal) inferences about a population of interest? Bad sampling/case selection introduces bias, i.e. systematic (non-random) error, and valid inferences are not possible. Good sampling/case selection may have nonsystematic (i.e. random) error, but that is OK: valid inferences are possible (even if uncertain). The logic of sampling applies to surveys, but also to observational studies. Thinking probabilistically.

Data availability

Selecting cases based on data availability can also be problematic, if data availability is related to the DV. Example: let's say we only have data on labor repression in East Asia and want to know how labor repression (X) affects economic growth (Y) in developing countries. What is the problem? How might the selection rule (geographic region: East Asia) be correlated with the DV (economic growth)?

Selecting on the DV

Selecting on the DV: don't do it! You need at least some variation on the outcome variable (DV): how can we explain variations on the DV if it does not vary? Problematic examples: explaining the outbreak of war by only studying wars; the onset of revolutions by only studying revolutions; patterns of voter turnout by only interviewing voters. Easiest selection issue to deal with: just avoid it.

Selecting on the DV: Truncation

So, truncation is likely to lead to biased estimates of causal effects Key is to understand in which direction the bias goes Under- or overestimating the actual effect? Any selection rule that is correlated with the DV in some way is problematic e.g. Picking different levels of income (good - variation on the DV), but leaving out lowest levels (bad - truncated DV)

Standard Deviation vs. Standard Error

Two types of variation to keep distinct: Standard deviation (SD): a measure of the actual spread in the sample data (x), captured in a frequency distribution. Standard error (SE) (a.k.a. "random sampling error"): a measure of the spread in the sample statistic (x̄), captured in a sampling distribution.

Why are Z-Scores Useful?

Standardized units Allow you to consult a standard normal probabilities table (or "z table") to figure out the probability of observing a particular z-score value

Statistical Significance

Statistically significant relationship: a conclusion, based on the observed data, that the relationship between two variables is not due to random chance (and thus exists in the broader population). Never 100% certainty: we need to specify a significance level (α), typically the 5% significance level (α = 0.05), i.e. a 95% probability that the relationship is not due to chance. Other commonly used levels of significance: α = 0.01 and α = 0.1.

Applying the Confidence Interval

Suppose someone claims that the true value of μ is equal to 66. How likely is that claim, given that your (random) sample mean, x̄, is 59? Because 66 lies above the upper confidence boundary (64.36), we know that the probability is less than .025 that the true mean is 66, so we can confidently reject the claim. Why? There is only a .025 probability that the true population mean is 66, given that our random sample gave us a mean of 59. There is a possibility that we would wrongly reject the claim, however.

Systematic Samples

Probability sample in which elements are selected from a list at predetermined intervals: pick every Kth element from a list, e.g. every 3rd name on a random list. K: the sampling interval (or the "skip": the number of elements between elements that are drawn). Formula: K = N/n, where N = population size and n = desired sample size.
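
A minimal sketch of systematic sampling in Python; the frame, sample size, and random starting point are made up for illustration:

```python
import random

population = [f"case_{i}" for i in range(1, 1001)]   # hypothetical frame, N = 1000
N, n = len(population), 100                          # desired sample size
k = N // n                                           # sampling interval K = N/n

random.seed(7)
start = random.randrange(k)       # random starting point within the first interval
sample = population[start::k]     # then take every Kth element
print(len(sample), sample[:3])
```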

Mean

The "average" value Arithmetic average/mean Sum of individual values of a variable, divided by total number of cases Only applicable to interval/ratio measures Nonsensical for nominal/categorical or ordinal variables The "center of gravity" of a distribution Not robust to "outliers" (abnormal observations)

5-number summary, Range, IQR

The 5-number summary: Minimum Q1 M Q3 Maximum Offers a description of center and spread Range = Maximum (highest value) - Minimum (lowest value) Interquartile range (IQR): 𝐼𝑄𝑅 = 𝑄3 - 𝑄1

Central Limit Theorem

The Central Limit Theorem: the sample statistics from random samples of a population will be normally distributed around the population parameter, with variance σ²/n. Variance: a measure of the dispersion of data points about the mean (μ). Standard deviation (σ): same definition (the square root of the variance).

Cross tabular analysis and chi-square

The t-test does not work when the DV is categorical, e.g. a gender difference in presidential vote (e.g. Obama vs. Romney). Instead, use the chi-square (χ²) test for tabular association: χ² = Σ (f_o - f_e)²/f_e, where f_o is the observed frequency and f_e is the expected frequency (i.e. the frequency expected if H_0 is true).

Confidence Intervals

The confidence interval (CI) for a parameter is a range believed to contain the parameter (e.g. μ), with a specific level of probability; typically 95% (but it can be any range). Confidence intervals for the mean are based on a point estimate (x̄) and the spread of the sampling distribution (the standard error). When the sampling distribution is approximately normal (almost always, via the CLT), the CI is simply plus or minus a specific number of standard errors around the point estimate. CI for the mean (μ): point estimate = x̄ (where x̄ = Σx_i / n); standard error = σ/√n (where σ = √(Σ(x_i - x̄)²/N)). "Plus or minus a specific number" of SEs: how many? It depends on how "confident" you want to be, e.g. 95% confidence: ±1.96 SEs (this is your "margin of error").

p-values

The hypothesis test is based on a test statistic.* The p-value is the probability, computed assuming H0 is true, that the test statistic will take a value at least as extreme as that actually observed. The smaller the p-value, the more the data contradict H0. If the p-value is smaller than a specified level of significance (e.g. ≤ 0.05), then the results are statistically significant at that level of significance. The p-value is the primary means of assessing H0, e.g. if p ≤ 0.05, reject H0 at the 5% level.

Mode

The most "frequent" value The most "typical" value in a frequency distribution Applicable to any level of measurement Mostly used for discrete variables May not be near the "center" of a distribution for highly skewed data Mode There is always only one mean or one median, but there can be more than one modal value Bimodal vs. multimodal vs. uniform distributions Number of modes often used to describe the shape of a distribution e.g. single- vs. double- vs. multi-peaked "Mode" is often used imprecisely: the different modes may not have exactly the same frequency Mode For a unimodal, symmetric distribution, the mean, median, and mode coincide

Normal Distribution

The naturally occurring shape of all sampling distributions. Symmetric, bell-shaped, unimodal. Completely characterized by its mean (μ or x̄) and its standard deviation (σ or s). 68-95-99.7 rule: Pr(-1σ < x < 1σ) ≈ 68.27%; Pr(-2σ < x < 2σ) ≈ 95.45%; Pr(-3σ < x < 3σ) ≈ 99.73%.

Usefulness of Standard Deviation

The normal distribution and the 68-95-99.7 rule make the standard deviation useful. Standardized measure: no need for context to understand its meaning. Example: "This plant sample is 53cm tall." Is that tall? Short? Average? How does it compare? What's a cm? If 53cm is more than 3 SDs above the mean, and the heights are normally distributed, we know it's relatively very tall. Probability: the sampling distribution of the sample mean (x̄) is distributed normally, centered at μ, with a standard deviation of σ/√n.

Deviant Case

The opposite of a typical case Case demonstrating a surprising value (relative to expectation) e.g. Botswana (economic development) e.g. US welfare state (relatively underdeveloped) "Deviant" = poorly explained by a model Anomalous, an "outlier" Large residuals (compared to regression line) Main use: to develop new explanations Hypothesis-generating not hypothesis-testing

Questions: Order

The placement of a question can have a greater impact than the particular choice of words Does question come too early or too late to arouse interest? Is the answer influenced by prior questions? Order effects: questions early in a questionnaire may impact how respondents answer subsequent questions (by providing context for subsequent questions) Contrast effects: order results in greater differences in responses Assimilation effects: order results in more similar responses

Standard Normal Distribution

The standard normal distribution is a special case with mean μ = 0 and std. dev. σ = 1: x_i ~ N(0,1). Any variable x following a normal distribution N(μ, σ) can be standardized (i.e. turned into a standard normal distribution) by calculating its z-score: z = (x - μ)/σ.

T-statistic and Statistical Significance

The t statistic provides the critical value from which the p-value is calculated Compare this p-value to the desired level of significance e.g. α = 0.05; or α = 0.01; or α = 0.001. In published regression tables, those are usually identified as stars (* or ** or ***) Helps us determine whether to reject the null hypothesis that there is no relationship between our IV and DV Informally, if the t-ratio is larger than 2 or 3 (think "z scores"), we can typically reject the null hypothesis (i.e. the result will be "statistically significant")

Cross tabular analysis and Chi-Square

The t-test does not work when the dependent variable is categorical (nominal or ordinal). Example: Are majority-Muslim countries less likely to be democratic than non-Muslim countries? Democracy (yes/no) and religion (Muslim/non-Muslim). Instead, use cross tabulations and the chi-square (χ²) test for tabular association: χ² = Σ (f_o - f_e)²/f_e, where f_o = observed frequency and f_e = expected frequency (i.e. the frequency expected if H_0 is true).
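
A short sketch of this test in Python using scipy; the 2x2 table of observed counts is entirely made up for illustration (rows: Muslim-majority yes/no, columns: democracy yes/no):

```python
from scipy.stats import chi2_contingency

observed = [[11, 35],    # hypothetical counts, Muslim-majority countries
            [40, 30]]    # hypothetical counts, other countries

# correction=False gives the plain sum of (f_o - f_e)^2 / f_e, without Yates' correction
chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(round(chi2, 2), round(p, 4), dof)
```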

Two sample test statistic

The two-sample test statistic is (the observed difference - the difference under H_0) / (the S.E. of the difference). Since we don't know μ_1 and μ_2 or σ_1 and σ_2, we use the following two-sample t statistic to test the hypothesis H_0: μ_1 = μ_2: t = (x̄_1 - x̄_2) / √(s_1²/n_1 + s_2²/n_2). Most statistical programs (incl. SPSS) automatically perform a t-test. The t statistic gives you a critical value, from which you can get the p-value (also automatically reported).
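
A minimal Python sketch of this two-sample t statistic computed by hand, with two made-up groups of scores:

```python
import math
import statistics

group1 = [5, 7, 6, 9, 8, 7]   # hypothetical scores, group 1
group2 = [4, 5, 6, 5, 4, 6]   # hypothetical scores, group 2

x1, x2 = statistics.mean(group1), statistics.mean(group2)
s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
n1, n2 = len(group1), len(group2)

# Two-sample t statistic for H0: mu_1 = mu_2
t = (x1 - x2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
print(round(t, 2))
```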

Z-Score

The z-score for any observation x_i (i.e. a value of a variable) is the number of standard deviations the observation falls from the mean of x: z_i = (x_i - μ)/σ, where z_i ~ N(0,1). Converting a variable into a z-score is called standardization.
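
A tiny worked sketch: standardizing a hypothetical observation from a distribution with made-up μ and σ (IQ-style numbers chosen purely for illustration):

```python
mu, sigma = 100, 15   # hypothetical population mean and SD
x_i = 130             # hypothetical observation
z = (x_i - mu) / sigma
print(z)              # 2.0: the observation is 2 standard deviations above the mean
```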

Regression and Statistical Significance

To figure out whether the regression coefficient (b_1) is statistically significant, we need to conduct a t-test to get a t statistic. The t statistic (a.k.a. the t-ratio) is simply the coefficient divided by its standard error: t = b_1/SE. In SPSS regression outputs, the t value and the SE are always given. In published regression tables (see, e.g., Ross on oil and democracy), the SE alone usually appears in parentheses below the regression coefficient.

Covariational Evidence

To identify a cause and effect variables must covary "No-variance" designs provide no leverage for causal inference. Why? Sources of covariation: Spatial: Cross-cases vs. within-case comparison Cross-cases: comparison of different cases (e.g. UK vs. Ghana) NOT case studies (e.g. large-N analysis, comparative method) Within-case: variation in a single unit, say, over time (e.g. France before and after the Revolution) or in different regions (i.e. breaking the unit into subunits) i.e. case studies Temporal: Synchronic vs. diachronic comparison Synchronic: at same time (e.g. French regions during Revolution) Diachronic: at different times (e.g. before and after the Revolution) + Diachronic and synchronic variation can be combined So, using N= 1 for single case studies may be misleading, as we are always looking for within-case covariational evidence to assess a causal relationship

Weaknesses of Case Studies

To identify a cause and effect variables must covary "No-variance" designs provide no leverage for causal inference. Why? The Fundamental Problem of Causal Inference Cannot observe the counterfactual (fundamental uncertainty) Second best: experimental designs Create an (imperfect) alternative reality (control & test groups) for testing effect of X on Y Nonexperimental alternatives: observational studies Compare actual cases

Relative Frequency

Transforming raw frequencies into a proportion or percentage. Proportion: p_k = f_k/N (multiply by 100 for a percentage), where f_k = raw frequency of the kth category and N = total number of observations (total frequency). Bar charts and histograms can still be used, but also pie charts.

Stochastic Variation

We know the regression line is just a "best fit" for the data Not all points fall precisely on the line There is a lot of random (i.e. stochastic) error Vertical distance between the data point and the line = residuals

Nature

What if we use the entire population of cases? Often, you can observe the "surviving" cases of a process These may be unique/unusual/exceptional in some ways, not representative of the phenomenon you want to explain Example: What society in the past had the most sophisticated art? Societies with stone vs. with wood sculpture

Test statistic

When comparing 2 samples (e.g. sample means), the relevant test statistic to use depends, in part, on the level of measurement of variables Examples (covered in class): If the DV is continuous: use a difference of means t-test If the DV is categorical: use chi-square (𝝌^𝟐) tabular analysis

Inference Using Confidence Intervals

When n is small or when σ is unknown (which is typically the case), we must use a t-distribution to make inferences about a population parameter. Recall: the normal distribution always has the same shape; for a 95% CI, z-scores of -1.96 and +1.96 mark the lower and upper bounds. The shape of a t-distribution, by contrast, varies: the lower and upper bounds of a CI will depend on its degrees of freedom. The one-sample t confidence interval for μ is x̄ ± t*(s/√n) (instead of x̄ ± z*(σ/√n)), where t* is the value for the t(n-1) density curve with the desired area (e.g. 95%) between -t* and t*. Note: this interval is exact when the population is normally distributed, and is approximately correct for large n. Margin of error = ±t*(s/√n). The 95% CI defines the boundaries of plausible and implausible hypothetical claims: all hypothetical values of μ that fall within the 95% CI are considered plausible and are not rejected; all hypothetical values of μ that fall outside the 95% CI are considered implausible and are rejected.
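
A minimal Python sketch of a t-based 95% CI; the sample values are hypothetical, and scipy is used only to look up the critical value t*:

```python
import math
import statistics
from scipy import stats

sample = [12.1, 9.8, 11.4, 10.2, 13.0, 10.9, 11.7, 9.5]   # hypothetical small sample
n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)                # sigma unknown, so use s

t_star = stats.t.ppf(0.975, df=n - 1)       # critical value for a 95% CI with n - 1 df
margin = t_star * s / math.sqrt(n)          # margin of error
print(round(x_bar - margin, 2), round(x_bar + margin, 2))
```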

Properties of the Standard Deviation

s ≥ 0; s = 0 only if the variable is a constant (no spread). The greater the dispersion, the greater the SD. Not resistant to outliers (like the mean).

Multiple regression and the logic of control

y = b_0 + b_1(x_1) + b_2(x_2). Interpretation of the coefficients: same as for simple regression models. However, in multiple regression, the coefficients (b_1, b_2, ...) actually represent the effect of the IV on the DV while holding constant the effect of the other variable(s); e.g. in the model above, b_2 represents the effect of x_2 on y controlling for the effect of x_1 on y. Note: the difference between an "IV" and a "control variable" is a purely conceptual/theoretical one; statistically, IVs and CVs are treated the same.

