Research Methods 2 Exam 2
extraneous variables
'third variables' that do not appear in the initial hypothesis and produce spurious relationships between i-vars and d-vars
Test Statistic
The test statistic is a statistic calculated from the sample data to test the null hypothesis.
Lambda λ
Variables: nominal. Values: 0 to 1.00 (0 = no association; 1.00 = perfect association). Direction: not applicable
Cause
explanation for some characteristic, attitude, or behavior of groups, individuals, or other entities (such as families, organizations, or cities), or for events. ex: What causes people to be violent? What causes teenage births to drop?
PRE reflects...
how well knowledge of one variable improves classification of values on another variable
PRE measures
proportional reduction in error
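As a small worked illustration with hypothetical numbers: if guessing the dependent variable from its mode alone produces E1 = 45 errors, and guessing it within categories of the independent variable produces E2 = 25 errors, then PRE = (E1 - E2) / E1 = (45 - 25) / 45 = .44, i.e., knowing the independent variable reduces prediction errors by about 44%.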
Causal effect (nomothetic perspective)
the finding that change in one variable leads to change in another variable, other things being equal (ceteris paribus)
2. Appropriate time order
the independent variable comes before the dependent variable (i.e., the temporal priority of the independent variable); the change in X must occur before the change in Y. e.g., graduate from college in 2006 --> get a new job with the degree and make more money in 2007
between group variation
variability among group means, weighted by sample size (BSS)
Univariate Analysis
1 variable. Major question: What? Goal: Description
Elements of a Significance Test (5)
1. Assumptions 2. Hypotheses 3. Test statistic 4. P-value 5. Conclusion
Chi-Square Test
1. Assumptions: two categorical variables (nominal or ordinal) 2. Hypotheses: H0 - the variables are statistically independent (i.e., no association); Ha - the variables are statistically dependent 3. Test statistic 4. P-value: P = right-hand tail probability above the observed χ² value, for the chi-squared distribution with df = (rows - 1)(columns - 1) 5. Conclusion: report the p-value; if a decision is needed, reject H0 at the α level if p < α
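A minimal sketch of the steps above, assuming SciPy is available and using a small hypothetical crosstab (the counts and variable labels are illustrative, not from the course materials):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows = gender (female, male), columns = college plans (yes, no)
observed = np.array([[60, 40],
                     [45, 55]])

# correction=False gives the uncorrected chi-square that matches the textbook formula
chi2, p, df, expected = chi2_contingency(observed, correction=False)
print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p:.4f}")
# Decision: reject H0 (statistical independence) if p < alpha (e.g., .05)
```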
Nomothetic Causal Explanation
An explanation involving the belief that variation in the independent variable will be followed by variation in the dependent variable, when all other things are equal (ceteris paribus)
Lambda λ
E1 = errors made using the mode of the dependent variable (uneducated errors); E2 = errors made using the mode of the dependent variable within categories of the independent variable (educated errors)
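A minimal sketch of how lambda could be computed from a crosstab, assuming NumPy and hypothetical counts (the layout, with the independent variable in rows and the dependent variable in columns, is illustrative):

```python
import numpy as np

crosstab = np.array([[30, 10],   # rows = categories of the independent variable
                     [15, 45]])  # columns = categories of the dependent variable

n = crosstab.sum()
col_totals = crosstab.sum(axis=0)

e1 = n - col_totals.max()                                  # errors using the overall mode of the DV
e2 = (crosstab.sum(axis=1) - crosstab.max(axis=1)).sum()   # errors using the mode of the DV within each row
lam = (e1 - e2) / e1                                       # lambda as a PRE measure
print(round(lam, 3))  # 0.444 for these hypothetical counts
```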
3. Nonspuriousness
The relationship between the independent and the dependent variables must NOT be due to a third variable (i.e., the relationship must NOT be spurious, or false). ex: in regions where there are more storks, more children are born; however, the REAL CAUSAL VARIABLE is whether the region is rural or urban: rural families have more children than urban families (they need the help), and rural areas have more storks
Assumptions
The type of data: levels of measurement, continuous or discrete The form of population distribution: for some tests, normal is required (e.g., small sample) The method of sampling: random sampling is required for most statistics The sample size: the validity of many tests improves as the sample size increases
Basic criteria for making claims of causality
Three most critical criteria 1. empirical association 2. appropriate time order 3. nonspuriousness Two additional criteria- 4. identifying a mechanism 5. specifying the context
ANOVA
analysis of variance
intervening variables
causal mechanism; the variables that explain the relationship between the independent and dependent variables
Statistical Significance vs. Substantive Significance
the chi-squared statistic and the p-value tell us nothing about the nature or strength of the association; large chi-square values can occur with weak associations if the sample size is large
Notation
g groups; computed for each group: sample size, sample mean, sample standard deviation
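A minimal sketch of the between-group (BSS) and within-group (WSS) sums of squares and the resulting F statistic, assuming NumPy and three small hypothetical groups:

```python
import numpy as np

groups = [np.array([4., 5., 6.]),
          np.array([7., 8., 9.]),
          np.array([5., 6., 7.])]

grand_mean = np.concatenate(groups).mean()
g = len(groups)                       # number of groups
n = sum(len(y) for y in groups)       # total sample size

# Between-group variation: group means vs. grand mean, weighted by group size
bss = sum(len(y) * (y.mean() - grand_mean) ** 2 for y in groups)
# Within-group variation: individuals vs. their own group mean
wss = sum(((y - y.mean()) ** 2).sum() for y in groups)

f_stat = (bss / (g - 1)) / (wss / (n - g))
print(round(f_stat, 2))  # 7.0 for these hypothetical values
```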
The two sets of percentages for females and males are called __________ __________ on the dependent variable, college plans.
conditional distributions
Percentages
crude measure of the strength of association (the larger the percentage differences across the categories of the independent variable, the stronger the association); the rough '10 percentage point rule'
1. Empirical Association
empirical (observed) correlation between the independent and dependent variables (i.e., they must vary together); a change in X is associated with a change in Y. e.g., as education increases, future income increases
Assumptions
independent random samples the dependent variable has normal population distributions equal standard deviations for all g groups interval/ratio dependent variable
The independent variable __________ the dependent variable
influences
E 2 =
number of errors made predicting the dependent variable KNOWING the distribution of the independent variable
E 1 =
number of errors made predicting the dependent variable NOT KNOWING the distribution of the independent variable
Gamma
number of same-order pairs minus number of opposite-order pairs, divided by number of same-order pairs plus number of opposite-order pairs
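A minimal sketch of that pair-counting logic, assuming two hypothetical ordinal variables (tied pairs are simply skipped here):

```python
from itertools import combinations

x = [1, 1, 2, 2, 3, 3]   # e.g., education level (ordinal)
y = [1, 2, 2, 3, 2, 3]   # e.g., income level (ordinal)

same = opposite = 0
for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
    if (xi - xj) * (yi - yj) > 0:
        same += 1        # pair ordered the same way on both variables
    elif (xi - xj) * (yi - yj) < 0:
        opposite += 1    # pair ordered in opposite ways

gamma = (same - opposite) / (same + opposite)
print(round(gamma, 2))   # 0.75 for these hypothetical scores
```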
Direction of Association
only applicable when both independent and dependent variables are at least ORDINAL variables
Epsilon
simple statistic used to summarize percentage differences calculated by identifying the largest and smallest percentages in each row and then subtracting the smallest from the largest
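A minimal sketch of epsilon, assuming NumPy and a hypothetical crosstab that has been percentaged within columns (categories of the independent variable):

```python
import numpy as np

crosstab = np.array([[60, 45],   # rows = categories of the dependent variable
                     [40, 55]])  # columns = categories of the independent variable

col_pct = crosstab / crosstab.sum(axis=0) * 100    # conditional distributions (column %)
epsilon = col_pct.max(axis=1) - col_pct.min(axis=1)  # largest minus smallest % in each row
print(epsilon)  # differences of roughly 10+ points suggest a non-trivial association
```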
the variables are ______________ ______________ if the conditional distributions are not identical (i.e., there is an association)
statistically dependent
Two categorical variables are ______________ _________________ if the population conditional distributions on one of them are identical at each category of the other (i.e., there is no association)
statistically independent
Measures of Association
statistics that indicate the strength (and in certain cases, the direction) of the relationship or association between two variables
ANOVA
tests for differences among two or more group means **t-test is a special case of ______
t-test for 2 independent samples
tests whether the observed difference between two means is statistically significant; most appropriate for a dependent variable measured at the interval/ratio level and an independent variable with two categories (used to define the two groups); estimates the probability that the observed difference between the two means is a result of random chance or sampling error
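A minimal sketch of an independent-samples t-test, assuming SciPy and two small hypothetical groups measured on an interval/ratio dependent variable:

```python
from scipy.stats import ttest_ind

group1 = [22, 25, 27, 30, 24]   # e.g., incomes (in $1,000s) for group A
group2 = [18, 20, 23, 21, 19]   # e.g., incomes (in $1,000s) for group B

t_stat, p_value = ttest_ind(group1, group2)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests the mean difference is unlikely to be due to sampling error
```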
Idiographic Causal Explanation
the concrete, individual sequence of events, thoughts, or actions that resulted in a particular outcome for a particular individual or that led to a particular event
Causal effect (idiographic perspective)
the finding that a series of concrete events, thoughts, or actions resulted in a particular event or individual outcome
4. Identifying a causal mechanism
the process that creates a connection between the variation in an independent variable and the variation in the dependent variable it is hypothesized to cause. e.g., going back to the education and income example, the causal mechanism is that the education provided the qualifications and training that allowed you to get a higher-paying job
within group variation
variability among individuals within the same group (WSS)
Gamma γ
Values (strength): -1.00 to 1.00 (0 = no association; -1.00 = perfect negative; +1.00 = perfect positive). Direction: + positive (variables change or move in the same direction); - negative (variables change or move in opposite directions). Caveat: have the values been arbitrarily assigned? If so, the +/- sign may be reversed
Bivariate Analysis
2 variables. Major question: Why? Goal: Explanation
Statistical Significance
A measure of the likelihood that an observed relationship between the variables in a probability sample represents something that exists in the population rather than being due to sampling error. If the likelihood that the observed relationship could have resulted from sampling error is very small (e.g., less than 5%), then we have the confidence needed to generalize our finding from the sample to the population from which it was drawn.
Interpreting Crosstabs (3 terms)
Association - is there an association between the variables? Strength - what is the strength between the two variables? Direction -- what is the direction of association?
Types of Causal Explanation
Nomothetic Idiographic
Hypotheses
Null hypothesis (H0): the hypothesis that is directly tested. This is usually a statement that the parameter has a value corresponding to, in some sense, no effect; that is, the two variables are not related; they are independent of one another. Alternative hypothesis (Ha): the hypothesis that contradicts the null hypothesis. It states that the parameter falls in some alternative set of values to what the null hypothesis specifies; often called the research hypothesis. That is, the two variables are related; they are not independent of one another.
Conclusion
Report the P-value. Make a formal decision: often accomplished by comparing the P-value to a pre-set "α-level" representing the strength of evidence required to reject the null hypothesis (the α-level is usually set at .05, .01, or .001). Reject H0 and accept Ha if P ≤ α-level; otherwise, fail to reject H0 [but never accept a null hypothesis!!!]. Provide an interpretation of what the P-value or decision about H0 tells us about the original question motivating the test. Most studies require very small P-values (e.g., P ≤ .05) before concluding the data sufficiently contradict H0 to reject it. In such cases, the results are said to be statistically significant at the .05 level. A smaller P provides stronger evidence against H0 and in support of Ha.
Inferential Statistics
Statistics that are concerned with making generalizations from samples to populations.
Descriptive Statistics
Statistics that are widely used to describe and summarize main features of data, characteristics of a sample, or the relationships between variables in a dataset.
Chi Square Test
Testing whether the association we see in the crosstabulation is statistically significant.
P-Value
The P-value, denoted by P, is the probability of a test statistic value at least as large as the observed value when H0 is true. The smaller the P-value, the more strongly the data contradict H0, the stronger the evidence to reject H0, and the more statistically significant the relationship is. It is the likelihood that the observed relationship could have resulted from sampling error, i.e., the probability of erroneously rejecting H0 when it is in fact true.
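As a small illustration of the right-hand tail probability, assuming SciPy and hypothetical numbers (an observed chi-square of 7.88 with df = 2):

```python
from scipy.stats import chi2

p_value = chi2.sf(7.88, df=2)   # sf = survival function = right-tail probability
print(round(p_value, 4))        # roughly .02, so the result is significant at the .05 level
```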