Research Methods Final
Student's t-distribution
a probability distribution that can be used for making inferences about a population mean when the sample size is small
Controlled effect
a relationship between a causal variable and a dependent variable within one value of another causal (control) variable
Direct relationship
a relationship that runs in a positive direction, increase in x = increase in y
Inverse relationship
a relationship that runs in the negative direction, increase in x = decrease in y
Random sample
a sample that has been randomly drawn from the population
Conceptual dimension
a set of concrete traits of similar type
Variance
the average of the squared deviations from the mean; an indicator of how dispersed the data are; its square root is the standard deviation
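A minimal sketch in Python, with made-up numbers, of how the variance and standard deviation are computed:

```python
# Minimal sketch: variance and standard deviation for hypothetical data.
from statistics import pvariance, pstdev

data = [4, 8, 6, 5, 3, 7]           # made-up sample of interval-level values
mean = sum(data) / len(data)

# Average of the squared deviations from the mean (population variance).
variance = sum((x - mean) ** 2 for x in data) / len(data)
std_dev = variance ** 0.5           # standard deviation = square root of variance

assert abs(variance - pvariance(data)) < 1e-9   # matches the stdlib version
assert abs(std_dev - pstdev(data)) < 1e-9
print(variance, std_dev)
```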
Cross-sectional study
subjects are interviewed at one point in time; reliability is an issue, but it is cheaper
Panel study
the same subjects are interviewed at two (or more) points in time; better reliability, but more expensive
Cross-tabulation
a table that shows the distribution of the dependent variable, in percentages, across categories of the independent variable
Frequency distribution
tabular summary of a variable's values
Measure of association
tells the researcher how well the independent variable works in explaining the dependent variable; examples include Pearson's r, R-square and adjusted R-square, the regression coefficient, and lambda
Alternative-form method
test-retest but with two different tests
Laboratory experiment
the control group and the test group are studied in an environment created wholly by the investigator, ex. going to a specific research venue
Additive relationship
the control variable, Z, is a cause of the dependent variable but defines only a small compositional difference across values of the independent variable; because the relationship between X and Z is weak, X retains a causal relationship with Y after controlling for Z, and Z also helps to explain the dependent variable; in a set of additive relationships, the tendency and strength of the relationship between the independent variable and the dependent variable are the same or very similar in all values of the control variable
Population
the universe of cases the researcher wants to describe
Independent variable
the variable that represents the causal factor in an explanation
Dependent variable
the variable that represents the effect in a causal explanation; it depends on the other variable, and in graphs it is plotted on the vertical axis, like a pillar whose position depends on its foundation
Sample
a number of cases or observations drawn from a population
Reliability
consistency
Negative relationship
downward sloping line \
Test-retest method
as it sounds: administer the same measure twice and compare the results; evaluates reliability
Central tendency
typical or average value, measured by mean, median, and mode
Positive relationship
upward sloping line /
Normal distribution
used to describe interval-level variables; bell curve
Face validity
using informed judgment to determine whether an operational procedure is measuring what it is supposed to measure: "On the face of it, are there good reasons to think that this measure is not an accurate gauge of the intended characteristic?"
Median
value of a variable that divides the cases right down the middle
one-tailed test of statistical significance
In normal estimation, the absolute value of Z that marks the boundary between .95 of the curve and .05 in one tail is 1.645. The lowest plausible difference is therefore the sample statistic minus 1.645 standard errors; if this value is greater than 0, we can reject the null hypothesis.
Adjusted R-square
often close to, but always less than, regular R-square; adjusts for the number of independent variables in the model, since adding predictors can only inflate the estimated value of R-square
Types of measures of association
PRE measures, R-square and adjusted R-square, regression coefficient, Pearson's r, lambda
Pearson's r
Pearson's correlation coefficient; to calculate: r = Σ[(xi − x̄)(yi − ȳ)] / [(n − 1) sx sy], where xi and yi are individual observations of x and y, x̄ and ȳ their sample means, and sx and sy their sample standard deviations; a symmetrical measure of association, meaning the correlation between the dependent and independent variable is the same as the correlation between the independent and dependent variable, so it is neutral on which variable is the cause and which the effect; not a PRE measure, because it does not tell us how well we can predict the dependent variable by knowing the independent variable; bounded by −1 and +1, it communicates strength and direction on a common metric
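A short sketch of the calculation above, with hypothetical paired data (the x and y values are invented):

```python
# Sketch: Pearson's r from the definition above, using hypothetical paired data.
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
n = len(x)
xb, yb = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)        # sample standard deviations (n - 1 denominator)

# r = sum of products of deviations, divided by (n - 1) * sx * sy
r = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / ((n - 1) * sx * sy)
print(r)   # bounded by -1 and +1; symmetric in x and y
```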
Asymmetric measure of association
a measure of association whose value depends on which variable is treated as the dependent variable; lambda is an example
Error sum of squares
the sum of squared prediction errors, Σ(yi − ŷ)²; the variation in the dependent variable left unexplained by the regression line
Regression sum of squares
the portion of the total variation in the dependent variable explained by the regression line; equal to the total sum of squares minus the error sum of squares
Rule of direction for nominal relationships
because the values of a nominal variable have no inherent order, a relationship involving a nominal variable has no direction (positive or negative); the researcher instead describes which categories of the variables tend to occur together
Symmetric measure of association
a measure of association that takes the same value regardless of which variable is treated as the dependent variable; Pearson's r is an example
R-square
a PRE measure of association, bounded by 0 and 1, that may be interpreted as the proportion of the variation in the dependent variable that is explained by the independent variable; to calculate = regression sum of squares / total sum of squares; measures the goodness of the fit between the regression line and the actual data
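A sketch tying R-square to the sums of squares defined above, using hypothetical observed values and hypothetical predictions from some fitted line:

```python
# Sketch: sums of squares and R-square, with hypothetical data and
# hypothetical predictions y_hat from a fitted regression line.
y     = [2.0, 3.0, 5.0, 4.0, 6.0]        # observed dependent variable
y_hat = [2.2, 3.1, 4.0, 4.9, 5.8]        # predictions from the fitted line
y_bar = sum(y) / len(y)

tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
ess = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares
rss = tss - ess                                        # regression sum of squares

r_square = rss / tss   # proportion of variation in y explained by the model
print(r_square)
```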
Population parameter
a characteristic of a population, ex: dollar amount of the average PAC contribution or percent of adults who voted; sample statistic = population parameter + random sampling error
Aggregate-level unit of analysis
a collection of individual entities (ex: neighborhoods or census tracts)
Bimodal distribution
a frequency distribution having two different values that are heavily populated - not a bell curve
Standard error of the difference
a more formal derivation of the standard error of the mean difference; to calculate = square each standard error, sum them, and take the square root
Hypothesis
a testable statement about the empirical relationship between cause and effect
Dummy variable
a variable for which all cases falling into a specific category assume a value of 1, and all cases not falling into that category assume a value of 0
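A one-line sketch of dummy coding, with hypothetical party labels:

```python
# Sketch: dummy-coding a nominal variable (hypothetical party labels).
parties = ["Dem", "Rep", "Ind", "Dem", "Rep"]

# Each case in the "Dem" category gets a 1; every other case gets a 0.
dem_dummy = [1 if p == "Dem" else 0 for p in parties]
print(dem_dummy)   # [1, 0, 0, 1, 0]
```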
Intervening variable
a variable that acts as a go-between or mediator between an independent and a dependent variable; for example, higher education does not itself make you vote, but the collaboration with peers that college fosters does
Controlled comparison
accomplished by examining the relationship between an independent and a dependent variable, while holding constant other variables suggested by rival explanations and hypotheses
Index
additive combination of ordinal variables coded identically
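A tiny sketch of an additive index, assuming three hypothetical items coded identically (0 = disagree, 1 = neutral, 2 = agree):

```python
# Sketch: an additive index from three identically coded ordinal items,
# for one hypothetical respondent.
responses = {"q1": 2, "q2": 1, "q3": 2}   # invented item scores
index = sum(responses.values())           # ranges from 0 to 6 across respondents
print(index)                              # 5
```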
Controlled comparison table
aka control table; presents a cross-tabulation between an independent variable and a dependent variable for each value of the control variable
Zero-order relationship
aka gross relationship, uncontrolled relationship - a difference obtained from a simple comparison; an overall association between two variables that does not take into account other possible differences between the cases being studied; summarizes a relationship between two variables
Systematic measurement error
aka measurement bias; consistently distorts and mis-measures because of inherent problems with the measurement system
Census
allows researchers to obtain measurements from all members of a population
Ecological fallacy
an aggregate-level phenomenon is used to make inferences at the individual level (whole-to-part)
Rival explanation
an alternative cause for the dependent variable, ex.: everyone in the control group was healthier
Variable
an empirical measurement of a characteristic; variable name (marital status), variable values (married), numeric codes (corresponds to 1)
Sample statistic
an estimate of a population parameter, based on a sample drawn from the population
Linear relationship
an increase in the independent variable is associated with a consistent increase or decrease in the dependent variable
Compositional difference
any characteristic that varies across categories of an independent variable, ex: Democrats and Republicans differ in gender, income, and preferred ice cream flavor; not every compositional difference presents a plausible rival explanation
Variation component of random sampling error
as variation in the population goes up, random sampling error increases in direct relation to the population's standard deviation
Mean
average
Conceptual definition
defines a concept by describing its measurable characteristics; as a rule, one concept cannot be used to define another
Nominal-level variable
communicates differences between units of analysis on the characteristic being measured, ex: marital status, religious denominations, gender, race, etc.; must measure with mode
Interval-level variable
communicates exact differences between units of analysis, ex: age, family members, commute time
Ordinal-level variable
communicates relative differences between units of analysis; can be ranked, ex: support for school prayer; measure with mode or median
Control group
composed of subjects who did not receive the treatment
Test group
composed of subjects who receive a treatment that the researcher believes is causally linked to the dependent variable, ex: patients with a certain disease undergoing treatment
Types of tests of statistical significance
confidence interval approach, p-value approach, two-tailed test (eyeball), one-tailed test (1.645), chi-square, z-score
Field experiment
control and test groups are studied in their normal surroundings, probably unaware an experiment is taking place
Operational definition
describes the instrument to be used in measuring the concept, putting the conceptual definition into operation; states explicitly how the concept is to be measured empirically
chi-square test of significance (χ²)
determines whether the observed dispersal of cases departs significantly from what we would expect to find if the null hypothesis were correct; to calculate: for each cell of the cross-tabulation, find the expected frequency (what the null hypothesis would predict), compute (observed frequency − expected frequency)² / expected frequency, and sum across all cells; the null hypothesis says the result should be close to 0; if chi-square exceeds the critical value for the appropriate degrees of freedom, the null hypothesis can be rejected
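A sketch of the chi-square calculation for a hypothetical 2x2 cross-tabulation (the counts are invented; expected frequencies use the standard row-total-times-column-total rule):

```python
# Sketch: chi-square for a hypothetical 2x2 cross-tabulation.
observed = [[30, 20],    # rows: values of the dependent variable
            [10, 40]]    # columns: values of the independent variable

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n   # frequency under the null
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)      # degrees of freedom
print(chi_square, df)   # compare chi_square with the critical value for df
```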
Prediction error
difference between the estimated value of the dependent variable on a scatter plot based on the line that best fits the data and the actual position of the data; = yi - y hat where yi = individual value of y and y hat = an estimated value of y
Negative skew
distribution with a skinnier left-hand tail <
Positive skew
distribution with a skinnier right-hand tail >
Confidence interval approach
equivalent to the +/-2 rule of thumb; uses the standard error to determine the smallest plausible mean difference in the population; if that smallest difference is greater than 0, the null hypothesis can be rejected
Partial regression coefficient
estimates the mean change in the dependent variable for each unit change in the independent variable controlling for the other independent variables in the model
+/-2 rule of thumb
estimation of the 95 percent confidence interval: the sample mean plus or minus 1.96 (rounded to 2) standard errors defines the lower and upper boundaries
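A sketch of the rule with a hypothetical sample (the measurements are invented; the "2" in the rule is 1.96 rounded):

```python
# Sketch: 95 percent confidence interval for a hypothetical sample mean.
from statistics import mean, stdev

sample = [52, 48, 55, 47, 50, 53, 49, 51, 54, 46]   # made-up measurements
n = len(sample)
se = stdev(sample) / n ** 0.5        # standard error = s / sqrt(n)

lower = mean(sample) - 1.96 * se     # lower confidence boundary
upper = mean(sample) + 1.96 * se    # upper confidence boundary
print(lower, upper)   # 95% of sample means fall in this range by chance
```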
Random assignment
every participant has equal chance of ending up in the control or test group
Construct validity
examines the empirical relationships between a measurement and other concepts to which it should be related: "Does this measurement have the relationships with other concepts that one would expect it to have?"
Conceptual question
expressed using ideas, frequently unclear and difficult to answer empirically
Concrete question
expressed using tangible properties, can be answered empirically
Proportional reduction error (PRE)
gauges the strength of a relationship (a measure of association); a prediction-based metric that varies between 0 and 1: if knowledge of the independent variable provides no help in predicting the dependent variable, PRE is 0; if it eliminates all prediction error, PRE is 1
Test of statistical significance
helps you decide whether an observed relationship between an independent variable and a dependent variable really exists in the population or whether it could have happened by chance when the sample was drawn; examples include the confidence interval approach, the p-value approach, one- and two-tailed tests, chi-square, and z-scores
External validity
whether the results of a study can be generalized and applied to situations in the non-artificial, natural world; a particular concern for laboratory experiments
Central limit theorem
if we were to take an infinite number of samples of size n from a population of N members, the means of these samples would be normally distributed; this distribution of sample means would have a mean equal to the true population mean and a random sampling error equal to the population standard deviation divided by the square root of n; it underpins the 95 percent confidence interval
Critical value
marks the upper plausible boundary of random error and so defines the null hypothesis' limit
Regression analysis
a measure-of-association technique that produces the regression coefficient, which estimates the size of the effect of the independent variable on the dependent variable; indicates the direction of the relationship and, paired with a significance test, whether it could have happened by chance
Lambda
measure of association designed to gauge the strength of a relationship between two categorical variables, at least one of which is nominal-level; to calculate = (prediction error without knowledge of the independent variable − prediction error with knowledge) / prediction error without knowledge; an asymmetric, PRE measure
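A sketch of lambda's PRE logic for a hypothetical table of two nominal variables (the counts are invented):

```python
# Sketch: lambda for a hypothetical cross-tab of two nominal variables.
# Rows are categories of the dependent variable; columns, the independent.
table = [[40, 10],
         [20, 30]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Prediction error without the independent variable: guess the modal row.
error_without = n - max(row_totals)

# Prediction error with it: within each column, guess that column's modal row.
error_with = sum(total - max(col) for total, col in zip(col_totals, zip(*table)))

lam = (error_without - error_with) / error_without
print(lam)   # 0 = no help from the independent variable; 1 = perfect prediction
```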
Cronbach's alpha
measures internal consistency
Split-half method
measures internal consistency
Sampling frame
the list or method used to define the population from which a sample is drawn; poor sampling frames lead to sampling bias or selection bias
Mode
most common answer
Null hypothesis
negative hypothesis, states that in the population there is no relationship between the variables and any relationship observed in a sample was produced by random sampling error
Raw frequency
number of responses
p-value approach
researcher determines the exact probability of obtaining the observed sample difference under the assumption that the null hypothesis is correct. If the probability value (p-value) is less than or equal to .05, then the null hypothesis can be rejected
Interaction effect
occurs in multiple regression analysis when the effect of an independent variable cannot be summarized by a single partial effect; instead, the effect varies depending on the value of another independent variable in the model; if interaction is present in the data, or the researcher's explanation implies interaction, a different model needs to be specified
Response bias
occurs when some cases in the sample are more likely than others to respond, and therefore to be measured
Multicollinearity
occurs when the independent variables are related to each other so strongly that it becomes difficult to estimate the partial effect of each; the data contain too little independent variation in each variable
Standardization
occurs when the numbers in a distribution are converted into standard units of deviation from the mean of the distribution; a standardized value is called a Z-score
Type I error
occurs when the researcher concludes that there is a relationship in the population when in fact there is none; conventionally treated as more serious than Type II
Type II error
occurs when the researcher infers that there is no relationship in the population when in fact there is
Total sum of squares
overall summary of the variation in the dependent variable; also represents all our errors in guessing the value of the dependent variable for each case when using the mean of the dependent variable as the predictive instrument; to calculate = Σ(yi − ȳ)²
Correlation analysis
produces a measure of association, Pearson's r, between interval-level variables; measures strength and direction of relationship
Random measurement error
haphazard, chance errors in measurement, such as those caused by fatigue, commotion, or unavoidable distractions
Standard error
another name for random sampling error; the standard error of the mean is calculated as standard deviation / square root of the sample size
Inferential statistics
refers to a set of procedures for deciding how closely a relationship we observe in a sample corresponds to the unobserved relationship in the population from which the sample was drawn
two-tailed test of statistical significance
same as eyeball test; indicates upper and lower limits of random sampling error
Mean comparison table
shows the mean of a dependent variable for cases that have different values on an independent variable
Regression coefficient
slope of the regression line, rise/run
Sample size component of random sampling error
the square root of n, which appears in the denominator of random sampling error; as the sample size goes up, random sampling error declines as a function of the square root of the sample size
Degrees of freedom
a statistical property of a large family of distributions, including the Student's t-distribution; the number of degrees of freedom equals the sample size n minus the number of parameters being estimated from the sample
Partial relationship/partial effect
summarizes a relationship between two variables after taking into account rival variables
Standard deviation
summarizes the extent to which the cases in an interval-level distribution fall on or close to the mean of the distribution; to calculate: square each value's deviation from the mean, average the squared deviations (the variance), and take the square root; about 68% of cases fall within one standard deviation of the mean and about 95% within two
Spurious relationship
the control variable, Z, defines a large compositional difference across values of the independent variable, X, and this compositional difference is a cause of the dependent variable Y - after holding Z constant, the empirical association between X and Y turns out to be completely coincidental; in a spurious relationship, after holding the control variable constant, the relationship between the independent variable and the dependent variable weakens or disappears
Unit of analysis
the entity we want to analyze
Random sampling error
the extent to which a sample statistic differs by chance from a population parameter; to calculate = standard deviation / square root of the sample size, or, for a proportion, = square root of [sample proportion × (1 − sample proportion)] / square root of n; there is an inverse relationship between random sampling error and sample size
95 percent confidence interval
the interval within which 95 percent of all possible sample estimates will fall by chance; the lower and upper confidence boundaries are defined by the sample mean minus and plus 1.96 standard errors; follows from the central limit theorem
Hawthorne effect
subjects' knowledge that they are being studied changes their responses
Probability
the likelihood of the occurrence of an event or set of events
.05 level of significance
the minimum standard for rejecting the null hypothesis, ensuring that a Type I error is committed fewer than 5 times out of 100; ask: if the null hypothesis is true, how often by chance would we obtain the relationship observed? If more than 5 times out of 100, we do not reject the null hypothesis
Interaction variable
the multiplicative product of two or more independent variables
Sample proportion
the number of cases falling into one category of the variable divided by the number of cases in the sample
Cumulative percentage
the percentage of cases at or below any given value of the variable
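A sketch computing cumulative percentages from a hypothetical ordered frequency distribution:

```python
# Sketch: cumulative percentages from a hypothetical frequency distribution
# of an ordinal variable (values listed from lowest to highest).
freq = {"low": 20, "medium": 50, "high": 30}   # invented raw frequencies
total = sum(freq.values())

running = 0
for value, count in freq.items():
    running += count
    print(value, round(100 * running / total, 1))  # % of cases at or below value
```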
Salient
describes an issue the person really cares about; the importance attached to it
Interactive relationship
the relationship between the independent and dependent variable depends on the value of the control variable - for one value of Z, the X-Y relationship might be stronger than for another value of Z; in a set of interaction relationships, the tendency or strength of the relationship between the independent and dependent variable is different, depending on the value of the control variable
Curvilinear relationship
the relationship between the variables depends on which interval or range of the independent variable is being examined; relationship may change from positive to negative or just change in strength
z-score
to calculate: z = (value − mean) / standard deviation; indicates how many standard deviations from the mean an observation lies and thus how unusual it is (the probability of obtaining that measurement); look the z-score up in a normal table: for a positive z, the table gives the percentage of cases above it, and for a negative z, the percentage below it
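A sketch of the z-score calculation, using Python's built-in normal distribution in place of a printed chart (the population parameters are invented):

```python
# Sketch: z-score and the probability of a more extreme value,
# using Python's built-in normal distribution (statistics.NormalDist).
from statistics import NormalDist

mu, sigma = 100, 15                # hypothetical population mean and std. dev.
value = 130

z = (value - mu) / sigma           # deviation from the mean in standard units
tail = 1 - NormalDist().cdf(z)     # share of the curve above this z-score
print(z, tail)                     # z = 2.0; about 2.3% of cases lie above it
```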
t-ratio
to calculate, (observed sample difference - 0)/standard error of the difference
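A sketch combining the standard error of the difference (square each standard error, sum, take the square root) with the t-ratio, for two hypothetical groups:

```python
# Sketch: t-ratio for a hypothetical mean difference between two groups.
from statistics import mean, stdev

group1 = [5.1, 6.0, 5.5, 6.2, 5.8]   # invented measurements
group2 = [4.2, 4.8, 5.0, 4.4, 4.6]

se1 = stdev(group1) / len(group1) ** 0.5
se2 = stdev(group2) / len(group2) ** 0.5
se_diff = (se1 ** 2 + se2 ** 2) ** 0.5       # standard error of the difference

t = (mean(group1) - mean(group2) - 0) / se_diff  # null hypothesis: difference = 0
print(t)
```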
Validity
truthfulness
Multidimensional concept
two or more distinct groups of empirical characteristics
Multiple regression
allows us to isolate the effect of one independent variable while controlling for the other independent variables
Individual-level unit of analysis
when a concept describes a phenomenon at its lowest possible level
Random selection
when every member of the population has an equal chance of being included in the sample; needed for a valid sample
Selection bias
occurs when nonrandom processes determine the composition of the test and control groups, creating compositional differences between them, often unbeknownst to the researcher
Internal validity
within the conditions created artificially by the researcher, the effect of the independent variable is isolated from other plausible explanations (lab experiments)
Regression line
y = a + b(x), where a is the y-intercept, b is the slope (the regression coefficient), and x and y are the independent and dependent variables; the equation provides a general summary of the relationship and allows predictions of y for new values of x
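A sketch of fitting the least-squares line to hypothetical x, y data and using it to predict:

```python
# Sketch: least-squares slope and intercept for hypothetical x, y data.
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]     # invented observations
xb, yb = mean(x), mean(y)

# b = sum[(xi - x_bar)(yi - y_bar)] / sum[(xi - x_bar)^2]  (rise over run)
b = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) \
    / sum((xi - xb) ** 2 for xi in x)
a = yb - b * xb                    # the line passes through (x_bar, y_bar)

def predict(new_x):
    return a + b * new_x           # predicted y for a new value of x

print(a, b, predict(6))
```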