Research Design
Spearman rho
A correlational technique used primarily for rank ordered data (ordinal scale); nonparametric
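A minimal Python sketch of computing Spearman's rho with SciPy; the two rank orderings below are invented for illustration:

    # Spearman rank-order correlation on two sets of ordinal rankings
    from scipy.stats import spearmanr

    judge_a = [1, 2, 3, 4, 5, 6]   # hypothetical rank orderings from two raters
    judge_b = [2, 1, 4, 3, 6, 5]

    rho, p_value = spearmanr(judge_a, judge_b)
    print(rho, p_value)            # rho is about .83 here, a strong positive association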
Canonical correlation
A correlational technique used when there are two or more X variables and two or more Y variables. (Example: the correlation between (age and sex) and (income and life satisfaction).)
Likert scale
A representation of a continuous attitudinal variable; a numerical scale used to assess people's attitudes that includes a set of possible answers and that has anchors on each extreme; a true Likert scale is 5-point and has a midpoint, i.e., neither disagree nor agree
Delimiting variable
A variable that is constraining generalization of results.
Target population
Population that is of interest to the investigator and about which generalizations of study results are intended. To whom do you want to generalize results? "Universe" For example, you want to generalize your findings to all elementary school teachers.
Power rule of thumb
Power needs to be .80 or better (i.e., a Beta error of 20% or less). Calculate power as 1 minus the probability of a Type 2 error.
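A minimal sketch of a power check with statsmodels; the effect size and group size below are made-up values for illustration:

    # Power = 1 - P(Type 2 error); rule of thumb is .80 or better
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    power = analysis.power(effect_size=0.5, nobs1=30, alpha=0.05)             # power with n = 30 per group
    n_needed = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05)  # n per group to reach .80
    print(power, n_needed)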
Mixed methods--triangulation
Qualitative and quantitative done simultaneously
Q-M-R-I-C: Alignment of the Chain of Reasoning
Question-Method-Results-Interpretation-Conclusion
Cluster Sampling
Random, but random by clusters/groups of individuals (i.e., a randomly selected class of students, a randomly selected block of residents)
Precision of measurement
Refers to the validity and reliability of the score you get; lack of error and being on target with what you're trying to measure.
RP--RPS--RQ--RH--RNH
Research Problem--Research Problem Statement--Research Question--Research Hypothesis--Research Null Hypothesis
Same Thing in 3 Theories 1. true score 2. universe score 3. ability or trait parameter
The hypothetical error-free score as it is named in each theory: 1. true score in classical test theory 2. universe score in generalizability theory 3. ability or trait parameter in item response theory
SBR
Scientifically Based Research
SRS
Simple Random Sample--Each element has an equal chance of being selected; names out of a hat
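A minimal sketch of drawing a simple random sample in Python; the roster is a hypothetical population list:

    # Simple random sample: every element has an equal chance of selection
    import random

    roster = [f"teacher_{i}" for i in range(1, 501)]  # hypothetical population of 500 teachers
    sample = random.sample(roster, k=50)              # "names out of a hat," n = 50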
Effect size
The magnitude of a difference or relationship. Smaller differences (smaller effect sizes) require a larger sample to detect; a heterogeneous population requires a larger sample; a homogeneous population requires a smaller sample.
If you don't get variability in your measure then....
you won't find a relationship.
General Evolution of Research
1. Descriptive research 2. Comparative research 3. Experimental research 4. Training designs
Type 1 Error
Rejecting a true null hypothesis; Saying there is a significant difference when there isn't a difference
consequential validity
The way in which the implementation of a test can affect the interpretability of test scores; the practical consequences of the introduction of a test (unintended or intended)
Population
Theoretical group of elements to which we intend to generalize results
Accuracy
Variability is reduced with a larger sample size. Reduced variability leads to increased accuracy.
Bivariate correlational design
Correlational, looks for relationships between variables, also called "zero-order" correlation
Endogenous variable
DV; arrows point toward it in a graphic model of the theory
Cross-sectional research design
Data collected at one point in time, giving a snapshot, i.e., comparing Freshmen in 2012 with Seniors in 2012; Can be strengthened by matching for important variables
Factor analysis
a statistical procedure that identifies clusters of related items (called factors) on a test (to create subscales); used to identify different dimensions of performance that underlie one's total score
test-criterion relationship
includes concurrent and predictive validity; these were old validity terms that now fit here under evidence based on relations to other variables.
Parametric tests
independent samples t-test, dependent samples t-test, ANOVA, repeated measures ANOVA
Research question
indicates the logic of design, variables, and an indication of sample; should be succinct, clear and complete; usually several RQs in a single study
Sampling error
is inversely related to the sample size and the homogeneity of the sample: the bigger the sample, the smaller the error; the more homogeneous the sample, the smaller the error
IV
manipulated variable, the intervention variable
Multiple regression
measure of linear relationship based on several independent variables and a single dependent variable; used to predict and explain; uses a regression equation model (outcome = model + error); an inferential test of statistical significance; determines the percentage of variation in the outcome that can be predicted by the independent variables--variance explained by the model divided by total variance.
Nonparametric tests
median test, Mann-Whitney U (rank ordered data), Wilcoxon Matched Pairs Signed Ranks test, Kruskal-Wallis H test (rank ordered data), Friedman's test (ANOVA of ranks)
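A minimal sketch of how these tests are called in scipy.stats, with invented rank-ordered scores:

    # Common nonparametric tests
    from scipy.stats import mannwhitneyu, wilcoxon, kruskal, friedmanchisquare

    g1, g2, g3 = [3, 5, 6, 8], [2, 4, 4, 7], [1, 2, 3, 5]   # hypothetical ordinal scores

    mannwhitneyu(g1, g2)           # two independent groups
    wilcoxon(g1, g2)               # two related (matched) groups
    kruskal(g1, g2, g3)            # three or more independent groups
    friedmanchisquare(g1, g2, g3)  # three or more related groups (ANOVA of ranks)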
Criterion-referenced tests
negatively skewed, restricted range, standard setting problems. (e.g., state by state, NCLB)
Assumptions for parametric tests
normality, homogeneity, interval level variables
construct irrelevant variance
occurs when scores are influenced by factors irrelevant to the construct; the measure captures extra "stuff" that is irrelevant to the construct being examined (example--test anxiety that hinders performance on a math test)
DV
outcome variable, usually continuous
Norm-referenced tests
performance reported as compared to normed group; Who is the norm group? That's important; (SAT, Stanford); tends to give general measures; it's difficult to show changes in norm-referenced tests.
Mixed methods-exploratory
qualitative and then quantitative
Mixed methods--explanatory
quantitative and then qualitative
Research hypothesis
succinct, specific statement that indicates a testable prediction about the nature of the relationship between two or more variables, i.e., "There is a positive relationships between ____ and ____." "Students in X condition will demonstrate more Y than students in Z condition."
What are the components of statistical significance?
sample size, difference in the means, and standard error of the means
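A minimal worked sketch, with made-up group means, of how those three pieces combine in an independent-samples t statistic:

    # t = (difference in means) / (standard error of the difference)
    import math

    m1, m2 = 78.0, 72.0   # hypothetical group means
    s1, s2 = 10.0, 12.0   # group standard deviations
    n1, n2 = 30, 30       # group sizes

    se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)
    t = (m1 - m2) / se_diff   # bigger mean difference or smaller SE -> larger t (about 2.1 here)
    print(t)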
Beta weight
standardized regression coefficient; indicates the relative contribution of each predictor to the regression model in standard deviation units
Standard error of measurement
an estimate of the amount by which an observed score is expected to differ from the true score; the standard deviation of the measurement-error distribution, often estimated as SD × √(1 − reliability). (from VA DOE handout)
Measurement
the process of assigning numbers or looking at a variable quantitatively (usually with a scale)
dummy variable
the way a dichotomous independent variable is represented in regression analysis by assigning a 0 to one group and a 1 to the other; identify a group (i.e., 0 for male and 1 for female)
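A minimal sketch of dummy coding with pandas; the column name and categories are hypothetical:

    # Dummy-code a dichotomous variable for use in regression
    import pandas as pd

    df = pd.DataFrame({"sex": ["male", "female", "female", "male"]})
    df["sex_dummy"] = (df["sex"] == "female").astype(int)   # 0 = male, 1 = female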
Goal of measurement
to capture the DV with PRECISION, SUFFICIENT VARIABILITY, and SENSITIVITY to investigate relationships and/or differences
Histogram
used for continuous categories, i.e., 1-3, 4-6, 7-9
Bar chart
used for discrete categories, i.e., chocolate, vanilla, strawberry
Confounding variable
variable that can't be separated out from the levels of IV; a function of the experimental design
Moderating variable
variable that changes (intensifies, weakens or reverses) the nature of the relationship between two other variables.
Systematic sampling
when you have a list and you select every nth person on the list, (every 9th, 10th, etc.); This is just as good as SRS and even better if you can rank order your list.
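A minimal sketch of selecting every nth person from a list; the roster and interval are hypothetical:

    # Systematic sample: random start, then every k-th element
    import random

    roster = [f"resident_{i}" for i in range(1, 1001)]   # hypothetical ordered list
    k = 10                                               # sampling interval
    start = random.randrange(k)                          # random starting point, 0 to k-1
    sample = roster[start::k]                            # every 10th person thereafter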
***Advantages of SEM over ANOVA & regression
1. Can include observed and latent variables, and relationships among latent constructs can be examined 2. Several DVs can be studied in a single analysis 3. Equation residuals can be correlated in SEM
Generalizability Theory/IRT (Measurement error)
...
To increase variability in DV measure...
...use a scale with more choices (e.g., a 7-point scale instead of a 3-point scale); if you must use dichotomous questions, use MORE questions.
True Experimental design
with randomization or random assignment
AERA
American Educational Research Association
Method
Consider Sampling, Instrumentation and Procedures/Intervention; Method--SIP
Types of evidence that rule out random sources of error
1. stability (test-retest) 2. equivalence (alternate-form) 3. stability and equivalence 4. internal consistency 5. agreement (inter-tester; inter-rater) 6. generalizability theory
Rules of thumb for sample size
1. Comparative research--15-20 in each group 2. Correlational research--at least 30 for bivariate 3. Multiple regression--60+30 for each new variable 4. Experimental--15+ (Need more for applied research; Need 8 per group with random assignment and homogeneity of group)
Types of Nonprobability Sampling
1. Convenience (available; haphazard) 2. Purposeful (purposive; judgmental) 3. Quota (keep sampling until you get what you need, i.e., enough male teachers) 4. Volunteers
Types of nonexperimental research
1. Descriptive 2. Comparative (differences...What is the difference between _____ and ____?) 3. Correlational (relationships...can be bivariate, multivariate, predictions, correlational path analysis) 4. Causal comparative or ex post facto
Principles of scientific, evidence-based inquiry
1. Pose good questions that are testable (can get empirically-based answers) and that impact knowledge 2. Link to theory 3. Methods appropriate to RQs 4. Coherent, explicit chain of reasoning 5. Replicate and generalize appropriately 6. Disclose and disseminate
How do you know that you are using a measurement with precision, sufficient variability and sensitivity?
1. Read the literature. 2. Pilot test.
Types of Probability Sampling
1. SRS 2. Systematic 3. Stratified 4. Cluster
How to design research to maximize differences and/or relationships
1. Select variables that will be sufficiently sensitive. Sometimes the more specific the concept variable the better (i.e. academic self concept vs. self-concept) 2. Develop/select measures that provide variability. 3. Select samples that provide high variability for targeted variables.
Types of longitudinal research
1. Trend--selecting samples from a changing population (having 5th grade teachers complete survey every year--won't be the same teachers, but it will always be 5th grade teachers and show trends of 5th grade teachers) 2. Cohort--Selecting samples from the same population--A sample of 1999 grads one year and another sample of 1999 grads every five years thereafter. 3. Panel--using the same sample throughout--the same participants over time/high attrition rate
Power is related to...
1. effect size 2. n (sample size) 3. alpha level (the significance criterion)
External sources of measurement error
1. procedures, items, context (M&J "Random"); 2. bias from researchers administering measure; 3. observer bias
Types of regression
1. Simultaneous--all predictors entered at once (the Enter method in SPSS) 2. Stepwise--most common; as variables go in, the computer enters the variable that is most related and then moves to the next variable 3. Hierarchical--researcher determines the order of variable entry and controls/adjusts for the effect of that variable 4. Logistic--has to do with the odds of something happening; dichotomous DV 5. Discriminant function--dichotomous DV; used to determine whether a student will fit into a certain group or not.
Evidence for reliability
1. stability (high correlation between test-retest scores) 2. equivalence (high correlation between alternate forms of the measurement) 3. internal consistency (most commonly used evidence for reliability; use Cronbach's alpha to report reliability among test items) 4. agreement (inter-tester and inter-rater reliability) 5. generalizability theory--reliability is seen as a characteristic of the use of the test scores rather than a property of the test itself
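A minimal sketch of Cronbach's alpha computed directly from its formula, using a made-up item-response matrix (rows = respondents, columns = items):

    # Cronbach's alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores)
    import numpy as np

    items = np.array([[4, 5, 4, 3],
                      [2, 3, 2, 2],
                      [5, 5, 4, 5],
                      [3, 4, 3, 3]])   # hypothetical Likert responses

    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    alpha = (k / (k - 1)) * (1 - item_variances / total_variance)
    print(alpha)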
Type 2 Error
Accepting a false null hypothesis; Failing to find a difference when there is a difference.
What is purpose of SEM?
Allows you to study and test complex relationships among variables, where variables may be observed or unobserved; a model-based approach--we can test theoretical models to evaluate their validity, i.e., see whether the theorized model fits what happens in the real world. Structural Equation Modeling is used to determine whether a hypothesized theoretical model is consistent with the data collected; the model is hypothesized a priori; SEM confirms a model; it evaluates the measurement model and the path model; sometimes called causal modeling
Structural Equation Modeling
Also known as latent variable modeling; provides data on fit between theory and model; can incorporate latent variables (variables not measured directly, only approximated) in the SEM diagram; requires specialized software such as LISREL; used a lot in EdPsych
Evaluation (as related to measurement)
Determining the meaning of the measurement numbers; determining the merit/worth of the measurement
Stratified sampling
Divide population first and then select from the separate groups. This can be used to enhance accuracy of estimates. If you stratify and select from each group, you reduce standard error. Advantages of stratified sampling are 1) more representative sample and 2) reduced standard error.
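A minimal sketch of stratified selection, assuming a hypothetical list of (teacher, school level) pairs:

    # Stratified sample: divide the population into strata, then sample within each
    import random
    from collections import defaultdict

    population = [("t1", "elementary"), ("t2", "middle"), ("t3", "elementary"),
                  ("t4", "high"), ("t5", "middle"), ("t6", "high")]   # hypothetical roster

    strata = defaultdict(list)
    for name, level in population:
        strata[level].append(name)

    sample = []
    for level, members in strata.items():
        sample.extend(random.sample(members, k=1))   # select from each separate group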
Effect size rule of thumb...
Effect size should be .33 or greater.
Causal Comparison Design
Existing groups that experience different interventions that are not controlled by the experimenter; looks like an experiment; a natural experiment with an intervention, but the intervention was not controlled by the researcher
About standardization...
Greater flexibility almost inevitably increases measurement error, but the sacrifice in reliability may reduce construct irrelevance or construct underrepresentation in the assessment, which may improve validity.
Test-criterion relationship
How accurately do test scores PREDICT criterion performance? This is used as evidence for validity.
Operational definition
How the conceptual definition is measured for the study, i.e, the specific scale used
exogenous variable
IV; arrows point away from it toward the DV/endogenous variable
Internal sources of measurement error
Within the subject: subject bias, social desirability, luck, health, demand characteristics (M&J "Participant")
Consequences of using a sample that is too small...
Increased chance of high variability in the sample. Increased chance of committing a Type 2 error (failing to find an existing difference).
Proportional stratified sampling
Ensure proportionate representation of specific variables in the sample. For example, if 75% of teachers are female, you ensure that 75% of your sample is female.
Ex post facto design
Like an experimental design, except that it already happened, so researcher is studying after the fact, did not create/plan/control experiment; Existing groups with different "interventions" in the past
Classical Test Theory (Measurement error)
Observed score = true score + error (internal and external error) + bias
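A minimal simulation sketch of that equation; every number below is invented for illustration:

    # Classical test theory: observed score = true score + random error (+ systematic bias)
    import numpy as np

    rng = np.random.default_rng(0)
    true_score = 50.0
    bias = 2.0                               # systematic error, e.g., a consistently lenient rater
    errors = rng.normal(0, 3, size=1000)     # random measurement error
    observed = true_score + bias + errors
    print(observed.mean())                   # hovers near 52, not the true 50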
Extraneous variable
Outside variable that may affect the dependent variable, i.e. lighting, noise, etc. Outside of the experiment design
Quasi experimental design
without randomization or random assignment
Benefits of SEM
Structural Equation Modeling can be used 1) to investigate directional influence and causal relations of multiple variables 2) to study the relationship among latent constructs that are indicated by multiple measures 3) with experimental and non-experimental data 4) with cross-sectional or longitudinal data
Survey or Study Population
The accessible population; population from whom you drew your sample. For example, you selected your sample from elementary teachers in Hanover County, Virginia.
Population size
The larger the population, the smaller the proportion of it required in the sample; required sample size levels off as the population grows. Think of rent chart example in notes.
Validity
The degree to which evidence and theory support the interpretations of the test scores; need sound scientific base for the proposed score interpretations; tests can be used/interpreted in more than one way and each way must be validated
Sampling error
The difference between the sample statistic and the population parameter; the degree to which your sample results do not accurately reflect the population reality
Construct validity
The extent to which there is evidence that a test measures a particular hypothetical construct; Researchers look for evidence based on test content, response processes (how did the participant answer--was the answer based on the construct being measured or something else? For example, on mathematical reasoning test, did the participant demonstrate math reasoning or simply utilize a memorized algorithm? On a test to measure extroversion and introversion, were the responses influenced by social conformity?)
Number of Variables Studied
The more variables you study, the more subjects you need.
Power
The power of your results is directly and positively related to sample size and homogeneity of sample; The higher the power, the greater faith you have in failing to reject the null (i.e., your nonsignificant results become more important)
predictive validity
The success with which a test predicts the behavior it is designed to predict; it is assessed by computing the correlation between test scores and the criterion behavior. (Fits under test-criterion relationship and evidence based on relations to other variables)
Correlation does not imply causation!!!
True experimental design is needed to demonstrate causation.
Logistic Regression
Typically a dichotomous DV with multiple IVs, fewer assumptions, and odds-ratio results; involves outcomes that are categorical. The dependent/criterion variable has only two values--the occurrence or nonoccurrence of an event (or presence/absence of a condition), typically coded 0, 1. The independent/predictor variables can be continuous, ordinal, or categorical.
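A minimal sketch with statsmodels, using invented data; exponentiating the coefficients gives the odds ratios mentioned above:

    # Logistic regression: dichotomous outcome (0/1), results interpreted as odds ratios
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    hours = rng.normal(5, 2, size=200)                              # hypothetical predictor
    passed = (hours + rng.normal(0, 2, size=200) > 5).astype(int)   # hypothetical 0/1 outcome

    X = sm.add_constant(hours)
    result = sm.Logit(passed, X).fit(disp=0)
    odds_ratios = np.exp(result.params)   # odds ratio for each unit increase in the predictor
    print(odds_ratios)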
Discriminant function analysis
Typically dichotomous DV with multiple IV; Is used to determine which variables discriminate between two or more naturally occurring groups. Discriminant function analysis is multivariate analysis of variance (MANOVA) reversed. In MANOVA, the independent variables are the groups and the dependent variables are the predictors. In DA, the independent variables are the predictors and the dependent variables are the groups. As previously mentioned, DA is usually used to predict membership in naturally occurring groups. It answers the question: can a combination of variables be used to predict group membership? Usually, several variables are included in a study to see which ones contribute to the discrimination between groups.
Sensitivity of measurement
Use a measure that has the possibility of showing relationships; Consider everything that contributes to the variability; Established standardized instruments may be less sensitive to specific DV in the study.
Intervening or mediating variables
Variables inside the individual (such as thoughts, feelings, or psychological responses) that come between a stimulus and a response. Not measured, but helpful in explaining why something is happening.
latent variable
Variables which aren't directly observed, but inferred by responses to a number of other variables that serve as indicators (e.g. extraversion, intelligence)
Conceptual definition
What the researcher is trying to measure, the abstract concept, what the concept really is, really means
Collinearity
When IVs are highly related. For multiple regression, there is the assumption that IVs are NOT highly related.
Disproportionate stratified sampling
When you purposefully tweak the proportionate representation of a specific variable in your sample. For example, if you want to compare gender differences between teachers, you may need to use a disproportionate stratified sample to get enough males in your sample (50/50) even though there are many more female teachers than male teachers in the population.
Mixed design
a design with within subjects factor and between subjects factor
Research problem statement
a single sentence that indicates in general language what will be researched
multiple regression
a statistical technique that predicts values of one variable on the basis of two or more other variables; The great value of multiple regression is in the ability to predict one score based on multiple other scores; In multiple regression, an independent variable is often called a predictor and the dependent variable is called the criterion. Ideally, the IVs are independent of one another, although this is seldom completely true. When IVs correlate, it is said that there is multicollinearity, or just collinearity. Example:
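A minimal sketch with statsmodels; the predictors and outcome below are invented for illustration, not taken from any study cited here:

    # Multiple regression: predict one criterion from several predictors
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    study_hours = rng.normal(10, 3, size=100)   # hypothetical predictors
    prior_gpa = rng.normal(3.0, 0.4, size=100)
    exam_score = 40 + 2 * study_hours + 10 * prior_gpa + rng.normal(0, 5, size=100)

    X = sm.add_constant(np.column_stack([study_hours, prior_gpa]))
    model = sm.OLS(exam_score, X).fit()
    print(model.rsquared)   # proportion of variance in the criterion explained by the model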
Correlation matrix
a table presenting the correlations among several variables
Mediator variable in a path model
a variable that serves as BOTH a source variable and a result variable; it affects AND is affected by other variables in the model
generalizability theory
an alternate view of reliability, where reliability is seen as a characteristic of the use of the test scores, rather than a property of the scores themselves; attempts to answer the question: "in what situations/conditions is this test reliable?"; examines sources of consistency and inconsistency in test scores (using ANOVA) and attempts to identify and label any systematic sources of error or interactions between error sources; considers the use of the test across different settings looking at systematic error; not looking for overall reliability; will say that the test is reliable in these specific settings and these specific populations
Path analysis models
an extension of multiple regression in that it involves various multiple regression models or equations that are estimated simultaneously; can be used to examine mediation effects; can be used to examine causal hypotheses between IV and DV
parametric tests
assume a normal distribution, selection of participants is independent, data must represent interval or ratio scales; have more power than nonparametric tests
SEM is sometimes called...
causal modeling; covariance structure analysis; path analysis
Single subject design
experimental design, but not group data
construct underrepresentation
failure to capture important components of a construct; part of the construct is not covered by the measure
Measurement error
hypothetical difference between observed score and true/universe score; random and unpredictable