Stats I and II-2
Chi-Square
(X^2) Nonparametric inferential statistic used to evaluate the relationship between variables measured on a nominal scale. The goodness-of-fit test asks whether there is a good fit between observed frequencies and expected frequencies. The chi-square distribution is positively skewed; as degrees of freedom increase, it loses its skew. When you find your critical value, if the chi-square is greater than the CV, you reject the null; if it is less than the CV, you fail to reject the null. If we fail to reject the null, the two measures are independent of each other (no relation). If the findings are significant and we reject the null, then there is a significant association between the variables. The test of independence is used to see whether there's a relationship between two categorical variables: it compares the frequencies you observe in certain categories to the frequencies you would expect to get in those categories by chance.
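A minimal sketch of both versions in Python using scipy.stats; the counts are hypothetical, not from any real study:

```python
import numpy as np
from scipy import stats

# Test of independence: hypothetical 2x2 contingency table of two nominal variables.
observed = np.array([[30, 20],
                     [10, 40]])
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(chi2, p, dof)
print(expected)          # frequencies expected if the variables were independent

# Goodness-of-fit: do hypothetical observed counts match the counts expected by chance?
obs = [18, 22, 20, 40]
exp = [25, 25, 25, 25]
chi2_gof, p_gof = stats.chisquare(obs, f_exp=exp)
print(chi2_gof, p_gof)
```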
Mixed Design aka Split-plot Design
A Mixed Design includes a between-subject and within-subjects factor in the same design. It allows evaluation of the effects of variables that cannot be effectively manipulated within-subjects.
Covariate
A covariate is a correlational variable in an experimental design (e.g., self-esteem). It is useful to include one because it allows experimenters to "subtract out" the influence of the covariate, reducing error variance. Thus, it makes the design more sensitive to the effects of the independent variable. It can also be used to rule out confounding variables.
Deductive-Statistical vs. Inductive-statistical explanations
A deductive-statistical (general to specific) explanation is based on universal laws that are absolute, whereas an inductive-statistical explanation contains uncertainty and probabilistic aspects that are not as absolute (e.g., most of psychology).
Random-model vs. Fixed-model ANOVA
A fixed-model ANOVA has fixed variables across replications: treatment levels are deliberately selected and will remain constant across replications. A random-model ANOVA has variables that vary across replications, along with normal sampling error. This increases the generalizability of the study, but can substantially affect F values.
Joint probability vs. expected frequencies in X^2 test of independence
A joint probability is defined simply as the probability of the co-occurrence of two or more events (e.g., racial bias in death sentencing: the probability of being nonwhite and given the penalty versus the probability of being white and given the penalty). Given two events, their joint probability is denoted p(A, B), just as we have used p(blue, green). If those two events are independent, then the probability of their joint occurrence can be found using the multiplicative law: p(A, B) = p(A)p(B). In the X^2 test of independence, the expected frequency for a cell is this joint probability under independence multiplied by N, which works out to (row total x column total) / N; the test compares these expected frequencies to the observed ones.
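A quick sketch of that calculation with made-up counts (the numbers below are hypothetical, only for illustrating the arithmetic):

```python
import numpy as np

# Hypothetical counts: group (rows) x sentence (columns).
table = np.array([[33, 251],
                  [33, 508]])
N = table.sum()

row_p = table.sum(axis=1) / N       # marginal p(row category)
col_p = table.sum(axis=0) / N       # marginal p(column category)

# Under independence, joint p = p(A) * p(B), so the expected frequency
# is N * p(A) * p(B), which equals (row total * column total) / N.
expected = N * np.outer(row_p, col_p)
print(expected)
```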
Variance
A measure of spread: the average squared deviation from the mean.
Difference between statistic and parameter
A parameter is a measurement that refers to an entire population (e.g., average self-esteem score). That same measure, when it is calculated from a sample that we have collected, is called a statistic (e.g., mean, variance, standard deviation). Parameters are the real areas of interest, and the corresponding statistics are guesses at reality. The reality of interest is the corresponding population parameter. We want to infer something about the characteristics of the population (parameters) from what we know about the characteristics of the sample (statistics). This has implications for whether it is a parametric or nonparametric design, because if you use parametric statistics you are making assumptions about the population you pulled your sample from. Conversely, a nonparametric design makes no assumptions about the distribution of scores underlying your sample.
t-test for correlated samples
A parametric inferential stat used to compare the means of two samples in a matched-pairs or a within-subjects design in order to assess the probability that the two samples came from populations having the same mean
Representative Research Design
A representative research design samples both subjects and stimuli; it is useful when generalizing beyond your subjects and/or stimuli.
Simple Main Effect
A simple main effect compares the levels of one variable within a single level of another variable; that is, it is the effect of one independent variable within one level of a second independent variable.
Parametric Statistic
A stat that makes assumptions about the nature of an underlying population (e.g., that scores are normally distributed)
CTT
CTT (classical test theory) includes the Spearman-Brown assumption that the longer the test, the more reliable it is in gauging ability/performance.
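A small sketch of the Spearman-Brown prophecy formula behind that assumption; the function name and the reliability value are made up for illustration:

```python
def spearman_brown(r_original, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`
    (e.g., 2 means the number of items is doubled)."""
    return (length_factor * r_original) / (1 + (length_factor - 1) * r_original)

# Doubling a test whose reliability is .70 raises the predicted reliability to ~.82.
print(spearman_brown(0.70, 2))
```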
Modus ponens
Affirms the antecedent. If John is intelligent, then he is rich. John is intelligent. Therefore, John is rich.
Effect Size
Amount by which a given experimental manipulation changes the value of the DV in the population, expressed in sd units
Moderation in Multiple Regression
An interpretation of when an IV influences the DV. A moderator changes the relationship between an IV and a DV. A significant interaction between the moderator and the IV indicates that the effect of the IV on the DV changes depending on the moderator.
Orthogonal Rotation vs. Oblique Rotation
An orthogonal rotation assumes that measures and factors are uncorrelated (it rotates the axes so that they remain perpendicular to each other). An oblique rotation assumes that measures and factors are correlated: it rotates the axes so they can sit at any angle (other than 90 degrees), thus allowing factors to be correlated. These rotations are used in factor analysis to make factors more distinct and easier to interpret.
Extraneous Variable
Any variable that is not systematically manipulated in an experiment but that still may affect the behavior being observed.
Steps in hypothesis testing
Begin with a research hypothesis. Set up the null hypothesis. Construct the sampling distribution of the particular statistic on the assumption that the null hypothesis is true. Collect some data. Compare the sample statistic to that distribution. Reject or fail to reject the null hypothesis.
Falsificationism or antipositivism
By Popper: Theories must be falsifiable by experience to be considered scientific
Cohort-sequential design
Cohort-sequential design: a combination of a cross-sequential design and a longitudinal design. It allows researchers to test for generation (cohort) effects, but it does not eliminate the effects that generation may have.
Correlation
Correlation, r, measures the linear association between two quantitative variables; it measures the strength of a linear relationship only. It describes situations in which both X and Y are random variables, so sampling error is involved in both X and Y, and repeated replications of the experiment will involve different sets of X values. Goal: obtaining a measure of the degree of relationship.
Cross-sequential design
Cross-sequential design: several cohorts are observed over several periods of time. However, it does not take into account the effect of age.
Parametric Design
Experimental design in which the amount of the IV is systematically varied across several levels
Nested Design
Experimental design with a within-subjects factor in which different levels of one IV are included under each level of a between-subjects factor.
Nonparametric Design
Experimental research design in which levels of the IV are represented by different categories rather than different amounts
Experimenter bias
Experimenter bias is when the behavior of the researcher influences the results of a study. It can stem from two sources: expectancy effects (performance of participant) and/or differential treatment of subjects across groups.
Type I Error
The data say the null is false when in reality it is true, resulting in a false positive. p(Type I error) = alpha. When the null is true and we correctly fail to reject it, p = 1 - alpha.
Type II Error
The data say the null is true when in reality it is false, resulting in a false negative. p(Type II error) = beta. When the null is false and we correctly reject it, p = 1 - beta (power).
Modus tollens
Denies the consequent. If the sun is shining, then I am hot. I am not hot; therefore, the sun is not shining.
Regression
Describes situations in which the value of X is fixed or specified a priori, so no sampling error is involved in X and repeated replications of the experiment involve the same set of X values. Goal: predicting one variable from knowledge of the other, rather than simply describing the degree of relationship.
Discovery and Justification
Discovery: initial thinking, plausibility, acceptability Justification: evaluation, defense, and confirmation of the idea
Error Variance
Error variance is shown to be an unbiased estimate of the corresponding parameter in the population; it is the variability among scores not caused by the independent variable. Sources of error variance: individual differences among subjects; environmental conditions not constant across levels of the independent variable; fluctuations in the physical/mental state of an individual subject. Reducing error variance: hold extraneous variables constant by treating subjects as similarly as possible; match subjects on crucial/specific characteristics.
Assumptions of ANOVA
Homogeneity of variance (homoscedasticity): each population has the same variance. Normality: for each level of the within-subjects factor, the dependent variable must be normally distributed around the mean. Independence: observations are independent of one another. Sphericity: difference scores computed between two levels of a within-subjects factor must have the same variance for the comparison of any two levels (this assumption only applies if there are more than 2 levels of the independent variable). Randomness: cases should be derived from a random sample, and scores from different participants should be independent of each other.
When to use standard set coefficients
Homogeneity of variance: each of our populations has the same variance; this is expected to occur if the effect of a treatment is to add a constant to everyone's scores. We want the univariate test to be significant (p < .001) and Levene's test to not be significant. Independence: scores/means are independent of one another, so that variance within the groups can be attributed to sources other than the treatment.
Assumptions of Correlation and Regression
Homogeneity of variance in arrays: the variance of Y for each value of X is constant in the population. Normality in arrays: in the population, the values of Y corresponding to any specified value of X (a conditional array) are normally distributed around estimated Y.
IRT
IRT (item response theory) does a better job of predicting the reliability gained by adding items, and it has been shown that an IRT-based test can be more reliable than other tests even though it is shorter.
Benefits of ANOVA
Like t, ANOVA deals with differences between or among sample means; unlike t, it imposes no restriction on the number of means. Instead of asking whether two means differ, we can ask whether three, four, five, or k means differ. It also allows us to deal with two or more independent variables simultaneously, asking not only about the individual effects of each variable separately, but also about the interacting effects of two or more variables.
Internal vs. External Validity
In relation to an experiment, internal validity is the degree to which you can say that X caused Y. More specifically, it is strengthened by eliminating experimenter bias through double-blind procedures and random assignment to groups. External validity, on the other hand, is the degree to which the results generalize beyond the study; it is strengthened by random sampling of a population to ensure the sample accurately represents that population.
t-test for independent samples
Parametric inferential stat used to compare the means of two independent, random samples in order to assess the probability that the two samples came from populations having the same mean.
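A minimal sketch with scipy.stats and hypothetical scores (the correlated-samples version would use ttest_rel on paired scores instead):

```python
import numpy as np
from scipy import stats

# Hypothetical scores for two independent, randomly assigned groups.
group1 = np.array([12, 15, 14, 10, 13, 16, 11])
group2 = np.array([9, 11, 8, 12, 10, 7, 10])

t, p = stats.ttest_ind(group1, group2)   # assumes equal variances by default
print(t, p)
```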
Main Effect
Independent effect of one IV in a factorial design on the DV. There are as many main effects as there are IVs. A main effect is the difference between or among marginal means, where the levels of the other independent variable are combined. Specifically, it is the mean difference between levels of one factor when collapsed across other factors.
Independent events
Independent events are events for which the occurrence or nonoccurrence of one has no effect on the occurrence or nonoccurrence of the other.
t-test
Inferential stat used to evaluate the reliability of a difference between two means. Versions exist for between-subjects and within-subjects designs and for evaluating a difference between a sample mean and a population mean
ANOVA
Inferential statistic used to evaluate data from experiments with more than two levels of an IV or data from multifactor experiments. Versions exist for both within-subjects and between-subjects designs.
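A minimal one-way, between-subjects sketch with scipy.stats and hypothetical data for a three-level IV:

```python
from scipy import stats

# Hypothetical DV scores at each of three levels of one IV.
level_a = [4, 5, 6, 5, 4]
level_b = [7, 8, 6, 9, 7]
level_c = [10, 9, 11, 10, 12]

F, p = stats.f_oneway(level_a, level_b, level_c)
print(F, p)
```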
Controlling FWE with Bonferroni's
Manipulates the per comparison error rate (possible when making only a few comparisons), which leaves FW less than or equal to alpha. Holm's modification is preferred to the standard Bonferroni test because it is less conservative and more powerful.
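One way to apply both corrections in Python is statsmodels' multipletests; the p values below are hypothetical:

```python
from statsmodels.stats.multitest import multipletests

pvals = [0.004, 0.020, 0.030, 0.041]     # hypothetical per-comparison p values
alpha = 0.05

# Standard Bonferroni: compare each p to alpha divided by the number of comparisons.
print(alpha / len(pvals))                 # 0.0125

# Holm's step-down modification: less conservative, more powerful.
reject, p_adj, _, _ = multipletests(pvals, alpha=alpha, method='holm')
print(reject, p_adj)
```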
Four types of causality with pencil example
Material: the material used to make something; pencil: plastic, lead, eraser. Formal: the idea of something; pencil: the blueprint/design of the pencil given to the manufacturer. Efficient: what brings it about; pencil: the workers and tools needed to construct/manufacture it. Final: why it exists; pencil: to write.
Mean, Median, Mode in negative skew
Mean falls lower (less resistant to outliers). Median is in the middle (more resistant than the mean, but not as much as the mode). Mode is highest.
Three types of Central Tendency
Mean: Calculated from all of the data. The most common measure of central tendency and used most often. Very susceptible to outliers. Median: Ignores most of the data. Refers to the point at or below which 50% of the scores fall when the data are arranged in numerical order. Not as susceptible to outliers as the mean. Mode: Least useful/used. The most common score, or the score obtained from the largest number of subjects. Least susceptible to outliers (most resistant) and useful in telling readers/consumers what most of the sample is doing.
R^2
Measure of the amount of variability in the dependent measure accounted for by the best linear combination of the predictor variables; a measure of effect size. R^2 is the correlation coefficient squared: r itself is a standardized measure of the strength and direction of a linear relation, and R^2 gives the proportion of variability accounted for.
Cohen's d
Measure of effect size: the mean difference in terms of standard deviations, and the most commonly used (e.g., d = .8 means the mean difference is 8/10 of a standard deviation unit). d = mean difference / sd.
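A rough sketch using a pooled standard deviation; the helper name and the scores are hypothetical:

```python
import numpy as np

def cohens_d(x, y):
    """Mean difference expressed in pooled-standard-deviation units."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

print(cohens_d([12, 15, 14, 10, 13], [9, 11, 8, 12, 10]))
```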
Standard Error of Estimate
Measure of the accuracy of prediction in a linear regression analysis. It is a measure of the distance between the observed data points and least squares regression line
Measure of Central Tendency - Why?
Measures of central tendency are useful in that they help depict what is happening with various aspects of the data or represent the "center" of the distribution. They refer to the set of measures that reflect where on the scale the distribution is centered. These measures differ in how much use they make of the data, particularly of the extreme values, but they are all trying to tell us something about where the center of the distribution lies.
Pearson R
Most popular measure of correlation. Indicates magnitude and direction of a correlational relationship between two variables
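A minimal sketch with scipy.stats and hypothetical paired observations:

```python
from scipy import stats

x = [2, 4, 5, 7, 9, 10]        # hypothetical values of one variable
y = [3, 5, 4, 8, 10, 12]       # paired values of the other variable

r, p = stats.pearsonr(x, y)
print(r, p)                    # sign gives direction, magnitude gives strength
```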
Multilevel Modeling vs. Repeated Measures ANOVA
Multilevel modeling doesn't require sphericity, can analyze missing data, and takes the design effect (sampling hierarchy) into account. Problems with using repeated measures ANOVA: the sphericity assumption, the design effect (sampling hierarchy), and the requirement for complete data and designs.
Multiple Regression
Multivariate linear regression analysis used when you have a single criterion variable and multiple predictor variables.
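A minimal sketch using statsmodels' OLS with simulated (hypothetical) data for one criterion and two predictors:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)                         # hypothetical predictor 1
x2 = rng.normal(size=100)                         # hypothetical predictor 2
y = 2 + 0.8 * x1 + 0.3 * x2 + rng.normal(size=100)   # hypothetical criterion

X = sm.add_constant(np.column_stack([x1, x2]))    # add the intercept column
model = sm.OLS(y, X).fit()
print(model.params)        # intercept and regression coefficients
print(model.rsquared)      # variance in y accounted for by the predictors
```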
Factor Analysis
Multivariate statistical technique that uses correlations between variables to determine the underlying dimensions (factors) represented by the variables
Mutually exclusive events
Mutually exclusive events are when the occurrence of one event precludes the occurrence of another.
Four scales of measurement
Nominal: categorical (e.g., sex, grade, political parties). Ordinal: ordering people, objects, or events along some continuum where order matters (e.g., ranks in the Navy). Interval: a measurement scale on which we can speak of differences between points and which satisfies the properties of the nominal and ordinal scales (e.g., the difference between 10 degrees and 20 degrees is the same as between 80 degrees and 90 degrees). However, it does not have a true zero point; that is why we need ratio scales too. Ratio: has a true zero point and satisfies the properties of the previously mentioned scales, with the ability to speak in ratios (e.g., length, volume, time). An example: we generally think of temperature as an interval scale (the difference between 62 and 64 degrees is the same as the difference between 92 and 94 degrees), but it depends on what we are measuring. If we are measuring comfort, the numbers are no longer on an interval scale: a change from 62 to 64 degrees would be noticeable to a person in a room, but a change from 92 to 94 degrees may not be. The underlying variable being measured (comfort), not the numbers themselves, is what defines the scale.
Fitting line to normal curve vs. kernel density plot
Normal curve fit to data: superimposes a theoretical distribution on the data based on only a few characteristics of the data (e.g., mean, standard deviation); it does not fit the curve to the actual shape of the distribution, so the individual data points and their distribution play no role in plotting it. Kernel density plots do almost the exact opposite: they try to fit a smooth curve to the data while taking account of the fact that there is a lot of random noise in the observations that should not be allowed to distort the curve too much. They pay no attention to the mean and standard deviation of the observations. This produces an overall curve that fits the data quite well.
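A small sketch of the contrast using scipy; the skewed sample is simulated, so the numbers are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.gamma(shape=2.0, scale=1.0, size=500)   # hypothetical skewed sample

grid = np.linspace(data.min(), data.max(), 200)

# "Fitting a normal curve": uses only the sample mean and SD, ignores the shape.
normal_curve = stats.norm.pdf(grid, loc=data.mean(), scale=data.std(ddof=1))

# Kernel density estimate: smooth curve driven by the individual observations.
kde_curve = stats.gaussian_kde(data)(grid)

print(normal_curve[:5])
print(kde_curve[:5])
```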
Assumptions of Regression
Normality in Array: the population values of Y corresponding to any specified value of X are normally distributed around estimated Y. Homogeneity of Variance in Arrays: the assumption that the variance of Y for each value of X is constant in the population.
Things that affect power
Power is a function of several variables: the probability of a Type I error (alpha); the true alternative hypothesis (H1); sample size (N); and the particular test to be employed. Alpha: if we are willing to increase alpha, the cutoff point moves to the left, simultaneously decreasing beta and increasing power, although with a corresponding rise in the probability of a Type I error (false positive). H1: the chance of finding a difference depends on how large the difference actually is; the bigger the effect, the better the chance of detecting it (and the lower the chance of a Type II error, a false negative). N: because we are interested in means or differences between means, we are interested in the sampling distribution of the mean, and the variance of that sampling distribution decreases as n increases; this is also why small changes in the experimental design can influence power. Error: the more error, the more spread out the distributions are (more overlap); the less error, the closer together the distributions (less overlap). When the standard error decreases, the distributions tuck in around their means. Statistical test used: a dependent-measures test is more powerful.
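A quick sketch of how these pieces trade off, using statsmodels' power tools for an independent-samples t test; the effect size, n, and alpha values are arbitrary examples:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power for a medium effect (d = .5) with 30 per group and alpha = .05.
print(analysis.power(effect_size=0.5, nobs1=30, alpha=0.05))

# Sample size per group needed to reach power = .80 for that same effect.
print(analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05))
```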
p value
Probability estimated from the data that an observed difference in sample values arose through sampling error. Must be less than or equal to chosen alpha level for difference to be statistically significant
Relational Research
Relates variables to one another
Reliability
Reliability is the consistency of a test/measure, comprising test-retest reliability and the reliability of the test/measure components (internal consistency); a measure is considered reliable if it produces the same/similar results and its internal correlations are high. Parallel-forms reliability establishes the reliability of a test/measure by administering parallel (alternative) forms of it repeatedly.
Partial Correlation in Multiple Regression
Removes the effect of a third variable from both of the other variables.
Semi-partial Correlation in Multiple Regression
Removes the effect of a third variable. However, it only does so from ONE of the other variables. These are used when two variables are influenced by a third variable.
Correlational Research
Research that uses no IV and uses two or more DV to identify possible correlational relationships
Least-squares Regression Line
Straight line, fit to data, that minimizes the sum of the squared distances between each data point and the line
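A minimal sketch with numpy and hypothetical data, showing the fitted line and the quantity it minimizes:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)     # hypothetical predictor values
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 7.0])      # hypothetical outcome values

slope, intercept = np.polyfit(x, y, deg=1)        # least-squares fit of a straight line
predicted = intercept + slope * x
sse = np.sum((y - predicted) ** 2)                # the sum of squared distances being minimized
print(slope, intercept, sse)
```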
Stratified Sampling
Stratified sampling is used to obtain a representative sample. To do so, the population is divided by demographics (strata) and participants are then selected randomly from each stratum. However, this type of sampling may still lead to over- or underrepresentation.
Least powerful post hoc test
Scheffe's test is the least powerful of all the post hoc tests we discussed because it allows you to keep the significance level at .05 for all comparisons—protecting against everything.
Descriptive Research
Seeks only to describe what is happening. Useful in the early stages of research. Not sufficient to answer the more interesting questions (e.g., why or how).
Qualities of Good Research Idea
Agreement with previous findings, coherence, parsimony, falsifiability.
Types of Regression
Simple (standard): all variables are entered into the regression equation at the same time. It is used when neither of the other two approaches applies. Statistical: composed of three types: forward, backward, and stepwise. Forward starts with an empty equation and the program adds variables (based on significance) to account for the most variance. Backward starts with a full equation and removes variables, taking out any variable that is not contributing. Stepwise starts empty and adds variables, but may delete them later if they are found not to be contributing. Statistical regression is used when examining predictors is the aim; adding variables increases the prediction of the event occurring, but it tends to overfit: it is good at explaining specific data, but not at generalizing findings. Hierarchical (sequential): the order of the variables in the regression equation is determined by a theory or a model. It is used when you are working from a specific theoretical model.
Simple Random Sampling
Simple random sampling is when participants are randomly selected from the population. This type of sampling reduces systematic bias, but doesn't guarantee a representative sample, in that some segments of the population may be underrepresented or overrepresented.
Solomon's Four Group Design
Solomon's four-group design is a variation of the basic pretest-posttest design. It adds two groups that do not receive the pretest (one treatment group and one control group, each followed by the posttest). This allows you to evaluate the impact of the pretest on posttest performance.
Logical Positivism
Sometimes called logical empiricism. The only statements that are meaningful are those that can be verified by experience (the "bucket theory of science"). Empirical facts can verify most anything.
Linear Regression
Stat technique used to determine the straight line that best fits a set of data
Calculating Error
Subtracting the mean from each score gives the deviation: deviation = xi - mean. Sum of squared errors: square each deviation and add them together to get the total error.
Variance formula
Sum of squared deviations from the mean (sum of squares) divided by df
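A tiny numeric sketch of both entries above, with hypothetical scores:

```python
import numpy as np

scores = np.array([4, 6, 7, 9, 9], dtype=float)   # hypothetical scores

deviations = scores - scores.mean()               # xi - mean
ss = np.sum(deviations ** 2)                      # sum of squared deviations (total error)
variance = ss / (len(scores) - 1)                 # divide by df = n - 1

print(ss, variance)
print(np.var(scores, ddof=1))                     # matches NumPy's sample variance
```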
Systematic Sampling
Systematic sampling is often used with stratified sampling. It is done by selecting every kth person after a randomly selected starting point. For example: sample every 11th person in the phone book, starting at a random spot on the page.
Assessing Reliability in a Questionnaire
Test-retest reliability, parallel-forms reliability, and split-half reliability (among others). Test-retest reliability requires multiple administrations of a test; however, it can be problematic if the concepts being measured fluctuate over time, and there is a high chance of participants recalling their responses from the earlier administration. To correct this, parallel-forms reliability uses an alternate form of the test for the second administration. Split-half reliability avoids these issues in another way: it requires only one administration of the test, with items from one half correlated with items from the other half.
Time-sequential design
Time-sequential design: considers age and time of measurement. Within this design, participants are observed at different times. However, it does not consider cohort effects.
Definition of IV and DV
The IV is what we manipulate to see what effect, if any, it may have on the DV, which is what is being measured.
Pretest-Posttest Design
The basic pretest-posttest design is used to assess the impact of a treatment as a change in performance. The pretest is administered before exposure to the treatment condition. This is a true experimental design; however, there are some issues associated with it: you can't counterbalance, and there is pretest sensitization (taking the pretest may influence the way participants perform in the experiment). There are two ways to fix this: eliminating the pretest, or using Solomon's four-group design.
Central Limit Theorem
The central limit theorem states that even if a population distribution is strongly non-normal, the sampling distribution of its mean will be approximately normal for large sample sizes (over about 30). The central limit theorem makes it possible to use probabilities associated with the normal curve to answer questions about the means of sufficiently large samples. The mean of the sampling distribution of the mean equals mu. The variance of the sampling distribution of the mean equals the population variance divided by n. The SD of the sampling distribution of the mean (the standard error) equals the standard deviation of the population divided by the square root of n. The sampling distribution approaches the normal distribution as n approaches infinity; by about n = 30 it is approximately normal.
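A quick simulation sketch with numpy (the exponential population and the sample counts are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
population = rng.exponential(scale=2.0, size=100_000)   # strongly non-normal population

n = 30
sample_means = [rng.choice(population, size=n).mean() for _ in range(5_000)]

print(population.mean(), np.mean(sample_means))              # mean of sampling dist. is close to mu
print(population.std() / np.sqrt(n), np.std(sample_means))   # SD is close to sigma / sqrt(n)
```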
Familywise Error (FWE)
The likelihood of making at least one Type I error (false pos) across a number of comparisons
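With independent comparisons, this works out to FW = 1 - (1 - alpha)^c for c comparisons; a quick illustration:

```python
alpha = 0.05
for c in (1, 3, 5, 10):                  # number of comparisons
    fwe = 1 - (1 - alpha) ** c           # chance of at least one Type I error
    print(c, round(fwe, 3))
```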
Tolerance in Multiple Regression
The most common way to evaluate collinearity (collinearity is when predictors are correlated). High collinearity may cause issues; a small tolerance for a variable indicates multicollinearity.
Power
The probability of correctly rejecting a false H0 when a particular alternative hypothesis is true. Power = 1 - beta. Find the effect size and then the noncentrality parameter (delta); as delta increases, power increases.
Semipartial Correlation in Multiple & Hierarchical Regressions
The squared semipartial correlation (sr^2) in a simple (standard) multiple regression indicates the amount by which the multiple R^2 is reduced if the IV were removed. The squared semipartial correlation (sr^2) in a hierarchical (sequential) regression indicates the amount of variability added to multiple R^2 by each IV. The value depends on when the IV is entered into the equation (i.e., IVs entered earlier hold more weight).
F Ratio
The test statistic computed when using an ANOVA. It is the ratio of between-groups variance to within-groups variance.
Conventionalism
Theories are conventions that help us organize the world Duhem-Quine thesis: it is impossible to falsify a theory because there is no such thing as a truly falsifying test
Compare theory, hypothesis, model
A theory is based more on general principles and can be described as a "large map," whereas hypotheses are more focused: they derive from theory, are testable, and can be described as a "small map." Models are a more specific implementation of theories in which hypotheses are tested.
Validity
Validity: is the degree to which a test/measure measures what it is supposed to. Content Validity requires that the tests/measures represent the material they should. Criterion Validity is the degree to which the tests/measures correlate with one or more outcome criteria.
Error Variance
Variability in the value of the DV that is related to extraneous variables and not the variability in the IV
Mediation in Multiple Regression
When the IV and the mediator are correlated—thus there is a path that links the three variables together. Causal model—if mediator went away, IV would still likely lead to the DV.
Interaction
When the effect of one IV on the DV in a factorial design changes over the levels of another IV. An interaction is understood by examining the simple effects. It is when the mean difference between cells is different than what you'd expect based on your main effects.
Proportional Reduction in Error and R^2
When we index error in terms of the sum of squared errors: when we do not use X to predict Y, the error is SS_Y; when we use X as the predictor, the error is SS_residual. The value of R^2 can thus be seen as the proportion by which error is reduced when X is used as the predictor: R^2 = (SS_Y - SS_residual) / SS_Y.
Winsorized sample vs. Trimmed sample
A trimmed sample is defined as a sample from which a fixed percentage of the extreme values in each tail have been removed. A Winsorized sample is closely related: the trimmed values are replaced by the most extreme value remaining in each tail (thus, with 10% trimming, the two lowest and the two highest values would be replaced). Both are useful in dealing with extreme outliers.
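A small sketch with scipy; the scores are hypothetical, and with 10 values the 10% cut means one value per tail here:

```python
import numpy as np
from scipy.stats import trimboth
from scipy.stats.mstats import winsorize

scores = np.array([1, 2, 14, 15, 16, 17, 18, 19, 20, 95])   # hypothetical data with outliers

trimmed = trimboth(scores, 0.1)              # drop 10% of values from each tail
wins = winsorize(scores, limits=(0.1, 0.1))  # replace them with the nearest remaining value

print(np.sort(trimmed))
print(np.asarray(wins))
```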
Noncentrality parameter
is used to estimate how much a population mean and sample mean shift from zero, that is, how far the t distribution shifts from zero when the null hypothesis is false. In other words, the NCP tells you how wrong the null is. Power is then the probability of obtaining a value from the noncentral distribution that is greater than the critical value t would have under the null. It describes the distribution of a test statistic when the null is false, which leads to its use in statistics, especially in calculating statistical power. Aka delta. As delta increases, power increases. It is composed of effect size and N; delta increases when N increases.
Effect Size for ANOVA
r^2 represents how much of the overall variability in the dependent variable can be attributed to the treatment effect. Eta-squared (η^2) is the oldest and simplest measure of the strength of an experimental effect; it is an index of how different the scores in the complete data set are from one another. Omega-squared (ω^2) assesses the magnitude of the experimental effect with balanced (equal) n.
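A small sketch of eta-squared computed by hand from hypothetical single-factor, between-subjects data:

```python
import numpy as np

# Hypothetical DV scores for three treatment groups.
groups = [np.array([4., 5, 6, 5]), np.array([7., 8, 6, 9]), np.array([10., 9, 11, 10])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
ss_total = np.sum((all_scores - grand_mean) ** 2)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

eta_squared = ss_between / ss_total    # proportion of total variability due to treatment
print(eta_squared)
```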