Statistics - Midterm
Different measures of effect size
- Cohen's d: do not mechanically say "small, medium, or large" when interpreting. It works much like a z-score (a standardized difference), but we will compute it with an online calculator.
- For ANOVA: eta squared (the same as R squared) vs. partial eta squared.
Different ways to increase power:
- Increase sample size: with a larger sample, power is higher because the sampling distribution has a smaller SD (i.e., a smaller standard error).
- Increase alpha (not recommended).
- Increase the treatment effect (e.g., a 2x-per-week intervention instead of a 1x-per-week intervention).
- Use a within-subjects design.
What are the four levels of measurement and how are they different?
- Nominal: numbers represent the names of categories or groups. Used to operationalize categorical variables; these numbers carry no information about quantity.
- Ordinal: indexes degree/quantity, but only in a limited way. These numbers tell us that two scores are different and that one is more or less than the other (but not by how much).
- Interval: quantifies variables with magnitude information (which score is more or less, and by how much). Mathematical calculations become possible at the interval scale. The scale's creator can choose which values are included on the scale.
- Ratio: has an absolute zero, so a given value on the scale always carries the same meaning. Example: height; 6 feet always means the same thing.
Note: SPSS does not differentiate between interval and ratio scales; it lumps them together as "scale."
List the steps of the NHST
- State the null and research hypotheses (directional or non-directional).
- State the alpha level.
- Note the characteristics of the sampling distribution under the null hypothesis.
- Find the location of the sample mean in that sampling distribution (compare the calculated statistic to the critical value).
- Retain or reject the null hypothesis, by comparing the probability to alpha or by comparing the z-score of the mean to the critical values.
Levene's Test for Equality of Variances
- Test for homogeneity of variance.
- Can be too sensitive: it may be statistically significant even when the variances are not vastly different.
- The F-max test can be used when Levene's test is significant and the sample sizes are not vastly different (ch. 3; don't need to know this yet).
Central limit theorem
- The larger the sample size (N), the closer the sampling distribution of the means is to normal. When the sample size is larger than 30, the distribution of means should be nearly normal. If the raw-score distribution is normal, the sampling distribution of means will be normal regardless of the number of scores in each sample.
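A minimal simulation sketch of this idea (Python with NumPy, an assumption since the notes name no software): draw repeated samples from a deliberately skewed population and watch the distribution of sample means become more normal as N grows.

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=1.0, size=100_000)  # heavily skewed raw scores

for n in (2, 30, 100):
    # 10,000 replications: each draws n scores and records the sample mean
    means = rng.choice(population, size=(10_000, n)).mean(axis=1)
    # Skewness of the distribution of means shrinks toward 0 (normal) as n grows
    skew = ((means - means.mean()) ** 3).mean() / means.std() ** 3
    print(f"n={n:>3}  mean={means.mean():.3f}  SE={means.std():.3f}  skew={skew:.2f}")
```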
What are the four characteristics of the mean?
- The mean is sensitive to all scores in a distribution.
- The mean is too sensitive to extreme scores.
- The sum of deviation scores from the mean is always zero. (A deviation score is the difference between a single score and the mean.)
- The sum of the squared deviation scores is minimal when calculated from the mean. Squaring compensates for the seesaw effect in totaling deviation scores (which always sum to 0, rule #3): squared deviations can be added to a value other than zero, showing in total how far the scores fall from the mean.
Assumptions of One-way ANOVA
- The scores (or errors) are independent and normally distributed (the same assumptions as the independent-samples t-test, which is the special case in which only two sample means are involved).
- Homogeneity of variance.
1-2-3 rule
68% of scores in a normal distribution fall within 1 standard deviation above and below the mean, 95% of scores fall within 2 standard deviations, and 99% of scores fall within 3 standard deviations.
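A quick numerical check (Python with SciPy, assumed here): the exact proportions are about 68.3%, 95.4%, and 99.7%, so the 1-2-3 rule is a rounded mnemonic.

```python
from scipy.stats import norm

# P(-k < Z < k) for a standard normal variable, k = 1, 2, 3
for k in (1, 2, 3):
    p = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} SD: {p:.3%}")
# within 1 SD: 68.269%, within 2 SD: 95.450%, within 3 SD: 99.730%
```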
How do you decide between applying the directional and non-directional hypotheses?
A directional hypothesis is appropriate when the researcher has reason to expect an increase or a decrease in the mean (one specific direction). A non-directional hypothesis is more appropriate when researchers do not have strong evidence that the effect will occur in only one direction (positive or negative).
How can we directly test our research hypothesis using statistics? Use an example to explain
A scientific hypothesis can be directly tested by replicating the experiment over and over, on as many samples as possible, so that the results resemble the population as closely as possible. According to the central limit theorem, a large number of samples ensures a roughly normal distribution of results. Political polling is an example: to test the hypothesis that a certain political candidate is well liked by the public, first conduct the poll on one sample of individuals; then, if the results support the hypothesis, replicate the poll on as many new samples as possible to get as accurate a representation of the population as possible.
What is the meaning of adding a predictor to a statistical model according to GLM?
According to GLM principles, statistical analysis is an approximation of data, done in an attempt to explain population phenomena. In order to analyze a sample when no other data is available, the sample mean can be used as an estimate. However, to make these estimates more accurate than the mean alone, predictors (additional information) are added to the analysis, through statistical modeling. Statistical models can be simple, with only one predictor, or complex, with several predictors. Regardless of the number of predictors included in the model, the model is never perfect, and will always contain errors. However, a model with a few errors can still provide a good prediction of the sample. In its simplest form, GLM can be described as Data = Model + Error.
How are planned comparisons and post hoc comparisons different? How does a researcher make a decision between the two?
After an overall F-test is statistically significant, post hoc analyses identify which pairwise differences drive the effect while minimizing accidental findings of significant differences. When researchers use the F-test followed by post hoc analyses, it typically means they are not sure where the group differences will appear; this reflects a more exploratory research approach, without specific hypotheses. If, however, researchers have reviewed prior literature and formed hypotheses about where they will find group differences, planned comparisons are used: researchers set up planned comparisons before running the analysis to test those specific hypotheses.
Pooled SD
The combined (weighted) standard deviation of the two samples, used to calculate Cohen's d for two groups (the difference between the group means expressed in pooled standard deviation units), a way of interpreting effect size.
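A minimal sketch (plain Python, an assumption, since the notes mention only an online calculator) of the usual pooled-SD formula and the two-sample Cohen's d built from it; the means, SDs, and group sizes below are hypothetical.

```python
import math

def pooled_sd(s1: float, n1: int, s2: float, n2: int) -> float:
    """Pooled SD: square root of the df-weighted average of the two variances."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def cohens_d(m1: float, s1: float, n1: int, m2: float, s2: float, n2: int) -> float:
    """Cohen's d: the mean difference expressed in pooled-SD units."""
    return (m1 - m2) / pooled_sd(s1, n1, s2, n2)

# Hypothetical treatment vs. control means/SDs, 25 participants each
print(round(cohens_d(105.0, 15.0, 25, 100.0, 14.0, 25), 2))  # ~0.34
```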
The GLM is an integrated approach to all statistical analyses. Why haven't we utilized it earlier?
Historically, the social sciences needed only special cases of the GLM, so simpler procedures were developed and the full GLM did not have to be used. As a result, the integrated GLM approach was not taught in social science courses, since it would not be frequently used.
Why is median sometimes preferred to mean?
Because the mean is too sensitive to extreme scores, while the median is always the center-most score, the 50th percentile. The median can provide a clearer representation of the center point of a distribution when the mean is affected by outliers in the data.
What is a normal distribution, and how do z-scores relate?
The bell curve: a perfectly balanced range of scores in which the mean, median, and mode are all the same. Many datasets take this shape and are easy to understand as a result, since each score's position relative to the mean and standard deviation is known. The scores can easily be converted into z-scores (standardized) and easily interpreted.
Based on this chapter, why were Cortina and Landis (2011) skeptical about the effect size interpretations based on Cohen's recommendations? What could be a reasonable way of interpreting effect sizes?
Cohen's d is a way of estimating effect size, particularly useful for z-tests: the difference between the sample and population means, divided by the population standard deviation. Cortina and Landis were skeptical because dogmatic use of effect size benchmarks leads to misinterpretation of data, without consideration of context or the scientific hypotheses. They criticized the way overuse of effect size estimates shifts common data-analysis practice from a dichotomy (statistically significant vs. not significant, as in NHST) to a trichotomy (small, medium, or large effect size), which produces the very misinterpretations Cohen sought to avoid. Researchers can avoid misinterpretation by not viewing Cohen's d or NHST as easy answers in data analysis: both can be informative, but neither is the final answer in an analysis.
What are possible practical advantages of the confidence intervals over the NHST?
Confidence intervals are a dependable way to tell whether a score is significantly different from the sample mean. Their advantage over NHST in interpreting distributions is that they more clearly communicate the uncertainty involved in making inferences about population statistics: they provide a range of scores rather than a single value, which helps illustrate the uncertainty in statistical estimation. They also use raw scores, which can be easier to interpret than critical values.
How is the confidence interval different from the NHST? How is it similar?
Confidence intervals are the raw-score form of critical values. They are an easy way to tell whether a score is statistically significantly different from the sampling distribution mean: any score that falls within the interval is not significantly different. A confidence interval provides a range of scores that are not statistically different from the mean as the basis for judging the significance of a single score, whereas the NHST uses one test statistic and one p-value as the basis for determining statistical significance and rejecting or retaining the null hypothesis.
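A small sketch contrasting the two (Python with SciPy, assumed; the scores and test value are hypothetical): the same one-sample data yield a t-based NHST decision and a 95% CI in raw-score units, and the CI excludes the test value exactly when the NHST rejects at alpha = .05.

```python
import numpy as np
from scipy import stats

scores = np.array([102, 98, 110, 105, 95, 107, 101, 99, 112, 104])
test_value = 100  # hypothesized population mean

# NHST: one-sample t-test against the test value
t, p = stats.ttest_1samp(scores, popmean=test_value)

# 95% confidence interval around the sample mean, in raw-score units
se = stats.sem(scores)
lo, hi = stats.t.interval(0.95, len(scores) - 1, loc=scores.mean(), scale=se)

print(f"t = {t:.2f}, p = {p:.3f}")
print(f"95% CI: [{lo:.1f}, {hi:.1f}]  (reject H0 iff {test_value} lies outside)")
```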
Compare correlation to regression
Correlation describes the relationship between two variables in a general way and is the basis for all statistical analyses. Regression is a more complex analysis of variables' relationships, but it is built on and interpreted through correlation coefficients. Specifically, simple regression involves one independent variable (predictor) and one dependent variable, and its main purpose is to predict the values of the dependent variable from the values of the predictor.
Why is the concept of the degrees of freedom important in the t-test but not in the z-test?
Degrees of freedom (n - 1) represent the effective sample size behind the sample standard deviation and sample mean. The t-distribution relies on the sample standard deviation to interpret the data, and because the quality of the sample SD depends on the number of scores in the sample, degrees of freedom are needed to make the best inferences from the t-distribution. Degrees of freedom also account for changes in sample size when interpreting the distribution. With normal distributions and z-tests, degrees of freedom are not as important because the population mean and standard deviation are assumed known: they represent the true values for the entire population of interest and are not influenced by sample size.
What can be done to minimize the inaccuracies of correlation estimates from a sample?
Ensure the reliability of the measures being used; this will minimize the amount of error in the sample of scores and minimize fluctuations in the data.
Kurtosis
Has influence on higher-level statistical analyses, including structural equation modeling. A simple definition is the "sharpness" of the distribution.
Skewness
How much a distribution deviates from symmetry: one tail is longer than the other, and the direction of the longer tail indicates the sign of the skewness. Skew is caused by outliers in the dataset, which cast doubt on the usefulness of the mean.
Z-score & skewness
- If z is greater than 2.58, the distribution is skewed (sample smaller than 100).
- If z is greater than 3.13, the distribution is skewed (sample larger than 100).
Here z is the skewness statistic divided by its standard error.
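A hedged sketch of this check (Python with SciPy, assumed); the standard error of skewness is approximated here as sqrt(6/n), a common textbook approximation, and the data are simulated.

```python
import numpy as np
from scipy.stats import skew

def skewness_z(scores: np.ndarray) -> float:
    """Skewness statistic divided by its (approximate) standard error."""
    n = len(scores)
    se_skew = np.sqrt(6.0 / n)  # common large-sample approximation
    return skew(scores) / se_skew

rng = np.random.default_rng(1)
sample = rng.exponential(size=80)       # deliberately skewed, n < 100
print(abs(skewness_z(sample)) > 2.58)   # True -> flag as skewed by the 2.58 rule
```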
Explain the sources of the two variances (effect and error variances) in the F score
To calculate the F score, we must calculate the variance of the sample mean differences and the variance of errors/chance factors. Variance is used here instead of standard deviation because variance is easier to calculate by hand. To obtain the variance of sample mean differences, the numerator of the t-score equation (sample mean minus population mean) is squared; this gives the variance between the groups being studied, also called the effect. To obtain the variance of errors, the denominator of the t-score equation (the standard error) is squared; this gives the variance, or error, within the groups being studied. Both variances are calculated from deviation scores, so both are subject to issues regarding degrees of freedom. This differs from the t-score, where only the denominator depends on degrees of freedom.
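A minimal worked sketch of the two variance sources (Python with SciPy, assumed; the three groups of scores are hypothetical): F is the between-group (effect) variance divided by the within-group (error) variance, and the hand computation matches SciPy's one-way ANOVA.

```python
import numpy as np
from scipy import stats

groups = [np.array([4, 5, 6, 5]), np.array([6, 7, 8, 7]), np.array([9, 8, 10, 9])]

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Effect (between-group) variance: group means' deviations from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (k - 1)

# Error (within-group) variance: scores' deviations from their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_within = ss_within / (n_total - k)

print(ms_between / ms_within)             # hand-computed F
print(stats.f_oneway(*groups).statistic)  # same value from SciPy
```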
Why is variance sometimes preferred to standard deviation in analyses?
It is sometimes used to detect small changes in a distribution, because it is more sensitive to small changes than SD. It is also quicker to use when doing hand calculations
What could be the advantages of using the NHST, given the limitations mentioned?
It provides a starting point for researchers to test initial hypotheses and findings in their sample, to see whether they should continue with deeper analyses of a specific effect or relationship. It also helps rule out the idea that the results are due purely to chance.
How are the mean and regression like each other?
The mean and regression have several characteristics in common, the first being that both are sensitive to extreme scores, so both invite attention to outliers. One of the four main characteristics of the mean is that it is too sensitive to extreme scores: an extreme score in a sample can pull the mean away from being a true representation of that sample. Similarly, in regression, the least-squares regression line, or "mean line," summarizes the data points on a scatterplot and helps examine the outliers in a sample. Each score's distance from this line is a deviation score, called a "residual" in the regression context, and the sum of all the residuals is 0, just as deviation scores from the mean sum to 0. Another similarity is that, to avoid the seesaw effect when totaling the residuals, they are squared, just as deviation scores are, and the regression line minimizes the sum of squared residuals, exactly as the mean minimizes the sum of squared deviations.
What is the homogeneity of variance assumption?
One of the assumptions of one-way ANOVA: the scores in each of the samples being examined have the same variance. This is because the samples are assumed, under the null hypothesis, to be random samples from the same population; as a result they are expected to have equal variances.
What are possible reasons for the paired samples situation?
Possible situations for a paired-samples t-test: two sets of scores or data from the same source (e.g., pre- and post-test), natural pairs (twins, couples, etc.), or matched pairs created through a matching procedure. Because of the pairing, the scores in a pair are considered related and dependent on each other. During the paired-samples t-test, the pairs are converted into one set of difference scores by subtracting one score from the other.
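A brief sketch of the conversion to difference scores (Python with SciPy, assumed; the pre/post scores are hypothetical): a paired-samples t-test is just a one-sample t-test of the differences against zero.

```python
import numpy as np
from scipy import stats

pre = np.array([10, 12, 9, 14, 11, 13])    # hypothetical pre-test scores
post = np.array([12, 14, 10, 17, 12, 15])  # same participants after treatment

diff = post - pre  # one difference score per pair

# These two calls give identical t and p values
print(stats.ttest_rel(post, pre))
print(stats.ttest_1samp(diff, popmean=0))  # reference score is zero
```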
Why is the overall F-test necessary before any post hoc comparison?
Post hoc comparisons are only performed if the initial F-test is statistically significant. This is why the F-test is necessary before post hoc comparisons. Following this rule helps minimize accidental findings of statistically significant pairwise differences, as it establishes overall statistical significance of the group differences first.
What are the factors influencing the power of a result? What difference might knowing about statistical power make in interpreting the findings of a research study?
Power is 1 - beta: the ability to accurately and sensitively detect the effect you want to detect in the results. It is influenced by sample size, the alpha level, the size of the treatment effect, and the research design (e.g., within-subjects designs increase power). Knowing about power matters when interpreting findings: a non-significant result from a low-powered study may reflect an inability to detect the effect rather than the absence of one.
How to calculate Standard Error (SE)
Standard error (the SD of the sampling distribution of means): SE = SD / √N
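A one-line worked example (plain Python) under hypothetical numbers: a sample SD of 15 with N = 25 gives a standard error of 3.

```python
import math

sd, n = 15.0, 25        # hypothetical sample SD and sample size
se = sd / math.sqrt(n)  # SE = SD / sqrt(N)
print(se)               # 3.0
```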
What are the similarities and differences between z-test and one sample t-test?
Similarities: both distributions are based on unlimited random sampling of groups with a specific sample size from the population, and the means of both distributions are expected to be the same as the population mean. Differences: the z-distribution is perfectly normal, whereas the t-distribution is not. The shape of the t-distribution depends on the sample size, because the quality of the sample SD as a substitute for the population SD depends on the sample size; as the sample gets bigger, the sample SD should improve, which makes the t-distribution more normal. Also, in the t-test we don't have the population standard deviation, so we replace it with the sample standard deviation. With small N, the t critical values are higher and the null hypothesis is harder to reject.
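A compact comparison sketch (Python with SciPy, assumed; the sample, population mean, and population SD are hypothetical): the same data analyzed as a z-test (population SD known) and as a one-sample t-test (population SD replaced by the sample SD). Note the larger critical value on the t side.

```python
import numpy as np
from scipy import stats

scores = np.array([104, 99, 108, 102, 97, 111, 100, 106])
mu0 = 100    # hypothesized population mean
sigma = 15   # z-test assumes this population SD is known

# z-test: uses the known population SD
z = (scores.mean() - mu0) / (sigma / np.sqrt(len(scores)))

# one-sample t-test: substitutes the sample SD, costing degrees of freedom
t, p = stats.ttest_1samp(scores, popmean=mu0)

print(f"z = {z:.2f}, critical |z| = {stats.norm.ppf(0.975):.2f}")
print(f"t = {t:.2f}, critical |t| = {stats.t.ppf(0.975, df=len(scores)-1):.2f}")
```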
Standard deviation & variance
- Standard deviation: the average amount by which the scores deviate from the mean, calculated using the sum of squared deviation scores (SS, sum of squares) and the variance.
- Variance: needed in order to find the standard deviation. It is the SD squared, a similar measure of how scores differ from the mean.
Describe the relationship between t and F scores
t-tests are useful for comparing two independent sample means, whereas one-way ANOVA and F scores compare two or more sample means. With more than two means, t-tests do not work: running multiple t-tests for all possible pairwise differences inflates the alpha level (the error rates accumulate), so the effective alpha for comparing the three means will be higher than the set alpha level. One-way ANOVA and the F score avoid the inflated alpha level by comparing multiple sample means at once.
What are the differences between the null hypothesis distribution and the alternative hypothesis distribution?
The alternative hypothesis distribution (also called the research hypothesis distribution) assumes the research hypothesis is true, so each replication of the trial should produce the estimated (hypothesized) effect. The mean of the alternative hypothesis distribution provides the best estimate of the true effect size; however, this distribution is only possible when the investigation has been done previously and the results have been replicated empirically. The null hypothesis distribution is used when we do not have any previous sampling data demonstrating the hypothesized effect: instead of testing our hypothesis (the existence of some effect or relationship), we test its opposite, that no effect or relationship exists. The two distributions are mutually exclusive.
What are the four conditions needed to make sound causal inferences? Why is having only correlation not enough to make causal inference?
The four conditions that must be met in order to make sound causal inferences:
- Temporal precedence: the cause precedes the effect. Because researchers often collect data on the hypothesized cause and effect at the same time, this precedence is not always clear; however, the direction of effect must be established in order to have a clear causal relationship.
- Correlation/co-occurrence: the hypothesized cause and effect tend to happen together and have a strong relationship with one another.
- Non-spuriousness: there are no other possible causes that explain the relationship seen in the research. Alternative explanations weaken the causal link between the hypothesized cause and effect, so for a strong causal inference they must be ruled out.
- Causal mechanism: the process by which the cause leads to the effect, i.e., the explanation for how the relationship happens. Without this explanation, the causal inference is incomplete.
Correlation alone satisfies only the second condition; by itself it cannot establish the direction of the effect or rule out alternative explanations.
Can you describe possible reasons why correlation coefficients from a sample do not provide an accurate estimate of the true relationships between the variables being studied?
The four factors affecting the size/accuracy of correlation coefficients:
- Range restriction: the full score range is not included in the sample (e.g., SAT scores examined against GPA at a very good college; these students will not have low SAT scores, so low SAT scores are excluded from the sample).
- Imperfect reliability of measures.
- Curvilinear trend: the data are not represented by a straight line.
- Part-whole relationship: attempting to correlate measures/variables that are too similar, or a measure that is part of another measure (e.g., examining SAT verbal scores against overall SAT scores).
How can you explain the concept of the p-value to a person with no statistical background?
The p-value is the probability of getting a result at least as extreme as yours by random chance alone. For example, a p-value of .001 means there is a 0.1% chance of seeing such a result if only chance were at work. P-values are used to measure the significance of results and are used in NHST as the basis for retaining or rejecting the null hypothesis.
Why does the reference score in the paired samples t-test equal zero? Can it be a different value other than zero?
The reference score is typically zero because the null hypothesis assumes zero difference between the paired scores, i.e., that no change will be seen. It can be a value other than zero if the researcher wants to test against a specific hypothesized difference rather than no difference.
What is the definition of the unstandardized regression coefficient?
The unstandardized regression coefficient is b, expressed in the measurement unit of Y. In the regression equation, b is the amount Y changes for every 1-point change in X. Example: in Y = 2.3 + 15.5X, every 1-point change in X produces a 15.5-point change in Y. When the measurement units of X and Y are meaningful, the unstandardized regression coefficient provides an easily understandable interpretation.
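A small sketch (Python with NumPy, assumed) fitting a simple regression and reading off the intercept and the unstandardized slope b; the data points are hypothetical, generated around the example equation above.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])                      # predictor
y = 2.3 + 15.5 * x + np.array([0.2, -0.1, 0.0, 0.1, -0.2])   # criterion with small noise

# Least-squares fit: Y-hat = intercept + b * X
b, intercept = np.polyfit(x, y, deg=1)
print(f"Y = {intercept:.1f} + {b:.1f}X")  # b: change in Y per 1-point change in X
```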
Why do we need z-scores?
They are standardized versions of distribution scores that always carry the same meaning, making them easier to interpret within a dataset. The conversion does not change the shape of the distribution. They are most useful in a normal distribution, but can be used in other contexts as well.
What is the see-saw effect?
It is based on the idea that the sum of deviation scores from the mean is always 0 (the third characteristic of the mean). The mean is the exact balance point of all the deviation scores: each deviation sits on either the negative or the positive side of the mean, but they always add to 0.
How is the SD of a random sample related to the population SD? What happens when the sample size gets bigger?
Unless a sample has a huge number of scores, its SD will likely be smaller than the population SD. The larger the sample size, the smaller the difference between the sample SD and the population SD. When the sample size is small, the quality of the sample SD as a substitute for the population SD is lower, so the shape of the sampling distribution will look less like a normal distribution.
Explain why the SD of the sampling distribution is smaller when the sample size gets larger
When the sample size gets larger, each sample more closely resembles the population, so the sample means fall closer to the population mean. When the sample means cluster closer to the population mean, the standard deviation of the sampling distribution (the standard error) becomes smaller. A small standard error indicates that any one sample mean is a more accurate representation of the population, as most sample means approach the population mean.
What was the biggest difference between the unstandardized and standardized regression coefficients?
When both X and Y represent meaningful variables, unstandardized regression coefficients (b), expressed in the variables' own units of measurement, provide an intuitive, easily understandable interpretation. This is possible because we have a shared meaning for the measurement units of X and Y. However, most scores in an analysis are not based on common measurement units; analyses incorporate scores with a wide variety of numerical characteristics. To make intuitive interpretations possible with arbitrary measurement units, the scores are standardized in the regression analysis and the standardized regression coefficient (beta) is used. The biggest difference between the two regression coefficients is that the standardized coefficient has a maximum of 1 (like the correlation coefficient), because the variances of the independent and dependent variables are set to 1. A standardized coefficient therefore states how many standard deviations the dependent variable changes per one-standard-deviation increase in the predictor. Further, when an analysis has only one predictor (univariate regression), the standardized regression coefficient equals the correlation coefficient between the predictor and criterion variables, which makes it easy to interpret.
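A short sketch (Python with NumPy, assumed; the scores are hypothetical) of the standard conversion beta = b * (SD of X / SD of Y), checked against the correlation coefficient in the one-predictor case.

```python
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([65.0, 70.0, 74.0, 81.0, 88.0])  # hypothetical criterion scores

b, intercept = np.polyfit(x, y, deg=1)     # unstandardized slope
beta = b * x.std(ddof=1) / y.std(ddof=1)   # standardized coefficient
r = np.corrcoef(x, y)[0, 1]                # correlation coefficient

print(f"b = {b:.2f}, beta = {beta:.3f}, r = {r:.3f}")  # beta equals r here
```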
How do you decide on the "test value" in the one sample t-test?
You choose a test value that serves as a meaningful comparison point for your sample mean. For example, if you are looking at a sample of SAT scores, you would choose a meaningful SAT score as your test value: the known population mean (500), or a specific student's score (600). The selection should fit the questions you want to answer with your data.
How to calculate Z score
Z = (x − mean) / SE. Dividing by the standard error (rather than the SD) locates a sample mean within the sampling distribution of means.
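A tiny worked example (plain Python) under hypothetical numbers, reusing SE = SD / √N from above:

```python
import math

x, mean = 106.0, 100.0   # hypothetical sample mean and population mean
sd, n = 15.0, 25         # population SD and sample size
se = sd / math.sqrt(n)   # standard error = 3.0
z = (x - mean) / se      # z = 2.0 -> just above the 1.96 critical value
print(z)
```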
Central tendency indicators
mean, median, mode (range is a measure of variability, not central tendency)
Variability
Ways to describe the dispersion of the data: range, deviation, standard deviation, and variance (the standard deviation squared)