Research Methods Final
4 factors that affect power
1. Sample size: larger samples mean more power 2. Effect size: smaller effects mean less power 3. Level of significance: raising alpha (e.g., from .01 to .05) increases power 4. Balance: power is greatest when the groups' sample sizes are equal to one another
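The four factors can be seen in a rough calculation. This is a sketch under stated assumptions, not an exact power analysis: it uses a normal approximation for a two-tailed comparison of two group means, and the helper `approx_power`, the effect sizes, and the group sizes are all hypothetical.

```python
from statistics import NormalDist

def approx_power(d, n1, n2, alpha=0.05):
    """Hypothetical helper: normal-approximation power for a two-tailed,
    two-group comparison of means with standardized effect size d.
    Ignores the tiny chance of rejecting in the wrong direction."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # For a fixed total N, n1*n2/(n1+n2) is largest when n1 == n2 (balance).
    shift = d * (n1 * n2 / (n1 + n2)) ** 0.5
    return std_normal.cdf(shift - z_crit)

baseline = approx_power(0.5, 50, 50)
bigger_n = approx_power(0.5, 100, 100)                   # 1. larger sample
smaller_d = approx_power(0.2, 50, 50)                    # 2. smaller effect
stricter_alpha = approx_power(0.5, 50, 50, alpha=0.01)   # 3. smaller alpha
unbalanced = approx_power(0.5, 20, 80)                   # 4. same total N, unequal groups

for label, p in [("baseline", baseline), ("larger n", bigger_n),
                 ("smaller d", smaller_d), ("alpha = .01", stricter_alpha),
                 ("unbalanced", unbalanced)]:
    print(f"{label:>12}: power = {p:.2f}")
```

Each line changes one factor from the baseline; only the larger sample raises power, and the other three changes lower it.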
What are the disadvantages of using two-tailed tests of significance?
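A disadvantage of a two-tailed test is that it also tests the opposite (e.g., negative) tail, which is sometimes unnecessary for a given analysis. Because alpha is split between the two tails (.025 in each tail when alpha = .05), the critical value is more extreme, so a two-tailed test has less power than a one-tailed test to detect an effect in the predicted direction. Equivalently, a two-tailed p value is twice the corresponding one-tailed p value.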
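The cost of splitting alpha shows up directly in the critical values. A minimal sketch using Python's standard `statistics.NormalDist`; the observed z of 1.80 is a hypothetical result in the predicted direction.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # standard normal: mean 0, SD 1

one_tailed_crit = std_normal.inv_cdf(1 - alpha)      # all of alpha in one tail
two_tailed_crit = std_normal.inv_cdf(1 - alpha / 2)  # alpha split: .025 per tail

observed_z = 1.80  # hypothetical result in the predicted direction
one_tailed_p = 1 - std_normal.cdf(observed_z)
two_tailed_p = 2 * one_tailed_p

print(f"critical z: one-tailed {one_tailed_crit:.3f}, two-tailed {two_tailed_crit:.3f}")
print(f"p for z = 1.80: one-tailed {one_tailed_p:.3f}, two-tailed {two_tailed_p:.3f}")
```

Here z = 1.80 clears the one-tailed cutoff (about 1.645; p of about .036) but not the two-tailed one (about 1.96; p of about .072).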
What does it mean to say that a statistic is biased?
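A statistic is biased when it systematically misestimates the population parameter: averaged over many samples, it comes out consistently too high or too low (e.g., a sample variance computed by dividing by N rather than N - 1 tends to underestimate the population variance).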
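To make "systematically misestimates" concrete, the sketch below lists every possible sample of size 2 (drawn with replacement) from a hypothetical population {1, 2, 3} and averages two variance formulas over all of them. The population and sample size are arbitrary choices for illustration; Python's `statistics.pvariance` divides by N and `statistics.variance` divides by N - 1.

```python
from fractions import Fraction
from itertools import product
from statistics import mean, pvariance, variance

# Hypothetical tiny population {1, 2, 3}; Fractions keep the arithmetic exact.
population = [Fraction(1), Fraction(2), Fraction(3)]
true_variance = pvariance(population)  # 2/3

# Every possible sample of size 2, drawn with replacement (9 samples).
samples = list(product(population, repeat=2))

# Average each estimator over all possible samples.
avg_divide_by_n = mean(pvariance(s) for s in samples)         # divides by N
avg_divide_by_n_minus_1 = mean(variance(s) for s in samples)  # divides by N - 1

print("true variance:             ", true_variance)
print("average, dividing by N:    ", avg_divide_by_n)
print("average, dividing by N - 1:", avg_divide_by_n_minus_1)
```

Dividing by N averages 1/3, half the true 2/3, so it is biased; dividing by N - 1 averages exactly 2/3.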
Interpret a z score of -1.80.
A z score of -1.80 means that the score is 1.80 standard deviations below the mean.
What's the difference between a z test and a z score? Make sure to talk about: a) what's being compared and b) the units of each z.
A z test compares a sample mean with the population mean; its z tells how far the sample mean is above or below the population mean in standard error units. A z score compares a single score with the mean of its distribution; its z tells how far the score is above or below that mean in standard deviation units.
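The difference in units is easiest to see side by side. A minimal sketch with hypothetical numbers (population mean 100, SD 15, sample of N = 25); the helper names `z_score` and `z_test` are illustrative only.

```python
import math

def z_score(x, mean, sd):
    """Single score vs. the mean of its distribution, in SD units."""
    return (x - mean) / sd

def z_test(sample_mean, pop_mean, pop_sd, n):
    """Sample mean vs. the population mean, in standard error units."""
    standard_error = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

print(z_score(106, 100, 15))      # one person scoring 106
print(z_test(106, 100, 15, 25))   # a sample of 25 people averaging 106
```

The same 6-point difference is only 0.4 SDs for an individual score but 2.0 standard errors for the mean of 25 scores, because the standard error (15 / 5 = 3) is much smaller than the SD.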
In study A, a researcher examines whether marital status (never married, married, other) is related to years of education. In study B, a researcher examines whether marital satisfaction (higher scores reflect more satisfaction, ranging from 0-7) is related to years of education. What type of questions are these?
A: difference question (compares groups defined by marital status). B: associational question (relates marital satisfaction scores to years of education).
Of all the 95% confidence intervals for the population mean that you construct, about what percent will contain the true population mean?
About 95% will contain the true population mean
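A short simulation shows what "about 95%" means in practice. The sketch assumes a hypothetical normal population (mean 50, SD 10), samples of 30, and a known population SD so the interval is simply the sample mean plus or minus 1.96 standard errors.

```python
import random
from statistics import NormalDist, fmean

random.seed(15)  # fixed seed for reproducibility
mu, sigma, n = 50, 10, 30  # hypothetical population and sample size
z_crit = NormalDist().inv_cdf(0.975)  # 95% two-sided, about 1.96
margin = z_crit * sigma / n ** 0.5    # known-sigma interval, for simplicity

n_intervals = 2000
hits = 0
for _ in range(n_intervals):
    m = fmean(random.gauss(mu, sigma) for _ in range(n))
    if m - margin <= mu <= m + margin:  # did this interval capture the true mean?
        hits += 1

coverage = hits / n_intervals
print(f"{coverage:.1%} of the intervals contained the true mean")
```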
What's the difference between alpha and p?
Alpha is a decision rule set a priori: how rare something has to be to reject the null (usually .05). p is the observed probability: how rare the result actually is (the probability of a result this extreme or more extreme by chance alone, if the null is true).
What will the sampling distribution of the mean look like? Assume your N is 100.
Approximately normally distributed
What question does a one-way ANOVA answer?
Are there group differences in some variable?
What question does a point biserial correlation answer?
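Is a dichotomous variable (e.g., membership in one of two groups) associated with a continuous variable? This is equivalent to asking whether the two groups differ on that variable.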
What question is a paired samples t test answering?
Is there a difference between two related sets of scores (e.g., the same participants measured at two time points)?
What question are you answering with an independent samples t test?
Are there differences between two independent groups in some variable?
Say you have a variable that represents the average score across several questions answered on a Likert scale. You determine that this variable is approximately normally distributed. How do you treat this variable in analyses?
As interval/ratio
What happens to the SEM as the sample size becomes larger? What does this imply about the dangers of using samples that are very small?
As sample size becomes larger, the SEM decreases, which means that sample means cluster more tightly around the population mean; a given sample mean is likely to be closer to it, and more precise generalizations can be made. This implies that small samples produce large SEMs: a sample mean may be far from the population parameter, so generalizations from the sample to the population are less accurate.
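The SEM is the SD divided by the square root of N, so it shrinks as N grows. A minimal sketch, assuming a hypothetical population SD of 15:

```python
import math

def standard_error_of_mean(sd, n):
    """SEM = SD / sqrt(N)."""
    return sd / math.sqrt(n)

population_sd = 15  # hypothetical
for n in (4, 25, 100, 400):
    print(f"N = {n:>3}: SEM = {standard_error_of_mean(population_sd, n):.2f}")
```

Because of the square root, quadrupling N only halves the SEM (7.50 at N = 4 versus 3.00 at N = 25), which is why very small samples are so imprecise.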
What are the advantages of having a set of scores that has a mean of zero and a SD of 1 (z scores)?
Because the mean is 0, the sign immediately says whether a score is above or below the mean. Because the SD is 1, the numerical size of the z score indicates how many SDs above or below the mean the score is.
Researchers are interested in whether increasing mindfulness reduces stress. They recruit 100 adults who are randomly assigned to a health education program or a mindfulness intervention.
Between subjects
In a research study using a sample of 30 participants, the correlation is .09 and is not statistically significant. Should the researcher conclude that there is little or no relationship between X and Y? Why or why not?
Not necessarily. Pearson correlations only test linear relationships, so there could be a nonlinear relationship. Also, with any test, you can't prove the null hypothesis: with only 30 participants, power is low, so a real relationship could be missed (a Type II error).
What is effect size? Discuss the relationship between effect size and power
Effect size is the magnitude of an effect; for a difference between two groups, it is the extent to which the two distributions do not overlap, or how distinct they are from one another. The larger the effect size, the more power: with less overlap between the two distributions, a real difference is easier to detect.
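One common two-group effect size, Cohen's d, expresses the mean difference in pooled-SD units. The sketch below uses the pooled-variance version of d (one of several variants) and hypothetical scores; the helper name `cohens_d` is illustrative.

```python
from statistics import mean, variance

def cohens_d(group1, group2):
    """Mean difference divided by the pooled SD (one common version of d)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / pooled_var ** 0.5

treatment = [5, 6, 7, 8, 9]  # hypothetical scores
control = [3, 4, 5, 6, 7]
print(f"d = {cohens_d(treatment, control):.2f}")
```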
How does hypothesis testing evaluate the role of chance? Make sure to reference standard errors in your answer.
Every test statistic compares the calculated estimate to the size of that estimate that you'd expect by chance alone. Standard error (the variability of that estimate in the sampling distribution) becomes the size of the difference that you'd expect by chance, because it is differences in the sampling distribution that are due to sampling error alone.
Describe how an ANOVA (F statistic) estimates differences in the DV due to the IV, and the differences in the DV that are due to chance.
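F = variability between groups (differences due to the levels of the IV, plus random error) / variability within groups (the size of differences you get just by chance, i.e., random error). If the IV has no effect, the numerator and denominator both estimate only random error and F is around 1; the larger F is, the more the group differences exceed what chance alone would produce.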
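The ratio can be computed by hand. A minimal sketch with three small hypothetical groups; `one_way_anova_f` is an illustrative helper, not a library function.

```python
from statistics import fmean

def one_way_anova_f(*groups):
    """F = MS_between / MS_within for a one-way ANOVA (computed by hand)."""
    all_scores = [x for g in groups for x in g]
    grand_mean = fmean(all_scores)
    k, n_total = len(groups), len(all_scores)

    # Between groups: group means vs. the grand mean (IV effect + random error)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: scores vs. their own group mean (random error only)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

f = one_way_anova_f([2, 3, 4], [4, 5, 6], [6, 7, 8])  # hypothetical data
print(f"F = {f:.2f}")
```

Here the between-groups mean square (12) is twelve times the within-groups mean square (1), so F = 12.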
For the study below, what's the independent variable? Dependent variable? Is it observational or experimental? Although prior social science research has established the ability of gender ideologies to influence the domestic division of labor, it has neglected to disentangle their potentially unique influence on paternal involvement with children. Past research examining the influence of gender ideology on parenting behaviors does not acknowledge potential differences that may result from accounting for each parent's gender ideology. Using both waves of the National Survey of Families and Households (N = 1,088), I assess the effect of both mother's and father's gender ideology on two measures of paternal involvement. Whereas egalitarian fathers demonstrate greater involvement than traditional fathers, mother's gender ideology failed to predict paternal involvement. Egalitarian mothers do not appear to negotiate greater father involvement successfully.
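Gender ideology (mother's and father's) / Paternal involvement / Observational, because nothing is manipulated or randomly assigned: the study analyzes existing survey data (National Survey of Families and Households) and simply measures both variables; there is no control group.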
What question does a multiple regression answer?
How does this set of IVs predict this DV? Which IV is most strongly related to the DV? What are the unique associations?
Again, let's say I ask people to report on how strongly they agree (1 = strongly disagree, 5 = strongly agree) with a variety of statements, and then I take the average. How would I treat this variable functionally (i.e., what's its functional grouping)?
I would treat this variable as functionally interval/ratio, assuming the averaged scores are approximately normally distributed. If they are not approximately normal, I would treat it as ordinal.
Consider the following set of data, which represents 15 scores on a 10-point quiz: 0 1 3 3 4 4 4 5 5 5 7 8 9 9 10. If the score of 10 is changed to 225,000,000 but the other numbers remain the same, what's the general effect on the mean and the median?
If the 10 is changed to 225,000,000, the median stays the same and the mean becomes much higher because of the extreme score. The median stays 5, and the mean goes from 5.13 to about 15,000,004.47.
Suppose that the difference between the two sample means is statistically significant. What decision should the researcher make about the null hypothesis? Assuming that all procedures and calculations in this experiment were done correctly, why might this decision still be incorrect? If this decision is incorrect, which type of error would the researcher be making, Type I or Type II?
If the difference between the two sample means is statistically significant, the researcher should reject the null hypothesis. This decision may still be incorrect because of chance/sampling error: even when the null hypothesis is true, a difference large enough to be significant will occur by chance alone about alpha (e.g., 5%) of the time, and a higher alpha increases this risk. If the decision is incorrect, this would be a Type I error.
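A quick simulation illustrates why a correct procedure can still produce a Type I error. It assumes a hypothetical population in which the null hypothesis is exactly true and runs many two-tailed z tests (population SD treated as known); roughly alpha of them come out significant purely by sampling error.

```python
import random
from statistics import NormalDist, fmean

random.seed(53)  # fixed seed so the run is reproducible

# Hypothetical setup: H0 (population mean = 100) is TRUE in every experiment.
mu, sigma, n, alpha = 100, 15, 25, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff, about 1.96
standard_error = sigma / n ** 0.5

n_experiments = 4000
false_rejections = 0
for _ in range(n_experiments):
    sample_mean = fmean(random.gauss(mu, sigma) for _ in range(n))
    z = (sample_mean - mu) / standard_error
    if abs(z) > z_crit:  # "significant", even though H0 is true
        false_rejections += 1

type_i_rate = false_rejections / n_experiments
print(f"Rejected a true H0 in {type_i_rate:.1%} of experiments (alpha = {alpha})")
```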
Suppose that the difference between the two sample means is not statistically significant. What decision should the researcher make about the null hypothesis? Assuming that all procedures and calculations in this experiment were done correctly, why might this decision still be incorrect? If this decision is incorrect, which type of error would the researcher be making, Type I or Type II?
If the difference is not statistically significant, the researcher should fail to reject the null hypothesis. This may be incorrect if the sample size was too small to detect a real difference. If the decision is incorrect, the researcher would be making a Type II error. A Type II error is typically due to low power, which is often because of a small sample size, but remember that power is influenced by several factors (sample size, effect size, alpha, and balance).
Take the same distribution of numbers. If the score of 10 is changed to 0, what is the general effect on the mean and the median? Why does the median change in this example, but not in the preceding example?
If the 10 is changed to a 0, the median becomes lower: it drops from 5 to 4. The mean also drops, from 5.13 to 4.47. The median changes here because the changed score moved from above the median to below it, which shifts which score falls in the middle position. In the preceding example, the 10 became larger but stayed above the median, so the middle score (5) did not change, even though the mean changed enormously.
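Both of the quiz-score changes can be checked directly with Python's standard `statistics` module, using the 15 scores from the question.

```python
from statistics import mean, median

quiz = [0, 1, 3, 3, 4, 4, 4, 5, 5, 5, 7, 8, 9, 9, 10]
huge = quiz[:-1] + [225_000_000]  # replace the 10 with an extreme score
zero = [0] + quiz[:-1]            # replace the 10 with a 0 (kept in sorted order)

for label, scores in [("original", quiz), ("10 -> 225,000,000", huge), ("10 -> 0", zero)]:
    print(f"{label:>18}: mean = {mean(scores):,.2f}, median = {median(scores)}")
```

The median ignores how far the 10 moves upward, but it responds once that score crosses to the other side of the middle.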
You are using the .05 criterion of significance. If the two-tailed p value is .03, what decision should you make about H0? Why?
If the p is low, the H0 must go. Because the p value (.03) is less than alpha (.05), you should reject the null hypothesis: a result this extreme would be rarer than your cutoff if H0 were true.
If there are only 2 groups, will a one-way ANOVA yield the same results as a t test for two independent sample means? Explain.
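Yes. With only two groups, a one-way ANOVA gives the same result as an independent samples t test (equal variances assumed): the p values match, and F equals t squared. ANOVA's advantage in controlling the Type I error rate only appears with three or more groups, where it replaces multiple t tests.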
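With two groups, the equivalence F = t squared can be checked numerically. The sketch computes both statistics by hand for a small hypothetical data set; the helper names `independent_t` and `one_way_f` are illustrative, not library functions.

```python
from statistics import fmean, variance

def independent_t(g1, g2):
    """Pooled-variance (equal variances assumed) independent samples t."""
    n1, n2 = len(g1), len(g2)
    sp2 = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    return (fmean(g1) - fmean(g2)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

def one_way_f(*groups):
    """One-way ANOVA F = MS_between / MS_within."""
    scores = [x for g in groups for x in g]
    grand = fmean(scores)
    k, n = len(groups), len(scores)
    ss_b = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_w = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ss_b / (k - 1)) / (ss_w / (n - k))

a, b = [4, 5, 6], [2, 3, 4]  # hypothetical two-group data
t = independent_t(a, b)
f = one_way_f(a, b)
print(f"t = {t:.3f}, t squared = {t * t:.3f}, F = {f:.3f}")
```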
Which of the following measures of variability is the least sensitive to the effect of extreme scores?
Interquartile range
The sum of squared deviations from the mean:
Is always a minimum value: the sum of squared deviations is smaller around the mean than around any other value
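The "minimum" property can be checked by computing the sum of squared deviations around several candidate centers. A minimal sketch with three hypothetical scores whose mean is 5:

```python
from statistics import fmean

def sum_sq_dev(scores, center):
    """Sum of squared deviations of the scores from a chosen center."""
    return sum((x - center) ** 2 for x in scores)

scores = [2, 4, 9]  # hypothetical scores; mean = 5
m = fmean(scores)
for center in (3, 4, m, 6, 7):
    print(f"around {center}: SS = {sum_sq_dev(scores, center)}")
```

The smallest value (26) occurs around the mean; moving the center in either direction increases it.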
What question are you answering with a one sample t test?
Does my sample mean differ from a known (or hypothesized) population mean?
What assumption of an independent samples t test does the Levene's test help us evaluate?
Homogeneity of variance: whether the two groups' population variances can be assumed to be equal.
If the results of a one-way ANOVA are statistically significant, why is it desirable to compute a measure of effect size?
A significant F only says that the group differences are unlikely to be due to chance alone. F is also affected by things like sample size, so it does not say whether the effect is small, moderate, or large. An effect size estimates practical importance; for example, some ANOVA effect sizes give the proportion of variance in the dependent variable that is explained by the independent variable, which F does not tell us. This information is important for evaluating more than statistical significance. (Just a note that not ALL measures of effect size give you the proportion of variance in the DV that's accounted for by the IV.)
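One common ANOVA effect size, eta squared (between-groups sum of squares divided by total sum of squares), is a proportion-of-variance measure. A minimal sketch with hypothetical data:

```python
from statistics import fmean

groups = [[2, 3, 4], [4, 5, 6], [6, 7, 8]]  # hypothetical data
scores = [x for g in groups for x in g]
grand_mean = fmean(scores)

ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
ss_total = sum((x - grand_mean) ** 2 for x in scores)

eta_squared = ss_between / ss_total  # proportion of DV variance associated with the IV
print(f"eta squared = {eta_squared:.2f}")
```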
A researcher wishes to test the hypothesis that the means of five populations are different. Why would it be a bad idea to test the researcher's hypothesis by computing 10 separate t tests?
It would be a bad idea because each t test carries its own chance (alpha) of a Type I error, and running 10 of them inflates the chance of at least one false positive across the whole set (the familywise error rate) well above .05. A one-way ANOVA tests whether the five population means differ in a single test, keeping the Type I error rate at alpha.
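How much the error rate grows can be estimated with a short calculation. The sketch treats the 10 pairwise tests as if they were independent, which they are not (they share data), so the result is only an approximation.

```python
alpha = 0.05
k = 5                        # number of group means
n_tests = k * (k - 1) // 2   # every pair of means: 10 t tests

# Chance of at least one Type I error across all tests, IF the tests were
# independent (pairwise t tests share data, so this is only an approximation).
familywise_error = 1 - (1 - alpha) ** n_tests
print(f"{n_tests} tests: P(at least one Type I error) ~ {familywise_error:.2f}")
```

Under this assumption the chance of at least one false positive is about .40, roughly eight times the nominal .05.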
What is a level of significance? How is that different from a p value?
LOS is a decision rule - how rare something needs to be to reject the null hypothesis; p value is an observed value - how rare something (or something even more extreme) is (due to chance alone)
Read the following descriptions, and evaluate the type of design. Researchers recruit children who were born at an extremely low birth weight and paired them based on age, gender, and social class with other children who were not low weight at birth.
Matched
Central Limit Theorem
No matter the shape of the parent population, if you take large enough samples, the sampling distribution of the mean will approximate a normal distribution (and the approximation improves as N increases).
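The theorem can be watched in action with a simulation. The sketch assumes a hypothetical, strongly right-skewed parent population (exponential with mean 1 and SD 1) and samples of N = 40; `skewness` is an illustrative helper computing the mean of cubed z scores.

```python
import random
from statistics import fmean, pstdev

def skewness(xs):
    """Population skewness: mean of cubed z scores (0 for a symmetric distribution)."""
    m, s = fmean(xs), pstdev(xs)
    return fmean(((x - m) / s) ** 3 for x in xs)

random.seed(79)  # fixed seed for reproducibility
n = 40  # size of each sample
# Hypothetical, strongly right-skewed parent population: exponential, mean 1, SD 1.
parent_draws = [random.expovariate(1.0) for _ in range(5000)]
sample_means = [fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(5000)]

print(f"skew of raw scores:       {skewness(parent_draws):.2f}")
print(f"skew of sample means:     {skewness(sample_means):.2f}")
print(f"mean of sample means:     {fmean(sample_means):.3f}")
print(f"SD of sample means (SEM): {pstdev(sample_means):.3f}")
```

The raw scores are heavily skewed, while the sample means are nearly symmetric, centered on the population mean, with an SD close to 1 / sqrt(40).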
Consider the situation above. Should the researcher conclude that the two means are equal? Why or why not?
No, they should not conclude that the two means are equal. Failing to reject the null only shows that the observed difference was not large enough, relative to its standard error, to rule out chance. A real but small difference (a small effect size, with heavily overlapping distributions) may exist that the study lacked the power to detect. So you cannot conclude that the two means are equal, just that the difference between them is not statistically significant.
Let's say that you retain the null hypothesis. You had a sample of 5. Should you have confidence in that decision? Why or why not?
No, you should not have confidence in that decision. A sample of 5 produces a large standard error and a wide confidence interval, making the analysis imprecise and the test very low in power. You shouldn't have confidence in a decision to retain the null with low power, because it may well be a Type II error.
Let's say I'm interested in whether handedness (left, right, ambidextrous) is related to academic grades. What type of variable is this?
Nominal
What are the disadvantages of using one-tailed tests of significance?
A one-tailed test puts all of alpha in one tail of the distribution, so differences can only be detected in that one direction. If the result goes the other way (e.g., the treatment makes things worse), the test cannot call it significant, no matter how large the difference. When research needs to find out whether something is better or worse, a one-tailed test is not appropriate, which is why two-tailed tests are usually preferred.
Let's say I ask people to report on how strongly they agree (1 = strongly disagree, 5 = strongly agree) with a variety of statements, and then I take the average. In terms of the classic 4-category grouping, what type of variable is this?
Ordinal
Researchers surveyed 1000 married women, and asked them various questions about the quality of their marriage. Based on this information, they classified women as a) low quality, b) moderate quality, c) high quality. What type of variable is this?
Ordinal
You're interested in detecting the difference between two means. If that mean difference is relatively small, will it be easier to detect if the variability is large or small? Why?
Small variability. A large spread (dispersion) makes the two distributions overlap more, which can mask a small mean difference.
Does cultural sensitivity training improve mentoring programs? Researchers recruit participants into a mentoring program; half-way through, they have participants evaluate the program. Then, they administer a sensitivity training and evaluate the program again at the end.
Repeated measures
What's the difference between the standard error of the mean and the standard deviation? Which is larger, and why?
The SD describes the variability of individual scores around their mean (roughly, the typical distance of a score from the mean). The SEM is the standard deviation of the sampling distribution of the mean: roughly, how far sample means typically fall from the population mean. The SD is larger (SEM = SD / sqrt(N)), because averaging pulls extreme scores toward the middle, so extreme sample means are less likely than extreme individual scores.
What problems are caused by having to use samples? Include and explain the term sampling error in your answer.
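Because a sample is only part of the population, its statistics almost never match the population parameters exactly. Sampling error: differences between the value of a statistic observed in a sample and the corresponding population parameter (thus, error) caused by the accident of which cases happened to be included in the sample. Because of sampling error, any observed result could be partly due to chance, so we need inferential statistics to judge whether it generalizes.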
What is statistical power?
Statistical power is the probability of rejecting the null hypothesis when it is false (1 - beta): the ability to detect an effect that really exists.
When should you use a Pearson Correlation?
To detect linear relationships, when your predictor variable and your outcome variable are both functionally I/R
What question are you answering in a pearson correlation?
To what extent are two variables linearly associated with each other?
Should you treat a result that has a two-tailed p value of .06 the same as a result that has a two-tailed p value of .45? Why or why not?
Technically, you treat them the same: you can't reject the null hypothesis for either. Researchers sometimes flag p = .06 as marginally significant, but the null hypothesis is still retained.
Why will the sampling distribution of the mean be approximately normally distributed?
The Central Limit Theorem explains this. This theorem states that no matter the original shape of a distribution, if you take a large enough sample then the sampling distribution of the mean will be approximately normal
What's the primary benefit of the standard deviation over the variance?
The benefit of the standard deviation over variance is that the standard deviation is in original units, and not squared units so it's easier to interpret.
Discuss the relationship between sample size and power.
The larger the sample size in a study, the more power the study has.
Researchers were interested in the whether there was a difference in academic achievement based on parenting style (authoritative, authoritarian, permissive, involved). For this test, p = .004. Interpret p based on the conceptual definition of what p represents.
There is a .4% chance that a difference in achievement based on parenting style as large or even larger than observed would occur by chance alone (if the null was true)
You are drawing inferences about the mean of one population. Why is there a different critical t value for different degrees of freedom?
Because a t test estimates the population SD from the sample, it carries extra uncertainty, and how much depends on sample size. Each df has its own t distribution: with small df the distribution has heavier tails, so the critical t is larger. As df increases, the SD estimate becomes more precise, and the critical t values approach the normal-curve values (e.g., 1.96 for a two-tailed test at .05). The changing critical value adjusts for the fact that estimates are more precise with larger samples.
Consider drawing a card from a deck of cards 10 times (and replacing the card each time). Are the events (i.e., the results of the draws) independent or mutually exclusive? Why?
These events are independent. Pulling one card out of the deck has no influence on which card is drawn next, because you replace the card after each draw.
Are the mean and variance resistant? Why or why not?
No. Both are heavily influenced by outliers/extreme scores. The mean uses the value of every score, and the variance squares each score's deviation from the (non-resistant) mean, which magnifies the influence of extreme scores.
Do researchers in the behavior sciences want to draw conclusions about populations or samples? Why?
They want to draw conclusions about populations. A sample is just a slice of the larger group of individuals the researchers care about; it is studied because measuring everyone is usually impossible, and its statistics are used to make inferences about population parameters. By making conclusions about populations, the findings can apply to many more individuals.
A student obtains a test score of 72 on Test X and 72 on Test Y. The student therefore concludes that there is a high correlation between Test X and Test Y. Why is this conclusion incorrect?
A correlation describes how scores on two variables vary together across many individuals, so a single student's pair of scores says nothing about it. Also, the same raw score can mean different things on tests with different means and SDs; converting each score to a z score would show the student's relative standing on each test, but even a matching pair of z scores from one person cannot establish a correlation.
If the SD of a set of scores is zero, what does this imply about the scores?
This means that all of the values are the same.
What does it mean to say that z scores are standardized?
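This means that z scores are unit free: every set of z scores has a mean of 0 and an SD of 1, expressed in standard deviation units rather than the original units, which facilitates comparison between data sets that had different units.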
A researcher is concerned that his distribution is not normally distributed. He calculates the skewness statistic, and the standard error of the statistic. When he divides the skewness / standard error, the result is 1.30. What can you tell him about the shape of his distribution?
This tells him that his data are approximately normal in terms of skew, because the ratio (1.30) is less than 2 in absolute value. (He might want information about kurtosis before concluding that the distribution is approximately normal, but there is certainly no significant/meaningful skew.)
In a correlational study, X is the number of hours of violent TV programs that participants watch, and Y is the number of violent acts committed by participants in real life. Suppose that there is a moderately high correlation (say, .48) in a sample of 100 American males and this result is statistically significant. Explain why we can't infer causation from a correlational study by showing that each of the following is possible. X could cause Y. Y could cause X. The relationship between X and Y could be caused by a third variable (consider psychological impact of violent behavior).
A correlation of .48 is consistent with all three explanations, so it cannot establish causation. X could cause Y: watching violent TV could make people more violent. Y could cause X: being violent could make people want to watch violent TV; without time-order information we can't tell which came first (the directionality problem). A third variable could cause both: for example, the psychological impact of violent behavior could be related to both hours of violent TV watched and number of violent acts committed. Because the study has no random assignment, such confounding variables are not controlled.
One assumption of an independent samples t test is that the data are drawn from an approximately normal distribution. Discuss under what conditions we need to be concerned about violations of this assumption.
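We need to be concerned about violations mainly when the sample size is small. With larger samples, the Central Limit Theorem ensures that the sampling distribution of the mean is approximately normal even if the scores themselves are not, so the t test is fairly robust to violations of this assumption.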
When should you do an independent samples t test?
When the two groups consist of different (independent) participants, your IV is dichotomous, and your DV is functionally I/R
When should you use a paired samples t test?
When the two sets of scores come from the same participants (e.g., at two time points) or from matched pairs, the IV is dichotomous, and the DV is functionally I/R
When should you use a multiple regression?
When you have two or more predictor (independent) variables, your predictors are dichotomous or functionally I/R, and your outcome variable is functionally I/R
When should you do a z test?
When you know the population mean and the population SD
When should you do a one sample t test?
When you know (or hypothesize) the population mean but don't know the population SD, so the SEM has to be estimated from the sample SD
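In a one-sample t test the only change from a z test is that the SEM comes from the sample SD. A minimal sketch with a hypothetical sample of 8 scores tested against a population mean of 100; `one_sample_t` is an illustrative helper that returns t and its degrees of freedom.

```python
import math
from statistics import fmean, stdev

def one_sample_t(scores, pop_mean):
    """t = (M - mu) / (s / sqrt(N)); the SEM is estimated from the sample SD."""
    n = len(scores)
    estimated_sem = stdev(scores) / math.sqrt(n)
    return (fmean(scores) - pop_mean) / estimated_sem, n - 1

scores = [102, 98, 110, 105, 95, 108, 101, 97]  # hypothetical sample
t, df = one_sample_t(scores, 100)
print(f"t({df}) = {t:.2f}")
```

The resulting t would then be compared with the critical t for 7 df rather than with the normal-curve cutoff.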
When to use a one-way ANOVA?
When you want to test differences among 2 or more groups: your IV is categorical (nominal, or ordinal with fewer than 5 levels), and your DV is functionally I/R
When should you use a point biserial correlation?
When your predictor variable is dichotomous, and your outcome variable is functionally I/R
You have an intervention that you've developed to reduce depressive symptoms; you've developed this treatment after a thorough review of existing treatments, and you believe it will be very effective at reducing these symptoms. You recruit a sample and give them this treatment. Should you do a one-tailed or two-tailed test to analyze these data? Why?
You should do a two-tailed test. Even if you strongly expect the intervention to reduce symptoms, some people could do worse with it than without it. A one-tailed test in the predicted direction could not detect that kind of result (e.g., symptom scores worse than the comparison mean), so you need a two-tailed test to see results in either direction.
Let's say that I have a variable representing the program of students enrolled in 549 this semester: 1=ADS, 2=MFT, 3=PS, and 4=other department. I want to describe the central tendency for this distribution. What measure should I use? Why?
You should use the mode. The variable in this question is nominal. Therefore, the mode should be used to describe the central tendency because the numbering system is strictly categorical.
You are drawing inferences about the mean of one population. When would you use the t distributions as the theoretical model rather than the normal curve model?
You would use the t distributions as the theoretical model when the population standard deviation is not known and must be estimated from the sample. If the population standard deviation is known, you would use the normal curve model.
If you find a p level to be 1.1, you can be sure that:
an error in calculation has been made (a p value is a probability, so it cannot be greater than 1)
In a positively skewed distribution, are the unusual scores extremely large or extremely small?
extremely large
In a negatively skewed distribution, at what end of the graph is the tail?
left
How can you find the median on a frequency distribution?
look at the cumulative percent column: the median is the first value whose cumulative percent is at 50% or above. (If a value's cumulative percent is exactly 50%, the median falls between that value and the next one.)
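The cumulative-percent rule is easy to express as a loop. A minimal sketch using a hypothetical frequency table of 12 scores; `median_from_freq_table` is an illustrative helper, not a library function.

```python
def median_from_freq_table(values, freqs):
    """Return the first value whose cumulative percent reaches 50% or more.
    Values must be sorted in ascending order. If some value's cumulative
    percent is exactly 50%, the true median lies between that value and
    the next one; this simple rule returns the lower of the two."""
    total = sum(freqs)
    cumulative = 0
    for value, f in zip(values, freqs):
        cumulative += f
        if 100 * cumulative / total >= 50:
            return value
    raise ValueError("empty frequency table")

values = [1, 2, 3, 4, 5]  # hypothetical scores
freqs = [2, 3, 4, 1, 2]   # how many people got each score (N = 12)
print(median_from_freq_table(values, freqs))
```

Here the cumulative percents are about 16.7%, 41.7%, and then 75% at a score of 3, so the median is 3.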
The easiest way to reduce Type I errors is to:
make your alpha level smaller
For each of the following, select whether you think the correlation would be positive, zero, or negative. Number of hours spent watching TV per week; grades in HS
negative
For each of the following, select whether you think the correlation would be positive, zero, or negative. Grades in HS advanced placement psych course; scores on advanced placement psych test
positive
A t-test is used in place of the z-score for groups, when which of the following must be estimated?
the population standard deviation
The best point estimate for the population mean is:
the sample mean
The t distribution that you use to find your critical values closely resembles the normal distribution when:
the sample size is large
The standard normal curve (i.e., when you standardize your distribution) retains
the shape of the original normal distribution
To keep beta (the Type II error rate) small, when the effect size is expected to be small, you would need
to use large sample sizes
In hypothesis testing, we set up and test ___________ hypotheses. They are called the ______________________ hypothesis (which is typically what the researcher believes) and the ____________ hypothesis (which is what we test).
two/ alternative/null
Suppose that the variable you have measured in a sample of subjects does not have a normal distribution in the population. Which of the following is recommended?
use a fairly large sample size (at least 30 or 40)
For each of the following, select whether you think the correlation would be positive, zero, or negative. Intelligence; size of big toe on right foot
zero