Biostatistics 4 questions
What does the value of the F statistic from an ANOVA represent?
An F statistic is the value you get when you run an ANOVA or a regression analysis to find out whether the means of two or more populations are significantly different. It is the ratio of the variance among the group means to the variance within the groups. - A t-test will tell you if a single variable is statistically significant. - An F test will tell you if a group of variables is jointly significant.
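A minimal sketch of how the F ratio is assembled, assuming three small made-up groups (the data and the function name `f_statistic` are illustrative and not part of the study set). F is the mean square among groups divided by the mean square within groups.

```python
from statistics import mean

def f_statistic(groups):
    """Return the one-way ANOVA F ratio: MS among groups / MS within groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total sample size

    # Among-groups sum of squares: group means around the grand mean
    ss_among = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_among / ms_within

# Hypothetical data: three groups of three observations each
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = f_statistic(groups)
print(round(f_value, 3))  # about 13: group means differ far more than the scatter within groups
```

A large F only says that the among-groups variance is large relative to the within-groups variance; whether it is significant still depends on the critical value for the degrees of freedom involved.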
What does "interaction" refer to?
An interaction effect happens when the effect of one explanatory variable on the response variable depends on the level of another explanatory variable. - This is opposed to a "main effect", which is the action of a single independent variable on the dependent variable. EX: You are studying the effects of a diet drink and a diet pill (the explanatory variables) on weight loss. - The "main effects" would be the effect of the diet drink on weight loss and the effect of the diet pill on weight loss. - The interaction effect happens when the drink and pill are taken at the same time (the combination could speed up or slow down weight loss). EX: Synergy is an interaction effect.
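The diet drink and diet pill example can be sketched with numbers. The cell means below are hypothetical values chosen for illustration; the idea is only that an interaction is whatever remains after adding up the separate effects of the drink and the pill.

```python
# Hypothetical mean weight loss (kg) for each combination of drink and pill
cell_means = {
    ("no drink", "no pill"): 0.0,
    ("drink", "no pill"): 2.0,
    ("no drink", "pill"): 3.0,
    ("drink", "pill"): 9.0,
}

baseline = cell_means[("no drink", "no pill")]
drink_effect = cell_means[("drink", "no pill")] - baseline  # drink alone
pill_effect = cell_means[("no drink", "pill")] - baseline   # pill alone

# If the two effects were purely additive, the combined cell would equal this
additive_prediction = baseline + drink_effect + pill_effect
interaction = cell_means[("drink", "pill")] - additive_prediction

print(additive_prediction, interaction)  # 5.0 4.0
```

A positive leftover (here 4.0) corresponds to synergy, a negative one to the combination slowing weight loss, and zero to no interaction.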
Researchers wanted to compare job satisfaction of new Walmart employees with new HEB employees over the course of their first four months. 5 newly hired employees at Walmart and 5 newly hired employees at HEB were asked to rate their job satisfaction on a scale of 1-10 after every month. The statistical results are provided: Month: F = 5.478 (critical value = 4.342) Walmart v. HEB: F = 2.365 (critical value = 2.675) Interaction: F = 4.389 (critical value = 3.678) Which choice best describes the most appropriate test statistic used as well as the conclusions reached from the data provided? A. Two way ANOVA (2 independent factors); New hires at Walmart and HEB had significantly different job satisfactions. There was no significant difference between the different months. There was no interaction effect present. B. Two way ANOVA (2 dependent variables); New hires at Walmart and HEB did not have significantly different job satisfactions. There was a significant difference between the different months. There was an interaction effect present. C. Two way ANOVA (mixed factors); New hires at Walmart and HEB did not have significantly different job satisfactions. There was a significant difference between the different months. There was an interaction effect present. D. Two way ANOVA (2 independent factors); New hires at Walmart and HEB did not have significantly different job satisfactions. There was a significant difference between the different months. There was an interaction effect present. E. Two way ANOVA (mixed factors); New hires at Walmart and HEB did not have significantly different job satisfactions. There was a significant difference between the different months. There was no interaction effect present.
C. Two way ANOVA (mixed factors); New hires at Walmart and HEB did not have significantly different job satisfactions. There was a significant difference between the different months. There was an interaction effect present. - The two way ANOVA used had mixed factors: the store (HEB or Walmart) is independent - the month is dependent because the same people were measured four times. - Because F is less than the critical value for the stores, the null cannot be rejected and thus there is no significant difference. - For month and for the interaction, F exceeds the critical value, so both effects are significant.
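The decision rule behind this answer can be checked mechanically. The sketch below uses the F statistics and critical values given in the question; the variable names are illustrative.

```python
# F statistics and critical values as reported in the question
results = {
    "Month": (5.478, 4.342),
    "Walmart v. HEB": (2.365, 2.675),
    "Interaction": (4.389, 3.678),
}

# An effect is significant when its F statistic exceeds its critical value
decisions = {effect: f > crit for effect, (f, crit) in results.items()}

for effect, significant in decisions.items():
    print(effect, "significant" if significant else "not significant")
```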
What are the underlying assumptions of a Repeated Measures ANOVA?
One factor with at least two levels, levels are dependent. By saying that the levels are dependent, it means that they share variability in some way. Almost identical to the One-Way but with a few changes. 1) *Independent observations* (or, more precisely, independent and identically distributed variables). This is often, but not always, satisfied by each case in SPSS representing a different person or other statistical unit. 2) The test variables follow a multivariate normal distribution in the population. However, this assumption is not needed if the sample size >= 25. 3) *Sphericity.* This means that the population variances of all possible difference scores (com_1 - com_2, com_1 - com_3 and so on) are equal. Sphericity is tested with Mauchly's test.
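To make the sphericity assumption concrete, the sketch below computes the variance of every set of difference scores for three repeated measurements. The scores are hypothetical, and this is only the quantity that sphericity is about, not Mauchly's test itself.

```python
from itertools import combinations
from statistics import variance

# Hypothetical scores for 4 subjects measured under 3 conditions
scores = {
    "com_1": [1, 2, 3, 4],
    "com_2": [2, 4, 5, 7],
    "com_3": [3, 5, 8, 9],
}

# Sample variance of the difference scores for each pair of conditions
diff_variances = {}
for a, b in combinations(scores, 2):
    diffs = [x - y for x, y in zip(scores[a], scores[b])]
    diff_variances[(a, b)] = variance(diffs)

for pair, var in diff_variances.items():
    print(pair, round(var, 3))
```

Sphericity holds in the population when these variances are all equal; sample values that differ widely are a hint that it may be violated.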
What are the underlying assumptions of a Two-Way ANOVA?
Two or more factors (each with at least two levels); levels can be independent, dependent, or both (mixed). In this example, there are two factors: gender (with 2 levels) and days (with 3 levels). Independence. - The observations are assumed to be independent. Equal variance. - The observations (equivalently, the errors) are assumed to have equal variance. Normality. - The errors (not the residuals!) are assumed to be normal. Equivalently, the observations within each combination of factors are normal. However, since we don't have access to the errors, we examine this assumption by reference to the residuals, which (if the other assumptions hold) will approximate them. The values of the factors are known/fixed.
What are the underlying assumptions of a Nested ANOVA?
Use nested ANOVA when you have one measurement variable and more than one nominal variable, and the nominal variables are nested (form subgroups within groups). It tests whether there is significant variation in means among groups, among subgroups within groups, etc. A nested ANOVA, like all ANOVAs, assumes that the observations within each subgroup are normally distributed and have equal standard deviations.
What are the assumptions of Kruskal-Wallis?
Use when you have one nominal variable and one measurement variable that you would usually analyze with a one-way ANOVA, BUT the measurement variable does not meet the normality assumption of a one-way ANOVA - OR use when you have one nominal variable and one ranked variable. - Does NOT assume that the data are normal. - DOES assume that the different groups have the same distribution; groups with different standard deviations have different distributions. If your data are heteroscedastic, Kruskal-Wallis is no better than one-way ANOVA, and may be worse. Instead, you should use Welch's ANOVA for heteroscedastic data.
What does partitioning of variances mean and how does it contribute to testing your hypotheses?
Variances can be divided up, that is, partitioned. - The variance is computed as the sum of squared deviations from the overall mean, divided by N-1 (sample size minus one). - Given a certain N, the variance is a function of the sum of squared deviations, or SS for short, and the total SS splits into an among-groups SS and a within-groups SS. Comparing these two parts, as mean squares, is what the F test does. For hypothesis tests, the things we might care about are how far the true significance level might be from what we want it to be, and whether power against alternatives of interest is good. The variance of your dependent variable should be equal in each cell of the design. - Unequal variances can distort the significance level, especially when sample sizes are unequal.
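The partition can be verified numerically. This is a small sketch on made-up data (three hypothetical groups of three values), showing that the total sum of squares equals the among-groups part plus the within-groups part.

```python
from statistics import mean

# Hypothetical data: three groups of three observations each
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
all_values = [x for g in groups for x in g]
grand_mean = mean(all_values)

# Total SS: every observation around the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in all_values)
# Among-groups SS: group means around the grand mean, weighted by group size
ss_among = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within-groups SS: every observation around its own group mean
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

# Overall variance is the total SS divided by N - 1
total_variance = ss_total / (len(all_values) - 1)

print(round(ss_total, 6), round(ss_among, 6), round(ss_within, 6))
```

With these numbers the total SS of 32 splits into 26 among groups and 6 within groups, so most of the variation is tied to group membership.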
You are developing the data from a study in which you have measured enzyme activity in a gene product for which you have developed 2 new alleles by site directed mutagenesis that each alter an amino acid in the protein relative to the wild type. Since this protein is a dimer you now can have 6 possible genotypes observed in your randomly mating mouse population. You have randomly harvested 50 individuals from your population, divided them by sex and genotype, and measured enzyme activity in the brain of each mouse. What would the most appropriate analysis be that would help determine if the mean enzyme activity of the six genotypes are equal, the mean enzyme activity of the males and females are equal, and there is no interaction between the genotype and sex of the mouse affecting enzyme activity? a. Two-way (factorial) ANOVA b. Nested (Hierarchical) ANOVA c. Repeated Measures ANOVA d. One Way ANOVA
a. Two-way (factorial) ANOVA - This test will evaluate the differences across genotypes, between males and females as well as interaction between the sex and genotype
A one-way ANOVA can be used to: a. determine if the means of three or more groups are the same. b. determine which groups within the dataset have different means c. determine if you have interaction between the nominal variables being tested. d. all of the above
a. determine if the means of three or more groups are the same. - it is an extension of the two sample t-Test, and only tells you if the cluster of groups have the same mean or not.
As variability due to chance decreases, the value of F will a. increase b. stay the same c. decrease d. can't tell from the given information
a. increase
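Assuming that the null hypothesis being tested by ANOVA is false, the probability of obtaining an F-ratio that exceeds the value reported in the F table as the 95th percentile is: a. less than .05. b. equal to .05. c. greater than .05.
c. greater than .05. - The probability is exactly .05 only when the null hypothesis is true. When the null is false, F tends to be larger, so the chance of exceeding the 95th percentile is more than .05.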
To determine whether the test statistic of ANOVA is statistically significant, it can be compared to a critical value. What two pieces of information are needed to determine the critical value? a. sample size, number of groups b. mean, sample standard deviation c. expected frequency, obtained frequency d. MSTR, MSE
a. sample size, number of groups
The ________ sum of squares measures the variability of the sample treatment means around the overall mean. a. treatment b. error c. interaction d. total
a. treatment
Assuming calculations have been performed for a One-Way ANOVA with equal sample sizes, if each data value in one of the samples is increased by a fixed amount, then any change in the F test statistic and the P-value is attributable only to the change in the sample mean. a. true b. false
a. true
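A quick check of this on hypothetical data: adding a constant to every value in one group moves that group's mean but leaves the spread inside every group untouched, so only the among-groups mean square changes. The function name `mean_squares` and the data are illustrative.

```python
from statistics import mean

def mean_squares(groups):
    """Return (MS among groups, MS within groups) for a one-way layout."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    ss_among = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_among / (k - 1), ss_within / (n_total - k)

# Hypothetical data with equal sample sizes
original = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
# Add a fixed amount (2) to every value in the first group only
shifted = [[x + 2 for x in original[0]]] + original[1:]

ms_among_0, ms_within_0 = mean_squares(original)
ms_among_1, ms_within_1 = mean_squares(shifted)

print(round(ms_within_0, 6), round(ms_within_1, 6))  # within-groups MS is the same
print(round(ms_among_0, 6), round(ms_among_1, 6))    # among-groups MS differs
```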
The error deviations within the SSE statistic measure distances: a. within groups b. between groups c. both (a) and (b) d. none of the above e. between each value and the grand mean
a. within groups
If the sample means for each of k treatment groups were identical (yes, this is extremely unlikely), what would be the observed value of the ANOVA test statistic? a. 1.0 b. 0.0 c. A value between 0.0 and 1.0 d. A negative value e. Infinite
b. 0.0
An adult human has 32 teeth. However, as we get older we face events that can cause us to lose our adult teeth. Examples include losing a tooth from an accident, getting a tooth pulled due to infection or crowding, and having your wisdom teeth removed. You went to five different universities across Texas and randomly sampled twenty students from each university to provide you with how many adult teeth they still have. You are excited to analyze your data and you have decided to do a One-Way ANOVA. Questions: What is your Among Groups Degrees of Freedom and what is your Within Groups Degrees of Freedom? If you reject your null hypothesis with the One-Way ANOVA, should you perform a Tukey-Kramer Test, a Bonferroni Test, or a Dunnett's Test? a. Among Groups DF=4. Within Groups DF=19. Bonferroni Correction. b. Among Groups DF=4. Within Groups DF=95. Tukey-Kramer Test. c. Among Groups DF=19. Within Groups DF=4. Dunnett's Test d. Among Groups DF=95. Within Groups DF=19. Dunnett's Test. e. Among Groups DF=95. Within Groups DF=4. Tukey-Kramer Test.
b. Among Groups DF=4. Within Groups DF=95. Tukey-Kramer Test.
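The degrees of freedom in this answer follow from two simple formulas, sketched below for a balanced design. The function name `one_way_anova_df` is illustrative.

```python
def one_way_anova_df(n_groups, n_per_group):
    """Degrees of freedom for a balanced one-way ANOVA."""
    n_total = n_groups * n_per_group
    df_among = n_groups - 1         # number of groups minus one
    df_within = n_total - n_groups  # total observations minus number of groups
    return df_among, df_within

# Five universities, twenty students sampled from each
df_among, df_within = one_way_anova_df(5, 20)
print(df_among, df_within)  # 4 95
```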
What is the function of a post-test in ANOVA? a. Determine if any statistically significant group differences have occurred. b. Describe those groups that have reliable differences between group means. c. Set the critical value for the F test (or chi-square).
b. Describe those groups that have reliable differences between group means.
You are performing a pilot study on the reduction of serum triglycerides by a new pharmaceutical over a 6 week period. You have 8 volunteers (yes...all IRB approved and all!) who will take the prescribed dose daily for 6 weeks and you collect triglyceride levels (5 ranges) from their blood samples at the end of week 2, week 4 and week 6. What would be the most appropriate statistical test to evaluate whether there is any difference in the serum triglyceride levels in week 2, 4 & 6? a. One-way ANOVA b. Friedman Test c. Kruskal-Wallis Test d. Repeated Measures ANOVA
b. Friedman Test - The Friedman test is the nonparametric counterpart of a repeated measures ANOVA; because of the small sample size and the ordinal data collected, this test is the most appropriate.
You have performed a 2-way ANOVA and have discovered that the interaction between your factors is significant. What should you do to clarify your analysis? a. Perform a Kruskal-Wallis test b. Look at each factor individually with a one-way ANOVA c. Perform a Tukey-Kramer test to partition the difference in means d. Perform a two-sample t-Test on pairwise comparisons.
b. Look at each factor individually with a one-way ANOVA
You carried out an ANOVA on a preliminary sample of data. You then collected additional data from the same groups; the difference being that the sample sizes for each group were increased by a factor of 10, and the within-group variability has decreased substantially. Which of the following statements is NOT correct. a. The degrees of freedom associated with the error term has increased b. The degrees of freedom associated with the treatment term has increased c. SSE has decreased d. FDATA has changed e. FCRIT has changed
b. The degrees of freedom associated with the treatment term has increased
Assuming no bias, the total variation in a response variable is due to error (unexplained variation) plus differences due to treatments (known variation). If known variation is large compared to unexplained variation, which of the following conclusions is the best? a. There is no evidence for a difference in response due to treatments. b. There is evidence for a difference in response due to treatments. c. There is significant evidence for a difference in response due to treatments d. The treatments are not comparable. e. The cause of the response is due to something other than treatment
b. There is evidence for a difference in response due to treatments.
You conducted a study in which the speeds of NFL wide receivers were measured. Running times were collected from seven different teams and each team provided 9 samples. The data collected showed a balanced design; however, greater than 3-fold variations in group standard deviations were also discovered. What test should be used for analysis? a. Two-way ANOVA b. Welch's ANOVA c. Kruskal-Wallis Test d. One-way ANOVA
b. Welch's ANOVA - Welch's ANOVA is used when there are large variations in standard deviation and sample sizes in each group less than 10
When conducting an ANOVA, the F(data) will always fall between: a. between negative infinity and infinity b. between 0 and 1 c. between 0 and infinity d. between 1 and infinity
c. between 0 and infinity - F is a ratio of two variances, so it can never be negative.
If the true means of the k populations are equal, then MSTR/MSE should be: a. more than 1.00 b. close to 1.00 c. close to 0.00 d. close to -1.00 e. a negative value between 0 and - 1 f. not enough information to make a decision
b. close to 1.00
The ______ sum of squares measures the variability of the observed values around their respective treatment means. a. treatment b. error c. interaction d. total
b. error
When the k population means are truly different from each other, it is likely that the average error deviation: a. is relatively large compared to the average treatment deviations b. is relatively small compared to the average treatment deviations c. is about equal to the average treatment deviation d. none of the above e. differ significantly between at least two of the populations
b. is relatively small compared to the average treatment deviations
When conducting a one-way ANOVA, the _______ the between-treatment variability is when compared to the within-treatment variability, the _______ the value of F(data) will tend to be. a. smaller, larger b. smaller, smaller c. larger, larger d. smaller, more random e. larger, more random
b. smaller, smaller
FDATA = 5, the result is statistically significant: a. Always b. Sometimes c. Never
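b. Sometimes - whether F = 5 is significant depends on the degrees of freedom, which determine the critical value.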
In the analysis of variance procedure (ANOVA), "factor" refers to: a. the critical value of F. b. the independent variable. c. different levels of a treatment. d. the dependent variable
b. the independent variable.
You obtained a significant test statistic when comparing three treatments in a one-way ANOVA. In words, how would you interpret the alternative hypothesis HA? a. All three treatments have different effects on the mean response. b. Exactly two of the three treatments have the same effect on the mean response. c. At least two treatments are different from each other in terms of their effect on the mean response. d. All of the above. e. None of the above.
c. At least two treatments are different from each other in terms of their effect on the mean response.
A researcher has decided to test whether there is a difference in the means of weight in Asian, African and Caucasian students at an elementary school. The weights for 10 members of each population were acquired. Which type of ANOVA test should be performed? If you are determined to find where any differences are located, and you find a critical value of 2.93 and calculate an F statistic of 6.46, what post hoc test, if necessary, should be conducted? a. Factorial ANOVA with two mixed factors, no further test required b. Factorial ANOVA with two dependent factors, Dunnett's test c. One way ANOVA, Tukey's Test d. Nested ANOVA, Friedman's Test
c. One way ANOVA, Tukey's Test
In one-way ANOVA, which of the following is used within the F-ratio as a measurement of the variance of individual observations? a. SSTR b. MSTR c. SSE or MSE d. none of the above
c. SSE or MSE
A research group was interested in the effects of running on stress levels amongst students of different college majors. The researchers surveyed two different majors with 6 individuals in each group: students studying biology and students studying chemistry. The experiment was conducted over a 6 week period in which students were told to run a minimum of 4 miles in a week's time. After each week, each student was told to rank his or her stress level from 1-10 (1 being the lowest, 10 being the highest). The researchers surveyed the same individuals after each week period, indicating that six studied biology and six studied chemistry. After the 6-week period, the research group ran a Mixed ANOVA to see if there was a significant difference in stress levels, but it was apparent that a mistake was made in their calculations. Which of the following statements could potentially be the reason the group made an error in this type of ANOVA? How could this mistake be fixed? a. The research group should have surveyed 6 different individuals each week, for a total of 72 individuals. This mistake could have been fixed if they had surveyed the correct number of individuals rather than only 12 individuals throughout the 6-week period. b. The research group forgot to form subgroups in their experimentation. This mistake could be fixed if the group had run a Hierarchical ANOVA altogether. c. The research group calculated an error term for each separate week, indicating that more than two error terms were present in their calculations. This mistake could have been fixed if the research group only calculated an error term for the specific major and an error term for the interaction of each week. By calculating an error term for the interaction and for each week, unnecessary error terms were included which skewed the experiment's results. d. The research group should have run a Tukey Test prior to beginning the experiment. By running the test prior to beginning the experiment, a mistake could have been avoided because the group could have discovered the apparent differences between the two groups.
c. The research group calculated an error term for each separate week, indicating that more than two error terms were present in their calculations. This mistake could have been fixed if the research group only calculated an error term for the specific major and an error term for the interaction of each week. By calculating an error term for the interaction and for each week, unnecessary error terms were included which skewed the experiment's results.
Suppose the critical region for a certain test of the null hypothesis is of the form F > 9.48773 and the computed value of F from the data is 1.86. Then: a. H0 should be rejected. b. The significance level is given by the area to the left of 9.48773 under the appropriate F distribution. c. The significance level is given by the area to the right of 9.48773 under the appropriate F distribution. d. The hypothesis test is two-tailed e. None of these.
c. The significance level is given by the area to the right of 9.48773 under the appropriate F distribution.
An investigator randomly assigns 30 college students into three equal size study groups (early- morning, afternoon, late-night) to determine if the period of the day at which people study has an effect on their retention. The students live in a controlled environment for one week, on the third day of the experiment, treatment is administered (study of predetermined material). On the seventh day the investigator tests for retention. In computing his ANOVA table, he sees that his MS within groups is larger than his MS between groups. What does this result indicate? a. An error in the calculations was made. b. There was more than the expected amount of variability between groups. c. There was more variability between subjects within the same group than there was between groups. d. There should have been additional controls in the experiment.
c. There was more variability between subjects within the same group than there was between groups.
In ANOVA with 4 groups and a total sample size of 44, the computed F statistic is 2.33. In this case, the p-value is: a. exactly 0.05 b. less than 0.05 c. greater than 0.05 d. cannot tell - it depends on what the SSE is
c. greater than 0.05
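The reasoning here is a degrees-of-freedom calculation followed by a table lookup. In the sketch below the critical value 2.84 is an assumption taken from a standard F table for the 5% level with 3 and 40 degrees of freedom; it is not given in the question.

```python
n_groups, n_total, f_observed = 4, 44, 2.33

df_among = n_groups - 1         # numerator degrees of freedom
df_within = n_total - n_groups  # denominator degrees of freedom

# Assumption: upper 5% critical value of F(3, 40), read from a standard F table
F_CRIT_05_3_40 = 2.84

# p < 0.05 only if the observed F exceeds the 5% critical value
p_below_05 = f_observed > F_CRIT_05_3_40
print(df_among, df_within, p_below_05)  # 3 40 False
```

Because 2.33 falls short of the tabulated value, the p-value is greater than 0.05, and the SSE is not needed once F itself is known.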
Partitioning the variance is often done as a post hoc evaluation in random effect or "Model II" one-way ANOVA. This evaluation would tell us a. how much variance is explained among the different groups compared to across the groups. b. which portion of the variation is attributable to interaction between the groups. c. how much variance is explained among the different groups compared to within the individual groups. d. which portion of the variation is attributable to each group.
c. how much variance is explained among the different groups compared to within the individual groups.
Analysis of variance is a statistical method of comparing the ________ of several populations. a. standard deviations b. variances c. means d. proportions e. none of the above
c. means
FDATA= 0.9, the result is statistically significant: a. Always b. Sometimes c. Never
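c. Never - at conventional significance levels the critical value of F is always greater than 1, so an F below 1 cannot be significant.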
Assume that there is no overlap between the box and whisker plots for three drug treatments where each drug was administered to 35 individuals. The box plots for these data: a. provide no evidence for, or against, the null hypothesis of ANOVA b. represent evidence for the null hypothesis of ANOVA c. represent evidence against the null hypothesis of ANOVA d. can be very misleading, you should not be looking at box plots in this setting
c. represent evidence against the null hypothesis of ANOVA
If the MSE of an ANOVA for six treatment groups is known, you can compute: a. df1 b. the standard deviation of each treatment group c. the pooled standard deviation d. b and c e. all answers are correct
c. the pooled standard deviation
ANOVA was used to test the outcomes of three drug treatments. Each drug was given to 20 individuals. The MSE for this analysis was 16. What is the standard deviation for all 60 individuals sampled for this study? a. 6.928 b. 48 c. 16 d. 4
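d. 4 - the MSE is the pooled within-group variance, so the pooled standard deviation is the square root of 16.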
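A short sketch of where the MSE comes from and why its square root is the pooled standard deviation. The three group variances below are hypothetical values chosen so that the MSE works out to 16, matching the question; only the group size of 20 comes from the question itself.

```python
import math

# Hypothetical sample variances for the three drug groups, 20 individuals each
group_sizes = [20, 20, 20]
group_variances = [15.0, 16.0, 17.0]

# MSE pools the group variances, weighting each by its degrees of freedom (n - 1)
ss_within = sum((n - 1) * s2 for n, s2 in zip(group_sizes, group_variances))
df_within = sum(group_sizes) - len(group_sizes)
mse = ss_within / df_within

pooled_sd = math.sqrt(mse)
print(mse, pooled_sd)  # 16.0 4.0
```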
Which of the following is an assumption of one-way ANOVA comparing samples from three or more experimental treatments? a. The response variable within each of the k populations follows a normal distribution. b. The samples associated with each population are randomly selected and are independent from all other samples. c. The response variable within each of the k populations has equal variance. d. All of the above.
d. All of the above.
What would happen if instead of using an ANOVA to compare 10 groups, you performed multiple t-tests? a. Nothing, there is no difference between using an ANOVA and using a t-test. b. Nothing serious, except that making multiple comparisons with a t-test requires more computation than doing a single ANOVA. c. Sir Ronald Fisher would be turning over in his grave; he put all that work into developing ANOVA, and you use multiple t-tests d. Making multiple comparisons with a t-test increases the probability of making a Type I error.
d. Making multiple comparisons with a t-test increases the probability of making a Type I error.
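The size of the problem can be estimated with a short calculation. This sketch assumes each t-test is run at alpha = 0.05 and treats the comparisons as independent, which is a simplification (pairwise tests on the same groups are not truly independent), so the result is a rough guide only.

```python
from math import comb

alpha = 0.05
n_groups = 10

# Number of pairwise t-tests needed to compare every group with every other
n_comparisons = comb(n_groups, 2)

# Assumption: comparisons treated as independent (a simplification)
familywise_error = 1 - (1 - alpha) ** n_comparisons

print(n_comparisons, round(familywise_error, 3))  # 45 0.901
```

Under these assumptions, the chance of at least one false positive across the 45 tests is about 90%, compared with 5% for a single ANOVA.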
A researcher wants to test the amount of antioxidants present in different types of tea. The types of tea that are tested are Green Tea, Black Tea, Herbal Tea, and Mint Tea. The amounts of antioxidants are measured in milligrams of antioxidant per gram of tea (mg/g). The amounts of antioxidants are measured multiple times for each type of tea, but not with the same bag (measurements are not made on the same individual tea bag more than once; different tea bags are used). The observations for amount of antioxidants in each type of tea fit a normal distribution and the distributions for each category of tea have the same standard deviation. With this information, and knowing that the observations were made independently, which statistical test should the researcher use to analyze their data? If the observations did not fit a normal distribution but all other information remained the same, what statistical test should the researcher perform then? a. Nested ANOVA for the first scenario; One-Way ANOVA for the second scenario. b. Two-Way ANOVA with replication for the first scenario; Welch's ANOVA for the second scenario. c. Two-Way ANOVA without replication for the first scenario; Friedman Test for the second scenario. d. One-Way ANOVA for the first scenario; Kruskal-Wallis Test for the second scenario.
d. One-Way ANOVA for the first scenario; Kruskal-Wallis Test for the second scenario.
Which of the following ANOVAs are matched incorrectly with their corresponding scenarios? a. 2-Way Independent Variables ANOVA: A Bacteriologist wishes to compare the optimal growth temperature for the bacteria Escherichia coli and Salmonella that cause food poisoning. He studied 12 petri dishes of each bacterium and counted the amounts of spores formed at 35, 45 and 55 ℃ to see if there was a significant difference amongst the 3 null hypotheses. The alpha level was set to 0.05. b. 2-Way Dependent Variables ANOVA: A Researcher wishes to compare the amount of blood glucose in 50 randomly chosen Diabetic individuals labeled 1-50 at 2 stages: before and after starting a new insulin treatment. The amounts of blood glucose levels were tested at 2, 4, and 6 weeks before and after the study to see if there was a significant difference amongst the 3 null hypotheses. The alpha level was set to 0.05. c. Repeated Measures ANOVA: A statistics professor wishes to compare the quiz grades of 50 students before and after going to tutoring sessions to see if there was a significant difference amongst the 1 null hypothesis. The alpha level was set to 0.05. d. Randomized Block ANOVA: The state of Texas wishes to compare the percent of people living under the poverty line in 125 of its 254 counties. The counties to be tested were chosen sequentially by alphabetical order to see if there was a significant difference amongst the 1 null hypothesis. The alpha level was set to 0.05.
d. Randomized Block ANOVA: The state of Texas wishes to compare the percent of people living under the poverty line in 125 of its 254 counties. The counties to be tested were chosen sequentially by alphabetical order to see if there was a significant difference amongst the 1 null hypothesis. The alpha level was set to 0.05. - The test is described correctly with wanting to compare different treatments on small plots (Counties) within a larger block (TX). What makes this statement wrong is that the small plots must have been chosen randomly and not systematically.
The primary interest in designing a randomized block experiment is to: a. Increase the between-treatments variation to more easily detect differences among the treatment means. b. Reduce the variation among the blocks. c. Increase the within-treatments variation to more easily detect differences within the treatment means. d. Reduce the within-treatments variation to more easily detect differences among the treatment means.
d. Reduce the within-treatments variation to more easily detect differences among the treatment means.
In a study, subjects are randomly assigned to one of three groups: control, experimental A, or experimental B. After treatment, the mean scores for the three groups are compared. The appropriate statistical test for comparing these means is: a. the correlation coefficient b. chi square c. the t-test d. the analysis of variance
d. the analysis of variance
The underlying assumptions for a One-way ANOVA are: a. the data are obtained from equal sample sized datasets with equal variances. b. the observations are selected from randomized populations with equal variances and means. c. the observations are heteroscedastic and randomly selected from populations with equal variances. d. the observations are independent and randomly selected from Normal populations with equal variances.
d. the observations are independent and randomly selected from Normal populations with equal variances.
The test statistic for a one-way ANOVA, F, represents: a. the ratio of sum of squares of means among groups divided by the mean square of variance within groups b. the ratio of the error mean square divided by the within mean square c. the ratio of the squared interaction divided by the mean square with subgroups. d. the ratio of variance among means divided by the average variance within groups
d. the ratio of variance among means divided by the average variance within groups
How do changes in the value of variables within the sample sets impact the test statistics of an ANOVA and what is this change attributable to?
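Changing values within the samples changes the group means and variances, and therefore the F statistic and p-value. It can also change the homoscedasticity and normality of the data, which affects which tests can be used.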
What are the underlying assumptions of a One-way ANOVA?
One-Way ANOVA - One factor with at least two levels, levels are independent. (e.g. - different dosages; each dosage repeated numerous times) 1. Normality of Sampling Distribution of Means - The distribution of sample means is normally distributed. 2. Independence of Errors - Errors between cases are independent of one another. 3. Absence of Outliers - Outlying scores have been removed from the data set. 4. Homogeneity of Variance - Population variances in different levels of each independent variable are equal.