BIOL 570: FINAL EXAM
"Balance" in an experimental design refers to all of the following EXCEPT: A) The same result for each treatment. For example, in a study examining egg mass for chickens fed either diet A or diet B, you would find the same average egg mass for chickens fed each diet. B) A way to reduce the influence of sampling error in inference. C) The same sample size for each treatment. For example, in a study examining egg mass for chickens fed either diet A or diet B, you would have the same number of chickens fed each diet.
A
Experiments are powerful but they may not be the only option for some questions and they are impossible to perform for other questions. In an observational study, one should try to incorporate as many features of experimentation as one can, but it is impossible to include: A) randomization (in terms of assigning treatments) B) replication C) balance D) blocking
A
If we were to compare the Z distribution and the t-distribution, we would see that A) the t-distribution has "fatter tails." B) the t-distribution is shifted towards larger values C) the t-distribution has a lower variance. D) the t-distribution is more skewed.
A
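Illustration of "fatter tails": a minimal sketch that compares the density of a t-distribution with the standard normal (Z) density at a point far out in the tail. The choice of 4 degrees of freedom and the evaluation point x = 3 are assumptions made only for this example; the helper name t_pdf is hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


# Compare the two densities three units away from the center.
z_density = NormalDist().pdf(3.0)   # standard normal
t_density = t_pdf(3.0, df=4)        # t with 4 df (assumed for illustration)
tail_ratio = t_density / z_density  # greater than 1 means the t tail is heavier
```

Both curves are symmetrical and centered on 0; the t-distribution simply puts more probability far from the center, and the gap shrinks as the degrees of freedom grow.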
Imagine that we are interested in whether males or females of a frog species differ in length. We find a paper that reports confidence intervals for the body length of each gender. The 95% confidence interval for the body length of females is 6.8cm to 7.2cm with a point estimate of 7.0cm. In males, the sample mean was 6.75cm and the 95% confidence interval was 6.6cm to 6.9cm. Based on this information, what can we say about what the results of a two-sample t-test would have been (if the null hypothesis were that the genders have the same mean length, and we were using an alpha of 0.05)? A) We cannot tell what the results of the hypothesis test would be without more information. B) We can tell that the null would have been rejected, and the P-value would have been less than 0.01 C) We can tell that the null would not have been rejected. D) We can tell that the null would have been rejected, but we can't guess at the P-value
A
Natalie is a high school science teacher. She reads that a bag of M&M'S has the following proportions of colors: 30% brown, 20% yellow, 20% red, 10% orange, 10% green, and 10% blue. Natalie decides to do a class exercise with her students where each student opens a bag of candy and counts the number of candies of each color. Once each student has their data, they each will do a statistical test to determine if their sample data are consistent with the proportions given above. What test will each student perform? A) Chi square goodness of fit test B) Two way ANOVA C) One way ANOVA D) Chi square test of association (same thing as a Chi square contingency test or a Chi square test of independence)
A
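Illustration of the goodness of fit calculation one student might do: a minimal sketch using the claimed proportions from the question. The observed counts are hypothetical (a made-up bag of 50 candies), and the critical value 11.07 is the tabled chi-square value for alpha = 0.05 with 5 degrees of freedom.

```python
# Claimed color proportions from the question.
claimed = {"brown": 0.30, "yellow": 0.20, "red": 0.20,
           "orange": 0.10, "green": 0.10, "blue": 0.10}

# Hypothetical counts from one student's bag (not real data).
observed = {"brown": 18, "yellow": 8, "red": 11,
            "orange": 4, "green": 6, "blue": 3}

n = sum(observed.values())

# Chi-square statistic: sum over categories of (observed - expected)^2 / expected.
chi2 = sum((observed[c] - claimed[c] * n) ** 2 / (claimed[c] * n) for c in claimed)

# Degrees of freedom: number of categories minus 1 (no parameters estimated).
df = len(claimed) - 1

CRITICAL_05_DF5 = 11.07  # tabled critical value, alpha = 0.05, df = 5
reject_null = chi2 > CRITICAL_05_DF5
```

With these made-up counts the statistic falls well below the critical value, so this student would not reject the claimed proportions.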
Short seedlings have small masses but tall seedlings can have small or large masses depending on light levels. As a result, a scatterplot with height and mass could be nonlinear. Should one perform correlation analyses and hypothesis tests with these data? A) The data aren't suitable for analysis in this form, but one should see if data transformation might yield a linear relationship that could be analyzed B) No, correlation analyses/hypothesis tests should not be performed. C) Yes, correlation analyses/hypothesis tests should be performed
A
The Analysis of Variance focuses on "variance" or variation. Which statement best reflects what type of variation is being examined? A) Variation among groups relative to variation within groups. B) Variation within groups. C) Variation among groups times variation within groups D) Variation among groups.
A
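Illustration of "variation among groups relative to variation within groups": a minimal sketch that builds the F-ratio from its two mean squares. The three groups of values are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
from statistics import mean

# Hypothetical measurements for three groups (not real data).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

all_values = [v for g in groups for v in g]
grand_mean = mean(all_values)

# Among-group sum of squares: group means compared with the grand mean.
ss_groups = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Within-group (error) sum of squares: each value compared with its own group mean.
ss_error = sum((v - mean(g)) ** 2 for g in groups for v in g)

df_groups = len(groups) - 1               # number of groups minus 1
df_error = len(all_values) - len(groups)  # total sample size minus number of groups

ms_groups = ss_groups / df_groups
ms_error = ss_error / df_error

# F is a ratio of two mean squares, so it can never be negative.
f_ratio = ms_groups / ms_error
```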
The following are true statements about residuals EXCEPT: A) Residuals are, by definition, numbers that are 0 or larger (i.e. positive values). B) The black vertical lines in Fig. 17.1-2 represent residuals. C) A residual can be calculated for each data point, and this value can then be squared. The sum of the squared residuals for a data set will be as small as possible if a least squares regression line is used.
A
The primary difference between a Z-test and a t-test is that when we conduct a Z-test we need to know... A) the population parameter representing the standard deviation of the variable that we are testing. B) the skewness of the sampling distribution C) that the variable is normally distributed. D) the population mean.
A
A friend thinks that she will need to transform her data before conducting a one-sample t-test, but she is unsure what mathematical transformation to use. Which statement of advice is valid? A) It is permissible to consider multiple types of transformations, but you must use the transformation that gives you the highest P value. B) It is permissible to consider multiple types of transformations, but you should check a transformation by whether or not it results in your sample fitting the assumptions of the test (normality in this case). C) It is never permissible to use multiple transformations. You must know (before looking at your data) what the appropriate transformation would be. D) It is permissible to consider multiple types of transformations, and you can just choose whatever transformation gives you the lowest P value.
B
Imagine a field fertilizer study where you have 1 m x 1 m plots. You want to see if plant biomass is affected by one of three treatments: a) add nitrogen, b) add phosphorus, c) do not add either. You do your study in a field that has sunny areas and shady areas. You are trying to decide whether to have "blocking" in the study. What is the best answer below? A) You should do blocking because light availability is likely to affect plant growth. A block in a type of habitat (sunny or shady) should not have one plot of each of the three treatments B) You should do blocking because light availability is likely to affect plant growth. A block in a type of habitat (sunny or shady area) should have one plot of each of the three treatments. C) You should not do blocking, because it is essential to have totally random assignment of treatments. D) You should not do blocking because this kind of situation does not relate to the concept of blocking.
B
Imagine that you conducted a small-scale study, analyzed it using a two-sample t-test, and have some initial results. You are contemplating repeating the study with a much larger sample size for both groups. Which of the following statements is NOT a valid expectation about the results of the larger study? A) The pooled sample variance that you calculate in the larger study will probably be a more accurate estimate of the variance. However, you can't really predict whether the pooled sample variance from the larger study will be a smaller value or a larger value than the pooled sample variance of the small study. B) The 95% confidence interval for the difference in means from the larger study will be more likely to contain the true difference in population means. C) The standard error of the difference in means based on the larger study will probably be smaller. D) The 95% confidence interval for the difference in means from the larger study will probably be a narrower (smaller) interval.
B
Interleaf 4 is entitled "Correlation does not require causation." This is true - two variables (such as sale of ice cream cones and number of crimes in NYC) can be correlated with each other but there is no causal relationship. Now let's imagine a regression analysis that has a slope that is significantly different from 0 and can be used to predict values of Y from values of X. All the following are true statements about "causation" and "regression lines" and this general concept EXCEPT: A) To attribute a likely "causation" to a relationship between variables, the key issue is to have a well designed study (i.e., appropriate experiment and/or well designed observational study where variables are well known and confounding variables are likely eliminated). B) If one is able to reject the null hypothesis of beta = 0 in a regression analysis, one can conclude that there is a causal relationship -- i.e., that variation in x causes variation in y. C) A statistically significant regression line doesn't necessarily mean that there is a causal relationship between X and Y -- it simply means that one has used a particular type of statistical analysis to predict Y values given X values, and that you have evidence that the slope is not zero.
B
Joe works for a pizza restaurant. To help with marketing, he takes data on a) the gender of the person who orders a pizza and b) whether the order is for a vegetarian pizza, a meat lover's pizza, or an extra cheese pizza. Joe could analyze these data with: A) a two way ANOVA B) A chi square contingency table test (this is the same thing as a chi square test of association or a chi square test of independence) C) a Chi-square goodness of fit test D) A two sample t test
B
The F-ratio in an ANOVA has all of the following properties EXCEPT... A) it is the test statistic for the ANOVA B) it is a number that can be negative, zero, or positive C) it is a ratio of two "mean squares"
B
The correlation coefficient (r) is a: A) population parameter B) sample statistic
B
Think about the F-distribution (see Fig 15.1-3 as an example). The following are true statements EXCEPT: A) Depending on the details of the study (for example, the number of groups being compared and the sample size), the exact shape of the distribution will change. Hence we need to use Table D. B) If we have a two-tailed alternative hypothesis, we should shade two sides of the distribution to obtain a P-value. C) This is a null distribution for the ANOVA.
B
When we are performing a two-sample t-test, we need to calculate a standard error of the difference between means for the denominator. We estimate the variance of the variable (which is a component of the standard error) using the pooled sample variance, which is... A) the sum of the sample variances. B) a weighted average of the two sample variances. C) the product of the sample variances D) the variance based on the larger sample
B
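Illustration of the pooled sample variance as a weighted average: a minimal sketch in which each sample variance is weighted by its degrees of freedom (n - 1). The function name pooled_variance and the two small samples are hypothetical.

```python
from statistics import variance


def pooled_variance(sample1, sample2):
    """Average of the two sample variances, weighted by degrees of freedom."""
    df1 = len(sample1) - 1
    df2 = len(sample2) - 1
    return (df1 * variance(sample1) + df2 * variance(sample2)) / (df1 + df2)


# Hypothetical samples of unequal size (not real data).
group_a = [1, 2, 3]      # sample variance 1, df = 2
group_b = [2, 4, 6, 8]   # sample variance 20/3, df = 3

sp2 = pooled_variance(group_a, group_b)
```

Because group_b has more degrees of freedom, the pooled value sits closer to its variance than a simple unweighted average of the two variances would.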
When we do a paired t-test we are really just performing a one-sample t-test on the... A) products of the variables for each pair B) differences between the variables for each pair C) ratios of the variables for each pair D) sums of the variables for each pair
B
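Illustration of a paired t-test as a one-sample t-test on differences: a minimal sketch that reduces each pair to a single difference and then tests whether the mean difference is 0. The paired measurements and the variable names are hypothetical.

```python
import math
from statistics import mean, stdev

# Hypothetical paired measurements on the same four units (not real data).
first = [10, 12, 9, 11]
second = [12, 15, 10, 14]

# Step 1: one difference per pair.
diffs = [b - a for a, b in zip(first, second)]

# Step 2: ordinary one-sample t statistic with a null mean difference of 0.
n = len(diffs)
se_diff = stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
t_stat = (mean(diffs) - 0) / se_diff
df = n - 1                              # number of pairs minus 1
```

The degrees of freedom come from the number of pairs, not the total number of measurements.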
You are interested in whether the good folks at Mars Incorporated are being honest in their labeling. They claim that a Snickers bar weighs 57g (on average). You obtain a random sample of 10 Snickers bars, and weigh them. What type of hypothesis test would you use? A) Z-test B) one-sample t-test C) Mann-Whitney U-test D) two-sample t-test
B
A researcher is examining data on health care costs. She uses an ANOVA to explore if there is a difference between people from New York, Kansas, and California in average hospital bills for colonoscopies. She creates an ANOVA table. She finds that she rejects her null hypothesis. All the following are true EXCEPT: A) She would have used an F test, defining alpha in only one tail of the distribution. B) Her between groups degrees of freedom would be two. C) Given her small P value, she can conclude that Kansas health costs are smaller than those from New York or California. D) Given her small P value, there must be much larger variance between groups than within groups.
C
Bill was examining medical data. The probability of a Cesarean birth in the United States in 2012 was 0.33. Bill is examining data from 2015 and wants to determine if his data are consistent with the 2012 rate or whether there has been a significant change in rates. What kind of statistical test is most appropriate? A) Chi square contingency table test (same thing as a Chi square test of association or Chi square test of independence) B) Chi square goodness of fit test C) binomial test D) paired t test or sign test
C
If we want to conduct a t-test of a mean, but we cannot assume that our data are normally distributed we can... A) use a Z-test. B) just go ahead and use the t-test, because it does not assume that data are normally distributed. C) use a mathematical transformation of each data point if the resulting collection of values is adequately described by a normal distribution. D) use a binomial test.
C
Interleaf 5 contains a great quote by Fisher: "To call in the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of." The answers below relate to important issues that are discussed in "Making a Plan" in relation to this quote. All issues noted below should be considered BEFORE the study is done EXCEPT: A) Make sure you have clearly defined the research question and possible outcomes before you start. B) Make sure you have addressed whether your sample size is sufficient to detect the effects you are interested in. C) Make sure your experiment is well designed and has a very large number of treatments - this will allow you to quickly narrow down the important factors in your study system. D) Other researchers in the past or present time may have asked the same research question that interests you - make sure you spend time reading the past literature before you begin your work.
C
Similar to the tables that you were given for the chi-squared distribution, the t-table in your book shows the critical value for a given alpha-level. The critical value is ... A) the probability of a type I error B) the probability of a type II error. C) the threshold representing the smallest magnitude of the test statistic that would result in rejection of the null hypothesis. D) the largest value of the test statistic that one could obtain.
C
Sometimes, we like to use the Mann Whitney U test because... A) it helps us make the best use of a paired study design. B) it gives us more statistical power to reject the null hypothesis than a t-test. C) it can be applied to variables that are not normally-distributed. D) it lets us produce a confidence interval for the difference in means between groups.
C
The best way to describe an experiment is: A) The study includes manipulations of the study system. B) The study has a control and at least one experimental treatment. C) The researcher randomly assigns the treatments to units or subjects so effects of the treatments can be compared.
C
The following are all correct statements about correlation and regression EXCEPT: A) The definition of a population is different; for the correlation, we assume a population of x and y pairs that have a population correlation coefficient "rho", while in regression, we assume a population regression line with two population parameters - alpha and beta. B) Both regression and correlation focus on linear associations between two continuous variables. C) Residuals are used in calculation of the slope of the correlation coefficient. D) Many of the calculations (i.e. x-xbar, y - ybar, etc.) are the same for both correlation and regression.
C
All are true statements about two way ANOVA's EXCEPT: A) A "2 way ANOVA" has a numerical response variable. B) The F test statistic is used for a "2 way ANOVA." C) The "2" of the "2 way ANOVA" refers to the fact that it considers two explanatory variables. D) A "2 way ANOVA" is more efficient to perform but two "1 way ANOVA's" provide the same information.
D
On p. 548, a confidence interval for the slope is shown for a data set where one is exploring if age can predict the proportion of black on the nose in lions. All of the following are true statements about confidence intervals EXCEPT: A) 95% confidence intervals for the slope will be smaller (narrower) than 99% confidence intervals. B) The "b" in the equation refers to the sample slope (which we can calculate) while the "beta" symbol refers to the population slope (which is unknown) C) Imagine we could do 99 more samples, each of 32 lions, and calculate a 95% confidence interval for each of the samples. If we consider the 100 confidence intervals (the 99 new ones and the one in the book), we'd expect the true population slope to occur within 95 of the 100 confidence intervals. D) Since this confidence interval is a result from a study of 32 lions, we would expect that if we sample another 32 lions from the same population, the value of the slope from this second sample would lie between 7.56 and 13.73.
D
Sue works for a conservation group that is exploring pollution levels in a river. Over a 200 mile stretch of the river, she locates 10 cities. For each city, she samples river water just upstream and just downstream of the city. Her goal is to see if pollution concentrations are consistently different for the water that is entering a city versus leaving it. She should analyze her data with: A) a one way ANOVA B) a two way ANOVA C) a two sample t test or a Mann Whitney U test D) a paired t test or a sign test
D
The following are features of randomization tests (= permutation tests) EXCEPT: A) In randomization tests, one creates a null distribution for the specific test statistic that is most relevant for your study (Fig. 13.8-1 for example) as opposed to using published null distributions like for the t statistic. B) Since one creates the null distribution by random processes, you will get slightly different null distributions each time you repeat the test. C) They can be used when assumptions for other tests (such as two sample t tests) are not being met. D) A randomization test is essentially the same concept as "randomization" in the context of experimental design.
D
There are many t-distributions which vary by their "degrees of freedom." When we conduct a t-test, the degrees of freedom for the null distribution is... A) the sample size. B) the t-statistic. C) the P-value. D) the sample size minus 1.
D
These are true statements about the standard error of the slope EXCEPT: A) The standard error of the slope is in the denominator of the t statistic used to test hypotheses about slopes. B) Larger sample sizes lead to smaller standard errors. C) A standard error of the slope is a standard deviation of a sampling distribution (i.e. a distribution of all the possible slopes one might obtain if you redid the study over and over again with random samples from the same population). D) The equation for the standard error of the slope has a square root in it; thus its units are square roots of the units used in the actual sample slope.
D
When looking for a difference in means between two populations, a paired study design analyzed using a paired t-test usually has more statistical power than an unpaired design analyzed using a two-sample t-test. The increase in power of the paired design is a result of the paired study design... [choose an answer to correctly complete the sentence] A) increasing the variance of the data. B) increasing the mean difference between the two groups. C) decreasing the mean difference between groups. D) reducing the effects of extraneous sources of variability to give you a smaller standard error of the difference in group means.
D
When we conduct a two-sample t-test, we assume that we have a random sample from each population, that the populations have the same variances, and... A) that the variable which we are measuring is discrete. B) that the variable follows a chi-squared distribution. C) that the variance is less than 5% of the mean D) that the variable is normally distributed
D
When we examine a normal-quantile plot to detect deviations from normality, we interpret _____________________ as an indication that the statistic that we are examining is not normally-distributed. A) tight clumping of points B) symmetry of points C) points that fall along a line D) points scattered in a way that cannot be described by a straight line
D
Background: We are testing two fertilizers by random sampling. We find that Brand X has a significant effect on plant mass. We cannot reject the hypothesis that brand Y has no effect on plant mass. Answer True or False for the following statement: Based on that information we can conclude that Brand X has a larger effect on mass than brand Y. True OR False
False
The non-parametric equivalent of a two-sample t-test that uses just the rank order of the measurements is the _____________ A) log-transformed t-test B) sign test C) Mann-Whitney U-test D) Z-test
Mann-Whitney U-test
If we want to approximate the tail probabilities for a binomial distribution for a large number of trials, we can use a... A) Z-statistic B) t-statistic C) chi-square statistic D) U-statistic
Z-statistic
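Illustration of the normal approximation: a minimal sketch that converts a binomial count into a Z-statistic using the binomial mean np and standard deviation sqrt(np(1 - p)). The numbers (1000 trials, 360 successes, null probability 0.33) are hypothetical, and no continuity correction is applied in this simple version.

```python
import math
from statistics import NormalDist

# Hypothetical large-sample binomial data (not real data).
n_trials = 1000
successes = 360
p_null = 0.33   # success probability under the null hypothesis

# Mean and standard deviation of the binomial count under the null.
expected = n_trials * p_null
sd = math.sqrt(n_trials * p_null * (1 - p_null))

# Z-statistic: how many standard deviations the count lies from its expectation.
z = (successes - expected) / sd

# Two-tailed P-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
```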
A correlation coefficient was calculated for data on two variables: the length of a butterfly wing (cm) and the length of the butterfly body (cm). Which is true about the correlation coefficient? a) It should be reported with units of cm. b) It should be reported without any units. c) It should be reported with units of cm-squared.
b
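Illustration of why r carries no units: a minimal sketch that computes the correlation coefficient for the same data expressed in cm and in mm. The wing and body lengths are hypothetical, and pearson_r is a helper written only for this example.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient from sums of products and squares."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sum_products = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_x = sum((a - x_bar) ** 2 for a in x)
    ss_y = sum((b - y_bar) ** 2 for b in y)
    return sum_products / math.sqrt(ss_x * ss_y)


# Hypothetical butterfly measurements in cm (not real data).
wing_cm = [4.1, 4.8, 5.0, 5.6, 6.2]
body_cm = [2.0, 2.3, 2.2, 2.7, 3.0]

r_cm = pearson_r(wing_cm, body_cm)

# The same measurements converted to mm: the units cancel, so r is unchanged.
r_mm = pearson_r([10 * v for v in wing_cm], [10 * v for v in body_cm])
```

The units in the numerator (cm times cm) cancel against those in the denominator, which is also why r is confined to the range from -1 to +1.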
The following are true statements about logistic regression EXCEPT: A) Each red circle in Figure 17.9-1 refers to the outcome (life or death) of a fish in the study. B) The black line in Figure 17.9-1 shows the location of the data for the guppy study. C) In logistic regression, the response variable has only two possible values (i.e. Y is either 0 or 1).
b
In a sign test, we disregard the magnitude of the differences in means between two groups, and we simply perform a _____________ test on the sign of the difference. A) binomial B) t-test C) Z-test D) U-test
binomial
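Illustration of the sign test as a binomial test: a minimal sketch that counts positive and negative differences and asks how surprising that split is if each sign has probability 0.5. The counts (12 positive, 3 negative) and the function name sign_test_p are hypothetical; ties are assumed to have been dropped beforehand.

```python
from math import comb


def sign_test_p(n_plus, n_minus):
    """Two-tailed binomial P-value for the split of signs, with p = 0.5."""
    n = n_plus + n_minus
    k = max(n_plus, n_minus)
    # Probability of a split at least as lopsided as the one observed, in one tail.
    one_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double for the two-tailed test, capped at 1.
    return min(1.0, 2 * one_tail)


# Hypothetical result: 12 positive and 3 negative differences.
p_value = sign_test_p(12, 3)
```

Only the signs enter the calculation, so the sizes of the differences (and their distribution) play no role.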
You calculate a correlation coefficient with a data set. The number you obtain could be: a) any positive number b) 0, or any positive number less than or equal to 1 c) -1 or +1 or any number in between these values
c
Which of the following is NOT true of the t distribution with 4 degrees of freedom? A) It has a mean of 0. B) The mode is 0. C) It is symmetrical. D) It is right-skewed.
it is right-skewed
If the variable that you are measuring has a right-skewed distribution (long tail to the right), what technique could you use if you wanted to apply a t-test? A) a chi-squared test B) log-transformation C) a paired design
log-transformation
Regression toward the mean is a challenging topic that can apply to: A) Observational studies B) Observational and experimental studies C) Experimental studies
observational studies
A log-transformation is often useful for creating a normally distributed sample if the distribution of the "raw" data is... A) multimodal B) invariant C) right-skewed D) symmetrical
right-skewed
We are interested in the effect of college on net worth. We identify 15 cases of twins in which one twin went to college, and the other did not. Unfortunately, the differences in net worth do not appear to follow a normal distribution, and it is not clear how we could transform the differences to make them normally-distributed. What is the best test to use? A) a two-sample t-test B) a sign test C) a Mann-Whitney U test D) a paired t-test
sign test
The null hypothesis for the Analysis of Variance is simply an extension of a null hypothesis for the _______ test. Fill in the blank. A) Chi-square goodness of fit test B) Chi-square contingency table test C) Z test D) Two sample t test
two sample t-test
When we are comparing means between two populations and our samples from each population are drawn completely independently from one another, then we should use a ... A) paired t-test B) Z-test C) binomial test D) two-sample t-test
two sample t-test
You want to know if fertilizing a pasture affects the pH of nearby ponds. You identify twenty ranches with ponds, 11 of which use fertilizer and 9 of which do not. If the pH measurements are normally distributed, what form of hypothesis test would you use to address your question? A) a sign test B) paired t-test C) two-sample t-test D) Z-test
two-sample t-test
type II error
when the null is false but is not rejected (the chance of a type II error increases as alpha, and with it the chance of a type I error, decreases; the more power, the less chance of a type II error)
type I error
when the null is true but is rejected (type I decreases as alpha decreases)
