Midterm 2 - Biology 180

Which of the following is NOT true about the Akaike Information Criterion (AIC) used by default for automated stepwise regression in R?
- The AIC balances model quality with model complexity
- The AIC depends on both the number of data points and the number of model degrees of freedom
- A smaller AIC means a better model
- A larger AIC means a better model

A larger AIC means a better model
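
As a rough illustration of why a smaller AIC is preferred, here is a minimal Python sketch based on the general definition AIC = 2k - 2 ln(L), where k is the number of fitted parameters and ln(L) is the maximized log-likelihood. The function name and the log-likelihood values are invented for illustration, and the value R reports during stepwise regression may differ from this by an additive constant.

```python
def aic(log_likelihood, n_params):
    """AIC = 2k - 2 ln(L): rewards fit, penalizes extra parameters."""
    return 2 * n_params - 2 * log_likelihood

# Two hypothetical fits to the same data (values invented for illustration).
simple_model = aic(log_likelihood=-120.0, n_params=3)
complex_model = aic(log_likelihood=-119.5, n_params=6)

# The extra parameters barely improve the fit, so the simpler model
# ends up with the smaller (better) AIC.
best = "simple" if simple_model < complex_model else "complex"
print(simple_model, complex_model, best)  # 246.0 251.0 simple
```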

Consider the following nested model: the quality of two different bakers' work is investigated by selecting two cakes made by each of them, and then taste-testing 10 slices from each cake. How is the F ratio calculated to test the null hypothesis that taste does not vary between cakes made by the same baker?
- F = Cake EMS/Slice EMS
- F = Baker EMS/Slice EMS
- F = Baker EMS/Cake EMS
- F = Slice EMS/Cake EMS

F = Cake EMS/Slice EMS

Independence is the one and only assumption underlying all general linear models.

False

It's not possible for a population to exceed its carrying capacity. T/F

False

Automated stepwise regression is guaranteed to give you the best possible model for your data.

False. Because automated stepwise regression adds or removes a single explanatory variable at a time, it might fail to include the right variables if there are strong interactions.

Which of the following is an accurate description of the R assignment operator?
- In R you must use the "<-" assignment operator since "=" will not properly assign values to variables
- The official assignment operator is "<-" but "=" will usually work
- The assignment operator in R is "assign()"
- The assignment operator in R is "=" just like in almost every other language.

The official assignment operator is "<-" but "=" will usually work

Which of the following is not a way to program in R?
- Type the program into the R script editor and then press Ctrl-R
- Edit the R program file using a plain text editor and then read the program into R using the source() command
- Type your R instructions onto a computer punched card and read the program into the computer using a card reader.
- Type the program into an editor and then cut and paste all of the instructions into the R console

Type your R instructions onto a computer punched card and read the program into the computer using a card reader.

Which of the following conditions would result in the Malthusian model predicting exponential decay in a population (i.e. the population is dying off)?

r < 0

Which of the following is not a legitimate R variable name?
- Filled_with_rain_water
- SoMuchDependsUpon
- a.red.wheelbarrow
- 7up

7up

Consider the following experiment to better understand the relationship between honey production in beehives and the amount of pollen-producing plants available to the bees in that region: The number of pollen-producing plants in the region were counted every month for 12 months, and the amount of honey produced in the beehive was also measured every month. The experiment was repeated on a total of three beehives. Which of the following statements are correct? (More than one answer may be select

2 and 4

Suppose a population originally consists of 20 individuals, and the net reproduction rate, r, is 3.0 per week. The carrying capacity of the system is K=40. What is the predicted change in the population size (delta P) over the coming week?

30

A population initially has a size of 500 individuals. The per capita birth rate (f) is 3 per year, and the per capita death rate (d) is 1 per year. What is the predicted population size after 2 years?

4,500
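
The answer of 4,500 is consistent with a discrete-time Malthusian update in which each year adds r * P individuals, with r = f - d. That reading is an assumption about the model the course uses; a continuous exponential model would give a different number. A minimal Python sketch under that assumption (the function name is hypothetical):

```python
def malthus(p0, births, deaths, steps):
    """Discrete Malthusian growth: each step adds r * P, with r = f - d."""
    r = births - deaths
    p = p0
    for _ in range(steps):
        p = p + r * p
    return p

print(malthus(500, births=3, deaths=1, steps=2))    # 4500
print(malthus(500, births=1, deaths=1.5, steps=2))  # r < 0, so the population shrinks
```

The second call uses invented rates with r < 0 to show the decay case from the earlier Malthusian question.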

Suppose a population originally consists of 20 individuals, and the net reproduction rate, r, is 3.0 per week. The carrying capacity of the system is K=40. What is the predicted population size after one week?

50
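
Both logistic answers (a change of 30 and a new size of 50) follow from one step of the discrete logistic update delta P = r * P * (1 - P/K). The sketch below assumes that form of the model; the function name is hypothetical.

```python
def logistic_step(p, r, k):
    """Return (delta_p, next_p) for one step of discrete logistic growth."""
    delta_p = r * p * (1 - p / k)
    return delta_p, p + delta_p

delta, next_size = logistic_step(p=20, r=3.0, k=40)
print(delta, next_size)  # 30.0 50.0
```

Note that the new size of 50 is above K = 40, which matches the earlier answer that a population can exceed its carrying capacity.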

If A and B are R vectors containing 10 elements each, what does the operation A*B produce?
- A 10x10 matrix
- A scalar (single) value
- Another vector with 20 elements
- Another vector with 10 elements

Another vector with 10 elements

Which of the following is not one of the possible pitfalls in using automated model selection methods?
- The fact that slightly different automated model selection methods can pick different models.
- Automated model selection methods are limited to models with just a few explanatory variables
- The temptation to just let the computer do the thinking and not consider other information relevant to the model (like whether it makes sense)
- The risk of multiplicity of p-values since many models are evalua

Automated model selection methods are limited to models with just a few explanatory variables

Consider the following nested model: the quality of two different bakers' work is investigated by selecting two cakes made by each of them, and then taste-testing 10 slices from each cake. What is the formula for the Baker EMS if σ_B^2 is the variation between bakers, σ_C^2 is the variation between cakes, and σ_E^2 is the variation between slices?
- Baker EMS = 2 σ_B^2 + 2 σ_C^2 + 10 σ_E^2
- Baker EMS = σ_B^2 + σ_C^2 + σ_E^2
- Baker EMS = 20 σ_B^2 + 10 σ_C^2

Baker EMS = 20 σ_B^2 + 10 σ_C^2 + σ_E^2
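
The coefficients 20 and 10 come from counting how many slices sit under each baker (2 cakes x 10 slices) and under each cake (10 slices). A minimal Python sketch of the expected mean squares for this balanced design, assuming bakers and cakes are both treated as random effects; the function name and the variance values are hypothetical:

```python
def nested_ems(var_baker, var_cake, var_slice,
               cakes_per_baker=2, slices_per_cake=10):
    """Expected mean squares for a balanced bakers > cakes > slices design."""
    slice_ems = var_slice
    cake_ems = slices_per_cake * var_cake + var_slice
    baker_ems = (cakes_per_baker * slices_per_cake * var_baker
                 + slices_per_cake * var_cake
                 + var_slice)
    return baker_ems, cake_ems, slice_ems

# Invented variance components with no cake-to-cake variation.
baker_ems, cake_ems, slice_ems = nested_ems(var_baker=1.0, var_cake=0.0, var_slice=2.0)
print(baker_ems, cake_ems, slice_ems)  # 22.0 2.0 2.0
```

With no variation between cakes, Cake EMS equals Slice EMS, which is why Cake EMS/Slice EMS is the ratio used to test for cake-to-cake variation.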

What does the function log(x) do in R?
- There isn't a log(x) function in R; the logarithm function is log10(x)
- Calculates the log base 10 of x
- Calculates the log base 2 of x
- Calculates the natural logarithm of x

Calculates the natural logarithm of x

Which of the following strategies is not a legitimate approach to dealing with the multiplicity of p-values in complex models?
- Combine explanatory variables
- Reduce the number of explanatory variables by combining multiple terms into a single term
- Reduce the cutoff p-value
- Eliminate data points until just a few models have low p-values

Eliminate data points until just a few models have low p-values

Which of the following is true about applying both forward and backward stepwise regression to find the best model for a data set?
- Forward and backward stepwise regression will never yield the same final model
- Forward and backward stepwise regression may or may not yield the same final model
- It's not possible to use both forward and backward stepwise regression on the same data set
- Forward and backward stepwise regression will always yield the same final model

Forward and backward stepwise regression may or may not yield the same final model

If you don't know anything about the experimental design that was used to collect data, which of the following might be an indication of nonindependence in the data?
- Having a multiple R squared value that's very close to 1.
- Having p-values that are all extremely low
- Having p-values that are all extremely high
- Having an unusually large number of residual degrees of freedom

Having an unusually large number of residual degrees of freedom

What constitutes the "best" statistical model you can build?
- It depends on your goal in building the statistical model.
- The model with the highest R^2 value without regard for the number of explanatory variables.
- There are no useful criteria for what constitutes the best model.
- The model with the fewest explanatory variables that still has a reasonable R^2 value

It depends on your goal in building the statistical model.

The model choice principle of "Economy of Variables" means:
- Hierarchies must be respected in model formulae
- Models should have the simplest parameters possible while still retaining good statistics
- Only work with orthogonally designed studies
- The number of possible models grows quickly as the number of explanatory variables increases

Models should have the simplest parameters possible while still retaining good statistics

Which of the following is a major concern in analyzing Type 4 observational datasets?
- Multiplicity of p-values
- Too many data points to handle with current computers
- Too few explanatory variables
- The requirement that all explanatory variables are orthogonal

Multiplicity of p-values

Which of the following is NOT a common cause of nonindependence?
- Nested data
- Nonorthogonality in a dataset
- Heterogeneity in a dataset
- Replicate measures on the same test subject

Nonorthogonality in a dataset

Which of the following is not true about computer programming?
- Programming involves almost any form of giving instructions to a computer
- Until the 1980s there were relatively few jobs involving computer programming
- Modern programming languages like R have simplified the process of programming
- Only people with degrees in Computer Science should try programming

Only people with degrees in Computer Science should try programming

Which of the following is NOT one of the three principles of model choice?
- Multiplicity of p-values
- Economy of variables
- Orthogonality
- Considerations of marginality

Orthogonality

Which of the following is an advantage of using the adjusted R^2 for observational data sets?
- The adjusted R^2 value is automatically diminished if the model is using up too many of the available model degrees of freedom.
- The adjusted R^2 is guaranteed to be higher than the unadjusted, multiple R^2 value.
- The adjusted R^2 ensures that the explanatory variables are orthogonal.
- The adjusted R^2 eliminates the risk of non-independence in the data set.

The adjusted R^2 value is automatically diminished if the model is using up too many of the available model degrees of freedom.

Which of the following is true about the adjusted R^2 value?
- The adjusted R^2 will always be greater than or equal to the multiple R^2.
- The adjusted R^2 can be less than or greater than the multiple R^2.
- The adjusted R^2 will be approximately equal to the multiple R^2 whenever you have very few data points.
- The adjusted R^2 will always be less than or equal to the multiple R^2.

The adjusted R^2 will always be less than or equal to the multiple R^2.
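
A quick way to see why the adjusted value can never exceed the multiple value is the usual textbook formula, adjusted = 1 - (1 - R^2)(n - 1)/(n - p - 1), for n data points and p explanatory terms. The Python sketch below assumes that formula; the function name and the example numbers are hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n data points and p explanatory terms (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

r2 = 0.80
few_terms = adjusted_r2(r2, n=30, p=2)
many_terms = adjusted_r2(r2, n=30, p=10)

# Same multiple R^2, but using more model degrees of freedom lowers the adjusted value.
print(many_terms < few_terms <= r2)  # True
```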

What is the "carrying capacity" of a system?
- The maximum number of individuals that can be sustained in the system
- The average weight of the adults in a system divided by the average weight of newborns in the system
- The maximum amount of food that each individual in a system can carry (on average)

The maximum number of individuals that can be sustained in the system

Which of the following is true for the random effects in ANOVA models?
- Ca content in multiple leaves from the same plant have fixed effects in a model with data from different plants.
- In the nested ANOVA models with random effects the denominator for the F-ratio is always the error variance.
- Variance of the random effects is reflected in the expected mean squares in the nested models.
- Random effects come from totally independent data.

Variance of the random effects is reflected in the expected mean squares in the nested models.

Which of the following is the best strategy for checking for independence in a data set?
- Check whether the data was collected on or near the 4th of July.
- Graph the data and look for relationships in the residuals
- Watch out for issues in the experimental design that might indicate nonindependence of the data.
- Check the p-value for the independence term in the anova output

Watch out for issues in the experimental design that might indicate nonindependence of the data.

Which of the following is a risk of including higher polynomial powers of the continuous explanatory variables in your models?
- There is no risk; more polynomial powers always make a better model
- It's a waste of time since adding polynomial powers never yields a better model.
- Including more polynomial powers in the model makes it very slow to calculate the fitted parameters
- You may overfit your data, which can lead to a perfect but meaningless fit.

You may overfit your data, which can lead to a perfect but meaningless fit.

Which one of the following models violates the principle of consideration of marginality?
- Y~A+B+C+A:B+B:C+A:C+A:B:C
- Y~A+B+A:B
- Y~A+B+A:B+B:C+A:C+A:B:C
- Y~A+B+C+A:B+A:C

Y~A+B+A:B+B:C+A:C+A:B:C
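
The model above violates marginality because it contains interactions involving C (B:C, A:C, A:B:C) without the main effect C. One way to check this mechanically is sketched below in Python. The function name is hypothetical, and the parser is a simplification that only understands terms joined by "+" and ":" (not R's "*" shorthand).

```python
from itertools import combinations

def missing_lower_order_terms(formula):
    """List lower-order terms that an R-style formula string omits."""
    rhs = formula.split("~", 1)[1]
    terms = {frozenset(t.strip().split(":")) for t in rhs.split("+")}
    missing = set()
    for term in terms:
        # Every non-empty proper subset of an interaction must also be a term.
        for size in range(1, len(term)):
            for subset in combinations(sorted(term), size):
                if frozenset(subset) not in terms:
                    missing.add(":".join(subset))
    return sorted(missing)

print(missing_lower_order_terms("Y~A+B+A:B+B:C+A:C+A:B:C"))  # ['C']
print(missing_lower_order_terms("Y~A+B+C+A:B+A:C"))          # []
```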

