CSU FW 370 Exam
cluster sampling
-In a first step, primary sampling areas are located at random within an overall study area. -In a second step, secondary sampling units are located at random within each primary sampling area. -Each group of secondary sampling units is called a cluster
Yi = b0 + b1Xi + Ei
-Yi - dependent variable -b0 - intercept coefficient; the estimated average value of y when the value of x is 0 -b1 - slope coefficient; the estimated change in the average value of y as a result of a 1 unit change in x -Xi - independent variable -Ei - error/residuals
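As a minimal sketch of how b0 and b1 can be estimated by least squares (the data values are made up for illustration, and `fit_line` is a hypothetical helper name, not something from the course):

```python
def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares of x around their means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope: change in average y per 1-unit change in x
    b0 = y_bar - b1 * x_bar   # intercept: estimated average y when x is 0
    return b0, b1

# Made-up example data
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 7.1, 8.7]

b0, b1 = fit_line(x, y)
# Residuals Ei = observed Yi minus fitted value
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(round(b0, 2), round(b1, 2))
```

With an intercept in the model, the residuals sum to zero (up to rounding).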
one-way ANOVA
-an extension of the t-test to 3 or more samples -testing for group differences -one independent variable, with more than two "levels", to explain variability in the dependent variable -e.g., habitat quality (low, medium, high) impacts on sage grouse nesting success -is the same as a t-test but more than 2 groups -H0: no difference in means among groups Ha: difference in means between groups
ANOVA
-analysis of variance - statistical technique for comparing means across multiple (usually 3, or more) independent treatments/groups -partition the total variation in the data into: variability between groups (treatment) and variability within groups (error)
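The partition of total variation can be checked on a toy data set. The sketch below uses made-up values for three hypothetical habitat-quality groups and the standard sums-of-squares definitions:

```python
# Made-up observations for three hypothetical groups
groups = {
    "low": [2, 3, 4],
    "medium": [4, 5, 6],
    "high": [7, 8, 9],
}

all_values = [v for g in groups.values() for v in g]
N = len(all_values)   # total number of observations
k = len(groups)       # number of groups
grand_mean = sum(all_values) / N

# Between-group variation: group means around the grand mean
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
# Within-group variation: observations around their own group mean
ssw = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
# Total variation: observations around the grand mean
sst = sum((v - grand_mean) ** 2 for v in all_values)

msb = ssb / (k - 1)
msw = ssw / (N - k)
f_stat = msb / msw
print(ssb, ssw, sst, f_stat)
```

Here SST equals SSB + SSW, which is the partition described above.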
normal distribution
-bell-shaped, symmetric, and asymptotic to the x-axis -one standard deviation (SD) from the mean in both directions covers about 68% of the distribution; 2 SDs cover about 95%; 3 SDs cover about 99.7%
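The coverage within k standard deviations of the mean can be checked numerically. This is a small sketch using the standard library's error function; `prob_within` is a hypothetical helper name:

```python
import math

def prob_within(k):
    """P(|Z| <= k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints the percentage of the distribution within k SDs of the mean
    print(k, round(prob_within(k) * 100, 2))
# 1 68.27
# 2 95.45
# 3 99.73
```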
advantages of observational studies and natural experiments
-broad temporal scale (tough to maintain field experiment long enough to catch dynamics; natural experiment - 1000s to millions of years) -broad spatial scale (field experiments limited to area you can enclose) -appropriate for large scale ecological and evolutionary questions
advantages of manipulative experiment in the field
-compared to lab experiments (more realism; larger scale and broader scope) -compared to true natural experiments (greater control)
advantages of manipulative experiment in the lab
-complete control -can randomize and match all other conditions exactly -replication easy (usually) -only way to fully examine cause/effect hypotheses
observational study
-consist of measurements or observations of biological systems in the absence of manipulations by the investigator
continuous probability distributions
-continuous probability distribution functions can take many forms; these include continuous uniform, normal, exponential, Student's t, Wilcoxon, etc.
probability distributions
-discrete (binomial) distributions (e.g., clutch size) - a discrete distribution can take on a finite or countable number of possible outcomes (e.g., 1, 2, 3, 4, 5, 6...); includes discrete uniform, bernoulli, binomial, poisson, etc. -continuous (normal) distributions (e.g., birth weight)
quantitative (numeric) data
-discrete data - the variable can only take integer values (0, 1, 2, etc.); ex. number of eggs in a nest -continuous data - any real number (often within a certain range); ex. body weight
cluster sampling
-divide population into groups (clusters), random sample of the clusters, sample all units in cluster -sampling clusters are helpful when a random sample is not possible because some units are too far away, too expensive, inaccessible, etc.
fixed effect - two-way ANOVA
-extension of one-way ANOVA with 2 or more independent variables; also called a factorial design -e.g., does bird diversity vary by habitat and elevation in Poudre Canyon
two-way ANOVA
-focuses on the interaction of factors -does the effect due to one factor change as the level of another factor changes? -e.g., do gender and smoking status affect your probability of getting lung cancer -allows us to test for interactions
manipulative experiments
-lab experiments (not conducted in "natural" setting): directly manipulate the predictor variable(s) in a laboratory setting -field experiments (conducted in a "natural" setting): directly manipulate the predictor variable(s) in a "natural" setting -manipulative experiment - requires manipulation of the predictor variable(s) in the lab or in the field
what 3 things will increase the value of t?
-large difference between x1 and x2 -small standard deviation (s1 and s2) -large sample size (n1 and n2)
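These three effects can be seen directly in the formula. The sketch below assumes the unpooled two-sample form t = (x1 - x2) / sqrt(s1^2/n1 + s2^2/n2), since the card gives no formula; the numbers are made up and `t_stat` is a hypothetical helper name:

```python
import math

def t_stat(mean1, mean2, s1, s2, n1, n2):
    """Unpooled two-sample t-statistic (assumed form, see lead-in)."""
    return (mean1 - mean2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

base = t_stat(10, 8, 3, 3, 10, 10)
bigger_diff = t_stat(12, 8, 3, 3, 10, 10)     # larger difference between means
smaller_sd = t_stat(10, 8, 1.5, 1.5, 10, 10)  # smaller standard deviations
bigger_n = t_stat(10, 8, 3, 3, 40, 40)        # larger sample sizes

print(round(base, 2), round(bigger_diff, 2), round(smaller_sd, 2), round(bigger_n, 2))
```

Each of the three changes gives a larger t than the baseline.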
limitations of observational studies and natural experiments
-limited site matching because treatments provided by nature -limited replication (nature) -correlation, not causation
parametric correlation (Pearson's r)
-measures "linear" associations -the value of r is independent of the units used to measure the variables -r^2 (not r itself) tells us what proportion of variation in Y is explained by the linear relationship with X -the correlation coefficient r takes on values between -1 and +1 (-1: perfect negative linear relationship; 0: no linear relationship; +1: perfect positive linear relationship) -hypotheses - H0: r = 0, no correlation; Ha: r not equal to 0, correlation exists
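A minimal sketch of the calculation, showing that r does not change when a variable is re-expressed in different units. The measurements are made up and `pearson_r` is a hypothetical helper name:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up measurements
wing_mm = [60, 62, 65, 68, 70]
mass_g = [10.0, 10.8, 11.5, 12.9, 13.1]

r_mm = pearson_r(wing_mm, mass_g)
# Same wing lengths expressed in centimeters
wing_cm = [w / 10 for w in wing_mm]
r_cm = pearson_r(wing_cm, mass_g)

print(round(r_mm, 3), round(r_cm, 3))  # the two values agree
```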
observational studies & natural experiments
-natural environmental variation or large-scale perturbations -e.g., wildfires and other natural disasters, disease outbreaks -compare groups before and after or with and without the perturbation
qualitative (non-numeric) data
-nominal data - unordered categories; ex. blood group, eye color, breeding status -ordinal or ordered data - ordered categories; ex. behavior (subordinate, neutral, aggressive), attitudes (good-moderate-bad)
the Spearman rank correlation
-non-parametric alternative -the null hypothesis is that the ranks of one variable do not co-vary with the ranks of the other variable
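One common way to compute the Spearman coefficient is to rank each variable (ties get the average rank) and then take the Pearson correlation of the ranks. A sketch under that assumption, with hypothetical helper names and made-up data:

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Monotonic but non-linear made-up data
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(spearman_rho(x, y), 3), round(pearson_r(x, y), 3))
```

Because the relationship is perfectly monotonic, the rank correlation is 1 even though the Pearson correlation on the raw values is below 1.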
limitations of manipulative experiment in the lab
-often lacks realism -limited spatio-temporal scale -limited applicability to all species (ok for small species, but difficult to house enough animals for long enough to mimic nature)
limitations of manipulative experiment in the field
-replication is difficult -site matching is imperfect -limited spatiotemporal scale
systematic sampling
-samples selected at regular intervals throughout the study area -advantages: convenient, easy, and provides full coverage -problems: if organisms are distributed uniformly (at regular intervals), then sampling units could consistently hit or miss them
adaptive cluster sampling
-sampling rare resources -sample the stratum containing the rare resources much more intensely -selection of sample units is based on the results of the previous sample -helpful for spatially clustered (and rare) organisms
criteria for causality
-strength of the association -consistency of the association -known mechanism that can explain the relationship -neither correlation nor regression can indicate causation, but a proper experimental design can help us get close (limiting confounding factors, control the "environment", can help imply causation)
stratified random sampling
-the population is divided into subpopulations (strata), and a simple random or systematic sample of units is taken from each -benefits: enables separate estimates of mean and variance in different strata (e.g., population density in different habitats); allows for a different number of sample units in different strata
median
-the 'middle value' of the distribution of x -50th percentile of the distribution of x
binomial distribution
-the Bernoulli process - outcome classified as a success or a failure (2 possible outcomes: 0/1) (e.g., heads/tails, right/wrong, functional/defective, alive/dead) -the Binomial process (extension of Bernoulli process to n trials) - number of successes (X) in n Bernoulli trials is called a binomial random variable
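For a binomial random variable X with n trials and success probability p, the standard probability mass function is P(X = x) = C(n, x) p^x (1 - p)^(n - x). A short sketch (the function name and the example values are illustrative):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Example: 3 Bernoulli trials with success probability 0.5
n, p = 3, 0.5
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
print(pmf)  # [0.125, 0.375, 0.375, 0.125]
```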
coefficient of determination (R squared)
-the portion of the total variation in the dependent variable that is explained by the variation in the independent variable -R^2 = SSR/SST; where 0 <= R^2 <= 1 -R^2 = 1: perfect linear relationship between x and y; 100% of the variation in y is explained by the variation in x -0 < R^2 < 1: anywhere from strong to weak linear relationship between x and y; some, but not all, of the variation in y can be explained by the variation in x -R^2 = 0: no linear relationship between x and y; values of y do not depend on values of x; none of the variation in y is explained by the variation in x
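A worked sketch of R^2 = SSR/SST for a simple least-squares line, using made-up data. It also checks the decomposition SST = SSR + SSE:

```python
# Made-up example data
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 7.1, 8.7]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual (error)

r_squared = ssr / sst
print(round(r_squared, 4))
```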
sample mean
-the sample mean is the arithmetic average of the data -it can be calculated by summing all of the data values and dividing the sum by the total sample size
control vs. treatment
-the treatment group (also called the experimental group) receives the treatment whose effect the researcher is interested in -the control group receives either no treatment, a standard treatment whose effect is already known, or a placebo (a fake treatment to control for placebo effect)
equality (or "homogeneity") of variances (i.e., homoscedasticity)
-the variance across groups should be similar -note that the antonym of homoscedasticity is heteroscedasticity
3 elements of experimental design
1) randomization (plots selected randomly) 2) replication (5 plots per treatment level) 3) control
hypothetico-deductive process
1) recognize a problem 2) integrate and synthesize relevant outside info 3) establish hypotheses 4) articulate predictions 5) test predictions with experiments /studies 6) collect, organize, and summarize data 7) analyze the data 8) interpret the results 9) communicate your findings
Based on the ANOVA table provided in question 18, the sums of squares within groups equals:
200
Based on the ANOVA table provided in question 18, the F-statistic equals:
7.2
The table below shows partial results of an ANOVA testing for differences in small mammal species richness among several different habitat treatment types, where the number of observations is denoted as N, and the number of treatments is denoted as k.
                          Df        SS     MS      F-stat
B/t group (treatment)     k-1=3     270    ?       ?
W/i group (residuals)     N-k=16    ?      12.5
Total                     N-1=19
The mean sums of squares between groups equals?
90
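The missing entries of this ANOVA table follow from MS = SS/df and F = MSB/MSW. The arithmetic, using only the values given in the table, can be sketched as:

```python
# Values given in the ANOVA table
df_between = 3      # k - 1
df_within = 16      # N - k
ss_between = 270
ms_within = 12.5

ms_between = ss_between / df_between   # mean square between groups
ss_within = ms_within * df_within      # sums of squares within groups
f_stat = ms_between / ms_within        # observed F-statistic
ss_total = ss_between + ss_within      # total sums of squares

print(ms_between, ss_within, round(f_stat, 1), ss_total)
# 90.0 200.0 7.2 470.0
```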
chi square random variables
= sums of squared normal random variables
Referring back to question 16, what would be the most appropriate statistical test for analyzing these data? a) a one-tailed t-test b) a two-tailed t-test c) a two-way ANOVA
A) a one-tailed t-test
Deduction or Induction? 1. All of the ants in the Harvard Forest belong to the genus Myrmica 2. I sampled this particular ant in the Harvard Forest 3. This ant is in the genus Myrmica
Deduction
True or False: Sampling bias induces a lack of precision.
False
systematic sampling
In systematic sampling, a spatial grid is used to generate equal sized sampling regions. A sample is then taken within each grid cell, either aligned to the center of the cell or randomly placed
Deduction or Induction? 1. All 25 of these ants are in the genus Myrmica 2. All 25 of these ants were collected in the Harvard Forest 3. All of the ants in the Harvard Forest are in the genus Myrmica
Induction
Based on the design described above (question 9), which is your "control"?
Island A
SSB mean square
MSB = SSB/(k-1)
F-statistic observed
MSB/MSW
SSW mean square
MSW = SSW/(N-k)
Does least square regression systematically imply causation between the independent and dependent variables?
No
Based on the design described above (question 9), what kind of general experimental design is this? a) manipulative lab experiment b) manipulative field experiment c) observational study/natural experiment d) none of the above
Observational Study/Natural Experiment
Your lab has been conducting a long-term study on Kirtland's warbler, a rare songbird native to Michigan. Brood parasitism by brown-headed cowbirds has been highlighted as a potential cause for declining Kirtland's warbler populations. You sample 20 fledglings from parasitized nests and 20 fledglings from non-parasitized nests and measure their body condition (i.e., weight-to-culmen ratio, the larger and the better their body condition). H0: the mean body condition is the same for fledglings from parasitized and non-parasitized nests. The results are from a two sample t-test of these data with the output based on an analysis in R. Two sample t-test: t = -3.3511, df = 37.517, p-value = 0.001846 Based on the two-tailed test results in the table above, what should we conclude from the test results (assume alpha = 0.05)?
Reject the Null hypothesis
explained and residual variation
SST = SSR + SSE
True or False: A manipulative experiment is a direct manipulation of the independent (or explanatory) variable(s) in a lab or field setting, leading to causality between the dependent and independent variables involved.
True
True or False: Replicates are multiple observations within the same treatment group that help improve precision in the dependent variable of interest.
True
True or False: Sampling error decreases as sample size increases.
True
True or False: The t-statistic is the ratio of the variability between groups to the variability within groups, which we compare to the t-critical value.
True
sample
a collection of subjects selected within the study population; a sample should be representative of the target population
level
a particular value / state of a factor (e.g., hot, cold)
dataset
a set of values for all variables of interest measured across all individuals in the study
prediction
a statement about what you expect to find from your experiment
factor
a variable of interest (e.g., temperature)
The Arkansas River, near Leadville, CO, has been contaminated by heavy metals from mining activities for > 150 years. If organisms in the Arkansas River have adapted to mining pollution, we would expect them to be more tolerant of metals than organisms from unpolluted streams. The data reported here show results of microcosm experiments that compared effects of heavy metals (controls vs. treatment) on mayfly numbers for animals collected from the Cache la Poudre River (unpolluted stream) and the Arkansas River (metal polluted stream). Clements 1999. Ecological Applications 9:1073-1084. Which of the following is the dependent variable? a) mayfly numbers b) heavy metals (control vs. treatment) c) stream type (polluted vs. unpolluted)
a) Mayfly numbers
Defined SSR (Regression Sums of Squares): a) SSR measures the variability attributable to the relationship between the dependent and independent variable within a regression framework b) SSR measures the variability attributable to factors other than the relationship between the dependent and independent variable within a regression framework c) none of the above d) all of the above
a) SSR measures the variability attributable to the relationship between the dependent and independent variable within a regression framework
Within what range do the possible values of the Pearson Correlation Coefficient (r) fall? a) [-1; +1] b) [0, +1] c) [-1; 0] d) none of the above
a) [-1; +1]
If we wanted to test whether mean body condition is less in fledglings from parasitized nests than in fledglings from non-parasitized nests, what test could we conduct? a) a one-tailed t-test b) a two-tailed t-test c) a one-way ANOVA
a) a one-tailed t-test
In an ANOVA test, the null hypothesis generally states that: a) all group means are equal b) all group means are different c) none of the above
a) all group means are equal
Based on the design described above (question 9), which of the following could be considered confounding factors? a) bird predation b) slope c) elevation d) aspect e) field techniques
a) bird predation
As participation in outdoor recreational activities escalates, land managers struggle to develop management policies that ensure coexistence of wildlife with recreation. Miller et al. (2001) measured responses of mule deer (Odocoileus hemionus) to one of 3 treatments: pedestrian on trail, pedestrian off trail, and pedestrian off trail accompanied by a dog. If animals were disturbed (i.e., moved away from the treatment), the researchers measured flush distance (the distance between the disturbance and the animal when it flushed). Miller et al. 2001. Wildlife Society Bulletin 29:124-132. Which of the following is the dependent variable? a) flush distance b) pedestrian on trail c) pedestrian off trail d) pedestrian off trail accompanied by a dog
a) flush distance
A group of wildlife biologists working for the Department of Natural Resources want to test whether habitat improvements can help increase mule deer abundance. They conduct a field experiment in northern Colorado and estimate mule deer abundance by sampling 10 plots (200 hectares each), which are randomly assigned to two treatments: control (5 plots) and nutrient addition (5 plots). Note that the treatments are applied from fall 2019 to fall 2020, with counts conducted in the fall of 2020. What is the dependent variable? a) mule deer counts b) treatment effect (control vs. nutrient supplementation) c) both mule deer counts or the treatment effect could work as dependent variables
a) mule deer counts
Florida grasshopper sparrow habitat quality: Populations of the Florida grasshopper sparrow (Ammodramus savannarum floridanus) are small and declining. To better understand habitat requirements of this species, Shriver and Vickery (2001) measured percent grass cover within territories where successful breeding occurred and within adjacent territories where no birds were breeding. (Reference: Shriver and Vickery 2001. Journal of Wildlife Management 65:470-475.) Which of the following is the dependent variable? a) percent grass cover b) occurrence/absence of successful breeding birds
a) percent grass cover
Referring to the previous question, what would be an appropriate test to investigate the effect of the independent variable on the dependent variable? a) t-test b) one-way ANOVA c) least square regression
a) t-test
Which of the following is not a discrete probability distribution: a) the normal distribution b) the binomial distribution c) the poisson distribution
a) the normal distribution
sample space
all possible outcomes for that set of trials (ex. female has 3 pups, what is the probability of having 3 female pups)
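The 3-pup example can be enumerated directly. The sketch assumes each pup is independently female or male with equal probability, which the card does not state:

```python
from itertools import product

# Sample space: every possible sex sequence for 3 pups (F = female, M = male)
sample_space = list(product("FM", repeat=3))

# Event of interest: all three pups are female
all_female = [outcome for outcome in sample_space if outcome == ("F", "F", "F")]

# With equally likely outcomes (an assumption), probability = favorable / total
p_all_female = len(all_female) / len(sample_space)
print(len(sample_space), p_all_female)  # 8 0.125
```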
weighted mean
allows weighting data points by importance
population
an aggregate of subjects, individuals, animals, plants, cars, things; the population is usually what we wish to describe and draw conclusion about
mean
arithmetic average of the data
The coefficient of determination R2 = (Notes: SST stands for Total Sums of Squares; SSR stands for Regression Sums of Squares; SSE stands for Error Sums of Squares) a) SSE/SST b) SSR/SST c) SSE/SSR
b) SSR/SST
The coefficient of determination R2 ranges from: a) [-1; +1] b) [0; +1] c) [-1; 0]
b) [0; +1]
Same example (lizards and spiders). What could ruin your experiment? a) replicates b) confounding factors c) controls d) independence of observations e) all of the above f) none of the above
b) confounding factors
Less than 25% of the bottomland forest in the Mississippi Alluvial Valley remains today (Twedt and Loesch 1999). If managed appropriately, agroforests (i.e., forests that are managed for timber harvest) can supplement these remaining bottomland forests and provide habitat for birds. Previous investigations by Tomlinson (1977) and Twedt et al. (1999) found that the number of bird species was significantly lower in young agroforests than in mature bottomland forest. The project described here examined the relationship between the age of a parcel of agroforest and the diversity of bird species (species/13.5 ha). Twedt et al. 1999. Forest Ecology and Management 123:261-274. Which of the following is the dependent variable? a) age of a parcel b) diversity of bird species
b) diversity of bird species
Say you don't have enough resources to run the experiment above properly, you decide to visit different islands where lizard densities are known to vary and where the same communities of both spiders and lizards exist. You select 4 islands: -Island A: no lizard -Island B: low density of lizards -Island C: medium density of lizards -Island D: high density of lizards Within each island, you randomly select 10 plots of 1 meter x 1 meter where you measure spider density. You re-sample each plot three times: at the beginning, in the middle, and at the end of the field season that spans 3 months. Plots are selected so that they are similar in slope, aspect, and elevation. Plots are sampled at about the same time of day using consistent field techniques. What are your treatments? a) spider density levels b) lizard density levels c) the plots d) none of the above
b) lizard density levels
Referring to the previous question, what would be an appropriate test to investigate the effect of the independent variable on the dependent variable? a) t-test b) one-way ANOVA c) two-way ANOVA d) least square regression
b) one-way ANOVA
Based on the design described above (question 9), what are our spatial replicates within each treatment (as opposed to temporal replicates)? a) islands b) plots c) visits to each plot d) none of the above
b) plots
Say that, like Spiller and Schoener 1998 (see G&E textbook), you are interested in addressing the potential impact of lizard predation on spider densities. You set up an experiment with multiple enclosures where lizard density has been manipulated, and where spider density is measured after one day of predator exposure. Which of the following is true: a) lizard density is the dependent variable (i.e., the response) and spider density is the independent variable (i.e., the explanatory variable) b) spider density is the dependent variable (i.e., the response) and lizard density is the independent variable (i.e., the explanatory variable) c) lizard density could serve as either the dependent or independent variable d) spider density could serve as either the dependent or independent variable
b) spider density is the dependent variable (i.e., the response) and lizard density is the independent variable (i.e., the explanatory variable)
Which of the following is an appropriate statistical model to test for a relationship between a continuous dependent variable and a categorical independent variable: a) correlation or regression b) t-test or ANOVA c) none of the above
b) t-test or ANOVA
Creel et al. (2002) found that wolves in Yellowstone that are exposed to snowmobiles have 872 ng GC/g in their blood, whereas wolves that are not exposed to snowmobiles have 1468 ng GC/g (GC stands for glucocorticoid hormones). They propose to conduct a two-tailed t-test to address the scientific hypothesis that snowmobiles may be responsible for the observed increased stress levels in the group of wolves that is exposed to that disturbance. Accordingly, they formulate their null and alternative hypotheses: (H0): The difference in mean GC between groups is no greater than what we would expect to find by chance alone. What would be an appropriate alternative hypothesis (HA)? a) the presence of snowmobiles does not explain the difference between GC levels across the two groups b) the difference in GC levels between the 2 groups is too large to be accounted for by chance alone and could be due to the presence of snowmobiles c) the difference in GC levels between the 2 groups has nothing to do with the presence of snowmobiles
b) the difference in GC levels between the 2 groups is too large to be accounted for by chance alone and could be due to the presence of snowmobiles
Referring to the previous question, what is your dependent variable? a) barred owls: presence/absence b) timber harvest: control/treatment c) Northern spotted owl reproductive success
c) Northern spotted owl reproductive success
Say you are interested in the probability of a breeding success in black legged kittiwake breeding pairs. Breeding success can be categorized as either a failure (none of the chicks survive to independence) or a success (at least one chick successfully fledges). You sample a total of 30 nests at the end of the breeding season and classify each nest as a success or a failure. Which probability distribution will most likely fit the data you have collected? a) an exponential distribution b) a normal distribution c) a binomial distribution d) a poisson distribution
c) a binomial distribution
You are interested in whether different fishing strategies affect fish foraging success after release. You place fish (n=10 per treatment) captured via fly rod, lure, electroshocker, and seine net in a large tank and time how long it takes them to consume insect prey. What would be the most appropriate statistical model to test for the effect of fishing method on insect consumption? a) a one-tailed t-test b) a two-tailed t-test c) a one-way ANOVA
c) a one-way ANOVA
How can one decide on whether to reject, or fail to reject, the Null hypothesis (H0): a) compare the p-value to significant level alpha b) compare a test statistic to a critical value c) all of the above d) none of the above
c) all of the above
The scientific method is: a) a technique used to decide among hypotheses on the basis of observations and predictions b) a technique that involves both deduction and induction c) all of the above d) none of the above
c) all of the above
Which of the following is an appropriate statistical model to test for a relationship between a continuous dependent variable and a continuous independent variable: a) correlation b) least square regression c) all of the above d) none of the above
c) all of the above
Based on the design described above (question 9), which measures of spider density may be pseudoreplicated? a) measurements taken from the 10 plots within the same island b) measurements taken from the 40 plots across all islands c) measurements taken from a given plot at 3 points in time (early, middle, late in the field season) d) there are no pseudoreplicates in this design
c) measurements taken from a given plot at 3 points in time (early, middle, late in the field season)
Referring to the previous question, which element(s) of a proper experimental design have not been addressed? a) presence of a control b) replication c) randomization d) all of the elements of a proper experimental design have been considered
c) randomization
Which of the following is a continuous probability distribution: a) the binomial distribution b) the poisson distribution c) the exponential distribution
c) the exponential distribution
Imagine you work for USFWS and a lumber company would like to harvest in northwestern California. You are concerned about the effects of timber harvest on northern spotted owl reproduction, and are also worried about how that might facilitate barred owl invasion (they compete with northern spotted owls). You decide to test the following null hypothesis: H0 = Northern spotted owl nest success is unaffected by the timber treatment and barred owl occurrence. You set up an experiment where 20 plots are visited to establish northern spotted owl reproductive success. 10 plots are controls where timber harvest is excluded, 10 plots are selected within areas where timber harvest is allowed. Within the control and the treatment, 5 plots are selected in areas where barred owls are known to co-occur and 5 plots are selected in areas that barred owls don't utilize. Which of the following statistical models would you choose to test H0? a) t-test b) one-way ANOVA c) two-way ANOVA d) correlation e) least square regression
c) two-way ANOVA
Referring to the previous question, what would be an appropriate test to investigate the effect of the independent variables on the dependent variable? a) t-test b) one-way ANOVA c) two-way ANOVA d) least square regression
c) two-way ANOVA
two-sample paired t-test
compare means taken from same sample two times; also called repeated measures t-test (ex. bird abundance before and after burn)
one-sample t-test
compare population mean and some fixed value (ex. marmot emergence is less than it was reported 50 years ago)
two-sample unpaired t-test
compare two population means (ex. mean heights of trees in burned vs. unburned forests)
conditional probabilities
conditional probability of an event, B, in relation to another event, A, is the probability that B occurs given that A has already occurred
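In symbols, P(B | A) = P(A and B) / P(A). A small sketch with a hypothetical litter of 3 pups, assuming each pup is independently female or male with equal probability: A is "the first pup is female" and B is "all three pups are female".

```python
from fractions import Fraction
from itertools import product

# Equally likely outcomes for 3 pups (assumption: F and M equally likely)
space = list(product("FM", repeat=3))

def prob(event):
    """Probability of an event, given as a true/false function of an outcome."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def a(outcome):
    return outcome[0] == "F"               # A: first pup is female

def b(outcome):
    return all(s == "F" for s in outcome)  # B: all three pups are female

p_a = prob(a)
p_a_and_b = prob(lambda o: a(o) and b(o))
p_b_given_a = p_a_and_b / p_a

print(p_a, p_a_and_b, p_b_given_a)  # 1/2 1/8 1/4
```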
inference: linear relationship
correlation (lack of predictive power); simple linear regression (predictive power)
According to Gotelli and Ellison, what distinguishes the hypothetico-deductive process from other types of science (i.e., descriptive, induction, deduction)? a) falsification b) the development of competing hypotheses c) repeated attempts to falsify original hypotheses d) all of the above e) none of the above
d) all of the above
Referring to the previous question, what would be an appropriate test to investigate the effect of the independent variable on the dependent variable? (Note that the age of the parcel of agroforest is a continuous variable that ranges between 0 and 20 years). a) t-test b) one-way ANOVA c) two-way ANOVA d) least square regression
d) least square regression
descriptive statistics
describes patterns in population (e.g., means and standard deviation)
SST df
df = N-1
SSW df
df = N-k
SSB df
df = k-1
stratified sampling
divides the population into groups called strata For instance, the population might be separated into habitat types. A sample is taken from each of these strata using either random, systematic, or convenience sampling
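A minimal sketch of stratified random sampling, assuming three hypothetical habitat strata with made-up plot IDs and made-up per-stratum sample sizes:

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical strata: plot IDs grouped by habitat type
strata = {
    "grassland": list(range(1, 61)),
    "forest": list(range(61, 91)),
    "wetland": list(range(91, 101)),
}

# Number of plots to draw from each stratum (can differ among strata)
n_per_stratum = {"grassland": 6, "forest": 3, "wetland": 2}

# Draw a simple random sample, without replacement, within each stratum
stratified_sample = {
    name: rng.sample(units, n_per_stratum[name]) for name, units in strata.items()
}
print(stratified_sample)
```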
simple random sampling
each sampling unit in the population has an equal chance of being selected
SSE
error sum of squares; variation attributable to factors other than the relationship between x and y
simple random sampling (SRS)
every possible sample unit (or population) in the study area has an equal chance of being sampled during each sampling event
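In code, a simple random sample without replacement can be sketched with the standard library. The 100 plot IDs and the sample size of 10 are made up:

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 100 numbered plots in the study area
plots = list(range(1, 101))

# Every plot has the same chance of ending up in the sample
srs = rng.sample(plots, 10)
print(sorted(srs))
```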
confounding factors
example: do snowmobiles impact wolves negatively? suppose the group of wolves exposed to snowmobiles had also been chased by hunting dogs, how do you know the stress response is due to snowmobile or dogs? In this case, the treatment effect is confounded with other differences between the control and treatment groups (exposure to hunting dogs) that are potentially related to stress levels
fixed factor
finite possible levels for the factor (e.g., hot vs. cold)
geometric mean
for average rates of change, growth, or ratios; popular in economics and population ecology
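As an illustration with made-up annual population growth rates, the geometric mean gives the constant rate that would produce the same overall change. `geometric_mean` here is a hand-written helper:

```python
import math

def geometric_mean(values):
    """Geometric mean of positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Made-up annual growth rates (1.10 = +10%, 0.80 = -20%, 1.25 = +25%)
growth = [1.10, 0.80, 1.25]

gm = geometric_mean(growth)
am = sum(growth) / len(growth)

# Applying gm for 3 years reproduces the overall change 1.10 * 0.80 * 1.25
print(round(gm, 4), round(am, 4))
```

The arithmetic mean of these rates is larger than the geometric mean and would overstate the overall growth.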
statistical models for categorical (discrete) dependent variable and continuous independent variable
generalized linear model (e.g., logistic regression)
precision
how close are repeated measures to each other
accuracy
how close is the measured value to the true value
discrete uniform
if the random variable X assumes the values x1, x2, ..., xk with equal probabilities (ex. when you roll a die, each element of the sample space
one-tailed test (right tail)
if the t-calculated is greater than the t-critical, we reject the null
one-tailed test (left tail)
if the t-calculated is less than the t-critical, we reject the null
MANOVA
if there is more than one dependent (response) variable (e.g., pelvis width and body mass), you can test them simultaneously using multivariate analysis of variance
one-tailed
indicates directionality; H0: Mx = Mo; Ha: Mx > Mo (right tail) or Mx < Mo (left tail)
statistical models for continuous dependent variable and continuous independent variable
linear or multiple regression
inferential statistics
make inference about populations (e.g., are these 2 populations different? t-test; was the treatment effect significant? ANOVA; is there a relationship between the response and the explanatory variables? regression)
sample unit or observation
measurement unit (e.g., subjects, individuals, animals, plants, cars, things)
correlation analysis
measures the strength and direction of the linear relationship between 2 variables
pseudo-replication
multiple observations within a single unit are pseudo-replicates of one another (they are not independent of one another)
error
natural variation WITHIN a population
null hypothesis
no difference between compared groups
judgment sample
obtained at the discretion of someone who is familiar with the relevant characteristics of a population
inference: compare more than 2 means, 1 factor/treatment (multiple levels)
one-way ANOVA
regression analysis
prediction and estimation of a dependent variable based on the values of an independent variable
independent variable
predictor or explanatory variable, x-axis
type-II error
failing to reject the null hypothesis although it is false (a false negative); its probability is beta
type-I error
rejecting the null hypothesis although it is true (a false positive); its probability is alpha
variable
qualitative or quantitative, measured or recorded for each subject in the sample (e.g., age, sex, height, weight)
Poisson distribution
random variable = count data / bounded by 0, no upper limit / one-parameter distribution (ex. in ecology: abundance estimation, based on counts)
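A short sketch of the Poisson probability mass function, P(X = k) = e^(-lambda) * lambda^k / k!. The rate `lam = 3.0` is an arbitrary illustrative value, for example an average of 3 animals counted per plot.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing exactly k counts when the mean count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 3.0  # the single parameter: both the mean and the variance

# Counts start at 0 and have no upper limit; 0..49 covers almost all the mass
probs = [poisson_pmf(k, lam) for k in range(50)]

print(round(poisson_pmf(0, lam), 4), round(sum(probs), 6))
```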
SSR
regression sum of squares; explained variation attributable to the relationship between x and y
t-statistic > critical value
reject the null hypothesis
what do large t-values mean
reject the null hypothesis
dependent variable
response variable, y-axis
convenience sample
results when the most convenient units are chosen from a population
normality
sample size is large enough that the residuals are approximately normally distributed
distributions
symmetric (mean, median, and mode all equal), skewed left (-) (mode is more than median and mean is less than median), and skewed right (+) (mean is greater than median and mode is less than median)
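The ordering of mode, median, and mean in a right-skewed sample can be checked with a small made-up data set in which one large value pulls the mean upward.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample: one large value in the right tail
data = [1, 2, 2, 3, 4, 5, 12]

m_mode = mode(data)      # most common value
m_median = median(data)  # middle value
m_mean = mean(data)      # pulled toward the long right tail

print(m_mode, m_median, round(m_mean, 2))
```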
inference: compare 2 means
t-tests
statistical models for continuous dependent variable and categorical (discrete) independent variable
t-tests; ANOVA (one-way, two-way)
statistical models for categorical (discrete) dependent variable and categorical (discrete) independent variable
tabular; contingency tables/GOF tests
sampling bias
tendency to favor the selection of units having particular characteristics (e.g., food-conditioned bears)
two-tailed
testing for a difference; no directionality H0: Mx = Mo Ha: Mx not equal to Mo
ANCOVA
the analysis of covariance - used to compare two or more regression lines; testing the effect of a categorical factor (independent variable: e.g., group 1 and 2) on a dependent variable while controlling for the effect of a continuous independent variable
p-value
the area under the distribution curve representing the probability of obtaining a sample statistic at least as extreme as the one observed if the null hypothesis is true; p-value <= alpha (reject null at level alpha); p-value > alpha (do not reject null at level alpha); for a right-tailed test, the p-value is the area under the curve to the right of the t-calculated
sampling error
the difference between the sample and the population that is due solely to the incomplete enumeration of all elements of the population (chance)
outcome
the result of a random experiment (ex. pup was caught or escaped; male or female pup)
independence of observation
the value of one observation does not influence or affect the value of other observations
alternative hypothesis
there is a difference between compared groups
SST
total sum of squares; measures the variation of yi around the mean y
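As a sketch of how SST, SSR, and the residual (error) sum of squares relate, the snippet below fits Yi = b0 + b1Xi by least squares to hypothetical data and checks that SST = SSR + SSE. The name SSE for the residual sum of squares is an assumption; the deck may label it differently.

```python
# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope (b1) and intercept (b0)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by x
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained (residual)

r_squared = ssr / sst  # share of total variation explained by the regression

print(round(b0, 3), round(b1, 3), round(r_squared, 4))
```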
random factor
you have no control over temperature in the field (e.g., nest survival: in the field every day for a month)
probability
the long-run relative frequency of occurrence of each possible event
mode
the most common value of x observed in the sample
trial
a single repetition of the random experiment (ex. each birth of a pup); the number of trials is the number of times the event can occur
event
the occurrence of a phenomenon of interest (ex. the birth of a sea lion pup)
True or False: A two-sample t-test is a statistical test that compares the means of two samples.
True
True or False: A two-tailed t-test is a statistical test that tells the investigator if there is a significant difference between two sample means.
True
True or False: Sampling bias induces a lack of accuracy.
True
exponential distribution
continuous single-parameter distribution (lambda)
F random variable
the ratio of two independent chi-square random variables, each divided by its own degrees of freedom
sample space
the set of possible outcomes
inference: compare 2 means, 2 factors/treatments
two-way ANOVA
harmonic mean
used as a measure of central tendency for data consisting of rates (e.g., speed)
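A minimal sketch of the harmonic mean for speeds. The trip is hypothetical: the same distance is covered at 60 km/h on the way out and 40 km/h on the way back.

```python
from statistics import harmonic_mean

# Hypothetical speeds (km/h) over two legs of equal distance
speeds = [60, 40]

# Harmonic mean = n / sum(1 / x_i)
h = harmonic_mean(speeds)
manual = len(speeds) / sum(1 / s for s in speeds)

# Check against total distance / total time for legs of 120 km each
distance = 120
average_speed = (2 * distance) / (distance / 60 + distance / 40)

print(h, average_speed)
```

The arithmetic mean (50 km/h) would overstate the true average speed because more time is spent on the slower leg.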
simple linear regression analysis
used to predict the value of a dependent variable based on the values of one independent variable and to explain the impact of changes in the independent variable on the dependent variable -relationship between X and Y is described by a linear function; changes in the dependent variable Y are assumed to be caused by changes in the independent variable X
if f-statistic > f-critical
variability between groups is large compared to variation within groups; reject null hypothesis
if f-statistic < f-critical
variability between groups is negligible compared to variation within groups; fail to reject null hypothesis
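The F-statistic decision rule can be sketched by computing a one-way ANOVA by hand. The three groups and their values are hypothetical. The critical value 5.14 is the standard F-table entry for (2, 6) degrees of freedom at alpha = 0.05.

```python
# Hypothetical responses for three levels of one factor
groups = {
    "low": [2, 3, 4],
    "medium": [4, 5, 6],
    "high": [7, 8, 9],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = sum(all_values) / len(all_values)

# Between-group (treatment) and within-group (error) sums of squares
ss_between = sum(
    len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values()
)
ss_within = sum(
    (x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F = mean square between / mean square within
f_stat = (ss_between / df_between) / (ss_within / df_within)

f_crit = 5.14  # F-table value for df = (2, 6), alpha = 0.05
reject_null = f_stat > f_crit

print(round(f_stat, 2), reject_null)
```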
treatment effect
variation BETWEEN populations