Stats exam 1
Bird banding stations band birds with different color bands, in different orders, so that they can identify each individual bird. One bird might have the band green over blue, while another has the band yellow over orange. What type of variable is bird band?
categorical-nominal
degrees of freedom formula
df = n-1
Standard Error
the standard deviation of the sampling distribution of the sample mean, i.e. of all possible sample means from samples of the same size drawn from the same population
z-test
Use when the sample is large or sigma is known. Conditions: 1. random sample; 2A. n is large (n > 1000), so the sample is large enough on its own, or 2B. if n is small, sigma is known and the population is normally distributed
t-test
Use when sigma is unknown. Conditions: 0. sigma is unknown; 1. random sample; 2A. the population is normally distributed, or 2B. n is large (n > 30)
z test codes
1 - pnorm(z): right-tailed; 2*(1 - pnorm(z)): two-tailed when the z statistic is positive; pnorm(z): left-tailed; 2*pnorm(z): two-tailed when the z statistic is negative
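A worked z-test may make these codes easier to remember. This is only a sketch: the values of xbar, mu0, sigma, and n are made up for illustration and do not come from the exam.

```r
# Hypothetical example values (assumptions, not from the notes)
xbar  <- 52   # observed sample mean
mu0   <- 50   # null value for the population mean
sigma <- 6    # known population standard deviation
n     <- 36   # sample size

se <- sigma / sqrt(n)     # standard error of the mean
z  <- (xbar - mu0) / se   # z test statistic

p_right <- 1 - pnorm(z)               # right-tailed p-value
p_left  <- pnorm(z)                   # left-tailed p-value
p_two   <- 2 * (1 - pnorm(abs(z)))    # two-tailed p-value for either sign of z
```

Taking abs(z) in the two-tailed line covers both two-tailed cases on the card with a single expression.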
t test codes
1 - pt(t, df): right-tailed; 2*(1 - pt(t, df)): two-tailed when the t statistic is positive; pt(t, df): left-tailed; 2*pt(t, df): two-tailed when the t statistic is negative (t is the t score, df = n-1)
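The same codes can be tried on a small sample. In this sketch the data vector x and the null value mu0 are hypothetical; the last line cross-checks the hand calculation against R's built-in t.test().

```r
# Hypothetical sample and null value (assumptions, not from the notes)
x   <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.5)
mu0 <- 5

n      <- length(x)
df     <- n - 1                              # degrees of freedom
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))  # t test statistic

p_right <- 1 - pt(t_stat, df)                # right-tailed p-value
p_left  <- pt(t_stat, df)                    # left-tailed p-value
p_two   <- 2 * (1 - pt(abs(t_stat), df))     # two-tailed p-value for either sign of t

# Built-in one-sample t-test (two-sided by default) for comparison
check <- t.test(x, mu = mu0)
```

check$statistic and check$p.value should agree with t_stat and p_two up to rounding.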
interpret the p-value
There is a [p-value] probability of getting a test statistic of [z or t statistic value, df value] or more extreme, assuming that the [null statement] is true
Fail to reject the null hypothesis
when the p-value is greater than the significance level alpha
Reject the null hypothesis
when the p-value is less than the significance level alpha
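The two decision cards can be written as one small rule. The function name decide is a hypothetical helper made up for this sketch, and alpha = 0.05 is assumed as the default significance level.

```r
# Hypothetical helper: compare a p-value with the significance level alpha
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    "reject the null hypothesis"
  } else {
    "fail to reject the null hypothesis"
  }
}

decide(0.03)   # p-value below alpha
decide(0.20)   # p-value above alpha
```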
Left-Skewed Distribution
xbar < M
Symmetric/Unimodal Distribution
xbar ~ M
Right-Skewed Distribution
xbar > M
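The three skew cards compare the sample mean (xbar) with the sample median (M). A quick check on made-up data, assuming nothing beyond base R:

```r
# Hypothetical data sets (assumptions, chosen only to show the pattern)
right_skewed <- c(1, 2, 2, 3, 3, 3, 4, 20)   # long tail to the right
left_skewed  <- 21 - right_skewed            # mirror image: long tail to the left
symmetric    <- c(1, 2, 3, 4, 5)

mean(right_skewed); median(right_skewed)   # 4.75 and 3: xbar > M
mean(left_skewed);  median(left_skewed)    # 16.25 and 18: xbar < M
mean(symmetric);    median(symmetric)      # 3 and 3: xbar ~ M
```

The extreme value pulls the mean toward the tail while the median barely moves.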
TRUE or FALSE if we take another sample to test the hypothesis, the null value for the population mean could change
false
If I increase my sample size from 30 to 75, then I can expect
the standard error to decrease
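To see the size of the decrease, the formula sigma/sqrt(n) can be evaluated at both sample sizes. The value sigma = 10 is an assumption for illustration, and se is a hypothetical helper name.

```r
sigma <- 10   # hypothetical population standard deviation

# Hypothetical helper: standard error of the mean
se <- function(sigma, n) sigma / sqrt(n)

se_30 <- se(sigma, 30)   # about 1.83
se_75 <- se(sigma, 75)   # about 1.15

se_75 / se_30            # equals sqrt(30/75), about 0.63
```

The ratio does not depend on sigma: going from n = 30 to n = 75 shrinks the standard error to about 63% of its original size.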
TRUE or FALSE if we take another sample to test the hypothesis, the observed sample mean could change
true
TRUE or FALSE it is impossible to have a different conclusion if we collect another sample of the same size again
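false (the observed sample mean changes from sample to sample, so the test statistic, p-value, and conclusion can change too)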
TRUE or FALSE using a t-distribution takes into account the uncertainty of using sample statistics in the test statistic calculation
true
Quantitative Variables (numeric)
1. Discrete Variable: countable values (possibly countably infinite); increases by whole numbers. ex: number of pages in a book; number of cars in a drive-thru 2. Continuous Variable: infinite number of possible values over an interval; precision is limited only by the technology used to measure; continuous growth. ex: height, weight, tree
Qualitative Variables (categorical)
1. Nominal Variable: name, category as value ex: major, gender, occupation 2. Ordinal Variable: ranked variable, specific order to the values of the variable ex: class level
Types of Sampling Designs
1. Simple Random Sampling: same probability for all individuals in the pop to be in a sample; no bias 2. Stratified Random Sampling: division of a population into smaller groups of members that share characteristics; simple random sampling then occurs within each group 3. Volunteer Sampling: people volunteer to be part of the sample; can be biased if the volunteers have different characteristics than the non-volunteers 4. Convenience Sampling: sampling method is chosen based on researcher's accessibility and convenience (biased results)
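The first two designs can be sketched in base R. The population below (100 students, 25 per class level) is entirely hypothetical, and the sample sizes are arbitrary choices for illustration.

```r
set.seed(1)   # make the random draws reproducible

# Hypothetical population: 100 students in 4 class levels
pop <- data.frame(
  id    = 1:100,
  level = rep(c("Freshman", "Sophomore", "Junior", "Senior"), each = 25)
)

# Simple random sample: every individual has the same chance of selection
srs <- pop[sample(nrow(pop), 20), ]

# Stratified random sample: split by class level, then take an SRS of 5 in each group
strata     <- split(pop, pop$level)
stratified <- do.call(rbind, lapply(strata, function(g) g[sample(nrow(g), 5), ]))

table(srs$level)          # group counts vary by chance
table(stratified$level)   # exactly 5 per class level
```

The stratified sample guarantees each class level is represented, while the simple random sample leaves group counts to chance.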
characteristics of normal distribution
1. bell shaped, unimodal, symmetric 2. symmetric about the mean, mu 3. mean=median=mode; single-peaked at the highest point 4. the area under the curve is 1 and represents the probability of observing a particular value on the x-axis or more extreme 5. the area under the curve to the right of mu is equal to the area to the left of mu and each is equal to 50% 6. the standard deviation, sigma, is the best measure of spread for the normal distribution 7. if you know the mean, mu, and sd, sigma, then you know exactly what the normal distribution for the variable will look like
benefits of histograms
1. can see the frequency/details 2. can see the modality 3. changes with binwidth
benefits of boxplots
1. outliers are clear 2. good overview especially with small numbers (sample size) 3. based on resistant statistics 4. doesn't change with binwidth
Categories of description for the distribution of a variable
1. shape: the symmetry and modality of the distribution (mean/median, mode, quartiles) 2. center: the location of the data that most of the data is grouped around (central tendency) (mean, median, mode) 3. spread: the dispersion of the values of the data (range, IQR, sd) 4. outliers: data values that are not a part of the overall pattern (IQR, mean/sd)
properties of the t-distribution
1. the t-distribution is the sampling distribution of the t-test statistic 2. the t-distribution is different for each value of the degrees of freedom (df=n-1) 3. the t-distribution is centered at 0 and is symmetric about 0 4. the area under the curve is 1; the areas under the curve to the right and to the left of 0 are each equal to 50%; the density is positive (always above the x-axis) 5. the area in the tails of the t-distribution is greater than the area in the tails of the standard normal distribution because we are using the sample sd as an estimate of the pop sd 6. as sample size n increases, the density curve of the statistic t gets closer to the standard normal density curve, because the values of the sample sd get closer to the value of the pop sd
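Properties 5 and 6 can be checked numerically by comparing tail areas. The cutoff of 2 and the list of df values are arbitrary choices for this sketch.

```r
cutoff <- 2   # arbitrary cutoff on the x-axis

# Area to the right of the cutoff under the standard normal curve
tail_normal <- 1 - pnorm(cutoff)

# Same tail area under t-distributions with increasing degrees of freedom
dfs    <- c(4, 9, 29, 999)
tail_t <- 1 - pt(cutoff, dfs)

tail_normal
data.frame(df = dfs, tail_area = tail_t)
```

Every t tail area is larger than the normal tail area, and the gap shrinks as df grows.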
5 part conclusion
Based on the sample (maybe random), with a significance level of alpha= 0.05, we found evidence [p-value, t-value/z-value, df value] to [fail to reject/reject] the null hypothesis that the [restate the null hypothesis] is true
experimental study vs observational study
ES: if a researcher randomly assigns seats; OS: if subjects select their own seats
Suppose there is a severe skew to the right present in a histogram of a sample. Which statistic would be the best representation of dispersion/spread?
IQR
If n > ______ , then the sampling distribution of the mean will have a shape of ___________ , the mean will be equal to _________ and the standard error will be equal to __________
If n > 30 , then the sampling distribution of the mean will have a shape of approximately normal , the mean will be equal to mu and the standard error will be equal to sigma/sqrt(n)
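A short simulation can illustrate this card. It assumes a right-skewed exponential population with rate 1 (so mu = 1 and sigma = 1); the sample size n = 40 and the 10,000 repetitions are arbitrary choices.

```r
set.seed(42)   # make the simulation reproducible

n    <- 40      # sample size (greater than 30)
reps <- 10000   # number of simulated samples

# Draw many samples from the skewed population and keep each sample mean
xbars <- replicate(reps, mean(rexp(n, rate = 1)))

mean(xbars)    # should be close to mu = 1
sd(xbars)      # should be close to sigma / sqrt(n)
1 / sqrt(n)    # theoretical standard error for comparison

hist(xbars)    # roughly bell-shaped even though the population is skewed
```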
Choose all the statements that apply to the Standard Error Select one or more: a. The standard error decreases as the sample size increases b. The standard error increases as the sample size increases c. The standard error is the standard deviation of all possible sample means from samples of the same size d. The standard error represents the spread of the sampling distribution e. The standard error represents the center of the sampling distribution f. The standard error represents the shape of the sampling distribution
a. The standard error decreases as the sample size increases c. The standard error is the standard deviation of all possible sample means from samples of the same size d. The standard error represents the spread of the sampling distribution
We are interested in finding the mean height of college students (a normally distributed variable). We take a sample of 10 college students on our basketball team. We want to test our sample using the CLT. Which of the following is true? Select one: a. We have violated the condition of a random sample. b. We have violated the condition of a sufficiently large sample. c. We have not violated any conditions of the CLT.
a. We have violated the condition of a random sample.
Which of the following best describe the purpose of stratified random sampling? Select one: a. To make sure every member of the population has an equal chance of being selected for the sample. b. To make sure the sample proportionately represents individuals from different categories of the population. c. To make sure the participants chosen for the study are the ones most likely to react to a particular treatment. d. None are correct. e. All are correct.
b. To make sure the sample proportionately represents individuals from different categories of the population.
If the assumptions of the Central Limit Theorem are satisfied, then Select one or more: a. we can use a standard normal curve as a null model because the distribution of means follows the standard normal curve. b. the standard normal distribution is the null model because it has the same properties as if we created a true sampling distribution of a test statistic c. we can use a standard normal curve to calculate the p-value for our test statistic d. we can use a normal distribution as a null model only if the population distribution is unimodal symmetric.
b. the standard normal distribution is the null model because it has the same properties as if we created a true sampling distribution of a test statistic c. we can use a standard normal curve to calculate the p-value for our test statistic
characteristics of sampling distributions of means
bell-shaped, symmetric, unimodal; as n increases, the sd (standard error) decreases
A standardized normal curve has which of the following properties? (Choose all that apply) Select one or more: a. Bell-shaped, uniform and symmetric b. It is symmetric about the mean, μ c. It has a mean of 0 and a standard deviation of 1 d. The area under the curve is one and represents the probability of observing a particular value on the x-axis or more extreme
c. It has a mean of 0 and a standard deviation of 1 d. The area under the curve is one and represents the probability of observing a particular value on the x-axis or more extreme
An experiment is conducted to ask people to taste five different chiles and then rate their level of hotness with the choice of the following responses: Mild. Medium. Hot. Very Hot. Fiery. What type of variable are these responses?
categorical-ordinal
A voluntary survey was designed and distributed through online social media platforms dedicated to scuba diving to determine the prevalence of first aid incidents during a recreational scuba dive. Fifty-two percent of the certified diver respondents reported diving injuries (Beckett & Kordick 2007). What type of sampling strategy did the researchers use?
volunteer sampling (respondents chose to answer a voluntary survey)
Which type of plot is most useful for evaluating the mode?
histograms
which statistics are resistant to outliers?
median and IQR
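Resistance can be demonstrated by swapping one value for an extreme outlier and recomputing. The data below are made up for illustration.

```r
# Hypothetical data, then the same data with the largest value replaced by an outlier
x     <- c(10, 12, 13, 15, 16, 18, 20)
x_out <- c(x[-length(x)], 200)

# Resistant statistics: unchanged by the outlier
median(x); median(x_out)
IQR(x);    IQR(x_out)

# Non-resistant statistics: pulled strongly by the outlier
mean(x); mean(x_out)
sd(x);   sd(x_out)
```

This is also why the IQR is preferred over the sd as a measure of spread for strongly skewed samples.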
what is a case in a study?
one row in a data set; one species
In fall 2016, 20.5 million students attended colleges and universities in the United States. The average age of all of these college students was 20.5. What type of quantity is the average age?
parameter
Symbols to know
population mean (parameter): mu; sample mean: x bar; null hypothesis: mu = _____; alternative hypothesis: mu < _____, mu > _____, or mu ≠ _____; population standard deviation: sigma; sample standard deviation: s; sample median: M; population size: N; sample size: n; population variance: sigma squared; sample variance: s squared
Which sampling strategy is a researcher using when every member of the accessible population has an equal chance of being selected to participate in a study?
simple random sampling
A researcher wants to estimate the average orchard size in the Central Valley. From a simple random sample of 40 orchards, the researcher obtains a mean orchard size of 65 acres. What type of quantity is the 65 acres?
statistic
Sampling distribution of a sample statistic
the distribution of values of a statistic in all possible samples of the same size from the same population
In a hypothesis test of a single mean, the t-distribution is used instead of the normal (z) distribution because
the population standard deviation is unknown