AP Stats Mid-Term Review
Good sampling design
-Examines a part of the whole -Has no bias -Randomized -Good sample size
The principal of a small elementary school wants to select a simple random sample of 24 students. The school has 12 classrooms with 18 students in each class. She decided to randomly select two students from each classroom. Is this a simple random sample?
No, because not all combinations of 24 students could have been chosen.
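A quick count can make this answer concrete. The sketch below uses the numbers from the school question (12 classrooms of 18 students, sample of 24); the variable names are just for illustration.

```python
from fractions import Fraction
from math import comb

classrooms = 12
students_per_class = 18
sample_size = 24
population = classrooms * students_per_class  # 216 students in the school

# A simple random sample allows every possible group of 24 students.
srs_samples = comb(population, sample_size)

# Picking 2 per classroom only allows groups with exactly 2 from each room.
two_per_room_samples = comb(students_per_class, 2) ** classrooms

# Each student still has the same chance of selection under both plans.
chance_srs = Fraction(sample_size, population)
chance_two_per_room = Fraction(2, students_per_class)

print(two_per_room_samples < srs_samples)  # far fewer possible samples
print(chance_srs == chance_two_per_room)   # equal individual chances
```

So equal individual chances are not enough: the two-per-classroom plan rules out most of the groups an SRS could produce.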
A factory has 20 assembly lines producing a popular toy. To inspect a representative sample of 100 toys, quality control staff randomly selected 5 toys from each line's output. Was this a simple random sample?
No, because not all combinations of 100 toys could have been chosen.
Can watching a movie temporarily raise your pulse rate? Researchers have 50 volunteers check their pulse rates. Then they watch an action film, after which they check their pulse rates once more. Which aspect of experimentation is present in this research?
None
Normal cdf
Normal cdf(lower, upper, mean, standard deviation) = proportion (%) of the distribution between lower and upper
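Off the calculator, the same calculation can be sketched in Python with the standard library's `statistics.NormalDist`. The helper name `normalcdf` is made up here to mirror the calculator command, and the example values (mean 72 in, SD 4 in) are only an illustration.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Proportion of a normal distribution lying between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Illustrative values: proportion within one SD of the mean (68 in to 76 in).
p = normalcdf(68, 76, 72, 4)
print(round(p, 4))  # roughly 0.68
```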
Box and Whisker plot
Numerical (quantitative data)
Ogive
Numerical (quantitative data)
Dot plot
Numerical (quantitative) with small sets
What is s (standard error)?
The standard error summarizes the typical residual (or error) size.
Z-score equation
(x minus mean) divided by standard deviation
The relationship between the number of hours a person practices a task and the time it takes them to complete the task is calculated to have R-sq = 56.7%. The value of the correlation coefficient is
-0.753
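The arithmetic behind this answer, as a small sketch: take the square root of R-sq, then choose the sign from the context (more practice should go with less time, so the association is negative).

```python
from math import sqrt

r_squared = 0.567            # R-sq = 56.7% from the question
magnitude = sqrt(r_squared)  # size of the correlation

# The direction comes from context, not from R-sq:
# more hours of practice go with shorter completion times.
r = -magnitude
print(round(r, 3))
```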
Good experimental design
-Control group -Randomization -Able to replicate
You record the age, marital status, and earned income of a sample of 1463 women. The number and type of variables you have recorded are:
2 quantitative, 1 categorical
If the heights of a population of men are approximately normally distributed, and the middle 99.7% have heights between 5'0" and 7'0", what is the standard deviation of the heights in this population?
4"
What is the 68-95-99.7 rule?
In a normal distribution, about 68% of the data fall within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three standard deviations.
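The rule can be checked numerically, and it also gives the 4" answer to the heights question in this review. A minimal sketch using the standard library (heights converted to inches: 5'0" = 60, 7'0" = 84):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Proportion within 1, 2, and 3 standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(k, round(p * 100, 1))

# Middle 99.7% spans mean - 3 SD to mean + 3 SD, a width of 6 SDs.
low, high = 60, 84       # inches
sd = (high - low) / 6
print(sd)
```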
Systematic sampling
A procedure in which the selected sampling units are spaced regularly throughout the population; that is, every n'th unit is selected.
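One way to picture this is to choose a random starting point and then take every k-th unit. The function name and the numbered roster below are hypothetical, used only for illustration.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k units, then take every k-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]

roster = list(range(1, 101))  # hypothetical roster numbered 1 to 100
sample = systematic_sample(roster, k=10, seed=1)
print(sample)
```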
Experiment
A research method in which an investigator manipulates one or more factors to observe the effect on some behavior or mental process
What is a residual plot?
A residual is the actual value minus the predicted value. A residual plot graphs the residuals against the explanatory variable (or the predicted values). (good fit = random scatter with no pattern)
stratified sampling
A type of probability sampling in which the population is divided into groups with a common attribute and a random sample is chosen within each group
How does adding or subtracting values from a data set affect the measures of center and spread? How about multiplying?
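Adding or subtracting a constant shifts every value, and so the measures of center, by that amount, while the measures of spread stay the same. Multiplying by a constant multiplies both the measures of center and the measures of spread by that constant (spread by its absolute value).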
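A small check with made-up data: adding 10 to every value moves the mean but not the standard deviation, while multiplying by 3 scales both.

```python
from statistics import mean, stdev

data = [2, 4, 4, 6, 9]            # made-up values
shifted = [x + 10 for x in data]  # add a constant
scaled = [x * 3 for x in data]    # multiply by a constant

print(mean(data), stdev(data))
print(mean(shifted), stdev(shifted))  # center moves by 10, spread unchanged
print(mean(scaled), stdev(scaled))    # center and spread both tripled
```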
What is an influential point?
An influential point changes the slope and gives you a different model. (residuals can be small or large and are not guaranteed vertical or horizontal)
How do you find a score from a percentile?
By using invnorm.
Voluntary response bias
Bias due to the manner in which people choose to respond to voluntary surveys.
CUSS (for data)
C-Center U-Unusual features S-Spread S-Shape
Bar graph (and segmented bar graph)
Categorical
Mosaic Plot
Categorical (two way table data)
Pie Chart
Categorical (useful for displaying parts of a whole)
If we wish to compare the average PSAT scores of boys and girls taking AP Statistics at this high school, which would be the best way to gather these data?
Census
FODS (for scatterplots)
F-Form O-Outliers D-Direction S-Strength
Which statements about residual plots are true? I. A curved pattern indicates nonlinear association between the variables. II. A pattern of increasing spread indicates the predicted values become less reliable as the explanatory variable increases. III. Randomness in the residuals indicates the model will predict accurately.
I and II only
Which is true about randomized experiments? I. Randomization reduces the effects of confounding variables. II. Random assignment of treatments allows results to be generalized to the larger population. III. Blocking can be used to reduce the within-treatment variability.
I and III
Which statement about correlation is true? I. Regression based on data that are summary statistics tends to result in a higher correlation. II. If r^2 = 0.95, the response variable increases as the explanatory variable increases. III. An outlier always decreases the correlation.
I only
Which is true about sampling? I. An attempt to take a census will always result in less bias than sampling. II. Sampling error is usually reduced when the sample size is larger. III. Sampling error is the result of random variations and is always present.
II and III
Finding z-scores
Invnorm(area, 0, 1) = z-score
Invnorm
Invnorm(area, mean, standard deviation) = the value with that area (proportion) to its left
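For practice away from the calculator, `statistics.NormalDist.inv_cdf` plays the same role. The wrapper name `invnorm` is made up to mirror the calculator command, and the example distribution (mean 72, SD 4) is only an illustration.

```python
from statistics import NormalDist

def invnorm(area, mean=0.0, sd=1.0):
    """Value that has the given area (proportion) to its left."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(area)

z90 = invnorm(0.90)         # z-score for the 90th percentile
x90 = invnorm(0.90, 72, 4)  # 90th percentile when mean = 72, SD = 4
print(round(z90, 2), round(x90, 1))
```

Leaving the mean and SD at 0 and 1 gives a z-score; supplying them gives a value on the original scale.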
What is a z-score and why is it used?
It is how many standard deviations a value is from the mean. It is used to compare values from distributions with different means and standard deviations.
When is median more appropriate than mean?
Median is more appropriate when the data is skewed.
Does regular exercise decrease the risk of cancer? A researcher finds 200 women over 50 who exercise regularly, pairs each with a woman who has a similar medical history but does not exercise, then follows the subjects for 10 years to see which group develops more cancer. This is a
Prospective study
What do r and R-squared mean?
R-squared (the square of r, written as a percentage) tells you how much of the variability in the response variable is explained by the line, and r is the correlation coefficient, which tells you the strength and direction of the linear association.
SOCS (for data)
S-Shape (uniform, symmetrical, tailed, bimodal) O-Outlier (gaps) C-Center (mean, median) S-Spread (IQR, range, standard deviation)
What does the slope mean in a linear model?
The slope is the predicted change in the response variable for each one-unit increase in the explanatory variable; its sign gives the direction.
Blocking
The arranging of experimental units in groups (blocks) that are similar to one another (Typically, a blocking factor is a source of variability that is not of primary interest to the experimenter)
Treatments
The combination of factors and levels.
Placebos
The fake treatment
Researchers plan to conduct a block design experiment using fifth and sixth grade students who will be assigned a traditional paper book or an electronic reader. The students will then be tested for reading comprehension. What are the blocks in this design?
The fifth grade students and the sixth grade students
What is a percentile?
The ith percentile is the number that falls above i% of the data.
Why re-express data? (1)
To make the distribution of a variable more symmetrical.
Why re-express data? (4)
To make the form of a scatterplot more nearly linear.
Why re-express data? (3)
To make the scatter in a scatterplot spread out evenly rather than thickening at one end.
Why re-express data? (2)
To make the spread of several groups more alike, even if their centers differ.
Describe distributions: uniform, skewed, unimodal, bimodal
Uniform = flat (all values about equally frequent) Skewed = tailed to the right or left Unimodal = One hump Bimodal = Two humps
What does normal mean?
Unimodal, symmetrical, and bell-shaped
When is a linear model a good fit?
When r is strongly negative or positive and the residual plot shows random scatter.
How to define outliers with a box plot and with a mean?
With the mean, any value more than two standard deviations from the mean is an outlier. With a box plot, any value above Q3 + 1.5*IQR or below Q1 - 1.5*IQR is an outlier (IQR = Q3 - Q1).
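The box plot rule can be sketched as follows. The data are made up, and the quartiles are found as medians of the lower and upper halves, which is an assumption about the quartile method (other software may use slightly different rules).

```python
from statistics import median

def outlier_fences(values):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # Skip the middle value when n is odd.
    upper_half = data[half + 1:] if n % 2 else data[half:]
    q1, q3 = median(lower_half), median(upper_half)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [1, 3, 4, 5, 5, 6, 7, 8, 9, 30]  # made-up values
low_fence, high_fence = outlier_fences(data)
outliers = [x for x in data if x < low_fence or x > high_fence]
print(low_fence, high_fence, outliers)
```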
How do you compare scores from data sets with different means and standard deviations?
You can compare scores by converting them into z-scores.
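A short sketch of that comparison. The two tests and their means and standard deviations are made up for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations the value lies from the mean."""
    return (value - mean) / sd

# Hypothetical tests on different scales.
z_test_a = z_score(1300, mean=1050, sd=200)
z_test_b = z_score(28, mean=21, sd=5)

print(z_test_a, z_test_b)
print("Test B score is relatively higher" if z_test_b > z_test_a
      else "Test A score is relatively higher")
```

The score with the larger z-score sits further above its own mean, even though the raw numbers cannot be compared directly.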
Confounding variable
a factor other than the independent variable that might produce an effect in an experiment
Census
a sample that consists of the entire population
Cluster sampling
a sampling technique in which clusters of participants that represent the population are used
Lurking variable
a variable that is not among the explanatory or response variables in a study but that may influence the response variable
Which is important in designing a good experiment? I. Randomization in assigning subjects to treatments. II. Control of potentially confounding variables. III. Replication of the experiment on a sufficient number of subjects.
all three
Response bias
anything in a survey design that influences responses
Nonresponse bias
bias introduced to a sample when a large fraction of those sampled fails to respond
Convenience sampling
choosing individuals who are easiest to reach
simple random sample
every possible sample of the given size has an equal chance of being selected, so every member of the population also has an equal chance of selection
Another farmer has increased his wheat production by about the same percentage each year. His most useful predictive model is probably...
exponential
Voluntary sampling
individuals are self-selected by responding to an incentive
Over the past decade a farmer has been able to increase his wheat production by about the same number of bushels each year. His most useful predictive model is probably...
linear
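The difference between the two farmer questions shows up in the year-to-year changes: a constant amount added each year gives a linear pattern, while a constant percentage gives an exponential one. The starting yield, the 50 bushels, and the 5% below are made-up numbers.

```python
start = 1000.0  # hypothetical starting yield in bushels

# Same number of bushels added each year: linear growth.
linear = [start + 50 * year for year in range(6)]
# Same percentage increase each year: exponential growth.
exponential = [start * 1.05 ** year for year in range(6)]

linear_steps = [b - a for a, b in zip(linear, linear[1:])]
exp_steps = [b - a for a, b in zip(exponential, exponential[1:])]
exp_ratios = [b / a for a, b in zip(exponential, exponential[1:])]

print(linear_steps)                        # constant differences
print([round(s, 1) for s in exp_steps])    # differences keep growing
print([round(r, 2) for r in exp_ratios])   # constant ratios
```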
Observational study
observes individuals and measures variables of interest but does not attempt to influence the responses
Undercoverage bias
occurs when some groups in the population are left out of the process of choosing the sample
In order to see which variety of apple tree produces more fruit, a farmer sets up an experiment. He has three plots of land with different soil and natural water availability. Each plot has room for eight trees. The farmer randomly selects four locations in each plot for the first variety of tree and the other four get the second variety. This experiment is...
randomized block, blocked by plot of land
Twenty dogs and 20 cats were subjects in an experiment to test the effectiveness of a new flea control chemical. Ten of the dogs were randomly assigned to an experimental group that wore a collar containing the chemical, while the others wore similar collars without the chemical. The same was done with the cats. After 30 days veterinarians were asked to inspect the animals for fleas and evidence of flea bites. This experiment is...
randomized block, blocked by species
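A sketch of how the random assignment in this design could be carried out: shuffle within each species separately, then give the first half the chemical collar. The function name, animal labels, and treatment labels are made up for illustration.

```python
import random

def assign_within_blocks(blocks, seed=0):
    """Randomly split each block in half between the two treatments."""
    rng = random.Random(seed)
    assignment = {}
    for units in blocks.values():
        shuffled = list(units)
        rng.shuffle(shuffled)  # randomize separately inside each block
        half = len(shuffled) // 2
        for unit in shuffled[:half]:
            assignment[unit] = "chemical collar"
        for unit in shuffled[half:]:
            assignment[unit] = "plain collar"
    return assignment

blocks = {
    "dogs": [f"dog{i}" for i in range(1, 21)],
    "cats": [f"cat{i}" for i in range(1, 21)],
}
assignment = assign_within_blocks(blocks)
treated_dogs = sum(1 for d in blocks["dogs"] if assignment[d] == "chemical collar")
treated_cats = sum(1 for c in blocks["cats"] if assignment[c] == "chemical collar")
print(treated_dogs, treated_cats)
```

Because the shuffle happens inside each species, each treatment group always gets ten dogs and ten cats, so species differences cannot be mixed up with the treatment effect.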
In an experiment the primary purpose of blocking is to
reduce the within-treatment variation.
Standard error equation
s = square root( (sum of (y minus predicted y) squared) divided by (n - 2) )
Survey
sampling a part of the population
Factors
the explanatory variables in an experiment
Sampling variability
the natural tendency of randomly drawn samples to differ, one from another
A residual plot that has no pattern is a sign that...
the original data are roughly linear and the regression line is a good model.
R-sq is a measure of...
the proportion of the variability in the response variable that is explained by the explanatory variable.
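On made-up data, this proportion can be computed as 1 minus (unexplained variation divided by total variation), and it matches the square of the correlation coefficient. A minimal sketch with no libraries beyond `math`:

```python
from math import sqrt

x = [1, 2, 3, 4, 5]  # made-up explanatory values
y = [2, 4, 5, 4, 5]  # made-up response values
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)  # total variation in y

r = sxy / sqrt(sxx * syy)          # correlation coefficient
slope = sxy / sxx                  # least-squares slope
intercept = y_bar - slope * x_bar  # least-squares intercept

# Variation left over after using the line (sum of squared residuals).
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / syy

print(round(r, 3), round(r_squared, 3))
```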
Blinding
when participants do not know whether they belong to the experimental or control group, or which treatment they are receiving
If a data point is influential it...
will change the slope of the regression equation
