AP Stats Mid-Term Review

Good sampling design

-Examines a part of the whole -Has no bias -Randomized -Good sample size

The principal of a small elementary school wants to select a simple random sample of 24 students. The school has 12 classrooms with 18 students in each class. She decided to randomly select two students from each classroom. Is this a simple random sample?

No, because not all combinations of 24 students could have been chosen.

A factory has 20 assembly lines producing a popular toy. To inspect a representative sample of 100 toys, quality control staff randomly selected 5 toys from each line's output. Was this a simple random sample?

No, because not all combinations of 100 toys could have been chosen.

Can watching a movie temporarily raise your pulse rate? Researchers have 50 volunteers check their pulse rates. Then they watch an action film, after which they check their pulse rates once more. Which aspect of experimentation is present in this research?

None

Normal cdf

Normal cdf(lower, upper, mean, standard deviation) = the proportion (%) of values between lower and upper, i.e. the area under the normal curve
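
The calculator command can be mirrored in Python as a rough sketch. The function name `normal_cdf` and the example numbers (mean 100, standard deviation 15) are made up for illustration; `statistics.NormalDist` is part of the Python standard library.

```python
from statistics import NormalDist

def normal_cdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Hypothetical example: proportion of values between the mean (100)
# and two standard deviations above it (130).
area = normal_cdf(100, 130, mean=100, sd=15)
print(round(area, 4))  # 0.4772
```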

Box and Whisker plot

Numerical (quantitative data)

Ogive

Numerical (quantitative data)

Dot plot

Numerical (quantitative) with small sets

What is s (standard error)?

The standard error summarizes the typical residual (or error) size.

Z-score equation

(x minus mean) divided by standard deviation

The relationship between the number of hours a person practices a task and the time it takes them to complete the task is calculated to have R-sq = 56.7%. The value of the correlation coefficient is

-0.753
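
The arithmetic behind this answer can be sketched as follows. The helper name is hypothetical; the negative sign is an assumption based on the context (more practice should mean less time to complete the task).

```python
import math

def correlation_from_r_squared(r_squared, association_is_negative):
    """Recover r from R-squared; the sign must come from the context."""
    r = math.sqrt(r_squared)
    return -r if association_is_negative else r

r = correlation_from_r_squared(0.567, association_is_negative=True)
print(round(r, 3))  # -0.753
```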

Good experimental design

-Control group -Randomization -Able to replicate

You record the age, marital status, and earned income of a sample of 1463 women. The number and type of variables you have recorded are:

2 quantitative, 1 categorical

If the heights of a population of men are approximately normally distributed, and the middle 99.7% have heights between 5'0" and 7'0", what is the standard deviation of the heights in this population?

4"
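
A short sketch of the reasoning: the middle 99.7% spans three standard deviations on each side of the mean, so the range covers six standard deviations. The function names are made up for this example.

```python
def to_inches(feet, inches=0):
    """Convert a height in feet and inches to inches."""
    return 12 * feet + inches

def sd_from_middle_997(low, high):
    """The middle 99.7% runs from mean - 3 SD to mean + 3 SD (6 SD wide)."""
    return (high - low) / 6

sd = sd_from_middle_997(to_inches(5), to_inches(7))
print(sd)  # 4.0
```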

What is the 68-95-99.7 rule?

In a normal distribution, about 68% of the data is within one standard deviation of the mean, 95% is within two standard deviations of the mean, and 99.7% is within three standard deviations of the mean.
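
The three percentages can be checked against the standard normal curve. This is only a sketch; `area_within` is a hypothetical helper name.

```python
from statistics import NormalDist

def area_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```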

Systematic sampling

A procedure in which the selected sampling units are spaced regularly throughout the population; that is, every n'th unit is selected.
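
One possible way to sketch systematic sampling in code, assuming the population is an ordered list and the starting unit is picked at random within the first interval (the random start is a common convention, not something the card states):

```python
import random

def systematic_sample(population, step, seed=None):
    """Pick a random start in the first `step` units, then take every step-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step]

# Hypothetical population of 100 numbered units, sampling every 10th one.
units = list(range(1, 101))
sample = systematic_sample(units, step=10, seed=1)
print(len(sample))  # 10
```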

Experiment

A research method in which an investigator manipulates one or more factors to observe the effect on some behavior or mental process

What is a residual plot?

A residual is the actual value minus the predicted value. A residual plot graphs the residuals against the explanatory variable (or the predicted values). (good fit = random scatter with no pattern)

stratified sampling

A type of probability sampling in which the population is divided into groups with a common attribute and a random sample is chosen within each group
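
As an illustration, the classroom question earlier in this set (12 classrooms of 18 students, two chosen from each) is a stratified sample with classrooms as the strata. The sketch below assumes made-up student labels and a hypothetical function name.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a random sample of `per_stratum` members from every stratum."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

# 12 classrooms (strata) with 18 students each; labels are (room, student).
classrooms = {
    room: [(room, student) for student in range(1, 19)]
    for room in range(1, 13)
}
chosen = stratified_sample(classrooms, per_stratum=2, seed=0)
print(len(chosen))  # 24
```

Every classroom contributes exactly two students, which is why some combinations of 24 students (for example, three from one room) can never occur.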

How does adding or subtracting values from a data set affect the measures of center and spread? How about multiplying?

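Adding or subtracting the same value shifts every data point up or down by that amount, so the measures of center shift by that amount and the measures of spread do not change. Multiplying by a (positive) constant multiplies both the measures of center and the measures of spread by that constant.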
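
A small sketch of the effect, using a made-up data set: adding a constant moves the center but leaves the spread alone, while multiplying by a positive constant rescales both.

```python
from statistics import mean, median, stdev

def summary(values):
    """Center (mean, median) and spread (range, standard deviation)."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "sd": stdev(values),
    }

data = [2, 4, 4, 6, 9]             # hypothetical data
shifted = [x + 10 for x in data]   # add 10 to every value
scaled = [x * 3 for x in data]     # multiply every value by 3

for name, values in (("data", data), ("shifted", shifted), ("scaled", scaled)):
    print(name, summary(values))
```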

What is an influential point?

An influential point is one whose removal noticeably changes the slope and gives you a different model. (Its residual can be small or large, and it is not guaranteed to stand out in only the vertical or only the horizontal direction.)

How do you find a score from a percentile?

By using invnorm.

Voluntary response bias

Bias due to the manner in which people choose to respond to voluntary surveys.

CUSS (for data)

C-Center U-Unusual features S-Spread S-Shape

Bar graph (and segmented bar graph)

Categorical

Mosaic Plot

Categorical (two way table data)

Pie Chart

Categorical (useful for displaying parts of a whole)

If we wish to compare the average PSAT scores of boys and girls taking AP Statistics at this high school, which would be the best way to gather these data?

Census

FODS (for scatterplots)

F-Form O-Outliers D-Direction S-Strength (how tightly the points follow the form)

Which statement about residual plots is true? I. A curved pattern indicates nonlinear association between the variables. II. A pattern of increasing spread indicates the predicted values become less reliable as the explanatory variable increases. III. Randomness in the residuals indicates the model will predict accurately.

I and II only

Which is true about randomized experiments? I. Randomization reduces the effects of confounding variables. II. Random assignment of treatments allows results to be generalized to the larger population. III. Blocking can be used to reduce the within-treatment variability.

I and III

Which statement about correlation is true? I. Regression based on data that are summary statistics tends to result in a higher correlation. II. If r² = 0.95, the response variable increases as the explanatory variable increases. III. An outlier always decreases the correlation.

I only

Which is true about sampling? I. An attempt to take a census will always result in less bias than sampling. II. Sampling error is usually reduced when the sample size is larger. III. Sampling error is the result of random variations and is always present.

II and III

Finding z-scores

Invnorm(area, 0, 1) = z-score

Invnorm

Invnorm(area to the left, mean, standard deviation) = the value that has that area below it
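
A rough Python counterpart of the calculator command, where the area is the area to the left of the value. The function name `inv_norm` and the example distribution (mean 500, standard deviation 100) are assumptions for illustration.

```python
from statistics import NormalDist

def inv_norm(area_to_left, mean=0.0, sd=1.0):
    """Value with the given proportion of the normal distribution below it."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(area_to_left)

# z-score at the 90th percentile of the standard normal
print(round(inv_norm(0.90), 2))  # 1.28

# Score at the 90th percentile of a hypothetical test (mean 500, SD 100)
print(round(inv_norm(0.90, mean=500, sd=100), 1))  # 628.2
```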

What is a z-score and why is it used?

It is how many standard deviations a value is from the mean. It is used to compare values from distributions with different means and standard deviations.

When is median more appropriate than mean?

Median is more appropriate when the data is skewed.

Does regular exercise decrease the risk of cancer? A researcher finds 200 women over 50 who exercise regularly, pairs each with a woman who has a similar medical history but does not exercise, then follows the subjects for 10 years to see which group develops more cancer. This is a

Prospective study

What do r and R-squared mean?

R-squared is the percentage (r squared) that tells you how much of the variation in the response variable is explained by the line, and r is the correlation coefficient, which tells you the strength and direction of the linear association.

SOCS (for data)

S-Shape (uniform, symmetrical, tailed, bimodal) O-Outlier (gaps) C-Center (mean, median) S-Spread (IQR, range, standard deviation)

What does the slope mean in a linear model?

The slope is the predicted change in the response variable for each one-unit increase in the explanatory variable; its sign shows the direction.

Blocking

The arranging of experimental units in groups (blocks) that are similar to one another (Typically, a blocking factor is a source of variability that is not of primary interest to the experimenter)

Treatments

The combination of factors and levels.

Placebos

The fake treatment

Researchers plan to conduct a block design experiment using fifth and sixth grade students who will be assigned a traditional paper book or an electronic reader. The students will then be tested for reading comprehension. What are the blocks in this design?

The fifth grade students and the sixth grade students

What is a percentile?

The ith percentile is the number that falls above i% of the data.

Why re-express data? (1)

To make the distribution of a variable more symmetrical.

Why re-express data? (4)

To make the form of a scatterplot more nearly linear.

Why re-express data? (3)

To make the scatter in a scatterplot spread out evenly rather than thickening at one end.

Why re-express data? (2)

To make the spread of several groups more alike, even if their centers differ.

Describe distributions: uniform, skewed, unimodal, bimodal

Uniform = flat (all values about equally frequent) Skewed = tailed to the right or left Unimodal = One hump Bimodal = Two humps

What does normal mean?

Unimodal and symmetrical

When is a linear model a good fit?

When r is strongly negative or strongly positive and the residual plot shows random scatter with no pattern.

How to define outliers with a box plot and with a mean?

With the mean, anything more than two standard deviations from the mean is an outlier; with a box plot, anything above Q3 + 1.5*IQR or below Q1 - 1.5*IQR is an outlier (IQR = Q3 - Q1).
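
Both rules can be sketched in a few lines. The box plot rule flags values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR, where IQR = Q3 - Q1. The quartiles here are taken as the medians of the lower and upper halves of the sorted data, which is an assumption about the convention; other software may compute quartiles slightly differently. The data set is made up.

```python
from statistics import mean, median, stdev

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(upper)

def iqr_outliers(values):
    """Values beyond the 1.5*IQR fences."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

def two_sd_outliers(values):
    """Values more than two standard deviations from the mean."""
    center, spread = mean(values), stdev(values)
    return [x for x in values if abs(x - center) > 2 * spread]

data = [3, 5, 6, 7, 8, 9, 10, 30]  # hypothetical data
print(iqr_outliers(data))     # [30]
print(two_sd_outliers(data))  # [30]
```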

How do you compare scores from data sets with different means and standard deviations?

You can compare scores by converting them into z-scores.
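
A minimal sketch of such a comparison, using two invented tests with different scales (all numbers are hypothetical):

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Test A: score 650 where the mean is 500 and the SD is 100
z_a = z_score(650, mean=500, sd=100)
# Test B: score 27 where the mean is 21 and the SD is 5
z_b = z_score(27, mean=21, sd=5)

print(z_a, z_b)  # 1.5 1.2
print("Test A" if z_a > z_b else "Test B")  # Test A
```

The score of 650 is relatively better because it sits more standard deviations above its own mean.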

Confounding variable

a factor other than the independent variable that might produce an effect in an experiment

Census

a sample that consists of the entire population

Cluster sampling

a sampling technique in which the population is divided into clusters that each resemble the population, and entire clusters are randomly selected

Lurking variable

a variable that is not among the explanatory or response variables in a study but that may influence the response variable

Which is important in designing a good experiment? I. Randomization in assigning subjects to treatments. II. Control of potentially confounding variables. III. Replication of the experiment on a sufficient number of subjects.

all three

Response bias

anything in a survey design that influences responses

Nonresponse bias

bias introduced to a sample when a large fraction of those sampled fails to respond

Convenience sampling

choosing individuals who are easiest to reach

simple random sample

every possible sample of the chosen size has an equal chance of being selected (so every member of the population also has a known and equal chance of selection)

Another farmer has increased his wheat production by about the same percentage each year. His most useful predictive model is probably...

exponential

Voluntary sampling

individuals are self-selected by responding to an incentive

Over the past decade a farmer has been able to increase his wheat production by about the same number of bushels each year. His most useful predictive model is probably...

linear
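
The difference between the two farmer questions in this set can be sketched numerically: the same number of bushels added each year gives constant differences (linear), while the same percentage increase each year gives constant ratios (exponential). The starting value and growth amounts below are invented.

```python
def linear_growth(start, yearly_increase, years):
    """Same amount added each year."""
    return [start + yearly_increase * t for t in range(years + 1)]

def exponential_growth(start, yearly_rate, years):
    """Same percentage added each year."""
    return [start * (1 + yearly_rate) ** t for t in range(years + 1)]

linear = linear_growth(1000, 50, 3)
exponential = exponential_growth(1000, 0.05, 3)

# Year-to-year differences are constant for the linear model.
print([b - a for a, b in zip(linear, linear[1:])])  # [50, 50, 50]

# Year-to-year ratios are constant for the exponential model.
print([round(b / a, 2) for a, b in zip(exponential, exponential[1:])])  # [1.05, 1.05, 1.05]
```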

Observational study

observes individuals and measures variables of interest but does not attempt to influence the responses

Undercoverage bias

occurs when some groups in the population are left out of the process of choosing the sample

In order to see which variety of apple tree produces more fruit, a farmer sets up an experiment. He has three plots of land with different soil and natural water availability. Each plot has room for eight trees. The farmer randomly selects four locations in each plot for the first variety of tree and the other four get the second variety. This experiment is...

randomized block, blocked by plot of land

Twenty dogs and 20 cats were subjects in an experiment to test the effectiveness of a new flea control chemical. Ten of the dogs were randomly assigned to an experimental group that wore a collar containing the chemical, while the others wore similar collars without the chemical. The same was done with the cats. After 30 days veterinarians were asked to inspect the animals for fleas and evidence of flea bites. This experiment is...

randomized block, blocked by species

In an experiment the primary purpose of blocking is to

reduce the within-treatment variation.

Standard error equation

s = square root( (sum of (y minus predicted y) squared) divided by (n - 2) )
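
A sketch of the calculation, where each residual is the observed y minus the y predicted by the least-squares line (matching the card that describes s as the typical residual size). The data and function names are made up for illustration.

```python
import math

def least_squares(xs, ys):
    """Slope and intercept of the least-squares regression line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def residual_standard_error(xs, ys):
    """s = sqrt(sum of squared residuals / (n - 2))."""
    slope, intercept = least_squares(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(e ** 2 for e in residuals) / (len(xs) - 2))

xs = [1, 2, 3, 4, 5]  # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]  # hypothetical response values
print(round(residual_standard_error(xs, ys), 4))  # 0.8944
```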

Survey

sampling a part of the population

Factors

the explanatory variables in an experiment

Sampling variability

the natural tendency of randomly drawn samples to differ, one from another

A residual plot that has no pattern is a sign that...

the relationship in the original data is linear and the regression line is a good model.

R-sq is a measure of...

the proportion of the variability in the response variable that is explained by the explanatory variable.

Blinding

when participants do not know whether they belong to the experimental or control group, or which treatment they are receiving

If a data point is influential it...

will change the slope of the regression equation

