AP Stats Semester 1 Final Review

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

Using Scenario 9 (crime rate chart), calculate the conditional distribution of temperature by above average crime rate is

0.12, 0.56, 0.33 (conditional distribution is calculated by using the percentage of each temperature by calculating the total of ONLY the above average crime rate)

Using Scenario 3, the number of home runs made by most National League teams were in the 90s the number of home runs made by teams overall were in the 80s

True (Look at which place has the most numbers tacked on the end of the the graph)

Using Scenario 1, in the survey of your neighbors described above, the only discrete quantitative data you're collecting about your neighbors is family size.

True (discrete quantitative data can only take specific numeric values, annual income can change)

A residual:

is how much an observed y-value differs from a predicted y-value

Which of the following is a characteristic of a census?

it gathers data from every member of a population

Histograms are most useful in displaying:

large numeric data sets (Histograms are used to display frequencies across intervals of numeric data.)

Which measure of center and spread should be used with a Normal Distribution?

the mean and standard deviation (normal distribution is when a graph is entirely symmetric)

All of the following statements are true about normal distribution are true except

the normal curve crosses the x-axis at z-scores at 3.0 and below -3.0

Replication is best described as:

the policy of repeating the experiment on different subjects to reduce chance variation and to determine the generalizability of the findings.

all of the following statements about sample standard deviation are true, EXCEPT

the standard deviation is negative when there are extreme values in the sample (standard deviation will NEVER be negative)

Which of the following activities is NOT an example of data collection?

Reaching a conclusion about the results of a reading program

Which of the following is the best representative sample of the adult population in the United States

Simple Random Sample of 1,000 adults from across the country

Suppose that all sample data points are the same on a lie with a positive slope. What would be the r value?

+1.0

Find the z-score for the lower quartile of any normal curve. Round your answer to the nearest hundredth.

-0.67 (uses the 25 and puts into chart)

Calculate the least squares regression line for {(8,4), (4,3), (7,3), (5,2)}

y^=1.2+0.3x (use calculator)

You measure the weights of a hockey team and found the weights to be normally distributed. The distribution has a population mean weight of 160 lbs and a population standard deviation of 25 lbs. Find the Z-SCORE associated with a weight of 120 lbs.

z = -1.6 (find z-score by taking the variable - mean/ standard deviation)

A group of women with normally distributed heights has a population mean of 66 inches and a population standard deviation of 2.5 inches. What is the z-score, to the nearest tenth of an inch, associated with the height of 65 inches?

z=-0.4 (take variable - mean/ s.d.)

Using Scenario 2, the relative frequency of students who study 5-6 hours a day is

0.223 (relative frequency is found by frequency / total)

Estimate the relative frequency of countries with 20-22 days of no rainfall in the month of January

0.32 (relative frequency: frequency/total)

Using Scenario 9 (crime rate chart), calculate the marginal distribution for the crime rate (below, normal, above)

0.34, 0.37, 0.29 (calculate the marginal distribution for each column) (divide individual total of crime rate, ex. below, by the overall total)

You measure the weights of a hockey team and found the weights to be normally distributed. The distribution has a population mean weight of 160 lbs and a population standard deviation of 25 lbs. What is the probability that a randomly selected subject will weigh between 140 and 180 lbs?

0.576

Using Scenario 7 (the regression line for the effect of streetlights), calculate the residual for a block with 10 streetlights and 1 crime a month

0.6 (actual - predicted value)

Using Scenario 2, the cumulative relative frequency of students who study 4 hours or fewer a day is

0.615 (add all the previous relative frequencies to the current row)

Calculate the correlation for {(8,4), (4,3), (7,3), (5,2)}

0.6708203932 (Calculator)

Consider a complete table of relative frequencies. The sum of a relative frequency column in such a table must be:

1 (all relative frequency columns should equal 1)

Using Scenario 7 (the regression line for the effect of streetlights), how w many crimes a month are predicted when there are 7 streetlights on a block

1.0 (insert value into the equation)

Use the 68-95-99.7 rule. A population of bolts has a mean thickness of 20 mm, which a population standard deviation of 0.01 mm. What is the minimum and maximum thickness that includes 95% of the population of bolts?

19.98 to 20.02 mm (95% of the data is within 2 standard deviations away from the mean)

Use the 68-95-99.7 rule. A population of bolts has a mean thickness of 20 mm, which a population standard deviation of 0.01 mm. What is the minimum and maximum thickness that includes 68% of the population of the bolts?

19.99 to 20.01 mm (68% of the data is concentrated to one standard deviation away from the mean)

In a normal distribution with a mean of 30 and a standard deviation of 5, you'd find the largest proportion of of cases between

25 and 35 (full 68%)

Using Scenario 5 (babies born in the U.S.), what is the interquartile range for the data in the table?

25.5

Using Scenario 5 (babies born in the U.S.), what is the largest number of births in a month you could possibly have and still have a lower outlier?

255

To the nearest whole number, what is the percentile associated with z=-0.68?

25th percentile (take -0.68 to the z-score chart to match it up with a value. In this case it is 0.2483, which is multiplied by 100 to get a percent)

Using Scenario 5 (babies born in the U.S.), what is the lower quartile for the data in the table?

294

The following hypothetical data set shows the purchase prices (in thousands) for a sample of 3-bedroom, 2-bathroom homes in Essex County, MA, over the past year. Compute the five-number summary and create a modified box-and-whisker plot. How many outliers are present in this distribution? 250, 254, 320, 342, 221, 235, 210, 426, 210, 298, 231, 254, 278, 234, 236, 235, 300, 401, 129, 234, 235, 235, 245

3 (401, 426, 129)

Using Scenario 5 (babies born in the U.S.) , what is the median for the data in the table

306.5

Using Scenario 5 (babies born in the U.S.), what is the upper quartile for the data in the table (top 25%)

319.5

Calculate, to the nearest whole number, the sample standard deviation of this data set, which is a sample from a larger population: {71, 75, 65, 73, 69, 77, and 67}.

4 (take the mean, subtract mean from all numbers, square new numbers, and then take the mean of the squared numbers for answer)

Use the 68-95-99.7 rule, you can assume what percent of the normal distribution is outside two standard deviations of the mean in both directions

5% (take 100% and subtract the 95% to see what's still left outside)

Consider theses eight observations: {11, 6, 2, 5, 8, 4, 4, 9}, what is the median?

5.5

What area to the nearest whole percent, of the normal curve located between z=-0.6 and z=1.4?

50%

You measure the weights of a hockey team and found the weights to be normally distributed. The distribution has a population mean weight of 160 lbs and a population standard deviation of 25 lbs. What is the percentile for the weight 160 lbs?

50th percentile (find percentile by mean + variable * s.d.)

Consider theses eight observations: {11, 6, 2, 5, 8, 4, 4, 9}, what is the mean?

6.125

Using scenario 6 (college psychology department), what percentage of applicants between 500 and 700?

60% (most of them are within the 68% of the normal distributions, but not exactly, so can use a guestimate)

Using scenario 6 (college psychology department), Find the GRE score at the upper quartile, Q3

613 (find Q1 by putting numbers into L1 and L2 and then going to Stats -> Calc -> 1-Var Stats) (Find box-plot by going 2nd stat plot -> 1 Plot1...On -> set type box plot -> graph)

Using scenario 6 (college psychology department), what is the GRE score at the 77th percentile?

620

A group of women with normally distributed heights has a population mean of 66 inches and a population standard deviation of 2.5 inches. What is the height, to the nearest tenth of an inch, of the 70th percentile?

67.3 inches ( find z -score by variable - mean/ s.d., then take z-score to table and convert to a percentage)

Using scenario 6 (college psychology department), what percentage of applicants had a GRE score below 625?

78% (can use calculations of where it must be based on 68-95-99.7 rule & normal distribution)

Using scenario 6 (college psychology department), what percentage of applicants scored above 450 on the GRE?

82% (slightly less than 95% of the normal distributions, so again using logic to determine from the options what it must be)

Using Scenario 8 (LSRL for test squares and studying), which is the predicted test score for an individual who studies 8 hours a day?

85 (plug in values to the equation)

A bivariate scatterplot has an r squared of 0.85. This means:

85% of the variation in y is explained by the changes in x

To the nearest whole number, what percentile is associated with the z= 1.2?

88th percentile (use z-score chart)

Consider a normal distribution with a mean of 65 and standard deviation on 4. A sample of 950 is drawn from this population. Approximately how many of the 950 cases would you expect to find between 57 and 73.

903 (between the top like 95%, can calculate)

Double-blind is best described as:

a design in which neither the experimenter nor the subject knows who is in the treatment group and who is in the control group.

Many statisticians say that the U.S. Census, which attempts to count every population member directly, is significantly less accurate than a count estimated by random sampling. Why might a count estimated from random samples be more accurate than a census? (Choose the best answer.)

A census often can't find every population member so some groups (such as the homeless) are often underrepresented

A symmetric distribution can't have which of the following characteristics?

A long tail on one side (must be actually symmetric)

Why is the IQR considered to be a resistant statistic?

Adding a new extreme observation has very little effect on it (resistant statistics are resistant to outliers)

A normal curve table tells you that the probability lying below z = -1 is .1587. This can be interpreted as:

All of the above

Which of the following would most likely be graphed as a bar chart rather than a histogram?

All of the above (the number of cars, the number of students, the ethnic distribution of a major city, the number of people in various management positions) (histograms use quantitative data, bar graphs display categorical data)

Which of the following can stemplots show?

All of them (symmetry, clusters, gaps, and outliers)

Using Scenario 3, which league has the team that had the most home runs by midseason, and how many home runs and that team hit? Which league had the team with the fewest home runs?

American, 149, American

You have two normally distributed populations: Population A: mean = 50 standard deviation = 12 Population B: mean = 75 standard deviation = 15 Te area under the normal curve is greatest in which scenario

Area above 80 in population B

Using Scenario 4, create a box-and-whisker plot for each of the three data sets. then use the plot and the values of the five-number summary to answer the following question: Which team has the greatest variation in the points per game for the middle 50% of their observations?

Boston

Using scenario 4, the median points per player per game for each team are

Boston = 10.0, Chicago = 9.0, Seattle = 8.9

Assume that normal curve A and normal curve B have identical population means. Assume further that A has a greater population standard deviation than B. Which curve is taller, and why??

Curve B is taller because smaller standard deviations produce thinner curves

Using Scenario 1, what are the categorical variables in your survey?

Everything but average annual income and family size (these ones are determined by numerals)

An r of -1.0 proves a strong cause and effect relationship between x and y

False

If the population of interest is all day care centers in the United States, a sample of day care centers could be either all day care centers in New York City or a randomly selected group of day care centers throughout the U.S. Either sample is equally good.

False (NYC day care centers is too narrow of a sample and doesn't give a large enough scope)

Which of the following is true about the are as described under the normal curve?

Fewer than one percent of the cases are located three standard deviations or below the mean

a linear regression line indicates the amount of grams of the chemical CuSo4 (the response variable, y) that dissolve in water at various temperatures in Celsius (the explanatory variable, x). The least-squares regression line is y= 10.14 + 0.51x. Give the meaning of the slope of the regression line in the context of the problem.

For each one degree rise in the temperature, you can dissolve 0.51 more CuSo4.

Using the regression equation in Scenario 7 (crime & streetlights), correctly interpret the slope

For every additional streetlight per block, the crimed per month decrease by 0.2

The kind of sample strategy LEAST likely to produce statistics that are good estimated of population parameters is a:

Haphazard sample (haphazard samples basically guarantee bias)

Which of the following can outliers affect significantly?

I, III, and IV (mean, standard deviation, and range)

Since the distribution of housing prices in a community is usually skewed right, which measure of center should you use for housing prices?

Median (is the exact middle number so it best represents the price without being too skewed by the high house prices)

The proper notation for a normal distribution with a mean of 250 and standard deviation of 25 is:

N(250, 25)

When you have a normal distribution and you know that the area above a given value of x is 0.35, you also know that

None of the above (there's no way to really know anything based on this information alone)

Using Scenario 1, which of the variables collected are continuous data?

Only average annual income (can be determined over a long period of time)

An observational study based on survey data concluded that individuals who took more vitamin C were able to recover from the flu faster. You want to replicate this study using an experimental approach. The treatment in this experiment might be:

The amount of vitamin C taken per day: 0 mg, 1000mg, 2000mg, or 3000 mg

Which of these are categorical data?

The different types of anteaters. (other options were weight, length, etc) (The different types of anteaters cannot be expressed as a number, so this is an example of qualitative data.)

Six radio listeners are surveyed. Their favorite FM stations are: 89.1, 89.1, 89.1, 94.7, 94.7, and 104.3. Based on these data, you want to name the favorite station of a typical listener. You should name:

The mode, which is 89.1 (use the mode because it's the most common answer rather than the middle one. best with CATEGORICAL variables)

Consider the data set A: (2,8), (3,6), (4,9), and (5,9). Which of the following is the proper interpretation of the coefficient of determination

Thirty percent of the variation of the y-values can be explained by variation in the x-values

Let's say you're interested in the effects on boys of different dosage levels of a new drug for the treatment of ADD. You set up an experiment to consider the factor of dosage with two levels (300 mg vs 500 mg). What would be the different treatment groups of the experiment within each block?

Three groups: placebo drug, 300 mg, 500 mg

For a semester project, a student needs to select a random sample of 10 students from his senior class of 250. He carefully numbers the class list from 000 to 249 and then uses a random number generator to obtain 3-digit random numbers. The 10 unique numbers are his sample. He notices that they all belong to the same honors AP Calculus class. Another student claims that this could not be a random sample. Which of the following is true?

Whether a sample is a random sample or not is determined by the sampling method, not the results. The method used here is valid.

Which of the following determine the sign of r?

Whether the value of y increases or decreases as the value of x increases (the direction of the slope)

You're going to test two new varieties of fish food vs. a commonly used fish food. You set up an experiment as follows: 60 fish are randomly assigned to each of three different tanks. One tank is randomly selected to receive one of the new foods, another to receive the other new food, and the third tank to receive the common food. Fish growth is measured over tie. This is an example of:

a completely randomized design with a control group

A block is best described as:

a group of subjects that are similar in some way known to affect the response to the treatment.

An experiment is conducted in which a series of tests are performed on pairs of identical twins who were raised separately. A comparison of scores on each pair of twins is used for analysis. this is best described as:

a matched-pairs design

When looking at a scatterplot of two variables, the variable along the horizontal axis is typically referred to as the:

explanatory variable

Using Scenario 8 (LSRL for test scores and and studying). Suppose the maximum number of hours of study amount of students in your sample is 6. If you used the equation to predict the test score of a student who studied 8 hours a day, your prediction would be considered an

extrapolation

All positive correlations indicate stronger relationships than all negative correlations.

false

A normal probability plot

graphs raw scores against z-scores of percentile ranks

You've read a story in the New York Times claiming that individuals who engage in aerobic exercise for at least an hour a day demonstrate fewer symptoms of depression. You read that an experiment was conducted in which a researcher first administered a survey on depression and self-esteem to 100 individuals and then taught them some proper techniques of aerobic exercise. The 100 individuals were then sent off to exercise at least one hour a day. After two months, the depression and self-esteem survey was administered again and showed that depression symptoms declined and self-esteem increased. The experimental design used here is a:

matched pairs deign before and after design (matched pairs: when participants are matched in pairs based on shared characteristics)

The goal of the least-squares regression is to compute a line that:

minimizes the sum of the squared residuals

You believe normal probability plot of 30 dataa points provides evidence that the original data is normally distributed. If this is true, then the pattern of points on this normal probability plot:

none of the above (normal probability plots are entirely linear)

Which of the following would indicate the strongest relationship between two variables?

r squared = 0.23 (23%, look for highest r squared value)

The box plot below represents a distribution that is:

right skewed (use the appearance and see which whisker is the longest)

Influential points and outliers

should be examined carefully to determine if they're part of the data set.

the placebo effect is best described as:

the tendency of subjects to respond favorably to any treatment

On the least squares regression line, the point (x mean, y mean) always has a residual of 0

true

An outlier:

usually has a strong effect on the correlation coefficient and regression line and can be an influential point

As primary research for one of her books, Shere Hite distributed 100,000 questionnaires to women's groups. 4,500 women responded. Hite found that 96% of the women felt they give more emotional support to than they get from their husbands or boyfriends. Which of the following best describes her sample?

voluntary response sample


Kaugnay na mga set ng pag-aaral

Speech Chapter 7 & 8 Definitions

View Set

Biology Section 2-1 Review: Composition of Matter

View Set

Chapter 41: Management of Patients With Musculoskeletal Disorders 4

View Set

GNRS 555: Exam 2 Review Questions M.H

View Set

MCC Unit 5 loss, grief and dying

View Set

Ch. 10 GI Tract & Abdominal Wall

View Set