AP Statistics Semester 1 Quiz/Checkpoint Questions GOD SEND

You have two normally distributed populations: Population A: mean = 50 standard deviation = 12 Population B: mean = 75 standard deviation = 15

Area above 80 in Population B (In this case you want the area above z = 0.33, or roughly .37.)
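
The z-score and tail area on this card can be checked numerically. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and not part of the original card.

```python
from statistics import NormalDist

# Population B from the card: mean 75, standard deviation 15.
mu_b, sigma_b = 75, 15
x = 80

z = (x - mu_b) / sigma_b                            # standardized score, about 0.33
area_above = 1 - NormalDist(mu_b, sigma_b).cdf(x)   # upper-tail area, about 0.37

print(round(z, 2), round(area_above, 2))
```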

An experiment was conducted in which a researcher first administered a survey on depression and self-esteem to 100 individuals and then taught them some proper techniques of aerobic exercise. The 100 individuals were then sent off to exercise at least one hour a day. After two months, the depression and self-esteem survey was administered again and showed that depression symptoms declined and self-esteem increased. You're a bit skeptical of the results. You believe there are two problems. One, you can't really know the effects of the treatment without a control group, and two, you suspect there's interviewer bias. Which type of experimental design might best reduce these problems?

Completely randomized design comparing those who exercise vs. those who don't exercise, and a blind procedure

Which of the following is an example of using anecdotal rather than available data to support a conclusion?

Concluding that a particular brand of fish food leads to faster fish growth, based on your friend's observations

41, 52, 45, 43, 41, 54, 51 You'd expect 95% of these observations to be between what two values?

(35.7, 57.7) (The sample mean is 46.7, and the sample standard deviation is 5.5 for this data set. In a normal distribution, 95% of the observations fall within ±2 standard deviations of the mean. For this data set, 2 standard deviations is 11. So ±11 from the mean, 46.7, gives (35.7, 57.7).)
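
The arithmetic behind this answer can be reproduced in a few lines. This is a sketch that assumes the second data value is 52, since that is the value consistent with the stated mean of 46.7 and standard deviation of 5.5.

```python
from statistics import mean, stdev

# Assumption: the second value is 52, which matches the stated
# sample mean (46.7) and sample standard deviation (5.5).
data = [41, 52, 45, 43, 41, 54, 51]

x_bar = mean(data)   # sample mean
s = stdev(data)      # sample standard deviation (n - 1 in the denominator)

# Empirical rule: about 95% of observations lie within 2 standard deviations.
low, high = x_bar - 2 * s, x_bar + 2 * s
print(round(x_bar, 1), round(s, 1), (round(low, 1), round(high, 1)))
```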

We want to test a large data set for normality using the empirical rule. The sample mean is 110, and the sample standard deviation is 12. The data set is large enough that we think these are good estimates for the population mean and standard deviation. If the data are normally distributed, we'd expect to find 68% of the observations between which two values?

(98, 122)

Find the z-score for the lower quartile of any normal curve. Round your answer to the nearest hundredth.

-0.67
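
Instead of reading the value from a table, the lower-quartile z-score can be looked up with the inverse normal CDF. A small sketch, assuming Python 3.8+ for `statistics.NormalDist`:

```python
from statistics import NormalDist

# The lower quartile is the 25th percentile of the standard normal curve.
z_q1 = NormalDist().inv_cdf(0.25)
print(round(z_q1, 2))
```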

You have the following least-squares regression equation: ŷ = 25 + 6x. The residual associated with the observed data point of (5,20) is:

-35
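
A residual is the observed y minus the predicted y. The helper below is a hypothetical illustration (the name `residual` is not from the course materials) applied to the line and point on this card.

```python
def residual(x, y, intercept, slope):
    """Return observed y minus the value predicted by y-hat = intercept + slope * x."""
    predicted = intercept + slope * x
    return y - predicted

# Card values: y-hat = 25 + 6x and the observed point (5, 20).
res = residual(5, 20, intercept=25, slope=6)
print(res)
```

A negative result means the point lies below the regression line.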

You have the following hypothetical information on passenger complaints about domestic and international airlines from 1998: What's the probability that a passenger will issue a complaint about service on a domestic airline flight?

.035

In John's closet we find: 3 shirts (1 blue, 1 red, 1 black) 2 pairs of pants (1 blue, 1 black) 2 pairs of shoes (1 black, 1 blue) Estimate the probability that John wears an outfit of all the same color.

.167 (There are two possible outcomes in which John wears all the same color (blue shirt, blue pants, and blue shoes or black shirt, black pants, and black shoes). Thus, there are two successful outcomes out of 12 possible outcomes, which is a probability of 2/12, or .167)
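
The sample space here is small enough to list by brute force. The sketch below assumes every shirt/pants/shoes combination is equally likely, as the card's explanation does.

```python
from itertools import product

shirts = ["blue", "red", "black"]
pants = ["blue", "black"]
shoes = ["black", "blue"]

# Sample space: every (shirt, pants, shoes) combination.
outfits = list(product(shirts, pants, shoes))

# An outfit is "all the same color" when the three items share one color.
matching = [o for o in outfits if len(set(o)) == 1]

p_same_color = len(matching) / len(outfits)
print(len(outfits), len(matching), round(p_same_color, 3))
```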

If you randomly select one woman, what's the probability that she does aerobics?

.169

You have two specially created dice. Each die has six sides. The first die has the numbers {1, 3, 5, 7, 9, 11} on the sides. The second die has the numbers {2, 4, 8, 10, 11, 15} on the sides. Create a sample space of the outcomes, and estimate the probability that the sum of the two dice is greater than or equal to 20.

.194 (There are a total of 36 possible outcomes. If you compute the sum for each combination of dice, you should find that in 7 instances this sum is greater than or equal to 20. The probability is 7/36, or .194.)
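
The count of 7 favorable outcomes can be confirmed by enumerating all 36 pairs. A minimal sketch, assuming each face of each die is equally likely:

```python
from itertools import product

die_1 = [1, 3, 5, 7, 9, 11]
die_2 = [2, 4, 8, 10, 11, 15]

# All ordered (first die, second die) outcomes.
sample_space = list(product(die_1, die_2))

# Outcomes whose sum is at least 20.
favorable = [(a, b) for a, b in sample_space if a + b >= 20]

p_at_least_20 = len(favorable) / len(sample_space)
print(favorable, round(p_at_least_20, 3))
```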

A special six-sided die has the numbers {2, 4, 8, 10, 11, 15} on its sides. Estimate the probability of getting an odd number on a single roll of this die.

.333

You have a bag of red, white, and blue marbles. You know there are 11 red marbles, 8 blue marbles, and 3 white marbles in the bag. What's the probability of randomly selecting a blue marble from the bag?

.364

In John's closet we find: 3 shirts (1 blue, 1 red, 1 black) 2 pairs of pants (1 blue, 1 black) 2 pairs of shoes (1 black, 1 blue) Create a sample space of the possible outcomes of clothes combinations that John might wear (assuming he always wears a shirt, pair of pants, and pair of shoes). What is the complement of P(John wears something red)?

.667

If you randomly select one man, what's the probability that he doesn't do exercise walking?

.769

If the least-squares linear regression line explained the same amount of variation as the line ŷ = y̅, what would be the value of r²?

0

The empirical rule indicates that roughly 47.5% of the observations in a normal distribution are located in the range between a z-score of __ and a z-score of __.

0, -2

In Sausha's pocket, she has: 7 pennies 3 nickels 3 dimes 5 quarters 2 half-dollars 1 Susan B Anthony dollar coin If Sausha selects one coin from her pocket, what's the probability that it's divisible by $0.10? (Note: Ignore the fact that it's possible to distinguish the coins by the size and shape of the coin)

0.286
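
One way to check this answer is to work in cents so that divisibility is an integer test. The sketch assumes each of the 21 coins is equally likely to be drawn, as the card's note suggests.

```python
# Coin values in cents, mapped to how many of each Sausha has.
coins = {1: 7, 5: 3, 10: 3, 25: 5, 50: 2, 100: 1}

total = sum(coins.values())

# A coin is "divisible by $0.10" when its value is a whole multiple of 10 cents
# (dimes, half-dollars, and the dollar coin).
favorable = sum(count for cents, count in coins.items() if cents % 10 == 0)

p_divisible = favorable / total
print(favorable, total, round(p_divisible, 3))
```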

Consider the sample data set s = (8,4), (4,3), (7,3) and (5,2). What is the slope of the linear regression line?

0.3

Consider the data set A: (2,8), (3,6), (4,9), and (5,9). What is the value of r²?

0.30
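
The value of r² can be read as the share of variation in y that the regression line explains. The sketch below computes it for data set A as 1 minus (unexplained variation / total variation); this is one of several equivalent routes and not necessarily the one the course uses.

```python
from statistics import mean

points = [(2, 8), (3, 6), (4, 9), (5, 9)]
xs = [x for x, _ in points]
ys = [y for _, y in points]
x_bar, y_bar = mean(xs), mean(ys)

# Least-squares slope and intercept.
slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar

# Unexplained variation (about the line) and total variation (about y-bar).
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
sst = sum((y - y_bar) ** 2 for y in ys)

r_squared = 1 - sse / sst
print(round(r_squared, 2))
```

If the line did no better than ŷ = y̅, `sse` would equal `sst` and r² would be 0.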

You have the following regression equation for the effect of streetlights per block (x), on crimes per month (y) : y = 2.4 - 0.2x. Calculate the residual for a block with 10 streetlights and 1 crime a month (10, 1).

0.6

Consider the sample data set s = (8,4), (4,3), (7,3) and (5,2). What is the correlation coefficient?

0.6708203932
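
Both the slope and the correlation coefficient for sample s come from the same three sums of deviations. A by-hand sketch using only the standard library:

```python
from math import sqrt
from statistics import mean

s = [(8, 4), (4, 3), (7, 3), (5, 2)]
xs = [x for x, _ in s]
ys = [y for _, y in s]
x_bar, y_bar = mean(xs), mean(ys)

sxx = sum((x - x_bar) ** 2 for x in xs)                # 10
syy = sum((y - y_bar) ** 2 for y in ys)                # 2
sxy = sum((x - x_bar) * (y - y_bar) for x, y in s)     # 3

slope = sxy / sxx             # least-squares slope
r = sxy / sqrt(sxx * syy)     # correlation coefficient

print(slope, r)
```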

Consider a complete table of relative frequencies. The sum of the relative frequency column in such a table must be:

1

Consider a normal distribution of frog weights with µ = 500 grams and σ = 65 grams. A sample of size 2,000 is drawn from this population. Approximately how many of the 2,000 cases would you expect to find between 435 and 565?

1,360
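
The answer of 1,360 comes from the empirical rule's rounded 68%. The sketch below compares that with the area computed from the normal CDF, which is slightly larger.

```python
from statistics import NormalDist

weights = NormalDist(mu=500, sigma=65)
n = 2000

# Area within one standard deviation of the mean (435 to 565).
proportion = weights.cdf(565) - weights.cdf(435)   # about 0.6827

expected_exact = n * proportion   # about 1365
expected_rule = n * 0.68          # empirical-rule estimate, 1360

print(round(expected_exact), round(expected_rule))
```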

Suppose you're given bivariate data (x, y) that produces a scatterplot with an exponential relationship. Which of the following are likely transformations to consider in order to straighten the data? I. (x, ln y) II. (ln x, y) III. (ln x, ln y) IV. (x, √y) V. (x, e^y)

I, III, IV

For this question, refer to the histogram you created for this Self Check. (The data you need for creating this histogram is at the end of your Study Guide.) Change your histogram so that $5,000 is added to the Xmin and Xmax. Change Xmin to $45,000 Change Xmax to $185,000 Keep Xscl at $10,000 You now have different classes that have no frequencies. What are their class limits?

$105,000 ≤ x < $115,000; $135,000 ≤ x < $145,000; $145,000 ≤ x < $155,000; $165,000 ≤ x < $175,000

In a sample's distribution of income, the modal income (mode, or most frequently occurring observation of income) is $27,000 a year, the median income is $35,000 a year, and the mean income is $45,000 a year. Which statistic do you think is the best estimate of average income, and would you say that income is normal, skewed to the left, or skewed to the right?

$35,000, skewed right (The median value, $35,000, is the best average to use in a skewed distribution. You can tell the distribution is skewed right because the order of the statistics from left to right is mode, median, and mean.)

P(black die shows a 2 and sum of the faces is 5) =

(1/6)(1/6)

One card is randomly selected from a standard 52-card deck. Which of the following gives the probability that the card is a black ace?

(1/2)(1/13)

We want to test a large data set for normality using the empirical rule. The sample mean is 25, and the sample variance is 25. The data set is large enough that we think these are good estimates of the population parameters. If the data are normally distributed, we'd expect to find 68% of the observations between which two values?

(20, 30) (take the sqrt. of the variance to find the standard deviation)

For the next 3 questions, refer to the following table of outcomes for the roll of a black and a white die. Ordered pairs represent (black, white).
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
Which entry (entries) in the table represent(s) the event (black die shows a 3 | white die shows a 4)?

(3,4)

A survey is conducted at a local mall. An interviewer stops every 20th person that walks by and asks them to answer some questions about attitudes on transportation use. Of the 200 individuals who complete the survey, 80 say they prefer the bus as a transportation option. From this analysis, you have reason to conclude that:

About 40% of the mall goers prefer the bus as a transportation option

Why is the interquartile range (IQR) considered to be a resistant statistic?

Adding a new extreme observation has little effect on it. (The IQR is based on the median, which is resistant. That is, a new extreme value added to the data set will have a much larger effect on the mean than on the median.)

Which of the following is a sample of the population of all high school students? (HINT: A sample is not always a simple random sample.) - High school students taking chemistry - Your math class - All high school students in your state - High school students in Cook County, Illinois - All of the above

All of the above (Each of these is a subset of the population, which is all high school students. Any subset of a population is a sample of that population, though it may not be random.)

Say you're using the process of interpolation to predict values of y with the following regression line: ŷ = 32 + 1.7x, and a line is a reasonable model for the data. You know both the minimum observed value of x and the maximum observed value of x in your data set. Which of the following is true about the values of x you could use in this process of interpolation?

All values of x used in this prediction are between the minimum and maximum x-values.

Why would neighborhoods with more station wagons tend to have more playgrounds?

Families with children tend to own station wagons and move to neighborhoods with playgrounds.

Which of the following is true about the areas described under the normal curve?

Fewer than one percent of the cases are located more than three standard deviations above or below the mean.

Say you're designing an experiment that looks at the effect of different types of newly developed fish food on the growth rate of the fish. The treatments are new food A and new food B; the control is the old food. You believe that the effects of the different types of food on fish growth will vary for goldfish vs. tiger fish vs. guppies. How could you design an experiment so that you're blocking on the species of fish?

First separate the fish into the three species. Then within each species, randomly assign each fish to a treatment. (A block design first distinguishes the subjects based on some characteristic that might be a confounding variable (in this case species of fish). After separating the fish by species, units within each block are randomly assigned to a treatment or control group. The design resembles doing three separate experiments: one for just guppies, one for just goldfish, and one for just tiger fish.)

You have the following regression equation for the effect of streetlights per block (x), on crimes per month (y) : y = 2.4 - 0.2x. Correctly interpret the regression coefficient.

For each additional streetlight per block, the number of crimes per month decreases by 0.2.

Suppose you've computed the following least-squares regression line, ŷ = 8.5 - 0.25x, where: x = degrees in Fahrenheit y = number of miles of jogging in a day Interpret the regression coefficient of this line.

For each increase of one degree in temperature, we'd predict that an individual would jog 0.25 fewer miles.

A linear regression line indicates the amount of grams of the chemical CuSO4 (the response variable, y) that dissolve in water at various temperatures in Celsius (the explanatory variable, x). The least-squares regression line is ŷ = 10.14+0.51x. Give the meaning of the slope of the regression line in the context of the problem.

For each one-degree rise in the temperature, you can dissolve 0.51 more grams of CuSO4.

Which of the following indicates how many times every value in a distribution appears?

Frequency table (The phrase how many times implies the count for each data value, which is best shown through a frequency table.)

You want to know something about your neighbors, so you give them a survey. The survey collects the following data about each family on your block: family size, the kind of pets they have, the grade of the youngest child in the family, the family's annual income in dollars, what the dad does for a living, whether the mom works, and their phone number. Each kind of data you collect about a family is a variable. Which of the variables you collect are continuous data?

Only annual income

You have six different cards with colors on both sides of the card: Card 1: Green and Green Card 2: Green and Blue Card 3: Green and Red Card 4: Red and Red Card 5: Green and Green Card 6: Green and Blue You add the following cards to the deck: Card 7: Green and Black Card 8: Black and Black Card 9: Red and Red Card 10: Green and Red Card 11: Black and Green Which probability is greatest across the following events?

Probability that the other side of a green card isn't green (Although there are 8 cards with green faces, you know that in 6 instances the other side isn't green. Thus, this probability is 6/8, or 75%)
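
The 6/8 figure can be checked by listing the 11 cards. This sketch counts cards, as the card's explanation does (each card with a green face treated as equally likely), and does not weight by individual faces.

```python
# Each tuple is the pair of colors on one card (cards 1 through 11).
cards = [
    ("green", "green"), ("green", "blue"), ("green", "red"), ("red", "red"),
    ("green", "green"), ("green", "blue"), ("green", "black"), ("black", "black"),
    ("red", "red"), ("green", "red"), ("black", "green"),
]

# Cards that have at least one green face.
with_green = [c for c in cards if "green" in c]

# Of those, cards whose other side is some color other than green.
other_not_green = [c for c in with_green if set(c) != {"green"}]

p_other_not_green = len(other_not_green) / len(with_green)
print(len(other_not_green), len(with_green), p_other_not_green)
```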

Which one of the following activities is not an example of data gathering?

Reaching a conclusion about the results of a reading program

You have the following information about voters attending a political meeting: You randomly choose one attendee to speak with about his or her political views. Among the choices listed below, which one is the most likely to be chosen?

Republican from Washington, DC

Which of the following is true once you've constructed the best linear model for a set of data?

Roughly 68% of the residuals should fall within 1 standard deviation of 0. (In the best linear model the residuals should be normally distributed around the line y = 0. Thus, the empirical rule suggests that roughly 68% should be within 1 standard deviation of 0.)

A scientist is testing certain sampling strategies to see which one is best. She's gathered data from an entire population and calculates a population mean µ (mu) = 14 and a population standard deviation σ(sigma) = 5. She draws five different samples from this population using five different strategies and gets the following sample means and standard deviations: Sample 1: (x-bar) = 16.9, s = 6 Sample 2: (x-bar) = 14.5, s = 4.7 Sample 3: (x-bar) = 10.5, s = 3.3 Sample 4: (x-bar) = 17, s = 4.9 Sample 5: (x-bar) = 14.1, s = 8.4 Which sampling strategy does she conclude provides the best sample?

Sample 2 (The statistics from Sample 2 are the best estimates of both population parameters, indicating that this is probably the best sampling strategy.)

Which of the following is the best representative sample of the adult population in the United States? - Simple random sample of 10,000 adults from different city phone books - Simple random sample of 10,000 voters from across the country - Sample of 50,000 individuals at the Super Bowl (which draws from all over the country) - Simple random sample of 1,000 adults from across the country - Sample of 50,000 members of AARP (American Association of Retired Persons)

Simple random sample of 1,000 adults from across the country

What do positive residuals indicate?

That the regression equation may underestimate the y variable.

Which of these are categorical data?

The different types of anteaters. (other options were weight, length, etc) (The different types of anteaters cannot be expressed as a number, so this is an example of qualitative data.)

Which measure of central tendency and which measure of variation should be used with a heavily skewed distribution?

The median and inter-quartile range (The median and inter-quartile range are used because they're less likely to be influenced by outliers in a skewed distribution.)

A hockey team has completed 35 games. The team's median goals per game is 2. Which of the following must be true about the team's goal total so far?

The median doesn't allow us to infer the exact goal total, AND it is at least 36.
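
Why "at least 36"? With 35 games, the median is the 18th value in sorted order, so at least 18 games had 2 or more goals. The sketch below builds the lowest-scoring season consistent with a median of 2; the list is a constructed example, not real data.

```python
from statistics import median

# Lowest possible total with a median of 2 over 35 games:
# the 17 games below the middle score 0, and the middle game
# plus the 17 games above it score exactly 2.
minimal_season = [0] * 17 + [2] * 18

season_median = median(minimal_season)
min_total = sum(minimal_season)
print(len(minimal_season), season_median, min_total)
```

Any other season with a median of 2 has a total at least this large, but the median alone does not pin down the exact total.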

You have a normal curve with a mean of mu. You increase the standard deviation of this curve. How does the curve change?

The new normal curve seems "flatter" and "wider."

All of the following statements are true about the normal distribution except:

The normal curve crosses the x-axis at z-scores above 3.0 and below negative 3.0. (The normal curve never crosses the x-axis. Theoretically, there can be observations in the distribution located well beyond 3 standard deviations in either direction of the mean.)

Consider this scenario: Surgery patients in Hospital A and Hospital B are classified as being in either good condition or poor condition. When looking separately at survival rates among patients in good condition and in poor condition, Hospital B has higher rates for both groups. Yet when the two groups are combined, Hospital A has a higher rate. What's the most likely lurking variable?

The number of patients in good or poor condition (The lurking variable is one that is lost when the data from both hospitals are brought together.)

The correlation between variables x and y is .50. You compute a least-squares regression line with this bivariate data. Which of the following must be true?

The regression coefficient is positive.

Which of the following is true when an observed data point lies below the least-squares regression line?

The residual for this point is negative.

You observe that the distribution of weights of individuals drawn from a random sample of the United States appears bimodal; there are peaks at 130 lbs. and 165 lbs. You were under the impression that weights follow a normal distribution. Why do you think this distribution differs so much?

The sample is comprised of males and females.

In the formula for calculating r, what does sx mean?

The sample standard deviation of x

A telephone research agency has been contracted to collect survey information on buying habits of teenagers, aged 16 and 17. The agency conducts phone calls between 3 PM and 4:30 PM each weekday. The survey finds that the majority of teenagers aged 16 and 17 have access to a credit card. Which of the following is the most likely problem with this analysis?

The sample suffers from undercoverage due to the hours when polling took place

The probability that a White adult man with a high white blood cell count contracts leukemia is .35. A proper interpretation of this probability is:

We'd expect that in a sample of 100 White adult men with high white blood cell counts, 35% will contract leukemia.

Imagine a study of surgery survival rates in two hospitals: Hospital A and Hospital B. Surgery patients in these hospitals are classified as being in either good condition or poor condition. Which of the following would be an example of Simpson's paradox?

When looking separately at survival rates among patients in good condition and in poor condition, Hospital B has higher rates for both groups. Yet when the two groups are combined, Hospital A has a higher rate

Which of the following is an accurate description of Simpson's paradox?

When separate groups of data are combined, an association can reverse direction because of a lurking variable that was lost when the different groups of data were lumped together.

For a semester project, a student needs to select a random sample of 10 students from his senior class of 250. He carefully numbers the class list from 000 to 249 and then uses a random number generator to obtain 3-digit random numbers. The 10 unique numbers are his sample. He notices that they all belong to the same honors AP Calculus class. Another student claims that this could not be a random sample. Which of the following is true?

Whether a sample is a random sample or not is determined by the sampling method, not the results. The method used here is OK.

Which of the following determines the sign of r?

Whether the value of y increases or decreases as the value of x increases

What can you do with a calculator- or computer-generated histogram that you can't do with a hand-drawn stem-and-leaf plot?

You can change the class interval.

The normal curve is:

a bell-shaped density curve that models a normal distribution.

You're a wildlife biologist who wants to measure the top speed of a particular species of dragonfly. You set up a measurement apparatus in a portable wind tunnel, where the insect is suspended in place while air blows over it. The wind tunnel blows air at an increasing speed, and you track the point at which the beating of the insect's wings can't keep pace with the rate of air flowing over them. You plan to travel all over South America, setting up your portable wind tunnel in exotic locations, capturing insects, and measuring their maximum speeds. What is a potential source of bias in this study?

a biased sample

You're going to test two new varieties of fish food vs. some commonly used fish food. You set up an experiment as follows: 60 fish are randomly assigned to each of three different tanks. One tank is randomly selected to receive one of the new foods, another tank to receive the other new food, and the third tank to receive the common food. Fish growth is measured over time. This is an example of:

a completely randomized design with a control group. (This is the simplest type of experiment. Units (the fish) are completely randomly assigned to treatment conditions: regular food (the control) or one of the types of new food (the two treatments). The responses (growth) of both treatment groups can be compared to each other and to the control (the group that receives the common food). If this were a blocked design, the subjects would be divided according to particular characteristics, such as male or female, and then subjects from each division (each block) would be randomly assigned to a treatment.)

Double-blind is best described as:

a design in which neither the experimenter nor the subject knows who is in the treatment group and who is in the control group.

The normal distribution is:

a distribution that can be modeled by a bell-shaped curve, with the mean, median, and mode all the same value and the proportion for any range of values defined in a table of normal curve areas (or probabilities).

A block is best described as:

a group of subjects that are similar in some way known to affect the response to the treatment.

An experiment is conducted in which a series of tests are performed on pairs of identical twins who were raised separately. A comparison of scores on each pair of twins is used for analysis. This is best described as:

a matched-pairs procedure.

Of the samples described below, which would be least likely to produce the right-skewed distribution of ages in the data set below? (HINT: Use your common sense, and think about how old people usually are when they do certain things.) (25, 28, 28, 28, 29, 29, 30, 32, 32, 33, 33, 33, 34, 34, 36, 39, 39, 39, 40, 43, 44, 45, 50, 55) - a sample of retired men - a sample of ages of people who have their first child - a sample of med school grads - a sample of ages at which people married for the first time

a sample of retired men

You're measuring the weights of a group of dogs. You get a mean weight of x̄ = 48.8 pounds and a standard deviation of s = 17.5 pounds. The dogs are:

a sample. (The symbols used for the mean and standard deviation are symbols for statistics, not parameters. This is a clue that the group of dogs is a sample from some population of interest.)

All of the following are examples of a sample that suffers from an undercoverage bias except:

a simple random sample of licensed drivers, taken from driver-licensing records.

You believe that a normal quantile plot of 30 data points provides evidence that the original data is normally distributed. If this is true, then the pattern of points on this normal quantile plot is:

a straight line

In a cluster sample:

a subset of naturally occurring groups are randomly sampled.

Suppose that you have a bivariate data set and the correlation coefficient for the relationship between x and y is strong. Suppose also that you compute a least-squares regression line that seems to fit the linear pattern in this bivariate data. You decide to plot the observed x-values (along the x-axis) against the residuals (along the y-axis). If the pattern in the data is truly linear, you should find that the residuals:

are randomly distributed around the line y - ŷ = 0.

If subjects in an experiment are separated into groups of Whites and Non-Whites prior to the random assignment stage, we should think of race as a (an):

blocking variable

Which of the following combinations of data types is not possible? - discrete and categorical - continuous and categorical - discrete and numeric - continuous and numeric - all combinations of data types are possible

continuous and categorical are not possible

To transform this relationship you can take the natural logarithms of the response variable to find the regression equation for these data. The resulting regression equation, r, and r² are: ln y = -35.4173 + 0.02078x, r = 0.9835, and r² = 0.9672. This equation would predict the population for 1915 to be:

e^4.3764 million people (Good. You remembered to use the result from your equation as the exponent for base e!)
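
The exponent in this answer comes from plugging x = 1915 into the fitted equation, and the prediction is obtained by undoing the logarithm. A short sketch using the coefficients quoted on the card:

```python
from math import exp

# Fitted model from the card: ln y = -35.4173 + 0.02078 x
intercept, slope = -35.4173, 0.02078
year = 1915

ln_pop = intercept + slope * year     # predicted ln(population), about 4.3764
population_millions = exp(ln_pop)     # back-transform to millions of people

print(round(ln_pop, 4), round(population_millions, 1))
```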

A friend claims that education (years of school) is normally distributed. You believe she's wrong. A reasonable argument for your point of view is that:

education has several modes (for example, at 12 and 16 years).

You aren't sure if a line is a good model for a set of bivariate data that contains 15 points. You use your TI-83 to find the linear regression of y on x. Then you take the natural logarithm of each y-value and find the linear regression of ln y (natural log of y) on x. The list RESID in the LIST menu:

contains the residuals for the regression of ln y on x.

If there is a clear pattern in the residual plot, is it safe to extrapolate?

no, because there is a pattern

All the following would be evidence that you do not have a normal distribution, except:

none of the above (Multiple modes, skewed or uniformly distributed data, or a quantile plot that suggests an obvious curve are not characteristics of the normal distribution.)

When you have a normal distribution and you know that the area above a given value of x is .35, you also know that:

none of the above (The area indicates the percentage of observations equal to OR above that value of x.)

Which of the following are true statements? I. Under-coverage bias can be overcome by greatly increasing the sample size. II. Well-designed probability samples always eliminate bias. III. Response bias results only from the wording of survey questions.

none of these

Suppose that all sample data points are on the same line with a positive slope. What would r be for this sample?

r would be +1.0

Which of the following would generate an SRS of 50 integers from 5 to 25 on the TI-83?

randInt(5,25,50)
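
For readers without a TI-83, a rough Python analogue of this call is sketched below. It assumes the calculator's `randInt(lower, upper, n)` returns n integers drawn uniformly from lower to upper inclusive; `random.randint` is inclusive at both ends in the same way.

```python
import random

# 50 integers drawn uniformly from 5 to 25 inclusive.
# Each draw is independent, so repeated values can occur.
draws = [random.randint(5, 25) for _ in range(50)]
print(draws)
```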

The following diagram describes an experiment in which subjects have been divided into three groups. Each of the three groups is randomly assigned to one of two treatments.

randomized block design, blocked on race (Notice that the subjects are first blocked by race)

50 boys and 50 girls with ADD (Attention Deficit Disorder) were selected for an experiment to test a new drug for the treatment of ADD. Half of the boys and half of the girls were selected at random to receive the new drug, and the other half of each group received a placebo. A reduction in symptoms of ADD was measured for each subject. The basic design of this experiment is:

randomized block, blocked by gender (Because the design makes sure that exactly half the boys and half the girls are assigned to a treatment or control group, this experiment isn't a completely randomized design. The subjects are blocked, or divided, by gender, and then each block is randomized into treatment and control groups.)
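
The blocking step can be sketched in code: randomize separately inside each block so that each gender is split evenly between drug and placebo. The subject labels, the helper name `assign_within_block`, and the fixed seed are all hypothetical choices for illustration.

```python
import random

def assign_within_block(subjects, rng):
    """Shuffle one block and split it in half: first half drug, second half placebo."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"drug": shuffled[:half], "placebo": shuffled[half:]}

rng = random.Random(0)  # fixed seed only so the sketch is reproducible

# Blocks: subjects are divided by gender before any random assignment.
blocks = {
    "boys": [f"boy_{i}" for i in range(50)],
    "girls": [f"girl_{i}" for i in range(50)],
}

assignment = {name: assign_within_block(members, rng) for name, members in blocks.items()}
for name, groups in assignment.items():
    print(name, len(groups["drug"]), len(groups["placebo"]))
```

A completely randomized design would instead shuffle all 100 subjects together, which would not guarantee a 25/25 split within each gender.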

The process of replication in an experiment often involves:

repeating an experiment on different units or under different conditions.

n (for the size of the group), x̄ (for the mean), and s (for the standard deviation) are all measures calculated from what group?

sample

The set of all possible outcomes of a (probability) experiment is called the:

sample space

You do a linear regression on 1500 pieces of data, draw a normal quantile plot of the residuals, and note that it's very close to a straight line. You may conclude:

that the original data could be linear because this is an indication that the residuals are normally distributed. (The normal quantile plot does help examine the normality of the residuals, which in turn might help you make interpretations about the original data.)

An observational study based on survey data concluded that individuals who took more vitamin C were able to recover from the flu faster. You want to replicate this study using an experimental approach. The treatment in this experiment might be:

the amount of vitamin C taken per day: 0 mg, 1000 mg, 2000 mg, or 3000 mg

Finding the line of best fit is appropriate when:

the data points follow a straight line.

A basketball fan thinks the large salaries of NBA players will force the NBA to raise ticket prices. Here's how she came to this conclusion: since salaries are part of the total NBA operating costs, she used the NBA average salary to infer the total. Which measure of center did she use?

the mean

For a mound-shaped and symmetrical distribution, what measure of center and measure of variation should be used?

the mean and the standard deviation

The blood pressure reading for any age group is normally distributed. The normal distribution is symmetric and mound-shaped. Therefore, the correct measure of central tendency to use for blood pressure is:

the mean.

If you list and graph the dates of coins in people's pockets and purses you'd probably find that the graph's distribution is skewed left, because more recent dates are more common. If you want to express the average date of a coin you'd use:

the median

For a skewed distribution, what measure of center and measure of variation should be used?

the median and the IQR (When the data are numeric and the distribution is skewed either to the right or the left, the median is usually the best choice for the measure of central tendency. This is because the mean is too heavily influenced by extreme observations that fall on only one side of a skewed distribution.) & (The IQR gives the values of the observations at the 25th and 75th percentiles (or the average of the two values closest to these percentiles if there is not a single value), Q1 and Q3. These values are less influenced by outliers than the standard deviation and so are best used with skewed data.)

Six radio listeners are surveyed. Their favorite FM stations are: 89.1, 89.1, 89.1, 94.7, 94.7, and 104.3. Based on these data, you want to name the favorite station of a typical listener. You should name:

the mode, which is 89.1

Replication is best described as:

the policy of repeating the experiment on different subjects to reduce chance variation and to determine the generalizability of the findings.

All the following statements about the sample standard deviation are true, except:

the standard deviation is negative when there are extreme values in the sample. (A standard deviation can never be negative.)

The placebo effect is best described as:

the tendency of subjects to respond favorably to any treatment.

What is the difference between these two residual plots? residual (RESI1) vs. the x-variable and residual (RESI1) vs. the predicted y-variable (FITS)

the x-axis scale

You have a distribution summarizing the number of days in the past two months (60 days) that an individual watched TV. The median number of days = 25, the lower quartile = 21, and upper quartile = 42. Given this information, which of the following statements are true?

there are no outliers in this distribution

Consider a survey in which 79% of the respondents said yes to the following question: Agree or disagree: Since our economy can't sustain itself without a healthy environment, it's important that Congress pass laws to protect the environment. What's a reasonable reaction to the survey results?

they're not valid, due to a leading question

A primary purpose of randomization is:

to eliminate bias between treatment groups.

A primary purpose of blocking is:

to isolate the separate effects of the treatment and another important variable.

An important reason a market researcher collects data using a stratified random sample rather than a simple random sample is:

to make a representative sample more likely than one produced by simple random sampling.

An experiment examines the effect of a new pill on reducing blood pressure. Half the subjects are given the new pill, the other half are given a sugar pill and told they're receiving a new medicine for reducing blood pressure. The new pill is considered a (an) _______, and the sugar pill is called a (an) _________.

treatment, placebo

True or False: A finding is statistically significant if it's unlikely that it could have occurred by chance.

true

True or False: Blocking in an experiment and stratifying in a survey accomplish the same thing. They control the amount of variation in key characteristics within the total sample, such as race or gender, that are likely to be important to the outcome of interest.

true

True or False: In a normal distribution, the mean, median, and mode all have the same value, and the graph of the distribution is symmetric

true

True or False: In a normal quantile plot that shows a generally straight line for a set of 10 data points, it is reasonable to assume that the data are normally distributed.

true

True or False: In a survey of your neighbors (asking for family size, the kind of pets they have, the grade of the youngest child in the family, the family's annual income in dollars, what the dad does for a living, whether the mom works, and their phone number), the only discrete, numerical data you're collecting about your neighbors is family size.

true

True or False: Interpolation is generally safer and more valid than extrapolation.

true

True or False: The area under a normal curve within ±2 standard deviations of the mean is approximately 95% of the total curve area. Also, there is approximately a 95% chance that an observation will be within ±2 standard deviations of the mean.

true

True or False: The x-values are the explanatory variables, and the y-values are the response variables in bivariate data scatterplots indicating a cause and effect relationship.

true

True or False: Theoretically, the more trials you conduct the closer the observed value is to the expected value.

true

True or False: When you decrease the Xscl value, you decrease the class interval. By doing this, you increase the number of classes or "buildings."

true

True or False: A simple random sample is not just a sample where every population member has an equal chance of being drawn.

true (A simple random sample also has the requirement that all possible samples are equally likely, meaning that every possible combination of population members has the same chance of occurring.)

True or False: For a symmetric, mound-shaped distribution, the mean, median, and mode are all the same.

true (A symmetric, mound-shaped distribution will have its mean as the most common value. Half of the values will be above the mean and half will be below it.)

True or False: Your population of interest is whatever you decide it is. A population can be anything as long as it's defined as a population.

true (If you're interested in trees in general, but elm trees in particular, especially those close to where you live, you could say that your population is not trees in general, but elm trees in the park next to your house. Then an appropriate sample would be a sample selected from the elm trees in the park.)

True or False: It's possible to determine the frequencies (counts) within each interval from a cumulative frequency plot.

true (In a cumulative frequency table, the difference between successive entries in a column is equal to the frequency of the lower entry. All frequencies can be "recaptured" by computing all such differences.)

True or False: The shape and standard deviation of a population distribution of a variable (such as income) can be estimated with a distribution of a sample of sufficient size

true (Just as sample statistics are used to estimate population parameters, distributions of samples can be used to estimate the shapes and standard deviations of population distributions.)

True or False: When you take a sample set of bivariate data, and reverse the explanatory and response variables, such that, for example, the point (2,3) becomes (3,2), and so on, the correlation coefficient r remains unchanged.

true (Pearson's r is a symmetric statistic describing the strength and direction of a relationship. It doesn't depend on how the variables are designated.)

Consider the sample data set s = (8,4), (4,3), (7,3) and (5,2). Using your calculator, calculate the least-squares linear regression line.

ŷ = 0.3x + 1.2
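
The calculator result can be checked by hand with the usual least-squares formulas (slope = Sxy / Sxx, intercept = ȳ - slope · x̄). This is only a sketch of that arithmetic; the variable names are illustrative and not from the original card.

```python
# Sample points (x, y) from the question.
points = [(8, 4), (4, 3), (7, 3), (5, 2)]

n = len(points)
x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n

# Sums of cross-products and squared deviations about the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
s_xx = sum((x - x_bar) ** 2 for x, _ in points)

slope = s_xy / s_xx                 # b1
intercept = y_bar - slope * x_bar   # b0

print(f"y-hat = {slope:.1f}x + {intercept:.1f}")
```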

Population H is a group of women with normally distributed heights. Population H has a population mean (µ) of 66 inches and a population standard deviation (σ) of 2.5 inches. In population H, what is the z-score, to the nearest tenth, associated with the height 65 inches?

z = -0.4

You measured the weights of members of population W and found the weights to be normally distributed. The distribution has a population mean (µ) weight of 160 pounds and a population standard deviation (σ) of 25 pounds. For population W, find the z-score with a weight of 120 pounds?

z = -1.6
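
Both z-score answers above come from the same formula, z = (x - µ) / σ. A minimal sketch, with a hypothetical helper name `z_score`:

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

z_height = round(z_score(65, 66, 2.5), 1)    # Population H, 65 inches
z_weight = round(z_score(120, 160, 25), 1)   # Population W, 120 pounds

print(z_height, z_weight)
```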

Suppose that all sample data points are on the same horizontal, in other words, a line with a slope of zero. What would r be for this sample?

zero

The proportions of any normal curve are defined by:

µ and σ

In a physics experiment, you time an object free falling from a platform. Your bivariate data (x,y) gives you a scatterplot that has a quadratic association. The form of the equation you find after transforming your data is:

√y = a + bx

What is the term for the width of a building in a histogram?

class interval

The entire group we're interested in is called a:

population

Estimate, to the nearest whole number, the sample standard deviation of this data set, which is a sample from a larger population: {71, 75, 65, 73, 69, 77, and 67}. The mean of the data in this sample is 71.

4
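
The estimate can be checked with Python's standard library, assuming the usual sample formula that divides by n - 1. The names below are illustrative.

```python
import statistics

sample = [71, 75, 65, 73, 69, 77, 67]

# statistics.stdev uses the n - 1 (sample) denominator.
s = statistics.stdev(sample)

print(round(s, 2))   # exact sample standard deviation
print(round(s))      # rounded to the nearest whole number
```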

You have the following regression equation for the effect of streetlights per block (x), on crimes per month (y) : y = 2.4 - 0.2x. How many crimes a month are predicted when there are 7 street lights on a block?

1.0

You're given some bivariate (x, y) data. You use your calculator to find the linear regression line for the transformed data (ln x, ln y). Your y= screen shows this equation: y1 = 0.2 + 0.4x. When x = 2, the correct predicted value for y is:

1.6117 (You remembered to transform 2 to ln(2), and to transform your result to e^0.4773! You can solve this problem by entering e^(.2 + .4(ln(2))) into your calculator.)
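
The back-transformation described in the answer can be sketched as follows, assuming the fitted line is ln(y) = 0.2 + 0.4 ln(x). Variable names are illustrative.

```python
import math

a, b = 0.2, 0.4   # intercept and slope of the fit on (ln x, ln y)
x = 2

ln_y = a + b * math.log(x)   # prediction on the log scale
y = math.exp(ln_y)           # undo the log to return to the original scale

print(round(ln_y, 4), round(y, 3))
```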

You flip four coins. What's the probability of getting exactly four heads?

1/16

A jar contains five coins: 2 pennies, 1 nickel, and 2 dimes. You draw one coin at random, what's the probability of selecting the nickel? What's the probability of selecting a dime?

1/5, 2/5

P(black die shows 6 or white die shows 2) =

1/6 + 1/6 - 1/36
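
The addition rule can be checked by listing all 36 outcomes. This sketch assumes two fair six-sided dice, which the card does not state explicitly.

```python
from fractions import Fraction
from itertools import product

# Each outcome is (black die, white die); fair dice are assumed.
rolls = list(product(range(1, 7), repeat=2))

favorable = [r for r in rolls if r[0] == 6 or r[1] == 2]
p_union = Fraction(len(favorable), len(rolls))

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_rule = Fraction(1, 6) + Fraction(1, 6) - Fraction(1, 36)

print(p_union, p_rule)
```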

You know that Bob Jones plays for Chicago. Chicago is playing a game in Minneapolis. Use the table below to estimate the number of free throws Bob Jones will get in if he has 15 free-throw attempts.

10

How many degrees of freedom does a sample of 12 have when you calculate a standard deviation?

11 (n - 1, n = 12, 12 - 1 = 11)

You flip four coins. What's the probability of getting at least 2 heads?

11/16 (If you flip four coins, the sample space consists of 16 possible outcomes: {HHHH, HHHT, THHH, HTHH, HHTH, HHTT, HTTH, HTHT, TTHH, THTH, THHT, TTTH, HTTT, THTT, TTHT, TTTT}. You can see that you would get at least two heads (two or more) in 11 different outcomes: {HHHH, HHHT, THHH, HTHH, HHTH, HHTT, HTTH, HTHT, TTHH, THTH, THHT}. Therefore, the probability is 11/16.)
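
The sample space can also be enumerated in code rather than by hand. This is a sketch assuming fair coins; the helper `prob` is hypothetical. It also reproduces the other four-coin answers in this set.

```python
from fractions import Fraction
from itertools import product

# All 2**4 = 16 equally likely outcomes, e.g. ('H', 'T', 'H', 'H').
outcomes = list(product("HT", repeat=4))

def prob(event):
    """Probability of an event given as a predicate on one outcome."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

p_at_least_two = prob(lambda o: o.count("H") >= 2)
p_exactly_two = prob(lambda o: o.count("H") == 2)
p_four = prob(lambda o: o.count("H") == 4)

print(p_at_least_two, p_exactly_two, p_four)
```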

If the range of a normally distributed data set is 80, what is a reasonable estimate for the standard deviation?

15 (Roughly all the cases fall within 3 standard deviations of the mean, so the range spans about 6 standard deviations, 3 in each direction from the mean. 80/6 ≈ 13.3, so 15 is a reasonable estimate.)

Questions 1-5 refer to the following data. The table below shows the results of a hypothetical survey on participation in sports activities by men and women. Those surveyed answered yes to activities they had done at least twice in the previous 12 months. Assume that the sample is representative of the overall population:
Activity | Men | Women
Total | 109,059 | 115,588
Aerobic Exercising | 3,717 | 19,535
Baseball | 12,603 | 2,974
Hunting | 18,512 | 2,343
Softball | 11,535 | 8,541
Exercise Walking | 25,146 | 46,286
If you randomly select one man, what's the probability that he hunts, expressed as a percentage?

16.97%

A population of bolts has a mean thickness of 20 millimeters, with a population standard deviation of .01 millimeters. Give, in millimeters, a minimum and a maximum thickness that will include 95% of the population of bolts.

19.98 to 20.02 millimeters

A population of bolts has a mean thickness of 20 millimeters, with a population standard deviation of .01 millimeters. Give, in millimeters, a minimum and a maximum thickness that includes 68% of the population of bolts.

19.99 to 20.01 millimeters

You measured the weights of members of population W and found the weights to be normally distributed. The distribution has a population mean (µ) weight of 160 pounds and a population standard deviation (σ) of 25 pounds. For population W, how many standard deviations from the mean is the weight of 185 pounds?

1 (z = (185 - 160)/25 = 1, so 185 pounds is 1 standard deviation above the mean.)

Suppose a data set has a linear regression line of ŷ = 6 - 0.8x. If the mean of the x-values is 5, what is the mean of the y-values?

2 (Since (x̅, y̅) is a point on the linear regression line, substitute 5 for x̅ and solve for y̅ using the formula y̅ = b₀ + b₁x̅: y̅ = 6 - 0.8(5) = 2.)

Applicants to a college psychology department have normally distributed GRE (Graduate Record Exam) scores with a mean, µ, of 544 and a standard deviation, σ, of 103. What percent of applicants had a GRE scored below 625? Round your answer to the nearest whole percent.

78%
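
Instead of a z-table, the area below 625 can be read from the normal CDF. A sketch using `statistics.NormalDist` from the Python standard library (available from Python 3.8); the variable names are illustrative.

```python
from statistics import NormalDist

# GRE scores modeled as normal with the mean and SD given in the question.
gre = NormalDist(mu=544, sigma=103)

p_below_625 = gre.cdf(625)   # area to the left of 625

print(f"{p_below_625:.1%}")
```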

What is the probability that a randomly selected day falls on a Thursday or Friday? (Assume you don't already know.)

2/7

Say you have a sample of 1,000 mothers, and you know that the length of a woman's pregnancy is normally distributed, with a mean of 270 days and a standard deviation of 15 days. How many mothers will have their baby in the 10th month (that is, length of pregnancy above 300 days)?

25

In a distribution with many values, which of the percentiles is also known as Q1?

25th percentile

To the nearest whole number, what percentile is associated with z = -0.68?

25th percentile

Population H is a group of women with normally distributed heights. Population H has a population mean (µ) of 66 inches and a population standard deviation (σ) of 2.5 inches. What is the population proportion, to the nearest tenth of a percent, between 62 and 65 inches in height?

29.0%
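
A proportion between two values is the difference of two CDF values. A minimal sketch with `statistics.NormalDist`; the names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=66, sigma=2.5)   # Population H, in inches

# P(62 < X < 65) = F(65) - F(62), where F is the normal CDF.
p_between = heights.cdf(65) - heights.cdf(62)

print(f"{p_between:.1%}")
```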

You flip four coins. What's the probability of getting exactly 2 heads?

3/8

The midpoint of the interval whose boundaries are 27.5 and 38.5 is:

33

You have two specially created dice. Each die has six sides. The first die has the numbers {1, 3, 5, 7, 9, 11} on the sides. The second die has the numbers {2, 4, 8, 10, 11, 15} on the sides. If you roll the two dice, how many possible outcomes are there?

36

Consider the following four rows from a random number table: 55588 99404 70708 41098 46563 56934 48394 51719 12975 13258 13048 45144 72321 81940 00360 02428 Use the random number table above to draw samples of size 10. Twenty percent of the population has a trait you wish to find. A success is defined as a 0 or 1. Draw eight samples of size 10 by reading each row from left to right. What are the greatest and fewest number of successes you might expect in a sample of size 10?

4, 0 (The blocks of ten with the most successes are marked with asterisks: [55588 99404] *[70708 41098]* 4 successes; [46563 56934] 0 successes; [48394 51719] [12975 13258] [13048 45144] [72321 81940] *[00360 02428]* 4 successes.)
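
The tally can be reproduced by splitting the digits into blocks of ten and counting the 0s and 1s in each block. This is only a sketch of the counting procedure; the variable names are illustrative.

```python
table = ("55588 99404 70708 41098 46563 56934 48394 51719 "
         "12975 13258 13048 45144 72321 81940 00360 02428")

digits = table.replace(" ", "")

# Eight samples of size 10, read left to right.
blocks = [digits[i:i + 10] for i in range(0, len(digits), 10)]

# A success is a digit 0 or 1 (a 20% chance per digit).
successes = [sum(ch in "01" for ch in block) for block in blocks]

print(successes)
print(max(successes), min(successes))
```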

For a normal distribution with µ = 480 and σ = 32, find the x-values for Q1, Q2, and Q3. Round your answers to the nearest 1.

458, 480, 502
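
Quartiles of a normal distribution are the 25th, 50th, and 75th percentiles, which the inverse CDF returns directly. A sketch with `statistics.NormalDist`; the names are illustrative.

```python
from statistics import NormalDist

dist = NormalDist(mu=480, sigma=32)

# inv_cdf(p) returns the x-value with area p to its left.
q1, q2, q3 = (round(dist.inv_cdf(p)) for p in (0.25, 0.50, 0.75))

print(q1, q2, q3)
```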

If the range of a normally distributed data set is 25, what's a reasonable estimate for the standard deviation?

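5 (Nearly all cases fall within 3 standard deviations of the mean, so the range spans about 6 standard deviations: 25/6 ≈ 4.2.)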

Using the empirical rule, you can assume that what percent of the normal distribution is outside two standard deviations of the mean in either direction?

5%

Consider these eight observations: {11, 6, 2, 5, 8, 4, 4, 9}. What is the median?

5.5

In a distribution with many values, which of the following percentiles is equal to the median?

50th percentile

You measured the weights of members of population W and found the weights to be normally distributed. The distribution has a population mean (µ) weight of 160 pounds and a population standard deviation (σ) of 25 pounds. For population W, find the percentile with a weight of 160 pounds?

50th percentile

You measured the weights of members of population W and found the weights to be normally distributed. The distribution has a population mean (µ) weight of 160 pounds and a population standard deviation (σ) of 25 pounds. For population W, what is the probability, to the nearest tenth of a percent, that a randomly selected subject will weigh between 140 and 180 pounds?

57.6%

What is the normal curve area, to the nearest whole percent, between z = -0.38 and z = +1.5?

58%

Air Milano has 250 passengers on a flight from New York to Milan. Using the hypothetical information in the table below, estimate the expected number of complaints about service on this flight.

6

Look at the data set below. How many possible values are there for this variable? What kind of variable is it? red yellow blue yellow white red green blue green green blue yellow red white yellow orange

6, categorical

Consider these eight observations: {11, 6, 2, 5, 8, 4, 4, 9}. What is the mean?

6.125

If you randomly select one person from the study, what would be the probability that he or she plays baseball for exercise, expressed as a percentage?

6.93%

Applicants to a college psychology department have normally distributed GRE (Graduate Record Exam) scores with a mean, µ, of 544 and a standard deviation, σ, of 103. What percentage of applicants scored between 500 and 700? Round your answer to the nearest whole percent.

60%

You have six different cards with colors on both sides of the card: Card 1: Green and Green Card 2: Green and Blue Card 3: Green and Red Card 4: Red and Red Card 5: Green and Green Card 6: Green and Blue You're shown a card with a green face showing. What is the probability that the other side of the card is not green?

60%

Applicants to a college psychology department have normally distributed GRE (Graduate Record Exam) scores with a mean, µ, of 544 and a standard deviation, σ, of 103. Find the GRE score at the upper quartile, Q3. Round your answer to the nearest whole number.

613

Applicants to a college psychology department have normally distributed GRE (Graduate Record Exam) scores with a mean, µ, of 544 and a standard deviation, σ, of 103. What is the GRE score at the 77th percentile? Round your answer to the nearest whole number.

620

What area, to the nearest whole percent, of the normal curve is located between z = -0.6 and z = +1.4?

64%

Population H is a group of women with normally distributed heights. Population H has a population mean (µ) of 66 inches and a population standard deviation (σ) of 2.5 inches. In population H, what is the height, to the nearest tenth of an inch, of the 70th percentile?

67.3 inches

One card is randomly selected from a standard 52-card deck. Find the probability that the card is an ace or a black card.

7/13 (1/13 + 1/2 - 2/52 = 7/13)
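
The same answer follows from counting cards directly. The sketch below builds a standard 52-card deck; the rank and suit labels are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "clubs", "hearts", "diamonds"]
black_suits = {"spades", "clubs"}

deck = list(product(ranks, suits))   # 52 (rank, suit) pairs

# Aces or black cards; the two black aces are counted only once.
favorable = [c for c in deck if c[0] == "A" or c[1] in black_suits]
p = Fraction(len(favorable), len(deck))

print(len(favorable), p)
```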

What is the maximum length of a whisker in a modified box plot where the median = 120, Q1 = 100, Q3 = 150, the minimum = 20, and maximum = 270?

75

A proper interpretation for r² = 0.754 is:

75.4% of the variation in the response variable can be explained by our knowledge of the explanatory variable.

Applicants to a college psychology department have normally distributed GRE (Graduate Record Exam) scores with a mean, µ, of 544 and a standard deviation, σ, of 103. What percent of applicants scored above 450 on the GRE? Use the normal curve area. Round your answer to the nearest whole percent.

82%

A bivariate scatterplot has an r² of .85. This means:

85% of the variation in y is explained by the changes in x

To the nearest whole number, what percentile is associated with z = +1.2?

88th percentile

You're designing an experiment that looks at the effect of different types of newly developed fish food on the growth rate of three species of fish: tiger fish, guppies, and goldfish. The treatments are new food #1, new food #2, and old food. For a randomized block design, blocked on species of fish, how many treatment groups would you have? (Think of "old food" as one of the treatments.)

9 (You've first separated the sample of fish into three species: guppies, tiger fish, and goldfish. The fish in each of these species are now randomly assigned within their block to one of the three treatments: new food #1, new food #2, and old food. This would give you a three by three design or nine possible conditions: guppy/new food 1, guppy/new food 2, guppy/old food, goldfish/new food 1, goldfish/new food 2, goldfish/old food, tiger fish/new food 1, tiger fish/new food 2, and tiger fish/old food.)

Consider a normal distribution with µ = 65 and σ = 4. A sample of size 950 is drawn from this population. Approximately how many of the 950 cases would you expect to find between 57 and 73?

903

You've computed the following least-squares regression line using a sample of college students: ŷ = 55 + 5x, where: x = hours of study per day y = test score (ranges from 0 to 100). Using this equation, which is the predicted test score for an individual who studies 8 hours a day?

95

Suppose a population of individuals has a mean weight of 160 pounds, with a population standard deviation of 30 pounds. According to the empirical rule, what percent of the population would be between 100 and 220 pounds?

95%

A doctor conducts blood tests on a random sample of 800 white adult men. He finds that 275 have high white blood cell counts. Given that the probability that a white adult man with a high white blood cell count contracts leukemia is .35, compute the expected number of men in the sample who will contract leukemia.

96

If you randomly selected someone from a normal population, what is the probability, to the nearest whole percent, that his or her z-score would be above z = -1.8?

96%

Many statisticians say that the U.S. Census, which attempts to count every population member directly, is significantly less accurate than a count estimated by random sampling. Why might a count estimated from random samples be more accurate than a census? (Choose the best answer.)

A census often can't find every population member, so some groups (such as the homeless) are often under-represented.

The regression equation ŷ = 1278.5 - 0.5x shows the relationship between the number of calories consumed in a day (x) and marathon times in minutes (y) in a sample of world-class distance runners. Interpret the meaning of the slope in the equation stated above.

A one-calorie increase in consumption per day results in a predicted decrease of 0.5 minutes in marathon time.

If we wanted to gather a sample representing all residents in a town, what is the problem with drawing a simple random sample from the local phone book?

A phone book isn't a complete listing of all population members.

A company wants to obtain information on students aged 17 to 22. This company randomly selects 2,000 students from the mailing lists of four universities. (Every student attending one of these universities is on the respective mailing lists.) Then the company sends the questionnaires to the sample of 2,000. What type of sample is this?

A simple random sample of all students at these four universities.

Which of the situations below probably does not have a lurking variable operating in some way? Choose the best answer.

Beaches with more sand than rocks tend to be older.

You're interested in drawing some conclusions about the political attitudes of college undergraduates. Your objective is to administer a survey to a representative sample of students. You choose a group of all seniors to be in the sample.

By choosing only seniors, your sample won't be a representative sample of students.

Your class is participating in an Internet game show and must choose whether a prize is behind door #1 or door #2. You take a vote. The numbers for the doors are an example of which kind of data?

Categorical

When your class participates in an Internet game show and counts the votes for door #1 and door #2, the counts are examples of what kind of data?

Counted numerical

Which of the following represents a plot or graph of the cumulative counts across each of the intervals or midpoints?

Cumulative frequency plot (This indicates the cumulative count of observations across each of the intervals.)

Assume that normal curve A and normal curve B have identical population means. Assume further that A has a greater population standard deviation than B. Which curve is taller, and why?

Curve B is taller because smaller standard deviations produce thinner curves.

Which phase of inferential statistics is sometimes considered to be the most crucial because errors in this phase are the most difficult to correct?

Data gathering (Data gathering is often considered the most critical phase of inferential statistics. It's crucial to have an unbiased and representative sample for a statistical study. It's also usually the most time-consuming phase.)

A family's phone number qualifies as what kinds of data?

Discrete and categorical

Which of the following distributions is the least likely to fit a normal distribution?

Distribution of the ages of automobiles in driveable condition.

What is the term for organizing and summarizing data without a particular question in mind?

Exploratory data analysis (In the term exploratory data analysis the word exploratory implies that researchers are looking at the data but not expecting to find a particular pattern.)

You have the following information on the free-throw attempts and successes of three basketball players on their home court and on visiting courts: Your friend Sal tells you that if you pick a successful free throw he'll buy you a free dinner. The free throw can be made by any of the players at a home or away court. Which of the following events would give you the greatest chance of winning a free dinner?

Fabrio Doe shooting a free throw on an AWAY court

Which of the following scenarios is consistent with the expectations of the law of large numbers?

Getting 100 sixes after 600 separate rolls of a single die.

You have two specially created dice. Each die has six sides. The first die has the numbers {1, 3, 5, 7, 9, 11} on the sides. The second die has the numbers {2, 4, 8, 10, 11, 15} on the sides. Which of the following would not represent an instance of independent events when using these dice? Either die can be rolled first.

Getting a 9 on the first rolled die AND getting a sum of 19 on the two rolled dice combined.

Besides phone numbers, what are the other categorical variables in your survey? (The survey asks for family size, the kind of pets they have, the grade of the youngest child in the family, the family's annual income in dollars, what the dad does for a living, whether the mom works, and their phone number.)

Grade of youngest child, dad's occupation, whether mom works, and kind of pets

Mr. Thompson wants to curve students' exam scores based on the highest score in the class. He takes the highest score (which happens to be an outlier) and treats it as the perfect score. He then computes everyone else's score as a percentage of this perfect score. You're smart and complain that his method is not resistant. What would be a more resistant method of grading these exams?

Grading scores relative to the median score

The idea that subjects in an experiment change their performance on a task simply because they're being observed is potential bias known as the:

Hawthorne effect

Virginia is going to use a systematic sample to choose a sample of 1/10 of her sophomore class. She looks in the first ten names and sees the name of her friend Elena there. She figures that Elena would like to be in the survey, so she begins her systematic sample there. Then she selects every tenth name on the alphabetical list after that. Which of the following statements are true? I. Her sample is not a probability sample. II. By including Elena, she has increased the likelihood of a response bias. III. Her sample suffers from under-coverage.

I and II

A random sample of 15,000 people is selected from local telephone books. 3,000 return the questionnaire. This survey suffers from I. Sampling bias II. Response bias III. Question-wording bias

I and II only

A teacher wants to compare the mean GPA of statistics students to mean GPA of all students at her high school. She looks up the GPAs of the statistics students and finds the mean. She then compares this mean to the mean GPA of all students. Which of the following are true? I. This teacher will be relying on available evidence, or secondary data. II. This is an observational study III. This is an experiment, since there is a comparison of a sample to a population.

I and II only

Below is a residual plot from MINITAB: FITS1 = predicted y-values using regression equation RESI1 = residuals from regression model (randomly scattered plot w/ values below & above x-axis, no clear pattern at all) Based on this graph, you can say: I. The original relationship between the two variables could have been linear because of the random pattern in this scatterplot. II. It's not possible to know about the original relationship between the two variables because the x-axis is labeled with the predicted values of y instead of the x-values of the data. III. This scatterplot has the same pattern about y = 0 as it would if the horizontal axis were the x-variable.

I and III only (The random pattern of the residuals suggests that the original relationship may be linear. In addition, using the x-variable on the horizontal axis instead of the predicted values of y will give the same pattern about y = 0.)

A major reason to use a systematic random sample rather than a simple random sample would be: I. It's often easier to do than other types of random sampling. II. The sample is more likely to be random. III. It has a good chance of being representative of the population.

I and III only

Which of the following are true? I. Two events are mutually exclusive if they can't both occur at the same time. II. Two events are independent if they have the same probability. III. An event and its complement have probabilities that always add to 1.

I and III only

Which of the following is (are) true? I. If a linear model for a scatterplot is appropriate, the residuals will be more-or-less normally distributed about y = 0. II. If a linear model for a scatterplot is appropriate, there should be a distinctive linear pattern in the residual plot. III. If a linear model for a scatterplot is appropriate, the pattern in the residual plot should be random.

I and III only

Which of the following statements about influential points is true? I. Removing an outlier from a data set can have a major effect on the regression line. II. If you calculated the residual between the outlier and a regression line based on the rest of the data, it would probably be large III. You will typically find an outlier horizontally distant from the rest of the data along the x-axis.

I and III only

Which of the following are true? I. There's always a treatment in an experiment. II. An observational study is one type of experiment. III. Sample surveys are experiments.

I only

Which of the following is a sample of the population of bookstores on the West Coast of the United States? I. All bookstores in California II. Randomly selected children's bookstores III. Internet bookstores

I only (Only bookstores in California count as a sample of bookstores on the West Coast of the U.S. The others may overlap with the population of West Coast bookstores, but these groups probably have some bookstores that aren't on the West Coast.)

A surveyor for a vegetarian magazine asks the following question as part of a survey: "The U.S. Government subsidizes the cost of food and water for cattle. Given that most agricultural land is for feeding and grazing cattle, while people starve to death, do you favor taking away government subsidies, so the consumer pays the full price for beef?" The survey appeared in a national magazine and readers were requested to return their response to a provided address. This survey will most likely suffer from: I. Sampling bias II. Response bias III. Question-wording bias

I, II and III

Which of the following are considered to be important principles of experimental design? I. Control II. Randomization III. Replication

I, II and III

Each of the following data sets has a mean of 40. I {38, 43, 47, 27, and 45} II {41, 40, 39, 42, and 38} III {59, 41, 53, 17, and 30} Estimate their population standard deviations (represented by sigma: σ) and list them from smallest to largest according to standard deviation size.

II (1.41), I (7.16), III (15.23)
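
The three values can be checked with `statistics.pstdev`, which uses the population formula (dividing by n) that the question asks for. The dictionary keys are illustrative labels.

```python
import statistics

data_sets = {
    "I": [38, 43, 47, 27, 45],
    "II": [41, 40, 39, 42, 38],
    "III": [59, 41, 53, 17, 30],
}

# Population standard deviation (denominator n, not n - 1).
sigmas = {name: statistics.pstdev(values) for name, values in data_sets.items()}

# Order the set labels from smallest to largest sigma.
ordering = sorted(sigmas, key=sigmas.get)

for name in ordering:
    print(name, round(sigmas[name], 2))
```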

Which of the following would be an example of the use of inferential statistics? I. You have your entire class's math grades, and you calculate the average math grade for your class. II. You have your entire class's math grades and you use the grades to find the average math grade for everyone in your school who's taken the same math course. III. You have your entire class's math grades, and you use these grades to estimate the average math grade for the same math course at another school.

II and III

Which of the following are true? I. If events A and B are independent, P(A and B) = P(A) x P(B) II. If events A and B are independent, P(A) = P(A|B) III. If events A and B are disjoint, P(A or B) = P(A) + P(B) - P(A and B)

II and III only (Statement II tells you that when the events are independent, the probability of a single event is equivalent to the probability of that event given a second event. This makes sense because, if B has no influence on A then P(A | B) should be the same as P(A). In statement III, you see the general rule for the union of events. This rule is applicable whether or not the events are disjoint.)

A poll found that 81% of U.S. parents say they have spoken with their teenagers about the dangers of drinking and driving. Only 64% of teens of the same families say they remember such a discussion. What type of bias is most likely at work here? I. Sampling bias II. Response bias III. Question-wording bias

II only

Which of the following statements about outliers is true? I. Removing an outlier from a data set can have a major effect on the regression line. II. If you calculated the residual between the outlier and a regression line based on the rest of the data, it would probably be large III. You will typically find an outlier horizontally distant from the rest of the data along the x-axis.

II only

The following hypothetical data set shows the purchase prices (in thousands) for a sample of 3-bedroom, 2-bathroom homes in Essex County, MA, over the past year. Compute the five-number summary and create a modified box-and-whisker plot. How many outliers are present in this distribution? 250, 254, 320, 342, 221, 235, 210, 426, 210, 298, 231, 254, 278, 234, 236, 235, 300, 401, 129, 234, 235, 235, 245

In this distribution Q1 = 234, Q3 = 298, IQR = 64, and 1.5 × IQR = 96. Therefore, the threshold values for outliers are 138 and 394. You can see that three houses (129, 401, and 426) fall outside the threshold values.
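
The arithmetic above can be reproduced with a short Python sketch. It assumes one common textbook convention for quartiles (Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when the count is odd); other software may use a different quartile rule and give slightly different fences. The function name `five_number_summary` is a made-up helper, not a library call.

```python
from statistics import median

# Purchase prices (in thousands) from the question.
prices = [250, 254, 320, 342, 221, 235, 210, 426, 210, 298, 231, 254,
          278, 234, 236, 235, 300, 401, 129, 234, 235, 235, 245]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # values below the median position
    upper = data[len(data) - half:]   # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]


low, q1, med, q3, high = five_number_summary(prices)
iqr = q3 - q1

# 1.5 x IQR rule for a modified box plot.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sorted(p for p in prices if p < lower_fence or p > upper_fence)

print(low, q1, med, q3, high)
print(lower_fence, upper_fence, outliers)
```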

In a normally distributed population of camshaft diameters, 70% of the camshafts are greater than 33.40 millimeters in diameter. State this as a percentile.

In this population, a 33.40 millimeter diameter camshaft is at the 30th percentile.

Which of the following could not be an example of secondary data?

Information about the effects of heavy metal music on violent behaviors, based on the stories of three juvenile offenders in New York City.

Which of the following statements is true for numerical data?

It can be measured.

Which of the following is a characteristic of a census?

It gathers data from every member of a population.

On your TI-83, you enter a set of x-values in list L4 and a set of y-values in list L6. The regression equation is stored in Y1. You move the cursor on top of L5 and write a formula that will produce the set of residuals for the ordered pairs (x, y). That formula is:

L6 - Y1(L4)

One important advantage a MINITAB scatterplot has over a TI-83 scatterplot is:

MINITAB scatterplots have more detail than TI-83 scatterplots

What measure of central tendency do you use with standard deviation?

Mean

From left to right (smallest to largest), what is the order of the different measures of central tendency in a negatively (left) skewed distribution?

Mean, median, mode (The mode is the peak of the curve, left of that is the median, which is the middle value of the distribution, and furthest over on the left is the mean.)

Assume that you have data showing that small dogs bite more people nationwide. But when you separate the data into two sets, for urban areas and for rural areas, you find that large dogs are more likely to bite people in each of these types of areas. What would be the likely reason for your findings?

More dog bites occur in cities, and people living in the city who own a dog are more likely to keep a small dog.

Every ten years the United States takes a census, which is a survey of every person in the country. If you took the census data that told you the number of people in the United States, and if from all of those numbers you calculated the mean age, what symbols would you use to represent these numeric facts?

N and µ. (A census counts every member of a given population (though in practice it isn't always successful at reaching everyone). The symbol for the parameter population size is N, and the symbol for the parameter population mean is µ (mu). n and x-bar are the symbols for the sample statistics.)

The proper notation for a normal distribution with a mean of 250 and standard deviation of 25 is:

N(250,25)

In terms of standard deviations, where are the inflection points in a normal curve?

One standard deviation left AND one standard deviation right of the mean

A 9th grade class is chosen for an experiment. A random half of the students receive specialized one-on-one tutoring in the classroom twice a week. The other half of the students work in study groups in the same classroom as the one-on-one students. The experiment shows that test scores increased at the same rate for both the experimental and control conditions. This is surprising, given that almost every other study of this specialized training showed dramatic improvements in the treatment group. What might be going on in this experiment?

The treatment group was not separate enough from the control group

If you have a data set of 40 whole numbers, which of the following could be true about the five-number summary?

The upper quartile does not have to be a whole number.

An outlier point is added to a set of bivariate data. In what way must your analysis change with the addition of the outlier?

The value of the correlation coefficient will move closer to 0.

In which of the following scenarios would it be most acceptable to do an interpolation using a least-squares regression line?

There's a strong negative correlation, and residuals are randomly scattered around the line y - ŷ = 0.

Consider the data set A: (2,8), (3,6), (4,9), and (5,9). Which of the following is the proper interpretation of the r² value?

Thirty percent of the variation in the y-values can be explained by knowledge of the x-values.
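
To see where the thirty percent comes from, here is a minimal Python sketch that computes r and r² for data set A directly from the sums of squared deviations. The variable names are illustrative.

```python
# Data set A from the question.
points = [(2, 8), (3, 6), (4, 9), (5, 9)]
xs = [x for x, _ in points]
ys = [y for _, y in points]
n = len(points)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of cross-products and squared deviations from the means.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

r = sxy / (sxx * syy) ** 0.5
r_squared = r ** 2

print(round(r, 3), round(r_squared, 2))
```

Here the sums work out to sxy = 3, sxx = 5, and syy = 6, so r² = 9/30 = 0.30.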

Your good friend Bill tells you that he has proof that not wearing your seat belt is safer than wearing one. His friend, Darlene, was in a bad accident. She wasn't wearing her seat belt and was thrown clear of the car. The car immediately burst into flames. Darlene would have been killed if she hadn't been thrown clear. Which of the following statements is true?

This is an example of anecdotal evidence and shouldn't be used to make a decision about wearing seat belts.

You want to demonstrate that people are influenced by a cola's product name over product quality. You set up two tables on opposite sides of the quad of your school, so people can taste a name brand and a store brand cola and tell you which they prefer. At one table the tasters drink the colas in unmarked cups, so that they don't know which cup contains which cola. At the other table, the subjects drink each cola from its original can. You analyze the results and report your findings in the school newspaper. Which of the following is true:

This is an experiment, since you imposed a treatment on some subjects and left others untreated. (Good. This is an experiment. The treatment allowed the subjects to know which brand of cola they were drinking. Then you could measure these subjects' response against the untreated group who didn't know which cola they were drinking. The groups weren't selected randomly, but that doesn't mean it wasn't an experiment, although it may mean it wasn't a very good one!)

Let's say you're interested in the effects on boys of different dosage levels of a new drug for the treatment of Attention Deficit Disorder (ADD). You set up an experiment to consider the factor of dosage with two levels (300 mg vs. 500 mg). What would be the different treatment groups of the experiment within each block?

Three groups: placebo drug/300 mg of new drug/500 mg of new drug (If you're interested in whether the drug itself has an effect and whether the level of dosage also has an effect, you need three different treatment groups. Remember, one of these groups will still receive the placebo drug, though you're not interested in varying the dosage of the placebo drug. The two "new drug" groups simply vary on the dosage level.)

The capital sigma in the r formula means that you:

add all the products of the standardized values of x and y.

Which of the following is not an example of a leading question?

all are leading questions (this incl. that one w/ the statistic about the landfill, that confused me but it was def leading)

At the start of a Scrabble game you turn over the 100 lettered tiles so you can't see them. There are four S's and two blanks among the 100 tiles. If you pick one tile at random, what's the probability you will not get an S or a blank? - 94/100 - 94% - .94

all of the above

Which of the following would most likely be graphed as a bar chart rather than a histogram? - number of students that use windows laptops vs. macintosh ones - the number of cars in each color in a parking lot - (one other option I forgot, obv. categorical though)

all of the above (Each of these are examples of categorical data. They're counts of the members of a category rather than measured values of a numeric variable.)

A normal curve table tells you that the probability lying below z = -1 is .1587. This can be interpreted as:

all of the above (The numbers in a standard normal probability table (also called a table of normal curve areas) can be interpreted as areas or probabilities. They can also be interpreted as population proportions or relative frequencies. Also, since any single value has no area under the curve, saying "below" and "at or below" mean essentially the same thing.)

A histogram class is a collection of all the observations that fall between two:

class limits

Consider the following information about three populations. Red Salmon: mean = 12 lbs., standard deviation = 3 lbs., % of salmon > 18 lbs. = 20.0%, % of salmon < 6 lbs. = 10.0%. Blue Salmon: mean = 15 lbs., standard deviation = 5 lbs., % of salmon > 25 lbs. = 2.5%, % of salmon < 10 lbs. = 16.0%. White Salmon: mean = 18 lbs., standard deviation = 4 lbs., % of salmon > 18 lbs. = 30.0%, % of salmon < 10 lbs. = 30.0%. Which of the following populations may be normally distributed?

blue salmon only
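
One way to check this answer is to compare each stated percentage with what a normal distribution with that mean and standard deviation would give. The sketch below uses `statistics.NormalDist` from the Python standard library. The one-percentage-point `tolerance` and the helper name `consistent_with_normal` are assumptions made for illustration, since the stated percentages are rounded.

```python
from statistics import NormalDist

# name -> (mean, standard deviation, [(cutoff, side, stated percent), ...])
populations = {
    "Red": (12, 3, [(18, "above", 20.0), (6, "below", 10.0)]),
    "Blue": (15, 5, [(25, "above", 2.5), (10, "below", 16.0)]),
    "White": (18, 4, [(18, "above", 30.0), (10, "below", 30.0)]),
}


def consistent_with_normal(mean, sd, claims, tolerance=1.0):
    """True if every stated percent is within `tolerance` points of the normal value."""
    dist = NormalDist(mean, sd)
    for cutoff, side, stated in claims:
        below = dist.cdf(cutoff) * 100
        expected = below if side == "below" else 100 - below
        if abs(expected - stated) > tolerance:
            return False
    return True


possibly_normal = [
    name for name, (mean, sd, claims) in populations.items()
    if consistent_with_normal(mean, sd, claims)
]

print(possibly_normal)
```

Red fails because 18 lbs. is two standard deviations above its mean, where only about 2.5% should lie, and White fails because half of a normal population lies above the mean, not 30%.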

If two variables in a sample data set are positively associated, which component of the least-squares regression line must be positive?

b₁, or the regression coefficient

Let's say that a researcher administers a new type of vitamin supplement to a sample of 30 rats. Thirty other rats didn't receive the supplement. Later, he compares the weights of the supplement group with the non-supplement control group. In this case the rats' weights are an example of which type of data?

continuous data (Weights are measured, so the data are continuous.)

Inferential statistics is used in each of the following except:

creating a pictograph of the number of people struck by lightning each year. (In creating a pictograph, you haven't attempted to predict or compare anything. You've used descriptive statistics.)

Most statisticians use statistics instead of parameters because:

data from an entire population is almost always very difficult to obtain.

The newspaper uses a line graph to show the performance of stocks over the last month. This is an example of:

descriptive statistics (The data were gathered and organized into a graph, which is an example of descriptive statistics.)

You have a bag of different flavored candies. The probabilities of randomly selecting a particular flavored candy are shown below (flavor, probability): Cherry .245, Lime .325, Lemon .180, Orange .215, Grapefruit .035. The probability of selecting an orange candy OR a cherry candy is an example of:

disjoint events

A plant researcher identifies a sample of plots of land to examine the effect of a new fertilizer on the speed of growth of a particular crop. She treats half the plots of crops with the new fertilizer (NewFert), and the other half with the more traditional brand (FertiWhammo). She observes that crop growth in both sets of plots is almost identical. The data collection strategy she has used is a (an):

experiment

When looking at a scatterplot of two variables, the variable along the horizontal axis is typically referred to as the:

explanatory variable

If you're attempting to predict a value of the response variable using a value of x that is outside the range of observed x-values in your data set, you're conducting a process of:

extrapolation

Which of the following should often be avoided?

extrapolation

You've computed the following least-squares regression line using a sample of college students: ŷ = 55 + 5x, where: x = hours of study per day y = test score (ranges from 0 to 100). Suppose the maximum number of hours of study among the students in your sample is 6. If you used the equation to predict the test score of a student who studied 8 hours a day, your prediction would be considered a (an):

extrapolation

True or False: A doctor gives a pill to a patient, and the patient gets better. This must be an example of the placebo effect.

false

True or False: A normal quantile plot is always a straight line.

false

True or False: A randomly selected sample is made up of any group of population members that's easy to find.

false

True or False: All positive correlations indicate stronger relationships than all negative correlations.

false

True or False: An r of -1.0 proves a strong cause and effect relationship between x and y.

false

True or False: Degrees of freedom are used to calculate both the population standard deviation and sample standard deviation formulas.

false

True or False: For all normal distributions you can use a standard normal probability table to find the x-value and convert it into a proportion of the normal curve.

false

True or False: If a researcher uses a sample to make inferences about a population, the most important consideration is that the sample be large enough to make generalizations from it.

false

True or False: If you want to perform an interpolation, you'd need a residual plot that forms a clear, observable pattern such as a curve.

false

True or False: In a histogram, a single building or class contains all the values of the data set.

false

True or False: In this study, there are 18,512 male hunters. And, there are 12,603 male baseball players. If you add the male hunters and male baseball players together, you get 18,512 + 12,603 = 31,115. The total number of men surveyed was 109,059. Therefore, assuming the sample is random and representative of the U.S. population, the probability that a randomly selected man in the U.S. either hunts or plays baseball is 31,115/109,059 = 28.5%. (Hint: Look carefully at how the survey was answered.)

false

True or False: The bigger the sample, the smaller the bias.

false

True or False: We usually don't have to sample because we can always gather data from every population member.

false

True or False: When finding an area between two z-scores, sometimes the only table that gives the correct answer is one that gives the area below the z-score, and sometimes you must use a normal distribution table that gives the area between the mean and the z-score. If you use the wrong type of normal distribution table you'll get the wrong answer.

false

True or False: When using r² to assess a data set with three explanatory variables, r² measures how each explanatory variable can predict the other two explanatory variables.

false

True or False: You perform a study to see how long the grass in your yard will live if you don't water it all summer. You conclude that no one's yard can live for more than three weeks without water. This is an example of descriptive statistics.

false

True or False: You're looking at the results of a sample, which are several measurements all taken under the same conditions. By looking at the numbers you can tell if the sampling or the way the measurements were taken was biased.

false

True or False: You don't need to do a residual plot to determine the appropriateness of a linear model if the original scatterplot and regression line are clearly linear.

false (A linear scatterplot and regression line aren't sufficient evidence for concluding that you constructed the most appropriate linear model. An apparent linear pattern in the original data may in fact be very non-linear. You must look at the original scatterplot and do a residual analysis.)

True or False: If the population of interest is all day care centers in the United States, a sample of day care centers could be either all day care centers in New York City or a randomly selected group of day care centers throughout the United States. Either sample is equally good.

false (A simple random sample of day care centers in the U.S., rather than a sample that comes from only one city, is more likely to produce statistics that accurately estimate the population parameters you're interested in. This is because in an SRS each population member is equally likely to be chosen, regardless of its characteristics.)

True or False: If your sample is made up of power tools randomly selected from one hardware store, your population of interest is all power tools sold in hardware stores

false (If you randomly select the tools, but only from one hardware store, the relevant population is all power tools in that hardware store alone. You can't assume that all hardware stores carry the same kinds of power tools.)

True or False: A population contains 60% women and 40% men. To reflect the population group, we make sure that our sample also contains 60% women and 40% men. This is an example of a simple random sample.

false (In a simple random sample, any possible combination of people must be equally likely. This statement describes a sample in which there are restrictions.)

True or False: A large randomly selected sample always gives a better estimate of the population than a small randomly selected sample.

false (Remember though, smaller samples can still work very well if the sample is representative of the population. It's even possible for a very large simple random sample to give a less accurate estimate than a smaller simple random sample, since the sample members are drawn randomly and you never know exactly what you'll get.)

True or False: For large populations, 1,000 is the best sample size.

false (The best size for your sample depends on many factors, including the shape of the distribution and acceptable margin of error in your study. Sometimes you may need a sample size of fewer than 1,000, and sometimes you may need a size greater than 1,000.)

True or False: The mean and standard deviation are usually not used together because of outliers.

false (The mean and standard deviation should be used together since the standard deviation measures deviation from the mean. Keep in mind, however, that the mean is sensitive to the effect of outliers)

True or False: In a normal distribution where the mean is 50 and the standard deviation is 5, you can use a normal curve table to find the relative frequency of observations where x = 42.

false (The normal curve table can only give relative frequencies for ranges of values, not for individual values.)

True or False: You want to know how often residents of cold climates vacation in warm destinations. You randomly sample 50 residents and find out how many annual trips to warm destinations they've taken during their adult lives. This is an example of continuous data.

false (The number of trips is counted and therefore discrete.)

True or False: The sample size you need to estimate the population distribution should always be at least 10% of the population size.

false (The sample size you need isn't dependent on population size. A sample of 1,000 to 1,500 observations is usually enough to give a reliable estimate of the distribution of a variable in the population, no matter how big the population is. With as few as 50 observations you can start to get a general idea of the shape, the mean, and the standard deviation.)

True or False: The point where the tails of a normal curve reach the x-axis is the exact point where the upper and lower extremes of the population are located.

false (The tails of the curve never touch the x-axis)

True or False: The scatterplot of residuals vs. the original x-values is exactly the same as the scatterplot of residuals vs. predicted values of y.

false (The visual images are the same in that the residuals are the same distances above and below y = 0 for both graphs. However, they aren't exactly the same scatterplot, since the horizontal axis is labeled differently.)

True or False: The 68-95-99.7 rule is characteristic of all continuous distributions observed in statistics

false (this applies only to all normal distributions)

True or False: The age data set below has no outliers. (25, 28, 28, 28, 29, 29, 30, 32, 32, 33, 33, 33, 34, 34, 36, 39, 39, 39, 40, 43, 44, 45, 50, 55)

false, there is one outlier

Based on the data in the table below, what is the smallest number of births (in thousands) a month could possibly have and still be an upper outlier?

358 (For a data set like births, the upper and lower outliers apparently need to be whole numbers: 357.7 is the correct calculation for the threshold, but 358 is the correct answer.)

A normal quantile plot:

graphs raw scores against z-scores of percentile ranks.

The kind of sampling strategy least likely to produce statistics that are good estimates of population parameters is a:

haphazard sample

The normal probability plot on a TI-83 graphing calculator is located:

in the stat plot section.

You have a bag of different flavored candies. The probabilities of randomly selecting a particular flavored candy are shown below (flavor, probability): Cherry .245, Lime .325, Lemon .180, Orange .215, Grapefruit .035. Your buddy Earl likes the lime candies. He eats a bunch of them and hands the bag back to you. The complement of the event P(selecting a Lime candy from the bag) would:

increase (Removing lime candies would reduce the probability of drawing a lime candy from the bag. In turn, the complement of the event (the probability of selecting a non-lime candy) will increase since the complement equals one minus the probability of the original event.)

You claim that you're healthier than your friends. To support your claim, you randomly select some of your friends and track their meals for a month. You also track your meals during the same month. What you are doing is:

inferential statistics (This is an example of inferential statistics. The data you collected is a sample used to infer whether you're healthier than your friends.)

A new observed data point is included in a set of bivariate data. You find that the slope of the new regression line has changed from 1.7 to 1.1, and the correlation coefficient only changed from +.60 to +.61. This new data point is probably a (an):

influential point (Right. An influential point is a data point that has a strong effect on the slope of the least-squares regression line; typically it has a smaller effect on the correlation coefficient.)

A university involved in conducting social science experiments and surveys places an advertisement in the classifieds to hire a racially and ethnically diverse set of individuals to administer surveys. The surveys will assess opinions on welfare policy and discrimination. The University plans to use a random sample of these new employees to help conduct an upcoming experiment with a random sample of individuals in a major city. By hiring this diverse group of employees, the University is hoping to reduce the problems of _____________ in the experiment.

interviewer-induced bias

A police department commissioned a door-to-door survey of citizens on whether they commit minor law infractions, such as speeding or illegal parking. The interviewers told the interviewees that they would not be investigated or ticketed for any minor infractions they may have committed. An analysis suggested that there was bias in the responses. This would likely be an example of a (an):

interviewer-induced bias.

A residual:

is how much an observed y-value differs from a predicted y-value.

An important advantage of using a randomized block design in an experiment is:

it controls for the effects of factors that may confound your results.

Pearson's correlation coefficient (r) is considered a symmetric measure because:

it will be the same regardless of which variable is the x and which is the y. (Unlike the regression coefficient, it doesn't matter which is the x variable and which is the y variable. The correlation coefficient will be the same. This correlation simply measures the strength and direction of the relationship between two variables.)

Histograms are most useful in displaying:

large numeric data sets (Histograms are used to display frequencies across intervals of numeric data.)

The general equation for a function that would model data with an exponential relationship is y = a × b^x. If the data has been transformed by taking the logarithms of the response variable, then the general formula for the regression line is:

ln y = ln a + x ln b
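
The transformed formula follows from taking the natural log of both sides of y = a × b^x. As a numeric check, the sketch below builds exact exponential data from hypothetical parameters (a = 3, b = 1.5, chosen only for illustration), fits a least-squares line to (x, ln y), and confirms that the intercept is ln a and the slope is ln b.

```python
import math

# Hypothetical parameters for y = a * b**x (illustration only).
a, b = 3.0, 1.5
xs = [0, 1, 2, 3, 4, 5]
ys = [a * b ** x for x in xs]

# Transform the response variable by taking natural logs.
log_ys = [math.log(y) for y in ys]

# Ordinary least-squares fit of ln y on x.
n = len(xs)
x_bar = sum(xs) / n
ly_bar = sum(log_ys) / n
slope = (sum((x - x_bar) * (ly - ly_bar) for x, ly in zip(xs, log_ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = ly_bar - slope * x_bar

print(intercept, math.log(a))   # intercept should match ln a
print(slope, math.log(b))       # slope should match ln b
```

Back-transforming with `math.exp(intercept)` and `math.exp(slope)` recovers a and b, which is how a fitted line on the log scale is turned back into an exponential model.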

If the association between two variables is exponential, which of the following is the general form of the regression equation for the transformed data?

ln ŷ = a + bx

In inferential statistics, variation is an essential measurement for:

making predictions.

(This question is based on question 10.38, page 413, in Introduction to Probability and Statistics, by William Mendenhall, Barbara Beaver, and Robert Beaver.) An experiment was conducted to compare the mean reaction times to two types of traffic signs: No Left Turn and Left Turn Only. Ten drivers were included in the experiment. Each driver was presented with 40 traffic signs - 20 No Left Turn and 20 Left Turn Only - in random order. The mean reaction time to each type of sign was recorded for each driver. So, for example, individual #1 reacted, on average, within 824 milliseconds to the No Left Turn and within 702 milliseconds to the Left Turn Only sign. The design of this experiment most closely resembles a:

matched pairs before and after design. (In this design, you're measuring the same individual's performance on the two different tasks.)

You've read a story in the New York Times claiming that individuals who engage in aerobic exercise for at least an hour a day demonstrate fewer symptoms of depression. You read that an experiment was conducted in which a researcher first administered a survey on depression and self-esteem to 100 individuals and then taught them some proper techniques of aerobic exercise. The 100 individuals were then sent off to exercise at least one hour a day. After two months, the depression and self-esteem survey was administered again and showed that depression symptoms declined and self-esteem increased. The experimental design used here is a:

matched pairs before and after design. (This is a matched pairs before and after design. Information on depression and self-esteem is collected from the same individuals before and after the treatment (exercise), and the change in depression and self-esteem is measured in this group. In this design, each subject acts as its own control. While this isn't an ideal design (it's often better to have a separate control group to compare with the treatment group), it is an acceptable way to conduct a study if a separate control group isn't possible.)

If we collect experimental data on alcohol consumption from the same 100 individuals before and then after a course on the dangers of overdrinking, we are using a:

matched pairs design.

A researcher identifies a sample of teenagers and chooses two males, age 16 years old, from lower-income families. She randomly selects one of the males to be in an intensive tutoring course; she places the second male in a standard classroom. She continues doing this with pairs of males and females. This is an example of:

matching pairs for random assignment on gender, age, and income status.

What two population parameters determine the shape of a normal curve? (They make a normal curve tall and skinny or short and fat.)

mean and standard deviation

Since the distribution of housing prices in a community is usually skewed right, which measure of center should you use for housing prices?

median

The goal of the least-squares regression is to compute a line that:

minimizes the sum of the squared residuals.

What is the only measure of center that can be used with categorical, or non-numeric, data?

mode

A class has the following distribution of eye colors: 10 blue, 18 brown, 5 green. Which measure of central tendency should you use to find the eye color of the typical class member? What do you get when you use this measure?

mode; brown

If an observed y-value is below a line of best fit, then the residual is:

negative

You're interested in whether tutoring is more effective when the student volunteers to be tutored. You compare the test scores of two groups of students: those who chose to go to after-school tutoring sessions and those who are required to go to the sessions by the school administration. This is an example of a (an):

observational study

Consider the following four studies: I. A survey of newspaper editors examines their political views about foreign policy issues. The analysis and conclusion involves a summary of the survey results. II. Third graders are randomly sampled and assigned to an intensive language course. Their performance on a year-end language test is compared to students who did not take the course. III. A study compares the change in home values between 1990-1999 in ten different neighborhoods in Seattle. The results show that there's a higher percentage change over time in less affluent neighborhoods. IV. A study compares the durability of machine parts. The comparison looks at machines that were purchased with new metal alloy parts and at those purchased with old metal alloy parts. The results show that durability increases with the use of the new alloy. Which examples are experiments?

only II

A standard deviation calculated from data from an entire population would be called a(n):

parameter

We call any numerical fact about a population a:

parameter

Classify the correlation coefficient -1.0.

perfect negative

An experiment is conducted examining the effects of a new type of toothpaste on reducing cavities. Individuals in the treatment condition are given the new toothpaste; individuals in the control condition are given the standard toothpaste, but are told that they're receiving the new toothpaste. At the end of the experiment, the control group participants demonstrate a slightly higher decline in cavities. This is an example of a(n):

placebo effect

As primary research for one of her books, Shere Hite distributed 100,000 questionnaires to women's groups. 4,500 women responded. Hite found that 96% of the women felt they give more emotional support to than they get from their husbands or boyfriends. Which of the following best describes her sample?

self-selected sample

Influential points and outliers:

should be examined carefully to determine if they're part of the data set.

A synonym for variation is:

spread. (While distance is a component of calculating variation, it's not a synonym.)

What does "Stdev" in the x row of the predictor table mean?

standard error of the regression coefficient

You've drawn a simple random sample from a population. The standard deviation of this sample is a(n):

statistic

You construct a sample in which Whites, Blacks, and Latinos are randomly selected from the U.S. adult population. The composition of your final sample is 25% White, 30% Black, and 30% Latino. You've sampled from all over the country, but the actual composition of the population has a much higher proportion of Whites. This is likely an example of a:

stratified random sample. (Correct. We'd expect that the overall population from which the sample is chosen would be comprised mostly of Whites (much more than 25% of adult population nationwide). Since the sample has disproportionately high percentages of Blacks and Latinos, it's likely that the population was first stratified based on race before random selection. This approach is often used when we believe that a simple random sample will draw too few individuals of a certain characteristic (such as, Blacks or Latinos).)

Describe the strength and direction of a relationship with the correlation coefficient r = -0.8.

strong and negative

You're using a ruler to measure lengths of stick-bugs. The ruler is marked for every centimeter. You end up taking all lengths to the nearest centimeter. (In other words, you can have 5 centimeters and 6 centimeters, but not 5.5 centimeters.) True or False: These measured data can be called discrete or continuous, depending on how you think of the data.

true (Sometimes the definitions get fuzzy. Measured data are usually thought of as continuous, but in this case the measured data are also discrete because they can only have certain values that are whole numbers. But you can also think of the data as rounded estimates of the true lengths, so in a sense the data are specific points along a continuous number line. Most people would call these lengths continuous data, since they're more measurements than counts.)

True or False: One reason to use a sample to estimate the shape of a population distribution is to determine which statistics are appropriate to use for that variable, since different statistics have different characteristics.

true (There are several different statistics you could use to measure central tendency, but different ones work better with different shapes of distributions.)

True or False: If the total area of all the bars in a histogram is 1, the area of each bar is proportional to the total number of data values.

true (Think about each bar as representing a proportion of the total area. A histogram can be thought of as an "area-picture" of a frequency table; the areas of the bars represent the frequencies, and a large area indicates a large frequency.)
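A minimal sketch of the "area-picture" idea, assuming a small made-up data set and hand-picked bin edges (both are illustrative and not from the quiz). When bar heights are scaled so the total area is 1, each bar's area works out to the relative frequency of its bin:

```python
def histogram_areas(data, edges):
    """Return each bar's area when the histogram is scaled to total area 1."""
    n = len(data)
    areas = []
    for i in range(len(edges) - 1):
        lo, hi = edges[i], edges[i + 1]
        is_last_bin = i == len(edges) - 2
        # Bins are [lo, hi), except the last bin, which also includes hi.
        count = sum(1 for x in data if lo <= x < hi or (is_last_bin and x == hi))
        width = hi - lo
        height = count / (n * width)  # density-scaled bar height
        areas.append(height * width)  # area = count / n
    return areas


# Hypothetical data and bin edges, chosen only for illustration.
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9]
edges = [2, 4, 6, 8, 10]

areas = histogram_areas(data, edges)
total_area = sum(areas)
print(areas, total_area)
```

A bin holding half of the observations takes up half of the total area, which is why a large area signals a large frequency.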

True or False: On the least-squares regression line, the point (x̅, y̅) always has a residual of 0.

true (the point (x̅, y̅) is always on the least-squares regression line, so the residual for this point is always 0)
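This can be checked numerically. The sketch below fits a least-squares line with the usual slope and intercept formulas and evaluates the residual at (x̅, y̅). The data values are hypothetical, chosen only for illustration:

```python
def least_squares(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The intercept is defined so that the line passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept, x_bar, y_bar


# Hypothetical (x, y) observations, for illustration only.
xs = [1.0, 2.0, 4.0, 5.0, 8.0]
ys = [2.1, 2.9, 5.2, 5.8, 9.5]

slope, intercept, x_bar, y_bar = least_squares(xs, ys)
residual_at_mean = y_bar - (intercept + slope * x_bar)
print(residual_at_mean)  # zero, up to floating-point rounding
```

The result follows directly from the intercept formula: substituting x̅ into the fitted line returns y̅ exactly.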

You're shopping for a car, and you log on to a Web site that gives reviews for automobiles. Reviews are written by average people who bought the cars and want to share their experiences with others. You look up the reviews for a Toyhushusta Bearcat sport truck, and find that 79% of the reviewers checked the "I would not buy this truck again" box. Based on that and nothing else, you decide not to buy the truck. Why was your decision too hasty?

undercoverage and voluntary response bias

A z-score is called a standardized score because you can:

use them to compare x-values to a universal standard, in this case, the standard normal distribution.
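As a rough illustration of that comparison, the sketch below standardizes two raw scores drawn from different normal populations. The means, standard deviations, and raw scores are assumed values picked for the example:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Assumed example values: two populations with different centers and spreads.
z_a = z_score(62, mean=50, sd=12)
z_b = z_score(90, mean=75, sd=15)

print(z_a, z_b)
```

The raw scores 62 and 90 look very different, but both sit one standard deviation above their own mean, so on the standard normal scale they occupy the same relative position.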

An outlier:

usually does not have a strong effect on the regression line, can also be an influential point, and may be an error.

A firm administers a survey in the state of New York. The survey is mailed out to a random set of 5,000 households throughout the state, and responses are received from 200 households. The two key questions on the survey are: - Whom do you plan to vote for in the upcoming Senate election, the Democratic or Republican candidate? - Do you agree with the idea of imposing limits on the amount of time a Senator can spend in office? What is a possible problem with this survey?

voluntary response bias

In a normal distribution with a mean of 30 and a standard deviation of 5, you'd find the largest proportion of cases between:

x = 25 and x = 35
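A quick numerical check of this answer, using the normal CDF written in terms of `math.erf`. The interval from 25 to 35 comes from the question; the two comparison intervals are hypothetical, since the other answer choices are not shown here:

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability P(X <= x) for a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def proportion_between(lo, hi, mean, sd):
    """Proportion of a normal distribution falling between lo and hi."""
    return normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)


MEAN, SD = 30, 5

central = proportion_between(25, 35, MEAN, SD)  # within 1 SD of the mean, about 0.68
shifted = proportion_between(30, 40, MEAN, SD)  # hypothetical comparison interval
tail = proportion_between(35, 45, MEAN, SD)     # hypothetical comparison interval

print(round(central, 4), round(shifted, 4), round(tail, 4))
```

All three intervals are 10 units wide, but the one centered on the mean captures the most area because the normal density peaks at the mean.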

A researcher is conducting an experiment on the possible effects of a new pill on weight loss among obese men and women. He has given the new pill to a sample of women and an old pill to the men. The researcher is comparing which group demonstrates the greatest percentage decrease in weight. You believe that his design is flawed. The flaw in his design is:

you can't measure the effect of the new pill on the men, since none of them received this treatment.

