Stats Midterms Review
In a parking lot with 200 cars, 50 cars are white, 30 cars are red, and 20 cars are silver. One car will be selected at random from the parking lot. If each car in the parking lot has only one color, which of the following cannot be the probability that the selected car will be green?
0.6
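A quick check of why 0.6 is impossible, as a minimal sketch using only the counts in the question: at most the 100 cars that are not white, red, or silver can be green, so the probability cannot exceed 0.5.

```python
# Counts taken from the question.
total_cars = 200
known_non_green = 50 + 30 + 20  # white + red + silver

# At most, every remaining car is green; possibly none are.
max_p_green = (total_cars - known_non_green) / total_cars
print(max_p_green)
```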
Which of the following statistics is defined as the 50th percentile?
The Median
Data were collected on 100 United States coins minted in 2018. Which of the following represents a quantitative variable for the data collected?
The value of the coin
Ninety percent of the people who have a particular disease will have a positive result on a given diagnostic test. Ninety percent of the people who do not have the disease will have a negative result on this test. If 5 percent of a certain population has the disease, what percent of that population would test positive for the disease?
14%
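The 14% comes from the law of total probability: true positives plus false positives. A minimal sketch, with variable names chosen here for illustration:

```python
# Values taken from the question.
prevalence = 0.05    # P(disease)
sensitivity = 0.90   # P(positive | disease)
specificity = 0.90   # P(negative | no disease)

# P(positive) = P(disease) * P(pos | disease) + P(no disease) * P(pos | no disease)
p_positive = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
print(f"P(positive) = {p_positive:.2f}")
```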
The following list shows the number of video games sold at a game store each day for one week: 15, 43, 50, 39, 22, 16, 20. Which of the following is the best classification of the data in the list?
Quantitative and Discrete
The following table shows data that were collected from a random sample of people, who indicated their age and their favorite sporting event to watch on television. Based on the results above, what proportion of the randomly sampled people are over age 12 years?
2300/3500
Under which of the following conditions is it preferable to use stratified random sampling rather than simple random sampling?
The population can be divided into strata so that the individuals in each stratum are as much alike as possible.
The distribution of monthly rent for one-bedroom apartments in a city is approximately normal with mean $936 and standard deviation $61. A graduate student is looking for a one-bedroom apartment and wants to pay no more than $800 in monthly rent. Of the following, which is the best estimate of the percent of one-bedroom apartments in the city with a monthly rent of at most $800?
1.3%
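The estimate can be checked with the normal CDF. This sketch uses `statistics.NormalDist` from the Python standard library (3.8+); the z-score is (800 - 936)/61, about -2.23.

```python
from statistics import NormalDist

# Rent model from the question: approximately normal, mean 936, sd 61.
rent = NormalDist(mu=936, sigma=61)

# P(rent <= 800) is the area to the left of 800.
p_at_most_800 = rent.cdf(800)
print(f"{p_at_most_800:.1%}")
```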
The relationship between carbon dioxide emissions and fuel efficiency of a certain car can be modeled by the least-squares regression equation , where represents the fuel efficiency, in miles per gallon, and represents the predicted carbon dioxide emissions, in grams per mile. Which of the following is closest to the predicted carbon dioxide emissions, in grams per mile, for a car of this type with a fuel efficiency of 20 miles per gallon?
446
The distribution of assembly times required to assemble a certain smartphone is approximately normal with mean 4.6 minutes and standard deviation 0.6 minute. Of the following, which is closest to the percentage of assembly times between 4 minutes and 5 minutes?
59%
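For an interval, subtract the two cumulative areas. A minimal sketch with the standard library; the z-scores are -1 for 4 minutes and about 0.67 for 5 minutes.

```python
from statistics import NormalDist

# Assembly-time model from the question: mean 4.6 minutes, sd 0.6 minute.
assembly = NormalDist(mu=4.6, sigma=0.6)

# P(4 <= time <= 5) = area left of 5 minus area left of 4.
p_between = assembly.cdf(5) - assembly.cdf(4)
print(f"{p_between:.1%}")
```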
The SC Electric Company has bid on two electrical wiring jobs. The owner of the company believes that the probability of being awarded the first job (event A) is 0.75; the probability of being awarded the second job (event B) is 0.5; and the probability of being awarded both jobs (event A and B) is 0.375. If the owner's beliefs are correct, which of the following statements must be true concerning event A and event B?
Event A and event B are not mutually exclusive and are independent.
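Both parts of the answer can be verified directly: the events are independent when P(A and B) = P(A)P(B), and mutually exclusive only when P(A and B) = 0. A short sketch:

```python
from math import isclose

# Probabilities from the question.
p_a, p_b, p_a_and_b = 0.75, 0.5, 0.375

independent = isclose(p_a * p_b, p_a_and_b)   # product rule holds
mutually_exclusive = p_a_and_b == 0           # the events can never both occur

print(independent, mutually_exclusive)
```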
In a recent poll of 1,500 randomly selected eligible voters, only 525 (35 percent) said that they did not vote in the last election. However, a vote count showed that 80 percent of eligible voters actually did not vote in the last election. Which of the following types of bias is most likely to have occurred in the poll?
Response bias
A company determines the mean and standard deviation of the number of sick days taken by its employees in one year. Which of the following is the best description of the standard deviation?
Approximately the mean distance between the number of sick days taken by individual employees and the mean number of sick days taken by all employees
A school nutritionist was interested in how students at a certain school would feel after taking a nutritional supplement. The nutritionist selected a random sample of twenty students from the school to participate in the study. Participants were asked to keep a journal on how well they felt after taking the supplement each day. What possible source of bias is present in the method of data collection?
Response bias where responses are self-reported
Which of the following scatterplots could represent a data set with a correlation coefficient of r = -1?
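The scatterplot in which all points fall exactly on a straight line with negative slope, running from upper left to lower right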
The computer output below shows the result of a linear regression analysis for predicting the concentration of zinc, in parts per million (ppm), from the concentration of lead, in ppm, found in fish from a certain river. Which of the following statements is a correct interpretation of the value 19.0 in the output?
On average there is a predicted increase of 19.0 ppm in concentration of zinc for every increase of 1 ppm in concentration of lead found in the fish.
Hurricane damage amounts, in millions of dollars per acre, were estimated from insurance records for major hurricanes for the past three decades. A stratified random sample of five locations (based on categories of distance from the coast) was selected from each of three coastal regions in the southeastern United States. The three regions were Gulf Coast (Alabama, Louisiana, Mississippi), Florida, and Lower Atlantic (Georgia, South Carolina, North Carolina). Damage amounts in millions of dollars per acre, adjusted for inflation, are shown in the table below. a. Sketch a graphical display that compares the hurricane damage amounts per acre for the three different coastal regions (Gulf Coast, Florida, and Lower Atlantic) and that also shows how the damage amounts vary with distance from the coast.
see pic on phone
A moving average for data collected at regular time increments is the average of data values for two or more consecutive increments. The 4-year moving averages for the typhoon data are provided in the table below. For example, the Eastern Pacific 4-year moving average for 2000 is the average of 22, 16, 15, and 21, which is equal to 18.50. c. Show how to calculate the 4-year moving average for the year 2010 in the Western Pacific. Write your value in the appropriate place in the table.
The four-year moving average for the year 2010 in the Western Pacific Ocean is (28 + 27 + 28 + 18)/4 = 25.25. This value is written in the table.
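A moving average is just the mean of a window of consecutive values. The sketch below uses a hypothetical helper, `moving_average`, applied to the two sets of four values quoted in the question and answer; the full typhoon table is not reproduced here.

```python
def moving_average(window):
    """Mean of the values in one window of consecutive time increments."""
    return sum(window) / len(window)

# Eastern Pacific example from the question (the window for 2000).
eastern_2000 = moving_average([22, 16, 15, 21])

# Western Pacific values used in the answer (the window for 2010).
western_2010 = moving_average([28, 27, 28, 18])

print(eastern_2000, western_2010)
```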
Zucchini weights are approximately normally distributed with mean 0.8 pound and standard deviation 0.25 pound. Which of the following shaded regions best represents the probability that a randomly selected zucchini will weigh between 0.55 pound and 1.3 pounds?
The graph shaded from μ - σ to μ + 2σ (0.55 pound to 1.3 pounds), with μ = 0.8 pound
A local arcade is hosting a tournament in which contestants play an arcade game with possible scores ranging from 0 to 20. The arcade has set up multiple game tables so that all contestants can play the game at the same time; thus contestant scores are independent. Each contestant's score will be recorded as he or she finishes, and the contestant with the highest score is the winner. After practicing the game many times, Josephine, one of the contestants, has established the probability distribution of her scores, shown in the table below. Crystal, another contestant, has also practiced many times. The probability distribution for her scores is shown in the table below. c. Find the probability that the difference (Josephine minus Crystal) in their scores is -1.
The probability is 0.045 + 0.12 + 0.06 = 0.225
The histogram shown summarizes the responses of 100 people when asked, "What was the price of the last meal you purchased?" Based on the histogram, which of the following could be the interquartile range of the prices?
$5
A researcher in Alaska measured the age (in months) and the weight (in pounds) of a random sample of adolescent moose. When the least-squares regression analysis was performed, the correlation was 0.59. Which of the following is the correct way to label the correlation?
0.59
The least-squares regression line summarizes the relationship between velocity, in feet per second, and depth, in feet, in measurements taken for a certain river, where represents velocity and represents the depth of the river. What is the predicted value of , in feet, when?
0.8
The dotplot below displays the total number of miles that the 28 residents of one street in a certain community traveled to work in one five-day workweek. Which of the following is closest to the percentile rank of a resident from this street who traveled 85 miles to work that week?
70
Hurricane damage amounts, in millions of dollars per acre, were estimated from insurance records for major hurricanes for the past three decades. A stratified random sample of five locations (based on categories of distance from the coast) was selected from each of three coastal regions in the southeastern United States. The three regions were Gulf Coast (Alabama, Louisiana, Mississippi), Florida, and Lower Atlantic (Georgia, South Carolina, North Carolina). Damage amounts in millions of dollars per acre, adjusted for inflation, are shown in the table below. e. One thousand simulated values of this test statistic, Q, were calculated, assuming no difference in the distributions of hurricane damage amounts among the three coastal regions. The results are shown in the table below. These data are also shown in the frequency plot where the heights of the lines represent the frequency of occurrence of simulated values of Q.
A Q value of 6.4 or larger occurred in 39/1000 (or 3.9 percent) of the 1,000 repetitions. All 1,000 repetitions of the simulation assumed there was no difference in the distribution of damage amounts among the three regions. This is a fairly small (approximate) p-value (less than 0.05), indicating that a test statistic as large or larger than the observed test statistic of Q = 6.4 would be fairly unlikely to occur by chance alone if there really was no difference among the regions for each distance category. The sample data therefore provide reasonably strong evidence that there is a difference in the distributions of hurricane damage amounts among these three regions.
The transportation department of a large city wants to estimate the proportion of residents who would use a system of aerial gondolas to commute to work. The gondolas would be part of the city's effort to relieve traffic congestion. The department asked a random sample of residents whether they would use the gondolas. The residents could respond with yes, no, or maybe. Which of the following is the best description of the method for data collection used by the department?
A sample survey
A store owner reports that the probability that a customer who purchases a lawn mower will also purchase an extended warranty is 0.68. Which of the following is the best interpretation of the probability 0.68?
For all customers who purchase a lawn mower, 68% will also purchase an extended warranty.
The boxplots above summarize two data sets, A and B. Which of the following must be true? I. Set A contains more data than Set B. II. The box of Set A contains more data than the box of Set B. III. The data in Set A have a larger range than the data in Set B.
III only
Researchers studying a pack of gray wolves in North America collected data on the length x, in meters, from nose to tip of tail, and the weight y, in kilograms, of the wolves. A scatterplot of weight versus length revealed a relationship between the two variables described as positive, linear, and strong. C. One wolf in the pack with a length of 1.4 meters had a residual of -9.67 kilograms. What was the weight of the wolf?
In general, a residual is equal to actual weight minus predicted weight, or equivalently, actual weight = predicted weight + residual. For the wolf with length 1.4 meters and residual of -9.67, the predicted weight is -16.46 + 35.02(1.4) = 32.568 kilograms. Therefore, the actual weight of the wolf is 32.568 + (-9.67) = 22.898 kilograms.
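The same two steps as a short sketch. The coefficients come from the least-squares equation ŷ = -16.46 + 35.02x given in the wolf question; the function name is illustrative.

```python
def predicted_weight(length_m):
    """Predicted weight (kg) from the least-squares line y-hat = -16.46 + 35.02x."""
    return -16.46 + 35.02 * length_m

residual = -9.67                      # actual minus predicted, from the question
predicted = predicted_weight(1.4)     # prediction for a 1.4-meter wolf
actual = predicted + residual         # rearranged residual definition

print(round(predicted, 3), round(actual, 3))
```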
The probability that a randomly selected visitor to a certain website will be asked to participate in an online survey is 0.40. Avery claims that for the next 5 visitors to the site, 2 will be asked to participate in the survey. Is Avery interpreting the probability correctly?
No, because 0.40 represents probability in the long run over many visits to the site.
A local arcade is hosting a tournament in which contestants play an arcade game with possible scores ranging from 0 to 20. The arcade has set up multiple game tables so that all contestants can play the game at the same time; thus contestant scores are independent. Each contestant's score will be recorded as he or she finishes, and the contestant with the highest score is the winner. After practicing the game many times, Josephine, one of the contestants, has established the probability distribution of her scores, shown in the table below. Crystal, another contestant, has also practiced many times. The probability distribution for her scores is shown in the table below. d. The table below lists all the possible differences in the scores between Josephine and Crystal and some associated probabilities. Complete the table and calculate the probability that Crystal's score will be higher than Josephine's score.
P(difference = -1) = 0.225 (from part c). P(difference = -2) = 1 - 0.015 - 0.225 - 0.325 - 0.260 - 0.090 = 0.085. The probability that Crystal's score is higher than Josephine's score is P(difference < 0) = 0.015 + 0.085 + 0.225 = 0.325.
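Because the probabilities of all possible differences must sum to 1, the missing entry is 1 minus the others. In this sketch the last listed probability is taken as 0.090, which is the value needed for the stated result of 0.085; the table itself is not reproduced here.

```python
# Probabilities of the other differences, as used in the answer.
# Assumption: the last term is 0.090 (required for the total to reach 1).
other_probs = [0.015, 0.225, 0.325, 0.260, 0.090]

# The missing probability completes the distribution.
p_diff_minus_2 = 1 - sum(other_probs)

# Crystal wins when the difference (Josephine minus Crystal) is negative.
p_crystal_higher = 0.015 + p_diff_minus_2 + 0.225

print(round(p_diff_minus_2, 3), round(p_crystal_higher, 3))
```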
A new type of fish food has become available for salmon raised on fish farms. Your task is to design an experiment to compare the weight gain of salmon raised over a six-month period on the new and the old types of food. The salmon that you will use for this experiment have already been randomly placed in eight large tanks in a room that has a considerable temperature gradient. Specifically, tanks on the north side of the room tend to be much colder than those on the south side. The arrangement of tanks is shown on the diagram below. Describe a design for this experiment that takes account of the temperature gradient.
Shows four blocks of two tanks each, with the pairs of tanks being in nearly identical conditions with regard to temperature. The blocks are (1,4), (2,3), (5,8), and (6,7). States clearly that this is the potentially most effective arrangement of blocks since these pairs should be the most homogeneous. Explains a correct method for randomly assigning treatments to blocks.
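One way to carry out the random assignment within blocks, sketched under the assumption that each block of two tanks gets one tank of each food. A coin flip per block would do the same job by hand; `assign_treatments` and the seed are illustrative choices, not part of the scoring guide.

```python
import random

# Blocks of tanks with similar temperature, as listed in the answer.
blocks = [(1, 4), (2, 3), (5, 8), (6, 7)]

def assign_treatments(blocks, rng):
    """Within each block, randomly pick which tank gets the new food."""
    assignment = {}
    for block in blocks:
        new_tank, old_tank = rng.sample(block, 2)  # random order of the pair
        assignment[new_tank] = "new food"
        assignment[old_tank] = "old food"
    return assignment

assignment = assign_treatments(blocks, random.Random(2024))
for tank in sorted(assignment):
    print(tank, assignment[tank])
```

After six months, weight gain would be compared within each block, so the temperature difference between north and south tanks does not get mixed up with the food effect.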
A bank surveyed all of its 60 employees to determine the proportion who participate in volunteer activities. Which of the following statements is true?
The bank does not need to use an inference procedure to determine the proportion of employees who participate in volunteer activities because the survey was a census of all employees.
A taxicab company in a large city charges passengers a flat fee to enter a cab plus an additional fee per mile. There is also a charge for time spent stopped in traffic. The company wants to develop a new method for determining fares based on mileage and a flat fee only, not on time spent stopped in traffic. A random sample of 10 recent cab fares was selected, and the distance, in miles, and the fare, in dollars, were recorded. A regression model was fit to the data, and the output, scatterplot, and residual plot are given below. c. The company wants to know if charging a flat fee of $3.00 and a per-mile charge of $1.50 will maintain its current revenue. Based on the information in part (b), is a flat fee of $3.00 a reasonable value? Explain.
The flat fee of $3.00 is lower than any of the plausible values from the 95% confidence interval for the intercept ($3.61 to $4.98) which was based on the taxi company's original method of calculating fares. Thus, $3.00 appears too low to maintain current revenue.
A taxicab company in a large city charges passengers a flat fee to enter a cab plus an additional fee per mile. There is also a charge for time spent stopped in traffic. The company wants to develop a new method for determining fares based on mileage and a flat fee only, not on time spent stopped in traffic. A random sample of 10 recent cab fares was selected, and the distance, in miles, and the fare, in dollars, were recorded. A regression model was fit to the data, and the output, scatterplot, and residual plot are given below. a. State the equation of the least squares regression line for these data. Define any variables used in the equation.
The regression line equation is ŷ = 4.296 + 1.229x, where ŷ = the predicted taxi fare (in dollars) and x = the distance traveled (in miles).
Researchers studying a pack of gray wolves in North America collected data on the length x, in meters, from nose to tip of tail, and the weight y, in kilograms, of the wolves. A scatterplot of weight versus length revealed a relationship between the two variables described as positive, linear, and strong. B. Interpret the meaning of the slope of the least-squares regression line in context
The slope of 35.02 indicates that two wolves that differ by one meter in length are predicted to differ by 35.02 kilograms in weight, with the longer wolf having the greater weight.
An environmental research agency conducted a study of a certain state's roadsides to estimate the mean number of discarded cans and bottles per mile of public road. The state's public roads were grouped into three types: Major highways: major paved roads designed for high traffic volume Minor highways: smaller paved roads designed for low traffic volume Unpaved roads: gravel and dirt roads There are about 100,000 miles of public roads in the state. The environmental research agency defined a sampling unit to be a one-mile segment of public road. Using a database supplied by the state's department of transportation, the agency randomly selected 30 one-mile road segments for each of the three types of roads. Researchers from the agency searched the roadsides along each of the selected one-mile road segments and recorded the number of discarded cans and bottles. Results are shown in the table below. a. What is the variable of interest in the study? What is the parameter of interest?
The variable of interest is the number of bottles/cans per one-mile segment. The parameter of interest is the mean number of bottles/cans per one-mile segment of all roads in the state.
A local arcade is hosting a tournament in which contestants play an arcade game with possible scores ranging from 0 to 20. The arcade has set up multiple game tables so that all contestants can play the game at the same time; thus contestant scores are independent. Each contestant's score will be recorded as he or she finishes, and the contestant with the highest score is the winner. After practicing the game many times, Josephine, one of the contestants, has established the probability distribution of her scores, shown in the table below. Crystal, another contestant, has also practiced many times. The probability distribution for her scores is shown in the table below. b. Suppose that Josephine scores 16 and Crystal scores 17. The difference (Josephine minus Crystal) of their scores is -1. List all combinations of possible scores for Josephine and Crystal that will produce a difference (Josephine minus Crystal) of -1, and calculate the probability for each combination.
see pic on phone
Students at a local elementary school were shown a painting and asked which emotion (joy, happiness, love, or anger) they felt by looking at the painting. The students were classified by their age. The following table summarizes the responses of the students by age-group. One student from the school will be selected at random. What is the probability that the student is in the age-group of 6 to 8 years given that the selected student responded joy?
28/89
For a roll of a fair die, each of the outcomes 1, 2, 3, 4, 5, or 6 is equally likely. A red die and a green die are rolled simultaneously, and the difference of the outcomes (red - green) is computed. This is repeated for a total of 500 rolls of the pair of dice. Which of the following graphs best represents the most reasonable distribution of the differences?
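The graph that is roughly symmetric and mound-shaped, centered at 0, with differences ranging from -5 to 5 (a difference of 0 is most likely; -5 and 5 are least likely)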
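Enumerating the 36 equally likely (red, green) outcomes shows the shape to look for in the graph. This sketch gives the exact distribution; 500 simulated rolls would follow it only approximately.

```python
from collections import Counter
from itertools import product

# Count each difference (red - green) over all 36 equally likely outcomes.
diff_counts = Counter(
    red - green for red, green in product(range(1, 7), repeat=2)
)

for diff in sorted(diff_counts):
    print(f"{diff:>2}: {diff_counts[diff]}/36")
```

The counts rise from 1/36 at -5 to 6/36 at 0 and fall back to 1/36 at 5, so the differences are not evenly distributed.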
Hurricane damage amounts, in millions of dollars per acre, were estimated from insurance records for major hurricanes for the past three decades. A stratified random sample of five locations (based on categories of distance from the coast) was selected from each of three coastal regions in the southeastern United States. The three regions were Gulf Coast (Alabama, Louisiana, Mississippi), Florida, and Lower Atlantic (Georgia, South Carolina, North Carolina). Damage amounts in millions of dollars per acre, adjusted for inflation, are shown in the table below. d. Consider testing the following hypotheses. H0: There is no difference in the distributions of hurricane damage amounts among the three regions. Ha: There is a difference in the distributions of hurricane damage amounts among the three regions. If there is no difference in the distribution of hurricane damage amounts among the three regions (Gulf Coast, Florida, and Lower Atlantic), the expected value of the average rank for each of the three regions is 2. Therefore, the following test statistic can be used to evaluate the hypotheses above: Q = 5[(R_G - 2)² + (R_F - 2)² + (R_A - 2)²], where R_G is the average rank over the five distance categories for the Gulf Coast (and R_F and R_A are similarly defined for the Florida and Lower Atlantic coastal regions). Calculate the value of the test statistic Q using the average ranks you obtained in part (c).
The calculated value of the test statistic Q is Q = 5[(2.0 - 2)² + (1.2 - 2)² + (2.8 - 2)²] = 5[0 + 0.64 + 0.64] = 6.4.
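The calculation squares each average rank's distance from the expected value of 2, sums the squares, and multiplies by 5 (the number of distance categories). A minimal sketch using the average ranks from the answer:

```python
# Average ranks from part (c) of the hurricane question.
average_rank = {"Gulf Coast": 2.0, "Florida": 1.2, "Lower Atlantic": 2.8}

expected_rank = 2     # expected average rank if the regions do not differ
n_categories = 5      # number of distance categories

q = n_categories * sum((r - expected_rank) ** 2 for r in average_rank.values())
print(round(q, 1))
```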
A moving average for data collected at regular time increments is the average of data values for two or more consecutive increments. The 4-year moving averages for the typhoon data are provided in the table below. For example, the Eastern Pacific 4-year moving average for 2000 is the average of 22, 16, 15, and 21, which is equal to 18.50. a. Consider graph B. i. What information is more apparent from the plots of the 4-year moving averages than from the plots of the yearly frequencies of typhoons? ii. What information is less apparent from the plots of the 4-year moving averages than from the plots of the yearly frequencies of typhoons?
(i) The overall trends across this time period were more apparent with the moving averages than with the original frequencies. The moving averages reduce variability, making more apparent the overall decreasing trend in number of typhoons in the Western Pacific Ocean and the slight increasing trend in the number of typhoons in the Eastern Pacific Ocean. (ii) The year-to-year variability in number of typhoons is less apparent with the moving averages than with the original frequencies.
A box contains 10 tags, numbered 1 through 10, with a different number on each tag. A second box contains 8 tags, numbered 20 through 27, with a different number on each tag. One tag is drawn at random from each box. What is the expected value of the sum of the numbers on the two selected tags?
29.0
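The expected value of a sum is the sum of the expected values, and each tag in a box is equally likely, so each expected value is the plain mean of that box's numbers. A short sketch:

```python
from statistics import mean

box_one = range(1, 11)    # tags 1 through 10
box_two = range(20, 28)   # tags 20 through 27

# E(X + Y) = E(X) + E(Y)
expected_sum = mean(box_one) + mean(box_two)
print(expected_sum)
```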
A split ticket is a voting pattern in which a voter casts votes for candidates from more than one political party. In a recent study, 1,000 men and women were asked whether they voted a split ticket in the last election. The totals are shown in the following table. What value of would indicate no association between gender and voting pattern for the people in the sample?
480
Dairy farmers are aware there is often a linear relationship between the age, in years, of a dairy cow and the amount of milk produced, in gallons per week. The least-squares regression line produced from a random sample is . Based on the model, what is the difference in predicted amounts of milk produced between a cow of 5 years and a cow of 10 years?
A cow of 5 years is predicted to produce 5.5 more gallons per week
The ELISA tests whether a patient has contracted HIV. The ELISA is said to be positive if it indicates that HIV is present in a blood sample, and the ELISA is said to be negative if it does not indicate that HIV is present in a blood sample. Instead of directly measuring the presence of HIV, the ELISA measures levels of antibodies in the blood that should be elevated if HIV is present. Because of variability in antibody levels among human patients, the ELISA does not always indicate the correct result. As part of a training program, staff at a testing lab applied the ELISA to 500 blood samples known to contain HIV. The ELISA was positive for 489 of those blood samples and negative for the other 11 samples. As part of the same training program, the staff also applied the ELISA to 500 other blood samples known to not contain HIV. The ELISA was positive for 37 of those blood samples and negative for the other 463 samples. B. Among the blood samples examined in the training program that provided positive ELISA results for HIV, what proportion actually contained HIV?
A total of 489 + 37 = 526 blood samples resulted in a positive ELISA. Of these, 489 samples actually contained HIV. Therefore, the proportion of samples with a positive ELISA that actually contained HIV is 489/526 ≈ 0.9297.
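The proportion conditions on the positive results only, so the denominator is all positives, not all samples. A minimal sketch with the counts from the training program:

```python
# Counts from the question.
true_positives = 489    # HIV present, ELISA positive
false_positives = 37    # HIV absent, ELISA positive

all_positives = true_positives + false_positives
proportion_with_hiv = true_positives / all_positives
print(round(proportion_with_hiv, 4))
```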
Measurements of water quality were taken from a river downstream from an abandoned chemical dumpsite. Concentrations of a certain chemical were obtained from 9 measurements taken at the surface of the water, 9 measurements taken at mid-depth of the water, and 9 measurements taken at the bottom of the water. What type of study was conducted, and what is the response variable of the study?
An observational study was conducted, and the response variable is the concentration of the chemical.
A certain county has 1,000 farms. Corn is grown on 100 of these farms but on none of the others. In order to estimate the total farm acreage of corn for the county, two plans are proposed. Plan I: 1. Sample 20 farms at random. 2. Estimate the mean acreage of corn per farm in a confidence interval. 3. Multiply both ends of the interval by 1,000 to get an interval estimate of the total. Plan II: 1. Identify the 100 corn-growing farms. 2. Sample 20 corn-growing farms at random. 3. Estimate the mean acreage of corn for corn-growing farms in a confidence interval. 4. Multiply both ends of the interval by 100 to get an interval estimate of the total. On the basis of the information given, which of the following is the better method for estimating the total farm acreage of corn for the county?
Choose plan II over plan I
Hurricane damage amounts, in millions of dollars per acre, were estimated from insurance records for major hurricanes for the past three decades. A stratified random sample of five locations (based on categories of distance from the coast) was selected from each of three coastal regions in the southeastern United States. The three regions were Gulf Coast (Alabama, Louisiana, Mississippi), Florida, and Lower Atlantic (Georgia, South Carolina, North Carolina). Damage amounts in millions of dollars per acre, adjusted for inflation, are shown in the table below. C. In the table below, the hurricane damage amounts have been replaced by the ranks 1, 2, or 3. For each of the distance categories, the highest damage amount is assigned a rank of 1 and the lowest damage amount is assigned a rank of 3. Determine the missing ranks for the 10-to-20-miles distance category and calculate the average rank for each of the three regions. Place the values in the table below.
For the "10 to 20 miles" distance category: The Florida region has the most damage (3.0 million dollars per acre) and so has rank 1. The region with the second-most damage is the Gulf Coast (1.7 million dollars), obtaining rank 2. The Lower Atlantic region has the least damage (0.3 million dollars) and so has rank 3. The missing column of the table is therefore filled in with rank 2 for the Gulf Coast, rank 1 for Florida, and rank 3 for the Lower Atlantic. The average ranks are computed as (2+2+3+1+2)/5 = 2.0 for the five Gulf Coast damage ranks, (1+1+1+2+1)/5 = 1.2 for the five Florida damage ranks, and (3+3+2+3+3)/5 = 2.8 for the five Lower Atlantic damage ranks.
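The average ranks can be recomputed from the rank lists quoted in the answer. The sketch also checks that each distance category uses the ranks 1, 2, and 3 exactly once, so each category's ranks total 6.

```python
from statistics import mean

# Ranks for the five distance categories, in the order used in the answer.
ranks = {
    "Gulf Coast": [2, 2, 3, 1, 2],
    "Florida": [1, 1, 1, 2, 1],
    "Lower Atlantic": [3, 3, 2, 3, 3],
}

average_rank = {region: mean(values) for region, values in ranks.items()}

# Sum of the three regions' ranks within each distance category.
category_totals = [sum(column) for column in zip(*ranks.values())]

print(average_rank)
print(category_totals)
```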
A certain county school district has 15 high schools. The high school seniors' plans after graduation in each school vary greatly from one school to the next. The county superintendent will select a sample of high school seniors from the district to survey about their plans after graduation. The superintendent will use a cluster sample with the high schools as clusters. A random sample of 5 high schools will be selected, and all seniors at those high schools will complete the survey. What is one disadvantage to selecting a cluster sample to investigate the superintendent's goal?
The schools in the cluster sample might not be representative of the population of seniors.
The ELISA tests whether a patient has contracted HIV. The ELISA is said to be positive if it indicates that HIV is present in a blood sample, and the ELISA is said to be negative if it does not indicate that HIV is present in a blood sample. Instead of directly measuring the presence of HIV, the ELISA measures levels of antibodies in the blood that should be elevated if HIV is present. Because of variability in antibody levels among human patients, the ELISA does not always indicate the correct result. As part of a training program, staff at a testing lab applied the ELISA to 500 blood samples known to contain HIV. The ELISA was positive for 489 of those blood samples and negative for the other 11 samples. As part of the same training program, the staff also applied the ELISA to 500 other blood samples known to not contain HIV. The ELISA was positive for 37 of those blood samples and negative for the other 463 samples. C. When a blood sample yields a positive ELISA result, two more ELISAs are performed on the same blood sample. If at least one of the two additional ELISAs is positive, the blood sample is subjected to a more expensive and more accurate test to make a definitive determination of whether HIV is present in the sample. Repeated ELISAs on the same sample are generally assumed to be independent. Under the assumption of independence, what is the probability that a new blood sample that comes into the lab will be subjected to the more expensive test if that sample does not contain HIV?
From part (a), the probability that the ELISA will be positive, given that the blood sample does not actually have HIV present, is 0.074. Thus, the probability of a negative ELISA, given that the blood sample does not actually have HIV present, is 1 - 0.074 = 0.926. P(new blood sample that does not contain HIV will be subjected to the more expensive test) = P(1st ELISA positive and 2nd ELISA positive OR 1st ELISA positive and 2nd ELISA negative and 3rd ELISA positive | HIV not present in blood) = P(1st ELISA positive and 2nd ELISA positive | HIV not present in blood) + P(1st ELISA positive and 2nd ELISA negative and 3rd ELISA positive | HIV not present in blood) = (0.074)(0.074) + (0.074)(0.926)(0.074) = 0.005476 + 0.005070776 = 0.010546776 ≈ 0.0105. OR: P(new blood sample that does not contain HIV will be subjected to the more expensive test) = P(1st ELISA positive and not both the 2nd and 3rd are negative) = (0.074)(1 - 0.926²) = (0.074)(0.142524) = 0.010546776 ≈ 0.0105.
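Both routes in the answer can be checked numerically. This sketch assumes, as the question does, that repeated ELISAs on the same sample are independent; the variable names are illustrative.

```python
from math import isclose

# False-positive rate from the training data: 37 of 500 HIV-free samples.
p_positive = 37 / 500          # 0.074
p_negative = 1 - p_positive    # 0.926

# Route 1: add the two disjoint ways to reach the expensive test.
p_case_by_case = (
    p_positive * p_positive                   # 1st positive, 2nd positive
    + p_positive * p_negative * p_positive    # 1st positive, 2nd negative, 3rd positive
)

# Route 2: 1st positive, and not both follow-up tests negative.
p_complement = p_positive * (1 - p_negative ** 2)

print(round(p_case_by_case, 4), round(p_complement, 4))
```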
Which of the following statements about a least-squares regression analysis is true? I. A point with a large residual is an outlier. II. A point with high leverage has an x-value that is not consistent with the other x-values in the set. III. The removal of an influential point from a data set could change the value of the correlation coefficient.
III only
Hurricane damage amounts, in millions of dollars per acre, were estimated from insurance records for major hurricanes for the past three decades. A stratified random sample of five locations (based on categories of distance from the coast) was selected from each of three coastal regions in the southeastern United States. The three regions were Gulf Coast (Alabama, Louisiana, Mississippi), Florida, and Lower Atlantic (Georgia, South Carolina, North Carolina). Damage amounts in millions of dollars per acre, adjusted for inflation, are shown in the table below. b. Describe differences and similarities in the hurricane damage amounts among the three regions. Because the distributions of hurricane damage amounts are often skewed, statisticians frequently use rank values to analyze such data.
In all three regions (Gulf Coast, Florida, Lower Atlantic) the hurricane damage amounts tend to decrease as distance from the coast increases. For almost all given distances from the coast, the Florida region has the largest damage amounts. Also, for any given distance, the Gulf Coast and Lower Atlantic regions have similar damage amounts but with the Lower Atlantic damage amounts generally smaller.
Researchers studying a pack of gray wolves in North America collected data on the length x, in meters, from nose to tip of tail, and the weight y, in kilograms, of the wolves. A scatterplot of weight versus length revealed a relationship between the two variables described as positive, linear, and strong. a. For the situation described above, explain what is meant by each of the following words. i. Positive: ii. Linear: iii. Strong: The data collected from the wolves were used to create the least-squares equation ŷ = -16.46 + 35.02 x.
In the context of a scatterplot in which y represents weight and x represents length, the following are defined. (i) A positive relationship means that wolves with higher values of length also tend to have higher weights. (ii) A linear relationship means that as length increases by one meter, weight tends to change by a constant amount, on average. (iii) A strong relationship means that the data points fall close to a line (or curve).
Traffic data revealed that 35 percent of automobiles traveling along a portion of an interstate highway were exceeding the legal speed limit. Using highway cameras and license plate registrations, it was also determined that 52 percent of sports cars were also speeding along the same portion of the highway. What is the probability that a randomly selected car along the same portion of the highway was a speeding sports car?
It cannot be determined from the information given.
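To see why the answer cannot be determined, note that P(speeding and sports car) = P(speeding | sports car) × P(sports car), and the problem never gives P(sports car). The sketch below plugs in two hypothetical sports-car proportions (0.10 and 0.30, both invented for illustration and both compatible with 35 percent of all cars speeding) and gets two different answers.

```python
P_SPEEDING_GIVEN_SPORTS = 0.52  # given in the problem


def p_speeding_sports_car(p_sports: float) -> float:
    """Joint probability via the multiplication rule.

    p_sports is the proportion of cars that are sports cars, which the
    problem does NOT provide; any value passed here is an assumption.
    """
    return P_SPEEDING_GIVEN_SPORTS * p_sports


# Hypothetical sports-car proportions, chosen only to illustrate the point.
for p_sports in (0.10, 0.30):
    print(p_sports, round(p_speeding_sports_car(p_sports), 3))
```

Because the result changes with the unknown proportion of sports cars, no single probability follows from the information given.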
A local employer asked for help selecting a new type of desk chair. Thirty employees volunteered, and each employee used the new desk chair for two weeks and the current desk chair for two weeks. To determine which chair was used first, a coin was flipped for each employee. Heads represented using the new chair first, and tails represented using the current chair first. At the end of each two-week period, the employees were asked to rate their satisfaction with the new chair. Which of the following best describes this study?
It is a well-designed experiment because there is random assignment, replication, and comparison of at least two treatment groups.
44. Mateo plays on his school basketball team. From past history, he knows that his probability of making a basket on a free throw is 0.8. Suppose he wants to create a simulation using random numbers to estimate the probability of making at least 3 baskets on his next 5 free throw attempts. Which of the following assignments of the digits 0 to 9 could be used for the simulation?
Let the digits from 0 to 7 represent making a basket and the digits 8 and 9 represent not making a basket.
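The digit assignment above can be run as an actual simulation. This is a sketch, not part of the original question: the function name, the number of trials, and the seed are arbitrary choices, and Python's `random` module stands in for a random digit table.

```python
import random


def simulate_at_least_three(trials: int, seed: int = 1) -> float:
    """Estimate P(at least 3 makes in 5 free throws) using random digits.

    Digits 0-7 represent a made basket (probability 0.8);
    digits 8 and 9 represent a miss (probability 0.2).
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # One trial = five random digits, one per free throw attempt.
        digits = [rng.randint(0, 9) for _ in range(5)]
        makes = sum(1 for d in digits if d <= 7)
        if makes >= 3:
            successes += 1
    return successes / trials


estimate = simulate_at_least_three(100_000)
print(estimate)
```

With many trials the estimate should settle near the exact binomial value, which is about 0.942.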
A moving average for data collected at regular time increments is the average of data values for two or more consecutive increments. The 4-year moving averages for the typhoon data are provided in the table below. For example, the Eastern Pacific 4-year moving average for 2000 is the average of 22, 16, 15, and 21, which is equal to 18.50. d. Graph B below shows both yearly frequencies (connected by dashed lines) and the respective 4-year moving averages (connected by solid lines). Use your answer in part (c) to complete the graph.
See pic on phone
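The moving-average definition in the question can be sketched in a few lines. Only the four Eastern Pacific values quoted in the question are used here, since the full table is not reproduced; the function name is illustrative, and the sketch assumes each average is labeled by the last year in its window.

```python
def moving_average(values, window=4):
    """Return trailing moving averages.

    The entry for position i averages values[i - window + 1] through values[i],
    so the first average appears once `window` values are available.
    """
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# The four yearly counts quoted in the question for the 2000 moving average.
example = [22, 16, 15, 21]
print(moving_average(example))  # [18.5]
```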
A taxicab company in a large city charges passengers a flat fee to enter a cab plus an additional fee per mile. There is also a charge for time spent stopped in traffic. The company wants to develop a new method for determining fares based on mileage and a flat fee only, not on time spent stopped in traffic. A random sample of 10 recent cab fares was selected, and the distance, in miles, and the fare, in dollars, were recorded. A regression model was fit to the data, and the output, scatterplot, and residual plot are given below. b. A 95 percent confidence interval for the intercept of the least squares regression line is (3.61, 4.98). Construct and interpret a 95 percent confidence interval for the slope of the least squares regression line. Assume the conditions for inference are met.
Step 1: Identify the appropriate confidence interval by name or formula and check the appropriate conditions. The stem of the problem says to assume that the conditions for inference are met. The confidence interval for the slope, β, the rate per mile charged by the taxicab company, is b ± t* × (standard error of b), where t* has n - 2 = 8 degrees of freedom. Step 2: Correct mechanics. The 95% confidence interval is b ± t* × (standard error of b) = (see pic on phone). Step 3: Interpretation in context. At the 95% confidence level, an interval of plausible values for the true rate per mile charged by the taxicab company is from $0.85 to $1.61.
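The mechanics of a slope interval, b ± t* × (standard error of b), can be sketched as follows. The regression output is not reproduced in these notes, so the slope and standard error below are illustrative values back-computed from the reported interval ($0.85, $1.61), not numbers read from the output. The critical value 2.306 is the standard t* for 95 percent confidence with n - 2 = 8 degrees of freedom.

```python
T_STAR_DF8 = 2.306  # t critical value, 95% confidence, 10 - 2 = 8 degrees of freedom


def slope_confidence_interval(b, se_b, t_star=T_STAR_DF8):
    """Return (lower, upper) for b ± t* × SE(b)."""
    margin = t_star * se_b
    return (b - margin, b + margin)


# ASSUMPTION: illustrative inputs back-computed from the reported interval,
# not taken from the actual regression output.
b_example = 1.23
se_example = 0.165

low, high = slope_confidence_interval(b_example, se_example)
print(round(low, 2), round(high, 2))
```

With these assumed inputs the endpoints round to 0.85 and 1.61, matching the interval interpreted in Step 3.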
An environmental research agency conducted a study of a certain state's roadsides to estimate the mean number of discarded cans and bottles per mile of public road. The state's public roads were grouped into three types: Major highways: major paved roads designed for high traffic volume Minor highways: smaller paved roads designed for low traffic volume Unpaved roads: gravel and dirt roads There are about 100,000 miles of public roads in the state. The environmental research agency defined a sampling unit to be a one-mile segment of public road. Using a database supplied by the state's department of transportation, the agency randomly selected 30 one-mile road segments for each of the three types of roads. Researchers from the agency searched the roadsides along each of the selected one-mile road segments and recorded the number of discarded cans and bottles. Results are shown in the table below. c. Two methods for estimating the mean number of discarded cans and bottles per mile along all public roads in the state are given below. Method 1 Method 2 Which of these methods gives a better estimate of this mean? Explain.
Method 1 gives the better estimate. The Method 1 estimate is an unbiased estimate of the mean number of discarded cans and bottles per mile of public road in the state, because it weights the sample means according to the proportion of each type of road in the population. The Method 2 estimate is biased and tends to underestimate the population mean: by giving equal weight to each type of road, it assigns too little weight to the minor highways, which appear to have more discarded cans and bottles than the other two types of roads.
Tropical storms in the Pacific Ocean with sustained winds that exceed 74 miles per hour are called typhoons. Graph A below displays the number of recorded typhoons in two regions of the Pacific Ocean—the Eastern Pacific and the Western Pacific—for the years from 1997 to 2010. b. For each region, describe how the yearly frequencies changed over the time period from 1997 to 2010.
The Western Pacific Ocean had a decreasing trend in number of typhoons per year over this time period, especially from about 2001 through 2010. In contrast, the Eastern Pacific Ocean was fairly consistent in the number of typhoons per year over this time period, with a slight increasing trend in the later years from 2005 through 2010. The four-year moving average for the year 2010 in the Western Pacific Ocean is
Tropical storms in the Pacific Ocean with sustained winds that exceed 74 miles per hour are called typhoons. Graph A below displays the number of recorded typhoons in two regions of the Pacific Ocean—the Eastern Pacific and the Western Pacific—for the years from 1997 to 2010. a. Compare the distributions of yearly frequencies of typhoons for the two regions of the Pacific Ocean for the years from 1997 to 2010.
The Western Pacific Ocean had more typhoons than the Eastern Pacific Ocean in all but one of these years. The average seems to have been about 31 typhoons per year in the Western Pacific Ocean, which is higher than the average of about 19 typhoons per year in the Eastern Pacific Ocean. The Western Pacific Ocean also saw more variability (in number of typhoons per year) than the Eastern Pacific Ocean; for example, the range of the frequencies for the Western Pacific is about 21 typhoons and only 10 typhoons for the Eastern Pacific.
Researchers will use a well-designed experiment to test the effectiveness of a new drug versus a placebo in relieving symptoms of the common cold. Which of the following will provide evidence that the new drug causes relief of symptoms?
The difference between the responses to the new drug and the placebo must be shown to be statistically significant to provide evidence that the new drug causes relief.
The ELISA tests whether a patient has contracted HIV. The ELISA is said to be positive if it indicates that HIV is present in a blood sample, and the ELISA is said to be negative if it does not indicate that HIV is present in a blood sample. Instead of directly measuring the presence of HIV, the ELISA measures levels of antibodies in the blood that should be elevated if HIV is present. Because of variability in antibody levels among human patients, the ELISA does not always indicate the correct result. As part of a training program, staff at a testing lab applied the ELISA to 500 blood samples known to contain HIV. The ELISA was positive for 489 of those blood samples and negative for the other 11 samples. As part of the same training program, the staff also applied the ELISA to 500 other blood samples known to not contain HIV. The ELISA was positive for 37 of those blood samples and negative for the other 463 samples. a. When a new blood sample arrives at the lab, it will be tested to determine whether HIV is present. Using the data from the training program, estimate the probability that the ELISA would be positive when it is applied to a blood sample that does not contain HIV.
The estimated probability of a positive ELISA if the blood sample does not have HIV present is 37/500 = 0.074.
A local arcade is hosting a tournament in which contestants play an arcade game with possible scores ranging from 0 to 20. The arcade has set up multiple game tables so that all contestants can play the game at the same time; thus contestant scores are independent. Each contestant's score will be recorded as he or she finishes, and the contestant with the highest score is the winner. After practicing the game many times, Josephine, one of the contestants, has established the probability distribution of her scores, shown in the table below. Crystal, another contestant, has also practiced many times. The probability distribution for her scores is shown in the table below. a. Calculate the expected score for each player.
The expected scores are as follows: Josephine μJ = 16(0.1) + 17(0.3) + 18(0.4) + 19(0.2) = 17.7 Crystal μC = 17(0.45) + 18(0.4) + 19(0.15) = 17.7
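The expected-value calculation is a probability-weighted sum of the scores. The sketch below uses the scores and probabilities that appear in the answer above (the original tables are not reproduced here); the dictionary layout and function name are illustrative choices.

```python
# Score -> probability, as used in the worked answer above.
josephine = {16: 0.10, 17: 0.30, 18: 0.40, 19: 0.20}
crystal = {17: 0.45, 18: 0.40, 19: 0.15}


def expected_value(dist):
    """Return the sum of score × probability for a discrete distribution."""
    # Sanity check: the probabilities of a valid distribution sum to 1.
    assert abs(sum(dist.values()) - 1) < 1e-9
    return sum(score * p for score, p in dist.items())


print(round(expected_value(josephine), 2))
print(round(expected_value(crystal), 2))
```

Both players have the same expected score of 17.7, so the expected value alone does not separate them.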
An environmental research agency conducted a study of a certain state's roadsides to estimate the mean number of discarded cans and bottles per mile of public road. The state's public roads were grouped into three types: Major highways: major paved roads designed for high traffic volume Minor highways: smaller paved roads designed for low traffic volume Unpaved roads: gravel and dirt roads There are about 100,000 miles of public roads in the state. The environmental research agency defined a sampling unit to be a one-mile segment of public road. Using a database supplied by the state's department of transportation, the agency randomly selected 30 one-mile road segments for each of the three types of roads. Researchers from the agency searched the roadsides along each of the selected one-mile road segments and recorded the number of discarded cans and bottles. Results are shown in the table below. b. Were the data in the study obtained by a simple random sample, a stratified random sample, or a cluster sample? Explain.
This survey was conducted as a stratified random sample with three strata corresponding to the three road types: major highways, minor highways, and unpaved roads. Separate random samples of 30 one-mile segments were taken within each of the three strata.