STAT's Exam 1
In 2008, a highway safety administration reported that the number of pedestrian fatalities in City A was 65 and that the number in City B was 45. Can we conclude that pedestrians are safer in City B? Why or why not?
No. To compare the two cities, we need the fatality rate (fatalities per pedestrian), not just the counts. City B may simply have fewer pedestrians, which could account for the difference.
The process of representing categorical variables with numbers (such as letting a 1 represent "smoker" and a 0 represent "non-smoker") is called _______.
Coding
Data are more than just numbers, because data have _____.
Context
For what types of associations are regression models useful?
Linear
What two-step process is used to examine distributions?
See the data and summarize it
When describing two-variable associations, a written description should always include trend, shape, strength, and which of the following?
The context of the data
In an experiment studying the association between a treatment variable and an outcome variable, the group of people who do NOT receive the treatment are called what?
The control group
Suppose a doctor telephones those patients who are in the highest 10% with regard to their recently recorded blood pressure and asks them to return for a clinical review. When she retakes their blood pressures, will those new blood pressures, as a group (that is, on average), tend to be higher than, lower than, or the same as the earlier blood pressures, and why?
The new blood pressures will tend to be lower. Part of the high reading might be due to chance, and regression toward the mean predicts that a repeated measurement will be closer to the typical value.
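A small simulation can make regression toward the mean concrete. The sketch below is only an illustration under assumed numbers: each reading is modeled as a stable personal level plus chance noise, and the means and spreads (120, 10, 8) are hypothetical, not taken from the question.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical model: each reading = stable personal level + chance noise.
n = 10_000
true_level = [random.gauss(120, 10) for _ in range(n)]
first = [t + random.gauss(0, 8) for t in true_level]
second = [t + random.gauss(0, 8) for t in true_level]

# Select the patients in the highest 10% on the first reading.
cutoff = sorted(first)[int(0.9 * n)]
selected = [i for i in range(n) if first[i] >= cutoff]

# Compare the same patients' average on the first and second readings.
mean_first = statistics.mean(first[i] for i in selected)
mean_second = statistics.mean(second[i] for i in selected)
print(round(mean_first, 1), round(mean_second, 1))
```

In this model the selected group's second average should come out lower than its first, yet still above the overall typical value, because only the chance part of the high reading fails to repeat.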
A study concludes that the use of pesticides is associated with the development of Parkinson's disease, a neurological disease that causes people to shake. The study reported that exposure to bug killers and weed killers is "associated with" an increase of 33% to 80% in the chances of getting Parkinson's. Does this study show that pesticides cause Parkinson's disease? Why or why not?
The study does not show that pesticides cause Parkinson's disease. This was an observational study because researchers could not have deliberately exposed people to pesticides. Observational studies cannot conclude causation.
Why is random assignment used to assign people to treatment groups and control groups in a controlled experiment?
To make the groups as similar as possible, minimizing bias.
The existence of multiple mounds in a distribution is sometimes a sign of which of the following?
Two very different groups have been combined into a single collection
The circles shown to the right are similar, but not exactly the same. This is an example of?
Variation
Which of the following is not something that one looks for when studying scatterplots?
Variation
If the mean and the median of a distribution are approximately the same, then the shape of the distribution is likely to be _______.
symmetric.
When one has influential points in their data, how should regression and correlation be done?
Do regression and correlation with and without these points and comment on the differences
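The "with and without" comparison can be sketched in a few lines. The data below are made up for illustration (six points on a nearly straight line plus one far-off point), and `fit_line` is a hypothetical helper that computes the least-squares slope, intercept, and correlation by hand.

```python
def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

# Hypothetical data: a strong linear trend...
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
# ...plus one point far from the rest (a candidate influential point).
xs_all = xs + [20]
ys_all = ys + [5.0]

slope_without, intercept_without, r_without = fit_line(xs, ys)
slope_with, intercept_with, r_with = fit_line(xs_all, ys_all)

print(f"without: slope={slope_without:.2f}, r={r_without:.2f}")
print(f"with:    slope={slope_with:.2f}, r={r_with:.2f}")
```

Reporting both fits side by side, as the answer recommends, shows how much a single point is driving the result.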
When examining a distribution of numerical data, which of the following is not a feature that needs to be considered?
Each individual value
Which of the following is NOT one of the criteria for the "gold standard" for experiments?
Equal sample sizes for control and treatment group
Attempting to use the regression equation to make predictions beyond the range of the data is called _______.
Extrapolation
What is extrapolation and why is it a bad idea in regression analysis?
Extrapolation is prediction far outside the range of the data. These predictions may be incorrect if the linear trend does not continue, and so extrapolation generally should not be trusted.
It has been noted that people who go to church frequently tend to have lower blood pressure than people who don't go to church. Does this mean you can lower your blood pressure by going to church? Why or why not? Explain.
Going to church may not cause lower blood pressure. Just because two variables are related does not show that one caused the other.
Suppose that the growth rate of children looks like a straight line if the height of a child is observed at the ages of 24 months, 28 months, 32 months, and 36 months. If you use the regression obtained from these ages and predict the height of the child at 21 years, you might find that the predicted height is 20 feet. What is wrong with the prediction and the process used?
Growth rates slow as people get older. One should not extrapolate. That is, one should not predict outside the range of the data.
When examining the shape of a distribution of numerical data, which of the following is not one of the three basic characteristics of a distribution's shape?
How many numbers are in the data set.
A study was conducted to see whether participants would ignore a sign that said, "Elevator may stick between floors. Use the stairs." Those who used the stairs were said to be compliant, and those who used the elevator were said to be noncompliant. There were three possible situations, two of which involved confederates. A confederate is a person who is secretly working with the experimenter. In the first situation, there was no confederate. In the second situation, there was a compliant confederate (one who used the stairs), and in the third situation, there was a noncompliant confederate (one who used the elevator). The subjects tended to imitate the confederates. What more do you need to know about the study to determine whether the presence or absence of a confederate causes a change in the compliance of subjects?
Identify the sample size of the study. Without enough participants to observe the full range of variability in subjects we cannot control for other relevant factors, so we cannot infer causation. Identify whether there was random assignment to groups. Without random assignment there is the possibility of bias, so we cannot infer causation.
In 1994, major league baseball players went on strike. At the time, the average salary was $1,049,589, and the median salary was $337,500. If you were representing the owners, which summary would you use to convince the public that a strike was not needed? If you were a player, which would you use? Why was there such a large discrepancy between the mean and median salaries? Explain.
If you were representing the owners, you would use the (average salary) to convince the public that a strike was not needed. If you were a player, you would use the (median salary) to convince the public that a strike was needed. The average and median salaries differ so greatly because the (distribution of salaries is skewed right).
A study looked at the effects of light on female mice. Fifty mice were randomly assigned to a regimen of 12 hours of light and 12 hours of dark (LD), while another fifty mice were assigned to 24 hours of light (LL). Researchers observed the mice for two years. Six of the LD mice and 13 of the LL mice developed tumors. The accompanying table summarizes the data. Complete parts a through c.
In the LD mice, 12% developed tumors. In the LL mice, 26% developed tumors. The LD mice developed tumors at a lower rate than the LL mice.
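The two rates follow directly from the counts in the question (6 of 50 and 13 of 50). A minimal sketch of the arithmetic, where `percent` is just an illustrative helper name:

```python
def percent(count, total):
    """Return count as a percentage of total."""
    return 100 * count / total

ld_rate = percent(6, 50)   # LD mice with tumors
ll_rate = percent(13, 50)  # LL mice with tumors
print(f"LD: {ld_rate:.0f}%  LL: {ll_rate:.0f}%")
```

Because both groups have 50 mice, comparing counts and comparing rates give the same ordering here; with unequal group sizes only the rates would be comparable.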
If there is a positive correlation between number of years studying science and head size (for children), does that prove that a larger head causes more studying of science, or vice versa? Can you think of a hidden variable that might be influencing both of the other variables?
It does not prove causation, because older children have larger heads and have also studied science longer. A larger head does not cause an increase in years of studying. The hidden variable is age.
The accompanying scatterplot shows the relationship between age and number of text messages sent in a day. Comment on the appropriateness of linear regression.
Linear regression is not appropriate because the points appear to follow a curved trend.
All methods used for visualizing distributions are based on which of the following?
Make a mark that indicates how many times each value occurred in the data set.
Because the median is not affected by the size of an outlier and does not change even if a particular outlier is replaced by an even more extreme value, we say the ________ is resistant to outliers.
Median
When a distribution contains outliers, which of the following is the best choice for a measure of center?
Median
According to the ancient Roman architect Vitruvius, a person's armspan (the distance from fingertip to fingertip with the arms stretched wide) is approximately equal to his or her height. For example, people 5 feet tall tend to have an armspan of 5 feet. Explain, then, why the distribution of armspans for a class containing roughly equal numbers of men and women might be bimodal.
Men and women tend to have different heights and therefore different armspans.
What are two basic types of variables in statistics?
Numerical and categorical
a. In your own words, describe to someone who knows only a little statistics how to recognize when an observation is an outlier. What action(s) should be taken with an outlier? b. Which measure of the center (mean or median) is more resistant to outliers, and what does "resistant to outliers" mean?
Outliers are observed values far from the main group of data. In a histogram they are separated from the others by space. Outliers must be looked at in closer context to know how to treat them. If they are mistakes, they might be removed or corrected. If they are not mistakes, you might do the analysis twice, once with and once without the outliers. The median is more resistant, which indicates that it usually changes less than the mean when comparing data with and without outliers.
Some investors use a technique called the "Dogs of the Dow" to invest. They pick several stocks that are performing poorly from the Dow Jones group (which is a composite of 30 well-known stocks) and invest in these. Explain why these stocks will probably do better than they have done before.
Part of the poor historical performance could be due to chance, and if so, regression toward the mean predicts that stocks turning in a lower-than-average performance should tend to perform closer to the mean in the future. In other words, they should increase.
In statistics, the data we work with is just one part of a bigger picture called the ____.
Population
Which of the following is a reason we can never draw cause-and-effect conclusions from observational studies?
Potential confounding variables may explain the differences between groups rather than the treatment variable.
"Relative frequency" is the same as which of the following?
Proportion
Which of the following is an identifying mark of an observational study?
Subjects in the study are put into the treatment group or the control group either by their own actions or by the decision of someone not involved in the research study.
The outcome variable in a question about causality is also referred to as what?
The Response Variable
If the correlation between height and weight of a large group of people is 0.67, find the coefficient of determination (as a percent) and explain what it means. Assume that height is the predictor and weight is the response, and assume that the association between height and weight is linear.
The coefficient of determination is 44.89%. Therefore, 44.89% of the variation in weight can be explained by the regression line.
How is the coefficient of determination related to the correlation, and what does the coefficient of determination show?
The coefficient of determination is the square of the correlation, and it shows the proportion of the variation in the response variable that is explained by the explanatory variable.
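For the height and weight question above, the calculation is a single squaring step. A minimal sketch, using the correlation of 0.67 given in that question:

```python
r = 0.67            # correlation between height and weight (from the question)
r_squared = r ** 2  # coefficient of determination
print(f"{r_squared:.2%}")
```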
A dieter recorded the number of calories he consumed at lunch for one week. As you can see, a mistake was made on one entry. The calories are listed in increasing order below. 349, 371, 386, 398, 412, 4190 When the error is corrected by removing the extra 0, will the mean change? Will the median? Explain without doing any calculation.
The corrected value will give a different mean but not a different median. Medians are resistant to outliers and not as affected by extreme values, but the more extreme a value is, the more the mean is affected by it.
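Although the question asks for an explanation without calculation, the claim is easy to check afterward. The sketch below uses the six calorie values from the question and Python's standard `statistics` module.

```python
import statistics

recorded = [349, 371, 386, 398, 412, 4190]   # with the data-entry error
corrected = [349, 371, 386, 398, 412, 419]   # extra 0 removed

# The median depends only on the middle values, so it should not move.
median_recorded = statistics.median(recorded)
median_corrected = statistics.median(corrected)

# The mean uses every value, so the extreme entry pulls it upward.
mean_recorded = statistics.mean(recorded)
mean_corrected = statistics.mean(corrected)

print(median_recorded, median_corrected)
print(round(mean_recorded, 1), round(mean_corrected, 1))
```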
The distribution of in-state annual tuition for all colleges and universities in the United States is bimodal. What is one possible reason for this bimodality?
The distribution might be bimodal because private colleges and public colleges tend to differ in amount of tuition.
Predict the shape of the distribution of the numbers of times a group of 500 people eat breakfast in one week.
The distribution will be left-skewed. Most people will report eating breakfast every day, with a few reporting various values less than 7.
A teacher asks 90 students who drive how many speeding tickets they received in the last year. Predict the shape of the distribution and explain.
The distribution will be right-skewed. Most people will have no tickets, but there will be a few people with 1, 2, 3, or more tickets.
In a right-skewed distribution, which of the following is true?
The mean tends to be greater than the median.
When you are comparing two sets of data, and one set is strongly skewed and the other is symmetric, which measures of the center and variation should you choose for the comparison?
The medians and interquartile ranges
A study was done to see whether a smaller dose of flu vaccine could be used successfully. In this study, the usual amount of vaccine was injected into half the patients, and the other half of the patients had only a small amount of vaccine injected. The response was measured by looking at the production of antibodies. In the end, the lower dose of vaccine was just as effective as a higher dose for those under 65 years old. What more do we need to know to be able to conclude that the lower dose of vaccine was equally effective at preventing the flu for those under 65?
The patients need to be randomly assigned the full or lower dose. Without randomization there could be bias; with randomization we can infer causation.
Two sections of statistics are offered, the first at 8 a.m. and the second at 10 a.m. The 8 a.m. section has 25 women, and the 10 a.m. section has 15 women. A student claims this is evidence that women prefer earlier statistics classes than men do. What information is missing that might contradict this claim?
The percentage of female students in the two classes is unknown. There may be more females in the 8 a.m. class simply because it has more students than the 10 a.m. class. The counts alone would support the claim only if the classes were the same size.
It was reported in 2007 that there were 8,987,000 people age 16 or older who had a "go outside the home" disability and that this was 23.1% of the population (of this age group). These are people who cannot go outside the home without help. How large was the total population (of this age group) in 2007?
The population (of this age group) in 2007 was about 38,905,000 people. 8987000/0.231=38904761.904762
Why is it not possible to conclude which sport is the most dangerous by looking at the number of injuries in the accompanying data table?
The sports have different numbers of participants.
A study compared the rate of pneumonia before and after a vaccine was introduced. In the study, annual hospitalization rates were estimated from any cause using a database. Average annual rates of pneumonia-related hospitalizations before and after introduction of the vaccine were used to estimate annual declines in pneumonia-related hospitalizations. The annual rate of pneumonia-related hospitalizations among children of various age groups significantly declined relative to expected rates before introduction of the vaccine. Does this show that pneumonia vaccine caused the decrease in pneumonia that occurred? Explain.
The study does not show that the vaccine caused the decrease in pneumonia. This is an observational study because the children were not randomly assigned by the researchers. It is possible that confounding variables (other advances in medicine, for instance) would affect the rates of pneumonia.
A group of boys is randomly divided into two groups. One group watches violent cartoons for one hour, and the other group watches cartoons without violence for one hour. The boys are then observed to see how many violent actions they take in the next two hours, and the two groups are compared.
The study is a controlled experiment.
A researcher was interested in the effects of exercise on academic performance in elementary school children. She went to the recess area of an elementary school and identified some students who were exercising vigorously and some who were not. The researcher then compared the grades of the exercisers with the grades of those who did not exercise.
The study is an observational study.
A student watched picnickers with a large cooler of soft drinks to see whether teenagers were less likely than adults to choose diet soft drinks over regular soft drinks.
The study is an observational study.
It was predicted that a country will have an elderly population (65 and older) of 8,170,000 in the year 2050 and that this will be 22.5% of the population. What is the total predicted population of this country in 2050?
The total predicted population of the country in 2050 is 36,311,000 people. 8170000/0.225=36311111.111111
It was predicted that a country will have an elderly population (65 and older) of 8,448,000 in the year 2050 and that this will be 19.3% of the population. What is the total predicted population of this country in 2050?
The total predicted population of the country in 2050 is 43,772,000 people. 8448000/0.193=43772020.725389
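The "part and percentage" questions in this set (the disability figure and the two elderly-population predictions) all use the same step: divide the part by the percentage written as a decimal. A minimal sketch, where `total_from_part` is an illustrative helper name and the results are rounded to the nearest thousand to match the answers:

```python
def total_from_part(part, percent):
    """Return the whole, given a part and the percent of the whole it represents."""
    return part / (percent / 100)

# (part, percent) pairs taken from the three questions
cases = [
    (8_987_000, 23.1),  # "go outside the home" disability, 2007
    (8_170_000, 22.5),  # elderly population prediction, first country
    (8_448_000, 19.3),  # elderly population prediction, second country
]

totals = [round(total_from_part(part, pct), -3) for part, pct in cases]
for (part, pct), total in zip(cases, totals):
    print(f"{part:,} is {pct}% of about {total:,.0f}")
```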
Why are percentages or rates often better than counts for making comparisons?
They take into account possible differences among the sizes of the groups.
A researcher is interested in the effect of music on memory. She randomly divides a group of students into three groups: those who will listen to quiet music, those who will listen to loud music, and those who will not listen to music. After the appropriate music is played (or not played), she gives all the students a memory test.
This is a controlled experiment. She assigns students to the control and treatment groups at random in order to control for all relevant factors aside from the effect of music on memory, which is essential to conducting a controlled experiment.
Patients with Alzheimer's disease are randomly divided into two groups. One group is given a new drug, and the other is given a placebo. After six months they are given a memory test to see whether the new drug fights Alzheimer's better than a placebo.
This is a controlled experiment. The researchers randomly assigned patients to either a treatment or control group, and they gave the patients a test afterwards to identify the effect of the new drug. This satisfies a key criterion of controlled experiments.
Indicate whether the following study is an observational study or a controlled experiment. Records of patients who have had broken ankles are examined to see whether those who had physical therapy achieved more ankle mobility than those who did not.
This is an observational study. Since the researchers did not assign subjects to the control or treatment group beforehand, they did not satisfy a key feature of controlled experiments.
A local public school encourages, but does not require, students to wear uniforms. The principal of the school compares the grade point averages of students at this school who wear uniforms with the GPAs of those who do not wear uniforms to determine whether those wearing uniforms tend to have higher GPAs.
This is an observational study. The principal does not randomly assign students to either wear or not wear uniforms. Random assignment is essential to conducting a controlled experiment.
A college magazine suggested that overeating reduces brain function. Is this likely to be a conclusion from observational studies or randomized experiments? Can we conclude that overeating causes a reduction in brain function? Why or why not?
This is likely to be from observational studies. It would not be ethical to assign people to overeat. We cannot conclude causation from observational studies because of the possibility of confounding factors.
Some people believe that wearing copper bracelets is a good treatment for arthritis of the hand. To test this belief, suppose you recruit 100 people and supply them all with copper bracelets. After the patients wear the bracelets for a month, you ask them whether or not their pain is less than it was before they began wearing the bracelets. Explain how to improve this study.
To improve the study, the patients should be randomly divided into two groups; one group will be given the copper bracelets, and the other group will be given non-copper bracelets. After a month, the patients will be surveyed on the levels of their pain.
The study of statistics rests on?
Variation and Data
Questions about causality are usually phrased in the form of ___ questions.
What if?
Which of the following questions should be asked when developing an understanding of data?
What variables were measured? How were the variables measured? Who collected the data? (All of these questions are important.)
A stemplot is often useful in which of the following cases?
When technology is not available and the data set is not large
A study reported on the effects of vitamin C in breast milk for breast-feeding mothers. The children whose mothers had chosen to take high doses of vitamin C had a 30% lower risk of developing allergies. Can you conclude that the use of vitamin C caused the reduction in allergies? Why or why not?
You cannot conclude that the use of vitamin C caused the reduction in allergies because the researchers did not randomly assign mothers to treatment and control groups. This step is necessary for identifying causation.
Since, in general, the longer a car is owned the more miles it travels one can say there is a _______ between age of a car and mileage.
a positive association
The figure shows the relationship between the number of miles per gallon on the highway and that in the city for some cars. a. Report the slope and explain what it means. b. Either interpret the intercept (7.792) or explain why it is not appropriate to interpret the intercept.
a. For each additional city mpg, the highway value goes up by 0.9478 mpg. b. It is inappropriate to interpret the intercept because no cars get 0 mpg in the city.
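The slope interpretation can be checked by plugging two city values one mpg apart into the line. The sketch assumes the regression equation is highway = 7.792 + 0.9478 × city, built from the intercept and slope quoted in the question and answer; the city values 20 and 21 are hypothetical and only chosen to sit inside a plausible data range.

```python
INTERCEPT = 7.792  # from the question
SLOPE = 0.9478     # from the answer

def predict_highway(city_mpg):
    """Predicted highway mpg for a given city mpg under the assumed line."""
    return INTERCEPT + SLOPE * city_mpg

# Hypothetical city values one mpg apart
at_20 = predict_highway(20)
at_21 = predict_highway(21)
difference = at_21 - at_20  # equals the slope, up to floating-point rounding

print(round(at_20, 3), round(at_21, 3), round(difference, 4))
```

Evaluating the line at a city value of 0 would return the intercept, but that is an extrapolation far outside the data, which is why the intercept is not interpreted.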
The accompanying scatterplot shows the average life expectancy for some countries and the number of people per TV in those countries. Comment on the appropriateness of the regression. What do you think accounts for the relationship? Do you think you could raise the life expectancy by buying more TVs? Explain.
a. Linear regression is not appropriate because the two variables do not appear to have a linear association. b. Higher wealth in a country would increase life expectancy and decrease people per TV. c. Buying more TVs would likely not raise life expectancy because correlation does not mean causation.
The idea of sending delinquents to "Scared Straight" programs has appeared recently in several media programs. In a 1983 study, each male delinquent in the study (all aged 14-18) was randomly assigned to either Scared Straight or no treatment. The males who were assigned to Scared Straight went to a prison, where they heard prisoners talk about their bad experiences there. Then the males in both the experimental and the control group were observed for 12 months to see whether they were rearrested. Complete parts (a) and (b) below.
a. Report the rearrest rate for the Scared Straight group and for the No Treatment group, and state which is higher. The rearrest rate for the Scared Straight group is (81.1)%. The rearrest rate for the No Treatment group is (67.3)%. The rearrest rate for the Scared Straight group is (higher than) the rearrest rate of the No Treatment group. b. This experiment was done in the hope of showing that Scared Straight would cause a lower arrest rate. Did the study show that? Explain. (No. The study does not show that Scared Straight causes a lower arrest rate, because the rearrest rate in the Scared Straight group was higher than in the No Treatment group.)
A group of overweight people are asked to participate in a weight loss program. Participants are allowed to choose whether they want to go on a vegetarian diet or follow a traditional low-calorie diet that includes some meat. Half of the people choose the vegetarian diet, and half choose to be in the control group and continue to eat meat. Suppose that there is greater weight loss in the vegetarian group. Complete parts (a) and (b) below.
a. Suggest a plausible confounding variable that would prevent us from concluding that the weight loss was due to the lack of meat in the diet. Explain why it is a confounding variable. (People who are not prepared to change their diet very much (such as by excluding meat) might also not change other factors that affect weight, such as how much exercise they get.) b. Explain a better way to do the experiment that is likely to remove the influence of confounding variables. (The experiment would be improved if some subjects were randomly assigned to eat meat and the remaining subjects to consume a vegetarian diet.)
The equation for the regression line relating the salary and the year first employed is given above the figure. a. Report the slope and explain what it means. b. Either interpret the y-intercept of 4,255,424 or explain why it is not appropriate to interpret the y-intercept.
a. The average salary is $2099 less for each year later that the person was hired or an average of $2099 more for each year earlier. b. The y-intercept of $4,255,424 would be the salary for a person who started in the year 0, which is not appropriate to interpret.
A doctor who believes strongly that antidepressants work better than "talk therapy" tests depressed patients by treating half of them with antidepressants and the other half with talk therapy. After six months the patients are evaluated on a scale of 1 to 5, with 5 indicating the greatest improvement. Answer parts (a) through (d) below.
a. The doctor is concerned that if his most severely depressed patients do not receive the antidepressants, they will get much worse. He therefore decides that the most severe patients will be assigned to receive the antidepressants. Explain why this will affect his ability to determine which approach works best. (If the doctor decides on the treatment, this could introduce bias.) b. What advice would you give the doctor to improve his study? (The doctor should randomly assign the patients to the different treatments.) c. The doctor asks you whether it is acceptable for him to know which treatment each patient receives and to evaluate them himself at the end of the study to rate their improvement. Explain why this practice will affect his ability to determine which approach works best. (If the doctor is aware of the treatment each patient receives, that might influence his opinion about the effectiveness of the treatment.) d. What improvements to the plan in part (c) would you recommend? (To prevent bias, the experiment should be double-blind. Neither the patients nor the doctor evaluating the patients should know whether each patient received medication.)
Two drugs were tested to see whether they helped women with breast cancer. Of 1060 women, about half were randomly assigned to drug A and the other half were assigned to drug B. After 77 months, 473 out of 539, and 426 out of 521 women assigned to drugs A and B, respectively, were alive. Complete parts (a) and (b) below.
a. The survival rate for drug A is (87.8)%. The survival rate for drug B is (81.8)%. The survival rate for drug A (is higher than) the survival rate for drug B. b. Was this a controlled experiment or an observational study? Explain why. From studies like these, can we conclude a cause-and-effect relationship between the drug type and the survival percentage? Why or why not? (This was a controlled experiment because the researchers randomly assigned the women to drug A or drug B. Because of the random assignment, studies like this can support a cause-and-effect conclusion between drug type and survival percentage.)
A doctor reported on a study that treated children who had sleep apnea, which interferes with breathing while a child is asleep. In the study, 464 children, 5 to 9 years of age, were randomly assigned to either surgery or to be under constant watch for a certain period of time. The study found that there were significantly greater improvements in behavioral, quality-of-life, and sleep study findings in the group that had surgery than the group assigned to constant watch. Complete parts (a) and (b) below.
a. Was the study a controlled experiment or an observational study? Explain how you know. (The study was a controlled experiment because the children were randomly assigned to either surgery or constant watch. This is essential to conducting a controlled experiment.) b. Assuming that the study was properly conducted, can we conclude that the early surgery caused the improvements? Explain. (We can conclude that the early surgery caused the improvements because it was a randomized controlled experiment.)
In a histogram, observations are grouped into intervals called ____.
bins
Changing the width of bins in a histogram _______.
changes the shape of the histogram.
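The effect of bin width can be seen with a tiny text histogram. The data below are hypothetical (two clusters of values), and `bin_counts` is an illustrative helper that groups values into bins of a chosen width.

```python
from collections import Counter

def bin_counts(values, width):
    """Count how many values fall in each bin [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical data with two separate clusters
data = [1, 2, 2, 3, 3, 3, 4, 12, 13, 13, 14, 14, 14, 15]

narrow = bin_counts(data, 2)   # narrow bins keep the two mounds visible
wide = bin_counts(data, 20)    # one very wide bin hides them

for label, counts in (("width 2", narrow), ("width 20", wide)):
    print(label)
    for start, count in counts.items():
        print(f"  {start:>2}: {'#' * count}")
```

With the narrow bins the two clusters appear as separate mounds with a gap between them; with the single wide bin the same data look like one bar.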
The _____ organizes data by recording all the values observed in a sample as well as how many times each value was observed.
distribution of the sample
The number of times a value is observed in a data set is called a ______.
frequency
Since outliers can greatly affect the regression line they are also called _______ points.
influential
Values so large or so small that they do not fit into the pattern of the distribution are called what?
outliers
Categorical variables are also referred to as ______ variables.
qualitative
Because the median is not affected by the size of an outlier and does not change even if a particular outlier is replaced by an even more extreme value, we say the median is _____ to outliers.
resistant
If a sample of people were asked how many hours of TV they watched in a typical week, what shape would the data distribution be expected to have?
right-skewed
A distribution of a variable in which most of the values are relatively small but that also has a few very large values is called ________.
right-skewed.
A large amount of scatter in a scatterplot is an indication that the association between the two variables is _____
weak
When testing the IQ of a group of adults (aged 25 to 50), an investigator noticed that the correlation between IQ and age was negative. Does this show that IQ goes down as we get older? Why or why not? Explain.
No, correlation does not mean causation.
What type of effect can outliers have on a regression line?
A big effect
A researcher was interested in the effect of exercise on memory. She randomly assigned half of a group of students to run up a stairway three times and the other half to rest for an equivalent amount of time. Each student was then asked to memorize a series of random digits. She compared the numbers of digits remembered for the two groups.
The study is a controlled experiment. The researcher randomly assigned the students to the exercise group or the rest group.
A difference between two groups in an observational study that can explain why the outcomes were very different between the groups is called what?
A confounding variable (or confounding factor)
What is an influential point?
An influential point is a point that changes the regression equation by a large amount.
Which of the following is used to summarize two potentially related categorical variables?
A two-way table
A group of educators want to determine how effective tutoring is in raising students' grades in a math class, so they arrange free tutoring for those who want it. Then they compare final exam grades for the group that took advantage of the tutoring and the group that did not. Suppose the group participating in the tutoring tended to receive higher grades on the exam. Does that show that the tutoring worked? If not, explain why not and suggest a confounding variable.
Because this was an observational study, it only shows an association; it does not show that the tutoring worked. It could be that more motivated students attended the tutoring and that was what caused the higher grades. You can never draw cause-and-effect conclusions from observational studies because of potential confounding variables. An observational study can conclude only that there is an association between the treatment variable and the outcome variable.
The variable "eye color" is an example of what type of variable?
Categorical
In statistics, variables are_______.
Characteristics of people and things.
Of the following, which is the only method of data collection suitable for making conclusions about causal relationships?
Controlled experiments