Stats Test 5
Assume that boys and girls are equally likely and that the gender of a child is independent of the gender of any brothers or sisters. If a couple already has three girls, find the probability of getting a girl when their fourth baby is born.
1/2
What is a correlation? Give three examples of pairs of variables that are correlated.
A correlation exists between two variables when higher values of one variable consistently go with higher or lower values of another variable. Examples: amount of smoking and lung cancer; height and weight of people; price of a good and demand for the good.
What is a probability distribution?
A probability distribution represents the probabilities of all possible events of interest.
What do we mean when we say that a result is statistically significant?
A result is statistically significant if it is unlikely to have occurred by chance.
What other types of research would you like to see before you conclude that high-voltage power lines cause cancer?
A study that determines the effect of electricity on a cell's growth mechanism.
Which outlier would make it appear that there is correlation when there is none?
An outlier far separated from the rest of the data points.
Which outlier would make it appear that there is no correlation when there is one?
An outlier located in a place opposite where the correlation would predict.
Those who favor gun control often point to a positive correlation between the availability of handguns and murder rates to support their position that gun control would save lives. Does this correlation, by itself, indicate that handgun availability causes a higher murder rate? Suggest some other factors that might support or weaken this conclusion.
Availability is not itself a cause. Social, economic, or personal conditions cause individuals to use the available handguns.
Which one of the following is an example of a relative frequency probability?
Based on statistical data, the chance of having the championship team coming from the Eastern Conference of a certain basketball league is about 1 in 10.
How is the expected value EV of two events computed?
EV = (event 1 value) x (event 1 probability) + (event 2 value) x (event 2 probability)
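A minimal Python sketch of this formula. The game below (win $10 with probability 0.25, lose $2 with probability 0.75) is a hypothetical example, not one from the test, and the function name is an assumption chosen for illustration.

```python
def expected_value(outcomes):
    """Sum of value x probability over (value, probability) pairs."""
    return sum(value * probability for value, probability in outcomes)

# Hypothetical two-event game (illustrative numbers only):
# win $10 with probability 0.25, lose $2 with probability 0.75.
game = [(10.0, 0.25), (-2.0, 0.75)]
ev = expected_value(game)
print(ev)  # 1.0
```

The result is the average gain per play over many plays, not the result of any single play.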
What is an expected value?
Expected value is the estimated average gain or loss per trial from partaking in an event many times.
Determine whether the given property is true, and explain why or why not. The correlation coefficient remains unchanged if we interchange the variables x and y.
Interchanging x and y does not change the correlation coefficient because the variables x and y appear symmetrically in the formula for the correlation coefficient.
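The symmetry can be checked numerically. The sketch below assumes the standard Pearson formula for the correlation coefficient and uses made-up sample data; the function name pearson_r is a label chosen for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired samples xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up paired data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)
print(abs(r_xy - r_yx) < 1e-12)  # True
```

Swapping the arguments only swaps the roles of sxx and syy and reorders each product in sxy, so the value cannot change.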
Can the law of large numbers be applied to a single observation or experiment?
It does not apply to a single trial (observation or experiment), or even to small numbers of trials, but only to a large number of trials.
What is a best fit line?
It is a line that lies closer to the data points than any other possible line.
How is a best fit line useful?
It is useful to make predictions within the bounds of the data points.
What does the square of the correlation coefficient, r², tell us about a best-fit line?
It tells us the proportion of the variation that is accounted for by the best-fit line. For example, if r² = 0.9, or 90%, then 90% of the variability is accounted for by the best-fit line, but 10% is not.
Which one of the following is an example of a subjective probability?
My teacher assures me that he is certain that my SAT scores will be the highest for the entire country.
What is negative correlation?
Negative correlation means that two variables tend to change in opposite directions, with one increasing while the other decreases. An example might be age and vision.
What is no correlation?
No correlation means that there is no apparent relationship between the two variables. An example might be hair color and weight.
Suppose you toss a coin 100 times. Should you expect to get exactly 50 heads? Why or why not?
No, there will be small deviations by chance, but if the coin is fair, the result should be close to 50 heads.
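To see how unlikely "exactly 50" is, the exact probabilities can be computed. This sketch assumes a fair coin and independent tosses, so the number of heads follows a binomial distribution; the helper name prob_heads is chosen for this example.

```python
from math import comb

def prob_heads(k, n):
    """Probability of exactly k heads in n tosses of a fair coin."""
    return comb(n, k) / 2 ** n

p_exactly_50 = prob_heads(50, 100)
p_40_to_60 = sum(prob_heads(k, 100) for k in range(40, 61))

print(round(p_exactly_50, 4))  # 0.0796
print(p_40_to_60 > 0.95)       # True
```

Exactly 50 heads happens only about 8% of the time, while a result somewhere between 40 and 60 heads is very likely.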
Distinguish between an outcome and an event in probability.
Outcomes are the most basic possible results of observations or experiments. An event consists of one or more outcomes that share a property of interest.
For the following pair of variables, state whether you believe the two variables are correlated. If you believe they are correlated, state whether the correlation is positive or negative. Explain your reasoning. The amount of time students spend studying for a test and their grade on that test.
Positive correlation because more time spent studying results in higher grades.
For the following pair of variables, state whether you believe the two variables are correlated. If you believe they are correlated, state whether the correlation is positive or negative. Explain your reasoning. The heights and weights of 50 randomly selected males between the ages of 10 and 21.
Positive correlation because taller men tend to weigh more.
For the following pair of variables, state whether you believe the two variables are correlated. If you believe they are correlated, state whether the correlation is positive or negative. Explain your reasoning. The height of students and the length of their pants.
Positive correlation because taller people tend to have longer pants.
What is positive correlation?
Positive correlation means that both variables tend to increase (or decrease) together. An example might be shoe size and height.
A pollster randomly selects an adult for a survey. Let M denote the event of getting a male, and let R denote the event of getting a Republican. Are events M and R overlapping?
Since the pollster could select an adult who is male and Republican, the events are overlapping.
Let A denote the event of getting a female when you randomly select a fellow student in your statistics class. Let B denote the event of getting a female when you randomly select a fellow student in your psychology class. Are events A and B independent or dependent?
Since the student that is chosen from the statistics class does not affect the probability of choosing a female from the psychology class, the two events are independent.
Does the idea of statistical significance apply to samples or populations? Briefly explain why.
Statistical significance applies to samples because the values of population parameters have no uncertainty.
Give an example in which the same event can occur through two or more outcomes.
Suppose you roll a fair, six-sided die. The possible outcomes are rolling the number 1, 2, 3, 4, 5, or 6. The event of rolling an even number will occur with the three outcomes 2, 4, and 6.
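The die example can be written out as a short sketch, assuming all six outcomes are equally likely; the variable names are chosen for illustration.

```python
from fractions import Fraction

# All possible outcomes of rolling one fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]

# The event "roll an even number" is the set of outcomes sharing that property.
even = [o for o in outcomes if o % 2 == 0]
p_even = Fraction(len(even), len(outcomes))

print(even)    # [2, 4, 6]
print(p_even)  # 1/2
```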
Briefly state the 6 guidelines that can be used in establishing causality. What is the 1st guideline?
The 1st guideline states that one should look for situations where the effect is correlated with the suspected cause, even when other factors vary.
What is the 2nd guideline?
The 2nd guideline states that among groups that differ only in the presence or absence of the suspected cause, one should check that the effect is similarly present or absent.
What is the 3rd guideline?
The 3rd guideline states to look for evidence that larger amounts of the suspected cause produce larger amounts of the effect.
What is the 4th guideline?
The 4th guideline states that if the effect might be produced by other potential causes, make sure the effect still remains after accounting for these other potential causes.
What is the 5th guideline?
The 5th guideline states that the suspected cause should be tested with an experiment.
What is the 6th guideline?
The 6th guideline states that the physical mechanism by which the suspected cause produces the effect should be determined.
Determine whether the stated causal connection is valid. If the causal connection appears to be valid, provide an explanation. Heart disease can be cured by wearing a magnetic bracelet on your wrist.
The causal connection is not valid.
Determine whether the stated causal connection is valid. If the causal connection appears to be valid, provide an explanation. People with higher resting pulse rates (beats per minute) tend to have higher IQ scores.
The causal connection is not valid.
Determine whether the stated causal connection is valid. If the causal connection appears to be valid, provide an explanation. Drinking greater amounts of alcohol slows a person's reaction time.
The causal connection is valid. Alcohol is a depressant to the central nervous system, which leads to slower reaction time.
Determine whether the stated causal connection is valid. If the causal connection appears to be valid, provide an explanation. Test grades are affected by the amount of time and effort spent studying and preparing for the test.
The causal connection is valid. When students spend more time and effort studying for a test, their test grades tend to be higher.
For the correlation between the number of unregistered handguns and the crime rate in one state: is the correlation most likely due to coincidence, a common underlying cause, or a direct cause?
The correlation is most likely due to a common underlying cause. Many crimes are committed with handguns that are not registered.
For the correlation between time spent studying and test scores: is the correlation most likely due to coincidence, a common underlying cause, or a direct cause?
The correlation is most likely due to a direct cause. As students study more, they gain a better understanding of the subject and their test scores are likely to be higher.
What is the law of large numbers?
The law of large numbers states that if a process is repeated through many trials, the proportion of the trials in which event A occurs will be close to the probability P(A).
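A small simulation can illustrate the law. This sketch assumes a fair die simulated with Python's pseudo-random generator and takes event A to be "roll a 3", so P(A) = 1/6; the seed and trial counts are arbitrary choices for the example.

```python
import random

def proportion_of_threes(num_rolls, rng):
    """Fraction of num_rolls simulated die rolls that come up 3."""
    hits = sum(1 for _ in range(num_rolls) if rng.randint(1, 6) == 3)
    return hits / num_rolls

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    # The proportion tends to settle near 1/6 (about 0.1667) as n grows.
    print(n, proportion_of_threes(n, rng))
```

With only 10 rolls the proportion can be far from 1/6, which is why the law says nothing useful about a single trial or a small number of trials.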
Which one of the following is an example of a theoretical probability?
The probability of rolling a 3 on a single die is 1/6.
Determine whether the given property is true, and explain your answer. Because the same sample values are used, the correlation coefficient remains unchanged if we rearrange the order of the x-values that are paired with the y-values.
The property is false. Rearranging the x-values changes which x-value is paired with each y-value, so the correlation coefficient will generally change in sign, in magnitude, or in both.
Determine whether the given property is true, and explain your answer. The correlation coefficient remains unchanged if we change the sign of all of the x-values.
The property is false. The sign of the correlation coefficient will flip, but its magnitude will remain unchanged.
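Both of the preceding properties can be checked on a small example. The sketch below assumes the standard Pearson formula and made-up data; pearson_r is a helper name chosen for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired samples xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up paired data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

r = pearson_r(x, y)
r_negated = pearson_r([-v for v in x], y)      # change the sign of every x-value
r_rearranged = pearson_r([2, 1, 3, 4, 5], y)   # same x-values, first two swapped

print(abs(r_negated + r) < 1e-12)  # True: same magnitude, opposite sign
print(abs(r_rearranged - r) > 0.01)  # True: new pairings give a different r
```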
Which of the following is true for the possible range of values for P(A)?
The range of possible values for P(A) is from 0 to 1 (inclusive), with 0 meaning there is no chance that event A will occur and 1 meaning it is certain that event A will occur.
Decide whether the following statement makes sense (or is clearly true) or does not make sense (or is clearly false). Explain your reasoning. I haven't won in my last 25 pulls on the slot machine, so I must be having a bad day and I'm sure to lose if I play again.
The statement does not make sense because the results of repeated trials do not depend on results of earlier trials.
Determine whether the statement makes sense (or is clearly true) or does not make sense (or is clearly false). Using sample data on footprint lengths and heights from men, the equation of the best-fit line is obtained, and it is used to find that a man with a footprint length of 40 inches is predicted to have a height of 154 inches, or 12 feet, 10 inches.
The statement does not make sense since a prediction is being made regarding a value that is beyond the bounds of the data points.
Determine whether the following statement makes sense (or is clearly true) or does not make sense (or is clearly false). Explain clearly. Because either there is life on Mars or there is not, the probability of life on Mars is 0.5.
The statement does not make sense. Although there are two possible outcomes, it is not reasonable to assume that both outcomes are equally likely.
Determine whether the statement makes sense or does not make sense, and explain your reasoning. I found a strong negative correlation for data relating the percentage of people in various countries who are literate and the percentage who are undernourished. I concluded that an increase in literacy causes a decrease in undernourishment.
The statement does not make sense. Correlation is not necessarily causation.
Does the following statement make sense? Explain. The numbers 5, 17, 18, 27, 36, and 41 were drawn in the last lottery; they should not be bet on in the next lottery because they are now less likely to occur.
The statement does not make sense. Lottery drawings are independent so the outcome of the last lottery does not affect the probabilities of the next one.
Determine whether the statement below makes sense or does not make sense. Explain clearly. Results from a study of heart disease have statistical significance because heart disease is such an important health risk for adults.
The statement does not make sense. Statistical significance results from low probability, not importance.
Determine whether the statement makes sense (or is clearly true) or does not make sense (or is clearly false). Explain clearly. The scatterplot showed all the data points following a nearly straight diagonal line, but only a weak correlation between the two variables being plotted.
The statement does not make sense. The data points following a nearly straight diagonal line would indicate a very strong correlation between the two variables.
Determine whether the statement makes sense (or is clearly true) or does not make sense (or is clearly false). Explain clearly. The two variables I studied showed such a strong correlation that they had a correlation coefficient of r=1.50.
The statement does not make sense. The value of the correlation coefficient ranges from −1 to 1, so having a value of r=1.50 is not possible.
Determine whether the statement below makes sense or does not make sense. Explain clearly. In an experiment testing a method of gender selection intended to increase the likelihood that a baby is a girl, 978 couples give birth to 490 girls and 488 boys. A company representative argues that this is evidence that the method is effective, because the probability of getting 490 girls in 978 births by chance is only 0.025, which is less than 0.05.
The statement does not make sense. With 978 births, any specific number of girls will have a very low probability, but 490 girls is very close to the 489 girls that is expected, so the result is not evidence that the method is effective.
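The numbers in this argument can be checked directly. The sketch assumes boys and girls are equally likely and births are independent, so the number of girls is binomial with n = 978 and p = 1/2.

```python
from math import comb

n = 978

# Probability of exactly 490 girls by chance.
p_exactly_490 = comb(n, 490) / 2 ** n

# Probability of 490 or more girls by chance, which is the more relevant
# quantity when judging whether the method raised the number of girls.
p_490_or_more = sum(comb(n, k) for k in range(490, n + 1)) / 2 ** n

print(round(p_exactly_490, 3))  # 0.025
print(round(p_490_or_more, 3))  # 0.487
```

Getting at least 490 girls happens nearly half the time with no method at all, so the result is entirely consistent with chance.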
Determine whether the statement below makes sense or does not make sense. Explain clearly. I created a scatterplot of CEO salaries and corporate revenue for 10 companies and found a negative correlation, but when I left out a data point for a company whose CEO took no salary, there was no correlation for the remaining data.
The statement makes sense. A CEO taking no salary is an outlier, and an outlier can make a correlation appear where there otherwise is none.
Determine whether the following statement makes sense (or is clearly true) or does not make sense (or is clearly false). I used a best-fit line for data showing the ages and hand sizes of thousands of boys of various ages to predict the mean hand size of 8-year-old boys.
The statement makes sense. Assuming the data were collected in a reasonable way and all ages were sampled, a scatterplot for thousands of boys should produce a best-fit line that makes reasonable predictions of mean hand sizes at different ages.
Determine whether the statement makes sense (or is clearly true) or does not make sense (or is clearly false). Explain clearly. Researchers conducted animal experiments to study smoking and lung cancer because it would have been unethical to conduct these experiments on humans.
The statement makes sense. Researchers cannot randomly assign people to treatment and control groups and ask subjects in the treatment group to smoke.
Determine whether the statement below makes sense or does not make sense. Explain clearly. The probability of rolling a die and getting an even number is 1/2, and the probability of getting an odd number is 1/2, so the probability of getting a number that is even or odd is 1/2+1/2=1.
The statement makes sense. Since a number cannot be even and odd, it is a valid application of the either/or rule for non-overlapping events.
For the following pair of variables, state whether you believe the two variables are correlated. If you believe they are correlated, state whether the correlation is positive or negative. Explain your reasoning. The shoe sizes and SAT scores of randomly selected subjects who take the SAT.
The variables are not correlated.
What are the differences among theoretical, relative frequency, and subjective techniques for finding probabilities?
The theoretical technique is based on the assumption that all outcomes are equally likely. The relative frequency technique is based on observations or experiments. The subjective technique is an estimate based on experience or intuition.
Consider the following scatterplot that shows one year's total sales (revenue) and profits for eight large retailers in a country. Estimate the correlation coefficient and determine whether there appears to be a correlation between sales and profits.
There is a strong positive correlation between sales and profits, and the correlation coefficient is approximately 0.94.
For the description below, state the correlation clearly. (For example, state that "there is a positive correlation between variable A and variable B.") Then state whether the correlation is most likely due to coincidence, a common underlying cause, or a direct cause. Explain your answer. Statistics students find that as they spend more time studying, their test scores are higher. What is the correlation?
There is positive correlation between the number of hours spent studying and their test scores.
For the description below, state the correlation clearly. (For example, state that "there is a positive correlation between variable A and variable B.") Then state whether the correlation is most likely due to coincidence, a common underlying cause, or a direct cause. Explain your answer. In one state, the number of unregistered handguns steadily increased over the past several years, and the crime rate increased as well. What is the correlation?
There is a positive correlation between the number of unregistered handguns and the crime rate.
Several things besides smoking have been shown to be probabilistic causal factors in lung cancer. For example, exposure to asbestos and exposure to radon gas, both of which are found in many homes, can cause lung cancer. Suppose that you meet a person who lives in a home that has a high radon level and insulation that contains asbestos. The person tells you, "I smoke, too, because I figure I'm doomed to lung cancer anyway." What would you say in response? Explain.
This person may or may not be doomed to lung cancer, but smoking will only increase the risk of getting lung cancer.
Explain how to make a table of a probability distribution. Choose the correct answer below.
To make a table of a probability distribution, list all possible outcomes, identify the outcomes that represent the same event, and then find the probability of each event.
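These steps can be sketched in Python. The example assumes two tosses of a fair coin, with events defined by the number of heads; this choice of experiment is an illustration, not part of the question.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Step 1: list all possible (equally likely) outcomes of two coin tosses.
outcomes = list(product("HT", repeat=2))

# Step 2: identify outcomes that represent the same event (same number of heads).
counts = Counter(outcome.count("H") for outcome in outcomes)

# Step 3: find the probability of each event.
distribution = {
    heads: Fraction(count, len(outcomes))
    for heads, count in sorted(counts.items())
}

for heads, probability in distribution.items():
    print(heads, probability)
# 0 1/4
# 1 1/2
# 2 1/4
```

The probabilities in the finished table should add up to 1.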
Should we always expect to get the expected value? Why or why not?
We should not always expect to get the expected value because expected value is calculated with the assumption that the law of large numbers will come into play.
Under what circumstances is it reasonable to ignore outliers?
When there is good reason to suspect that they represent errors in the data.
State whether the difference between what occurred and what you would have expected by chance is statistically significant. In 500 tosses of a coin, you observe 450 tails. Is the difference between what occurred and what is expected by chance statistically significant?
Yes.
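One way to back up this answer is to measure how far 450 is from the expected count. The sketch assumes a fair coin and uses the binomial standard deviation, a technique that the test itself does not require.

```python
from math import sqrt

n, observed = 500, 450
p = 0.5  # assumed probability of tails for a fair coin

expected = n * p                  # expected number of tails
std_dev = sqrt(n * p * (1 - p))   # binomial standard deviation
z = (observed - expected) / std_dev

print(expected, round(std_dev, 2), round(z, 1))  # 250.0 11.18 17.9
```

A result almost 18 standard deviations above the expected 250 tails is far beyond anything chance would plausibly produce, so the difference is statistically significant.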
Suppose that people living near a particular high-voltage power line have a higher incidence of cancer than people living farther from the power line. Can you conclude that the high-voltage power line is the cause of the elevated cancer rate? If not, what other explanations might there be for it? What other types of research would you like to see before you conclude that high-voltage power lines cause cancer?
You cannot conclude that the power line is the cause of the elevated cancer rate because cause cannot be established until a mechanism is confirmed.
What does P(A) mean?
P(A) means the probability that event A will occur.