Stats 121 - part 2 - quizzes
(Review question) Three of the following statements about correlation are correct, but one is incorrect. Which one of the following statements contains an error concerning correlation? Select one: a. The correlation between high school GPA and college GPA is 0.39. b. The correlation between whether someone believes in God and the amount of alcohol they consumed last weekend is 0.63. c. The correlation between the length of time required for completing an English assignment and the length of time required for completing a statistics assignment is 0.07. d. The correlation between the hemoglobin level as determined by the standard method and as determined by a new, simpler method is 0.999.
"Whether someone believes in God" is a categorical variable. In order to calculate r, both variables must be quantitative and the relationship between them should be linear. The correct answer is: The correlation between whether someone believes in God and the amount of alcohol they consumed last weekend is 0.63.
True or false: The theoretical sampling distribution of \bar{x} is synonymous with the distribution of all possible \bar{x} values.
Although we as statisticians say "the sampling distribution of \bar{x}", we really mean "the sampling distribution of all \bar{x}'s". This often confuses students. Just remember that the two phrases given in the question say the same thing. The correct answer is: True
What does the probability distribution of a random variable give us? a. The range of the random variable. b. All possible values of the random variable together with their probabilities. c. A density curve whose x values are always positive. d. The measure of the amount of deviation of the random variable about the mean.
A distribution is defined as a list of possible values of a variable together with how often each value occurs. This definition is simply modified to define "probability distribution". A probability distribution gives the possible values, but it gives the probability of each value rather than how often each value occurs. The correct answer is: All possible values of the random variable together with their probabilities.
True or false: Two quantitative variables, x and y, may be strongly correlated because both are consequences of a third lurking variable.
A lurking variable is a variable that is not included in a study but that might explain the relationship between the two variables that you did study. For example, length of feet and scores on a national reading exam are highly correlated for children ages 6 to 12. But length of feet is correlated with age, and age is correlated with years of school, which would explain higher scores on a national reading exam. So, age is a lurking variable that explains the strong correlation between foot length and scores on a national reading exam. The correct answer is 'True'.
What is a regression line?
A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.
What is the purpose of a residual plot?
A residual plot acts as a magnifying glass for identifying problems with the assumptions required for obtaining valid models in regression. The correct answer is: To diagnose problems with the regression assumptions.
True or False: Probabilities cannot ever be negative or greater than one.
All probabilities must be greater than or equal to zero and less than or equal to one. The correct answer is: True
What is an influential observation? a. An observation that interacts with the explanatory variable, x. b. An observation that, if removed, would drastically change the values of slope and/or y-intercept. c. An observation whose effect cannot be separated from the other observations. d. The observation that lies in the center of all observations.
An influential observation is a data value whose removal from the data set results in drastically changed estimates for slope and/or y-intercept. The correct answer is: An observation that, if removed, would drastically change the values of slope and/or y-intercept.
True or false: An outlier in either the x or y direction affects the value of the correlation, r.
An outlier in the x direction affects the mean and standard deviation of the x's. An outlier in the y direction affects the mean and standard deviation of the y's. These means and standard deviations are used in the computation of the correlation and thus, affect the value of the correlation. The correct answer is: True
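The effect of a single outlier on r can be checked numerically. The following is a minimal sketch using made-up data (the values and the helper name `pearson_r` are illustrative assumptions, not part of the course material): five points that lie exactly on a line, then the same points with one outlier added in the y direction.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Made-up data lying exactly on the line y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_clean = pearson_r(xs, ys)

# Same data plus one outlier in the y direction at (6, 0)
r_outlier = pearson_r(xs + [6], ys + [0])

print(f"r without outlier: {r_clean:.3f}")
print(f"r with outlier:    {r_outlier:.3f}")
```

In this sketch the single added point changes the means and standard deviations enough to pull r from 1 down to about 0.14.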
True or false: If the correlation between X and Y is r = 0.99, we can say that changes in X cause changes in Y.
As stated above, high correlation does not imply causation. The correct answer is: False
True or false: You should never compute a correlation coefficient without also plotting the data in a scatterplot.
Because the correlation coefficient can be misleading, you will only be able to correctly interpret r if you can also examine the scatterplot at the same time. The correct answer is: True
What is the purpose of a statistical control chart?
Control charts monitor a process by distinguishing between natural (ordinary) variation and variation due to a problem. The correct answer is: To monitor a process for problems.
What does the correlation coefficient r describe?
Correlation (r) is a measure of the strength of a linear relationship.
The points on a scatterplot lie very close to the line whose equation is y = 4 − 3x. The correlation between x and y is close to ________. a. -4 b. -3 c. -1 d. 0 e. +1 f. +3 g. +4
Correlation measures the strength of the linear relationship between two quantitative variables. Data that lie very close to a line have a correlation close to either +1 or -1. In this case, since the slope of the line is negative, the correlation is close to -1 and the data have a negative relationship. The correct answer is: -1
Fill in the blank: The sampling distribution of \bar{x} gives ______________ from all possible samples of the same size from the same population.
Each sample yields its own value for \bar{x}. The sampling distribution of \bar{x} consists of the collection of all these possible \bar{x}-values. The correct answer is: all possible \bar{x}-values
For a sampling distribution of \bar{x} with Normal shape, what percentage of all possible \bar{x}'s are between \mu - 3 \frac{\sigma}{\sqrt{n}} and \mu + 3 \frac{\sigma}{\sqrt{n}}?
For Normal distributions, 99.7% of all observations are within three standard deviations of the mean. This is true for \bar{x}'s as well when the sampling distribution of \bar{x} has a Normal shape. The correct answer is: 99.7%
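The 99.7% figure can be reproduced from the Normal distribution itself. This is a small sketch, assuming only the standard identity that a Normal variable falls within k standard deviations of its mean with probability erf(k/√2); the function name `normal_prob_within` is an illustrative choice.

```python
from math import erf, sqrt

def normal_prob_within(k):
    """P(mean - k*sd < X < mean + k*sd) for any Normal distribution."""
    return erf(k / sqrt(2))

within_1 = normal_prob_within(1)
within_2 = normal_prob_within(2)
within_3 = normal_prob_within(3)

for k, p in ((1, within_1), (2, within_2), (3, within_3)):
    print(f"within {k} standard deviation(s): {100 * p:.1f}%")
```

For the sampling distribution of \bar{x}, the standard deviation in question is \frac{\sigma}{\sqrt{n}}, so the same percentages apply to the interval \mu \pm k \frac{\sigma}{\sqrt{n}}.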
True or false: If the correlation between x and y is r = 0.99, we can say that changes in x cause changes in y.
High correlation does not imply causation unless the data are from an experiment. The correct answer is: False
Probability is a measure of how likely an event is to occur. What is the probability for the event described in the following statement? "This event is somewhat likely. But it won't occur more often than it will occur." a. 0.0 b. 0.01 c. 0.3 d. 0.6 e. 0.99 f. 1.0
If an event fails to occur more often than it occurs, its probability is below 0.5. "Somewhat likely" rules out values at or near 0, so of the choices only 0.3 fits. The correct answer is: 0.3
True or False: If the correlation between x and y is r = 0.01, then we can say that there is no relationship of any form between x and y. (Think carefully about curved relationships—like a frown as given in the scatterplot in the above question.)
If the correlation between x and y is r = 0.01, then EITHER there is no relationship between x and y OR the relationship between x and y is NOT linear. But a correlation of zero does not automatically imply that there is no relationship. The correct answer is: False
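A quick numerical check makes the point concrete. The sketch below uses made-up points that lie exactly on a frown-shaped curve, y = 9 - x^2, so y is completely determined by x, yet the correlation comes out as zero. The data and the helper name `pearson_r` are assumptions for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# A perfect curved ("frown") relationship, symmetric about x = 0
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [9 - x ** 2 for x in xs]

r_curved = pearson_r(xs, ys)
print(f"r for the curved relationship: {r_curved:.3f}")
```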
True or false: The mean of a sample always equals the mean of the population.
If the mean of a sample always equaled the mean of the population, we wouldn't need to study statistics. But the mean of a sample almost never equals the mean of the population. The Law of Large Numbers tells us only that the sample mean gets closer and closer to the population mean as the sample size increases. The correct answer is: False
For all sample sizes, when is the sampling distribution of \bar{x} exactly Normal?
If the population from which we take our simple random sample has a Normal distribution, then the sampling distribution of \bar{x} will be exactly Normal regardless of the sample size. The correct answer is: When the population is Normal and the sample is an SRS.
Four of the following statements about "the value of \bar{x}" are correct and one is NOT correct. Which one of the following statements is INCORRECT? Select one: a. If the value of \bar{x} from one SRS (simple random sample) is greater than μ, then the value of \bar{x} from the next SRS will be less than μ. b. The value of \bar{x} almost never equals the value of μ. c. The value of \bar{x} gets closer and closer to μ as sample size increases provided the sample is random. d. The value of \bar{x} varies from SRS to SRS. e. The value of \bar{x} can be one of many, many possible values.
If the value of \bar{x} from one SRS (simple random sample) is greater than μ, then the value of \bar{x} from the next SRS will be less than μ. The value of \bar{x} from any SRS is as likely to be greater than \mu as it is to be less than \mu. And the value of \bar{x} from one SRS is independent of the value of \bar{x} from a second SRS. This means that the value of \bar{x} from one SRS does not affect the value of \bar{x} from another SRS. The correct answer is: If the value of \bar{x} from one SRS (simple random sample) is greater than μ, then the value of \bar{x} from the next SRS will be less than μ.
Suppose you have played a game many, many times, winning sometimes and losing sometimes. Can you use the results of playing the game to estimate the probability of winning the game?
If you repeat a random phenomenon many, many times, you can estimate the probability of winning by dividing the number of successes by the total number of trials. Since playing the game is a random phenomenon, we can estimate the probability of winning by dividing the number of times we win in the many, many trials by the total number of trials. The correct answer is: yes
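The "wins divided by trials" estimate can be sketched with a simulation. The game below is a hypothetical stand-in (you win when a fair die shows 5 or 6, so the true probability is 1/3); the function names and the seed are assumptions chosen for illustration.

```python
import random

def estimate_win_probability(play_once, trials, rng):
    """Estimate P(win) as (number of wins) / (number of trials)."""
    wins = sum(1 for _ in range(trials) if play_once(rng))
    return wins / trials

def play_die_game(rng):
    """Hypothetical game: win if a fair six-sided die shows 5 or 6."""
    return rng.randint(1, 6) >= 5

rng = random.Random(121)  # fixed seed so the run is repeatable
estimate = estimate_win_probability(play_die_game, 100_000, rng)

print(f"estimated probability of winning: {estimate:.3f}")
print(f"true probability of winning:      {1 / 3:.3f}")
```

With many, many trials the estimate should land close to the true value, although it will rarely match it exactly.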
What is the purpose of obtaining a statistic value from sample data?
In statistical inference, we use the value of a statistic to estimate the value of a parameter. The correct answer is: To estimate a parameter.
True or False: In statistics random is synonymous with haphazard.
In statistics random is NOT synonymous with haphazard. In statistics, a phenomenon is random if the outcome of one play is unpredictable, but the outcomes of many, many plays form a distribution from which we can estimate probabilities. The term random has a unique meaning in statistics. The correct answer is: False
What effect does increasing n have on the spread of the theoretical sampling distribution of \bar{x}?
Increasing the sample size, n, decreases the spread of the theoretical sampling distribution of our statistic, \bar{x}. The correct answer is: The spread of the theoretical sampling distribution of \bar{x} gets smaller.
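The shrinking spread can be seen in a simulation. This is a rough sketch, assuming a hypothetical Normal population with mean 64 and standard deviation 2.7 and using 2,000 simulated samples per sample size; it compares the simulated standard deviation of the \bar{x}'s with \frac{\sigma}{\sqrt{n}}.

```python
import random
from math import sqrt
from statistics import pstdev

def simulate_sample_means(mu, sigma, n, reps, rng):
    """Draw `reps` random samples of size n and return their sample means."""
    return [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(121)   # fixed seed so the run is repeatable
mu, sigma = 64.0, 2.7      # hypothetical Normal population

spreads = {}       # simulated standard deviation of the sample means
theoretical = {}   # sigma / sqrt(n)
for n in (4, 25, 100):
    means = simulate_sample_means(mu, sigma, n, reps=2000, rng=rng)
    spreads[n] = pstdev(means)
    theoretical[n] = sigma / sqrt(n)
    print(f"n = {n:3d}: simulated spread {spreads[n]:.3f}, "
          f"sigma/sqrt(n) {theoretical[n]:.3f}")
```

Because this is a simulation rather than the full theoretical distribution, the simulated spreads only approximate \frac{\sigma}{\sqrt{n}}, but they should decrease as n grows.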
What effect does increasing n have on the mean of the theoretical sampling distribution of \bar{x}? Note: Theoretical just means created from all possible samples. Select one: a. The mean of the theoretical sampling distribution gets closer to the mean of the population as n increases. b. The mean of the theoretical sampling distribution gets closer to \mu as n increases. c. The variability of the population mean decreases as n increases. d. It has no effect. The mean of the theoretical sampling distribution of \bar{x} always equals the mean of the population.
Increasing the sample size, n, has no effect on the mean of the theoretical sampling distribution of \bar{x}. It will always be equal to the mean of the population, regardless of the size of n. The correct answer is: It has no effect. The mean of the theoretical sampling distribution of \bar{x} always equals the mean of the population.
What is a residual plot? Select one: a. A scatterplot of the observed y's versus the predicted y's (\hat{y}'s) used to assess the size of the residuals. b. A scatterplot of the residuals versus the x's used to diagnose problems with the regression assumptions. c. A dotplot of the residuals used to decide whether any residuals are unusually "large". d. A scatterplot of the y's versus the x's used to visualize the direction, form and strength of their relationship.
It is a scatterplot of the residuals versus the x's. It helps identify problems with the regression line. The correct answer is: A scatterplot of the residuals versus the x's used to diagnose problems with the regression assumptions.
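To see how a residual plot exposes a problem, here is a minimal sketch that fits a least squares line to made-up curved data (y = x^2) and lists the points that a residual plot would show. The data and the helper name `fit_line` are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for the line y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Made-up data that follow a curve, not a line
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)

# Each point of the residual plot: (x value, observed y minus predicted y)
residual_points = [(x, y - (a + b * x)) for x, y in zip(xs, ys)]

for x, residual in residual_points:
    print(f"x = {x}: residual = {residual:+.3f}")
```

The residuals are positive at both ends and negative in the middle. That U-shaped pattern is the kind of signal a residual plot is meant to reveal: a straight line is not an adequate model for these data.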
For introductory statistics, suppose the correlation between exam score and the time (in hours) that elapsed since the exam period began before the student started taking the exam is -0.96. Because the negative relationship is so strong, can you say that taking the exam late on the last day would cause you to get a lower score than had you taken it earlier? Select one: a. Yes, because students who wait have less time to finish the exam. b. No, because there is a potential lurking variable of preparedness.
Just because two variables are strongly correlated doesn't mean that one causes changes in the other. The only way to establish causation is to do an experiment. In this case, students who are well prepared probably take the exam early. Those who are not procrastinate until they finally have to take it. Since we did not perform an experiment, level of preparedness is a potential lurking variable that we cannot rule out, and therefore we cannot infer a causal relationship. The correct answer is: No, because there is a potential lurking variable of preparedness.
Many colleges offer online versions of courses that are also taught in the classroom. It often happens that the students who enroll in the online version do better than the classroom students on the course exams. This is not because online instruction is more effective than classroom teaching, but because the people who sign up for online courses are often quite different from the classroom students. Three of the following statements describe lurking variables for why online students do better, but one does NOT. Which one does NOT give a valid lurking variable? Select one: a. Material is presented online versus in a classroom. b. Students are often more mature in online classes than in regular classes. c. Students in online classes know that they are responsible for their learning whereas students in regular classroom instruction rely on their teacher to teach them. d. Students in online classes are often more motivated to succeed than students in regular classes.
Method of material presentation is the explanatory variable—not a lurking variable. The correct answer is: Material is presented online versus in a classroom.
True or false: Rows and columns are used to display the explanatory and response variables for bivariate categorical data. Which is which?
Often the row variable is the explanatory variable and the response variable is the column variable. The correct answer is: True
Suppose the slope of the regression line is 8.9904. What does this slope mean in context of the problem?
On average, if the size of a house increases by 1 square foot we would expect the price of the house to increase by 8.9904 thousands of dollars.
Fill in the blank: The mean of the theoretical sampling distribution of \bar{x} is _________________ the mean (μ) of the strongly left skewed population distribution from which samples are taken. Note: Theoretical just means created from all possible samples.
Regardless of sample size or population shape, the mean of the theoretical sampling distribution of \bar{x} is exactly equal to the mean of the population. The correct answer is: exactly equal to
Fill in the blank: For small random samples from a Normal population distribution, the shape of the sampling distribution of \bar{x} is _________________.
Regardless of sample size, the shape of the sampling distribution of \bar{x} is exactly Normal when taking random samples from a Normal population. The correct answer is: Normal
If you want to model the relationship between two quantitative variables and use one variable to predict the other, which type of analysis should you use? Select one: a. Correlation analysis b. Regression analysis
Regression analysis provides models that can be used to predict the value of one variable from another. The correct answer is: Regression analysis
Voter registration records show that 68% of all voters in Indianapolis are registered as Republicans. To test a random digit dialing device, you use the device to call 150 randomly chosen residential telephones in Indianapolis. Of the registered voters contacted, 73% are registered Republicans. Decide whether each of the boldface numbers is a parameter or a statistic.
Since 68% is the percent of "all" voters in Indianapolis, it is a parameter. And since 73% is the percent of registered voters in the 150 randomly chosen residential telephones, it is a statistic. The correct answer is: 68% is a parameter and 73% is a statistic.
A residual plot is really a scatterplot. So what does each point in a residual plot represent? a. An x value and the corresponding y value b. An x value and its corresponding residual value c. A y value and its corresponding \hat{y} value
Since a residual plot is really a scatterplot, the residuals are plotted against the x values (or, in simple linear regression, against the predicted values \hat{y}, which gives an equivalent picture). Thus, each dot gives an x value and the corresponding residual value. The correct answer is: An x value and its corresponding residual value
What operation can be performed on categorical data? a. Adding b. Subtracting c. Multiplying d. Counting
Since categorical data are words not numbers that have value, mathematical operations such as adding, subtracting and multiplying do not make sense. Only counting is appropriate for categorical data. That is why it is sometimes called "count" data. The correct answer is: Counting
How do we display a set of bivariate categorical data? Select one: a. In a scatterplot b. In side-by-side boxplots c. In a back-to-back stemplot d. In a two-way table
Since counting is the only appropriate operation for categorical data, these counts need to be displayed. We display them with two way tables. The correct answer is: In a two-way table
The nonprofit group Public Agenda conducted telephone interviews with parents of high school children. Interviewers randomly chose equal numbers of parents from the races of black, white, and Hispanic from school records. One question asked was "Are the high schools in your state doing an excellent, good, fair or poor job, or don't you know enough to say?" What type of study is this? Select one: a. Observational study--simple random sample b. Observational study--stratified sample c. Observational study--multistage sample d. Experiment--completely randomized experiment e. Experiment--randomized block design Referring to the previous question, what is the explanatory variable? Referring to the previous two questions, what is the response variable?
Since equal numbers of black, white and Hispanic parents were chosen at random, sampling was done within race so this is a stratified sample. The correct answer is: Observational study--stratified sample In this study Public Agenda is comparing opinions of the races of parents. The explanatory variable is the race of the parents. The correct answer is: Race of parent The outcome measure in this study is the opinion on how the parents feel high schools are doing in this state. The correct answer is: Opinion on how high schools are doing
What is the purpose of the correlation coefficient, r? Select one: a. To assess the variability of the Y's. b. To determine whether y can be used to predict the variability of the X's. c. To measure the strength of the linear relationship between X and Y. d. To model the relationship between x and y with an algebraic equation.
Since it is difficult to describe the strength of the linear relationship between X and Y in words, the correlation coefficient gives us a numerical measure for the strength of the linear relationship between X and Y. The correct answer is: To measure the strength of the linear relationship between X and Y.
Why do we simulate many, many SRS's and obtain the value of \bar{x} from each sample? a. To determine the explanatory variable in each SRS. b. To answer questions about how much the \bar{x}'s vary from one SRS to the next. c. To compute the value of the population mean μ.
Since statistical inference is using the value of a statistic to estimate the value of a parameter, we need to simulate the value of a statistic using many, many SRS's. These simulations will give us information about how much \bar{x} varies and how far off an \bar{x} value might be from μ. The correct answer is: To answer questions about how much the \bar{x}'s vary from one SRS to the next.
True or false: Whenever the correlation between x and y is zero, the slope of the least squares regression line is also zero.
Since the formula for slope is b = r\frac{s_y}{s_x}, if r = 0 then the slope must also equal zero, because s_y and s_x are both positive whenever the x's and the y's vary. Note: s_y and/or s_x could be zero, but if either is zero there is no point in doing regression analysis. The correct answer is: True
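The slope formula can be checked against a direct least squares calculation. The sketch below uses made-up data and illustrative helper names (`pearson_r`, `least_squares_slope`); it also includes a small symmetric data set whose correlation is zero, to show the slope coming out as zero too.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation from deviations about the means."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

def least_squares_slope(xs, ys):
    """Slope computed directly as sum of cross-deviations over sum of squared x-deviations."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Made-up data with a strong positive linear relationship
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope_from_r = pearson_r(xs, ys) * stdev(ys) / stdev(xs)   # b = r * s_y / s_x
slope_direct = least_squares_slope(xs, ys)

# Made-up symmetric data with zero correlation
flat_xs = [-1, 0, 1]
flat_ys = [1, 0, 1]
flat_r = pearson_r(flat_xs, flat_ys)
flat_slope = least_squares_slope(flat_xs, flat_ys)

print(f"slope from r: {slope_from_r:.4f}, direct slope: {slope_direct:.4f}")
print(f"zero-correlation case: r = {flat_r:.4f}, slope = {flat_slope:.4f}")
```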
The following situation applies to Questions 6-8: Researchers deliberately overfed 16 young adults for 8 weeks. They measured fat gain (in kilograms) and change in energy use (in calories) from activity other than deliberate exercise. These activities called nonexercise activities (NEA) include fidgeting, daily living and the like. The correlation between fat gain and NEA was r = -0.7786. What does this tell us about the relationship between fat gain and NEA? Select one: a. As NEA increases, fat gain decreases. b. As NEA decreases, fat gain decreases. c. As NEA increases, there is no change in fat gain. Is the relationship between fat gain and NEA strong or weak? Suppose fat gain was measured in pounds instead of kilograms. How would the value of r change? Select one: a. r would be closer to -1.0. b. r will be closer to 0.0. c. r would be closer to +1.0. d. r would not change.
Since the value of r is negative, we know that increases in one variable are associated with decreases in the other variable. The correct answer is: As NEA increases, fat gain decreases. For r = -0.7786, we would classify the relationship as strong. It is much closer to -1 than to 0. The correct answer is: Strong Changing unit of measure does not change the value of r because r is computed from z-scores that have no unit of measure. The correct answer is: r would not change.
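The claim that a change of units leaves r unchanged can be checked directly. The numbers below are made up for illustration and are not the data from the overfeeding study; the conversion factor of about 2.20462 pounds per kilogram and the helper name `pearson_r` are the only other assumptions.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Hypothetical values: change in NEA (calories) and fat gain (kilograms)
nea = [-90, -50, 140, 350, 390, 530, 620, 690]
fat_kg = [4.2, 3.7, 3.0, 2.7, 3.2, 1.5, 2.4, 1.1]

# Same fat gains expressed in pounds
fat_lb = [kg * 2.20462 for kg in fat_kg]

r_kg = pearson_r(nea, fat_kg)
r_lb = pearson_r(nea, fat_lb)

print(f"r with fat gain in kilograms: {r_kg:.4f}")
print(f"r with fat gain in pounds:    {r_lb:.4f}")
```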
For the line in the following graph, what is the value of slope? We can see that the line intersects the x-axis at 3. It also intersects the y-axis at 6. But as x values increase, y values decrease so the slope is negative. Select one: a. -3.0 b. -2.0 c. -1.0 d. +1.0 e. +2.0 f. +3.0 What is the equation?
Slope equals rise over run, that is, the change in y divided by the change in x. The line intersects the y-axis at 6 and the x-axis at 3, and as x values increase, y values decrease, so the slope is negative. Moving from (0, 6) to (3, 0) gives \frac{rise}{run} = \frac{0 - 6}{3 - 0} = \frac{-6}{3} = -2. With a y-intercept of 6, the equation is y = 6 - 2x. The correct answer is: -2.0
True or false: Correlation tells us the average increase in y for every one unit increase in x.
Slope tells us the average increase in y for every one unit increase in x, not correlation. The correct answer is: False
Assume that the y-intercept of the regression line used to predict the husband's height (Y) from the wife's height (X) in young couples is 5.68. What is the correct interpretation of this y-intercept?
Sometimes, even if the number exists, it is not useful to us. This is one such case. The correct interpretation would be "In a young couple, if the wife is 0 inches tall, we would expect, on average, her husband would be 5.68 inches tall." This interpretation doesn't make any logical sense, as no woman who is 0 inches tall would be married. This is an example of extrapolation. The correct answer is: In a young couple, if the wife is 0 inches tall, we would expect, on average, her husband would be 5.68 inches tall.
When sampling from a population that has a non-Normal distribution, which one of the following is a statement of the Central Limit Theorem? a. As sample size increases, the variable \bar{x} gets closer and closer to \mu. b. For large random samples the shape of the sampling distribution of \bar{x} is approximately Normal. c. As sample size increases, the mean of all possible \bar{x}'s gets closer and closer to \mu. d. As sample size increases, the standard deviation of the sample also increases.
Statement A is a statement of the Law of Large Numbers. Statements C and D are incorrect statements. The correct answer is: For large random samples the shape of the sampling distribution of \bar{x} is approximately Normal.
Three of the following are correct statements about probabilities and one statement is incorrect. Which statement is NOT correct? Select one: a. All probabilities must be between zero and one inclusive. b. The probability that an event does not occur equals one minus the probability that the event does occur. c. The sum of the probabilities from all possible outcomes equals one. d. If two events cannot occur simultaneously, then the probability that either occurs cannot be computed.
Statements A, B, and C are properties of probabilities. However, statement D is an incorrect statement. If two events cannot occur simultaneously, then the probability of either occurring can be computed by adding the probabilities of the two events. The correct answer is: If two events cannot occur simultaneously, then the probability that either occurs cannot be computed.
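These rules can be illustrated with a fair six-sided die, used here as an assumed example. The sketch uses exact fractions to apply the addition rule for two events that cannot occur simultaneously, the complement rule, and the rule that all probabilities sum to one.

```python
from fractions import Fraction

# Probability model for one roll of a fair six-sided die (assumed example)
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Rolling a 1 and rolling a 2 cannot occur simultaneously,
# so the probability of either is the sum of the two probabilities.
p_one_or_two = die[1] + die[2]

# Complement rule: P(not six) = 1 - P(six)
p_not_six = 1 - die[6]

# The probabilities of all possible outcomes sum to one
total = sum(die.values())

print(f"P(1 or 2) = {p_one_or_two}")
print(f"P(not 6)  = {p_not_six}")
print(f"total     = {total}")
```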
Suppose that y is a response variable (plotted on the vertical axis) and x is an explanatory variable (plotted on the horizontal axis). A straight line relating y to x has an equation of the form y = a + bx. In this equation, b is the slope, the amount by which y changes when x increases by one unit. The number a is the intercept, the value of y when x = 0.
When the population distribution is non-Normal, what effect does increasing n have on the shape of the theoretical sampling distribution of \bar{x}?
The Central Limit Theorem says that as the sample size, n, increases, the shape of the theoretical sampling distribution of \bar{x} gets closer and closer to Normal. This is true whenever the shape of the population distribution is non-Normal. The correct answer is: The shape of the theoretical sampling distribution of \bar{x} gets closer and closer to Normal.
True or False: The Law of Large Numbers says that the sample mean, \bar{x}, equals the population mean, μ, whenever n > 100.
The Law of Large Numbers says that the sample mean, \bar{x}, gets closer and closer to the population mean, μ, as sample size increases--provided that the sample is random. The correct answer is: False
True or false: Suppose we take all possible samples of the same size from a population and for each sample, we compute \bar{x}. The mean of these \bar{x} values will exactly equal the mean of the population (μ) from which the samples were taken. Select one: a. True b. False
The answer is true because the mean of all possible \bar{x} values from all possible samples is always exactly equal to μ, the mean of the population regardless of sample size. Please don't confuse the mean of a sample, namely \bar{x} with the mean of all possible \bar{x}'s. The mean of a sample, \bar{x}, gets closer to μ as n increases according to the Law of Large Numbers, but the mean of all possible \bar{x}'s always exactly equals μ. The correct answer is: True
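For a tiny population, "all possible samples" can be listed outright, which makes the exact equality easy to verify. The sketch below assumes a made-up, skewed population of five values and enumerates every possible sample of a given size (without replacement); the function name is illustrative.

```python
from fractions import Fraction
from itertools import combinations

def mean_of_all_sample_means(population, n):
    """Average the sample mean over every possible sample of size n."""
    sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]
    return sum(sample_means) / len(sample_means)

# Made-up, right-skewed population of five values
population = [2, 3, 5, 8, 22]
mu = Fraction(sum(population), len(population))

# Mean of all possible sample means, for several sample sizes
results = {n: mean_of_all_sample_means(population, n) for n in (1, 2, 3, 4)}

print(f"population mean: {mu}")
for n, value in results.items():
    print(f"n = {n}: mean of all possible sample means = {value}")
```

Exact fractions are used so that the comparison with μ is not blurred by rounding. Individual sample means still differ from μ; only their average over all possible samples matches it exactly.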
In statistics, how do we define probability of an outcome? Select one: a. The fraction of times the outcome occurs in many, many trials of a random phenomenon. b. The subjective assessment of the likelihood of all possible outcomes as set by experts. c. The ratio of one over the total number of possible outcomes. d. The rule that assigns a number between zero and one to the outcome of a random variable.
The correct answer, A, is simply the statistical definition of probability. Statements B and C are nonsense. D gives a fact about probabilities: they are all numbers between zero and one. The correct answer is: The fraction of times the outcome occurs in many, many trials of a random phenomenon.
Read the following statements about correlation carefully. Three are true and one is false. Which statement is FALSE? Select one: a. Because correlation has a unit of measurement, it is valid to say that the correlation between height (in inches) and arm span (in inches) is 0.73 inches. b. If we switch the values of X and Y (switched the X and Y axes) the value of r would remain unchanged. c. If we measure the X variable height (inches) against the Y variable weight (pounds) in different units (cm or kilograms), the value of r will still be the same. d. The correlation coefficient r can be sensitive to outliers.
The correct answer is: (A) Because correlation has a unit of measurement, it is valid to say that the correlation between height (in inches) and arm span (in inches) is 0.73 inches.
The mean height of American women in their twenties is about 64 inches, and the standard deviation is about 2.7 inches. The mean height of men the same age is about 69.3 inches, with standard deviation about 2.8 inches. If the correlation between the heights of husbands and wives is about r = 0.5, what is the slope of the regression line used to predict the husband's height (Y) from the wife's height (X) in young couples?
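The slope is b = r\frac{s_y}{s_x} = 0.5 \times \frac{2.8}{2.7} \approx 0.5185, where s_y = 2.8 is the standard deviation of the husbands' heights (Y) and s_x = 2.7 is the standard deviation of the wives' heights (X). The correct answer is: 0.5185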
The scores of 12th-grade students on the National Assessment of Educational Progress year 2000 mathematics test have a distribution that is approximately Normal with mean \mu = 250. What is the population? Select one: a. All 12th-grade students b. All 12th-grade students who took the National Assessment of Educational Progress year 2000 mathematics test c. The mean score for all 12th-grade students who took the National Assessment of Educational Progress year 2000 mathematics test d. 12th-grade students who took the National Assessment of Educational Progress year 2000 mathematics test who received a score of 250 What is the parameter of interest? Select one: a. All 12th-grade students b. All 12th-grade students who took the National Assessment of Educational Progress year 2000 mathematics test c. The mean score for all 12th-grade students who took the National Assessment of Educational Progress year 2000 mathematics test d. 12th-grade students who took the National Assessment of Educational Progress year 2000 mathematics test who received a score of 250
The correct answer is: All 12th-grade students who took the National Assessment of Educational Progress year 2000 mathematics test The correct answer is: The mean score for all 12th-grade students who took the National Assessment of Educational Progress year 2000 mathematics test
Because of concerns about employee obesity and related health problems, a very large company conducted a study to compare two weight-reducing programs (low-carb diet and low-fat diet). Forty employees volunteered to participate in the study for a 10-week period. Half of the employees were randomly assigned to the low-carb diet and the other half were randomly assigned to the low-fat diet. What type of study is this? Select one: a. Observational Study b. Completely Randomized Experiment c. Randomized Block Experiment d. Matched Pairs Experiment
The correct answer is: Completely Randomized Experiment
Which of the following allows you to interchange the roles of x and y? a. Correlation b. Regression
The correct answer is: Correlation
Which one of the following statements is a correct statement about correlation coefficient? a. The correlation between major (like mathematics, accounting, Spanish, etc.) and overall GPA is very high. b. In professional baseball, the correlation between players' batting average and their salary is positive. c. The correlation between percentage of disposable income required to meet consumer loan payments and the percentage of disposable income required to pay mortgage payments for selected years is 11.8% d. The correlation for time between planting and harvesting of a grain called paddy and the yield of paddy in kilograms per hectare is r = 0.27. If we measure time in hours instead of days, then the correlation will be r = (24)(0.27).
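Both variables in this statement are quantitative, and the statement claims only a direction for r, which is valid. The other options fail: major is a categorical variable (a), r is a unitless number between -1 and +1 and is not reported as a percentage (c), and changing the unit of measure does not change r (d). The correct answer is: In professional baseball, the correlation between players' batting average and their salary is positive.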
To investigate the effects of the drug phen-fen, 200 women in the 30-40 age range who had used the drug for at least one year were located. 200 women of the same age group who had not used the drug were also located. The incidence of heart valve abnormality was compared between the two groups. What type of study is this?
The correct answer is: Observational Study
What is the name of the difference (distance) represented by y - \bar{y}? a. Total variation in y (the variability of the y's about \bar{y}.) b. Explained variation (The variability of the predicted y's (\hat{y}'s) about \bar{y}.) c. Residual or "unexplained variation" (The variability of the y's about the regression line.)
The correct answer is: a. Total variation in y (the variability of the y's about \bar{y}.)
What does the correlation coefficient measure?
The correlation coefficient gives a numerical measure of the strength of the linear relationship between two quantitative variables.
Which one of the following is a correct statement about correlation coefficient? Select one: a. The correlation between profession (like physician, accountant, etc.) and starting salary is very high. b. In professional basketball, the correlation between players' average points scored during a game and their corresponding average minutes played per game is positive. c. The correlation between amount of fertilizer applied to fields and the yield for the field is .74 bushels. d. If the correlation between time spent until exhaustion on a treadmill and 20-km ski time in minutes was found to be r = -.8, then the correlation between time spent until exhaustion and ski time in seconds would be r = (60) (-.8).
The correlation coefficient requires that both variables be quantitative. The variable, "Players' average points scored", is quantitative as is the variable, "average minutes played per game." Thus, Answer B is the correct statement about correlation coefficient. The correct answer is: In professional basketball, the correlation between players' average points scored during a game and their corresponding average minutes played per game is positive.
A measure of the success of knee surgery is range of motion after surgery. When age was analyzed as a predictor for range of motion after surgery, the correlation coefficient was 0.553. What percentage of the variation in range of motion can be explained by age?
The definition of r^2 is "the percentage of total variation in y that can be explained by the regression on x". The question asks about "the percentage of total variation in y (range of motion) that can be explained by x (age)." Thus, the question asks for r^2 = (0.553)^2 = 0.306 or 30.6%. The correct answer is: 30.6%
A study of class attendance and grades among first-year students at a state university showed that in general students who attended a higher percent of their classes earned higher grades. Class attendance explained 16% of the variation in grade index among the students. What is the numerical value of the correlation, r, between percent of classes attended and grade index? Select one: a. 0.026 b. 0.160 c. 0.400 d. 0.632
The definition of r^2 is "the percentage of total variation in y, the response variable, that can be explained by x, the explanatory variable." The question tells us that "class attendance explained 16% of the variation in grade index." Thus, the question says that r^2 = 0.16. To get the value of the correlation coefficient, we take the square root of 0.16 to get r = \sqrt{0.16} = 0.40. We take the positive root because students who attended more classes earned higher grades. The correct answer is: 0.400
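The two conversions between r and r^2 used on this card and on the knee-surgery card can be sketched in a few lines of Python. The variable names are illustrative; the sign of r has to come from the direction of the association, since squaring discards it.

```python
import math

# Attendance card: r^2 is given, and the association is positive,
# so r is the positive square root.
r_squared_attendance = 0.16
r_attendance = math.sqrt(r_squared_attendance)
print(round(r_attendance, 3))

# Knee-surgery card: r is given, so the explained fraction is r^2.
r_knee = 0.553
r_squared_knee = r_knee ** 2
print(round(r_squared_knee, 3))
```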
If a house is 2530 square feet in size, what will be its predicted selling price (in thousands of dollars)? The y-intercept of the regression equation is -39.806084 The slope of the regression line is 0.099197954.
The regression equation is selling price = -39.806084 + (0.099197954 * size). For a house of 2530 square feet, the predicted selling price is -39.806084 + (0.099197954 * 2530) = 211.1647396. Because price is measured in thousands of dollars, the predicted price is $211.16 thousand (about $211,165). The correct answer is: $211.16
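The prediction above can be reproduced with a short Python sketch. The function name `predict_price` is a hypothetical helper; the intercept and slope are the values given on the card.

```python
def predict_price(size_sqft, intercept=-39.806084, slope=0.099197954):
    """Predicted selling price in thousands of dollars for a given size."""
    return intercept + slope * size_sqft

# Plug the house size from the question into the regression equation.
predicted = predict_price(2530)
print(round(predicted, 2))  # price in thousands of dollars
```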
An insurance company knows that in the entire population of millions of homeowners, the mean annual loss from fire is μ = $250 and the standard deviation of the loss is σ = $1000. This distribution of losses is strongly right skewed: many homeowners had $0 loss and a few had very large losses. We want to compute the probability that a randomly selected homeowner will have a loss greater than $275. Can we use the standard Normal table to find the probability? Why or why not? Refer to the distribution of homeowners losses from fire described in question 8 above. We need to know whether the company can safely base its rates on the assumption that its mean loss will be no greater than $275 if it sells 10,000 policies. Should we use the standard Normal table to find the probability that the mean of a sample of 10,000 policies does not exceed $275?
The distribution of losses is very right skewed and cannot be modeled by a Normal curve. Hence, we cannot use the standard Normal table to obtain the probability. The correct answer is: No, because we cannot compute the probability on a loss using the standard Normal table since the shape of the distribution of losses is not Normal. We know that the distribution of losses is not Normal since it is strongly right skewed. However, since the sample size is large (n = 10000), we can apply the Central Limit Theorem, which states that for large random samples, the shape of the sampling distribution of \bar{x} is approximately Normal. This allows us to use the standard Normal table to get the probability. Answers A and D are incorrect because they apply to an individual homeowner. Answer B is incorrect because the distribution of losses is not Normal. The correct answer is: Yes, because we want to find the probability on a sample mean from a large random sample which has an approximately Normal distribution according to the Central Limit Theorem.
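The card only asks whether the Normal table may be used. As an illustration of what that calculation would look like, here is a sketch that standardizes the sample mean and evaluates the standard Normal cumulative probability with `math.erf` instead of a printed table. The helper `normal_cdf` is an assumption of this sketch, not part of the course material.

```python
import math

def normal_cdf(z):
    """Standard Normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu = 250        # population mean loss (dollars)
sigma = 1000    # population standard deviation (dollars)
n = 10_000      # number of policies in the sample

# Standard deviation of the sampling distribution of x-bar: sigma / sqrt(n)
sd_xbar = sigma / math.sqrt(n)

# z-score for a sample mean of $275
z = (275 - mu) / sd_xbar

# Approximate probability that the mean loss does not exceed $275
p = normal_cdf(z)
print(sd_xbar, z, round(p, 4))
```

The same calculation would not be valid for a single homeowner's loss, because the Central Limit Theorem applies to sample means from large samples, not to individual observations.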
True or false: The least squares regression line always intersects the point: (\bar{x}, \bar{y}).
The equation for the least squares line is \hat{y} = a + bx. Since a = \bar{y} - b\bar{x}, if we predict y for x = \bar{x}, we get \hat{y} = a + b\bar{x} = (\bar{y} - b\bar{x}) + b\bar{x} = \bar{y}. So the predicted \hat{y} for \bar{x} is \bar{y} and thus, the point (\bar{x}, \bar{y}) is on the regression line. The correct answer is: True
Researchers collected data on the number of breeding pairs of Scarlet Macaw in an isolated area of an Amazon rainforest in each of 8 years (X) and the percent of males who returned the next year (Y). The data show that the percent returning is lower after successful breeding seasons and that the relationship is roughly linear. The following shows a StatCrunch regression output for these data: Simple linear regression results: Dependent Variable: percent.returned Independent Variable: breeding.pairs percent.returned= 136.682 -3.218breeding.pairs Sample size: 8 R (correlation coefficient) = -0.8329 R-sq = 0.6937 Estimate of error standard deviation: 9.460 Parameter estimates: Intercept 136.682 Slope -3.218 What is the equation of the least-squares regression line for predicting the percent of males that return from the number of breeding pairs? Referring to the regression output given above, what percent of the year-to-year variation in percent of returning males is explained by its straight-line relationship with number of breeding pairs the previous year?
The first row is the y-intercept row. In the "Estimate" column, we find the estimated y-intercept to be a = 136.682. The second row is the slope row and in the "Estimate" column we find the estimated slope to be b = -3.218. So the regression equation is \hat{y} = 136.682 - 3.218x. The correct answer is: \hat{y} = 136.682 - 3.218x r^2 gives us the percentage of variation in y that is explained by the regression of y on x. In the output, we are told: "R-sq = 0.6937" or 69.37%. The correct answer is: 69.37%
How do you find r?
The formula for r is r = \frac{1}{n - 1} \sum z_x z_y, that is, the sum of the products of the z-scores for x and y, divided by n - 1.
True or false: Slope, b, and correlation coefficient, r, have the same sign (positive or negative).
The formula for slope is b = r \frac{s_y}{s_x}. Since both s_y and s_x are positive, slope and correlation coefficient must have the same sign. The correct answer is: True
True or false: The sign of slope (positive or negative) is the same as the sign of r, the correlation coefficient (positive or negative).
The formula for slope is b = r \frac{s_y}{s_x}. Since both s_y and s_x are positive, slope and r have to have the same sign. The correct answer is: True
True or false: The z-score products used to compute the correlation coefficient are based on deviations from the point: (\bar{x}\text{, }\bar{y}).
The formula for the correlation coefficient is r = \frac{1}{n - 1} \sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right). The values of (x - \bar{x}) and (y - \bar{y}) in the numerators of the z-scores are deviations from the point (\bar{x}\text{, }\bar{y}). The correct answer is: True
True or false: Suppose we take all possible samples of size 50 from a population whose distribution is left skewed and for each sample, we compute \bar{x}. The shape of the histogram of these \bar{x}-values will be left skewed, just like the shape of the left skewed distribution of the population.
The larger the sample size, the closer to Normal is the shape of the histogram of all possible \bar{x}'s from all possible samples. Recall that when we sample repeatedly from a non-Normal population like the closing stock prices, the shape of the histogram of the \bar{x}-values was close to Normal and less skewed than the population. Soon you will learn two facts: 1. The shape of the histogram of all \bar{x} values from all possible samples IS Normal whenever the population from which we sample has a Normal shape. 2. The shape of the histogram of the \bar{x}-values from all possible samples will be approximately Normal for large sample sizes---even when the population shape is not Normal. The correct answer is: False
In a two-way table, what is given in the "margins"? a. The totals for each row and each column. b. The count of non-responses for each category. c. The conditional percentages for each category.
The margins show the totals in each row and each column. These totals are sometimes called marginals because they are in the margins of the table. The correct answer is: The totals for each row and each column.
The distribution of the 1228 closing stock prices on the New York Stock Exchange was very right skewed with a mean of μ = 26 and a standard deviation of σ = 20. All possible samples of size n = 4 were taken from these stocks and \bar{x} computed. How does the mean of these \bar{x} values compare with the mean of the closing stock prices? Select one: a. The mean of the \bar{x} values is less than the mean (μ) of the closing stock prices. b. The mean of the \bar{x} values is equal to the mean (μ) of the closing stock prices. c. The mean of the \bar{x} values is greater than the mean (μ) of the closing stock prices. How does the standard deviation of these \bar{x}-values compare with the standard deviation (σ) of the closing stock prices? How does the shape of the histogram of these \bar{x} values compare with the shape of the histogram of the closing stock prices? True or false: Referring to question 9, suppose the sample size is changed from n = 4 to n = 16. The mean of \bar{x} values from samples of size 16 is closer to μ, the mean of the closing stock prices, than the mean of the \bar{x} values from samples of size 4. True or false: Referring to question 9, suppose the sample size is changed from n = 4 to n = 16. The standard deviation of \bar{x}-values from samples of size 16 is smaller than the standard deviation of the \bar{x}-values from samples of size 4. True or false: Referring to question 9, suppose the sample size is changed from n = 4 to n = 16. The shape of the histogram of \bar{x}-values from samples of size 16 is closer to Normal than the shape of the histogram of the \bar{x}-values from samples of size 4.
The mean of the \bar{x} values from all possible samples always exactly equals the mean of the original population. In this example, the original population is all closing stock prices. The correct answer is: The mean of the \bar{x} values is equal to the mean (μ) of the closing stock prices. For n >1, the standard deviation of \bar{x} values is always less than the standard deviation of the original population. In this example, the original population is all closing stock prices. The correct answer is: The standard deviation of the \bar{x}-values was less than the standard deviation (σ) of the closing stock prices. For n >1, the shape of the histogram of all possible \bar{x} values is always closer to Normal than the shape of the histogram of the original population when the original population is NOT Normal. Here the original population is all closing stock prices which was right skewed (not Normal). The correct answer is: The shape of the histogram of the \bar{x}-values is less skewed than (closer to Normal than) the shape of the histogram of the closing stock prices. This statement is false since both means are the same. The mean of all possible \bar{x} values doesn't get closer to μ, the mean of the population. Regardless of sample size, the mean of all possible \bar{x} values always equals μ, the mean of the population. Please don't confuse the mean of the sample, namely \bar{x}, with the mean of all possible \bar{x}'s. The mean of the sample, \bar{x}, gets closer to μ as n increases, but the mean of all possible \bar{x}'s always exactly equals μ. The correct answer is: False The standard deviation of the sampling distribution of \bar{x} decreases as sample size increases. This fact is reflected in the Law of Large Numbers which states that \bar{x} gets closer to \mu as n increases. The correct answer is: True The larger the sample size, the closer to Normal is the shape of the sampling distribution of \bar{x}. 
However, this fact only applies when we are sampling from a non-Normal population like the closing stock prices. We will learn in the next lesson that the shape of the histogram of all possible \bar{x} values IS Normal when the population from which we sample has a Normal shape. The correct answer is: True
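To make the comparison between n = 4 and n = 16 concrete, the sketch below applies the standard result that the standard deviation of all possible \bar{x} values is \sigma / \sqrt{n}. It treats the samples as independent draws and ignores any finite-population correction for the 1228 stocks, which is an assumption of this sketch. The function name is illustrative.

```python
import math

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sampling distribution of x-bar."""
    return sigma / math.sqrt(n)

mu = 26      # mean closing price (also the mean of all x-bar values)
sigma = 20   # standard deviation of closing prices

for n in (4, 16):
    # The mean of the x-bar values stays at mu; only the spread shrinks.
    print(n, mu, sd_of_sample_mean(sigma, n))
```

The mean column stays at 26 for both sample sizes, while the spread halves when the sample size is quadrupled.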
The number of accidents per week at a hazardous intersection varies with mean 2.2 and standard deviation 1.4. This distribution takes only whole-number values, so it is certainly not Normal. Let \bar{x} be the mean number of accidents per week at the intersection during a year (52 weeks). Consider the 52 weeks to be a random sample of weeks. What is the mean of the sampling distribution of \bar{x}?
The mean of the sampling distribution of \bar{x} equals μ which is 2.2. The correct answer is: Equal to 2.2
Probability is a measure of how likely an event is to occur. What is the probability for the event described in the following statement? "This event is certain. It will occur on every trial."
The probability of an event that is certain and will occur on every trial is 1.0. The correct answer is: 1.0
If you want to measure the strength of the linear relationship between two quantitative variables, which of the following should you use?
The purpose of correlation is to measure strength of linear relationships. The correct answer is: Correlation
We define the distribution of a random variable as "a list of the possible values and how often each value occurs." What word or words do we need to change in that definition to define `sampling distribution of \bar{x}?'
The random variable of a sampling distribution of \bar{x} is \bar{x}, the sample mean. So, the sampling distribution of \bar{x} is a list of the possible values of \bar{x} together with how often these values occur. The correct answer is: Change "of a random variable" to "of \bar{x}".
Coffee is a leading export from several developing countries. When coffee prices are high, farmers often clear forest to plant more coffee trees. Here are five years' data on prices paid to coffee growers in Indonesia and the percent of forest area lost in a national park that lies in a coffee-producing region: Price (cents per pound): 29, 40, 54, 55, 72. Forest lost (percent): 0.49, 1.59, 1.69, 1.82, 3.10. How would you describe the relationship between price and percent of forest lost? Construct a scatterplot to answer the question. The sum of products of the z-scores for x and y is 3.8203. What is the value for r?
The relationship is linear and positive. Because the points lie very close to a straight line, we say that it is very strong. The formula for r is r = \frac{1}{n - 1} \sum z_x z_y, the sum of the products of the z-scores for x and y divided by n - 1. For these data, since n - 1 = 5 - 1 = 4 and the sum of the products of the z-scores for x and y is 3.8203, r = \frac{1}{4}(3.8203) = 0.9551. The correct answer is: 0.9551
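The z-score formula for r can be sketched directly in Python using the five data pairs from the question. The helper names (`mean`, `sample_sd`, `correlation`) are illustrative. Because the card's sum of z-score products was rounded, the result may differ from the card in the last decimal place.

```python
import math

price = [29, 40, 54, 55, 72]                  # cents per pound (x)
forest_lost = [0.49, 1.59, 1.69, 1.82, 3.10]  # percent (y)

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(x, y):
    """r = (1 / (n - 1)) * sum of products of z-scores for x and y."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = sample_sd(x), sample_sd(y)
    z_products = [((a - x_bar) / s_x) * ((b - y_bar) / s_y)
                  for a, b in zip(x, y)]
    return sum(z_products) / (n - 1)

r = correlation(price, forest_lost)
print(round(r, 3))
```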
A large randomized trial was conducted to assess the efficacy of a new drug called Chantix® for smoking cessation compared with bupropion and a placebo. Chantix® is different from most other quit-smoking products, in that it targets nicotine receptors in the brain, attaches to them, and blocks nicotine from reaching them, while bupropion is an antidepressant often used to help people stop smoking. Smokers who were generally healthy and who smoked at least 10 cigarettes per day were assigned at random to take Chantix® (n = 352), bupropion (n = 329), or a placebo (n = 344). The study was double blind, with the response measure being continuous abstinence from smoking for weeks 9 through 12 of the study. What type of study is this? a. Observational study--simple random sample b. Observational study--stratified sample c. Observational study--multistage sample d. Experiment--completely randomized experiment e. Experiment--randomized block design
The smokers were randomly assigned to the drug cessation treatment groups so this is a completely randomized experiment. The correct answer is: Experiment--completely randomized experiment
Suppose we have a very right skewed population distribution where μ = 80 and σ = 20. For random samples of size n = 100, what is the standard deviation of the theoretical sampling distribution of \bar{x}?
The standard deviation of the sampling distribution of \bar{x} equals \frac{\sigma }{\sqrt{n}}=\frac{20}{\sqrt{100}}=2.0. This is less than σ = 20. We sometimes shorten the phrase "the standard deviation of the sampling distribution of \bar{x}" to "the standard deviation of \bar{x}". The correct answer is: Less than 20
What does the theoretical sampling distribution of \bar{x} consist of?
The theoretical sampling distribution of \bar{x} is based on the \bar{x}'s from all possible samples. It is not based on the results of a sample (i.e., one sample) nor is it the \bar{x} from one sample. It is not obtained by combining results from 150 or 200 samples either, but from all possible samples. The correct answer is: The values of the \bar{x}'s from all possible samples
A simple random sample of n = 9 students was taken. For each student, their score on an anxiety test (x) and their score on the first statistics exam (y) were recorded and are summarized in the following scatterplot. There is obviously a strong negative linear relationship between anxiety score and exam score. Does this imply that high anxiety causes a student to score low on an exam?
These data are from an observational study and cannot be used to establish causation. When an experiment cannot be performed, evidence from many studies, not just one, is needed to establish causation. The correct answer is: No, because causation cannot be concluded from an observational study and this study is an observational study.
In the following scatterplot X = number of Methodist ministers per year is plotted against Y = number of barrels of rum imported for the same year. The correlation coefficient is close to +1. Data were collected every ten years in Boston. What is one lurking variable that could be influencing the strength of this relationship? a. "Increase in population of Boston" is a lurking variable. As the population increased, so did the number of Methodist ministers and the demand for imported rum. b. "Methodist ministers drink a lot of rum" is a lurking variable. c. The data were collected only every ten years--leaving out the data in the gaps gives an inflated correlation coefficient, which could be a lurking variable. Since the correlation coefficient is very close to +1.0, can we say that the increase in number of Methodist ministers causes the increase in the number of barrels of imported rum?
These data were collected every ten years over a hundred and fifty year period. As time went by, the population of Boston grew and grew. With the growth in the population came an increase in rum consumption and an increase in the number of Methodist ministers. The correct answer is: "Increase in population of Boston" is a lurking variable. As the population increased, so did the number of Methodist ministers and the demand for imported rum. This is a classic case of why we say that high correlation does not imply causation. These data were not collected using an experiment; they are observational data. We will discuss this more in a later lesson. The correct answer is: No, because association does NOT imply causation for observational studies.
Use the table below for questions 10-12 to illustrate the idea of a sampling distribution in the case of a very small sample from a very small population. The population is the scores of 10 students on an exam. Student: score -- 0: 82, 1: 62, 2: 80, 3: 58, 4: 72, 5: 73, 6: 65, 7: 66, 8: 74, 9: 62. The parameter of interest is the mean score μ in this population. What is the mean of the 10 scores in the population? Note: This is the population mean μ.
These scores sum to 694. So, μ = 694/10 = 69.4. The correct answer is: 69.4
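Because this population is so small, the idea of "all possible samples" can be enumerated exactly. The sketch below computes μ and then lists every possible sample of size 2 drawn without replacement. The sample size of 2 is an assumption made for illustration, since the card does not state the sample size used in questions 10-12.

```python
from itertools import combinations

scores = [82, 62, 80, 58, 72, 73, 65, 66, 74, 62]

# Population mean mu
mu = sum(scores) / len(scores)

# x-bar for every possible sample of size 2 (assumed size, without replacement)
sample_size = 2
sample_means = [sum(sample) / sample_size
                for sample in combinations(scores, sample_size)]

# The mean of all possible x-bar values equals mu
mean_of_sample_means = sum(sample_means) / len(sample_means)
print(mu, len(sample_means), mean_of_sample_means)
```

The list `sample_means` is the theoretical sampling distribution of \bar{x} for this tiny population, and its mean matches the population mean up to floating-point rounding.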
True or false: The phrases "distribution of a sample" and "sampling distribution of \bar{x}" are synonymous.
These two may sound similar, but they are totally different. The distribution of a sample gives the distribution for data in one sample. The sampling distribution of \bar{x} gives the distribution of all possible \bar{x}'s from all possible samples. Be very careful with these two phrases. The correct answer is: False
True or false: Correlation coefficient measures the strength (and direction) of only the linear relationship between x and y.
This is a true statement. Correlation coefficient measures strength of relationship between x and y, but only the linear component of the relationship between x and y. The correct answer is: True
Suppose we have a very right skewed population distribution where μ = 80 and σ = 20. For random samples of size n = 100, what is the shape of the theoretical sampling distribution of \bar{x}?
This is an application of the Central Limit Theorem which states that for large random samples, the shape of the sampling distribution of \bar{x} is approximately Normal. The correct answer is: Approximately Normal
True or false: The probability that an event does not occur equals one minus the probability that the event does occur.
This is an important and often used probability rule. The correct answer is 'True'.
True or false: Mathematically, slope equals \frac{\text{rise in } y}{\text{run in } x}.
This is how slope is defined in mathematics. The correct answer is: True
When is the sampling distribution of \bar{x} approximately Normally distributed when sampling from non-Normal populations?
This is the application of the Central Limit Theorem. Note the difference between the wording for this question and the wording for the previous question. In the previous question, the sampling distribution of \bar{x} is "exactly'' Normal because the population is Normal. However, here the sampling distribution of \bar{x} is only "approximately'' Normal because the shape of the population is non-Normal. The correct answer is: When the sample is large and SRS.
Which one of the following is a statistic? Select one: a. The proportion of all applicants to the law school who were accepted. b. The mean starting salary of all graduates at the law school in this year's class. c. The standard deviation of the starting salary of all law school graduates in this year's class. d. The median starting salary of a sample of law school graduates in this year's class.
This is the only answer that describes a summary measure on a sample. The correct answer is: The median starting salary of a sample of law school graduates in this year's class.
What does the Law of Large Numbers tell us?
This is the statement of the Law of Large Numbers as given in both the text and in the StatTutor lesson on "Statistical Estimation and the Law of Large Numbers." Note: The law of large numbers refers to the mean of a sample and not to the sampling distribution of \bar{x}. The correct answer is: As sample size increases, the variable \bar{x} from a random sample gets closer and closer to μ.
True or false: The proportion of voters in a sample who favor a certain candidate is always a statistic.
This is true because any characteristic of a sample is a statistic. So, the proportion of a sample who favor a certain candidate is a statistic. The correct answer is: True
True or false: If the value of X tells us nothing about the value of Y, then r^2 = 0%.
This is true because if X tells us nothing about the value of Y, then none of the variation in Y is explained by X so r^2 = 0%. The correct answer is: True
True or false: Values of correlation, r, close to -1.0 indicate weak linear relationships and values close to +1 indicate strong linear relationships.
This statement is false because values of r close to -1 indicate strong negative linear relationships and values close to +1 indicate strong positive linear relationships. Values of r close to 0.0 indicate no "linear" relationship. The correct answer is: False
True or false: The correlation between x and y equals the correlation between y and x (i.e., changing the roles of x and y does not change r).
This statement is true because correlation coefficient is the average of the products of the z-scores for x and the z-scores for y. Since x times y equals y times x, the z-score for x times the z-score for y equals the z-score for y times the z-score for x. Interchanging x and y in the computation does not change the value of r. The correct answer is: True
True or false: Correlation coefficient, r, does not change if the unit of measure for either X or Y is changed.
This statement is true because the correlation coefficient has no unit of measure. This is because it is computed from z-scores that have no unit of measure and therefore, the change in unit of measure for either X or Y or both is canceled out. The correct answer is: True
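Both properties of r (unchanged when x and y are swapped, unchanged when units are changed) can be checked numerically. The sketch below uses a small made-up data set of heights and weights. The data values and helper names are hypothetical and are not taken from the course.

```python
import math

def correlation(x, y):
    """r as the sum of z-score products divided by n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return sum(((a - x_bar) / s_x) * ((b - y_bar) / s_y)
               for a, b in zip(x, y)) / (n - 1)

# Hypothetical data, for illustration only.
height_in = [60, 62, 65, 68, 71]        # inches
weight_lb = [115, 120, 140, 155, 170]   # pounds

# Same measurements expressed in metric units.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]

r_original = correlation(height_in, weight_lb)
r_swapped = correlation(weight_lb, height_in)   # roles of x and y interchanged
r_metric = correlation(height_cm, weight_kg)    # units changed

print(r_original, r_swapped, r_metric)
```

The three values agree up to floating-point rounding, because the z-scores remove the units and multiplication does not depend on order.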
True or false: All processes have variation.
This statement is true. Whether the process is input, processing or output, a certain amount of natural variation is expected. The correct answer is: True
True or False: You should not predict for a data point that lies outside the range of your data.
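True! Predicting outside the range of your data is called extrapolation, and the linear pattern seen in the data may not hold there.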
True or False: A curved pattern in the scatterplot will always have a curved pattern in the residual plot.
True! A curved pattern in the scatterplot will always have a curved pattern in the residual plot.
What do we use to describe the histogram constructed from all the \bar{x}'s from all possible samples of the same size from a population?
We always describe histograms using shape, center and spread. The correct answer is: Shape, center and spread
The statistical definition of probability says that probability is based on the fraction of how many times an outcome occurs divided by the number of trials. On the basis of this, why is the probability of obtaining a "1'' when tossing a die equal to 1/6? Select one: a. The fraction of "1's'' gets closer to 1/6 as more and more tosses are made. b. A die has six sides of which a "1'' is one of the six sides. c. The odds of obtaining a "1'' are 6 to 1. d. 1/6 works in all the probability computations.
We defined probability in question 5 as the fraction of times the outcome occurs in many, many trials of a random phenomenon. On the basis of this definition, the probability is one-sixth because the fraction of "1's'' gets closer and closer to one-sixth as more and more trials are made. The correct answer is: The fraction of "1's'' gets closer to 1/6 as more and more tosses are made.
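The long-run-fraction idea can be illustrated with a small simulation. This is a sketch using Python's pseudo-random number generator with a fixed seed, so it imitates a fair die and is not a real experiment. The function name and the toss counts are illustrative choices.

```python
import random

def fraction_of_ones(num_tosses, seed=0):
    """Simulate tosses of a fair die and return the fraction that show a 1."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ones = sum(1 for _ in range(num_tosses) if rng.randint(1, 6) == 1)
    return ones / num_tosses

# The fraction of 1's tends to settle near 1/6 as the number of tosses grows.
for num_tosses in (60, 600, 6000, 60000):
    print(num_tosses, fraction_of_ones(num_tosses))
```

With few tosses the fraction can be far from 1/6, and with many tosses it stays close to 1/6. That long-run behavior is what the statistical definition of probability describes.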
Assume that the slope of the regression line used to predict the husband's height (Y) from the wife's height (X) in young couples is 2.03. What is the correct interpretation of this slope? Select one: a. We expect that for every one inch increase in heights of wives, husband's heights increase by 2.03 inches, on average. b. In young couples, 2.03% of the height of husbands can be explained by the height of the wives. c. When the height of the wife in a young couple is 0 inches we expect that her husband will be 2.03 inches tall. d. We expect that for every one inch increase in the height of a wife in a young couple a husband's height will always increase 2.03 inches.
The slope is the average change in the predicted y for a one-unit increase in x, so it describes an average tendency and not a guaranteed change for every couple. The correct answer is: We expect that for every one inch increase in heights of wives, husband's heights increase by 2.03 inches, on average.
We have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means \bar{x} and \bar{y} and the standard deviations s_x and s_y of the two variables, and their correlation r. The least-squares regression line is the line \hat{y} = a + bx with slope b = r \frac{s_y}{s_x} and intercept a = \bar{y} - b\bar{x}.
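The recipe on this card can be sketched as a small Python function that turns the five summary statistics into a slope and an intercept. The function name `least_squares_line` is hypothetical. The numbers plugged in are the husband and wife height summaries from the earlier card in this set.

```python
def least_squares_line(x_bar, y_bar, s_x, s_y, r):
    """Return (intercept a, slope b) of the line y-hat = a + b*x."""
    b = r * s_y / s_x        # slope: b = r * (s_y / s_x)
    a = y_bar - b * x_bar    # intercept: a = y-bar - b * x-bar
    return a, b

# Wives' heights are x, husbands' heights are y (inches).
a, b = least_squares_line(x_bar=64, y_bar=69.3, s_x=2.7, s_y=2.8, r=0.5)

# Predicting at x = x-bar returns y-bar, so the line passes through
# the point (x-bar, y-bar).
y_hat_at_mean = a + b * 64
print(round(a, 3), round(b, 4), round(y_hat_at_mean, 1))
```

Because the intercept is defined as \bar{y} - b\bar{x}, the check on the last lines holds for any data set, not only for this example.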
Scores on the National Assessment of Educational Progress 12th-grade mathematics test for the year 2000 are approximately Normal with mean 300 points (out of a possible 500 points) and standard deviation 35 points. We want to compute the probability that a randomly selected 12th-grade student scores below 310. Why can we use the standard Normal table to find the probability? Select one: a. Because we want to find the probability on the score of a randomly selected student and scores have an approximately Normal distribution. b. Because we want to find the probability on a sample mean which has a Normal distribution since scores are approximately Normally distributed. c. Because we want to find the probability on a sample mean from a large random sample which has an approximately Normal distribution according to the Central Limit Theorem. d. We actually cannot compute the probability on the score using the standard Normal table since the shape of the distribution of scores is not Normal.
We want to compute the probability on an individual student's score. Since these scores have an approximately Normal distribution, we can use the standard Normal table to find the probability. The correct answer is: Because we want to find the probability on the score of a randomly selected student and scores have an approximately Normal distribution.
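The table lookup described here can be mirrored in code. The following is a minimal sketch, assuming Python's standard statistics.NormalDist as a stand-in for the standard Normal table; the variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=300, sigma=35)  # approximate distribution of individual scores

# Standardize the score of 310, as one would before using the standard Normal table
z = (310 - 300) / 35

p_direct = scores.cdf(310)      # P(score < 310) from the N(300, 35) distribution
p_via_z = NormalDist().cdf(z)   # same probability from the standard Normal distribution

print(round(z, 2), round(p_direct, 4), round(p_via_z, 4))
```

Both routes give the same probability, roughly 0.61, because standardizing only rescales the score. No sample mean or Central Limit Theorem is involved, since the question is about a single student.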
True or false: Suppose we take all possible samples of the same size from a population and for each sample, we compute \bar{x}. The standard deviation of these \bar{x} values will be less than the standard deviation of the population from which the samples were taken.
We'll soon learn that the standard deviation of all possible \bar{x}-values from all possible samples is \dfrac{\sigma}{\sqrt{n}} where \sigma is the standard deviation of the population. Thus, the standard deviation of the \bar{x}-values from all possible samples of size n > 1 is less than the standard deviation of the population. The correct answer is: True
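The claim that the \bar{x}-values vary less than the population can be illustrated by simulation. The following is a minimal sketch, assuming a made-up population of 10,000 values and 2,000 random samples of size 25 (a stand-in for "all possible samples", which would be far too many to list); all names and numbers are hypothetical.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population, made up for illustration
population = [rng.uniform(0, 100) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # population standard deviation

n = 25
# Draw many random samples of size n and record each sample mean
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(2_000)]
sd_of_means = statistics.stdev(sample_means)

print(round(sigma, 2), round(sigma / n ** 0.5, 2), round(sd_of_means, 2))
```

The standard deviation of the simulated sample means should come out close to \sigma/\sqrt{n} and well below \sigma itself.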
Suppose you have played a game many, many times, winning sometimes and losing sometimes. Can you use the results of playing the game to predict with certainty whether you will win the game on the next try?
When a random phenomenon occurs only once, you cannot predict the outcome. It wouldn't be a random phenomenon if we could predict the outcome. Since playing the game is a random phenomenon, we cannot predict with certainty the outcome. The correct answer is: no
Suppose the y-intercept of the regression equation is -22.000. What does this y-intercept mean in context of the problem?
When the size of a house is 0 square feet, we would expect the house to sell for -22.000 thousand dollars.
True or false: When taking random samples of size n from the same population, the value of a sample statistic does not vary (i.e., change) from one sample to the next.
When we take random samples of size n from the same population, the value of the sample statistic varies from sample to sample. Conversely, the value of the population parameter stays the same since it is a population characteristic, not a sample characteristic. The correct answer is: False
True or false: The statistical model for a straight line is \hat{y} = a + bx.
While the traditional mathematical model is y = mx + b where m = slope and b = y-intercept, statisticians switch slope times x with the y-intercept in the equation so that additional explanatory variables (x's) may be added. The correct answer is: True
The formula for a regression line is what?
Y' = bX + A, where Y' is the predicted score, b is the slope of the line, and A is the Y-intercept. The equation for the line in Figure 2 is
Can you predict how many calories a child consumes based on how long they remain at the lunch table? In the scatterplot below, data on a sample of 20 toddlers observed over several months at a nursery school are plotted. "Time" is the average number of minutes a child spent at a table when lunch was served. "Calories" is the average number of calories consumed by the child during lunch. The least squares regression equation is: calories = 560.65125 - 3.0770733 × time (i.e., \hat{y} = 560.65 - 3.077x). Interpret the y-intercept in context even though it DOES NOT make sense. a. On average, a child consumes 560.65 calories during each lunch period. b. Children who spend no time at the lunch table consume 560.65 calories on average. c. Children spend an average of 3.077 hours at the table during lunch. d. Children who consume zero calories spend an average of 3.077 hours at the table during lunch. True or false: Referring to the equation given in question 6, we can say that the average number of calories consumed decreases as time increases. Referring to the equation given above in question 6, interpret the slope in context. Select one: a. For every additional minute a child spent at the table during lunch, the average number of calories consumed decreased by 3.077. b. For every additional minute a child spent at the table during lunch, the average number of calories consumed increased by 3.077. c. For every additional calorie consumed by a child during lunch, the child spent an average of 3.077 more minutes at the table. d. For every additional calorie consumed by a child during lunch, the child spent an average of 3.077 fewer minutes at the table. e. Cannot be determined from information given.
The y-intercept tells us the value of y when x equals zero. The y-intercept is 560.65 calories. Since x is the average time spent at the table, the y-intercept tells us that the number of calories consumed on average by children who spent no time at the table is 560.65 calories. This does not make sense because it is an extrapolation: no data were collected for x near zero. The correct answer is: Children who spend no time at the lunch table consume 560.65 calories on average. Since the slope is negative, as x increases, y decreases. "x" is "time" and "y" is "calories." Hence, the (average) number of calories (consumed) decreases as time increases. The correct answer is: True. The slope tells us the average change in y for every one unit increase in x. A one unit increase in x is an additional minute spent at the table during lunch. The slope is -3.077. Since the slope is negative, y decreases as x increases. Our interpretation of the slope is thus: for every additional minute a child spends at the table during lunch, the number of calories consumed decreases by 3.077 on average. The correct answer is: For every additional minute a child spent at the table during lunch, the average number of calories consumed decreased by 3.077.
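The intercept and slope interpretations above can be read straight off the fitted equation. The following is a minimal sketch using the rounded coefficients from the question; the function name predicted_calories and the times 30 and 31 minutes are illustrative assumptions, not values from the data.

```python
INTERCEPT = 560.65  # predicted calories at time = 0 (an extrapolation)
SLOPE = -3.077      # change in predicted calories per additional minute

def predicted_calories(minutes):
    """Predicted average calories for a given average time at the table."""
    return INTERCEPT + SLOPE * minutes

# y-intercept: the prediction when x = 0
print(predicted_calories(0))  # 560.65

# slope: difference between predictions one minute apart
print(round(predicted_calories(31) - predicted_calories(30), 3))  # -3.077
```

The one-minute difference is the same whichever pair of adjacent times is chosen, which is why the slope is read as a change "for every additional minute".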
What is the symbol for sample mean?
\bar{x} is the symbol we use to denote sample mean. It represents a statistic. The correct answer is: \bar{x}
The amount of time (X) a clay pot is in a kiln can be used to predict the number of impurities (Y) in the pot. Suppose the average time a clay pot is in a kiln is 32 minutes and the average number of impurities in clay pots is 10. Also suppose that the slope of a regression equation using the time a clay pot is in a kiln to estimate the number of impurities in the pot is b = 0.60. What is the value of the y-intercept, a?
a = \bar{y} - b\bar{x} = 10 - 0.60(32) = 10 - 19.2 = -9.2
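The intercept calculation for the kiln problem can be verified in a few lines. The following is a minimal sketch using the averages and slope given in the question; the variable names are illustrative.

```python
x_bar = 32  # average minutes in the kiln
y_bar = 10  # average number of impurities
b = 0.60    # slope of the regression line

# Intercept: a = y-bar minus slope times x-bar
a = y_bar - b * x_bar
print(round(a, 2))  # -9.2

# Check: the least-squares line passes through the point (x-bar, y-bar)
print(round(a + b * x_bar, 2))  # 10.0
```

The second print reflects a general property of least-squares lines: plugging the mean of x into the equation returns the mean of y.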
What is the range of possible values for r?
a. 0.0 ≤ r b. 0.0 ≤ r ≤ +1.0 c. -1.0 ≤ r ≤ +1.0 d. r ≤ +1.0 e. r can take any value on the real number line. The correct answer is: -1.0 ≤ r ≤ +1.0
Which one of the following statements about r is NOT correct? a. The value of r can be misleading if the relationship between x and y is non-linear. b. r is measured on a scale from 0 to +1. c. Computing r is only valid for quantitative variables. d. r has no unit of measure. e. The sign on r gives the direction of the relationship between x and y.
r is measured on a scale from -1 to +1, not 0 to +1. The correct answer is: r is measured on a scale from 0 to +1.
r = -0.988434. What percent of the observed variation in farm population is accounted for by linear change over time?
r^2 tells us the percent of observed variation in farm population that is accounted for by linear change over time. r^2 = (-0.988434)^2 = 0.977, so r^2 = 97.7%. The correct answer is: 97.7%
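The squaring step can be checked quickly. The following is a minimal sketch using the value of r from the question; the variable names are illustrative.

```python
r = -0.988434

# Squaring removes the sign, so r^2 always lies between 0 and 1
r_squared = r ** 2

print(round(r_squared, 3))  # 0.977
print(f"{r_squared:.1%}")   # 97.7%
```

The negative sign of r says farm population falls over time; r^2 only reports how much of the variation the line accounts for, not the direction.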
(Review question) Data were recorded on a simple random sample of elementary school children. Researchers reported a correlation of r = -0.79 between the time spent watching TV (x) and the time spent doing homework (y) for a single day. Which of the following is a correct conclusion that can be drawn from this information? Select one: a. If a student watches too much TV, then they are certain to fail out of school. b. For every hour that students spend watching TV, they only spend 0.79 hours doing homework. c. (-0.79)^2 or 62% of the variation in time spent doing homework can be explained by the time spent watching TV. d. Limiting the time that a student spends watching TV will cause an increase in the average amount of time spent on homework.
r^2 = (-0.79)^2 = 0.62, and we interpret r^2 as the percentage of total variation in our y variable (time spent doing homework) that can be explained by our x variable (time spent watching TV). The correct answer is: (-0.79)^2 or 62% of the variation in time spent doing homework can be explained by the time spent watching TV.
What does the sum of squared residuals measure? a. Total variation in y (The variability of the y's about \bar{y}.) b. Explained variation (The variability of the predicted y's (i.e., \hat{y}'s) about \bar{y}.) c. Unexplained variation (The variability of the y's about the regression line.)
residual = y - \hat{y}. Thus, residuals measure the deviations of the y's about the regression line. So, sum of squared residuals measures variability of the y's about the regression line. This is unexplained variation. (Why else would we call "y - \hat{y}" the residual or error?) The correct answer is: Unexplained variation (The variability of the y's about the regression line.)
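The three kinds of variation named in this question can be computed side by side. The following is a minimal sketch, assuming five made-up data points (hypothetical, for illustration only) and the usual least-squares slope formula; the variable names are illustrative.

```python
import statistics

# Hypothetical data, made up for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)

# Least-squares slope and intercept
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar
y_hats = [a + b * x for x in xs]

total = sum((y - y_bar) ** 2 for y in ys)                      # y's about y-bar
explained = sum((yh - y_bar) ** 2 for yh in y_hats)            # y-hats about y-bar
unexplained = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))  # sum of squared residuals

print(round(total, 3), round(explained, 3), round(unexplained, 3))  # 6 3.6 2.4
print(round(explained / total, 3))  # 0.6, which equals r^2 for these data
```

For a least-squares line, total variation splits into explained plus unexplained variation, and the sum of squared residuals is the unexplained part.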
For a biology project, you measure the weight in grams and the tail length in millimeters (mm) of a group of mice. The equation of the least-squares line for predicting tail length from weight is predicted tail length = 20 + 3 × weight What is the predicted tail length for a mouse weighing 18 grams? Referring to the previous question, suppose a mouse weighing 20 grams has a 78 mm tail. What is the residual for this mouse? Be sure to think about whether the y value of 78 is above the line (and thus has a positive residual) or below the line (and thus has a negative residual).
predicted tail length = 20 + 3 \times weight = 20 + 3 \times 18 = 20 + 54 = 74. The correct answer is: 74 mm. For the 20-gram mouse, \hat{y} = 20 + 3 \times weight = 20 + 3 \times 20 = 20 + 60 = 80, so residual = y - \hat{y} = 78 - 80 = -2. The observed value lies below the line, so the residual is negative. The correct answer is: -2 mm
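Both mouse calculations follow the same pattern: predict from the line, then subtract the prediction from the observed value. The following is a minimal sketch using the equation from the question; the function names are illustrative.

```python
def predicted_tail_length(weight_g):
    """Predicted tail length in mm from weight in grams: 20 + 3 * weight."""
    return 20 + 3 * weight_g

def residual(observed_mm, weight_g):
    """Residual = observed y minus predicted y."""
    return observed_mm - predicted_tail_length(weight_g)

print(predicted_tail_length(18))  # 74
print(residual(78, 20))           # -2
```

A negative residual means the observed point sits below the regression line; a positive one means it sits above.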