WGU C955 - Module 6: Correlation & Regression


{{ Scatterplot of widget production levels and incidence of defects. The y-values decrease as the x-values increase, and the points nearly form a straight line. }} Which of the following is the best estimate of the correlation coefficient for the above scatterplot? a) .90 b) .5 c) −.95 d) −.3

-.95

Which correlation coefficient suggests the strongest correlation? a −0.9 b −0.3 c 0.1 d 0.8

-0.9

Consider the following equation: y + 8x = −2. What is the y-intercept of a line with this equation? a −2 b 2 c 8 d −8

-2

Consider the following equation: y + 9x = −4. What is the y-intercept of a line with this equation? a) −9 b) −4 c) 9 d) 4

-4
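Both intercept questions reduce to isolating y: subtracting the x-term from both sides of y + ax = c gives y = −ax + c, so the constant on the right-hand side is the y-intercept. A minimal Python sketch; `to_slope_intercept` is a hypothetical helper that only handles equations written in the form y + ax = c.

```python
def to_slope_intercept(a, c):
    """Rewrite y + a*x = c as y = m*x + b and return (m, b).

    Subtracting a*x from both sides gives y = -a*x + c,
    so the slope m is -a and the y-intercept b is c.
    """
    return -a, c


print(to_slope_intercept(8, -2))  # (-8, -2): y = -8x - 2, intercept -2
print(to_slope_intercept(9, -4))  # (-9, -4): y = -9x - 4, intercept -4
```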

Extrapolation is always appropriate. True or False? a) True b) False

False

Using a line of best fit in slope-intercept form, y = mx + b, what must be true if there is a negative correlation? a m must be > 0. b m must be < 0. c Both m and b must be < 0. d Both m and b must have opposite signs.

m must be <0

What must be true about the dots on a scatterplot if there is no correlation? a The dots are far away from the line of best fit. b The dots form no recognizable linear pattern. c The dots are evenly spaced around the graph. d The dots form a non-linear pattern.

The dots form no recognizable linear pattern.

{{ Scatterplot of Free Media vs. Quarterly Revenue. The linear equation for this scatterplot is y = 6.4957x + 18.514. }} Zenish Corp. is measuring the effect of free news media on its quarterly revenue. Using the scatterplot Free Media vs. Quarterly Revenue, what would the total quarterly revenue be for a year that had 21 news articles about Zenish Corp.? Round your answer to the nearest whole number. a) $154,000 b) $155,000 c) $153,000 d) $156,000

$155,000
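Reading a prediction off the regression line means substituting x into the equation. The sketch below assumes, as the answer choices imply, that y is quarterly revenue in thousands of dollars and x is the number of news articles; `predict` is a hypothetical helper.

```python
def predict(slope, intercept, x):
    """Evaluate the fitted line y = slope * x + intercept at x."""
    return slope * x + intercept


# Assumption: y is revenue in thousands of dollars, x is the article count.
revenue_thousands = predict(6.4957, 18.514, 21)  # 136.4097 + 18.514
print(round(revenue_thousands, 4))               # 154.9237
print(f"${round(revenue_thousands) * 1000:,}")   # $155,000
```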

The following regression equation estimates total profit ($, measured in 1000s) based on x units produced (in 1000s), with data gathered from x = 5 thousand units to x = 35 thousand units: y = 28.07 + 6.49x. Determine the total profit for a production level (x) of 25 thousand units, rounded to the nearest thousand dollars. a $190,000 b $864,000 c $708,000 d $59,000

$190,000
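This line was fit on data from 5 to 35 thousand units, so predicting at x = 25 is interpolation. One way to make the warning about extrapolation concrete is to refuse predictions outside the observed range. This is a sketch with a hypothetical helper, `predict_in_range`; raising an error is a design choice made for illustration, not a rule stated in the course.

```python
def predict_in_range(intercept, slope, x, x_min, x_max):
    """Evaluate y = intercept + slope * x, refusing to extrapolate.

    The fitted line is only supported by data between x_min and x_max,
    so predictions outside that interval are rejected.
    """
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x={x} is outside the data range [{x_min}, {x_max}]: extrapolation"
        )
    return intercept + slope * x


# Profit (in $1000s) at 25 thousand units; data ran from 5 to 35 thousand units.
profit = predict_in_range(28.07, 6.49, 25, 5, 35)
print(round(profit, 2))  # 190.32, i.e. about $190,000
```

Calling it with x = 50 would raise `ValueError`, since 50 thousand units lies outside the 5 to 35 thousand range the line was built from.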

Consider the following equation: y = 7.5x + 9.3. What is the slope of a line with this equation? a) 9.3 b) 7.5 c) −9.3 d) −7.5

7.5

Using the scatterplot below, estimate a possible correlation coefficient. {{ Scatterplot displaying data points distributed along a line from lower right to upper left. The points fall very close to the line. }} a 0.9 b 0.6 c −0.85 d −1

-0.85

Which correlation coefficient suggests the weakest correlation? a −0.9 b −0.3 c 0.1 d 1

0.1

Which of the following correlation coefficients describes a strong, positive correlation? a −0.99 b 0.69 c 0.95 d 0.21

0.95

{{ Scatterplot showing two data points, (2, 3.75) and (4, 6.5). }} Due to budget cuts, a computer scientist had her funding reduced for a research project. Her remaining funding enabled her to collect only 2 data points, (2, 3.26) and (4, 6.52). The data points are plotted above. What is your estimate of the correlation coefficient? a) −1 b) Zero c) 1 d) Cannot be determined

1
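Two points always lie exactly on a line, so the Pearson correlation coefficient can only be +1 or −1 (assuming the points differ in both x and y); here the line rises, so r = 1. A minimal Python sketch of the standard Pearson formula, r = Sxy / sqrt(Sxx · Syy); `pearson_r` is a hypothetical name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Points from the question text and from the plot description:
# both pairs rise from left to right, so r is 1 (up to rounding).
print(round(pearson_r([2, 4], [3.26, 6.52]), 6))  # 1.0
print(round(pearson_r([2, 4], [3.75, 6.5]), 6))   # 1.0
```

On Python 3.10 and later, `statistics.correlation(xs, ys)` from the standard library computes the same coefficient.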

Consider the following equation: y = −3.2x + 2.8. What is the y-intercept for this equation? a −3.2 b 3.2 c 2.8 d −2.8

2.8

A car company wanted to study the relationship between the weight of a car and its average gas mileage, so it recorded the weight and average gas mileage of several of its cars and plotted the data on the scatterplot below. The least squares regression line for the data is y = −0.0084x + 48.8, where x is the weight of the car in pounds and y is the average gas mileage (in miles per gallon). What is the predicted gas mileage for a car that weighs 2500 pounds? {{ Scatterplot illustrating Average Miles per Gallon and Weight of car in pounds. }} a) 29.18 miles per gallon b) 26.4 miles per gallon c) 27.8 miles per gallon d) 69.8 miles per gallon

27.8 miles per gallon
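Plugging 2,500 pounds into the line is simple arithmetic, but binary floating point can introduce tiny rounding errors in results like this one. A sketch using Python's standard `decimal` module keeps the calculation exact to the digits given in the regression equation.

```python
from decimal import Decimal

slope = Decimal("-0.0084")   # change in mpg per additional pound
intercept = Decimal("48.8")  # y-intercept of the fitted line
weight = Decimal("2500")     # car weight in pounds

mpg = slope * weight + intercept  # -21.0000 + 48.8
print(mpg)                        # 27.8000
```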

A restaurant owner wants to see if she can use low temperatures to boost soup sales at her restaurant. To study a possible relationship between temperature and soup sales, she collects data throughout the year on the temperature of a given day (ranging between 20 degrees F and 90 degrees F) and the amount of soup sold that day. She performs a linear regression and comes up with a least squares regression line of y=−1.64x+176.6 with r=−0.89 where x is the temperature (in degrees F) and y is the number of daily soup sales. How much soup should she expect to sell on a day that is 50 degrees F? Round to the nearest integer. a 177 orders of soup b 95 orders of soup c 259 orders of soup d 101 orders of soup

95 orders of soup

When linear regression is used to show there is a linear association between two variables, we know the relationship is: a A causation b A correlation c Neither A nor B d Both A and B

A correlation

What is the most appropriate definition of a scatterplot? a A graph that uses dots to demonstrate relationships between two categorical variables. b A graph where lines are shown to represent positive or negative trends. c A graph where the explanatory and response variables are plotted as ordered pairs. d A graph where a positive causation is always represented with dots forming a straight line.

A graph where the explanatory and response variables are plotted as ordered pairs.

Inappropriate sampling can occur during regression analysis. What is (are) an example(s) of inappropriate sampling? a Non-proportional sampling b Small sample sizes c Non-random exclusion of a population subset in the sample d A, B, and C

A, B, and C

What is (are) a potential problem(s) that can occur when attempting to use regression analysis? a Extrapolation b Lurking variables c Inappropriate sampling d A, B, and C

A, B, and C

{{ Scatterplot of advertising dollars spent vs. percent increase in sales. The linear equation for this scatterplot is y = 0.4546x + 64.133. }} Using the scatterplot above, what can we expect the percent increase in sales to be for an advertising expenditure of $165 thousand? Round your answer to the nearest whole number. a Around 121% b Around 139% c Around 161% d Around 198%

Around 139%
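For multiple-choice prediction questions, one quick check is to evaluate the line and pick the closest answer choice. A sketch; `closest_choice` is a hypothetical helper, and the choices are the percentages listed in the question.

```python
def closest_choice(value, choices):
    """Return the answer choice nearest to value."""
    return min(choices, key=lambda c: abs(c - value))


# y = 0.4546x + 64.133 with x = 165 (thousands of advertising dollars)
pct_increase = 0.4546 * 165 + 64.133
print(round(pct_increase, 3))                              # 139.142
print(closest_choice(pct_increase, [121, 139, 161, 198]))  # 139
```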

{{ Scatterplot of distance from city center vs. rent. The linear equation for this scatterplot is y = −0.0105x + 10.381. }} Using the scatterplot Distance from City Center vs. Monthly Rent, how far can we expect an apartment to be from the city center if the monthly rent is $980? Round your answer to the nearest mile. a Around 54 miles b Around 55 miles c Around 56 miles d Around 57 miles

Around 55 miles
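Here the known value is the rent, so the line has to be solved for x rather than evaluated for y. The plot description gives y = −0.0105x + 10.381 but does not state units. The sketch below assumes x is distance in miles and y is monthly rent in hundreds of dollars; this is an assumption, chosen because it is the reading that reproduces the listed answer of about 55 miles. `solve_for_x` is a hypothetical helper.

```python
def solve_for_x(slope, intercept, y):
    """Invert y = slope * x + intercept: x = (y - intercept) / slope."""
    if slope == 0:
        raise ValueError("a horizontal line has no unique x for a given y")
    return (y - intercept) / slope


# Assumption: x is distance in miles and y is monthly rent in hundreds of dollars,
# so a rent of $980 is entered as 9.80.
miles = solve_for_x(-0.0105, 10.381, 9.80)
print(round(miles, 2))  # 55.33
print(round(miles))     # 55
```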

A relationship between two or more variables is known as a(n) ________. a Causation b Association c Cause and effect d Correlation

Association

A study is done relating computer programming aptitude to typing speed. In this case, what type of variable would the amount of computer programming experience be considered, if it was not measured in the study? a Independent variable b Confounding variable c Dependent variable d Lurking variable

Lurking variable

Please select the correct definition for least squares. a) A method that minimizes the squared distances of data points from a line that captures the trend in paired data. b) The criteria by which the best-fit line is selected for paired data c) Both A & B d) None of the above

Both A & B
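For a straight line, the least squares criterion has a closed-form solution: the slope is m = Sxy / Sxx and the intercept is b = mean(y) − m · mean(x), where Sxy and Sxx are sums of products of deviations from the means. A minimal Python sketch; `least_squares_line` is a hypothetical name and the points are made up for illustration only.

```python
def least_squares_line(xs, ys):
    """Return (m, b) for the line y = m*x + b that minimizes the sum of
    squared vertical distances between the points and the line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, b


# Hypothetical points, made up for illustration.
m, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(m, 4), round(b, 4))  # 0.6 2.2
```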

The b in y=mx+b is what? a) The point at which the line crosses the y-axis b) The value of y when x=0. c) Neither A nor B d) Both A and B

Both A and B

{{ Percentage of people age 12 and older who watched a movie in the past month after viewing an ad for it targeted to their age group. All ages: 7.5% of the population, 5.5% of men, and 9.5% of women watched a movie. Ages 12 to 17: 5.5% of the population, 4% of men, and 7.5% of women. Ages 18 to 39: 7.5% of the population, 5.5% of men, and 9% of women. Ages 40 to 59: 9.5% of the population, 7% of men, and 12% of women. Ages 60 and over: 5% of the population, 3.5% of men, and 7% of women. }} You work in marketing at an independent movie studio and are measuring the effect of targeted ads by gender and age. Based on the results shown, what do you think the relationship is between age and ad effectiveness? a) It is positively correlated b) It is negatively correlated c) No correlation d) Can't tell from this display

Can't tell from this display

A variable not included in the study that is related to the measured variables in a study is called a ____________. a Independent variable b Confounding variable c Dependent variable d Lurking variable

Lurking variable

Using the scatterplot below, what type of correlation is suggested? {{ Scatterplot displaying data points distributed along a line from upper left to lower right. The points fall relatively close to the line. }} a No correlation b Strong positive c Moderate negative d Moderate positive

Moderate negative

Using the scatterplot below, what type of correlation is suggested? {{ Scatterplot displaying data points distributed from lower left to upper right. Points are relatively close to one another. }} a No correlation b Weak negative positive c Moderate positive d Strong negative

Moderate positive

{{ Scatterplot showing the relationship between productivity and sick days. The points move loosely up and to the right. }} To better understand employee burn-out, Kinetic Inc. is looking at the relationship between the productivity of its sales force (measured in the number of cold calls per work day, averaged across the work year) and the number of sick days taken in a year. The scatterplot shows the results of their data gathered from company records. What trend do you see? a) Employees who make more phone calls per day tend to take more sick days. b) There is no relationship between the average number of phone calls per day and sick days. c) As the number of sick days increases, the employee averages a lower number of phone calls. d) All of the above

Employees who make more phone calls per day tend to take more sick days.

Suppose there is a linear regression equation y=10+.5x where x is equal to money spent on advertising, measured in dollars, and y is equal to people visiting the company website. Which of the following is the correct interpretation of this slope? a) For every additional two dollars spent on advertising, another person clicks on the company website. b) For every .5 reduction in advertising dollars, another person does not click on the company website. c) For every 10 dollars spent on advertising, there is a corresponding .5 of a person not visiting the website. d) For every 10 dollars spent on advertisement, one additional person clicks on the company website.

For every additional two dollars spent on advertising, another person clicks on the company website.
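A slope of .5 means half a visitor per advertising dollar, which is easier to read as 1 / 0.5 = 2 dollars per additional visitor. A sketch using Python's standard `fractions` module so the arithmetic prints cleanly; `visitors` is a hypothetical helper wrapping the equation from the question.

```python
from fractions import Fraction

SLOPE = Fraction(1, 2)  # m = .5 website visitors per advertising dollar
INTERCEPT = 10          # b = predicted visitors with $0 of advertising


def visitors(dollars):
    """Predicted website visitors from y = 10 + .5x."""
    return INTERCEPT + SLOPE * dollars


print(1 / SLOPE)                      # 2 dollars per additional visitor
print(visitors(102) - visitors(100))  # 1: two more dollars, one more visitor
```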

A car company wanted to study the relationship between the weight of a car and its average gas mileage, so it recorded the weight and average gas mileage of several of its cars and plotted the data on the scatterplot below. The least squares regression line for the data is y = −0.0084x + 48.8, where x is the weight of the car in pounds and y is the average gas mileage (in miles per gallon). What is the correct interpretation of the slope of the least squares regression line? {{ Scatterplot illustrating Average Miles per Gallon and Weight of car in pounds. The data trends downward from left to right. }} a) For every increase by 1 pound in a car's weight, there will be a corresponding increase in the car's miles per gallon by 0.0084. b) For every increase by 1 pound in a car's weight, there will be a corresponding decrease in the car's miles per gallon by 0.0084. c) For every increase by 1 pound in a car's weight, there will be a corresponding increase in the car's miles per gallon by 48.8. d) For every increase by 1 pound in a car's weight, there will be a corresponding decrease in the car's miles per gallon by 48.8.

For every increase by 1 pound in a car's weight, there will be a corresponding decrease in the car's miles per gallon by 0.0084.

A restaurant owner wants to see if she can use low temperatures to boost soup sales at her restaurant. To study a possible relationship between temperature and soup sales, she collects data throughout the year on the temperature of a given day (ranging between 20 degrees F and 90 degrees F) and the amount of soup sold that day. She performs a linear regression and comes up with a least squares regression line of y=−1.64x+176.6 with r=−0.89 where x is the temperature (in degrees F) and y is the number of daily soup sales. What is the correct interpretation of the slope of the regression line? a For every increase in one degree Fahrenheit, there is a corresponding increase of 1.64 sales of soup. b For every increase of one degree Fahrenheit, there is a corresponding increase of 176.6 sales of soup. c For every increase in one degree Fahrenheit, there is a corresponding decrease of 1.64 sales of soup. d For every increase of one degree Fahrenheit, there is a corresponding decrease of 176.6 sales of soup.

For every increase in one degree Fahrenheit, there is a corresponding decrease of 1.64 sales of soup.
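The slope is the change in the prediction for a one-unit change in x. The sketch below checks that, for every one-degree step across the observed 20 to 90 degree range, predicted soup sales drop by 1.64 orders, and then reproduces the 50-degree prediction from the earlier soup question. `soup_sales` is a hypothetical helper.

```python
def soup_sales(temp_f):
    """Predicted daily soup orders from y = -1.64x + 176.6."""
    return -1.64 * temp_f + 176.6


# Across the observed range (20 to 90 degrees F), each one-degree
# increase changes the prediction by the slope: 1.64 fewer orders.
for t in range(20, 90):
    step = soup_sales(t + 1) - soup_sales(t)
    assert abs(step - (-1.64)) < 1e-9

print(round(soup_sales(50)))  # 95 (from 94.6)
```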

Which of the following can help prevent Simpson's Paradox from occurring? a) Having the greatest number of subjects in the lowest performing trial. b) Having an equal number of subjects exposed to each of the treatments in each trial. c) Having the greatest number of subjects in the highest performing trial. d) Having each subject be exposed to each treatment in the trial.

Having an equal number of subjects exposed to each of the treatments in each trial.

Which of the following statements is always true? a If there is causation, there must also be association. b If there is association, there must also be causation. c Association is a stronger relationship than causation. d Cause and effect can never exist where there is an association.

If there is causation, there must also be association.

What technique is used to estimate the profit margin for a production level of 25 thousand units, if a line of best fit is created to estimate profit margin for production levels between 20 to 39 thousand units? a Interpolation b Linearization c Extrapolation d Internalization

Interpolation

What is the process used to create the equation for the line of best fit? a Completing the square b Least squares estimation c Fitting the line to the curve d Linear approximation

Least squares estimation

A university wants to study whether there is a relationship between the health of its students and the number of credit hours they are enrolled in. It pulled the student numbers of all students who used the gym in the last month and randomly selected 200. It then asked each selected student how many hours he or she exercised in the last week and how many credit hours he or she is enrolled in, and based on the data announced that the fewer credit hours a student is enrolled in at the university, the more hours per week the student exercises. Is this a valid conclusion? a) No, the sample is biased because the sample was too small. b) No, the sample is biased because only students at one university were questioned. c) No, the sampling frame in this study introduced bias because it is not representative of the population. d) Yes, the study was conducted in a fundamentally random way.

No, the sampling frame in this study introduced bias because it is not representative of the population.

A researcher wants to know if there is a relationship between executive general managers' income and the number of college credits earned. To answer this question, the researcher used a cluster sample to randomly choose 10 states across the United States; then 10 random counties were chosen in each state; then within those 100 total counties, 4 businesses were randomly chosen; all executive general managers working within these 400 total businesses were then invited to participate in the study. Based on the information given, is there any potential bias in this study? a Yes, because the sampling frame does not match the intended population of the question b Yes, because the sampling method will not give a representative sample. c Yes, because a voluntary sample should have been used. d No, there will likely be no bias in this study.

No, there will likely be no bias in this study.

A study is done to determine whether or not age determines salary level. The study also records each subject's years of experience and level of education. What is the response variable in this study? a Age b Education level c Years of experience d Salary

Salary

If a trend appears in a large sample of data, the trend may not be replicated if the sample is broken up into smaller subsets. What is this effect known as? a Foster's Theorem b Bayes' Theorem c Simpson's Paradox d Consistency Construct

Simpson's Paradox

In order to study the relationship between customer satisfaction and health plan type, an insurance company surveyed 100 members from each of its 5 health plans offered. What sampling method did they employ? a) Stratified b) Cluster c) Voluntary d) Simple Random

Stratified

Which of the following best describes a scatterplot that has a correlation coefficient of −0.3? a) The points loosely follow a line that is moving down and to the right. b) The points closely follow a line that is moving down and to the right. c) The points roughly move up and to the right, but do not closely follow a linear pattern. d) None of the above.

The points loosely follow a line that is moving down and to the right.

What does a weak, negative correlation look like on a scatterplot? a The points follow closely along a line that moves down and to the right. b The points loosely follow a line that moves down and to the right. c The points follow closely along a line that moves up and to the right. d None of the above.

The points loosely follow a line that moves down and to the right.

The line of best fit is also known as: a The extrapolation line b The regression line c The interpolation line d None of the above

The regression line

What does it mean for a result or relationship to be statistically significant? a The relationship is not caused by mere chance. b The relationship is caused by chance. c Your hypothesis test has failed. d Your significance level is not high enough.

The relationship is not caused by mere chance.

Which of the following statements is not a causal relationship? a The higher the temperature in the oven, the faster the food will cook. b The more miles a car is driven, the more fuel is consumed. c The time of day determines when the sun will rise. d The faster a runner goes, the shorter time it will take to complete the race.

The time of day determines when the sun will rise.

When constructing a line of best fit, what must be minimized? a The vertical distances between that line and the data points. b The length of the line. c The correlation coefficient. d The p-value

The vertical distances between that line and the data points (more precisely, the sum of their squares).

A researcher conducts an experimental study and finds a correlation between salary and levels of experience. The correlation coefficient was r=.75 with a regression equation of y=515x+17500 . What can you say about the relationship between these variables? a There is an association between these variables b There is a causation between these variables. c There is no relationship between these variables d There is not enough information to determine the relationship between the variables.

There is a causation between these variables

{{ Scatterplot with points following a rough line trending up and to the right indicating a positive association between the length of the assembly line and the average number of defects per month }} What can be inferred from the data in the scatterplot Assembly Line Length vs. Number of Defects? a) There is a causal relationship between the length of the assembly line and the number of defects. b) There is a negative association between the length of the assembly line and the number of defects. c) There is no association between the length of the assembly line and the number of defects. d) There is a positive association between the length of the assembly line and the number of defects.

There is a positive association between the length of the assembly line and the number of defects.

What does a strong positive correlation between two variables suggest? a The explanatory variable is increasing and the response variable is decreasing b There is an association between the variables c There is a causation between the variables d Cannot determine

There is an association between the variables

A researcher conducts an observational study and finds a correlation between managers' income and the number of college credits earned. The correlation coefficient was r=.85 with a regression equation of y=515x+12000. What can you say about the relationship between these variables? a There is an association between these variables b There is a causation between these variables. c There is no relationship between these variables. d There is not enough information to determine the relationship between the variables.

There is an association between these variables

What should be true about the dots on a scatterplot once a line of best fit is drawn on the graph? a They should form a perfect line. b The distance each dot is from the line is equal. c The dot furthest from the line is always an outlier. d There should be approximately the same number of dots above and below the line.

There should be approximately the same number of dots above and below the line.

In the study of the effects of carbon emissions on global warming, the quantity of carbon released in the atmosphere is an explanatory variable for changes in global temperature (response variable). True or False? a) True b) False

True

A researcher wants to know if there was a relationship between student age and desire to complete a college degree. To answer this question, the researcher used a local community college as the sampling frame and then used stratified sampling to get a sample of students from 18 to 80 years old. Based on the information given, is there any potential bias in this study? a Yes, because the sampling frame does not match the intended population of the question. b Yes, because the sampling method will not give a representative sample. c Yes, because a voluntary sample should have been used. d No, there will likely be no bias in this study.

Yes, because the sampling frame does not match the intended population of the question.

A researcher studied the effect of epidural steroid injections for chronic lumbar pain. Patients eligible for the study were randomly assigned to two groups. One group received an epidural injection with the steroid and a local anesthetic. The second group received an epidural injection with the local anesthetic. It was found that the epidural injection of the steroid produced a decrease in pain level among the patients. Can the researcher claim there is a causal relationship between the steroid injection and decreased pain level? a) No, because this was an observational study so only an association is established. b) No, because the researcher controlled for all lurking variables. c) Yes, because this is a controlled experiment. d) Yes, because the researcher did not control for all lurking variables.

Yes, because this is a controlled experiment.

The following table shows the performance of two airlines in two different cities. Is there a Simpson's Paradox occurring?

City         | Airline A delayed | Airline A % delayed | Airline B delayed | Airline B % delayed | Total delayed | Total % delayed
Los Angeles  | 62/559            | 11.10%              | 460/3450          | 13.30%              | 522/4009      | 13%
San Diego    | 46/396            | 11.60%              | 30/221            | 13.60%              | 76/617        | 12.30%

a No, because an equal number of flights departed from each city. b No, because it's clear from the data that Los Angeles has a higher rate of delayed flights. c Yes, because the delayed flight rates were different for each city. d Yes, because while Los Angeles has a greater overall rate of delayed flights, San Diego has a greater rate of delayed flights when looking at the individual airlines.

Yes, because while Los Angeles has a greater overall rate of delayed flights, San Diego has a greater rate of delayed flights when looking at the individual airlines.
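The paradox can be confirmed directly from the counts in the airline table: each airline has a higher delay rate in San Diego, yet Los Angeles has the higher combined rate. A sketch; `rates` is a hypothetical helper, and the counts are copied from the question.

```python
# Delayed flights and total flights per airline, from the table in the question.
data = {
    "Los Angeles": {"A": (62, 559), "B": (460, 3450)},
    "San Diego": {"A": (46, 396), "B": (30, 221)},
}


def rates(city):
    """Return ({airline: delay rate}, combined delay rate) for one city."""
    per_airline = {a: d / t for a, (d, t) in data[city].items()}
    delayed = sum(d for d, _ in data[city].values())
    total = sum(t for _, t in data[city].values())
    return per_airline, delayed / total


la_per, la_all = rates("Los Angeles")
sd_per, sd_all = rates("San Diego")

for city, per, overall in [("Los Angeles", la_per, la_all), ("San Diego", sd_per, sd_all)]:
    print(city, {a: f"{r:.1%}" for a, r in per.items()}, f"combined {overall:.1%}")

# Within each airline, San Diego has the higher delay rate...
print(all(sd_per[a] > la_per[a] for a in "AB"))  # True
# ...but combined, Los Angeles has the higher delay rate.
print(la_all > sd_all)                            # True
```

The reversal happens because most Los Angeles flights (3,450 of 4,009) are on Airline B, which has the higher delay rate in both cities, while most San Diego flights (396 of 617) are on Airline A.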

Using a line of best fit in slope-intercept form, y = mx + b, what must be true if there is a positive correlation? a m must be > 0. b b must be > 0. c Both m and b must be > 0. d Both m and b must have the same sign.

m must be >0.
