Stats Unit 4

Analyze the residual plot below and identify which, if any, of the conditions for an adequate linear model is not met.

Constant error variance (the spread of the residuals about the line is not constant across the plot)

The linear correlation coefficient is always a number between...

-1 and 1, inclusive

Steps for computing the linear correlation coefficient (r)

1. Find x-bar, y-bar, Sx, and Sy.
2. Standardize each value: (x - x-bar)/Sx and (y - y-bar)/Sy.
3. For each observation, multiply its two standardized values from step 2.
4. Find the sum of the products from step 3.
5. Divide the sum by n - 1; the result is r.
*If r is close to -1, the two variables are strongly negatively associated. If r is close to +1, they are strongly positively associated.
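The five steps above can be sketched in Python as a quick check on hand calculations. This is a minimal illustration, not a course-provided tool; the function names and the small data set are made up for the example.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    # Sample standard deviation (divide by n - 1), like Sx and Sy
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Linear correlation coefficient r, following the five steps."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)           # step 1
    s_x, s_y = sample_sd(xs), sample_sd(ys)     # step 1
    products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)  # steps 2 and 3
        for x, y in zip(xs, ys)
    ]
    return sum(products) / (n - 1)              # steps 4 and 5


# Hypothetical data, made up for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(correlation(xs, ys), 4))  # 0.7746
```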

Steps for finding the least-squares regression line:

1. First you need r, x-bar, y-bar, Sx, and Sy.
2. Find the slope using b1 = r(Sy/Sx).
3. Find the intercept using b0 = y-bar - b1(x-bar). *Note that b1 was found in step 2.
4. Put b1 and b0 into the least-squares regression line y^ = b1x + b0 (no further math required here).
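A minimal Python sketch of these steps, using the standard-library `statistics` module (`stdev` is the sample, n - 1, standard deviation). The function names and data are hypothetical; the slope and intercept are rounded to four decimal places only at the end, matching this set's rounding convention.

```python
from statistics import mean, stdev  # stdev divides by n - 1, like Sx and Sy


def correlation(xs, ys):
    # r = (sum of products of z-scores) / (n - 1)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def least_squares_line(xs, ys):
    """Return (b1, b0) for y^ = b1*x + b0, rounded to four decimal places."""
    r = correlation(xs, ys)              # step 1
    x_bar, y_bar = mean(xs), mean(ys)    # step 1
    b1 = r * stdev(ys) / stdev(xs)       # step 2: slope
    b0 = y_bar - b1 * x_bar              # step 3: uses the unrounded b1
    return round(b1, 4), round(b0, 4)


# Hypothetical data, made up for illustration
b1, b0 = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y^ = {b1}x + {b0}")  # step 4 -> y^ = 0.6x + 2.2
```

Rounding only after b0 is computed avoids carrying rounding error from the slope into the intercept.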

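How to check whether a linear model is not appropriate:

1. If the residual plot is not randomly scattered (e.g., U-shaped or curved), then the linear model is NOT appropriate.
2. If the spread of the residuals is not constant (the residuals get too far away from the line in parts of the plot), the constant error variance condition is not met.
3. If outliers are present.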

Before interpreting the y-intercept, what two questions must be asked? What if you answer no?

1. Is 0 a reasonable value for the explanatory variable? 2. Do any observations near x=0 exist in the data set? If the answer to either of those questions is no, then we do not interpret the y-intercept.

The linear correlation between violent crime rate and percentage of the population that has a cell phone is −0.918 for years since 1995. 1. Do you believe that increasing the percentage of the population that has a cell phone will decrease the violent crime rate? 2. What might be a lurking variable between percentage of the population with a cell phone and violent crime rate?

1. No 2. the economy

1. When the points are aligned in a straight line with positive slope, what is the value of the linear correlation coefficient? 2. When the points are aligned in a straight line with negative slope, what is the value of the linear correlation coefficient?

1. One 2. Negative one

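Steps for finding an equation that describes linearly related data

1. Select two points on the graph that are far apart (forming the longest line).
2. Find the slope: m = (y2 - y1)/(x2 - x1).
3. Plug the slope and one of the points into the equation y - y1 = m(x - x1) and solve for y.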
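A minimal Python sketch of these steps, which also solves for the y-intercept b. The function name `line_through` and the two points are hypothetical, chosen only for illustration; the two points must have different x values.

```python
def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b through p1 = (x1, y1) and p2 = (x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)  # step 2: slope (requires x1 != x2)
    # step 3: y - y1 = m(x - x1)  rearranges to  y = m*x + (y1 - m*x1)
    b = y1 - m * x1
    return m, b


# Hypothetical points read off a scatter diagram
m, b = line_through((1, 2), (5, 5))
print(f"y = {m}x + {b}")  # y = 0.75x + 1.25
```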

Steps for computing the sum of squared residuals

1. Plug each x value into the given line to get the predicted value y^.
2. Find each residual: the observed y minus the y^ from step 1.
3. Square the residuals and find their sum.
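A minimal Python sketch of the three steps. The data and the "given line" y^ = 0.75x + 1.25 are hypothetical; for these same points the least-squares line works out to y^ = 0.6x + 2.2, so the second call shows its SSE is smaller.

```python
def sum_of_squared_residuals(xs, ys, slope, intercept):
    """SSE for the line y^ = slope * x + intercept."""
    predicted = [slope * x + intercept for x in xs]             # step 1: y^ for each x
    residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # step 2: y - y^
    return sum(e ** 2 for e in residuals)                       # step 3


# Hypothetical data and a hypothetical given line
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(sum_of_squared_residuals(xs, ys, 0.75, 1.25))          # 3.875
# Least-squares line for the same points
print(round(sum_of_squared_residuals(xs, ys, 0.6, 2.2), 4))  # 2.4
```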

Steps for finding a conditional distribution

1. Divide each number in a column by that column's total.
2. Describe the trend (e.g., as ___ increases, ___ also increases).
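A minimal Python sketch of step 1, assuming the contingency table is stored as a dictionary of columns (values of the explanatory variable), each mapping response categories to counts. The table and its counts are hypothetical.

```python
# Hypothetical contingency table: columns are values of the explanatory variable,
# rows are categories of the response variable (counts are made up).
table = {
    "Low": {"Yes": 20, "No": 80},
    "High": {"Yes": 60, "No": 40},
}


def conditional_distribution(table):
    """Divide each count by its column total (step 1)."""
    result = {}
    for column, counts in table.items():
        total = sum(counts.values())
        result[column] = {category: count / total for category, count in counts.items()}
    return result


for column, dist in conditional_distribution(table).items():
    print(column, dist)
# Low {'Yes': 0.2, 'No': 0.8}
# High {'Yes': 0.6, 'No': 0.4}
```

For step 2, these made-up numbers would read: as the explanatory variable goes from Low to High, the proportion answering Yes increases.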

Steps for finding the y-intercept

1. Find the slope m.
2. Plug the slope and a point (x, y) of your choice into y = mx + b.
3. Solve for b.

What is meant by a conditional distribution?

A conditional distribution lists the relative frequency of each category of the response​ variable, given a specific value of the explanatory variable in a contingency table.

What is meant by a marginal distribution?

A marginal distribution is a frequency or relative frequency distribution of either the row or column variable in a contingency table.

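What is a scatter diagram? How is it created?

A scatter diagram is a graph that shows the relationship between two quantitative variables measured on the same individual. It is created by plotting each individual as a point, with the explanatory variable on the horizontal axis and the response variable on the vertical axis.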

What is an influential observation?

An influential observation significantly affects the least-squares regression line's slope and/or y-intercept. (It also affects the value of the correlation coefficient.)

Define bivariate data.

Bivariate data is data in which two variables are measured on each individual.

List a combination of leverage and residual that indicates that an observation may be influential.

High leverage and a large residual (Case 3). By comparison, Case 1 (low leverage and a large residual) and Case 2 (high leverage and a small residual) typically have little influence on the line.

Does this suggest there is a linear relation between student task persistence and achievement score? What does this result mean?

Yes, since 0.712 is greater than the critical value for n = 30. Countries in which students answered a greater percentage of items in the background questionnaire tended to have higher mean scores on the exam.

Notice that two children are 26.75 inches tall. One has a head circumference of 17.3 inches; the other has a head circumference of 17.5 inches. How can this be?

For children who are 26.75 inches​ tall, head circumference varies.

The coefficient of determination is a number between 0 and 1, inclusive. That is, 0 ≤ R² ≤ 1. What does it mean if R² = 0? What does it mean if R² = 1?

If R² = 0, then the least-squares regression line has no explanatory value. If R² = 1, then the least-squares regression line explains 100% of the variation in the response variable.

The slope of the least-squares regression line from Example 3 is 3.1661 yards per mph. List two interpretations of the slope that are acceptable.

If club-head speed increases by 1 mph, the distance the golf ball will travel increases by 3.1661 yards, on average. or If club-head speed increases by 1 mph, the expected distance the golf ball will travel increases by 3.1661 yards.

Explain how to determine if two variables are positively associated.

Two variables are positively associated if, whenever the value of one variable increases, the value of the other variable tends to increase as well.

What is the residual for an observation?

The residual is the error found by calculating the difference between the observed and predicted values of y: Residual = Observed y - Predicted y = y - y^.

Is the linear correlation coefficient resistant?

No

Does a correlation coefficient close to 0 imply that there is no relation? Why or why not?

No. A linear correlation coefficient near 0 does not mean there is no relation between the two variables; it just means there is no LINEAR relation between them.

Define the coefficient of determination, R².

R² measures the proportion of total variation in the response variable that is explained by the least-squares regression line.

What do Sx and Sy mean?

The sample standard deviations (computed with n - 1) of x and y.

Explain how to find the coefficient of determination, R², for the least-squares regression model y^ = b1x + b0.

Square the linear correlation coefficient. That is, R² = r².

List the three steps for testing for a linear relation

Step 1: Compute r and take its absolute value, |r|.
Step 2: Find the critical value for your sample size n in the table of critical values (e.g., for n = 8, find 8 in the table).
Step 3: If |r| is greater than the critical value, a linear relation exists: positive if r > 0, negative if r < 0. Otherwise, there is no linear relation.
NOTE: The absolute value removes the negative sign, so a negative r such as -0.9 is compared to the critical value as 0.9.
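A minimal Python sketch of the decision rule in step 3, assuming you have already read the critical value for your n from a table. The function name is hypothetical, and the critical values in the examples (0.361 for n = 30, 0.707 for n = 8) are assumed from a standard two-tailed α = 0.05 table; check them against the table your course uses.

```python
def linear_relation(r, critical_value):
    """Compare |r| with the critical value for the sample size n."""
    if abs(r) > critical_value:  # the absolute value drops the sign of r
        return "positive linear relation" if r > 0 else "negative linear relation"
    return "no linear relation"


# Assumed critical values from a standard table (check your own table)
print(linear_relation(0.712, 0.361))   # n = 30
print(linear_relation(-0.85, 0.707))   # n = 8
print(linear_relation(-0.5, 0.707))    # n = 8
```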

As your line gets closer and closer to the regression line, what happens to the SSE?

The SSE decreases as my line gets closer to the regression line. Note that it is not possible to find a line whose SSE is less than that of the regression line.

The closer the observed y's are to the regression line (the predicted y's), how is R² affected?

The closer the observed y's are to the regression line, the larger R² will be.

What is the difference between a strong correlation and a weak one?

The closer r is to -1 or 1, the stronger the linear correlation; the closer r is to 0, the weaker it is.

If the pediatrician wants to use height to predict head circumference, determine which variable is the explanatory variable and which is the response variable.

The explanatory variable is height and the response variable is head circumference.

Define the linear correlation coefficient

The linear correlation coefficient is a measure of the strength and direction of the linear relation between two quantitative variables.

Explain how to determine which variable is the explanatory variable and which variable is the response variable

The researcher must determine which variable plays the role of explanatory variable based on the question they want answered. For example, if the researcher wants to predict SAT scores based on high school GPA, then high school GPA is the explanatory variable.

Explain why correlations should always be reported with scatter diagrams.

The scatter diagram is needed to see if the correlation coefficient is being affected by the presence of outliers.

Which of the following is true of the least-squares regression line y^ = b1x + b0?

True: The sign of the linear correlation coefficient, r, and the sign of the slope of the least-squares regression line, b1, are the same. The least-squares regression line minimizes the sum of squared residuals. The least-squares regression line always contains the point (x-bar, y-bar). The predicted value of y, y^, is an estimate of the mean value of the response variable for that particular value of the explanatory variable. Not true: the least-squares regression line does not always contain the point (0, 0).

What does the least-squares regression line minimize?

The sum of the squared residuals. This means that the sum of the squared residuals is smaller for the least-squares line than for any other line that may describe the relation between the two variables.

What does it mean if r^2= 0.882?

Then 88.2% of the variation in the response variable is explained by the least-squares regression model, and the remaining 11.8% is explained by other factors.

If a plot of the residuals against the explanatory variable shows a discernable pattern, what does this say about the explanatory and response variables?

Then the response and explanatory variables may not be linearly related.

When the correlation coefficient indicates no linear relation between the explanatory and response variables and the scatter diagram indicates no relation between the variables, how do we find a predicted value for the response variable?

Then we use the mean value of the response variable as the predicted value so that y^=y-bar

What does it mean to say that we should not use the regression model to make predictions outside the scope of the model?

This means that we should not use the regression model to make predictions for values of the explanatory variable that are much larger or much smaller than those observed. For example, we should not use the line in Example 3 to predict distance when the club-head speed is 140 mph. The highest observed club-head speed is 105 mph. The linear relation between distance and club-head speed might not continue.

Why is it important for the residuals to have constant error variance?

To satisfy the regression assumptions and be able to trust the results

Explain the meaning of total deviation, explained deviation, and unexplained deviation.

Total deviation: y - y-bar
Explained deviation: y^ - y-bar
Unexplained deviation: y - y^
(y^ is the predicted value.)
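A minimal Python sketch computing the three deviations for each observation and summing their squares. The data are hypothetical, and y^ = 0.6x + 2.2 is the least-squares line for these particular points. Each observation's total deviation always equals explained plus unexplained; the sums of squares also add up this way because the line is the least-squares line, and explained SS / total SS gives R² (here equal to r²).

```python
from statistics import mean

# Hypothetical data; y^ = 0.6x + 2.2 is their least-squares line
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b1, b0 = 0.6, 2.2
y_bar = mean(ys)

total_ss = explained_ss = unexplained_ss = 0.0
for x, y in zip(xs, ys):
    y_hat = b1 * x + b0
    total = y - y_bar          # total deviation
    explained = y_hat - y_bar  # explained deviation
    unexplained = y - y_hat    # unexplained deviation (the residual)
    assert abs(total - (explained + unexplained)) < 1e-9
    total_ss += total ** 2
    explained_ss += explained ** 2
    unexplained_ss += unexplained ** 2

r_squared = explained_ss / total_ss  # proportion of variation explained
print(round(total_ss, 4), round(explained_ss, 4), round(unexplained_ss, 4), round(r_squared, 4))
# 6.0 3.6 2.4 0.6
```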

Is there another way two variables can be correlated without a causal relationship existing?

Yes, through a lurking variable.

What is a contingency table?

A two-way table that relates two categories of data: the row variable and the column variable (both qualitative).

Match the linear correlation coefficient to the scatter diagram. The scales on the x- and y-axis are the same for each scatter diagram. a. r = -0.049 b. r = -1 c. r = -0.969 [Scatter diagrams not included.]

a=2 b=3 c=1

Bubba's data point pulled the line towards the black point at the top. Therefore it is...

an influential observation.

The smaller the residual, the....

better the prediction

A student at a junior college conducted a survey of 20 randomly selected full-time students to determine the relation between the number of hours of video game playing each week, x, and grade-point average, y. She found that a linear relation exists between the two variables. The least-squares regression line that describes this relation is y^ = −0.0584x + 2.9424. Interpret the slope: for each additional hour that a student spends playing video games in a week, the grade-point average will _____ by _____, on average. If appropriate, interpret the y-intercept.

Decrease by 0.0584. The grade-point average of a student who does not play video games is 2.9424.

What is Simpson's Paradox?

It describes a situation in which an association between two variables inverts or goes away when a third variable is introduced to the analysis. For example, we noticed in Example 6 that once the lurking variable was accounted for, women were in fact more likely than men to be accepted into the programs of study. It appeared otherwise because of the lurking variable: a higher number of male applicants applied to programs of study with greater acceptance rates.

In a scatter​ diagram, the ______ variable is plotted on the horizontal axis and the ______ variable is plotted on the vertical axis.

explanatory, response

Comparing variables in the gender bias study, we calculate by...

finding the proportion of applicants accepted: the number accepted divided by the total for that row (accepted + not accepted).

Note: Throughout the course, we agree to round the slope and y-intercept to...

four decimal places

What is the least-squares regression line?

It is the line that minimizes the sum of the squared residuals: if you take each residual and square it, the sum is smaller for this line than for any other line.

How do you draw a bar graph for a conditional distribution?

Make sure the height of each bar matches the relative frequency of that category within its column.

Will the following variables have positive correlation, negative correlation, or no correlation? Shoe size and IQ

no correlation

Analyze the residual plot below and identify which, if any, of the conditions for an adequate linear model is not met.

patterned residuals

The closer r is to +1 the...

stronger the evidence of a positive linear relation between the two variables.

SSE stands for...

sum of squared residuals

An increase in Lyme disease does not cause an increase in drowning deaths. The....are likely lurking variables.

temperature and time of year

What do we call the relative vertical position of an observation? What do we call the relative horizontal position of an observation?

The relative vertical position of an observation is called its residual; the relative horizontal position of an observation is called its leverage.

How to find variance

the square of the standard deviation

Steps for determining frequency marginal distributions

Find the total of each row and each column. *Make sure the row totals and the column totals both add up to the same number.

Steps for determining RELATIVE frequency marginal distributions

Find the row and column totals, then divide each total by the number they both add up to (the grand total).
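A minimal Python sketch of both procedures (frequency and relative frequency marginal distributions), assuming the contingency table is stored as a dictionary of columns mapping response categories to counts. The table, its counts, and the function name are hypothetical.

```python
# Hypothetical contingency table: columns are values of the explanatory variable,
# rows are categories of the response variable (counts are made up).
table = {
    "Low": {"Yes": 20, "No": 80},
    "High": {"Yes": 60, "No": 40},
}


def marginal_distributions(table):
    """Return frequency and relative frequency marginal distributions."""
    column_totals = {col: sum(counts.values()) for col, counts in table.items()}
    row_totals = {}
    for counts in table.values():
        for row, count in counts.items():
            row_totals[row] = row_totals.get(row, 0) + count
    grand_total = sum(column_totals.values())
    # Sanity check: row totals and column totals must add up to the same number
    if grand_total != sum(row_totals.values()):
        raise ValueError("row and column totals disagree")
    relative_rows = {row: t / grand_total for row, t in row_totals.items()}
    relative_columns = {col: t / grand_total for col, t in column_totals.items()}
    return row_totals, column_totals, relative_rows, relative_columns


rows, cols, rel_rows, rel_cols = marginal_distributions(table)
print(rows, cols)          # {'Yes': 80, 'No': 120} {'Low': 100, 'High': 100}
print(rel_rows, rel_cols)  # {'Yes': 0.4, 'No': 0.6} {'Low': 0.5, 'High': 0.5}
```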

Total deviation = _______ deviation + _______ deviation

unexplained + explained

If your goal is to have one stock go up when the other goes down, what should you base your selection on?

Select the stocks whose linear correlation coefficient is closest to -1.

