STP 420 Exam 1

Least-squares regression line

The line that minimizes the sum of the squares of the vertical distances of the data points from the line

Density curve

a curve that is always on or above the horizontal axis and has area exactly 1 underneath it

Label

a special variable used in some data sets to distinguish the different cases

Rules for Means

(a and b are fixed numbers; X and Y are random variables)
1) µ(a+bX) = a + bµX
2) µ(X+Y) = µX + µY
3) µ(X-Y) = µX - µY

How does adding a positive or negative number a to each observation change the measures of center (mean, median) or spread (IQR, s)?

adds a to measures of center and quartiles, but does not change measures of spread (IQR, s)
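
The effect of adding a constant can be checked numerically. The sketch below uses made-up observations and a hypothetical helper `summary`; quartiles come from `statistics.quantiles`, whose default method may differ slightly from the textbook rule, but the conclusion does not depend on that choice.

```python
import statistics

def summary(data):
    """Return mean, median, IQR and sample standard deviation."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "iqr": q3 - q1,
        "s": statistics.stdev(data),
    }

scores = [2, 4, 4, 5, 7, 9, 11]    # made-up observations
a = 10                             # constant added to every observation
shifted = [x + a for x in scores]

before = summary(scores)
after = summary(shifted)

# Center moves by a; spread is unchanged.
print(after["mean"] - before["mean"], after["median"] - before["median"])
print(after["iqr"] - before["iqr"], after["s"] - before["s"])
```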

Correlation strength

Using the absolute value of r:
0.75 to 1: very strong
0.50 to 0.75: moderately strong
0.25 to 0.50: moderately weak
0 to 0.25: very weak

Goals of Re-expression

1) Make the distribution more symmetric 2) Make the spread of several groups more alike 3) Make the form of a scatterplot more nearly linear 4) Make scatter in a scatterplot spread out evenly

Properties of Standard Deviation

1) s measures spread about the mean and should be used only when the mean is the measure of center.
2) s = 0 only when all observations have the same value and there is no spread. Otherwise, s > 0.
3) s is not resistant to outliers.
4) s has the same units of measurement as the original observations.

Ladder of Powers

Collection of re-expressions

Normal quantile plot

The data points are ranked and their percentile ranks are converted to z-scores. The z-scores go on the x-axis and the data values on the y-axis.
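
The construction can be sketched in a few lines. The function name `normal_scores` and the data are hypothetical, and the percentile-rank convention (i - 0.5)/n is an assumption; statistical software uses slightly different conventions.

```python
from statistics import NormalDist

def normal_scores(data):
    """Pair each sorted observation with the z-score of its percentile rank."""
    n = len(data)
    ordered = sorted(data)
    # Assumed convention: the i-th smallest value gets percentile rank (i - 0.5) / n.
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, ordered))   # (x-axis value, y-axis value) pairs

points = normal_scores([12.1, 9.8, 10.4, 11.0, 10.9])   # made-up data
for z_score, value in points:
    print(round(z_score, 3), value)
```

Plotting these pairs gives the Normal quantile plot; points close to a straight line suggest the data are roughly Normal.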

________ is resistant to outliers.

Median

Probability Rules

- 0 ≤ P(A) ≤ 1
- P(S) = 1
- P(A or B) = P(A) + P(B), if A and B are disjoint
- P(Ac) = 1 - P(A)

T/F: Linear transformations change measures of center and spread

True

If the distribution is indeed Normal, a Normal quantile plot will show _________

A straight line (a good match between the data and a Normal distribution). Systematic deviations from a straight line indicate a non-Normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.

Lurking variable

a third variable that may affect the relationship between the explanatory and response variables

Correlation Properties

- correlation is between -1 and 1
- correlation of x and y = correlation of y and x
- correlation has no units
- correlation is not affected by changes in the center or scale of either variable
- correlation measures the strength of linear association between 2 variables
- correlation is sensitive to outliers
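
Several of these properties can be verified directly. This is a minimal sketch with made-up height and weight values; `correlation` is a hypothetical helper written out by hand rather than taken from a library.

```python
import math

def correlation(x, y):
    """Pearson correlation r of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

height_in = [60, 62, 65, 68, 71]          # made-up heights in inches
weight_lb = [115, 120, 140, 155, 170]     # made-up weights in pounds

r = correlation(height_in, weight_lb)
height_cm = [2.54 * h for h in height_in]         # change of units
r_cm = correlation(height_cm, weight_lb)          # same r after rescaling
r_swapped = correlation(weight_lb, height_in)     # same r with x and y swapped

print(r, r_cm, r_swapped)
```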

Rules for Variances

1) σ^2(a+bX) = b^2 σ^2(X)
2) If X and Y are independent: σ^2(X+Y) = σ^2(X) + σ^2(Y) and σ^2(X-Y) = σ^2(X) + σ^2(Y)
3) If X and Y have correlation ρ: σ^2(X+Y) = σ^2(X) + σ^2(Y) + 2ρσ(X)σ(Y) and σ^2(X-Y) = σ^2(X) + σ^2(Y) - 2ρσ(X)σ(Y)
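
The rules can be checked on a small example. The sketch below assumes the four made-up (x, y) pairs are the equally likely outcomes of a joint distribution, so population formulas (divide by n) apply; `pvar` and `pcorr` are hypothetical helper names.

```python
import math

def pvar(v):
    """Population variance (divide by n)."""
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / len(v)

def pcorr(x, y):
    """Population correlation rho."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / math.sqrt(pvar(x) * pvar(y))

x = [1.0, 2.0, 4.0, 7.0]   # made-up outcomes, each pair equally likely
y = [3.0, 1.0, 5.0, 6.0]

rho = pcorr(x, y)
sd_x, sd_y = math.sqrt(pvar(x)), math.sqrt(pvar(y))

var_lin = pvar([3 + 2 * a for a in x])             # rule 1 with a = 3, b = 2
var_sum = pvar([a + b for a, b in zip(x, y)])      # variance of X + Y
var_diff = pvar([a - b for a, b in zip(x, y)])     # variance of X - Y

rule_sum = pvar(x) + pvar(y) + 2 * rho * sd_x * sd_y
rule_diff = pvar(x) + pvar(y) - 2 * rho * sd_x * sd_y
print(var_sum, rule_sum, var_diff, rule_diff)
```

Because X and Y are correlated in this example, the simpler rule 2 would give the wrong answer for both the sum and the difference.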

Facts about LSRL

1) A change of one standard deviation in x corresponds to a change of r standard deviations in the predicted y
2) The LSRL always passes through (x-bar, y-bar)
3) The distinction between explanatory and response variables is essential
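
The first two facts can be confirmed on a small made-up data set. This is a sketch only: the slope is computed from the usual least-squares formula, and the names (`predict`, `change_in_y_hat`) are chosen for illustration.

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up data

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx                    # least-squares slope
intercept = y_bar - slope * x_bar    # least-squares intercept

r = sxy / math.sqrt(sxx * syy)
s_x = math.sqrt(sxx / (n - 1))       # sample standard deviations
s_y = math.sqrt(syy / (n - 1))

def predict(x_new):
    return intercept + slope * x_new

# Moving one standard deviation in x changes the prediction by r * s_y.
change_in_y_hat = predict(x_bar + s_x) - predict(x_bar)
print(change_in_y_hat, r * s_y)
print(predict(x_bar), y_bar)         # the line passes through (x-bar, y-bar)
```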

Quantitative variable distributions can be graphically displayed with

histogram, stemplot

How does multiplying each observation by a positive number b change the measures of center (mean, median) or spread (IQR, s)?

multiplies both the center and spread by b

independent

2 events do not influence each other, and the knowledge about one does not change the probability of the other

In the Normal distribution with mean µ and standard deviation σ: - approximately _____ of the observations fall within σ of µ. - approximately _____ of the observations fall within 2σ of µ. - approximately _____ of the observations fall within 3σ of µ.

68%, 95%, 99.7%
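
These percentages are rounded areas under the standard Normal curve. A quick check with Python's standard library (the variable names are illustrative):

```python
from statistics import NormalDist

z = NormalDist()   # standard Normal: mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(k, round(area, 4))   # roughly 0.683, 0.954, 0.997
```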

Which of the following statements about the standardized z-score of a value of a variable X, which has a mean of m and a standard deviation of s, is/are TRUE? A) All of the above statements about the z-score are true. B) The z-score has a mean equal to 0. C) The z-score tells us how many standard deviation units the original observation falls away from the mean. D) The z-score tells us the direction the observation falls away from the mean. E) The z-score has a standard deviation equal to 1.

A) All of the above statements about the z-score are true.
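
A short sketch with made-up values shows each property: standardized values have mean 0 and standard deviation 1, and the sign gives the direction from the mean.

```python
import statistics

x = [62, 65, 68, 70, 75]          # made-up observations
m = statistics.mean(x)
s = statistics.stdev(x)

# z = (x - m) / s for every observation
z = [(xi - m) / s for xi in x]

print(statistics.mean(z))         # essentially 0
print(statistics.stdev(z))        # essentially 1
print(z[0], z[-1])                # negative below the mean, positive above
```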

Suppose you are examining the correlation between two quantitative variables and the correlation, r, is very small. However, you expected it to be larger. What could you do? A) Examine the data to determine if there are any outliers that could be removed. If so, remove the outliers and recalculate r. B) Change the units of measurement to something else (e.g., convert data measured in inches to centimeters.) C) Plot the data on a smaller scale. D) None of the above

A) Examine the data to determine if there are any outliers that could be removed. If so, remove the outliers and recalculate r.

Before using the correlation, r, you should do which of the following? A) Look at the scatterplot of the data to determine if the relationship appears linear. B) Look at a histogram to be sure your data are approximately normal. C) Look at a stemplot to determine if the data are symmetric. D) All of the above

A) Look at the scatterplot of the data to determine if the relationship appears linear.

True or False. To use a log transformation, all values must be positive. A) True B) False

A) True

Correlation based on averages will tend to be ______ correlations based on individuals. A) higher than B) lower than C) the same as

A) higher than

When making histograms, the classes ________. A) should be equal width B) do not need to be equal width C) should be selected randomly D) should always be a width of 10

A) should be equal width

When making a stemplot, it is appropriate to _______ if the values cover a very small range. A) split the stem B) split the leaves C) trim the stem D) trim the leaves

A) split the stem

Two variables are ____________ associated when above-average values of one tend to accompany below-average values of the other and vice versa.

negatively

If above-average values of two quantitative variables and below-average values of the same two quantitative variables tend to occur together, the two variables are ____________ associated.

positively

What value of r^2 makes the LSRL a good model?

r^2 > 0.50

Simpson's Paradox

An association between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of the third variable are combined
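
A hypothetical example makes the reversal concrete. The counts below are invented for illustration: treatment A has the higher success rate within each group of cases, yet treatment B has the higher rate once the groups are combined, because A was used mostly on hard cases.

```python
from fractions import Fraction

# Hypothetical (successes, trials) for two treatments in two groups.
counts = {
    "easy cases": {"A": (90, 100), "B": (800, 1000)},
    "hard cases": {"A": (500, 1000), "B": (40, 100)},
}

def rate(successes, trials):
    return Fraction(successes, trials)

# Success rate of each treatment within each group.
within_group = {
    group: {t: rate(*c) for t, c in by_treatment.items()}
    for group, by_treatment in counts.items()
}

def combined(treatment):
    """Success rate after pooling both groups."""
    successes = sum(counts[g][treatment][0] for g in counts)
    trials = sum(counts[g][treatment][1] for g in counts)
    return rate(successes, trials)

overall = {t: combined(t) for t in ("A", "B")}

for group, rates in within_group.items():
    print(group, float(rates["A"]), float(rates["B"]))
print("combined", float(overall["A"]), float(overall["B"]))
```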

Leverage

An observation has high leverage when its x-value is far from the mean of the x-values

event

An outcome or set of outcomes of a random phenomenon

Law of large numbers

As the number of observations drawn increases, the sample mean of the observed values gets closer and closer to the mean of the population
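
A simulation illustrates the idea. This sketch assumes a fair six-sided die (population mean 3.5) and a fixed seed so the run is repeatable; the function name and checkpoints are arbitrary choices.

```python
import random

random.seed(420)          # fixed seed so the run is repeatable
population_mean = 3.5     # mean of a fair six-sided die

def running_means(n_rolls, checkpoints):
    """Roll a die n_rolls times and record the sample mean at each checkpoint."""
    total = 0
    means = {}
    for i in range(1, n_rolls + 1):
        total += random.randint(1, 6)
        if i in checkpoints:
            means[i] = total / i
    return means

means = running_means(100_000, {10, 100, 1_000, 10_000, 100_000})
for n_obs in sorted(means):
    print(n_obs, means[n_obs])
```

With more rolls the sample mean tends to settle near 3.5, although any single run can wander at small sample sizes.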

A set of midterm exam scores has a median that is much larger than the mean. Which of the following statements is most consistent with this information? A) A stemplot of the data would be symmetric. B) A stemplot of the data would be skewed left. C) A stemplot of the data would be skewed right. D) The data set must be so large that it would be better to draw a histogram rather than a stemplot.

B) A stemplot of the data would be skewed left.

Which of the following statements is/are FALSE? A) A scatterplot is a useful graphical tool for displaying the strength of the relationship between two quantitative variables. B) The only relationship that a scatterplot can usefully display is linear with no outliers. C) If above-average values of two quantitative variables and below-average values of the same two quantitative variables tend to occur together, the two variables are positively associated. D) An individual value that deviates from the overall pattern displayed on a scatterplot is called an outlier. E) A categorical variable can be added to a scatterplot by using a different color or symbol for each category.

B) The only relationship that a scatterplot can usefully display is linear with no outliers.

The least-squares regression line always passes through the point ____. A) (0,0) B) None of the above C) (x bar, y bar) D) (median of x, median of y)

C) (x bar, y bar)

When using a histogram to display categorical values, you should make sure the categories are in alphabetical order. A) True—histograms are not useful if the categories are not in order. B) True—histograms can be used on any type of data. C) False—You cannot use histograms to display categorical data. D) False—The categories cannot be in alphabetical order when displaying categorical data.

C) False—You cannot use histograms to display categorical data.

It is known that not exercising may lead to poor health. However, it is possible that people who are already in poor health do not have the ability or energy to exercise. This example is one of ________________. A) causation B) common response C) confounding

C) confounding

Common response

Changes in both x and y are caused by changes in a lurking variable z

Transformations are used to ______. A) make curved relationships more linear B) make data more normal C) change the scale of measurements D) All of the above

D) All of the above

Which of the following statements about Normal quantile plots is/are FALSE? A) In constructing a Normal quantile plot, each data point is plotted against its corresponding Normal score. B) The Normal quantile plot is a very useful graphical tool for assessing the adequacy of the Normal model. C) If the points on a Normal quantile plot lie close to a straight line, the plot indicates that the Normal model is an adequate representation for the data. D) Because you will see the usual mound-like appearance of the Normal distribution on a histogram, it is more helpful than the quantile plot for assessing Normality. E) On a quantile plot, outliers will appear as points that are far away from the overall pattern of the plot.

D) Because you will see the usual mound-like appearance of the Normal distribution on a histogram, it is more helpful than the quantile plot for assessing Normality.

Disjoint vs. Independent

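Disjoint: the events have no outcomes in common, so knowing that one occurred means the other did not (knowing that the first occurred changes the probability of the second). Independent: knowing that one occurred does not change the probability of the other. Disjoint events with nonzero probabilities are therefore never independent.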

Which of the following is/are NOT resistant to outliers? A) Mean B) Median C) r D) Standard deviation E) A, C, and D

E) A, C, and D

When examining a distribution of a quantitative variable, which of the following features do we look for? A) Overall shape, center, and spread B) Symmetry or skewness C) Deviations from overall patterns such as outliers D) The number of peaks or modes E) All of the above

E) All of the above

T/F: Linear transformations change the basic shape of a distribution

False

If the distribution is skewed, where is the mean compared to the median?

Farther out in the long tail

Examining a scatterplot

Form: straight or curved
Direction: positive, negative, or no direction
Strength: points close together or scattered
Outliers: unusual values

Associated variables

Knowing the value of one variable tells you something about the other

Distribution of a categorical variable

Lists the categories and gives either the count or the percent of individuals who fall in each category

Which measures of center/spread should you choose for a symmetric distribution?

Mean, standard deviation

Which measures of center/spread should you choose for a skewed distribution?

Median, IQR

What to look for when examining distributions

Overall pattern (shape, center, spread) and outliers

Multiplication Rule for Dependent Events

P(A and B) = P(A) * P(B|A)

Addition Rule for Not Disjoint Events

P(A or B) = P(A) + P(B) - P(A and B)

P(A and B) =

P(A) x P(B), when A and B are independent

Conditional Probability

P(B|A) = P(A and B) / P(A)

To determine if events are independent:

P(B|A) = P(B)
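
The addition rule, conditional probability, and the independence check can all be worked through on one roll of a fair die. The events A and B below are chosen for illustration, and `prob` is a hypothetical helper that assumes equally likely outcomes.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}      # roll is even
B = {1, 2, 3, 4}   # roll is at most 4

p_a_and_b = prob(A & B)              # outcomes in both events
p_a_or_b = prob(A | B)               # outcomes in at least one event
p_b_given_a = p_a_and_b / prob(A)    # P(B|A) = P(A and B) / P(A)

# A and B are independent exactly when P(B|A) = P(B).
independent = p_b_given_a == prob(B)

print(p_a_or_b, prob(A) + prob(B) - p_a_and_b)
print(p_b_given_a, prob(B), independent)
```

Here A and B turn out to be independent, so P(A and B) also equals P(A) x P(B).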

Categorical variable

Places each case into one of several groups, or categories

discrete random variable

Takes a fixed set of possible values with gaps between

Quantitative variable

Takes numerical values for which arithmetic operations such as adding/averaging make sense

continuous random variable

Takes on all values in an interval of numbers

Distribution of a quantitative variable

Tells us what values a variable takes and how often it takes those values

Probability distributions can be visualized graphically using a _______ for continuous random variables.

density curve

probability model

description of some random phenomenon that consists of 2 parts: sample space S and probability for each outcome

If two events have no outcomes in common, then those two events are ________.

disjoint

Explanatory variable

explains/causes changes in the response variable (x)

Standard deviation

measures how far the observations typically are from the mean

A variable that explains or causes change to another variable is called a(n) _______ variable.

independent (also called explanatory)

random

individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions

Response variable

measures an outcome of a study (y)

Cases

objects described by a set of data (What am I studying?)

Residual

observed y - predicted y

Categorical variable distributions can be graphically displayed with

pie chart, bar graph

finite

probability model with a finite sample space

Residual plot

scatterplot of regression residuals against the explanatory variable -should be a random scatter around zero

Variable

special characteristic of a case

Influential

an observation is influential if removing it would markedly change the fitted model

r^2 (coefficient of determination)

the fraction of the variation in the values of y that is explained by the least-squares regression of y on x
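
As a sketch of this definition, r^2 can be computed as 1 minus the ratio of residual variation to total variation in y, and compared with the square of the correlation. The data are made up and the variable names are illustrative.

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.5, 5.0, 8.5, 9.0]    # made-up data

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)      # total variation in y
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = y_bar - slope * x_bar

predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]   # observed y - predicted y

sse = sum(e ** 2 for e in residuals)   # variation left unexplained by the line
r_squared = 1 - sse / syy              # fraction of variation explained

r = sxy / math.sqrt(sxx * syy)
print(r_squared, r ** 2)               # the two agree
```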

When the median is closer to Q3, the graph is skewed to

the left

probability

the proportion of times the outcome would occur in a very long series of repetitions

When the median is closer to Q1, the graph is skewed to

the right

sample space S

the set of all possible outcomes

Sampling variability

the value of a statistic varies in repeated random sampling

Confounding

two variables are confounded when their effects on a response variable cannot be distinguished from each other
