Stats Test 3 (Ch. 7-9)
A nutrition major at State University was studying the relationship between carbohydrates (X) and calories (Y). For example, a serving of a particular brand of wheat pasta yielded 42 carbohydrates and 210 calories. After collecting X and Y data on many kinds of foods, the student determined the slope of the regression line to be 4.0 and the Y intercept to be 3.0. If a new food is tested, and the number of carbohydrates (X) is 100, what would be the predicted calories (Y')?
403
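A minimal sketch of the prediction above, assuming the usual regression form Y' = (slope)(X) + Y intercept. The function name predict_y is only illustrative and does not come from the test.

```python
def predict_y(slope: float, intercept: float, x: float) -> float:
    """Return the predicted score Y' = (slope)(X) + Y intercept."""
    return slope * x + intercept

# Values from the card: slope = 4.0, Y intercept = 3.0, X = 100 carbohydrates
predicted_calories = predict_y(4.0, 3.0, 100)
print(predicted_calories)  # 403.0
```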
a _____ is the average (mean) cross product of the z-scores.
correlation
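The definition above can be checked with a short sketch. The data are made up for illustration, and the z-scores here use the population standard deviation (dividing by N), which is an assumption about the formula the course uses.

```python
from statistics import mean, pstdev

def pearson_r_from_z(xs, ys):
    """Pearson r computed as the mean cross product of the z-scores."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)  # population SDs (divide by N)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    cross_products = [a * b for a, b in zip(zx, zy)]
    return mean(cross_products)

# Hypothetical paired scores, not from the test
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r_from_z(xs, ys)
print(round(r, 4))  # 0.7746
```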
T/F: if correlation is +1.00, the regression line is at a 45° angle
false
know how to find a regression equation when they give data/chart (#24-26, ch. 8)
yes ma'am
the sum of the deviations of the true Y scores from the predicted Y' scores is always
zero
if the correlation is zero and you're trying to find y', using x will not be a reliable predictor. therefore, we use ___ to predict y'
ȳ
which of the following formulas represents the Y intercept of the regression line?
ȳ - (b)(x̄)
computing the standard error of the estimate (Sy) by subtracting each Y' from the corresponding Y, squaring the difference, summing the results, dividing by N, and then taking the square root (the defining formula) is difficult and time-consuming. Fortunately, the same value can be computed by multiplying the standard deviation of Y by
√(1-r²)
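A rough check that the defining formula and the shortcut give the same standard error of the estimate. The data are hypothetical, and the sketch assumes the population standard deviation (dividing by N), as in the defining formula described on the card.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical paired scores, not from the test
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

mx, my = mean(xs), mean(ys)
sx, sy = pstdev(xs), pstdev(ys)
r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)

# Regression constants: slope = r(Sy/Sx), intercept = mean of Y - (slope)(mean of X)
slope = r * sy / sx
intercept = my - slope * mx

# Defining formula: square each (Y - Y'), sum, divide by N, take the square root
defining = sqrt(sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / n)

# Shortcut: standard deviation of Y times the square root of (1 - r squared)
shortcut = sy * sqrt(1 - r ** 2)

print(round(defining, 4), round(shortcut, 4))  # 0.6928 0.6928
```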
Which of the following r-values indicates the weakest relationship between two variables? a. +.45 b. -.30 c. +.03 d. -.45
+.03
Calculate the appropriate correlation coefficient for the following data (reading speed test score = X, number of books read = Y)
+.95
Which of the following r-values indicates the strongest relationship between two variables? a. +.65 b. -.89 c. +.10 d. -.10
-.89
what is the Y intercept of the following regression equation? Y' = -4.30X - 1.72
-1.72
what is the slope of the following regression equation? Y' = -8.27X + 3.09
-8.27
a student wanted to know whether memory for items on a grocery list was better when a bizarre image was created for each item, compared with a strategy of simply trying to remember the items. the student had a randomly selected sample use bizarre imagery to remember a list of 15 items; the mean number of items remembered was 10. the student compared this mean to a population mean of 7 and a standard error of 1.20. What is the probability of obtaining a sample mean of 10 or higher?
.0062
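One way to reproduce this answer, sketched with Python's standard library instead of a z table. The function name is illustrative; the numbers are the ones on the card.

```python
from statistics import NormalDist

def prob_mean_at_least(sample_mean, pop_mean, standard_error):
    """Probability of a sample mean this high or higher on the sampling distribution."""
    z = (sample_mean - pop_mean) / standard_error
    return 1 - NormalDist().cdf(z)

# Sample mean = 10, population mean = 7, standard error = 1.20, so z = 2.50
p = prob_mean_at_least(10, 7, 1.20)
print(round(p, 4))  # 0.0062
```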
if the probability of getting a z score between the mean and +1 standard deviation is .3413, what is the probability of getting a z score lower than -1 standard deviation?
.1587
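The areas used on these z-score cards can be reproduced from the standard normal curve. This is only a sketch using the standard library in place of the z table.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

between_mean_and_plus1 = z.cdf(1) - z.cdf(0)  # area between the mean and z = +1
below_minus1 = z.cdf(-1)                      # area in the tail below z = -1
below_plus1 = z.cdf(1)                        # everything below z = +1

print(round(between_mean_and_plus1, 4))  # 0.3413
print(round(below_minus1, 4))            # 0.1587
print(round(below_plus1, 4))             # 0.8413
```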
Assad wants to show Cheryl and Cindy a card trick he has learned. He first asks Cheryl to draw one card at random from a standard deck of 52 cards. Cheryl draws out the 4 of hearts, shows the card to Cindy, and then lays it face down on the table in front of her. Assad then extends the remaining cards for Cindy to make her selection. What is the probability that Cindy also will draw out a heart?
.24
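A small sketch of the counting behind this answer: sampling without replacement removes one heart and one card from the deck before the second draw.

```python
from fractions import Fraction

hearts_left = 13 - 1  # Cheryl's 4 of hearts is no longer in the deck
cards_left = 52 - 1   # one card has been removed and not replaced

p_heart = Fraction(hearts_left, cards_left)
print(p_heart, round(float(p_heart), 2))  # 4/17 0.24
```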
A recent study of burnout among therapists shows a positive correlation of 0.53 between the number of clients a therapist is treating and the therapist's feelings of burnout. What proportion of the variance in feelings of burnout is accounted for by this relationship?
.28
one hundred tickets have been sold. you've purchased 30 tickets. what is the probability of one of your tickets being randomly selected?
.30
on a roll of a die, the probability that 3 or more dots will appear on the top side is .67, and the probability that 2 or fewer dots will appear is .33. If we know that 6 dots showed up on the last throw, what is the probability that 2 or 1 will show up on the next throw?
.33
the president of state college wants to conduct a survey of the students. he knows there are 325 freshmen, 250 sophomores, 175 juniors, 100 seniors, and 150 grad students. What is the probability that the very first person randomly selected for the survey will be a freshman?
.33
what is the probability of getting a sample mean between 500 and 520 if the population mean is 500 and the standard deviation of the sampling distribution (the standard error of the mean) is 20?
.3413
your professor told the class that, typically, 20% of the class receives As, 20% Bs, 35% Cs, 10% Ds, and 15% Fs. What is the probability that the student sitting next to you will receive an A or an F?
.35
an advertising executive wanted to know whether a new ad campaign designed to deter children from smoking was working. after seeing the ads, a randomly selected sample of 36 teenagers rated what they thought of smoking on a scale from -5 (it's terrible) to +5 (it's fantastic). the mean rating for the sample was -.70. the population mean is 0, with a population standard deviation of 2.00. what is the probability of obtaining a sample mean of -.70 or lower (more negative)?
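.0179 (the standard error of the mean is 2.00/√36 = .33, so z = -.70/.33 = -2.10; dividing by the population standard deviation of 2.00 instead gives z = -.35 and the incorrect answer .3632)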
Suppose you select a candy from a jar with 6 red and 4 green candies. You draw out a red candy and eat it. You then select a second candy from the jar. What is the probability you draw out a green candy this time?
.44
the probability of getting heads on a fair coin is .50. The probability of getting tails is also .50. if you toss the coin 3 times and get heads all 3 times, what is the probability of getting tails on the next toss?
.50
Suppose you select a candy from a jar with 6 red and 4 green candies, note its color, and then replace it ten times. Each time, the candy you have selected has been green. What is the probability that on the next drawing you will select a red candy?
.60
if the probability of getting a z score between the mean and +1 standard deviation is .3413, what is the probability of getting a z score of +1 standard deviation or lower?
.8413
When r = 1.0, then Sy equals
0
if correlation is 0, what is the z score of y'?
0
the y' regression line intercepts the Y axis at ȳ if r = ?
0
when r = 0.0, the slope of the regression line equals
0
what's the range of values in probability?
0-1
if there is no relationship between two variables, the slope of the regression will equal
0.0
the Y intercept is the value of Y' when X equals
0.0
what do you get when you add the coefficients of determination and non-determination?
1
what 3 things must you do to establish cause-effect?
1. determine high correlation 2. have a time-order sequence 3. eliminate all plausible alternatives
steps to constructing a regression line for predicting Y from X (and also X from Y)
1. take the two extreme values of X and predict a Y score from them 2. plot the two points on a graph 3. draw a line between the two points. repeat the same but X from Y to get the other regression line
what are reasons for low correlation?
1. the two variables aren't related 2. they are related in a non-linear way (curvilinear relationship) 3. a truncated range 4. low N 5. an outlier 6. heteroscedasticity
what two things do you need to remember when calculating the pearson r?
1. x and y don't have to be the same units of measurement 2. as z-scores, x and y have the same mean (0)
what is the Y intercept of the following regression equation? Y' = .56X + 2.41
2.41
what is the slope of the following regression equation? Y' = 2.69X - 3.92
2.69
how would a statistician define probability?
A mathematical statement indicating the likelihood of an event when we randomly sample a particular population
If you see the notation ∑XY, what should you do?
First multiply each X by its partner Y, then sum the results.
If you see the notation (∑X)(∑Y), what should you do?
First sum the Xs, then sum the Ys, then multiply the sums.
What does a correlation coefficient do?
It quantifies the pattern in a relationship
which of the following formulas represents the slope of the regression line?
[N(∑XY) - (∑X)(∑Y)] / [N(∑X²) - (∑X)²]
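The slope formula can be sketched in code along with the Y intercept formula, ȳ - (b)(x̄). The data and the function name regression_constants are made up for illustration.

```python
def regression_constants(xs, ys):
    """Return (slope, intercept) using the computational formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # multiply each pair, then sum
    sum_x2 = sum(x * x for x in xs)              # square each X, then sum

    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * (sum_x / n)  # mean of Y - (slope)(mean of X)
    return slope, intercept

# Hypothetical paired scores, not from the test
slope, intercept = regression_constants([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(slope, 2), round(intercept, 2))  # 0.6 2.2
```

Note that ∑XY (multiply first, then sum) and (∑X)(∑Y) (sum first, then multiply) are computed separately, as the notation cards describe.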
Professor Johnston has found a strong positive correlation between wearing neckties and the frequency of strokes (r = .89). He thinks that the necktie reduces blood flow to the brain, preventing the brain from receiving enough oxygen. Professor Johnston and his associates claim to have proven that wearing neckties causes strokes. What error has Professor Johnston made?
Professor Johnston is drawing a causal conclusion from correlational findings.
when knowledge of a relationship is used, the average error remaining after predictions have been made based on the relationship is
Sy²
"The self-confidence of a group of students is positively correlated with their chances of getting through the course." What does this statement mean?
The chances of passing the course tend to increase as the self-confidence scores of the students increase
what is the purpose of the critical value?
To define the minimum absolute z-value required for a sample to be in the region of rejection
a "weak" relationship between two variables is represented by
a large spread of Y scores at each X score
"The more you save, the less you spend" describes
a negative linear correlation
"The bigger they are, the harder they fall" describes
a positive linear correlation
which of the following best describes knowing the relative frequency of every possible event in a population?
a probability distribution
why can we never be sure that a sample represents a population?
a random sample may poorly represent the population, or it may represent a population that is different
which of the following accurately describes a theoretical probability distribution? it is based on
a theoretical model of the relative frequency of events in a population
the "error" in a single prediction is equal to the degree to which a participant's _____ score deviates from the _____
actual; corresponding predicted score
the probability of obtaining either one of two events is equal to the sum of the separate probabilities of each event ex. probability of rolling either a 5 or 6 = (1/6) + (1/6) = (2/6)
addition rule for mutually exclusive events
ex. what is the probability of obtaining either a jack or a spade? p(A or B) = p(A) + p(B) - p(A and B) = (4/52) + (13/52) - (1/52) = (16/52)
addition rule when two events are not mutually exclusive
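Both addition rules can be sketched with one function, since mutually exclusive events are just the case where p(A and B) = 0. The function name p_either is illustrative.

```python
from fractions import Fraction

def p_either(p_a, p_b, p_both=Fraction(0)):
    """p(A or B) = p(A) + p(B) - p(A and B); p_both is 0 for mutually exclusive events."""
    return p_a + p_b - p_both

# Mutually exclusive: rolling a 5 or a 6 on one die
p_five_or_six = p_either(Fraction(1, 6), Fraction(1, 6))

# Not mutually exclusive: a jack or a spade (the jack of spades is counted once)
p_jack_or_spade = p_either(Fraction(4, 52), Fraction(13, 52), Fraction(1, 52))

print(p_five_or_six, p_jack_or_spade)  # 1/3 4/13
```

Fraction reduces automatically, so 2/6 prints as 1/3 and 16/52 prints as 4/13.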
When the correlation coefficient representing the relationship between X and Y is intermediate, then all of the following are true except: a. there is not a perfectly consistent association b. there are different Y scores associated with a single X score c. prediction of Y from a known X score has some error d. all data points fall on the regression line
all data points fall on the regression line
Using a correlational design, a researcher found a relationship between the healthiness of one's heart and the amount of fish oil in one's diet. the researcher should conclude that:
although a relationship exists, one cannot infer that changes in one variable are causing changes in the other variable.
when rolling a pair of fair dice, the probability of rolling a total point value of "7" is .17. if you rolled a pair of dice 1,000 times and the point value of "7" appeared 723 times, what would you probably conclude?
although not impossible, this outcome is so unlikely that the fairness of the dice is questionable
In general, a zero correlation means that
as the values of one variable increase, there is no tendency for the values of the other variable to change in any consistent, predictable fashion.
Professor Miller has found that the correlation between a person's "need for affiliation" (found by taking a test to determine the need to be with others) and the number of hours spent watching television is -.69. He should conclude that
as we observe people with higher and higher need for affiliation, we see a tendency for those people to spend less and less time watching television.
One assumption of linear regression is
at each X, the sample of Y scores should represent an approximately normal distribution
the "error" in all predictions made from a sample using linear regression is the
average spread of actual Y scores around the predicted Y' scores
the standard error of the estimate is defined as the
average spread of actual Y scores around the predicted Y' scores
which of the following is not true of the criterion: a. the criterion is the probability that defines samples as unlikely b. samples that meet the criterion occur more than 5% of the time c. behavioral researchers usually use .05 as their criterion probability d. sample means that occur with a probability less than that of the criterion probability are likely to represent some other population
b. samples that meet the criterion occur more than 5% of the time
can you make better predictions with +1.00 or -1.00
both perfect
At State University Medical Center, a research study has produced a very strong negative correlation between the number of years a person has smoked and that person's lung capacity. Assuming the correlation passes the appropriate inferential test, what should the researchers do next?
calculate the linear regression equation
if we calculate a correlation coefficient and we find that there is a relationship between the two variables, we
cannot conclude that changes in one variable cause changes in the other variable
a high correlation coefficient does not imply that one thing ____ the other. it shows _____ of the relationship, not cause-effect.
caused, strength
if the odds are 5 out of 100 or less that a result occurred by chance (p ≤ .05), we reject _____
chance
In a nonlinear or curvilinear relationship, as the X scores change, the Y scores
change consistently, but in more than one direction
If there is a relationship between "amount of coffee consumed" and "nervousness", then as the amount of coffee consumed increases, the amount of nervousness
changes in some consistent, predictable manner.
probability that assumes an ideal situation (ex. coin flip, throwing dice)
classical
what are the 3 types of probabilities?
classical, empirical, and subjective
how can we determine the representativeness of a sample mean for a particular population?
convert the sample mean to a z score and compare the z score to the critical value
_____ is the statistical index that tells us how much two variables are related
correlation coefficient
y' is the y score we are PREDICTING based upon an x score (and x' is the x score we predict from a y score)
dang that makes sense now
at a basic level, when deciding whether a sample is representative of a particular population, we
decide against low-probability events in favor of high-probability events
what is the basis of all inferential statistics?
deciding whether or not a sample of scores is representative of a particular population
one sample affects the other - without replacement (ex. removing a card from the deck and not putting it back)
dependent random sampling
coefficient of _____, r², tells us the amount of variation in Y accounted for by X. this is the ____ variation.
determination, true
When heteroscedasticity exists, the problem with r is that it
does not accurately describe the strength of the relationship for all Xs
this probability depends upon what has been found in past studies (ex. knowing the proportion of female students in past years, and the probability of selecting one in a random sample now)
empirical
combination of outcomes (ex. whatever we decide an _____ is with dice like even #'s)
event
when r = 0.0, the Y intercept is equal to
every predicted Y value
if the events exhaust all possible outcomes (ex. heads and tails)
exhaustive
what do you do when you have tied ranks?
find the average
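A sketch of averaging tied ranks before computing a Spearman correlation. It assumes rank 1 goes to the lowest score, which the card does not specify; the function name is illustrative.

```python
def average_ranks(scores):
    """Rank scores from 1 (lowest); tied scores share the average of their positions."""
    ordered = sorted(scores)
    rank_of = {}
    for value in set(scores):
        first_position = ordered.index(value) + 1  # first rank the value occupies
        count = ordered.count(value)               # how many scores are tied
        rank_of[value] = first_position + (count - 1) / 2
    return [rank_of[s] for s in scores]

# The two scores of 20 would hold ranks 2 and 3, so each gets 2.5
print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```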
To predict a Y score from a given X score using the regression constants, we would
first multiply X by the slope and then add the Y intercept
the greater the correlation, the ____ y' will be from ȳ
further
Compared to a strong relationship, a weak relationship between two variables results in
greater prediction error and a larger value of Sy
when two variables have similar variability
homoscedasticity
In a linear relationship, as the X scores increase, the Y scores change
in only one direction
In general, a positive correlation means that as values of one variable _____, there is a tendency for the values of the other variable to _____.
increase; increase
occurrence of an event doesn't affect the probability of the other event
independent
you roll a die twice, and both times you roll a 6. what type of events are these two rolls
independent
choice of one sample has no effect on the choice of the next sample (ex. removing a card from the deck, then putting it back)
independent random sampling
as a general rule, when statisticians determine the probability of events, they assume that the events are ____ and sampled _____ replacement.
independent; with
is probability descriptive or inferential statistics?
inferential
with which scale ranking would we use the pearson correlation coefficient?
interval and/or ratio
what is the post hoc fallacy?
is committed when it is assumed that because one thing occurred after another, it must have occurred as a result of it. Mere temporal succession, however, does not entail causal succession.
There are 26 red cards in a playing deck and 26 black cards. The probability of randomly selecting a red card or a black card is 26/52 = 0.50. Suppose you randomly select a card from the deck five times, each time replacing the card and reshuffling before the next pick. Each of the five selections has resulted in a red card. On the sixth turn, the probability of getting a black card
is the same as it has always been if the deck is a fair deck
a cognitive psychologist tested whether people spend more time looking at and comprehending content words (ex. ball, kick) than other words (the, a) when reading a passage. The mean looking time for a sample of content words was compared to the mean looking time for the population of other words by transforming it into a z score. the z score for the sample mean was z = 3.00. with critical values of ±1.96, what should the psychologist conclude about the sample mean?
it is an unlikely sample for the population of looking times for other words and probably represents some other population (times looking for meaningful words)
which of the following is not true of the linear regression equation? a. it is the equation from which the correlation coefficient is calculated b. it defines the straight line that summarizes a relationship c. it describes two characteristics of the regression line: its slope and its Y intercept d. it is the equation that produces the value of Y' at each X
it is the equation from which the correlation coefficient is calculated
Linear regression is important because
it is used to predict unknown Y scores based on X scores from a correlated variable
the lower the correlation, the ____ the standard error of estimate
larger
in a ____ relationship, as the X scores increase, the Y scores tend to change in only one direction
linear
the scores that lie in the tails of a normal distribution have a _____ and a _____ probability of occurring
low; low
the purpose of probability and inferential statistics is to
make decisions about the population that have a good chance of being correct
when the correlation is zero, what is the best measure to use?
mean
the intersection points of your two lines are the _____ of X and Y
means
occurrence of one event affects the probability of the next event. ex. if I have 2 girls' names and 2 boys' names in a hat, the probability of drawing a girl's name is 1/2. if I draw a girl's name and don't put it back, the probability of drawing a girl's name the next time is 1/3, so the probability of drawing two girls' names in a row is 1/2 x 1/3 = 1/6
multiplication rule for dependent events
probability of the simultaneous or successive occurrence of two events is the product of the separate probabilities of each event p(A and B) = p(A) p(B) ex. probability of throwing a 5 and then a 6 = (1/6)(1/6) = (1/36)
multiplication rule for independent events
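The two multiplication rules side by side, as a rough sketch using the dice and names-in-a-hat examples from the cards. For dependent events, the second probability is computed after the first outcome has been removed.

```python
from fractions import Fraction

# Independent events: a 5 on the first throw, then a 6 on the second
p_five_then_six = Fraction(1, 6) * Fraction(1, 6)

# Dependent events: two girls' names in a row from a hat with 2 girls' and
# 2 boys' names, drawing without replacement
p_first_girl = Fraction(2, 4)
p_second_girl_given_first = Fraction(1, 3)  # 1 girl's name left among 3 names
p_two_girls = p_first_girl * p_second_girl_given_first

print(p_five_then_six, p_two_girls)  # 1/36 1/6
```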
if two events cannot occur simultaneously (ex. can't roll a 2 and 4 at the same time)
mutually exclusive
do you have replacement in opinion polls?
no
in some cases when the correlation is not perfect, can the regression line fall through all the pairs of scores?
no
does adding or subtracting a constant affect r? why?
no; it also changes the mean, so the relative distance stays the same
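A quick check of this card with made-up data: adding 10 to every X shifts the mean of X by 10 as well, so each deviation from the mean (and therefore r) is unchanged. The helper pearson_r is illustrative.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson r from deviations around the means."""
    mx, my = mean(xs), mean(ys)
    sum_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sum_xy / sqrt(ss_x * ss_y)

# Hypothetical paired scores, not from the test
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
shifted_xs = [x + 10 for x in xs]  # add a constant to every X

r_original = pearson_r(xs, ys)
r_shifted = pearson_r(shifted_xs, ys)
print(round(r_original, 4), round(r_shifted, 4))  # 0.7746 0.7746
```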
What type of relationship does a horizontal line represent?
no relationship
coefficient of _____, 1-r², tells us the error variation; aka the variation not explained by the correlation between the two variables
non-determination
when a z score is not in the region of rejection, we should
not reject the idea that the sample represents the raw score population
The regression line is the best fitting line because
on average, the regression line passes through the center of the various Y scores
The strength of a relationship is indicated by the extent to which _____ paired with each individual value of the _____ variable.
one value of the Y variable is; X
with which scale ranking would we use the spearman correlation coefficient?
ordinal scale
which of the following is the criterion that psychologists usually use to determine the likelihood that a sample mean was obtained by chance?
p = .05
in a _____ _____, each individual has the same z score on both the x and y variable
perfect correlation
one purpose of correlation is to enable us to _____ an unknown value of Y from a known value of X
predict
statisticians use linear regression to
predict unknown Y scores from known X scores
When we divide the error remaining after we use the relationship to predict Y scores by the total error when we use the mean to predict the Y scores and then subtract the results from 1, the final result is
proportion of variance accounted for
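The calculation described on this card can be sketched with made-up data: error around the regression line divided by error around the mean of Y, subtracted from 1, comes out equal to r². Variable names are illustrative.

```python
from statistics import mean

# Hypothetical paired scores, not from the test
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
mx, my = mean(xs), mean(ys)

sum_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
ss_x = sum((x - mx) ** 2 for x in xs)
ss_y = sum((y - my) ** 2 for y in ys)  # total error when the mean predicts every Y

slope = sum_xy / ss_x
intercept = my - slope * mx

# Error remaining when the relationship (Y') is used to predict Y
ss_error = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

proportion_accounted_for = 1 - ss_error / ss_y
r_squared = sum_xy ** 2 / (ss_x * ss_y)

print(round(proportion_accounted_for, 2), round(r_squared, 2))  # 0.6 0.6
```

The leftover 0.4 is the coefficient of non-determination, which is why the two coefficients add to 1.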
the coefficient of determination is interpreted as the
proportion of variance accounted for
when we square the correlation coefficient to produce r², the result is equal to the
proportion of variance accounted for
the ______ is the proportional improvement in the accuracy of our predictions produced by using a relationship to predict Y scores, compared to our accuracy when we do not use the relationship
proportion of variance accounted for (r²)
the coefficient of alienation is interpreted as the
proportion of variance not accounted for
what's the best way to look at linearity?
put it on a scatter diagram
each sample has exactly the same chance of being chosen
random sampling
_____ contains means that are so unlikely to be representing the underlying population that we reject the idea that they represent the population
region of rejection
what do we call that portion of the sampling distribution in which values are considered too unlikely to have occurred by chance?
region of rejection
The best fitting line through a scatter plot is known as the _____ line
regression
the _____ line summarizes a relationship by passing through the center of the scatterplot
regression
The best-fitting line through a scatterplot is known as the
regression line
if the correlation coefficient turns out to be a relatively high value, then the value of Sy will be
relatively low
In an experimental design _____, whereas in a correlational design _____
researchers assign each person an X score and then measure the score on the Y variable; researchers measure scores on variables that a participant has already experienced.
arises when the range between the lowest and highest scores on one or both variables is limited; this will produce a coefficient that is smaller than it normally would be
restriction of range
the coefficient of determination is equal to
r²
all possible outcomes (ex. flipping a coin, _______ is heads or tails)
sample space
____ occurs when random chance produces a sample statistic that is not equal to the population parameter it represents
sampling error
suppose you take a piece of candy out of a jar, look to determine its color, then put it back into the jar before you randomly select the next piece of candy. this type of sampling is called
sampling with replacement
if you were to plot two variables on a graph, this is called a:
scatter diagram
When plotting correlational data, the appropriate graph to use is the
scatterplot
We should do a scatterplot of the data when we compute a correlation because the scatterplot allows us to
see the nature of the relationship between the two variables.
a study about the college aptitude of seniors at south city high school has resulted in a sample mean with a corresponding z score of 1.89. If the critical value for the region of rejection is ±1.96, what is the correct conclusion?
since the z value does not fall within the region of rejection, we should not conclude this sample mean represents some other population.
a study about the college aptitude of seniors at south city high school has resulted in a sample mean with a corresponding z score of 2.00. If the critical value for the region of rejection is ±1.96, what is the correct conclusion?
since the z value falls within the region of rejection, we should conclude this sample mean likely represents some other population.
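The decision rule in the two cards above, as a minimal sketch. It assumes a two-tailed test with a criterion of .05, which is where the critical value of 1.96 comes from; the function name is illustrative.

```python
from statistics import NormalDist

# With .05 split between two tails, .025 lies beyond each critical value
critical_value = NormalDist().inv_cdf(1 - 0.05 / 2)
print(round(critical_value, 2))  # 1.96

def in_region_of_rejection(z, critical=1.96):
    """True when the absolute z score is beyond the critical value."""
    return abs(z) > critical

print(in_region_of_rejection(1.89))  # False: do not reject
print(in_region_of_rejection(2.00))  # True: likely represents some other population
```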
the slope of a line is a number indicating the
slant of the line and the direction in which it slants
In looking at the regression constants, we know that the relationship is negative if the
slope value is negative
the __________ is the clearest way to describe the "average" error when using Y' to predict Y scores
standard error of the estimate
type of probability that we use most of the time in everyday situations (ex. asking someone on a date, getting up for class, driving)
subjective
What statistic should be used to find out whether there is a relationship between hours spent participating in sports and GPA?
the Pearson correlation coefficient
What statistic should be used to find out whether there is a relationship between years of education and annual income?
the Pearson correlation coefficient
Which correlation coefficient should we use if we want to find out whether a relationship exists between two variables that are both interval or ratio variables?
the Pearson correlation coefficient
Suppose a researcher has trained two observers to rank participants according to their level of frustration when trying to solve a puzzle. What statistic should be used to determine the extent to which the two observers agree in their rankings of frustration?
the Spearman rank-order correlation coefficient
What statistic should be used to find out whether there is a relationship between high school class rank and first-semester college GPA rank?
the Spearman rank-order correlation coefficient
Which correlation coefficient should we use if we want to find out whether a relationship exists between two variables that represent pairs of ordinal scores?
the Spearman rank-order correlation coefficient
Homoscedasticity occurs when
the Y scores at all Xs are spread out to the same degree
Heteroscedasticity
the Y scores have a different degree of spread at different Xs
Linear regression is defined as the procedure for determining
the best-fitting straight line in a linear relationship
which of the following accurately describes an empirical probability distribution? It is based on
the computed relative frequency of observed events
A regression line is usually used when
the correlation coefficient is not 0.0
what is the critical value?
the inner edge of the region of rejection
in general, the greater the proportion of variance accounted for,
the more accurately we can predict behavior
When a sample mean is different from the mean of the sampling distribution (the population mean), two alternatives must be considered: The sample mean may represent _____, or it may represent _____.
the population poorly; a different population
an event's relative frequency in the population equals
the probability of an event
a probability distribution gives us
the probability of every possible event in a population
two events are said to be independent when
the probability of one event is not influenced by the occurrence of the other event
how is the relative frequency of an event defined
the proportion of times an event occurs in the population of events
Suppose you drew a random sample from a population where the mean is 100. The standard error of the sampling distribution is 10. The mean for your sample is 80. What could you conclude about your sample?
the sample mean does not occur very often by chance in the sampling distribution of means and probably did not come from the given population.
what can we conclude when the absolute value of a z score for a sample mean is larger than the critical value?
the sample mean does not represent the particular raw score population on which the sampling distribution is based
Suppose you drew a random sample from a population where the mean is 100. The standard error of the sampling distribution is 10. The mean for your sample is 110. What could you conclude about your sample?
the sample mean occurs very often by chance in the sampling distribution of means and probably did come from the given population
what can you conclude about a sample mean that falls within the region of rejection?
the sample probably represents some population other than the one on which the sampling distribution was based
To know whether there is a relationship between two variables, you draw a line around the outer edges of a scatterplot. You can tell when there is no relationship when
the scatterplot is either circular or elliptical, with the ellipse being parallel to the X axis
the criterion determines
the size of the region of rejection
in the regression equation, the slope summarizes _____ and the Y intercept indicates _____
the steepness and direction of the regression line; the value of Y' when X=0
we calculate the proportion of variance accounted for because it is the statistical basis for evaluating
the usefulness of a relationship
when there is no relationship between two variables, the value of every Y' is equal to
the value of the Y intercept
Professor Helgin has found that the correlation between the length of a person's index finger and the person's IQ is -.09. He should conclude that
there is a very weak relationship between the length of the index finger and IQ because r is nearly 0.
Which relationship is stronger, r = +.62 or r = -.62
there is no difference in the strength of the two relationships
which of the following is correct regarding means that fall within the region of rejection when the critical values are ±1.96?
they occur with a probability of 5%
In a correlational analysis, N stands for the
total number of pairs of scores
t/f: the magnitude of r increases whenever the variability of either X or Y increases
true
More questions (like 6 of them) on calculating the correlation coefficient so idk memorize those?
uh okay
what can we conclude about a sample mean that is found to lie in the region of rejection? it is extremely ___ to have occurred by chance, and it represents ____.
unlikely; some other population
when do we use the spearman R
when both variables are ordinal (ranked) scores