Stats final
correlation
A _____ exists between two variables when the values of one variable are somehow associated with the values of the other variable.
uniform
A continuous random variable has a _______ distribution if its values are spread evenly over the range of possibilities.
z score with an area of α to its right
A critical value, zα, denotes the _______
significantly low or significantly high
A data value is considered ________ if its z-score is less than −2 or greater than 2.
*bell shaped graph*
A normal distribution is informally described as a probability distribution that is "bell-shaped" when graphed. Draw a rough sketch of a curve having the bell shape that is characteristic of a normal distribution.
least-squares property
A straight line satisfies the _____ if the sum of the squares of the residuals is the smallest sum possible.
The player's height is 3.74 standard deviations above the mean.
A successful basketball player has a height of 6 feet 7 inches, or 201 cm. Based on statistics from a data set, his height converts to a z score of 3.74. How many standard deviations is his height above the mean?
The probability of randomly selecting a smartphone user and getting a response other than "yes" is 0.22
A survey of smartphone users showed that 78% of respondents answered "yes" when asked if abbreviations (such as LOL) are annoying when texting. What is the probability of randomly selecting a smartphone user and getting a response other than "yes"?
multiple regression
A(n) ______ equation expresses a linear relationship between a response variable y and two or more predictor variables.
a. The least-squares criterion. b. The criterion says that the line that best fits a set of data points is the one having the smallest possible sum of squared errors.
Answer the following questions regarding the criterion used to decide on the line that best fits a set of data points. a. What is that criterion called? b. Specifically, what is the criterion?
the law of large numbers
As a procedure is repeated again and again, the relative frequency of an event tends to approach the actual probability. This is known as _______.
The coefficient of determination is 0.181. 18.1% of the variation is explained by the linear correlation, and 81.9% is explained by other factors.
Assume that you have paired values consisting of heights (in inches) and weights (in lb) from 40 randomly selected men. The linear correlation coefficient r is 0.425. Find the value of the coefficient of determination. What practical information does the coefficient of determination provide?
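The arithmetic behind this card can be checked with a short sketch. It only squares the r value given in the question; the variable names are illustrative.

```python
# Coefficient of determination from the linear correlation coefficient.
r = 0.425  # value given in the card above

r_squared = r ** 2                              # proportion of variation explained
explained_pct = round(r_squared * 100, 1)       # as a percentage
unexplained_pct = round(100 - explained_pct, 1)

print(round(r_squared, 3), explained_pct, unexplained_pct)
```

The same squaring step applies to any other r value in this set.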
relative frequency
A _______ histogram has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.
scatterplot
A _______ is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.
Significantly low: test scores of 9.5 or less. Significantly high: test scores of 29.9 or more.
Consider a value to be significantly low if its z score is less than or equal to -2, and significantly high if its z score is greater than or equal to 2. A test is used to assess readiness for college. In a recent year, the mean test score was 19.7 and the standard deviation was 5.1. Identify the test scores that are significantly low or significantly high.
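A minimal sketch of how the two cutoffs come from the mean and standard deviation in the card above. The function name `significance_cutoffs` is hypothetical, chosen only for illustration.

```python
def significance_cutoffs(mean, sd, k=2):
    """Return the (low, high) values that lie k standard deviations from the mean."""
    return mean - k * sd, mean + k * sd

# Values from the card: mean 19.7, standard deviation 5.1.
low, high = significance_cutoffs(19.7, 5.1)
print(round(low, 1), round(high, 1))
```

Scores at or below the low cutoff have z scores of -2 or less; scores at or above the high cutoff have z scores of 2 or more.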
Since the differences in weights are not countable, the data are from a continuous data set.
Determine whether the data are from a discrete or continuous data set. A sample of married couples is randomly selected and the difference in weights for each couple is recorded
The study is an observational study, since the subjects were not given any treatment. Major problem: there is a strong possibility that the sample is biased, since a voluntary response sample is used.
Determine whether the following study is an experiment or an observational study, and then identify a major problem with the study. In a survey conducted by a national newspaper, 1465 Internet users chose to respond to this question posted on the newspaper's electronic edition: "How often do you seek medical information online?" 38% of the respondents said "frequently."
A discrete data set because there are a finite number of possible values.
Determine whether the given value is from a discrete or continuous data set. When a car is randomly selected, it is found to have enough room to seat 6 people
This is an observational study because the researcher does not attempt to modify the individuals. The major problem: The sample is too small
Determine whether the study is an experiment or an observational study, then identify a major problem with the study. A medical researcher tested for a difference in systolic blood pressure levels between male and female students who are 12 years of age. She randomly selected four males and four females for her study.
disjoint
Events that are ____ cannot occur at the same time.
The probability is 0.294
Express the indicated degree of likelihood as a probability value between 0 and 1 inclusive. Based on a report in a magazine, 29.4% of survey respondents have sleepwalked.
The probability is 0.5
Express the indicated degree of likelihood as a probability value between 0 and 1. Based on a survey of hiring managers who were asked to identify the biggest mistakes that job candidates make during an interview, there is a 50-50 chance that they will identify "inappropriate attire."
The probability is 0.1
Express the indicated degree of likelihood as a probability value between 0 and 1. When using a computer to randomly generate the last digit of a phone number to be called for a survey, there is 1 chance in 10 that the last digit is zero.
The area of the shaded region is 0.7939
Find the area of the shaded region. The graph depicts the standard normal distribution of bone density scores with mean 0 and standard deviation 1. (z=-0.82)
The area of the shaded region is 0.7296
Find the area of the shaded region. The graph depicts the standard normal distribution of bone density scores with mean 0 and standard deviation 1. (z=-0.99 and z=1.23).
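The two shaded-area answers above can be reproduced without a table. This sketch uses `statistics.NormalDist` from the Python standard library (3.8+). It assumes the first shaded region lies to the right of z = -0.82, which is what the answer 0.7939 implies, and that the second lies between the two z scores.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Assumed region: everything to the right of z = -0.82.
area_right = 1 - Z.cdf(-0.82)

# Region between z = -0.99 and z = 1.23.
area_between = Z.cdf(1.23) - Z.cdf(-0.99)

print(round(area_right, 4), round(area_between, 4))
```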
68.27%
Find the indicated area under the curve of the standard normal distribution; then convert it to a percentage and fill in the blank. About_____ % of the area is between z=−1 and z=1 (or within 1 standard deviation of the mean).
95.44%
Find the indicated area under the curve of the standard normal distribution; then convert it to a percentage and fill in the blank. About ______% of the area is between z=−2 and z=2 (or within 2 standard deviations of the mean).
99.95%
Find the indicated area under the curve of the standard normal distribution; then convert it to a percentage and fill in the blank. About ______% of the area is between z=−3.5 and z=3.5 (or within 3.5 standard deviations of the mean).
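As a check on the three "within k standard deviations" cards, the sketch below computes the areas directly. The helper name `pct_within` is made up for this example. Note that the direct computation gives 95.45% for k = 2, while a table rounded to four decimal places gives the 95.44% shown on the card.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def pct_within(k):
    """Percentage of the area between z = -k and z = k."""
    return 100 * (Z.cdf(k) - Z.cdf(-k))

for k in (1, 2, 3.5):
    print(k, round(pct_within(k), 2))
```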
z0.07 = 1.48
Find the indicated critical value. z0.07
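A critical value zα can be found by asking for the z score with cumulative area 1 - α to its left. The sketch below does that with the standard library; `critical_value` is an illustrative name, not a library function.

```python
from statistics import NormalDist

def critical_value(alpha):
    """z score with an area of alpha to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha)

print(round(critical_value(0.07), 2))  # the card above
print(round(critical_value(0.01), 2))  # z0.01, which also appears in this set
```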
The indicated z score is -1.22
Find the indicated z score. The graph depicts the standard normal distribution of bone density scores with mean 0 and standard deviation 1. (Shaded area: 0.8888.)
a. The difference is 34 beats per minute. b. The difference is 2.66 standard deviations. c. The z score is z = -2.66. d. The lowest pulse rate is significantly low.
For a data set of the pulse rates for a sample of adult females, the lowest pulse rate is 37 beats per minute, the mean of the listed pulse rates is x̄ = 71.0 beats per minute, and their standard deviation is s = 12.8 beats per minute. a. What is the difference between the pulse rate of 37 beats per minute and the mean pulse rate of the females? b. How many standard deviations is that [the difference found in part (a)]? c. Convert the pulse rate of 37 beats per minute to a z score. d. If we consider pulse rates that convert to z scores between -2 and 2 to be neither significantly low nor significantly high, is the pulse rate of 37 beats per minute significant?
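The four parts of this card follow from one subtraction and one division. A small sketch using the numbers from the question (variable names are illustrative):

```python
x = 37        # lowest pulse rate (beats per minute)
mean = 71.0   # sample mean
s = 12.8      # sample standard deviation

difference = mean - x        # part (a)
num_sds = difference / s     # part (b)
z = (x - mean) / s           # part (c)
significant = abs(z) >= 2    # part (d): outside the range -2 to 2

print(difference, round(num_sds, 2), round(z, 2), significant)
```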
residual
For a pair of sample x- and y-values, the _____ is the difference between the observed sample value of y and the y-value that is predicted by using the regression equation.
The ratio level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful and there is a natural zero starting point.
For the given description of data, determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. Volumes (cm³) of brains
systematic sampling
Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. A researcher selects every 292nd social security number and surveys the corresponding person.
systematic
Identify the type of sampling used: random, systematic, convenience, stratified, or cluster. To estimate the percentage of defects in a recent manufacturing batch, a quality control manager at Microsoft selects every 17th software CD that comes off the assembly line starting with the fifth until she obtains a sample of 40 software CDs.
cluster
Identify which type of sampling is used: random, systematic, convenience, stratified, or cluster. To determine customer opinion of their safety features, Daimler−Chrysler randomly selects 30 service centers during a certain week and surveys all customers visiting the service centers.
No. The presence of a linear correlation between two variables does not imply that one of the variables is the cause of the other variable.
If we find that there is a linear correlation between the concentration of carbon dioxide in our atmosphere and the global temperature, does that indicate that changes in the concentration of carbon dioxide cause changes in the global temperature?
The z score of 2.00 is most preferable because it is 2.00 standard deviations above the mean and would correspond to the highest of the five different possible test scores.
If your score on your next statistics test is converted to a z score, which of these z scores would you prefer: -2.00, -1.00, 0, 1.00, 2.00? Why?
P(Ā) = 0.497
In a certain region, the probability of a baby being born a boy is 0.503 instead of 0.5. Let A denote the event of getting a boy when a baby is born. What is the value of P(Ā)?
The probability of getting a green pea is approximately 0.436. No, it is not reasonably close.
In a genetics experiment on peas, one sample of offspring contained 399 green peas and 516 yellow peas. Based on those results, estimate the probability of getting an offspring pea that is green. Is the result reasonably close to the value of 3/4 that was expected?
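The estimate on this card is a relative frequency: green peas divided by all peas. A quick sketch with the counts from the question, which also shows how far the estimate falls from the expected 3/4:

```python
green = 399
yellow = 516

p_green = green / (green + yellow)   # relative-frequency estimate
gap = abs(p_green - 3 / 4)           # distance from the expected value

print(round(p_green, 3), round(gap, 3))
```

A gap of roughly 0.31 is why the card answers that the result is not reasonably close.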
the group sample sizes are all large so the researchers could see the effects of the treatment
In a study designed to test the effectiveness of a medication as a treatment for lower back pain, 1643 patients were randomly assigned to one of three groups: (1) the 547 subjects in the placebo group were given pills containing no medication; (2) 550 subjects were in a group given pills with the medication taken at regular intervals; (3) 546 subjects were in a group given pills with the medication to be taken when needed for pain relief. In what specific way was replication applied in the study?
relative frequency
In a _______ distribution, the frequency of a class is replaced with a proportion or percent.
outlier
In modified boxplots, a data value is a(n) _____ if it is above Q3+(1.5)(IQR) or below Q1−(1.5)(IQR).
prospective
Indicate whether the observational study used is cross-sectional, retrospective, or prospective. A study of doctors' coffee consumption was started in 2001 with 721 licensed doctors. The study is ongoing.
the probability that in a single trial, event A occurs, event B occurs, or they both occur.
P(A or B) indicates ___.
rule of complementary events
P(A) + P(Ā) = 1 is one way to express the _____.
s ≈ range/4
The Range Rule of Thumb roughly estimates the standard deviation of a data set as _______.
sample space
The _____ for a procedure consists of all possible simple events or all outcomes that cannot be broken down any further.
linear correlation coefficient r
The _____ measures the strength of the linear correlation between the paired quantitative x- and y-values in a sample.
P5= 39.5
The accompanying data are lengths (inches) of bears. Find P5
P95 = 74
The accompanying data are lengths (inches) of bears. Find P95
Q3 = 66.5
The accompanying data are lengths (inches) of bears. Find Q3.
equally likely
The classical approach to probability requires that the outcomes are _______.
the probability that the z score is less than a
The notation P(z<a) denotes ______
difference between
The residual is the _____ the observed value of y and the predicted value of y.
{bbbb, bbbg, bbgb, bbgg, bgbb, bgbg, bggb, bggg, gbbb, gbbg, gbgb, gbgg, ggbb, ggbg, gggb, gggg} Probability: 0.375
The sample space listing the eight simple events that are possible when a couple has three children is {bbb, bbg, bgb, bgg, gbb, gbg, ggb, ggg}. Identify the sample space for a couple having four children, then find the probability of getting two girls and two boys (in any order).
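The sample space can also be generated rather than written out by hand. This sketch uses `itertools.product` and assumes, as the classical approach requires, that the 16 outcomes are equally likely.

```python
from itertools import product

# Every ordering of boy (b) / girl (g) across four births.
sample_space = ["".join(outcome) for outcome in product("bg", repeat=4)]

# Outcomes with exactly two girls (and therefore two boys).
favorable = [s for s in sample_space if s.count("g") == 2]

probability = len(favorable) / len(sample_space)
print(len(sample_space), len(favorable), probability)
```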
no, because while there is no linear correlation, there may be a relationship that is not linear.
Twenty different statistics students are randomly selected. For each of them, their body temperature (°C) is measured and their head circumference (cm) is measured. If it is found that r=0, does that indicate that there is no association between these two variables?
a. r is a statistic that represents the value of the linear correlation coefficient computed from the paired sample data, and ρ is a parameter that represents the value of the linear correlation coefficient that would be computed by using all of the paired data in the population of all statistics students. b. The value of r is estimated to be 0, because it is likely that there is no correlation between body temperature and head circumference. c. The value of r does not change, because r is not affected by converting all values of a variable to a different scale.
Twenty different statistics students are randomly selected. For each of them, their body temperature (°C) is measured and their head circumference (cm) is measured. a. For this sample of paired data, what does r represent, and what does ρ represent? b. Without doing any research or calculations, estimate the value of r. c. Does r change if body temperatures are converted to Fahrenheit degrees?
slope; intercept
Two key parts of a regression equation involve ______ and the y- _____.
5.0
Use the given data to find the best predicted value of the response variable. Ten pairs of data yield r = 0.003 and the regression equation ŷ = 2 + 3x. Also, ȳ = 5.0. What is the best predicted value of y for x = 2?
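The answer 5.0 is the mean of the y values, not the regression output, because r = 0.003 is far too small to indicate a linear correlation. The sketch below spells out that decision. The function name is hypothetical, and the critical value 0.632 (n = 10, significance level 0.05) is an assumption taken from a standard table of critical r values, not from the card.

```python
def best_predicted_y(x, r, critical_r, intercept, slope, y_mean):
    """Use the regression line only when |r| exceeds the critical value;
    otherwise fall back to the sample mean of y."""
    if abs(r) > critical_r:
        return intercept + slope * x
    return y_mean

# Assumption: 0.632 is the critical r for n = 10 at the 0.05 significance level.
prediction = best_predicted_y(x=2, r=0.003, critical_r=0.632,
                              intercept=2, slope=3, y_mean=5.0)
print(prediction)
```

Had r exceeded the critical value, the same call would have returned 2 + 3(2) = 8 instead.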
The coefficient of determination is r^2 = 0.0026. The percentage of the total variation that can be explained by the linear relationship between cricket chirps and temperature is 0.26%.
Use the value of the linear correlation coefficient r to find the coefficient of determination and the percentage of the total variation that can be explained by the linear relationship between cricket chirps and temperature (x=number of cricket chirps in 1 minute, y=temperature in °F). r=0.051
Since the z score for the tallest man is z = 10.17 and the z score for the shortest man is z = -6.96, the tallest man had the height that was more extreme.
Use z scores to compare the given values. The tallest living man at one time had a height of 244 cm. The shortest living man at that time had a height of 130.6 cm. Heights of men at that time had a mean of 176.68 cm and a standard deviation of 6.62 cm. Which of these two men had the height that was more extreme?
Since the z score for the actor is z= -1.41 and the z score for the actress is z=1.33, the actor had the more extreme age.
Use z scores to compare the given values. In a recent awards ceremony, the age of the winner for best actor was 39 and the age of the winner for best actress was 51. For all best actors, the mean age is 47.3 years and the standard deviation is 5.9 years. For all best actresses, the mean age is 35.2 years and the standard deviation is 11.9 years. (All ages are determined at the time of the awards ceremony.) Relative to their genders, who had the more extreme age when winning the award, the actor or the actress? Explain.
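Both "more extreme" comparisons in this set come down to comparing absolute z scores. A small sketch with the numbers from the two cards above; `z_score` is an illustrative helper.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Award winners: each age is compared against its own group.
z_actor = z_score(39, 47.3, 5.9)
z_actress = z_score(51, 35.2, 11.9)
more_extreme_age = "actor" if abs(z_actor) > abs(z_actress) else "actress"

# Heights: both men are compared against the same distribution.
z_tallest = z_score(244, 176.68, 6.62)
z_shortest = z_score(130.6, 176.68, 6.62)
more_extreme_height = "tallest" if abs(z_tallest) > abs(z_shortest) else "shortest"

print(round(z_actor, 2), round(z_actress, 2), more_extreme_age)
print(round(z_tallest, 2), round(z_shortest, 2), more_extreme_height)
```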
The response variable is weight and the predictor variables are length and chest size.
Using the lengths (in.), chest sizes (in.), and weights (lb) of bears from a data set, a researcher gets the regression equation below. Weight = -274 + 0.426 Length + 12.1 Chest size. Identify the response and predictor variables in this regression equation.
to its right
What does the notation zα indicate? The expression zα denotes the z score with an area of α ______
A scatterplot is a graph of paired (x, y) quantitative data. It provides a visual image of the data plotted as points, which helps show any patterns in the data.
What is a scatterplot and how does it help us?
The value of r will always have the same sign as the value of b1
What is the relationship between the linear correlation coefficient r and the slope b1 of a regression line?
The mean and standard deviation have the values μ = 0 and σ = 1.
What requirements are necessary for a normal probability distribution to be a standard normal probability distribution?
z-score
When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a ________
scatterplot
When determining whether there is a correlation between two variables, one should use a _____ to explore the data visually.
Use the regression line for predictions only if the data go far beyond the scope of the available sample data.
When making predictions based on regression lines, which of the following is not listed as a consideration?
P(A) represents: probability of selecting an adult with blue eyes. P(Ā) represents: probability of selecting an adult who does not have blue eyes.
When randomly selecting an adult, A denotes the event of selecting someone with blue eyes. What do P(A) and P(Ā) represent?
addition rule
When using the _____ always be careful to avoid double-counting outcomes.
the corresponding z-score is negative
Whenever a data value is less than the mean, _______.
nominal
Which level of measurement consists of categories only where data cannot be arranged in an ordering scheme?
simple random sample
Which of the following corresponds to the case when every sample of size n has the same chance of being chosen?
The graph is uniform
Which of the following does NOT describe the standard normal distribution?
percentage, probability, and proportion
Which of the following groups has terms that can be used interchangeably with the others?
The graph is centered around 0
Which of the following is NOT a descriptor of a normal distribution of a random variable?
All events are equally likely in any probability procedure.
Which of the following is NOT a principle of probability?
if r > 1, then there is a positive linear correlation.
Which of the following is NOT a requirement in determining whether there is a linear correlation between two variables?
mean
Which of the following is NOT a value in the 5-number summary?
Correlation does not imply causality
Which of the following is NOT one of the three common errors involving correlation?
In a symmetric and bell-shaped distribution, the mean, median, and mode are the same.
Which of the following is always true?
data that were obtained from an entire population
Which of the following is associated with a parameter?
We say that there is a positive correlation between x and y if the x-values increase as the corresponding y-values increase.
Which of the following statements about correlation is true?
II and III
Which of the following statements concerning the linear correlation coefficient are true? I: If the linear correlation coefficient for two variables is zero, then there is no relationship between the variables. II: If the slope of the regression line is negative, then the linear correlation coefficient is negative. III: The value of the linear correlation coefficient always lies between −1 and 1. IV: A linear correlation coefficient of 0.62 suggests a stronger linear relationship than a linear correlation coefficient of -0.82
stratified
Which sampling method subdivides the population into categories sharing similar characteristics and then selects a sample from each subdivision?
z0.01=2.33
Find the indicated critical value. z0.01