Statistics Test 1 (1-4)
The regression line summarizes the linear relationship between two numerical variables. What is the formula for the regression line?
"Predicted" y = a + bx
Statisticians often write the word ____ in front of the y-variable in the equation of the regression line.
"predicted"
The correlation coefficient (r) is always a number between ____
-1 and 1
The intercept of a regression line tells a person the predicted mean y-value when the x-value is ___
0
In a boxplot, potential outliers are points that are more than _____ IQRs from the edges of the box.
1.5
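The 1.5 IQR rule can be sketched in Python as follows. This is a minimal illustration, not a standard routine: the data are made up, the helper name `outlier_fences` is hypothetical, and the quartiles come from `statistics.quantiles` with `method="inclusive"`, which is only one of several quartile conventions (a textbook or calculator may give slightly different quartiles).

```python
import statistics

def outlier_fences(data):
    """Return (lower, upper) fences lying 1.5 IQRs beyond the edges of the box."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Made-up sample with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
low, high = outlier_fences(data)

# Points outside the fences are potential outliers.
potential_outliers = [x for x in data if x < low or x > high]
print(low, high, potential_outliers)
```

With these numbers, only the value 30 falls outside the fences.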
In a symmetric, unimodal (bell-shaped) distribution, about what percentage of the observations will be within one standard deviation of the mean? Within two?
About 68%; about 95%
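These percentages can be checked against the normal model. The sketch below assumes the distribution is exactly normal, which real data only approximate; `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

# Standard normal model: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The printed shares round to roughly 68% and 95%, matching the card.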
What is the main difference between a bar chart and a histogram?
A bar chart is used for categorical variables while a histogram is used for numerical variables.
What type of effect can outliers have on a regression line?
A big effect
A difference between two groups in an observational study that can explain why the outcomes were very different between the groups is called what?
A confounding variable
Two commonly used graphs to display the distribution of a sample of categorical data are?
Bar graph and pie chart
What is the most common trick to mislead readers of bar graphs?
Change the scale of the vertical axis so that it does not start at 0
Of the following, which is the only method of data collection suitable for making conclusions about causal relationships? (a) Observational Studies (b) Anecdotes (c) Controlled Experiments (d) All three are suitable
Controlled Experiments
Which of the following is NOT a way in which the Internet is influencing statistical graphics? (a) Decreasing the use of misleading graphics (b) Increasing the use of interactive displays (c) Allowing for a greater variety of graphical displays (d) None of the above
Decreasing the use of misleading graphics
When one has influential points in the data, how should regression and correlation be done?
Do regression and correlation with and without these points and comment on the differences
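A small sketch of this with-and-without comparison, using made-up data in which one unusual point reverses the sign of r. The helper `pearson_r` is hypothetical; it computes r from sums of deviations, which is algebraically the same as averaging products of z-scores with n - 1 in the denominator.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient computed from sums of deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: five points with an upward trend plus one unusual last point.
xs = [1, 2, 3, 4, 5, 15]
ys = [2, 4, 5, 4, 5, 0]

r_with = pearson_r(xs, ys)                # all six points
r_without = pearson_r(xs[:-1], ys[:-1])   # unusual point dropped
print(f"r with the point:    {r_with:.2f}")
print(f"r without the point: {r_without:.2f}")
```

Here the correlation is positive without the unusual point and negative with it, which is the kind of difference worth commenting on.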
Which of the following is NOT one of the criteria for the "gold standard" for experiments? (a) Large Sample Size (b) Random assignment of subjects to treatment or control groups (c) Double-blinding (d) Equal sample sizes for control and treatment group
Equal sample sizes for control and treatment group
When examining the shape of a distribution of numerical data, which of the following is NOT one of the three basic characteristics of a distribution's shape? (a) Whether the distribution is symmetric or skewed (b) How many numbers are in the data set (c) How many mounds appear (d) Whether any unusually large or small values are present
How many numbers are in the data set
A standard unit measures what?
How many standard deviations away an observation is from the mean
What are some things that should be asked when developing an understanding of data?
How were the variables measured? What variables were measured? Who collected the data?
The length of the box in a boxplot is proportional to what?
IQR
The interquartile range is the measurement of variability best used when the distribution is skewed. Its formula is?
IQR = Q₃ - Q₁
When computing the correlation coefficient (r), what is the effect of changing the order of the variables on r?
It has no effect on r
What is the first step in almost every investigation of data?
Make an appropriate graph
The range is a crude measure of variability. Its formula is?
Maximum - Minimum
What are two measures of the center of distribution?
Mean and Median
When can a correlation coefficient (r) based on an observational study be used to support a claim of cause and effect?
Never
What are two basic types of variables in statistics?
Numerical and Categorical
Values so large or so small that they do not fit into the pattern of the distribution are called what?
Outliers
What two-step process is used to examine distributions?
See the data and summarize it
The standard deviation is the measure of variability best used if the distribution is symmetric. The formula for standard deviation is?
Standard Deviation = s = √(∑(x - x̄)²/(n - 1))
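The formula can be traced step by step in Python. This is a sketch on made-up data; `sample_sd` is a hypothetical helper, and the standard library's `statistics.stdev` is printed alongside only as a cross-check, since it also divides by n - 1.

```python
import math
import statistics

def sample_sd(data):
    """Sample standard deviation: square root of summed squared deviations over n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - 1))

# Made-up sample with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
s = sample_sd(data)
print(f"by hand: {s:.3f}, statistics.stdev: {statistics.stdev(data):.3f}")
```

Squaring `s` gives the variance, which is the same expression without the square root.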
In an experiment studying the association between a treatment variable and an outcome variable, the group of people who do NOT receive the treatment are called what?
The Control Group
Under what conditions is the use of the mean preferred?
The mean is preferred when the data is relatively symmetric
In a right-skewed distribution, which of the following is true? (a) The mean tends to be less than the median (b) The mean and median are approximately the same (c) The mean tends to be greater than the median (d) None of these
The mean tends to be greater than the median
Under what conditions is the use of the median preferred?
The median is preferred when the data is strongly skewed or has outliers
What are the five numbers needed to make a boxplot?
The minimum, Q1, the median, Q3, and the maximum
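A five-number summary might be computed as in the sketch below. The data are made up and `five_number_summary` is a hypothetical helper; quartile conventions differ between textbooks and software, and this version assumes the `method="inclusive"` rule of `statistics.quantiles`.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

# Made-up sample of nine observations.
data = [2, 4, 4, 5, 7, 8, 10, 12, 15]
minimum, q1, median, q3, maximum = five_number_summary(data)
iqr = q3 - q1  # length of the box in the boxplot
print(minimum, q1, median, q3, maximum, iqr)
```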
If an observation has a z-score of 0, what does that mean?
The observation is equal to the mean
The outcome variable in a question about causality is also referred to as what?
The response variable
Because the human eye has a difficult time judging how much area is taken up by the wedge-shaped slice of a pie chart, which of the following is true of pie charts? (a) They are only used for small data sets (b) They are only used if they are made by a computer (c) They are not commonly used by statisticians or in scientific settings (d) They are preferred over bar graphs
They are not commonly used by statisticians or in scientific settings
Why are percentages or rates often better than counts for making comparisons?
They take into account possible differences among the sizes of the groups.
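A tiny made-up example shows how counts and rates can point in opposite directions when group sizes differ. The group names and numbers are hypothetical.

```python
# Hypothetical groups: (number of cases, group size).
groups = {
    "Group A": (50, 1000),
    "Group B": (30, 200),
}

# Rate = cases divided by group size, so groups of different sizes are comparable.
rates = {name: cases / size for name, (cases, size) in groups.items()}

for name, (cases, size) in groups.items():
    print(f"{name}: {cases} cases out of {size} = {rates[name]:.0%}")
```

Group A has more cases, but Group B has the higher rate once group size is taken into account.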
An important use of the regression line is to do what?
To make predictions about the values of y for a given x-value
Why is random assignment used to assign people to treatment groups and control groups in a controlled experiment?
To make the groups as similar as possible, minimizing bias.
In a boxplot, the whiskers extend to?
To the most extreme values that are not potential outliers
The existence of multiple mounds in a distribution is sometimes a sign of what?
Two very different groups have been combined into a single collection
The circles shown are similar, but not exactly the same. This is an example of?
Variation
Which of the following is NOT something that one looks for when studying scatterplots? (a) Shape (b) Variation (c) Strength (d) Trend
Variation
The study of statistics rests on what two major concepts?
Variation and data
A stemplot is often useful when?
When technology is not available and the data set is not large
What is used to compare values measured in different units, such as inches and pounds?
Z-Score
The formula for the intercept (a) of a regression line is?
a = ȳ - bx̄
Since, in general, the longer a car is owned, the more miles it travels, one can say there is ______ between age of car and mileage.
a positive association
The formula for the slope (b) of a regression line is?
b = r(Sy/Sx)
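The slope and intercept formulas can be combined into one short sketch. The car-age and mileage numbers are made up, and `regression_line` is a hypothetical helper that follows the cards: r from z-scores, then b = r(Sy/Sx), then a = ȳ - bx̄.

```python
import statistics

def regression_line(xs, ys):
    """Return (a, b) so that predicted y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    # Correlation: sum of products of z-scores, divided by n - 1.
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * (s_y / s_x)    # slope
    a = y_bar - b * x_bar  # intercept
    return a, b

# Made-up data: age of car in years, mileage in thousands of miles.
car_age_years = [1, 2, 3, 4, 5]
mileage_thousands = [12, 25, 33, 48, 57]

a, b = regression_line(car_age_years, mileage_thousands)
print(f"predicted mileage = {a:.1f} + {b:.1f} * age")
```

Because a = ȳ - bx̄, the fitted line always passes through the point (x̄, ȳ); here that is an age of 3 years and a predicted mileage of 35 thousand.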
In a histogram, observations are grouped into intervals called _____.
bins
Changing the width of bins in a histogram ______
changes the shape of the histogram
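The effect of bin width can be seen without any plotting library. In this sketch the data are made up and `histogram_counts` is a hypothetical helper that assigns each observation to a bin starting at a multiple of the bin width.

```python
from collections import Counter

def histogram_counts(data, bin_width):
    """Count observations per bin; a bin starting at k covers k up to (not including) k + bin_width."""
    counts = Counter((x // bin_width) * bin_width for x in data)
    return dict(sorted(counts.items()))

# Made-up data with two clusters of values.
data = [1, 2, 2, 3, 8, 9, 9, 10, 11]

narrow = histogram_counts(data, 2)   # bins of width 2
wide = histogram_counts(data, 10)    # bins of width 10
print("width 2: ", narrow)
print("width 10:", wide)
```

With narrow bins, two separate mounds are visible; with wide bins, most of the data collapse into a single bar and the second mound disappears.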
In statistics, variables are ____
characteristics of people or things
The value that measures how much variation in the response variable is explained by the explanatory variable is called the ____.
coefficient of determination
Data are more than just numbers, because data have _____
context
The _____ is a number that measures the strength of the linear association between two numerical variables.
correlation coefficient (r)
The ____ organizes data by recording all the values observed in a sample as well as how many times each value was observed.
distribution of a sample
Attempting to use the regression equation to make predictions beyond the range of the data is called _____
extrapolation
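One way to guard against extrapolation is to flag any x-value outside the range of the data used to fit the line. The sketch below is illustrative only: the coefficients, the observed range, and the helper `predict` are all hypothetical.

```python
def predict(a, b, x, x_min, x_max):
    """Return (predicted y, True if x lies outside the observed x-range)."""
    is_extrapolation = x < x_min or x > x_max
    return a + b * x, is_extrapolation

# Hypothetical fitted line and observed x-range.
A, B = 1.1, 11.3
X_MIN, X_MAX = 1, 5

inside = predict(A, B, 3, X_MIN, X_MAX)    # x within the data range
outside = predict(A, B, 20, X_MIN, X_MAX)  # x far beyond the data range
print(inside, outside)
```

The second prediction is still computed, but the flag signals that it rests on the untested assumption that the linear trend continues beyond the data.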
The number of times a value is observed in a data set is called a ___
frequency
Since outliers can greatly affect the regression line, they are also called ____ points.
influential
Another name for the regression line is the ____ line.
least squares line
The ____ is another term for the arithmetic average.
mean
The mean is the measure of center best used if the distribution is symmetric. Its formula is?
mean = x̄ = ∑x/n
In a boxplot, the vertical line inside the box marks the location of the _____.
median
The value that would be right in the middle if you were to sort the data from smallest to largest is called the ____
median
When a distribution is skewed, the ____ is used to measure the center and the ____ is used to measure variation.
median; interquartile range
When describing the distribution of a categorical variable, the category that appears most often is called the ____
mode
In statistics, the data we work with is just one part of a bigger picture called the ____
population
When writing a regression equation, what are names for the y-variable?
predicted variable, response variable, dependent variable
When writing a regression equation, what are names for the x-variable?
predictor variable, explanatory variable, independent variable
"Relative frequency" is the same as?
proportion
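Frequencies, relative frequencies, and the mode of a categorical variable can be tallied with the standard library. The sample of colors below is made up.

```python
from collections import Counter

# Made-up categorical sample.
colors = ["red", "blue", "red", "green", "red", "blue"]

frequencies = Counter(colors)  # how many times each value was observed
n = len(colors)

# Relative frequency (proportion) = frequency divided by the sample size.
relative_frequencies = {value: count / n for value, count in frequencies.items()}

# Mode = the category that appears most often.
mode = frequencies.most_common(1)[0][0]
print(frequencies, relative_frequencies, mode)
```

The relative frequencies always add up to 1, which is why they can be read as proportions of the sample.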
Categorical values are also referred to as ____ variables.
qualitative
Numerical values are also referred to as ____ variables.
quantitative
The correlation coefficient (r) measures the strength of a linear association. What is its formula?
r = (∑ZxZy)/(n - 1)
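The z-score form of the formula translates directly into code. This is a sketch on made-up data; `correlation_from_z_scores` is a hypothetical helper, and the standard deviations use n - 1 to match the denominator in the formula.

```python
import statistics

def correlation_from_z_scores(xs, ys):
    """r = sum of products of paired z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

# Made-up paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_xy = correlation_from_z_scores(xs, ys)
r_yx = correlation_from_z_scores(ys, xs)  # variables swapped
print(f"r(x, y) = {r_xy:.3f}, r(y, x) = {r_yx:.3f}")
```

Swapping the two variables only reorders the factors in each product, so r is unchanged.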
The ____ is a tool for making predictions about future observed values and is a useful way of summarizing a linear relationship.
regression equation
Because the median is not affected by the size of an outlier and does not change even if a particular outlier is replaced by an even more extreme value, we say the median is _____ to outliers.
resistant
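Resistance is easy to see by making an outlier more extreme and recomputing both measures of center. The numbers below are made up.

```python
import statistics

# Made-up sample with one outlier, then the same sample with a more extreme outlier.
original = [3, 5, 7, 9, 100]
more_extreme = [3, 5, 7, 9, 1000]

median_before = statistics.median(original)
median_after = statistics.median(more_extreme)
mean_before = statistics.mean(original)
mean_after = statistics.mean(more_extreme)

print(f"median: {median_before} -> {median_after}")
print(f"mean:   {mean_before} -> {mean_after}")
```

The median stays at the middle value while the mean is pulled toward the outlier.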
When describing two-variable associations, a written description should always include what?
trend, shape, strength, context
The correlation coefficient (r) makes sense only if the trend is linear and the ____
variables are numerical
Variance is another measure of variability and is used if the distribution is symmetric. What is the variance formula?
variance = s² = ∑(x - x̄)²/(n - 1)
A large amount of scatter in a scatterplot is an indication that the association between the two variables is ____.
weak
A z-score converts observations into standard units. Its formula is?
z = (x-x̄)/s
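Because z-scores are unit-free, they allow a height in inches to be compared with a weight in pounds. The means, standard deviations, and observations in this sketch are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical values: a height of 70 inches and a weight of 180 pounds.
height_z = z_score(70, mean=66, sd=4)     # inches
weight_z = z_score(180, mean=150, sd=20)  # pounds

print(f"height z = {height_z}, weight z = {weight_z}")
```

Under these assumed values, the weight is more unusual than the height (1.5 versus 1.0 standard deviations above the mean), even though the raw numbers are in different units.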