Statistics Test 1 (1-4)

The regression line summarizes the linear relationship between two numerical variables. What is the formula for the regression line?

"Predicted" y = a + bx

Statisticians often write the word ____ in front of the y-variable in the equation of the regression line.

"predicted"

The correlation coefficient (r) is always a number between ____

-1 and 1

The intercept of a regression line gives the predicted mean y-value when the x-value is ___

0

In a boxplot, potential outliers are points that are more than _____ IQRs from the edges of the box.

1.5
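The 1.5 × IQR rule can be checked directly. A minimal sketch with made-up data, assuming Q1 and Q3 are the medians of the lower and upper halves of the sorted data (excluding the overall median when n is odd); other software may use a slightly different quartile convention. The helper names are illustrative only.

```python
def median(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """(Q1, Q3) as medians of the lower and upper halves (assumed convention; needs n >= 2)."""
    s = sorted(values)
    half = len(s) // 2
    return median(s[:half]), median(s[half + len(s) % 2:])

def potential_outliers(values, k=1.5):
    """Values more than k IQRs below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

data = [2, 4, 5, 5, 6, 7, 8, 9, 30]  # made-up sample
print("Q1, Q3:", quartiles(data))
print("potential outliers:", potential_outliers(data))
```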

What percentage of the observations will be within one standard deviation of the mean? Within two?

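About 68% within one standard deviation and about 95% within two (for roughly unimodal, symmetric distributions)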
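The 68% and 95% figures come from the normal (bell-shaped) model. As a minimal sketch, assuming the data follow a normal distribution, Python's math.erf gives the exact share of observations within k standard deviations of the mean; the function name fraction_within is illustrative only.

```python
import math

def fraction_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    # For a normal distribution, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```

The printed values round to roughly 68% and 95%, which is where the rule of thumb comes from.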

What is the main difference between a bar chart and a histogram?

A bar chart is used for categorical variables, while a histogram is used for numerical variables.

What type of effect can outliers have on a regression line?

A large effect: a single outlier can pull the regression line toward itself, changing its slope and intercept.

A difference between two groups in an observational study that can explain why the outcomes were very different between the groups is called what?

A confounding variable

Two commonly used graphs to display the distribution of a sample of categorical data are?

Bar graph and pie chart

What is the most common trick to mislead readers of bar graphs?

Change the scale of the vertical axis so that it does not start at 0

Of the following, which is the only method of data collection suitable for making conclusions about causal relationships? (Observational Studies; Anecdotes; Controlled Experiments; All three are suitable)

Controlled Experiments

Which of the following is NOT a way in which the Internet is influencing statistical graphics? (Decreasing the use of misleading graphics; Increasing the use of interactive displays; Allowing for a greater variety of graphical displays; None of the above)

Decreasing the use of misleading graphics

When one has influential points in the data, how should regression and correlation be done?

Do regression and correlation with and without these points and comment on the differences

Which of the following is NOT one of the criteria for the "gold standard" for experiments? (Large sample size; Random assignment of subjects to treatment or control groups; Double-blinding; Equal sample sizes for control and treatment group)

Equal sample sizes for control and treatment group

When examining the shape of a distribution of numerical data, which of the following is NOT one of the three basic characteristics of a distribution's shape? (Whether the distribution is symmetric or skewed; How many numbers are in the data set; How many mounds appear; Whether any unusually large or small values are present)

How many numbers are in the data set

A standard unit measures what?

How many standard deviations away an observation is from the mean

What are some things that should be asked when developing an understanding of data?

How were the variables measured? What variables were measured? Who collected the data?

The length of the box in a boxplot is proportional to what?

IQR

The interquartile range is the measurement of variability best used when the distribution is skewed. Its formula is?

IQR = Q₃ - Q₁

When computing the correlation coefficient (r), what is the effect of changing the order of the variables on r?

It has no effect on r

What is the first step in almost every investigation of data?

Make an appropriate graph

The range is a crude measure of variability. Its formula is?

Maximum - Minimum

What are two measures of the center of a distribution?

Mean and Median

When can a correlation coefficient (r) based on an observational study be used to support a claim of cause and effect?

Never; an observational study cannot rule out confounding variables.

What are two basic types of variables in statistics?

Numerical and Categorical

Values so large or so small that they do not fit into the pattern of the distribution are called what?

Outliers

What two-step process is used to examine distributions?

See the data and summarize it

The standard deviation is the measure of variability best used if the distribution is symmetric. The formula for standard deviation is?

Standard Deviation = s = √(∑(x-x̄)²/(n-1))
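The formula can be computed step by step and checked against Python's statistics module, whose variance and stdev functions also divide by n - 1. A minimal sketch with made-up data; sample_variance and sample_sd are illustrative names.

```python
import math
import statistics

def sample_variance(xs):
    """s² = ∑(x - x̄)² / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    """s = square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample, mean 5
print("variance:", sample_variance(data), "vs", statistics.variance(data))
print("std dev: ", sample_sd(data), "vs", statistics.stdev(data))
```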

In an experiment studying the association between a treatment variable and an outcome variable, the group of people who do NOT receive the treatment is called what?

The Control Group

Under what conditions is the use of the mean preferred?

The mean is preferred when the data is relatively symmetric

In a right-skewed distribution, which of the following is true? (The mean tends to be less than the median; The mean and median are approximately the same; The mean tends to be greater than the median; None of these)

The mean tends to be greater than the median
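A quick illustration with a small made-up, right-skewed sample: the long right tail (one large value) pulls the mean upward while the median stays near the bulk of the data.

```python
import statistics

values = [1, 2, 2, 3, 3, 3, 4, 20]  # made-up data with a long right tail
mean = statistics.mean(values)
median = statistics.median(values)
print(f"mean = {mean}, median = {median}")
```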

Under what conditions is the use of the median preferred?

The median is preferred when the data is strongly skewed or has outliers

What are the five numbers needed to make a boxplot?

The minimum, Q1, the median, Q3, the maximum
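A minimal sketch of the five-number summary, which also yields the IQR (Q3 - Q1) and the range (maximum - minimum). It assumes Q1 and Q3 are medians of the lower and upper halves of the sorted data, excluding the overall median when n is odd; software packages may use other quartile conventions. The data are made up.

```python
def median(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(values):
    """(min, Q1, median, Q3, max) under the assumed halves convention; needs n >= 2."""
    s = sorted(values)
    half = len(s) // 2
    lower, upper = s[:half], s[half + len(s) % 2:]
    return s[0], median(lower), median(s), median(upper), s[-1]

data = [7, 1, 3, 9, 4, 6, 2, 8]  # made-up sample
minimum, q1, med, q3, maximum = five_number_summary(data)
print("five-number summary:", (minimum, q1, med, q3, maximum))
print("IQR =", q3 - q1, " range =", maximum - minimum)
```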

If an observation has a z-score of 0, what does that mean?

The observation is equal to the mean

The outcome variable in a question about causality is also referred to as what?

The response variable

Because the human eye has difficulty judging how much area is taken up by the wedge-shaped slices of a pie chart, which of the following is true of pie charts? (They are only used for small data sets; They are only used if they are made by a computer; They are not commonly used by statisticians or in scientific settings; They are preferred over bar graphs)

They are not commonly used by statisticians or in scientific settings

Why are percentages or rates often better than counts for making comparisons?

They take into account possible differences among the sizes of the groups.

An important use of the regression line is to do what?

To make predictions about the values of y for a given x-value

Why is random assignment used to assign people to treatment groups and control groups in a controlled experiment?

To make the groups as similar as possible, minimizing bias.
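A minimal sketch of random assignment: shuffle the subjects, then split the shuffled list. The subject labels are hypothetical, and the even split is only a convenience (equal group sizes are not one of the gold-standard criteria). A fixed seed is used here so the example is reproducible.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle subjects and split them into (treatment, control) groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # chance alone decides who gets the treatment
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

subjects = [f"subject{i}" for i in range(10)]  # hypothetical subject IDs
treatment, control = randomly_assign(subjects, seed=1)
print("treatment:", treatment)
print("control:  ", control)
```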

In a boxplot, the whiskers extend to?

To the most extreme values that are not potential outliers

The existence of multiple mounds in a distribution is sometimes a sign of what?

Two very different groups have been combined into a single collection

The circles shown are similar, but not exactly the same. This is an example of?

Variation

Which of the following is NOT something that one looks for when studying scatterplots? (Shape; Variation; Strength; Trend)

Variation

The study of statistics rests on what two major concepts?

Variation and data

A stemplot is often useful when?

When technology is not available and the data set is not large

What is used to compare values measured in different units, such as inches and pounds?

z-score

The formula for the intercept (a) of a regression line is?

a = ȳ - bx̄

Since, in general, the longer a car is owned the more miles it travels one can say there is a ______ between age of car and mileage.

a positive association

The formula for the slope (b) of a regression line is?

b = r(sy/sx), where sy and sx are the standard deviations of y and x

In a histogram, observations are grouped into intervals called _____.

bins

Changing the width of bins in a histogram ______

can change the apparent shape of the histogram
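A small sketch of how bin width changes what a histogram shows, using made-up data with two clusters. bin_counts is a hypothetical helper that counts values in equal-width bins [left, left + width) starting at start.

```python
def bin_counts(values, start, width):
    """Count how many values fall in each bin [start + i*width, start + (i+1)*width)."""
    n_bins = int((max(values) - start) // width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // width)] += 1
    return counts

data = [1, 2, 2, 3, 3, 11, 12, 12, 13, 13]  # made-up data with two clusters
print("width 5: ", bin_counts(data, start=0, width=5))   # two mounds with a gap
print("width 20:", bin_counts(data, start=0, width=20))  # one bin hides the gap
```

With narrow bins the two mounds are visible; with one wide bin the same data look like a single block.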

In statistics, variables are ____.

characteristics of people or things

The value that measures how much variation in the response variable is explained by the explanatory variable is called the ____.

coefficient of determination

Data are more than just numbers, because data have _____

context

The _____ is a number that measures the strength of the linear association between two numerical variables.

correlation coefficient (r)

The ____ organizes data by recording all the values observed in a sample as well as how many times each value was observed.

distribution of a sample

Attempting to use the regression equation to make predictions beyond the range of the data is called _____

extrapolation

The number of times a value is observed in a data set is called a ___

frequency
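Frequencies, relative frequencies (proportions), and the mode of a categorical variable can all be read from a frequency count. A minimal sketch using collections.Counter on made-up survey responses.

```python
from collections import Counter

responses = ["yes", "no", "yes", "undecided", "yes", "no"]  # made-up categorical data
freq = Counter(responses)                                   # frequency of each category
n = len(responses)
rel_freq = {value: count / n for value, count in freq.items()}  # relative frequency
mode = freq.most_common(1)[0][0]                            # most frequent category

print("frequencies:", dict(freq))
print("relative frequencies:", rel_freq)
print("mode:", mode)
```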

Because outliers can greatly affect the regression line, they are also called ____ points.

influential

Another name for the regression line is the ____ line.

least squares

The ____ is another term for the arithmetic average.

mean

The mean is the measure of center best used if the distribution is symmetric. Its formula is?

mean = x̄ = ∑x/n

In a boxplot, the vertical line inside the box marks the location of the _____.

median

The value that would be right in the middle if you were to sort the data from smallest to largest is called the ____

median

When a distribution is skewed, the ____ is used to measure the center and the ____ is used to measure variation.

median; interquartile range (IQR)

When describing the distribution of a categorical variable, the category that appears most often is called the ____

mode

In statistics, the data we work with are just one part of a bigger picture called the ____.

population

When writing a regression equation, what are names for the y-variable?

Predicted variable, response variable, dependent variable

When writing a regression equation, what are names for the x-variable?

Predictor variable, explanatory variable, independent variable

"Relative frequency" is the same as?

proportion

Categorical variables are also referred to as ____ variables.

qualitative

Numerical variables are also referred to as ____ variables.

quantitative

The correlation coefficient (r) measures the strength of a linear association. What is its formula?

r = ∑(zx·zy)/(n-1), where zx and zy are the z-scores of x and y
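A minimal sketch of r as the sum of products of z-scores divided by n - 1, with made-up data. It also shows that swapping the order of the variables leaves r unchanged and that r² (the coefficient of determination) follows directly. The function name correlation is illustrative.

```python
import statistics

def correlation(xs, ys):
    """r = ∑(zx·zy)/(n - 1), with z-scores based on the sample mean and SD."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum(((x - x_bar) / sx) * ((y - y_bar) / sy) for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 4, 5, 4, 6]
r = correlation(xs, ys)
print(f"r = {r:.4f}, r with variables swapped = {correlation(ys, xs):.4f}")
print(f"r squared (coefficient of determination) = {r ** 2:.4f}")
```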

The ____ is a tool for making predictions about future observed values and is a useful way of summarizing a linear relationship.

regression equation

Because the median is not affected by the size of an outlier and does not change even if a particular outlier is replaced by an even more extreme value, we say the median is _____ to outliers.

resistant
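A quick illustration of resistance with made-up data: replacing the largest value with an even more extreme one leaves the median unchanged but moves the mean a great deal.

```python
import statistics

data = [3, 5, 6, 8, 40]            # made-up sample with one outlier
more_extreme = [3, 5, 6, 8, 4000]  # same sample, outlier made far larger

for label, values in (("original", data), ("more extreme", more_extreme)):
    print(f"{label:>12}: median = {statistics.median(values)}, mean = {statistics.mean(values)}")
```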

When describing two-variable associations, a written description should always include what?

Trend, shape, strength, and context

The correlation coefficient (r) makes sense only if the trend is linear and the ____

variables are numerical

Variance is another measure of variability and is used if the distribution is symmetric. What is the variance formula?

variance = s² = ∑(x-x̄)²/(n-1)

A large amount of scatter in a scatterplot is an indication that the association between the two variables is ____.

weak

A z-score converts observations into standard units. Its formula is?

z = (x-x̄)/s
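Because z-scores are unitless, they let us compare values measured in different units. A minimal sketch with made-up heights (inches) and weights (pounds); z_score is an illustrative name and uses the sample standard deviation.

```python
import statistics

def z_score(x, values):
    """z = (x - x̄)/s, using the sample mean and sample standard deviation."""
    return (x - statistics.mean(values)) / statistics.stdev(values)

heights_in = [60, 62, 64, 66, 68]       # made-up heights in inches
weights_lb = [120, 140, 160, 180, 200]  # made-up weights in pounds

z_height = z_score(68, heights_in)
z_weight = z_score(180, weights_lb)
print(f"68 in: z = {z_height:.2f}; 180 lb: z = {z_weight:.2f}")
```

Here the 68-inch height lies more standard deviations above its mean than the 180-pound weight does, so it is the more unusual value even though the units differ.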

