Applied Statistics 2 (Cumulative Final)

Dummy (Indicator) Variables are coded as:

0 or 1

Which of the following is the MSE value for a randomized block design with 4 treatment levels and 5 cases per group, with SSC = 8.64, SSE = 1.04, and SSR = 1.26?

0.087 (The MSE, or mean square error, is calculated as MSE = SSE /(N-n-C+1) where N is the total sample size, n is the sample size for a group, and C is the number of groups.)
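
The arithmetic behind this answer can be checked with a short sketch. The function and argument names are illustrative assumptions, not part of the card; the error degrees of freedom follow the card's formula N - n - C + 1, which equals (C - 1)(n - 1) when each of the C treatments has n cases.

```python
def randomized_block_mse(sse, n_treatments, n_blocks):
    """Mean square error for a randomized block design.

    Error df = N - n - C + 1, where N = n * C observations,
    n is the number of cases (blocks) per treatment, and C is
    the number of treatments.
    """
    total_n = n_treatments * n_blocks
    df_error = total_n - n_blocks - n_treatments + 1
    return sse / df_error


# Values from the card: 4 treatments, 5 cases per group, SSE = 1.04.
mse = randomized_block_mse(sse=1.04, n_treatments=4, n_blocks=5)
print(round(mse, 3))  # 0.087
```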

For a Simple Regression equation: b₀ = 0.54, b₁ = 0.46, SSxx = 12.5, and SSyy = 13.2. What is the value of the coefficient of determination?

0.20

For a Regression Analysis: SSE = 2.86, SSxx = 3.23, and SSyy = 3.81. What is the coefficient of determination?

0.25 (1 - SSE/SSyy = 1 - 2.86/3.81 ≈ 0.249)

For a Regression Analysis: SSE = 4.89, SSxx = 7.65, and SSyy = 6.69. What is the coefficient of determination?

0.27 (1 - SSE/SSyy = 1 - 4.89/6.69 ≈ 0.269)

For a Simple Regression equation: b₀ = 0.65, b₁ = 0.5, SSxx = 16.2, and SSyy = 12.7. What is the value of the coefficient of determination?

0.32

Suppose there is a 0.35 chance it will rain on a given day. The odds of it raining on that day are:

0.54 (0.35/(1 - .35) = 0.54)
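
A minimal sketch of the probability-to-odds conversion used in this answer. The function name `odds` is an assumption chosen for illustration.

```python
def odds(p):
    """Convert a probability p (0 <= p < 1) into odds in favor: p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)


# Value from the card: a 0.35 chance of rain.
print(round(odds(0.35), 2))  # 0.54
```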

For a Regression Analysis: SSE = 4.78, SSxx = 10.65, and SSyy = 12.72. What is the coefficient of determination?

0.62

For a Simple Regression equation: b₀ = 0.91, b₁ = 0.75, SSxx = 6.4, and SSyy = 5.8. What is the value of the coefficient of determination?

0.62
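
The coefficient-of-determination cards above use two equivalent routes: one from the slope (b₁² × SSxx ÷ SSyy) and one from the error sum of squares (1 - SSE ÷ SSyy). The sketch below applies both; the function names are assumptions for illustration.

```python
def r2_from_slope(b1, ss_xx, ss_yy):
    """r^2 = b1^2 * SSxx / SSyy (simple regression)."""
    return b1 ** 2 * ss_xx / ss_yy


def r2_from_sse(sse, ss_yy):
    """r^2 = 1 - SSE / SSyy."""
    return 1 - sse / ss_yy


# Values taken from two of the cards above.
print(round(r2_from_slope(b1=0.75, ss_xx=6.4, ss_yy=5.8), 2))  # 0.62
print(round(r2_from_sse(sse=4.78, ss_yy=12.72), 2))            # 0.62
```

Note that b₀ is not needed for either calculation.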

Types of Experimental Design:

1. Completely Randomized Design 2. Randomized Block Design 3. Factorial Experiments

Regression Model (assumptions):

1. model is linear 2. error terms have constant variance 3. error terms are independent 4. error terms are normally distributed

When the goal is to test the assumptions of the regression model, the ordered pair __________ should be plotted.

(x, y - ŷ)

(Tukey) A neutral ladder for x is:

- (1/√x)

t test is used:

- when population variances are unknown (but assumed to be equal) - to find the differences in two related samples

z test is used:

- when testing differences in population proportions - to find the difference between two independent sample means

(Tukey) A neutral ladder for y is:

-(1/√y)

The variable that is not being controlled for by the researcher in an experiment, but can have an effect on the outcome of the treatment studied is called a:

Confounding/Concomitant variable

In a General Linear Regression Model, scatter plots may not reveal a _____ relationship between x & y.

Curvilinear

(True/False) Interaction can be examined as a separate dependent variable.

False (independent)

(True/False) Q2 is equal to the mean.

False (median)

(True/False) Dummy variables are quantitative.

False (qualitative)

One distinction between the stepwise model building and forward selection is that:

in forward selection, a variable is never removed from the model once it has been entered

A researcher wants to study the strength of different metal thicknesses. Other factors that may influence strength include humidity, temperature, and the area over which force is applied. The metal's thickness is the:

independent variable

In a 2-way Anova, " Xijk " refers to:

individual observations

α (interpretation):

intercept for baseline group

When using a T-test to determine differences in two related samples, the df is calculated as:

n-1

In Multiple Regression w/ p predictor variables, when constructing a confidence interval for any βi, the degrees of freedom for the tabulated value of t should be:

n-p-1

A bigger sample makes the confidence interval...

narrower

If H₀: β₁ = 0 and H₁: β₁ < 0 , the relationship is:

negative

If a distribution is skewed left it is ________ skewed.

negatively

By 'failing to reject' the null hypothesis, we are saying that the regression model has:

no significant predictability for the Dependent variable

(n×p) & n×(1 - p) must both be at least 5 in order to use normal approximation for z test for a

population proportion

β₀ (Simple Regression Model):

population y-intercept

If H₀: β₁ = 0 and H₁: β₁ > 0 , the relationship is:

positive

If a distribution is skewed right it is ______ skewed.

positively

A researcher is studying the proportion of customers of a particular product who are female. The null and alternative hypotheses are H0: p = 0.72 and Ha: p < 0.72. Which of the following alternative population proportions carries the greatest probability of committing a Type II error?

p₁ = 0.70

What does it mean if an adjusted R² is negative?

regression carries no meaning & is not useful

β₀ (Multiple Regression Model):

regression constant

Multiple regression models produce a:

response surface

In a 2-way ANOVA, the term "main effects" refers to the:

row and column effects

A first-order regression model with at least two independent variables is defined by the fact that:

the highest power of either variable is 1

In Tukey's Ladder of Transformations, the Four-Quadrant Approach determines:

which expressions on a ladder are more appropriate for a given situation

A smaller sample makes the confidence interval...

wider

In the regression model, Y=α+βx+ε, the change in Y for a one-unit increase in x:

will always be the same amount, β

In Probabilistic Multiple Regression models, ___ is sometimes referred to as the "response variable"

y

In regression analysis, a residual is calculated as:

y - ŷ

What value is driven by confidence level?

z-value

The parameters to be estimated in the Simple Linear Regression Model Y=α+βx+ε ε~N(0,σ) are:

α, β, σ

You are given the regression model yi = β₀ + β₁xi + ei; which of the following changes is predicted in the value of yi from a 1-unit change in the value of xi?

β₁

Given the regression model yi = β₀ + β₁xi + ei; which of the following changes is predicted in the value of yi from a 1-unit change in the value of xi?

β₁

In a Multiple regression model w/ 3 independent variables, the Partial Regression Coefficient for the 3rd independent variable is written as:

β₃

Which of the following statements is true concerning the Slope-Intercept equation of the Regression line?

The b indicates the y intercept

In Experimental Design, the variable that is controlled or modified is called the:

Treatment variable

The general direction of data over the long term is called a(n):

Trend

What represents the # of standard deviations a value is above or below the mean of a data set?

Z-score

Percentile location is calculated as:

i = (p÷100)×n

Calculating Skewness (w/ Mode):

(mean - mode) ÷ (standard deviation)

Calculating Skewness (w/ Median):

3 × (mean - median) ÷ (standard deviation)

Interval Estimates are also known as:

Confidence Intervals

In general, the Least Squares Regression approach finds the equation:

that has the smallest sum of squared errors
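
As a rough illustration of what "smallest sum of squared errors" means, the sketch below fits a line with the usual SSxy/SSxx formulas and compares its SSE with that of a slightly different line. The data and function names are hypothetical and made up for this example.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_errors(xs, ys, b0, b1):
    """SSE: the sum of squared residuals (y - y_hat)."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)
sse_fit = sum_squared_errors(xs, ys, b0, b1)
sse_other = sum_squared_errors(xs, ys, b0, b1 + 0.1)  # a nearby, non-optimal slope
print(round(b0, 2), round(b1, 2))  # 2.2 0.6
print(sse_fit < sse_other)         # True
```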

Given the multiple regression equation ŷ = β₀ + β₁x₁ + β₂x₂, what is the meaning of β₂?

the amount of change in y predicted from a one unit change in x₂, if all other variables are held constant

Given the multiple regression equation ŷ = β₀+ β₁ x₁ + β₂x₂, what is the meaning of β₂?

the amount of change in y predicted from a one unit change in x₂, if all other variables are held constant

In __________, the researcher controls or manipulates one or more variables.

Experimental Design

"Can give bad predictions if the conditions do not hold outside the observed range x's."

Extrapolation

"When Elvis Presley died in 1977, there were 48 professional Elvis impersonators. Today there an estimated 7328. If that growth is projected, by the year 2012, one person in four on the face of the globe will be an Elvis impersonator." This is an example of:

Extrapolation

Qualitative Variables are also known as:

"Indicator" or "Dummy Variables"

If a scatter plot of x & y indicates a shape shown in the upper left quadrant, recoding should move:

"down the ladder" for x

k (interpretation):

# of Independent variables

C (interpretation):

# of groups or column treatments

N (interpretation):

# of observations OR total sample size

R (interpretation):

# row treatments

The property taxes paid in Oakville over the past 5 years are reported as: $3600, $3500, $3450, $3300, and $3050. What is the average deviation from the mean?

$213.9 (the sample standard deviation; the mean is $3,380)
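
The figure on this card can be reproduced with Python's standard `statistics` module. As a hedged reading, $213.9 matches the sample standard deviation (dividing by n - 1); the mean absolute deviation of the same data is shown for comparison. Variable names are illustrative.

```python
import statistics

# Property taxes from the card.
taxes = [3600, 3500, 3450, 3300, 3050]

mean = statistics.mean(taxes)
# Mean absolute deviation: average of |x - mean|.
mad = sum(abs(t - mean) for t in taxes) / len(taxes)
# Sample standard deviation: divides the squared deviations by n - 1.
sample_sd = statistics.stdev(taxes)

print(mean)                 # 3380
print(mad)                  # 164.0
print(round(sample_sd, 1))  # 213.9
```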

If a dataset has k independent variables, there are ________ possible regression models.

(2^k) - 1

F statistic for overall regression is calculated as:

(SSreg/dfreg) / (SSerror/dferror).

Z-score for a Sample is calculated as:

(Xi-Xbar)/S

Z-score for a Population is calculated as:

(Xi-µ)/σ
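
Both z-score cards share the same arithmetic; only the source of the mean and standard deviation differs (sample vs. population). A minimal sketch with hypothetical numbers:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical values: a score of 85 with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))  # 1.5
print(z_score(55, 70, 10))  # -1.5
```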

Necessary Sample Size is calculated as:

(Z-score)² × (p × (1 - p)) ÷ (Margin of Error)², where p is the estimated population proportion
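
A small sketch of this sample-size calculation, reading the bracketed quantity as an estimated proportion p (the standard form when estimating a population proportion). The function name and the example inputs are assumptions; the result is rounded up because a sample size must be a whole number.

```python
import math


def sample_size_for_proportion(z, p, margin_of_error):
    """n = z^2 * p * (1 - p) / E^2, rounded up to the next whole number."""
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


# Hypothetical inputs: 95% confidence (z = 1.96), p = 0.5, margin of error 0.05.
print(sample_size_for_proportion(1.96, 0.5, 0.05))  # 385
```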

β₁ (Simple Regression Model):

(hypothesized) population slope

For the Pearson's product-moment correlation coefficient, a perfect negative relationship is denoted by a correlation of:

-1.0

Variance Inflation Factor (VIF) is calculated as:

1/ (1 - R²)
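
A minimal sketch of the VIF formula. Here R² is taken to be the coefficient of determination from regressing one predictor on the remaining predictors; the input value below is hypothetical.

```python
def variance_inflation_factor(r_squared):
    """VIF = 1 / (1 - R^2), with R^2 from regressing one predictor on the others."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    return 1 / (1 - r_squared)


# Hypothetical value: a predictor that is 75% explained by the other predictors.
print(variance_inflation_factor(0.75))  # 4.0
```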

A researcher is planning a factorial ANOVA with 2 factors: years with the company (6 levels) as the row effect, and educational level (3 levels) as the column effect. There are 60 employees included in the study. What will the degrees of freedom be for the interaction effect?

10 (calculated as (R - 1)(C - 1), where R is the number of row treatments & C the number of column treatments. (6 - 1)(3 - 1) = 10.)
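
The degrees of freedom for a two-way factorial ANOVA can be tabulated with a short helper. This is a sketch under the usual formulas (rows R - 1, columns C - 1, interaction (R - 1)(C - 1), error N - RC, total N - 1); the function name is an assumption.

```python
def two_way_anova_df(rows, cols, total_n):
    """Degrees of freedom for each source in a two-way factorial ANOVA."""
    df_rows = rows - 1
    df_cols = cols - 1
    return {
        "rows": df_rows,
        "columns": df_cols,
        "interaction": df_rows * df_cols,
        "error": total_n - rows * cols,
        "total": total_n - 1,
    }


# Values from the card: 6 row levels, 3 column levels, 60 employees.
df = two_way_anova_df(rows=6, cols=3, total_n=60)
print(df["interaction"])  # 10
print(df)
```

The four component values add up to the total (5 + 2 + 10 + 42 = 59), which is a quick consistency check.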

Given the Multiple Regression equation ŷ = 12.5 + 4.98x₁ - 2.35x₂, the Regression Constant is:

12.5

A Simple Regression equation predicts sales (in millions of dollars) by year: ŷ = -4,865.81 + 3.07 year. The expected difference over 5 years is:

15.35 million higher (5 * 3.07 = 15.35)
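
The same result follows from evaluating the fitted equation at two years that are 5 apart; the intercept cancels, leaving 5 times the slope. The years used below are hypothetical.

```python
def predicted_sales(year):
    """Fitted equation from the card: y_hat = -4,865.81 + 3.07 * year (millions of dollars)."""
    return -4865.81 + 3.07 * year


# Any two years 5 apart give the same difference; 2020 and 2025 are arbitrary.
difference = predicted_sales(2025) - predicted_sales(2020)
print(round(difference, 2))  # 15.35
```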

One rule of thumb for diagnosing Multicollinearity in a regression equation is:

2 independent variables that have a correlation above 0.90

In a ________, research interest is focused on both variables; in a ________, the focus is on one variable, with the other included to control its effects.

2-way factorial ANOVA; randomized block design

# of dimensions required to fit the resulting Response Surface for a Regression Model w/ 2 independent variables is:

3

A researcher is studying a dataset with 5 independent variables. How many possible regression models can be created with these data?

31 (2⁵ - 1 = 31)
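
The 2^k - 1 count can be confirmed by listing every non-empty subset of the predictors. The predictor names below are placeholders, and the function name is an assumption.

```python
from itertools import combinations


def all_predictor_subsets(predictors):
    """Every non-empty combination of predictors, i.e. every candidate model."""
    subsets = []
    for size in range(1, len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets


# Five placeholder predictor names.
models = all_predictor_subsets(["x1", "x2", "x3", "x4", "x5"])
print(len(models))  # 31
```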

A sample size = ________ is not appropriate to test hypotheses about a single mean using the z test.

35, population variance unknown

For ŷ = 11.2 + 3.98x₁ - 1.35x₂ - 4.33x₃ + 2.35x₄, the value of k =

4

How many independent variables are in a Multiple Regression equation w/ 5 simultaneous equations?

4 (k independent variables require k + 1 equations)

To determine a Multiple Regression equation, 5 simultaneous equations are required. How many independent variables does the Multiple Regression equation have?

4

In the Least Squares analysis, how many equations are required to estimate a Regression Model with 4 independent variables?

5

For a Logistic Regression, there should be at least ___ observations for each predictor variable.

In a Logistic Regression, at least __ observations are required for each predictor variable.

A researcher is planning a factorial ANOVA with 2 factors: gender (2 levels) and education (3 levels). How many cells will the table have?

6 (2 × 3 = 6)

All else held equal, which of the following will result in a higher coefficient of determination?

A larger value for SSR

When α = 0.05, β = 0.10, _______ of the area under the curve will be in the non-rejection region.

95%

The following regression equation was developed to predict the prime interest rate, with independent variables of unemployment rate (x1) and personal savings rate (x2): ŷ = 8.32 - 1.58x1 + 0.88x2. Which of the following statements is true?

A 1% change in the unemployment rate is predicted to produce a GREATER change in the prime interest rate than a 1% change in the personal savings rate, all else held equal.

In Experimental Design, a variable present prior to the experiment is called a:

Classification variable

ANOVA (refers to):

ANalysis Of VAriance

The __________ search procedure computes all possible linear multiple regression models from the data using combinations of the variables.

All Possible Regressions

What is NOT true regarding using a regression model with a dichotomous outcome?

All the independent variables must also be dichotomous

Arithmetic Mean:

Average of a group of #'s; found by adding all the #'s & dividing it by the # of #'s

R² (interpretation):

Coefficient of Determination

A researcher builds a regression model by entering all the predictors into the equation, & removing any that are not significant predictors. What approach to model building is this?

Backward elimination

Which step-by-step process begins w/ the "full" model (all k predictors) ?

Backwards Elimination

The variable that researchers want to control, but is not the treatment variable of interest is called a:

Blocking variable

Which of the following states a critical requirement for using the F distribution to make inferences about two population variances?

Both populations must be normally distributed.

In ________________________, subjects are assigned randomly to treatments.

Completely Randomized Design

For one-way ANOVA, the degrees of freedom for the numerator & denominator of the F statistic are:

C - 1, N - C

"Can be erroneously assumed in an observational study."

Cause & Effect

χ² (interpretation):

Chi-Square Distribution

Which of the following states an assumption of the t test for the difference in means between two related populations?

D is normally distributed

y (interpretation):

Dependent variable

In an Experimental Design, which variables respond to the different levels of Independent variables?

Dependent variables

What is found by subtracting the mean from each value of data?

Deviation from Mean

"Used in a Regression Model to represent Categorical variables."

Dummy Variables

The average value of y for a given value of x in regression analysis is written as:

E(yₓ)

In __________, every level of treatment is studied under the conditions of every level of all other treatments.

Factorial Design

(True/False) A factorial design for an ANOVA can only be used if the study includes at least one factor which has at least 3 levels.

False

(True/False) In an ANOVA table provided by the computer printout for a bivariate regression analysis, total DF is always 2 greater than residual error DF.

False

(True/False) The Standard Error of Estimate is the sum of the residuals column in a regression analysis.

False

(True/False) The coefficient of determination has a range from -1 to +1.

False

(True/False) The definition of a Type II error is rejecting the null hypothesis when it is true.

False

(True/False) The standard error of estimate is the sum of the residuals column in a regression analysis.

False

Which Multiple Regression model, with two independent variables, is the simplest?

First-Order Regression

ŷ = a + b₁x₁ + b₂x₂ + ... + bₚxₚ

Fitted Equation

"a variable is never removed from the model once it has been entered"

Forward Selection

An advertising company wants to know if a commercial can change people's opinions toward a brand of household appliances. They have a sample of 35 adults indicate their opinion of the brand, before and after watching a commercial intended to raise the brand's favorability. If they begin with the assumption that viewing the commercial will have no effect on the ratings of the brand, what are the null and alternative hypotheses?

H₀: D = 0 & Ha: D ≠ 0

A researcher is studying the influence of educational level (row effects: 3 levels) & years of experience (column effects: 4 levels) on worker accuracy. What are the null & the alternative hypotheses for the interaction effects?

H₀: the interaction effects are zero Ha: there is an interaction effect

Which of the following are the null and the alternative hypotheses for the interaction effects when a researcher studies the influence of educational level (row effects: 3 levels) and years of experience (column effects: 4 levels) on worker accuracy?

H₀: the interaction effects are zero Ha: there is an interaction effect

Which of the following is the null hypothesis for the row effects, when a researcher studies the influence of educational level (row effects: 3 levels) and years of experience (column effects: 4 levels) on worker accuracy?

H₀: μ₁. = μ₂. = μ₃. Ha: at least one row mean is different from the others.

In a randomized block design, the null hypothesis for the blocking effects for an ANOVA with 4 levels of treatment effect is:

H₀: μ₁. = μ₂. = μ₃. = μ₄. Ha: at least one of the blocking means is different from the others.

A researcher is studying the influence of educational level (row effects: 3 levels) and years of experience (column effects: 4 levels) on worker accuracy. What is the null hypothesis for the row effects?

H₀: μ₁. = μ₂. = μ₃. and Ha: at least one row mean is different from the others.

In a random block design with 3 treatment groups, the null and alternative hypotheses for the blocking effects are:

H₀: μ₁. =μ₂. = μ₃. & Ha: at least one of the blocking means is different from the others

Which of the following will increase the probability of committing a Type II error?

If the alternative mean is close to the hypothesized mean

A researcher is testing a regression model of the form: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂ + e. Which of the following statements is true?

If β₃ is positive, x₁ is negative, and x₂ is negative, then β₃x₁x₂ will be positive.

The sample size requirements for ____________ are larger than for multiple regression.

Logistic Regression

x (interpretation):

Independent variable

"Worst kind of outlier, can totally reverse the direction of association between x & y.

Influential Points

What can be designed by multiplying the data values of one variable by the values of another variable, creating a new variable?

Interaction Predictor Variables

What yields a range, that declares within certain confidence, where the population mean (or parameter) is located?

Interval Estimate (Confidence Interval)

What describes the amount of peakedness of a distribution?

Kurtosis

What is a distribution called if it is high & thin?

Leptokurtic

F (treatments) calculation:

MSC÷MSE

The F statistic for treatment is calculated as:

MSC÷MSE

F (blocks/rows) calculation:

MSR÷MSE

Example of an appropriate Indicator Variable:

Marital Status

Which of the measures of central tendency is the largest in a positively skewed distribution?

Mean

The average of the absolute values of the deviations around the mean for a set of numbers is called the:

Mean Absolute Deviation (MAD)

All students who took the economics final exam scored over 75%. Unfortunately, four students were absent for the test, and the computer listed each of their scores as 0. Assuming that no scores were repeated more than once, with the exception of the 0s, which measure of central tendency would most likely give the best representation of this data?

Median

What is a "normal" distribution between Leptokurtic & Platykurtic called?

Mesokurtic

"Problem that can occur when the information provided by several predictors overlaps."

Multicollinearity

In a Multiple Regression Model, when two or more of the independent variables are highly correlated they are said to have:

Multicollinearity

The sample size requirements for logistic regression are larger than for _____________.

Multiple Regression

y = α + β₁x₁ + β₂x₂ + ... + βₚxₚ + ε, ε ~ N(0, σ²)

Multiple Regression Model

Which of the following is allowed in a First-order regression model?

Multiple independent variables

"A point that lies far away from the rest."

Outliers

Odds Ratio is calculated as:

P ÷ (1-P)

Holding all other variables constant, what represents the increase in value of y from one-unit increase in that independent variable?

Partial Regression Coefficient

What is a measure of central tendency that divides a group of data into 100 parts?

Percentiles

What is a distribution called if it is flat & spread out?

Platykurtic

σ (interpretation):

Population Standard Dev.

σ² (interpretation):

Population Variance

Which quartile is found at the 25th percentile?

Q1

What quartile is found at the 50th percentile and is equal to the median?

Q2

Which quartile divides the first three-quarters of data from the last quarter, or is found at the 75th percentile?

Q3

Interquartile Range is calculated as:

Q3-Q1
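
A sketch of the interquartile range using Python's standard library. Quartile conventions differ between textbooks and software; the `method="inclusive"` setting of `statistics.quantiles` is one choice made here as an assumption, and the data are hypothetical.

```python
import statistics


def interquartile_range(data):
    """IQR = Q3 - Q1, with quartiles from statistics.quantiles (inclusive method)."""
    q1, _q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1


# Hypothetical data set.
print(interquartile_range([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # 4.0
```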

"Used when a numerical predictor has a curvilinear relationship with the response."

Quadratic Regression

Dummy (Indicator) Variables are:

Qualitative

What measure of central tendency divides a group of data in four subgroups or parts?

Quartiles

A _____________ focuses on one independent variable (treatment variable) of interest.

Randomized Block Design

In a random block design with 3 treatment groups of 6 cases each, and an alpha value of 0.10, the F statistic for treatment is 3.02, and the F statistic for blocks is 2.40. The null hypotheses are that all treatment and block means are the same. What is the correct decision, given the sample data?

Reject H₀ for treatment; do not reject H₀ for blocks.

"Is the Observed Value of y minus the Predicted Value of y for the observed x."

Residual

"Used to check the assumptions of the regression model."

Residual Plots

"Proportion of the variability in y explained by the regression model."

R²

We can measure the proportion of the variation explained by the regression model by:

R²

What will increase, every time, if you add another variable to a regression model?

R²

"Used when trying to decide between two models with different numbers of predictors."

R² adjusted

In a Randomized Block design, what factors make up the Total Sum of Squares?

SSC, SSR, & SSE

The MSC is calculated as:

SSC/(C - 1)

The MSE, or mean square error, is calculated as:

SSE /(N-n-C+1)

For a Bivariate Regression Analysis, the coefficient of determination can be calculated as:

1 - (SSE/SSyy)

Of the following formulas, which can be used to calculate the coefficient of determination for a Multiple Regression Analysis?

SSR/SSyy

s² (interpretation):

Sample Variance

What is used to describe a distribution that is asymmetrical or that lacks symmetry?

Skewness

Standard Error of Estimate (Se) is also known as:

Standard Deviation of the Error of Regression Model

Se (interpretation):

Standard Error of Estimate

Which approach to model building begins w/ a single predictor & adds the other predictors one at a time, testing model fit after each?

Stepwise Regression

Factors:

Subcategories or Independent variables

SSC (interpretation):

Sum of Squares Column (A.K.A.) Treatment Sum of Squares

Which of the following statements best represents results when a researcher conducts a 2-way factorial ANOVA examining the influence of years of experience and job type on income?

The influence of years of experience on income depends on the job type.

Which of the following is true regarding the general linear regression model?

The relationship between the dependent variable & the predictors may be curvilinear.

Which of the following is true of Bivariate Regression?

The variable to be predicted is called the dependent variable

Which of the following statements indicate a situation in which 2 variables may not be appropriate for Simple Regression?

Their scatterplot shows a curved relationship

SSE (interpretation):

Sum of Squares of Error

SST (interpretation):

Total Sum of Squares

(True/False) A Logistic Regression does not assume a linear relationship between the dependent & independent variables.

True

(True/False) A factorial design for an analysis of variance (ANOVA) can only be used if the study includes at least 2 factors, each of which has at least 2 levels.

True

(True/False) A larger z-score indicates a larger difference in sample means.

True

(True/False) An interval with 100% confidence is so wide that it is meaningless.

True

(True/False) Central Limit Theorem only applies when the sample size is large.

True

(True/False) Dummy or Indicator variables are nominal & ordinal.

True

(True/False) Dummy variables are dichotomous.

True

(True/False) Dummy variables convey information at the Nominal or Ordinal level.

True

(True/False) In a General Linear Regression Model, the dependent variable (y) is not necessarily linearly related to the predictor variables.

True

(True/False) In a General Linear Regression Model, the parameters βi are linear

True

(True/False) In a Logistic Regression, data does not need to be normally distributed.

True

(True/False) In a Logistic Regression, the independent variables do not have to be interval/ratio in data level.

True

(True/False) In a regression model with a dichotomous outcome, the error terms are not independent.

True

(True/False) In a regression model with a dichotomous outcome, the error terms do not follow a normal distribution.

True

(True/False) In a regression model with a dichotomous outcome, there is heteroscedasticity of error variances.

True

(True/False) In a regression model with a dichotomous outcome, there is no guarantee that predicted values will always be between 0 & 1.

True

(True/False) In the First-Order Regression Model, the highest power of any predictor variable is 1.

True

(True/False) In the First-Order Regression Model, there are no interaction terms

True

(True/False) Interaction can be examined as a separate independent variable.

True

(True/False) Logistic Regression does not assume a linear relationship between dependent & independent variables.

True

(True/False) Point Estimates are less accurate than Confidence Intervals.

True

(True/False) Sample distribution is approx. normal for small samples if the population is normally distributed.

True

(True/False) Sample size requirements for Logistic Regression are larger than for Multiple Regression.

True

(True/False) The Logistic Regression is commonly used to develop models to predict dichotomous dependent variables.

True

(True/False) The best estimate of y for any value of x is the mean of the y values if the slope of a simple regression model is 0.

True

(True/False) The definition of a type II error is failing to reject the null hypothesis when it is false.

True

(True/False) The formula used in solving for Multiple Regression Coefficients are based on the principle of minimizing the Sum of Squares for the model.

True

(True/False) The mean of the sample means is always equal to the population mean.

True

(True/False) Variables not controlled by the researcher in the experiment are confounding variables.

True

(True/False) Variances cannot be negative.

True

The probability of a _________ is determined in a reference to specific alternatives while the probability of a _______ is not determined in reference to specific alternatives.

Type II error; Type I error

Researchers are studying the average age of production line workers in an industry. A previous study found the mean to be 47.8 years, but they believe it to be lower and plan a study using a one-tailed test of a single population proportion with alpha = 0.10. The population standard deviation is 11.9. They are considering using a sample size of either 30 or 40. If they go with the smaller sample size, how will this change the critical raw score value compared to using the larger sample size?

Using the smaller sample size decreases the critical raw score by 0.47

Chi-Square Distribution is calculated as:

χ² = ((n - 1)s²) / σ²
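
A minimal sketch of this test statistic for a single population variance. The function name and the example numbers are hypothetical.

```python
def chi_square_statistic(n, sample_variance, hypothesized_variance):
    """Chi-square statistic: (n - 1) * s^2 / sigma^2, with n - 1 degrees of freedom."""
    return (n - 1) * sample_variance / hypothesized_variance


# Hypothetical inputs: n = 16, s^2 = 5.0, hypothesized sigma^2 = 4.0.
print(chi_square_statistic(16, 5.0, 4.0))  # 18.75
```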

Researchers are studying the average age of production line workers in an industry. A previous study found the mean to be 55.2 years, but they believe it to be higher. They decide to plan a study using a one-tailed test of a single population proportion with alpha = 0.01. The population standard deviation is 12.4. They are considering using a sample size of either 35 or 45. If they go with the smaller sample size, how will this change the critical raw score value compared to using the larger sample size?

Using the smaller sample size increases the critical raw score by 0.57

What value is the average of the squared deviations about the "arithmetic mean" for a # set?

Variance

What is a "Box-and-Whisker" plot?

a diagram that utilizes the upper & lower quartiles, the median, & the 2 most extreme values to depict a distribution graphically

A researcher is studying price differences in homes in two cities. All else held equal, which of the following would result in a larger z score?

a larger difference in sample means

In the population of a city, it is believed that 75% of adults age 18-65 are employed. In a sample of 100, 68% are employed. All else held equal, which of the following would result in a wider confidence interval for the mean?

a sample size of 50

A Correlation Coefficient of 0.45 between 2 variables can be described as having a:

moderate positive relationship

If you compare the mean of raw data and the mean of the same raw data grouped into a frequency distribution, the 2 means will be

approximately equal

Rejecting the null hypothesis indicates that:

at least 1 of the Independent variables is adding significant predictability

Point Estimates estimate a:

population parameter

Forward Selection:

begins by finding the independent variable that will produce the largest absolute value of t (& largest R²) in predicting y

A variable that is not the treatment of interest, but that a researcher wants to control, is a:

blocking variable

A Type II error exists when a researcher ______ the null hypothesis when it is ________.

fails to reject; false

In a 2-way Anova, " k " refers to:

cell members

β₂ (interpretation):

change in intercept

β₃ (interpretation):

change in slope

If a predictor variable x is found to be highly significant we would conclude that:

changes in x are associated with changes in y

Both the prediction interval for a new response & the confidence interval for the mean response are narrower when made for values of x that are:

closer to the mean of the x's

In a 2-way Anova, " j " refers to:

column/treatment levels

A researcher is studying the effect of employee incentive plans in decreasing absenteeism. He plans a 2-way factorial design with the type of incentive plan as the treatment variable, and the shift worked as the second variable. He believes that age may also influence the effectiveness of the treatment effect. In this case, age is an example of a ______ variable.

concomitant

A researcher is planning a factorial ANOVA with 2 factors: years with the company (5 levels) as the row effect, and educational level (3 levels) as the column effect. There are 60 employees included in the study. What will the degrees of freedom be for the interaction effect?

df = 8 calculated as (R - 1)(C - 1); R is the # of row treatments & C the # of column treatments

Range is the:

difference between the largest & smallest value in a data set

The value of Q2 is:

equal to the median or the 50th percentile

ε (Simple Regression Model):

error of prediction

A ________ regression model with at least two independent variables is defined by the fact that the highest power of either variable is 1.

first order

The value of Q1 is:

found at the 25th percentile

The value of Q3 is:

found at the 75th percentile

A _________ will produce a larger coefficient of determination, all else being held equal.

larger SSR

All else being equal, which of the following will result in a larger value for the z score when estimating the population proportion?

larger n

Simple Regression models produce a:

line

In a 2-way ANOVA, the row & column effects refer to __________ effects.

main

Studies have shown a high positive correlation between the number of firefighters dispatched to combat a fire & the financial damages resulting from it. A politician commented that the fire chief should stop sending so many firefighters since they are clearly destroying the place. This is an example of:

misuse of causality

For a T-test, where the population variances are not known but assumed to be equal, the df is calculated as:

n₁+n₂-2

One-way Anova vs. Two-way Anova

one treatment vs. at least two treatments

In a regression model with a dummy variable without interaction there can be:

only one slope, but more than one intercept
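
To see why, consider a model of the form y-hat = b0 + b1·x + b2·d, where d is a 0/1 dummy variable. The sketch below uses made-up coefficients (2.0, 0.5, 1.5) purely for illustration.

```python
def predict(x: float, d: int, b0: float = 2.0, b1: float = 0.5, b2: float = 1.5) -> float:
    """Prediction from a regression with one dummy variable and no interaction term."""
    return b0 + b1 * x + b2 * d

# Intercepts: the prediction at x = 0 for each group.
intercept_d0 = predict(0, 0)  # b0
intercept_d1 = predict(0, 1)  # b0 + b2

# Slopes: the change in prediction for a one-unit change in x.
slope_d0 = predict(1, 0) - predict(0, 0)
slope_d1 = predict(1, 1) - predict(0, 1)

print(intercept_d0, intercept_d1)  # 2.0 3.5
print(slope_d0, slope_d1)          # 0.5 0.5
```

The dummy variable shifts the line up or down by b2 without changing its slope; a different slope per group would require an interaction term (x·d).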

The odds of an event with probability p are computed as:

p / (1 - p)
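
The formula can be checked against the rain example from earlier in this set (p = 0.35). This is a minimal sketch; the function name `odds` is chosen here for illustration.

```python
def odds(p: float) -> float:
    """Convert a probability (0 <= p < 1) into odds in favor of the event."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

rain_odds = odds(0.35)
print(round(rain_odds, 2))  # 0.54
```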

Samples of size 50 are drawn from several different cities, and the proportion of employed adults is calculated for each in order to estimate the proportion of employed adults in that city. A sample proportion of ____________ will result in the widest confidence interval for the population proportion.

p-hat = 0.49
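
The interval is widest when p-hat(1 - p-hat) is largest, which happens as p-hat approaches 0.5. The sketch below assumes the usual large-sample interval p-hat ± z·sqrt(p-hat(1 - p-hat)/n) with z = 1.96. The list of candidate proportions is hypothetical, since the original answer choices are not shown.

```python
import math

def ci_half_width(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of the large-sample confidence interval for a proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

candidates = [0.10, 0.25, 0.49, 0.80]  # hypothetical answer choices
widest = max(candidates, key=lambda p: ci_half_width(p, n=50))
print(widest)  # 0.49
```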

β₁ (Multiple Regression Model):

partial regression coefficient for independent variable 1

µ (interpretation):

population mean

A researcher is studying the influence of location and nearby competitors on daily sales for a number of drug stores in a single city. The column variable is whether there are any competitor stores within 1 mile of the studied store (yes or no) and the row variable is location (central city, suburban, rural, or mall). A 2-way ANOVA with 5 cases per cell produced the following result: SSC = 8.54, SSR = 3.92, and SSI = 2.78. The MSE has already been calculated as 0.92. If alpha = 0.10, which of the main and interaction effects are significantly different from 0?

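column effect (F = MSC / MSE = 8.54 / 0.92 ≈ 9.28 with df = 1 and 32, which exceeds the critical value of about 2.87; the row effect, F ≈ 1.42, and the interaction, F ≈ 1.01, each with df = 3 and 32, fall below the critical value of about 2.26)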
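
A sketch for checking the F statistics in this problem: each effect's mean square (its sum of squares divided by its df) is divided by the MSE. The critical values are approximate readings from a standard F table at alpha = 0.10 and are an assumption of this sketch, not something it computes.

```python
def two_way_f_stats(ssr, ssc, ssi, mse, rows, cols):
    """Return the F statistic for the row, column, and interaction effects."""
    df_row = rows - 1
    df_col = cols - 1
    df_int = df_row * df_col
    return {
        "row": (ssr / df_row) / mse,
        "column": (ssc / df_col) / mse,
        "interaction": (ssi / df_int) / mse,
    }

f = two_way_f_stats(ssr=3.92, ssc=8.54, ssi=2.78, mse=0.92, rows=4, cols=2)

# Approximate F-table values for alpha = 0.10, error df = 4 * 2 * (5 - 1) = 32.
# Numerator df is 3 for row and interaction, 1 for column.
critical = {"row": 2.26, "column": 2.87, "interaction": 2.26}

significant = {effect: value > critical[effect] for effect, value in f.items()}
for effect, value in f.items():
    print(effect, round(value, 2), significant[effect])
```

With these inputs, only the column F statistic (about 9.28) exceeds its critical value.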

In a 2-way Anova, " i " refers to:

row treatment level

The coefficient of determination can be calculated as:

r² = (b₁² × SSxx)÷(SSyy)
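
A minimal sketch applying this formula to the simple-regression problems that appear earlier in this set. The function name is chosen here for illustration.

```python
def r_squared_from_slope(b1: float, ss_xx: float, ss_yy: float) -> float:
    """Coefficient of determination from the sample slope, SSxx, and SSyy."""
    return (b1 ** 2) * ss_xx / ss_yy

r2_a = r_squared_from_slope(0.46, 12.5, 13.2)
r2_b = r_squared_from_slope(0.50, 16.2, 12.7)
r2_c = r_squared_from_slope(0.75, 6.4, 5.8)
print(round(r2_a, 2), round(r2_b, 2), round(r2_c, 2))  # 0.2 0.32 0.62
```

Note that the intercept b₀ given in those problems is not needed for the calculation.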

Point Estimates are taken from a:

sample

Which of the following is an example of a point estimate?

sample mean

b₁ (interpretation)

sample slope

b₀ (interpretation):

sample y intercept

In Tukey's Ladder of Transformations, the Four-Quadrant Approach is based on the:

shape of a scatter plot of x & y

β₁ (interpretation):

slope for baseline group

All else being held equal, which of the following will result in a smaller t statistic?

smaller n

The p-value for the ANOVA test was 0.0952, so there is ________ evidence that scores on the test depend on the size of the cash incentive.

some

At the same confidence level, a prediction interval for a new response is always:

wider than the corresponding confidence interval for the mean response
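
The reason shows up in the standard simple-regression formulas: the prediction interval carries an extra "1 +" under the square root to account for the variability of an individual response. The sketch below compares the two half-widths; all input values are hypothetical.

```python
import math

def interval_half_widths(t_crit, s_e, n, x0, x_bar, ss_xx):
    """Half-widths of the confidence interval (mean response) and the
    prediction interval (new response) at x0 in simple linear regression."""
    leverage = 1 / n + (x0 - x_bar) ** 2 / ss_xx
    ci = t_crit * s_e * math.sqrt(leverage)
    pi = t_crit * s_e * math.sqrt(1 + leverage)
    return ci, pi

# Hypothetical values chosen only for illustration.
ci, pi = interval_half_widths(t_crit=2.101, s_e=1.2, n=20, x0=12.0, x_bar=10.0, ss_xx=50.0)
print(pi > ci)  # True
```

Because 1 + leverage is always greater than leverage, the prediction interval is wider for any positive standard error, at any x0.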

A research hypothesis:

states what a researcher believes will be the result of a study

Pearson's correlation coefficient of -0.97 indicates what relationship between the variables?

strong negative

In Tukey's Ladder of Transformations, Ladders:

suggest potential ways to recode the data

Which distribution should be used if a population standard deviation is unknown & the population is normally distributed?

t distribution

Anova compares:

the means of more than 2 populations

One problem with Multicollinearity in a regression equation is:

the predictors expected to have a positive relationship with the dependent variable may have a negative sign in the equation

One distinction between a randomized block design & a one-way ANOVA is:

the randomized block design includes both the treatment of interest & a second controlled variable.

If the overall F test for a Multiple Regression does not indicate that the null hypothesis should be rejected:

the regression model has no significant predictive power for the dependent variable

In a Multiple Regression model, where x's are predictors & y is the response, Multicollinearity occurs where:

the x's provide redundant information about y

A researcher is studying the influence of gasoline prices on miles driven per week by adults in a large city. In a simple regression modeling this relationship,

the y variable will be the miles driven

In a simple linear regression, when β₁ is not significantly different from zero, we conclude that:

there is no linear relationship between X and Y

n (interpretation):

total sample size for a group

An experimenter is studying the influence of incentives on employee productivity. In this experiment, some employees receive a bonus for meeting certain goals, while others do not. This is an example of a(n) ____________ variable.

treatment

Independent variable:

treatment/classification variable

If you are interested in describing only the variability in your sample data, but not in calculating the population variance, you should use the

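biased estimate of the variance (dividing by n; the unbiased estimate, which divides by n - 1, is used when estimating the population variance)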

Interquartile Range is the range of:

values between the 1st & 3rd quartile or the middle 50% of the data set
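
The range, quartile, and interquartile range cards can be tied together with a small sketch using Python's standard `statistics` module. The data set is made up, and quartile conventions vary between textbooks and software, so other tools may give slightly different Q1 and Q3 values for the same data.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # hypothetical data set

data_range = max(data) - min(data)

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(data_range)   # 10
print(q1, q2, q3)   # 3.0 6.0 9.0
print(iqr)          # 6.0
print(q2 == statistics.median(data))  # True
```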

In order to study the differences in population proportions, a researcher draws a sample of size 30 from each population and calculates the sample proportion of each. Which of the following correctly states the formula for the mean difference in sample proportions?
