MIS 345/MKTG 343: EXAM 2
If a scatterplot of residuals shows a parabola shape, then a logarithmic transformation may be useful in obtaining a better fit. True False
False
In regression analysis, the unexplained part of the total variation in the response variable Y is referred to as the sum of squares due to regression, SSR. True False
False
In simple linear regression, the divisor of the standard error of estimate is n - 1, simply because there is only one explanatory variable of interest. True False
False
In testing the overall fit of a multiple regression model in which there are three explanatory variables, the null hypothesis is H0: β1 = β2 = β3. True False
False
In the multiple regression model Y = 6.75 + 2.25X1 + 3.5X2, we interpret the coefficient of X1 as follows: holding X2 constant, if X1 increases by 1 unit, then the expected value of Y will increase by 9 units. True False
False
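The arithmetic behind this item can be checked directly (the `predict` helper is purely illustrative):

```python
# The coefficient on X1 gives the change in Y per 1-unit change in X1,
# holding X2 constant -- here 2.25, not 9.
def predict(x1, x2):
    return 6.75 + 2.25 * x1 + 3.5 * x2

change = predict(x1=5, x2=10) - predict(x1=4, x2=10)
print(change)  # 2.25
```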
Scatterplots are used for identifying outliers and quantifying relationships between variables. True False
False
The regression line Y= 3 + 2X has been fitted to the data points (4, 14), (2, 7), and (1, 4). The sum of the residuals squared will be 8.0. True False
False
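Working the arithmetic for this item:

```python
# Residuals from the fitted line Y = 3 + 2X at each data point.
points = [(4, 14), (2, 7), (1, 4)]
residuals = [y - (3 + 2 * x) for x, y in points]
sse = sum(r ** 2 for r in residuals)
print(residuals, sse)  # [3, 0, -1] 10 -- the sum of squared residuals is 10, not 8
```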
In order to test the significance of a multiple regression model involving 4 explanatory variables and 40 observations, the numerator and denominator degrees of freedom for the critical value of F are 4 and 35, respectively. True False
True
In regression analysis, homoscedasticity refers to constant error variance. True False
True
In regression analysis, we can often use the standard error of estimate (Se) to judge which of several potential regression equations is the most useful. True False
True
In time-series data, errors are often not probabilistically independent. True False
True
Multicollinearity is a situation in which two or more of the explanatory variables are highly correlated with each other. True False
True
One of the potential characteristics of an outlier is that the value of the dependent variable is much larger or smaller than predicted by the regression line. True False
True
The R2 can only increase when extra explanatory variables are added to a multiple regression model. True False
True
The adjusted R2 is used primarily to monitor whether extra explanatory variables really belong in a multiple regression model. True False
True
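A minimal sketch of how adjusted R2 penalizes extra variables (the R2 values and sample size here are hypothetical, not from the exam):

```python
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# A tiny gain in R^2 from adding a 4th variable lowers adjusted R^2.
before = adjusted_r2(r2=0.800, n=30, k=3)
after = adjusted_r2(r2=0.802, n=30, k=4)
print(round(before, 4), round(after, 4))  # 0.7769 0.7703
```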
The assumptions of regression are: 1) there is a population regression line, 2) the dependent variable is normally distributed, 3) the standard deviation of the response variable remains constant as the explanatory variables increase, and 4) the errors are probabilistically independent. True False
True
The multiple R for a regression is the correlation between the observed Y values and the fitted Y values. True False
True
The residuals are observations of the error variable. Consequently, the minimized sum of squared deviations is called the sum of squared errors, labeled SSE. True False
True
The two primary objectives of regression analysis are to study relationships between variables and to use those relationships to make predictions. True False
True
In a multiple regression analysis with three explanatory variables, suppose that there are 60 observations and the sum of the residuals squared is 28. The standard error of estimate must be 0.7071. True False
True
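The arithmetic, using Se = sqrt(SSE / (n - k - 1)):

```python
import math

# n = 60 observations, k = 3 explanatory variables, SSE = 28.
n, k, sse = 60, 3, 28
se = math.sqrt(sse / (n - k - 1))
print(round(se, 4))  # 0.7071
```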
In a simple linear regression model, testing whether the slope of the population regression line could be zero is the same as testing whether or not the linear relationship between the response variable Y and the explanatory variable X is significant. True False
True
In a simple linear regression problem, if the standard error of estimate = 15 and n = 8, then the sum of squares for error, SSE, is 1,350. True False
True
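In simple regression Se = sqrt(SSE / (n - 2)), so SSE can be recovered from Se:

```python
# Solve Se = sqrt(SSE / (n - 2)) for SSE.
se, n = 15, 8
sse = se ** 2 * (n - 2)
print(sse)  # 1350
```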
In multiple regression, if there is multicollinearity between independent variables, the t-tests of the individual coefficients may indicate that some variables are not linearly related to the dependent variable, when in fact, they are. True False
True
In multiple regressions, if the F-ratio is small, the explained variation is small relative to the unexplained variation. True False
True
In order to estimate with 90% confidence a particular value of Y for a given value of X in a simple linear regression problem, a random sample of 20 observations is taken. The appropriate t-value that would be used is 1.734. True False
True
A researcher can check whether the errors are normally distributed by using: the Durbin-Watson statistic a frequency distribution or the value of the regression coefficient a t-test or an F-test a histogram or a Q-Q plot
a histogram or a Q-Q plot
In regression analysis, extrapolation is performed when you: attempt to predict beyond the limits of the sample have to use a lag variable as an explanatory variable in the model have to estimate some of the explanatory variable values do not have observations for every period in the sample
attempt to predict beyond the limits of the sample
In regression analysis, the variables used to help explain or predict the response variable are called the: independent variables dependent variables regression variables statistical variables
independent variables
A correlation value of zero indicates: a strong linear relationship a weak linear relationship no linear relationship a perfect linear relationship
no linear relationship
A scatterplot that appears as a shapeless mass of data points indicates: a curved relationship among the variables a linear relationship among the variables a nonlinear relationship among the variables no relationship among the variables
no relationship among the variables
The standard error of the estimate (Se) is essentially the: mean of the residuals standard deviation of the residuals mean of the explanatory variable standard deviation of the explanatory variable
standard deviation of the residuals
A logarithmic transformation of the response variable Y is often useful when the distribution of Y is symmetric. True False
False
A regression analysis between sales (in $1000) and advertising (in $) resulted in the following least squares line: Y = 32 + 8X. This implies that an increase of $1 in advertising is expected to result in an increase of $40 in sales. True False
False
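The units are the key to this item; a quick check:

```python
# Sales (Y) are in $1000s, advertising (X) in dollars: Y = 32 + 8X.
# A $1 increase in advertising raises Y by 8, i.e. 8 * $1000 of sales.
slope = 8
sales_increase_in_dollars = slope * 1000
print(sales_increase_in_dollars)  # 8000 -- not $40
```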
An interaction variable is the product of an explanatory variable and the dependent variable. True False
False
Correlation is measured on a scale from 0 to 1, where 0 indicates no linear relationship between two variables, and 1 indicates a perfect linear relationship. True False
False
The correlation value ranges from: 0 to +1 -1 to +1 -2 to +2 -∞ to +∞
-1 to +1
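A minimal sketch demonstrating the bounds (the `pearson` helper is hand-rolled for illustration):

```python
import math

# Pearson correlation: covariance divided by the product of the
# standard deviations; it always lands in [-1, +1].
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson([1, 2, 3], [2, 4, 6]))   # approximately +1 (perfect increasing)
print(pearson([1, 2, 3], [6, 4, 2]))   # approximately -1 (perfect decreasing)
```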
The percentage of variation (R2) ranges from: 0 to +1 -1 to +1 -2 to +2 -1 to 0
0 to +1
A multiple regression analysis including 50 data points and 5 independent variables results in a sum of squared residuals of 40. The multiple standard error of estimate will be: 0.901 0.888 0.800 0.953 0.894
0.953
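Working the arithmetic, Se = sqrt(SSE / (n - k - 1)) = sqrt(40 / 44):

```python
import math

# n = 50 data points, k = 5 independent variables, SSE = 40.
n, k, sse = 50, 5, 40
se = math.sqrt(sse / (n - k - 1))
print(round(se, 3))  # 0.953
```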
Approximately what percentage of the observed Y values are within one standard error of the estimate (Se) of the corresponding fitted Y values? 67% 95% 99% It is not possible to determine this.
67%
In regression analysis, which of the following causal relationships are possible? X causes Y to vary. Y causes X to vary. Other variables cause both X and Y to vary. All of these options are possible
All of these options are possible
Which approach can be used to test for autocorrelation? Durbin-Watson statistic F-test or t-test regression coefficient correlation coefficient
Durbin-Watson statistic
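The Durbin-Watson statistic is computed from successive residuals; a minimal sketch with hypothetical residual values:

```python
# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2). Values near 2 suggest no
# lag-1 autocorrelation; values near 0 suggest positive autocorrelation.
def durbin_watson(residuals):
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

slow_wave = [1, 2, 3, 2, 1, -1, -2, -3, -2, -1]  # hypothetical residuals
print(round(durbin_watson(slow_wave), 3))  # 0.316 -- well below 2, so positive autocorrelation
```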
Given the least squares regression line, Y=8-3X, which statement is true? The relationship between X and Y is positive. The relationship between X and Y is negative. As X increases, so does Y. As X decreases, so does Y. There is no relationship between X and Y.
The relationship between X and Y is negative.
Which of the following is not one of the assumptions of regression? The response variable is not normally distributed. There is a population regression line. The response variable is normally distributed. The errors are probabilistically independent.
The response variable is not normally distributed.
A regression analysis between sales (in $1000) and advertising (in $100) resulted in the following least squares line: Y = 84 +7X. This implies that if there is no advertising, then the predicted amount of sales (in dollars) is $84,000. True False
True
A confidence interval constructed around a point prediction from a regression model is called a prediction interval because the actual point being estimated is not a population parameter. True False
True
In a constant elasticity, or multiplicative, model, the dependent variable is expressed as a product of explanatory variables raised to powers. True False
True
A multiple regression model involving 40 observations and 4 explanatory variables produces SST = 1000 and SSR = 804. The value of MSE is 5.6. True False
True
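The arithmetic: SSE = SST - SSR, and MSE = SSE / (n - k - 1):

```python
# n = 40 observations, k = 4 explanatory variables.
n, k, sst, ssr = 40, 4, 1000, 804
mse = (sst - ssr) / (n - k - 1)
print(mse)  # 5.6
```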
A negative relationship between an explanatory variable X and a response variable Y means that as X increases, Y decreases, and vice versa. True False
True
A regression analysis between sales (in $1000) and advertising (in $100) resulted in the following least squares line: Y= 84 +7X. This implies that if advertising is $800, then the predicted amount of sales (in dollars) is $140,000. True False
True
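Checking the units: advertising is entered in $100s and sales come out in $1000s:

```python
# Y = 84 + 7X with X in $100 units and Y in $1000 units.
advertising_dollars = 800
x = advertising_dollars / 100      # 8.0 units of $100
y = 84 + 7 * x                     # 140.0 units of $1000
sales_dollars = y * 1000
print(sales_dollars)  # 140000.0
```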
A regression analysis between weight (Y in pounds) and height (X in inches) resulted in the following least squares line: Y = 140 + 5X. This implies that if the height is increased by 1 inch, the weight is expected to increase on average by 5 pounds. True False
True
Correlation is used to determine the strength of the linear relationship between an explanatory variable X and response variable Y. True False
True
Homoscedasticity means that the variability of Y values is the same for all X values. True False
True
If exact multicollinearity exists, redundancy exists in the data. True False
True
If the regression equation includes anything other than a constant plus the sum of products of constants and variables, the model will not be linear. True False
True
In multiple regressions, if the F-ratio is large, the explained variation is large relative to the unexplained variation. True False
True
Which of the following is an example of a nonlinear regression model? a quadratic regression equation a logarithmic regression equation a constant elasticity equation the learning curve model all of these choices
all of these choices
An important condition when interpreting the coefficient for a particular independent variable X in a multiple regression equation is that: the dependent variable will remain constant the dependent variable will be allowed to vary all of the other independent variables remain constant all of the other independent variables be allowed to vary
all of the other independent variables remain constant
Time series data often exhibits which of the following characteristics? autocorrelation homoscedasticity multicollinearity heteroscedasticity
autocorrelation
Data collected from approximately the same period of time from a cross-section of a population are called: time series data linear data cross-sectional data historical data
cross-sectional data
The weakness of scatterplots is that they: do not help identify linear relationships can be misleading about the types of relationships they indicate only help identify outliers do not actually quantify the relationships between variables
do not actually quantify the relationships between variables
Another term for constant error variance is: multicollinearity autocorrelation homoscedasticity heteroscedasticity
homoscedasticity
Regression analysis asks: if there are differences between distinct populations if the sample is representative of the population how a single variable depends on other relevant variables how several variables depend on each other
how a single variable depends on other relevant variables
A point that "tilts" the regression line toward it, is referred to as a(n): influential point magnetic point extreme point explanatory point
influential point
When determining whether to include or exclude a variable in regression analysis, if the p-value associated with the variable's t-value is above some accepted significance value, such as 0.05, then the variable: is redundant is a candidate for exclusion is a candidate for inclusion does not fit the guidelines of parsimony
is a candidate for exclusion
The covariance is not used as much as the correlation because: it is not always a valid predictor of linear relationships it is difficult to calculate it is difficult to interpret all of these options
it is difficult to interpret
Residuals separated by one period that are autocorrelated indicate: lag 1 autocorrelation time 1 autocorrelation simple autocorrelation redundant autocorrelation
lag 1 autocorrelation
Outliers are observations that: lie outside the sample render the study useless lie outside the typical pattern of points on a scatterplot disrupt the entire linear trend
lie outside the typical pattern of points on a scatterplot
In regression analysis, if there are several explanatory variables, it is called: simple regression multiple regression compound regression composite regression
multiple regression
The value k in the number of degrees of freedom, n-k-1, for the sampling distribution of the regression coefficients represents the: population size sample size number of coefficients in the regression equation, including the constant number of independent variables included in the equation
number of independent variables included in the equation
Suppose you forecast the values of all of the independent variables and insert them into a multiple regression equation and obtain a point prediction for the dependent variable. You could then use the standard error of the estimate to obtain an approximate: independence test confidence interval hypothesis test prediction interval
prediction interval
In linear regression, we can have an interaction variable. Algebraically, the interaction variable is the _____ of two other variables in the regression equation. sum ratio product mean
product
In linear regression, we fit the least squares line to a set of values (or points on a scatterplot). The distance from the line to a point is called the: fitted value residual correlation covariance estimated value
residual
The percentage of variation (R2) can be interpreted as the fraction (or percent) of variation of the: explanatory variable explained by the independent variable explanatory variable explained by the regression line response variable explained by the regression line error explained by the regression line
response variable explained by the regression line
In choosing the "best-fitting" line through a set of points in linear regression, we choose the one with the: smallest sum of squared residuals largest sum of squared residuals smallest number of outliers largest number of points on the line
smallest sum of squared residuals
Determining which variables to include in regression analysis by estimating a series of regression equations by successively adding or deleting variables according to prescribed rules is referred to as: elimination regression stepwise regression backward regression forward regression
stepwise regression
In linear regression, the fitted value is: the predicted value of the dependent variable the predicted value of the independent value the predicted value of the slope the predicted value of the intercept none of these choices
the predicted value of the dependent variable
Correlation is a summary measure that indicates: a curved relationship among the variables the rate of change in Y for a one-unit change in X the strength of the linear relationship between pairs of variables the magnitude of difference between two variables
the strength of the linear relationship between pairs of variables
When the error variance is nonconstant, it is common to see the variation increase as the explanatory variable increases (you will see a "fan" shape in the scatterplot). There are two ways you can deal with this phenomenon. These are: the weighted least squares and the partial F stepwise regression and the partial F the weighted least squares and a logarithmic transformation the partial F and a logarithmic transformation
the weighted least squares and a logarithmic transformation
The term autocorrelation refers to: the analyzed data refers to itself the sample is related too closely to the population the data are in a loop (values repeat themselves) time series variables are usually related to their own past values
time series variables are usually related to their own past values
In linear regression, a dummy variable is used: to represent residual variables to represent missing data in each sample to include hypothetical data in the regression equation to include categorical variables in the regression equation when "dumb" responses are included in the data
to include categorical variables in the regression equation
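A minimal sketch of dummy coding (the region names are hypothetical):

```python
# Encode a categorical variable as 0/1 dummies, dropping one category
# ("East") as the reference level to avoid exact multicollinearity.
regions = ["East", "West", "South", "East"]
levels = ["West", "South"]
dummies = [[1 if r == lvl else 0 for lvl in levels] for r in regions]
print(dummies)  # [[0, 0], [1, 0], [0, 1], [0, 0]]
```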
A "fan" shape in a scatterplot indicates: unequal variance a nonlinear relationship the absence of outliers sampling error
unequal variance