Statistics Exam 3
Multiple regression equation
mathematical equation relating expected value or mean value of dependent variable to values of independent variables
Multiple regression model
mathematical equation that describes how dependent variable y is related to independent variables and an error term
Coefficient of determination
measure of the goodness of fit of the estimated regression equation. It can be interpreted as the proportion of the variability in the dependent variable y that is explained by the estimated regression equation
Adjusted multiple coefficient of determination
measure of the goodness of fit of the estimated multiple regression equation that adjusts for the number of independent variables in the model and thus avoids overestimating the impact of adding more independent variables
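The adjustment is usually written as adjusted R Squared = 1 - (1 - R Squared)(n - 1)/(n - p - 1), with n observations and p independent variables. A minimal sketch of that formula follows; the function name and the example numbers are illustrative assumptions, not part of the exam material.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 for n observations and p independent variables."""
    if n - p - 1 <= 0:
        raise ValueError("n must exceed p + 1")
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Hypothetical example: R^2 = 0.64 with 25 observations and 6 predictors.
print(round(adjusted_r_squared(0.64, 25, 6), 2))  # 0.52
```

The adjusted value (0.52) is lower than the unadjusted one (0.64) because the formula penalizes each added independent variable.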
Multiple coefficient of determination
measure of the goodness of fit of the estimated multiple regression equation. It can be interpreted as the proportion of variability in the dependent variable that is explained by the estimated regression equation
Correlation coefficient
measure of the strength of the linear relationship between two variables
Least squares method
method used to develop the estimated regression equation. It minimizes the sum of squared residuals (the deviations between the observed values of the dependent variable and the estimated values of the dependent variable)
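For simple linear regression, the least squares estimates have a closed form: b1 = sum((x - x bar)(y - y bar)) / sum((x - x bar)^2) and b0 = y bar - b1 * x bar. The sketch below applies those formulas to a small made-up data set (the data and function names are assumptions for illustration) and reports the resulting sum of squared residuals.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) for the line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """SSE: sum of squared differences between observed and fitted y."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up data, used only to illustrate the computation.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_fit(x, y)
print(round(b0, 2), round(b1, 2))                       # 2.2 0.6
print(round(sum_squared_residuals(x, y, b0, b1), 2))    # 2.4
```

Any other choice of intercept and slope gives a larger SSE on the same data, which is what "minimizes" means here.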
If Bi=0, this is the relationship between the independent variable xi and the dependent variable
no linear relationship
Dependent variable
variable that is being predicted or explained (denoted by y)
Independent variable
variable that is doing the predicting or explaining (denoted by x)
Consider the sample regression equation: y hat=50 - 10xi, with an R Squared value of 0.64. What is the value of the correlation coefficient between x and y?
-0.8 (the square root of 0.64, with the negative sign of the slope)
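In simple linear regression, the correlation coefficient has magnitude sqrt(R Squared) and takes the sign of the estimated slope. A short sketch of that rule, with a function name chosen here for illustration:

```python
import math


def correlation_from_r_squared(r_squared, slope):
    """r = (sign of slope) * sqrt(R^2); valid for simple linear regression."""
    return math.copysign(math.sqrt(r_squared), slope)


# y hat = 50 - 10x with R^2 = 0.64: the slope is negative, so r is negative.
print(round(correlation_from_r_squared(0.64, -10), 2))  # -0.8
```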
If r squared = 0, then SSR equals this
0
The sum of error terms should approximate this number
0
This method minimizes the sum of squared errors, i.e., minimizes SSE
Least Squares Method
The formula for Mean Squares Within is:
MSE = SSE/(nT - k), where nT is the total number of observations and k is the number of treatments
Assumptions About Error Term in Regression Model
1. The error term is a random variable with a mean or expected value of zero 2. The variance of epsilon, denoted by sigma squared, is the same for all values of x 3. The values of epsilon are independent 4. The error term is a normally distributed random variable for all values of x
A regression analysis involved 6 independent variables and 25 observations. The critical value of t for testing the significance of each of the independent variables' coefficients will have this many degrees of freedom
18
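The count comes from n - p - 1, where n is the number of observations and p the number of independent variables. As a quick check (the function name is an assumption for illustration):

```python
def t_test_degrees_of_freedom(n, p):
    """Degrees of freedom for the t test on one coefficient: n - p - 1."""
    return n - p - 1


print(t_test_degrees_of_freedom(25, 6))  # 18
```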
At least this many populations are under consideration for ANOVA
2
An ANOVA procedure is applied to data obtained from 5 samples where each sample contains 10 observations. The degrees of freedom for the critical value of F are
4 and 45
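The two values are the numerator degrees of freedom k - 1 and the denominator degrees of freedom nT - k, for k samples and nT total observations. A small sketch, with a hypothetical helper name:

```python
def anova_f_degrees_of_freedom(sample_sizes):
    """Return (k - 1, n_T - k) for a one-way ANOVA F test."""
    k = len(sample_sizes)
    n_total = sum(sample_sizes)
    return k - 1, n_total - k


# 5 samples of 10 observations each.
print(anova_f_degrees_of_freedom([10] * 5))  # (4, 45)
```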
If the coefficient of correlation is -0.7, the percentage of variation in the dependent variable explained by the estimated regression equation is
49%
In a 1-way ANOVA, given R Squared = .85 and SSR = 700, this is the SST
700/0.85, or about 823.53
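This follows from rearranging R Squared = SSR/SST into SST = SSR / R Squared. A minimal sketch of the arithmetic (the function name is an assumption):

```python
def sst_from_r_squared(ssr, r_squared):
    """Solve R^2 = SSR / SST for SST."""
    if r_squared <= 0:
        raise ValueError("R^2 must be positive to recover SST")
    return ssr / r_squared


sst = sst_from_r_squared(700, 0.85)
print(round(sst, 2))        # 823.53
print(round(sst - 700, 2))  # 123.53, the implied SSE
```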
The mean square of the treatments given the SSTR=25.08 and df=3 is
8.36
Given the above and an alpha of 5%, if the parameter of X1 was statistically significant, its p-value would need to be this
< .05
Another term for R Squared
Coefficient of Determination
The null hypothesis to determine if the slope of Bi differs from 2
Ho: Bi=2
Null hypothesis to test whether there is a difference among the treatment means when a sample of 8 observations has been randomly assigned to the 4 treatments
Ho: Mu1=Mu2=Mu3=Mu4
SSR = SST -
SSE
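The identity SST = SSR + SSE can be checked numerically. The sketch below fits a least squares line to a small made-up data set (data and function name are illustrative assumptions) and computes all three sums of squares.

```python
def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for a simple least squares line fit to x, y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # unexplained
    return sst, ssr, sse


sst, ssr, sse = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(sst, 2), round(ssr, 2), round(sse, 2))  # 6.0 3.6 2.4
print(round(ssr / sst, 2))                          # R^2 = 0.6
```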
What is the implication of the assumption that the error term, epsilon, is a normally distributed random variable?
The dependent variable, y, will also be a normally distributed variable
Which of the following is not true with respect to the error in a least squares linear regression?
The sum of squares due to error is minimized by the least squares method
In a simple linear regression, there are two ways to explain R Squared: the variation of the response variable can be explained by 1) the regression model or explained by this
Variation of the explanatory (independent) variable
ANOVA table
a table used to summarize the analysis of variance computations and results. It contains columns showing the source of variation, the sum of squares, the degrees of freedom, the mean square, the F value, and the p-value
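As a sketch of how the entries of a one-way ANOVA table are computed, the function below derives the sums of squares, degrees of freedom, mean squares, and F value from raw samples. The data and names are made up for illustration. The p-value column is left out because it needs an F distribution table or statistical software.

```python
def one_way_anova(groups):
    """Compute one-way ANOVA table entries (without the p-value)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-treatments variation.
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-treatments (error) variation.
    sse = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    mstr = sstr / (k - 1)
    mse = sse / (n_total - k)
    return {
        "treatments": {"ss": sstr, "df": k - 1, "ms": mstr},
        "error": {"ss": sse, "df": n_total - k, "ms": mse},
        "total": {"ss": sstr + sse, "df": n_total - 1},
        "F": mstr / mse,
    }


# Made-up samples from three treatments.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(table["treatments"]["ss"], 2), round(table["error"]["ss"], 2))  # 26.0 6.0
print(round(table["F"], 2))                                                 # 13.0
```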
Single-factor experiment
an experiment involving only one factor with k populations or treatments
Randomized block design
an experimental design employing blocking
Completely randomized design
an experimental design in which the treatments are randomly assigned to the experimental units
Residual analysis
analysis of residuals used to determine whether the assumptions made about the regression model appear to be valid. Also used to identify outliers and influential observations
Response variable
another word for the dependent variable of interest
Factor
another word for the independent variable of interest
A regression analysis between supply (y, in units of 10 widgets) and price (x, in dollars) resulted in the following equation: y hat = 5 - 3x. The above equation implies that if the price is increased by $1, the supply is expected to:
decrease by 30 widgets
ith residual
difference between observed value of dependent variable and value predicted using estimated regression equation
Treatments
different levels of a factor
Regression equation
equation that describes how mean or expected value of dependent variable is related to independent variable
Regression model
equation that describes how y is related to x and an error term
Estimated multiple regression equation
estimate of multiple regression equation based on sample data and least squares method
Estimated regression equation
estimate of the regression equation developed from sample data by using the least squares method
Scatter diagram
graph of bivariate data in which independent variable is on horizontal axis and dependent variable is on vertical axis
In regression analysis, the variable that helps to explain or predict the dependent variable:
independent (or explanatory) variable
The adjusted multiple coefficient of determination is adjusted for the number of this type of variable
independent variables
Prediction interval
interval estimate of an individual value of y for a given value of x
Confidence interval
interval estimate of the mean value of y for a given value of x
P-value where value of F is 1.5 with df1, df2 equal to 5 and 10, respectively
look up
Least squares method
procedure used to develop estimated regression equation
The error terms being "this" ensures they are not correlated with any of the explanatory variables
random, no pattern
Simple linear regression
regression analysis involving one independent variable and one dependent variable in which the relationship between the variables is approximated by a straight line
Multiple regression analysis
regression analysis involving two or more independent variables
Standard error of the estimate
square root of the mean square error, denoted by s. It is the estimate of sigma, the standard deviation of the error term
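In formula form, s = sqrt(MSE) = sqrt(SSE / (n - p - 1)), which reduces to sqrt(SSE / (n - 2)) for simple linear regression. A minimal sketch, with an assumed function name and made-up inputs:

```python
import math


def standard_error_of_estimate(sse, n, p=1):
    """s = sqrt(MSE), where MSE = SSE / (n - p - 1)."""
    mse = sse / (n - p - 1)
    return math.sqrt(mse)


# Hypothetical fit: SSE = 2.4 from 5 observations and 1 independent variable.
print(round(standard_error_of_estimate(2.4, 5), 4))  # 0.8944
```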
Multiple comparison procedures
statistical procedures that can be used to conduct statistical comparisons between pairs of population means
Multicollinearity
term used to describe correlation among independent variables
Experimental units
the objects of interest in the experiment
Comparisonwise Type I error rate
the probability of a Type I error associated with a single pairwise comparison
Experimentwise Type I error rate
the probability of making a Type I error on at least one of several pairwise comparisons
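If each of C pairwise comparisons is tested at level alpha, a common textbook approximation for the experimentwise rate is 1 - (1 - alpha)^C. It treats the comparisons as independent, which is an assumption and not a guarantee. The sketch below uses hypothetical helper names.

```python
def num_pairwise_comparisons(k):
    """Number of distinct pairs among k treatment means."""
    return k * (k - 1) // 2


def experimentwise_error_rate(alpha, num_comparisons):
    """Approximate P(at least one Type I error), assuming independent tests."""
    return 1 - (1 - alpha) ** num_comparisons


c = num_pairwise_comparisons(3)
print(c)                                              # 3
print(round(experimentwise_error_rate(0.05, c), 4))   # 0.1426
```

With only three treatments, the chance of at least one false rejection is already close to three times the per-comparison level of 0.05.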
Partitioning
the process of allocating the total sum of squares and degrees of freedom to the various components
Blocking
the process of using the same or similar experimental units for all treatments. The purpose of blocking is to remove a source of variation from the error term and hence provide a more powerful test for a difference in population or treatment means
Mean square error
the unbiased estimate of the variance of the error term sigma squared (denoted by s squared)