Chapter 8 Trendlines and Regression Analysis


Confidence Intervals

The confidence intervals (Lower 95% and Upper 95% values in the output) provide information about the unknown values of the true regression coefficients, accounting for sampling error.
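
As a minimal Python sketch with made-up data, the 95% interval for the slope of a simple linear regression can be computed as b1 ± t_crit × SE(b1). The function name `slope_confidence_interval` is hypothetical, and the critical value `t_crit` is assumed to be looked up separately (for example, from a t-table), because Python's standard library has no t-distribution.

```python
import math

def slope_confidence_interval(xs, ys, t_crit):
    """Return (lower, upper) for the slope of a simple linear regression.

    t_crit is the two-tailed critical t-value for n - 2 degrees of freedom,
    supplied by the caller (an assumption of this sketch).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                      # least-squares slope
    b0 = y_bar - b1 * x_bar             # least-squares intercept
    # Standard error of the estimate: sqrt(SSE / (n - 2))
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    se_b1 = s / math.sqrt(sxx)          # standard error of the slope
    return b1 - t_crit * se_b1, b1 + t_crit * se_b1

# 3.182 is the two-tailed 95% critical value for 3 degrees of freedom
lower, upper = slope_confidence_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_crit=3.182)
print(round(lower, 3), round(upper, 3))  # -0.3 1.5
```

For this made-up data the interval runs from about -0.3 to 1.5 and contains 0, so the true slope could plausibly be zero.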

R Square

- coefficient of determination, R², which varies from 0 (no fit) to 1 (perfect fit)
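
A minimal sketch of computing R² from observed and predicted values as 1 - SSE/SST. This identity assumes the predictions come from a least-squares fit with an intercept; the data, the fitted line y = 2.2 + 0.6x, and the function name `r_squared` are illustrative assumptions.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    y_bar = sum(observed) / len(observed)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in observed)                         # total variation
    return 1 - sse / sst

# Made-up observations and predictions from the least-squares line y = 2.2 + 0.6x
observed = [2, 4, 5, 4, 5]
predicted = [2.2 + 0.6 * x for x in [1, 2, 3, 4, 5]]
print(round(r_squared(observed, predicted), 3))  # 0.6
```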

Partial Regression Coefficients

- represent the expected change in the dependent variable when the associated independent variable is increased by one unit while the values of all other independent variables are held constant.

Standard Error

- measures the variability between observed and predicted Y values (the typical size of the residuals).
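
For a model with k independent variables, the standard error is commonly computed as sqrt(SSE / (n - k - 1)). A minimal sketch with made-up data; the function name `standard_error` is hypothetical.

```python
import math

def standard_error(observed, predicted, k):
    """Standard error of the estimate for a model with k independent variables."""
    n = len(observed)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return math.sqrt(sse / (n - k - 1))  # n - k - 1 degrees of freedom

observed = [2, 4, 5, 4, 5]
predicted = [2.2 + 0.6 * x for x in [1, 2, 3, 4, 5]]  # fitted line for this made-up data
print(round(standard_error(observed, predicted, k=1), 4))  # 0.8944
```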

Checking Assumptions - Homoscedasticity

- variation about the regression line is constant
- the residual plot shows no serious difference in the spread of the data for different X values

Checking Assumptions - Normality of Errors

- view a histogram (bar chart) of the standard residuals
- the residual histogram appears slightly skewed but is not a serious departure from normality

Systematic Model Building Approach

1. Construct a model with all available independent variables. Check for significance of the independent variables by examining the p-values.
2. Identify the independent variable having the largest p-value that exceeds the chosen level of significance, α.
3. Remove the variable identified in step 2 from the model and evaluate adjusted R². (Don't remove all variables with p-values that exceed α at the same time; remove only one at a time.)
4. Continue until all variables are significant.
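
The steps above can be sketched as a backward-elimination loop. This is a minimal sketch under stated assumptions: `fit` is a hypothetical, caller-supplied function that fits the regression on a list of variable names and returns (p-values by name, adjusted R²); the stand-in `fake_fit` and its numbers are made up purely to show the control flow.

```python
def backward_eliminate(variables, fit, alpha=0.05):
    """Remove the least significant variable one at a time until all p-values <= alpha.

    `fit` is a hypothetical caller-supplied function: fit(names) -> (p_values, adjusted_r2).
    """
    current = list(variables)
    history = []  # (variables in the model, adjusted R²) for each pass
    while current:
        p_values, adj_r2 = fit(current)
        history.append((list(current), adj_r2))
        worst = max(current, key=lambda v: p_values[v])  # largest p-value
        if p_values[worst] <= alpha:
            break  # every remaining variable is significant
        current.remove(worst)  # drop only this one variable, then refit
    return current, history

# Stand-in for a real regression fit, with made-up p-values and adjusted R² values
FAKE_RESULTS = {
    ("A", "B", "C"): ({"A": 0.01, "B": 0.40, "C": 0.20}, 0.71),
    ("A", "C"): ({"A": 0.01, "C": 0.03}, 0.73),
}

def fake_fit(names):
    return FAKE_RESULTS[tuple(names)]

final, history = backward_eliminate(["A", "B", "C"], fake_fit)
print(final)  # ['A', 'C']
for names, adj in history:
    print(names, adj)
```

In the made-up results, C is not significant in the full model (p = 0.20) but becomes significant once B is removed, which is why variables are dropped one at a time rather than all at once.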

Standard Residual

= residual / standard deviation of the residuals
Rule of thumb: standard residuals outside ±2 or ±3 are potential outliers.
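
A minimal sketch of standardizing residuals and flagging potential outliers with the ±2 rule of thumb. It assumes the sample standard deviation (`statistics.stdev`) as the divisor; the exact divisor used by a particular tool, such as Excel, may differ slightly. The residual values and function names are made up.

```python
import statistics

def standard_residuals(residuals):
    """Divide each residual by the sample standard deviation of the residuals."""
    sd = statistics.stdev(residuals)
    return [e / sd for e in residuals]

def potential_outliers(residuals, cutoff=2.0):
    """Indices of observations whose standard residual lies outside ±cutoff."""
    return [i for i, z in enumerate(standard_residuals(residuals)) if abs(z) > cutoff]

residuals = [0.2, -0.3, 0.1, -0.4, 0.3, 2.5, -0.2, -0.6, -0.5, -1.1]  # made-up values
print(potential_outliers(residuals))  # [5]
```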

T-test

An alternative method for testing whether a slope or intercept is zero is to use a t-test.
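
A minimal sketch of the t-statistic for the slope in simple linear regression, t = b1 / SE(b1), using made-up data. The function name is hypothetical, and the critical value 3.182 (α = 0.05, two-tailed, 3 degrees of freedom) is assumed to come from a t-table rather than from any library call.

```python
import math

def slope_t_statistic(xs, ys):
    """t = b1 / SE(b1) for testing H0: slope = 0 in simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)  # standard error of the slope
    return b1 / se_b1

t = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
t_crit = 3.182  # two-tailed critical value for alpha = 0.05 and n - 2 = 3 df (from a t-table)
print(round(t, 3), abs(t) > t_crit)  # 2.121 False
```

Because |t| is below the critical value, H0 (slope = 0) would not be rejected for this made-up data.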

Principle of Parsimony

Good models are as simple as possible.

Regression

Often used to identify (model) relationships between one or more independent variables and a dependent variable, and to predict future results.

Excel Regression Tool

The independent variables in the spreadsheet must be in contiguous columns. Key differences from simple regression output: Multiple R and R Square are called the multiple correlation coefficient and the coefficient of multiple determination, respectively, in the context of multiple regression. ANOVA tests for significance of the entire model. That is, it computes an F-statistic for testing the hypotheses:
H0: β₁ = β₂ = ... = βₖ = 0
H1: at least one βⱼ ≠ 0

Exponential

y = ab^x
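
One common way to fit y = ab^x is to take logarithms, ln(y) = ln(a) + x ln(b), and run an ordinary least-squares line on (x, ln y). A minimal sketch, assuming every y is positive; note this minimizes squared errors in ln(y), not in y. The function name and data are made up.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a * b**x by least squares on ln(y); requires every y > 0."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    l_bar = sum(logs) / n
    # Ordinary least-squares slope and intercept of ln(y) against x
    slope = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = l_bar - slope * x_bar
    # ln(y) = ln(a) + x * ln(b), so exponentiate to recover a and b
    return math.exp(intercept), math.exp(slope)

# Made-up data generated from y = 3 * 2**x
a, b = fit_exponential([0, 1, 2, 3, 4], [3, 6, 12, 24, 48])
print(round(a, 4), round(b, 4))  # 3.0 2.0
```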

Residual

actual Y value - predicted Y value

Cross-Sectional Data

collected by observing many subjects (such as individuals, firms, countries, or regions) at the same point in time, or without regard to differences in time.

ANOVA

conducts an F-test to determine whether variation in Y is due to varying levels of X. Used to test for significance of regression:
H0: population slope coefficient = 0
H1: population slope coefficient ≠ 0
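
A minimal sketch of the ANOVA F-statistic, F = (SSR / k) / (SSE / (n - k - 1)), for a model with k independent variables. It assumes the predictions come from the least-squares fit; the data and function name are made up. The critical value F(1, 3) ≈ 10.13 at α = 0.05 would come from an F-table.

```python
def f_statistic(observed, predicted, k):
    """F = (SSR / k) / (SSE / (n - k - 1)) for a model with k independent variables."""
    n = len(observed)
    y_bar = sum(observed) / n
    ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)               # explained variation
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))  # unexplained variation
    return (ssr / k) / (sse / (n - k - 1))

xs = [1, 2, 3, 4, 5]
observed = [2, 4, 5, 4, 5]
predicted = [2.2 + 0.6 * x for x in xs]  # least-squares line for this made-up data
F = f_statistic(observed, predicted, k=1)
print(round(F, 2))  # 4.5
```

Since 4.5 is below about 10.13, H0 would not be rejected here. In simple regression this F equals the square of the slope's t-statistic.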

Checking Assumptions - Linearity

- examine the scatter diagram (should appear linear)
- examine the residual plot (should appear random)

Simple Linear Regression

involves a single independent variable.

Multiple Regression

involves two or more independent variables: Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ + ε
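
The coefficients of a multiple regression model are estimated by least squares. A minimal pure-Python sketch that solves the normal equations (XᵀX)b = Xᵀy by Gauss-Jordan elimination; the function names and data are hypothetical, and in practice a statistics package or Excel's Regression tool would normally do this work.

```python
def fit_multiple(rows, ys):
    """Return least-squares coefficients [b0, b1, ..., bk] for Y = b0 + b1*X1 + ... + bk*Xk."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 for the intercept
    p = len(X[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]

def predict(coefs, row):
    """Evaluate b0 + b1*x1 + ... + bk*xk for one observation."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))

rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, 4, 6, 8]  # made-up data that follow Y = 1 + 2*X1 + 3*X2 exactly
coefs = fit_multiple(rows, ys)
print([round(b, 6) for b in coefs])
```

Because the made-up data contain no error term, the sketch should recover coefficients very close to 1, 2, and 3.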

R²

is a measure of the "fit" of the line to the data. The value of R² will be between 0 and 1. A value of 1.0 indicates a perfect fit, with all data points lying on the line; the larger the value of R², the better the fit.

Regression Analysis

is a tool for building mathematical and statistical models that characterize relationships between a dependent (ratio) variable and one or more independent, or explanatory, variables (ratio or categorical), all of which are numerical.

Interactions

occurs when the effect of one variable is dependent on another variable. We can test for interactions by defining a new variable as the product of the two variables and testing whether this new variable is significant.
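
A minimal sketch of building the product variable used to test for an interaction: each (X1, X2) observation gets a third column X1 × X2, which can then be included in a multiple regression so its p-value can be checked. The function name and values are made up.

```python
def add_interaction(rows):
    """Append the product X1 * X2 as a third column to each (X1, X2) observation."""
    return [(x1, x2, x1 * x2) for x1, x2 in rows]

print(add_interaction([(2, 3), (4, 1), (5, 0)]))  # [(2, 3, 6), (4, 1, 4), (5, 0, 0)]
```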

Multicollinearity

occurs when there are strong correlations among the independent variables, and they can predict each other better than the dependent variable. When significant multicollinearity is present, it becomes difficult to isolate the effect of one independent variable on the dependent variable, the signs of coefficients may be the opposite of what they should be (making regression coefficients difficult to interpret), and p-values can be inflated. Correlations exceeding ±0.7 may indicate multicollinearity.
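
A minimal sketch of screening for multicollinearity with the ±0.7 rule of thumb: compute the correlation between every pair of independent variables and flag pairs with |r| > 0.7. The variable names, data, and function names are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def flag_multicollinearity(columns, threshold=0.7):
    """Return (name1, name2, r) for each pair of independent variables with |r| > threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Hypothetical independent variables
columns = {
    "size": [1, 2, 3, 4, 5],
    "rooms": [2, 3, 5, 6, 8],
    "age": [5, 1, 4, 2, 3],
}
print(flag_multicollinearity(columns))  # [('size', 'rooms', 0.993)]
```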

Adjusted R Square

reflects both the number of independent variables and the sample size, and may either increase or decrease when an independent variable is added or dropped. An increase in adjusted R² indicates that the model has improved.
- adjusts R² for sample size and number of X variables.
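
A minimal sketch using the usual formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the sample size and k the number of independent variables. The R² values below are made up to show how adding a variable can raise R² while lowering adjusted R².

```python
def adjusted_r_squared(r2, n, k):
    """Adjust R² for sample size n and number of independent variables k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Made-up numbers: adding a second variable nudges R² up but lowers adjusted R²
print(round(adjusted_r_squared(0.60, n=5, k=1), 4))  # 0.4667
print(round(adjusted_r_squared(0.62, n=5, k=2), 4))  # 0.24
```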

Checking Assumptions - Independence of Errors

successive observations should not be related. This is important when the independent variable is time. In this example, because the data are cross-sectional, we can assume this assumption holds.

Least Squares Regression

the best-fitting line minimizes the sum of squares of the residuals.
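
A minimal sketch of the least-squares line for simple regression, using the standard formulas b1 = Sxy / Sxx and b0 = ȳ - b1·x̄, plus a quick check that nudging the slope increases the sum of squared residuals. The function names and data are made up.

```python
def least_squares_line(xs, ys):
    """Return intercept b0 and slope b1 minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of squared residuals for the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)
print(round(b0, 2), round(b1, 2))  # 2.2 0.6
# A slightly steeper line has a larger sum of squared residuals
print(sse(xs, ys, b0, b1) < sse(xs, ys, b0, b1 + 0.1))  # True
```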

Residuals

the observed errors associated with estimating the value of the dependent variable using the regression line: eᵢ = Yᵢ - Ŷᵢ

Multiple R

= |r| in simple linear regression, where r is the sample correlation coefficient. The value of r varies from -1 to +1 (r is negative if the slope is negative).
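
A minimal sketch showing that in simple linear regression Multiple R is the absolute value of the sample correlation r, and R² = r². The two made-up data sets have a positive and a negative slope; the function name is hypothetical.

```python
import math

def correlation(xs, ys):
    """Sample (Pearson) correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
for ys in ([2, 4, 5, 4, 5], [5, 4, 5, 4, 2]):  # made-up data: positive, then negative slope
    r = correlation(xs, ys)
    multiple_r = abs(r)  # Multiple R in simple linear regression
    print(round(r, 4), round(multiple_r, 4), round(r ** 2, 4))
```

For these data r is about 0.7746 and then about -0.7746, while Multiple R (about 0.7746) and R² (about 0.6) are the same in both cases.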

Polynomial (2nd order)

y = ax² + bx + c

Polynomial (3rd order)

y = ax³ + bx² + dx + e

Power

y = ax^b
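
A power curve y = ax^b becomes linear after taking logs of both sides: ln(y) = ln(a) + b ln(x). A minimal sketch that fits a least-squares line to (ln x, ln y), assuming every x and y is positive; it minimizes squared errors on the log scale. The function name and data are made up.

```python
import math

def fit_power(xs, ys):
    """Fit y = a * x**b by least squares on ln(y) versus ln(x); needs x > 0 and y > 0."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    # Slope of the log-log line is the exponent b
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    ln_a = my - b * mx  # intercept of the log-log line is ln(a)
    return math.exp(ln_a), b

# Made-up data generated from y = 2 * x**3
a, b = fit_power([1, 2, 3, 4], [2, 16, 54, 128])
print(round(a, 4), round(b, 4))  # 2.0 3.0
```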

Linear

y = a + bx

Logarithmic

y = a ln(x) + b

