Multiple Regression (Dummy Variables, Interaction Terms, Nonlinear Transformations)

Common transformations

Among the many transformations available are the square root, the reciprocal, the square, and transformations involving the common logarithm (base 10) and the natural logarithm (base e).
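A minimal Python sketch that applies each transformation listed above to a single hypothetical value, x = 100, using only the standard `math` module:

```python
import math

x = 100.0  # hypothetical value, chosen so the results are easy to check

# Each entry applies one of the common transformations to x.
transforms = {
    "square root": math.sqrt(x),
    "reciprocal": 1 / x,
    "square": x ** 2,
    "common log (base 10)": math.log10(x),
    "natural log (base e)": math.log(x),
}

for name, value in transforms.items():
    print(f"{name}: {value:.4f}")
```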

When a categorical variable has m categories, how many dummy variables are required?

m - 1 dummy variables are required, with each dummy variable coded as 0 or 1.
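The sketch below shows m - 1 coding for the three-category method-of-payment variable (cash, credit card, check). The helper `dummy_encode` is hypothetical, and treating the first listed category (cash) as the reference category is an assumption; any category could serve as the baseline. Using all m dummies together with an intercept would make the dummies sum to the intercept column in every row, which is perfect collinearity.

```python
def dummy_encode(values, categories):
    """Return one row of m - 1 dummy (0/1) values per observation.

    categories[0] is the reference category (an assumption): it gets no
    column, so an observation in that category has all dummies equal to 0.
    """
    non_reference = categories[1:]
    return [[1 if v == c else 0 for c in non_reference] for v in values]


# Method of payment has m = 3 categories, so m - 1 = 2 dummies are needed.
payments = ["cash", "credit card", "check", "cash"]
rows = dummy_encode(payments, ["cash", "credit card", "check"])
print(rows)  # [[0, 0], [1, 0], [0, 1], [0, 0]]
```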

The most frequently used nonlinear transformations in business and economic applications are the...

quadratic and logarithmic transformations.

Fireplace example with interaction term

In the regression model, we assumed that the effect of the size of the home on its assessed value does not depend on whether the house has a fireplace. In other words, we assumed that the slope of assessed value on size is the same for houses with fireplaces as it is for houses without fireplaces. If these two slopes differ, there is an interaction between the size of the home and the fireplace. To evaluate whether an interaction exists, the following model is considered:

ŷ = a + b₁x₁ + b₂x₂ + b₃x₁x₂

where x₁ is the size of the home (thousands of square feet), x₂ is the fireplace dummy variable, and x₁x₂ is the interaction term.

When does an "interaction" occur in multiple regression?

An interaction occurs if the effect of an explanatory variable on the response variable depends on the value of a second explanatory variable. For example, advertising may have a large effect on the sales of a product when the price of the product is low. However, if the price of the product is too high, increases in advertising will not dramatically change sales. The effect that advertising has on sales depends on the price. Therefore, price and advertising are said to interact.

Three cases with logarithms and their interpretation

In these cases, log denotes the natural logarithm (base e); the percentage approximations below do not hold for the common (base-10) logarithm.

Case 1: ŷ = a + ... + b log(x) + ... (x is log-transformed, y is not)
The expected change in y (an increase or decrease, depending on the sign of b) when x increases by 1% is approximately 0.01b.
Example: ŷ = 5.67 + 0.34 log(x). Every 1% increase in x (for example, from 200 to 202) is accompanied by an increase in y of about (0.01)(0.34) = 0.0034.

Case 2: predicted log(y) = a + ... + bx + ... (y is log-transformed, x is not)
Whenever x increases by 1 unit, the expected value of y changes (increases or decreases, depending on the sign of b) by a constant percentage, and this percentage is approximately equal to b written as a percentage (that is, 100b%).
Example: predicted log(y) = 5.67 + 0.34x. Here b = 0.34, which written as a percentage is (100)(0.34) = 34%. When x increases by 1 unit, the expected value of y increases by approximately 34%. (The approximation is best when b is small; the exact change is 100(e^b - 1)%, about 40.5% for b = 0.34.)

Case 3: predicted log(y) = a + ... + b log(x) + ... (both x and y are log-transformed)
The expected change in y (an increase or decrease, depending on the sign of b) when x increases by 1% is approximately b%.
Example: predicted log(y) = 5.67 + 0.34 log(x). For every 1% increase in x, y is expected to increase by approximately 0.34%.
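A quick numerical check of the three approximations for b = 0.34, assuming natural logarithms, comparing each rule of thumb with the exact change implied by the model:

```python
import math

b = 0.34  # slope used in all three examples

# Case 1: y-hat = a + b*ln(x). A 1% rise in x changes y-hat by b*ln(1.01).
case1_exact = b * math.log(1.01)
case1_approx = 0.01 * b

# Case 2: ln(y)-hat = a + b*x. A 1-unit rise in x multiplies y by e**b.
case2_exact_pct = 100 * (math.exp(b) - 1)
case2_approx_pct = 100 * b

# Case 3: ln(y)-hat = a + b*ln(x). A 1% rise in x multiplies y by 1.01**b.
case3_exact_pct = 100 * (1.01 ** b - 1)
case3_approx_pct = b

print(f"Case 1: exact {case1_exact:.5f}, approx {case1_approx:.5f}")
print(f"Case 2: exact {case2_exact_pct:.1f}%, approx {case2_approx_pct:.1f}%")
print(f"Case 3: exact {case3_exact_pct:.4f}%, approx {case3_approx_pct:.4f}%")
```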

Estimated regression equation for the quadratic model

ŷ = a + b₁x + b₂x²

In this equation, the first regression coefficient a represents the y-intercept; the second regression coefficient b₁ represents the linear effect; and the third regression coefficient b₂ represents the quadratic effect.
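A sketch of fitting ŷ = a + b₁x + b₂x² with NumPy's `np.polyfit`. The data are hypothetical and noise-free, generated with a = 2, b₁ = 3, b₂ = -0.5, so the recovered coefficients can be checked directly:

```python
import numpy as np

# Hypothetical data from y = 2 + 3x - 0.5x^2 (no noise).
x = np.arange(0.0, 10.0)
y = 2 + 3 * x - 0.5 * x**2

# np.polyfit returns coefficients from the highest power down: [b2, b1, a].
b2, b1, a = np.polyfit(x, y, deg=2)
print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Note the coefficient order: `np.polyfit` lists the quadratic coefficient first, which is the reverse of how the equation is usually written.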

Interpretation of multiple regression with interaction term

Interpretation of b₂: The coefficient of the indicator variable, b₂ = -11.8404, gives a different intercept for houses with and without a fireplace at the origin (where Size = 0 sq ft). This is not meaningful here, because no house has a size of 0 sq ft. Taken literally, it says that for houses with a size of 0 sq ft, the value of a house without a fireplace is about $11,840 higher than the value of a house with a fireplace.

Interpretation of b₃: The coefficient of the interaction term, b₃ = 9.5180, says that the slope relating the size of the house to its value is steeper by 9.5180 thousand dollars ($9,518) per 1,000 sq ft for houses with a fireplace than for houses without a fireplace. The two fitted lines (one for houses with a fireplace, one for houses without) meet at (size, value) = (1.2440, 223.3550), that is, at about 1,244 sq ft and $223,355. Thus, the predicted value of a house larger than 1,244 sq ft is higher when the house has a fireplace.
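The crossing size can be checked from b₂ and b₃ alone: for a house with a fireplace the intercept shifts by b₂ and the slope by b₃, so the fireplace line minus the no-fireplace line equals b₂ + b₃·size, which is zero at size = -b₂/b₃. (The crossing value, 223.3550, also needs a and b₁, which this card does not list.) A minimal sketch:

```python
# Coefficients reported in the fireplace-interaction example
b2 = -11.8404  # fireplace dummy: shift in intercept (thousands of $)
b3 = 9.5180    # size x fireplace: shift in slope (thousands of $ per 1,000 sq ft)

# The lines differ by b2 + b3*size, so they cross where that difference is 0.
size_cross = -b2 / b3
print(f"Lines cross at size = {size_cross:.4f} thousand sq ft")  # 1.2440

# Below the crossing the fireplace premium is negative; above it, positive.
for size in (1.0, 1.5, 2.0):
    premium = b2 + b3 * size  # predicted value with fireplace minus without
    print(f"size {size}: fireplace premium {premium:+.3f} thousand $")
```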

In a quadratic regression model, what should you do when the quadratic term is significant but the linear term is not?

It can happen that in a quadratic model the quadratic term is significant and the linear term is not. In such situations (for statistical reasons not discussed here), the general rule is to keep the linear term despite its insignificance.

How can you tell graphically that a log transformation would be logical?

The log transformation evens out successively larger distances between values. If the values on the x axis are clumped at the left end and become increasingly spread out toward the right, the Case 1 transformation (log of x) should be used. If the values on the y axis are clumped near the bottom and become increasingly spread out toward the top, the Case 2 transformation (log of y) should be used. If both patterns occur at once, the Case 3 transformation (log of y AND log of x) should be used.
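A small illustration with hypothetical values that grow by a factor of 10: the raw gaps between successive values keep widening, while the gaps between their base-10 logarithms are all equal.

```python
import math

# Hypothetical values whose gaps grow successively larger
x = [1, 10, 100, 1000, 10000]
raw_gaps = [hi - lo for lo, hi in zip(x, x[1:])]

# After a base-10 log transform the same points are evenly spaced.
log_x = [math.log10(v) for v in x]
log_gaps = [round(hi - lo, 10) for lo, hi in zip(log_x, log_x[1:])]

print(raw_gaps)  # [9, 90, 900, 9000]
print(log_gaps)  # [1.0, 1.0, 1.0, 1.0]
```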

How do you represent a categorical variable for multiple regression?

Thus far, the examples we have considered involved quantitative explanatory variables such as machine hours, production runs, price, and expenditures. In many situations, however, we must work with categorical explanatory variables such as gender (male, female), method of payment (cash, credit card, check), and so on. The way to include a categorical variable in the regression model is to represent it by dummy variables.

Which transformation prevents a direct comparison of the regression statistics of the pre-transformation model and the post-transformation model?

Transformations of the explanatory variables do not create this type of problem. It is only when the y variable (the response) is transformed that the comparison becomes more difficult.

What happens to the regression statistics when we transform y (the dependent variable) using a log transformation?

The interpretations of Se (the standard error of estimate) and r^2 change, because the response variable is now measured in completely different units. For example, an increase in r^2 when the natural logarithm transformation is applied to y does not necessarily indicate an improved model. For these reasons, it is difficult to compare this regression with any model that uses y itself as the response variable.
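The sketch below makes the unit problem concrete on hypothetical data (the seed, coefficients, and noise level are all assumptions). Se = sqrt(SSE / (n - k - 1)) for the ln(y) model is in log units; computing a residual standard deviation in the original units of y from back-transformed predictions, exp(predicted ln y), is one common workaround, included here as an assumption rather than something this card prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical data, fixed seed
x = np.linspace(0, 10, 50)
y = np.exp(0.5 + 0.3 * x + rng.normal(0, 0.1, x.size))  # multiplicative noise


def std_error(actual, fitted, k=1):
    """Standard error of estimate: sqrt(SSE / (n - k - 1))."""
    resid = actual - fitted
    return np.sqrt(np.sum(resid**2) / (actual.size - k - 1))


# Fit ln(y) = a + b*x; this Se is measured in log units.
b, a = np.polyfit(x, np.log(y), deg=1)
log_fit = a + b * x
se_log = std_error(np.log(y), log_fit)

# Back-transform the predictions so the residuals are in the units of y again.
se_original = std_error(y, np.exp(log_fit))

print(f"Se in ln(y) units:      {se_log:.3f}")
print(f"Residual SD in y units: {se_original:.3f}")
```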

Dummy variable

A dummy variable (also called indicator or 0 - 1 variable) is a variable with possible values 0 and 1. It equals 1 if a given observation is in a particular category and 0 if it is not.

Why should you use interaction terms sparingly?

If the correlation between an interaction term and the original variables in the regression is high, collinearity problems can result. In a regression with several variables, the number of interaction variables that could be created is very large, and the likelihood of collinearity problems is high. Therefore, it is wise not to use interaction variables indiscriminately. There should be some good reason to suspect that two variables might interact, or some specific question that an interaction variable can answer, before this type of variable is used.
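As a hedged illustration of the collinearity concern, the sketch below uses hypothetical random data (the names `size` and `fireplace`, the ranges, and the seed are assumptions) and measures how strongly a 0/1 dummy correlates with its cross-product with a positive-valued predictor:

```python
import numpy as np

rng = np.random.default_rng(1)               # hypothetical data, fixed seed
size = rng.uniform(1.0, 3.0, 200)            # a positive-valued predictor
fireplace = rng.integers(0, 2, 200)          # a 0/1 dummy variable
interaction = size * fireplace               # the cross-product term

# Correlation between the dummy and the interaction term built from it
corr = np.corrcoef(fireplace, interaction)[0, 1]
print(f"corr(fireplace, size*fireplace) = {corr:.3f}")
```

With these settings the correlation comes out high (around 0.9), because the interaction term is zero exactly when the dummy is zero.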

Purpose of nonlinear transformations

The purpose of nonlinear transformations is usually to "straighten out" the points in a scatterplot in order to overcome violations of the assumptions of regression or to make the form of a model linear. They can also arise because of economic considerations.

How do you deal with an interaction term?

To account for the effect of two explanatory variables xi and xj acting together, an interaction term (sometimes referred to as a cross-product term) xi*xj is added to the model (while still keeping xi and xj in the model as separate variables!)
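A minimal least-squares sketch (NumPy assumed; the data are hypothetical and noise-free) that adds the cross-product column x1*x2 while keeping x1 and x2 in the design matrix as separate columns:

```python
import numpy as np

# Hypothetical, noise-free data built from y = 1 + 2*x1 + 3*x2 + 4*x1*x2,
# so least squares should recover exactly those coefficients.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
y = 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

# Design matrix: intercept, both main effects, and the interaction term.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 6))  # [1. 2. 3. 4.]
```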

When interaction between two variables is present, what happens?

When interaction between two variables is present, we cannot study the effect of one variable on the response y independently of the other variable. Meaningful conclusions can be developed only if we consider the joint effect that both variables have on the response.
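Writing the interaction model as ŷ = a + b₁x₁ + b₂x₂ + b₃x₁x₂ and collecting the terms in x₁ shows why the effect of x₁ cannot be stated without fixing a value of x₂:

```latex
\hat{y} = a + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2
        = (a + b_2 x_2) + (b_1 + b_3 x_2)\, x_1
\quad\Longrightarrow\quad
\frac{\partial \hat{y}}{\partial x_1} = b_1 + b_3 x_2
```

When x₂ is a 0/1 dummy, the slope on x₁ is b₁ for the x₂ = 0 group and b₁ + b₃ for the x₂ = 1 group.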

Why are logarithmic transformations frequently used?

They can be interpreted naturally in terms of percentage changes.

Interpretation of the regression equation for the value of a house on house size and whether or not the house has a fireplace (assuming for simplicity that these variables don't interact, which they probably do)

ŷ = 200.0905 + 16.1858x₁ + 3.8530x₂

where x₁ = size of the house in thousands of square feet, x₂ = dummy variable (1 if the house has a fireplace, 0 if not), and ŷ is the predicted assessed value in thousands of dollars.

First, this equation really leads to two separate equations:

(1) No fireplace (x₂ = 0): ŷ = 200.0905 + 16.1858x₁
(2) Fireplace (x₂ = 1): ŷ = 203.9435 + 16.1858x₁

Interpretation of a: The expected value of a house with 0 square feet and no fireplace is $200,091, which obviously does not make sense in this context.

Interpretation of b₁: The effect of x₁ on y is the same for houses with or without a fireplace. When x₁ increases by one unit, y is expected to change by b₁ units whether or not the house has a fireplace. Thus, holding constant whether a house has a fireplace, for each increase of 1 thousand square feet in the size of the house, the predicted assessed value is estimated to increase by 16.1858 thousand dollars (i.e., $16,185.80).

Interpretation of b₂: Equations (1) and (2) have the same slope (b₁ = 16.1858), but their intercepts differ by b₂ = 3.8530. Geometrically, the two equations correspond to two parallel lines a vertical distance of b₂ = 3.8530 apart. Therefore, b₂ is the difference between the two intercepts, 203.9435 and 200.0905. Thus, holding constant the size of the house, the presence of a fireplace is estimated to increase the predicted assessed value of the house by 3.8530 thousand dollars (i.e., $3,853).
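A short sketch, using the coefficients of ŷ = 200.0905 + 16.1858x₁ + 3.8530x₂, that checks the parallel-lines interpretation numerically; the function name `predicted_value` is hypothetical.

```python
a, b1, b2 = 200.0905, 16.1858, 3.8530  # coefficients from the fitted equation


def predicted_value(size_thousands, fireplace):
    """Predicted assessed value (thousands of $); fireplace is 1 or 0."""
    return a + b1 * size_thousands + b2 * fireplace


# Same slope b1 for both groups, so the gap equals b2 at every size.
for size in (1.5, 2.0, 2.5):
    gap = predicted_value(size, 1) - predicted_value(size, 0)
    print(f"size {size}: gap = {gap:.4f}")  # 3.8530 each time

print(f"Intercept with a fireplace: {a + b2:.4f}")  # 203.9435
```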

When comparing models, what should you look at?

r^2 and Se. When both models use the same response variable, the model with the larger r^2 and the smaller Se is typically better.
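A sketch of computing r^2 = 1 - SSE/SST and Se = sqrt(SSE / (n - k - 1)) for two models fitted to the same response, a linear and a quadratic fit on hypothetical curved data; the helper `fit_stats` and the data are assumptions for illustration.

```python
import numpy as np


def fit_stats(y, y_hat, k):
    """Return (r^2, Se) for a fitted model with k explanatory variables."""
    sse = np.sum((y - y_hat) ** 2)        # unexplained variation
    sst = np.sum((y - np.mean(y)) ** 2)   # total variation in y
    r2 = 1 - sse / sst
    se = np.sqrt(sse / (len(y) - k - 1))  # standard error of estimate
    return r2, se


# Hypothetical curved data; both models below use the same response y.
x = np.arange(1.0, 11.0)
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1])
y = 5 + 0.8 * x**2 + noise

linear_fit = np.polyval(np.polyfit(x, y, 1), x)     # y-hat = a + b1*x
quadratic_fit = np.polyval(np.polyfit(x, y, 2), x)  # y-hat = a + b1*x + b2*x^2

r2_lin, se_lin = fit_stats(y, linear_fit, k=1)
r2_quad, se_quad = fit_stats(y, quadratic_fit, k=2)
print(f"linear:    r^2 = {r2_lin:.4f}, Se = {se_lin:.3f}")
print(f"quadratic: r^2 = {r2_quad:.4f}, Se = {se_quad:.3f}")
```

Note that k is the number of explanatory terms (2 for the quadratic model, counting x and x²), so Se is penalized for the extra term while r^2 is not.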

