Module 6 Regression with Spatial Autocorrelation


need for moving average models

After fitting an autoregressive model, the residuals ideally would be spatially uncorrelated. That is, the errors would be spatially random with no pattern whatsoever, and we would be confident that the model captures as much of the spatial structure in Z as possible. In practice the model errors might not be uncorrelated, because the data contain measurement errors having spatial autocorrelation. Even more likely, there may be variables that affect Z and possess spatial autocorrelation but are not in the model. Perhaps the researcher doesn't know what other variables are related to Z and has left out some spatially autocorrelated ones. Alternatively, the variables may be known, but observations are not available. Either way, those other variables cannot be included in the model, and if the missing variables have spatial autocorrelation, the errors will not be random.

Trend surface and GWR vs spatial autoregressive and moving average

As described above, trend surface models base predictions on location (X, Y), whereas GWR uses non-spatial auxiliary predictors (P1, P2, ...) as independent variables. This lesson considers models that use observations of the dependent variable Z itself to make predictions at unobserved locations. The rationale for such models is seen in Module 3 on spatial autocorrelation. If autocorrelation is present, there is spatial structure in Z and the dependent variable is not spatially random. In principle, we should be able to make use of that structure to build a prediction model.

All the models considered in this module have an error term. What are these "errors", and why are they necessary?

Error refers to the difference between predicted and observed values.

What is the primary advantage of geographically weighted regression (GWR)?

GWR produces all of the usual regression statistics: SSE, RMSE, $R^2$, etc. As with standard regression, one can perform statistical tests for the significance of individual variables. However, GWR does not give a single value for each b. Instead it produces a function for every b. We cannot report a value for any given b, but we can map that b. This is, of course, the primary advantage of GWR.

See the Section 2 figures of trend surfaces based on the degree of the polynomial

If only the first degree terms are present, the trend surface polynomial is a plane: the contour lines are necessarily straight and equally spaced because the surface has uniform slope. A quadratic (2nd degree) surface is bowl-shaped: the bowl can be concave-upward (like a soup bowl) or concave-downward (like a mountain). A 3rd degree trend surface is more flexible: the particular surface shown has large values in the northeast, is nearly flat in the middle of the area, and decreases rapidly toward minimum values in the southwest.

The essence of GWR is spatial variation in the β's

It means that instead of a global model that applies everywhere, we have a model that adapts to local conditions. As the value of a particular β changes from one place to another, the effect of the corresponding independent variable changes. Of course, the β functions are not known and must be estimated from the data. Let us call these estimates b0(X,Y), b1(X,Y), b2(X,Y) and so forth. Each b is an estimate of a β. GWR creates the needed b functions using weighted sums of data values. For any location (X,Y), GWR estimates the effect of a given predictor by weighting nearby data values more heavily than those far away. Weighting is usually performed using inverse distance weights similar to those discussed previously. In effect, GWR fits a standard regression model to every location with individual observations contributing more or less depending on their distance to the location in question.
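To make the weighting concrete, here is a minimal sketch of a single local fit, assuming numpy and a simple inverse-distance-squared weight. The function name and inputs are hypothetical; real GWR software uses kernel functions with a fitted bandwidth.

```python
import numpy as np

def gwr_local_fit(x0, y0, coords, P, Z, eps=1e-6):
    """Estimate local coefficients b(x0, y0) by weighted least squares.

    coords -- (n, 2) array of data locations (X, Y)
    P      -- (n, k) array of non-spatial predictors P1..Pk
    Z      -- (n,) array of the dependent variable
    """
    # Inverse-distance-squared weights: nearby observations count more.
    d2 = np.sum((coords - np.array([x0, y0])) ** 2, axis=1)
    w = 1.0 / (d2 + eps)            # eps avoids division by zero at a data point

    A = np.column_stack([np.ones(len(Z)), P])   # design matrix with intercept
    W = np.diag(w)
    # Weighted normal equations: (A' W A) b = A' W Z
    b = np.linalg.solve(A.T @ W @ A, A.T @ W @ Z)
    return b                        # b[0] = local intercept, b[1:] = local slopes
```

Evaluating such a fit over a grid of locations produces the coefficient surfaces $b_0(X,Y), b_1(X,Y), \ldots$ that can then be mapped.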

Autoregressive models rely on w. Use the subscript i to represent a particular observation. An observation consists of a location and a value of Z, i.e., $(X_i, Y_i, Z_i)$.

Let us define the subscript j as a location. Thus j = (X, Y). That location could be a data point $(X_i, Y_i)$ or an arbitrary location in the study area. In other words, i is always used for an observation, whereas j need not be a place where Z is measured. With that, here are some possibilities for w:

(A) $w_{ij} = 1$ if areas i and j are connected, and $w_{ij} = 0$ otherwise
(B) $w_{ij} = 1/d_{ij}^2$, where $d_{ij}$ is the distance between locations i and j
(C) $w_{ij}$ = the proportion of the perimeter of area j in contact with area i
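For illustration, forms (A) and (B) might be assembled like this with numpy; the adjacency list and coordinates are made-up inputs.

```python
import numpy as np

# (A) Binary connectivity: w_ij = 1 if areas i and j share a border.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}   # hypothetical
n = len(adjacency)
W_binary = np.zeros((n, n))
for i, neighbors in adjacency.items():
    W_binary[i, neighbors] = 1.0

# (B) Inverse distance squared: w_ij = 1 / d_ij^2.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # hypothetical
diff = coords[:, None, :] - coords[None, :, :]
d2 = np.sum(diff ** 2, axis=2)
with np.errstate(divide="ignore"):
    W_idw = np.where(d2 > 0, 1.0 / d2, 0.0)   # zero on the diagonal
```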

Moving average models

Moving average models address this by including autocorrelated model error as a predictor. Not surprisingly, they are also called spatial error models. The logic behind using error as a predictor is as follows. Suppose we want to predict a value $Z_j$ at location j. Imagine that we know the prediction errors for surrounding locations, and we know that error has spatial autocorrelation. We could use the surrounding errors to estimate the error at location j, and if we knew the expected error at j, we could adjust the predictions at j accordingly.

Again assuming there are two auxiliary predictors and letting $u_i$ be the error at data location i, the moving average model is:

$$Z_j = \beta_0 + \beta_1 P_1 + \beta_2 P_2 + \beta_3 \sum_{i=1}^{n} w_{ij} u_i + \varepsilon_j$$

Fitting a model like this is more difficult than any of the others we've considered. The error values $u_i$ can't be known without fitting a model, but we need those values in order to make the predictions and find the best fitting model! The solution is iterative. We pick starting values for $b_0, b_1, b_2, b_3$ and set the $u_i$ initially to zero. The b's are used to generate predictions for all $Z_i$ along with corresponding prediction errors $u_i$. The prediction errors are only provisional, because they are based on starting values for the b's. With the provisional $u_i$ in hand, a new set of b's is identified giving the best possible predictions. That model results in a new set of prediction errors, and those errors are used in the next iteration. The process continues until the b's stop changing from one iteration to the next.
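The iteration described above can be sketched as a short loop. This is only a conceptual illustration of the logic; production spatial-error software typically estimates such models by maximum likelihood instead.

```python
import numpy as np

def fit_spatial_error(A, Z, W, tol=1e-8, max_iter=100):
    """Conceptual iterative fit of Z = A b + b_u * (W'u) + error.

    A -- (n, k) design matrix: intercept plus predictors P1, P2, ...
    Z -- (n,) observations
    W -- (n, n) spatial weights, w[i, j] linking locations i and j
    """
    n, k = A.shape
    u = np.zeros(n)                        # start with all errors set to zero
    b = np.zeros(k + 1)
    for _ in range(max_iter):
        X = np.column_stack([A, W.T @ u])  # lagged error as an extra predictor
        b_new, *_ = np.linalg.lstsq(X, Z, rcond=None)
        u = Z - X @ b_new                  # provisional prediction errors
        if np.max(np.abs(b_new - b)) < tol:
            break                          # b's stopped changing
        b = b_new
    return b_new, u
```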

Another measure is the coefficient of determination $R^2$, which can be written as

$$R^2 = 1 - \frac{\sum e_i^2}{\sum (Z_i - \bar{Z})^2}$$

where $\bar{Z}$ is the mean of the dependent variable. Recall that $R^2$ is always between zero and one. A value of zero indicates no linear relationship between Z and X, whereas a value of unity pertains to a perfect prediction equation (all errors are zero). In our Iowa example $R^2$ is 0.583. This can be interpreted to mean that 58.3% of the variance in elevation is explained (or captured) by the predictor variable longitude.
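A direct translation of the formula, assuming Z and the errors e are numpy arrays:

```python
import numpy as np

def r_squared(Z, e):
    """R^2 = 1 - SSE / total sum of squares about the mean of Z."""
    Z = np.asarray(Z)
    return 1.0 - np.sum(np.asarray(e) ** 2) / np.sum((Z - Z.mean()) ** 2)
```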

How to identify and measure the degree of spatial autocorrelation present in some variable Z

Recall that in 3.2 Z could be measured at points or it could pertain to areas. Recall also that a key concept in spatial autocorrelation is an adjacency variable wij that indicates the adjacency or connectedness of two places. This weight variable w could be a binary (0,1) index indicating whether or not two places are adjacent, or it could be a continuous variable reflecting the degree of adjacency or connectedness. Autoregressive models do not use the measures of autocorrelation described in Lesson 3.2, but they do rely on w.

Logic for GWR

The equation applies to all members of the population and expresses their values as the sum of a GWR prediction and a residual ($\varepsilon_i$). Predicted values are based on a set of independent variables, the P's. For the i-th element of the population, the predictor values are $P_{1i}, P_{2i}, P_{3i}$, etc. Notice that each β is evaluated at the same location, namely $(X_i, Y_i)$. This makes sense because the equation delivers a value of the dependent variable for that place.

Coefficient maps are often produced as part of GWR. What are these? What purpose do they serve?

The mapped value is the coefficient for the rainfall predictor, and thus the map shows the effect of rainfall on SOC. Notice that not only is there spatial variation in this b, but values actually change sign from place to place. Throughout most of the country values are positive, indicating that as rainfall increases so does SOC. However, there are negative values in central Ireland, suggesting that rainfall is negatively related to SOC there. Obviously, no global model could capture this kind of behavior. GWR can be used purely as a predictive tool, but it has much more to offer through maps like this. Researchers hoping to understand relationships between predictors and dependent variables will be keenly interested in how individual b's vary, and as a GIS practitioner you will be expected to produce such maps.

root-mean-square error, or RMSE.

The sum of the squared errors (SSE) for this example is 342673.47 square meters. More helpfully, we can find the average squared error and take the square root. The result is the so-called root-mean-square error, or RMSE. If we made a prediction and wanted to provide an uncertainty value, this would be a good way to report the uncertainty. RMSE provides a measure of success in the sense that small values indicate a good fit between the observations and the sample prediction equation.
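In code, assuming a numpy array of prediction errors:

```python
import numpy as np

def rmse(errors):
    """Root-mean-square error: the square root of the mean squared error."""
    return np.sqrt(np.mean(np.asarray(errors) ** 2))
```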

Describe what is meant by "data-driven" models.

The term "data-driven" refers to the fact that observations play a central role in model construction. In all cases observations of the dependent and independent variables are used to realize or instantiate the model. In everyday language, we say that the model is "fit" to the data. Without data the model can't be developed. Note that data-driven doesn't mean theory is excluded and has no role to play. Indeed, ideally a researcher's theory and domain knowledge would strongly inform the choice of model. At the very least, one should always select a model that is consistent with whatever information is available about the problem at hand.

statistical tests for model coefficients.

These can be used to decide if the values obtained are statistically significant or could have arisen by chance alone. Normally one will examine the so-called P-value for each coefficient, which gives the probability of obtaining the observed value from a population with no relationship between the predictor and dependent variable. Small P-values mean the value obtained in the analysis is unlikely to have arisen by chance, or equivalently, that the predictor is likely related to the dependent variable.

Describe and compare spatial autoregressive and moving average models.

This lesson describes ways to use information about the spatial structure of Z in prediction. The models discussed in sections 4.2 and 4.3 differ in exactly what is meant by "spatial structure of Z". Autoregressive models use values of Z directly as predictors. By contrast, moving average models are indirect in that they employ prediction error as a predictor variable.

What is geographically weighted regression (GWR)?

Trend surface analysis uses geographic location as a predictor and models the dependent variable as an explicit function of X and Y. Geographically Weighted Regression (GWR) uses one or more non-spatial variables for prediction. In classic regression modeling, the effect of each predictor variable is constant. Thus, for example, a classic model assumes that an additional bathroom would add the same amount to the value of a house everywhere in the study area. GWR differs from classic regression in that the effects of individual predictors vary over space.

Describe in general terms trend surfaces of degrees one through three.

Trend surface modeling (also called trend surface analysis) involves predicting Z based on geographic location. It resembles spatial interpolation in that regard, but differs in that the model is an explicit function of map coordinates X and Y. In particular, trend surface models are polynomials in X and Y. Remember that polynomials are functions based on integer powers of independent variables. If there is one independent variable (X), the polynomial is based on $X^1, X^2, X^3$, etc. The highest power in the polynomial is called the degree of the polynomial. Trend surfaces use powers of two variables, X and Y. As with linear regression the coefficients are denoted by the various β's, but this time there are two subscripts to indicate the powers of X and Y. For example, $\beta_{21}$ multiplies the term with the 2nd power of X and the 1st power of Y. Higher degree polynomials have even more coefficients.

Equivalently, how do we assign values for the line's slope and intercept in bivariate regression?

We do not know the values of $\beta_0$ and $\beta_1$, so the best we can do is estimate them based on the data in hand. Let us call these estimates $b_0$ and $b_1$:

$$Z_i = b_0 + b_1 X_i + e_i = \hat{Z}_i + e_i = \text{Prediction}_i + \text{Error}_i$$

The unknown β's have been replaced with sample b's, and the population residual has been replaced with a sample error $e_i$.

The GWR Model

We have used X and Y for geographic location, and Z for the dependent variable. Let us denote the non-spatial independent variables in GWR as $P_1, P_2, P_3, P_4$, etc. The subscripts on P number these predictor variables. The model coefficients are denoted $\beta_0, \beta_1, \beta_2, \beta_3$, but they need to vary spatially. Thus we will write $\beta_0(X,Y), \beta_1(X,Y), \beta_2(X,Y), \ldots$ to indicate they are not constants but rather functions of X and Y. With this notation, the GWR model for any element i of a population is:

$$Z_i = \beta_0(X_i,Y_i) + \beta_1(X_i,Y_i)P_{1i} + \beta_2(X_i,Y_i)P_{2i} + \beta_3(X_i,Y_i)P_{3i} + \ldots + \varepsilon_i$$

What is meant by "over-fitting"?

When using higher degree trend surfaces it is important to remember that, as polynomials, they are highly flexible functions. In everyday language, they are potentially very "wiggly". This is not a problem when data are numerous and evenly spaced, but it can be a problem otherwise. Where not constrained by data, high-degree polynomials are free to oscillate, and they will do so in order to minimize SSE. We would say the data are over-fit, meaning that the data are not able to support the degree of model in use. This problem also arises in the spatial domain. If there are regions where the density of observations is low, the trend surface may exhibit unwarranted variation. You should watch for this by plotting the data locations on the trend surface map. If large excursions in the form of peaks and valleys occur where data are sparse, you have probably over-fit your dataset.
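A quick one-dimensional illustration of the idea, using made-up sparse data: an eight-point sample fit with a degree-7 polynomial passes through every observation but is free to swing wildly across the gap in the data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Eight points with a deliberate gap between x = 3 and x = 8.
x = np.array([0.0, 0.5, 1.2, 2.1, 3.0, 8.0, 9.1, 10.0])
z = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, size=8)   # roughly linear truth

b_low = np.polyfit(x, z, 1)     # degree 1: stable everywhere
b_high = np.polyfit(x, z, 7)    # degree 7: interpolates all eight points

grid = np.linspace(0, 10, 201)
print(np.ptp(np.polyval(b_low, grid)))   # modest spread of predictions
print(np.ptp(np.polyval(b_high, grid)))  # much larger: wild swings in the gap
```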

benefit of autoregressive model

Wouldn't a simpler model involving only P1 and P2 be better? The answer is no, because in that case the coefficient estimates would include both the role of the non-spatial variables (income and ethnicity) and the effects of spatial autocorrelation. In addition, the residuals from the regression would be autocorrelated, which violates an assumption of regression needed for statistical inference. By explicitly modeling the autoregressive effects, the role of the non-spatial variables is isolated. This is particularly important when the goal is to understand the role of the auxiliary variables rather than simply to make predictions for unobserved locations.

The simplest form of an autoregressive model (also known as a spatial lag model) expresses values of Z as a function of connected values:

$$Z_j = \beta_0 + \beta_1 \sum_{i=1}^{n} w_{ij} Z_i + \varepsilon_j$$

If w is binary as in (A), only connected areas are included in the average. Form (B) would discount observations far removed from location j, and (C) would consider the amount of contact between areas i and j. A more complex model would include both autocorrelation in Z and contributions from non-spatial variables P. Suppose, for example, there are two non-spatial variables $P_1$ and $P_2$ that we want to include. The model would be formulated as:

$$Z_j = \beta_0 + \beta_1 P_1 + \beta_2 P_2 + \beta_3 \sum_{i=1}^{n} w_{ij} Z_i + \varepsilon_j$$
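As a sketch, the autoregressive term is just an extra column computed from W and Z, after which the model looks like an ordinary regression. The data below are synthetic, and the naive least-squares fit is only illustrative: real spatial-lag software uses maximum likelihood, because Z appears on both sides of the equation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Hypothetical inputs: observations Z, predictors P1 and P2, and a weight
# matrix with w[i, j] giving the influence of location i on location j.
Z = rng.normal(size=n)
P1, P2 = rng.normal(size=n), rng.normal(size=n)
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=0)                 # scale so each location's weights sum to 1

lag = W.T @ Z                      # lag[j] = sum_i w_ij * Z_i

# Naive least-squares fit of Z on P1, P2, and the lag term.
X = np.column_stack([np.ones(n), P1, P2, lag])
b, *_ = np.linalg.lstsq(X, Z, rcond=None)
```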

We emphasize that in two dimensions (X and Y) the number of coefficients

grows rapidly with the polynomial degree. A full polynomial of degree d in X and Y has (d+1)(d+2)/2 terms, so a 3rd degree surface has 10 coefficients while a 5th degree surface has 21, more than twice as many.
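The count follows because a full polynomial of degree d in X and Y has one term for each exponent pair (p, q) with p + q ≤ d:

```python
def n_terms(degree):
    """Coefficients in a full bivariate polynomial: (d+1)(d+2)/2."""
    return (degree + 1) * (degree + 2) // 2

for d in (1, 2, 3, 4, 5):
    print(d, n_terms(d))    # prints 3, 6, 10, 15, 21
```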

The goal of trend surface fitting

is to extract spatial structure from a dataset. If the fit is successful, there ought to be very little structure present in the residuals. That is, the pattern of residual values should be nearly random. If not, it would mean there are places where the surface systematically under-predicts or over-predicts Z. Such regions would constitute patterns not captured by the trend surface. The important message is that in addition to looking at the diagnostic statistics, one should examine the residuals. At a minimum you should make residual maps. Even better, you could examine the residuals for spatial autocorrelation as described in Module 4.

What statistics are commonly used as measures of success for a model that has been fit by least squares?

Three statistics are commonly used: the sum of the squared errors (SSE), the root-mean-square error (RMSE), and the coefficient of determination ($R^2$). Each provides a measure of success in the sense that it summarizes how well the sample prediction equation fits the observations, and RMSE in particular is a good way to report the uncertainty of a prediction.

Challenge: In the example elevation is measured in meters, but SSE is square meters. Please explain.

Because SSE is a sum of squared errors, its units are the square of the measurement units: squaring errors measured in meters yields square meters.

What is the principle of least squares?

The most widely used method to choose $b_0$ and $b_1$ is the principle of least squares. This approach says: find the combination of $b_0$ and $b_1$ that minimizes the squared prediction error. In equation form, we look for values of $b_0$ and $b_1$ to minimize

$$SSE = \sum e_i^2 = \sum [Z_i - (b_0 + b_1 X_i)]^2 = \sum [\text{Observation}_i - \text{Prediction}_i]^2$$

In the Iowa example, the intercept says that if we projected the trend all the way to the prime meridian in England, the elevation would be 2694 meters below sea level. Obviously, such an extrapolation would be foolhardy. Every straight line intersects the Z-axis for some value of X, but that X-value need not be part of the dataset. If not, $b_0$ should be interpreted with great caution.
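In practice the minimizing values have a closed-form solution, so no search is needed; a minimal numpy version:

```python
import numpy as np

def least_squares_line(X, Z):
    """Return (b0, b1) minimizing SSE = sum [Z_i - (b0 + b1 X_i)]^2."""
    A = np.column_stack([np.ones(len(X)), X])
    (b0, b1), *_ = np.linalg.lstsq(A, np.asarray(Z), rcond=None)
    return b0, b1
```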

bivariate regression

We hope to predict a dependent variable from a single independent variable. Here we use Z and X respectively for the predicted and predictor variable. Thus the model is of the form Z = f(X). The linear form means our conceptual model is:

$$Z_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

This is obviously an equation for a straight line, with $\beta_0$ and $\beta_1$ the intercept and slope of the line. The model says any value of Z can be thought of as the sum of a predicted value (given by the line) and a residual or departure from the predicted value, denoted by $\varepsilon_i$. (The subscript i denotes an individual member of a population.) The residual term is important because we do not assume X is the only variable involved in Z. Other unspecified variables affect Z, and produce values above or below those predicted by X alone.

To fit a trend surface to data,

we need observations of X, Y, and Z at a variety of point locations in the study area. If the data are defined for areas (such as census tracts), we will need to assign the area centroid or some other point as the location for every observation. The principle of least squares can once again be used to fit the model. For example, a 1st degree surface is found by choosing $b_{00}$, $b_{10}$ and $b_{01}$ to minimize SSE. The result is the "best fitting" plane in the sense that the overall squared prediction error is as small as possible.
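A minimal sketch of such a fit with numpy, assuming X, Y, and Z are arrays of point observations; the degree-2 version simply appends the quadratic columns.

```python
import numpy as np

def fit_trend_surface_deg1(X, Y, Z):
    """Fit the plane Z = b00 + b10*X + b01*Y by least squares."""
    A = np.column_stack([np.ones(len(Z)), X, Y])
    coeffs, *_ = np.linalg.lstsq(A, Z, rcond=None)
    return coeffs                      # [b00, b10, b01]

def fit_trend_surface_deg2(X, Y, Z):
    """Degree 2 appends the quadratic columns X^2, X*Y, Y^2."""
    A = np.column_stack([np.ones(len(Z)), X, Y, X**2, X * Y, Y**2])
    coeffs, *_ = np.linalg.lstsq(A, Z, rcond=None)
    return coeffs
```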

