Geog639 Test2
Wald Statistic
Used to test independent variable significance in regression when a normal distribution of the model residuals cannot be assumed (e.g. logistic regression, poisson regression, anything with ML estimation). It's like a t-test.
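A minimal sketch of the Wald test for a single coefficient. It assumes you already have an ML estimate B and its standard error from the fitted model; the numbers below are made up. The ratio z = B/SE plays the role of the t ratio, and W = z^2 is compared against a chi-square distribution with 1 degree of freedom.

```python
import math

def wald_test(b, se):
    """Wald test of H0: b = 0 for one coefficient.

    W = (b / se)^2 is asymptotically chi-square with 1 df
    (equivalently, z = b / se is asymptotically standard normal).
    """
    w = (b / se) ** 2
    # Upper tail of chi-square(1): P(X > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2))
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value

# Hypothetical coefficient and standard error from an ML fit (e.g. a logistic regression)
w, p = wald_test(b=0.8, se=0.25)
print(f"Wald = {w:.2f}, p = {p:.4f}")
```

No normality assumption on the residuals is needed here. The chi-square reference distribution is justified by the large-sample behaviour of the ML estimator.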
supervised backwards selection
Not simply eliminating variables one by one in a fixed order, but using judgement to leave some in and trying different orders of removal.
RHO
Need to be aware that you are introducing a new parameter. This could be a good thing and help explain the model! -1 < p < 1 ... if rho is close to zero, it won't change the model much (because the pWY term will be close to zero). If you don't need SAR, don't use it "just in case", because we want the most parsimonious model. To know whether to use SAR, look at Moran's I for the dependent variable/residuals.
How do we solve the problem of spatial dependence in regression analysis?
"Cure" the ill-conditioned variance matrix
-ill-conditioned matrix = heteroskedasticity or autocorrelation
-"cure" using spatial autoregressive models
-obtain an efficient, reliable estimator
-reduce the variance in the estimates
W in SAR
*SAR modelling: The error and explanatory variables are multiplied by the inverse of a model of spatial dependence, specified for the dependent variable.*
The autoregressive component, pWY, is an expression of spatial dependence:
-estimate p
-calibrate W
W is the contiguity, proximity, or "nearness" matrix. It approximates omega.
For spatial proximity, can use polygons (shared borders) or points (nearest neighbours).
Can use binary weights (1 = contiguous, 0 = non-contiguous) or distance weights (e.g. w = 1/d^2), which are more complex.
**W in pWY expresses contiguity, like the spatial weights matrix for Moran's I. How you set the W matrix determines how much spatial autocorrelation you can remove... it can really affect the model outcome.
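The two kinds of W described above can be sketched in plain Python. The neighbour list, the coordinates, and the choice to row-standardize are illustrative assumptions, not the course's exact setup. Row-standardizing makes each row sum to 1, which is a common convention for pWY.

```python
import math

def binary_contiguity(neighbours, n):
    """Binary W: w_ij = 1 if units i and j are contiguous, 0 otherwise."""
    W = [[0.0] * n for _ in range(n)]
    for i, js in neighbours.items():
        for j in js:
            W[i][j] = 1.0
    return W

def inverse_distance_squared(points):
    """Distance-decay W: w_ij = 1 / d_ij^2 for i != j, and w_ii = 0."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] = 1.0 / math.dist(points[i], points[j]) ** 2
    return W

def row_standardize(W):
    """Scale each row to sum to 1 (rows with no neighbours are left as zeros)."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s if s > 0 else 0.0 for w in row])
    return out

# Four polygons in a row (0-1-2-3), each sharing borders with its immediate neighbours
W_bin = binary_contiguity({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}, n=4)
W_std = row_standardize(W_bin)
print(W_std[1])   # [0.5, 0.0, 0.5, 0.0]: unit 1 has two neighbours, each weighted 0.5

# Three points: the closer point gets a much larger weight
W_dist = inverse_distance_squared([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)])
print(W_dist[0])
```

Swapping one W for the other changes which pairs of units count as "near" and by how much. This is why the choice of W can change how much spatial autocorrelation the model removes.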
Spatial autoregressive modelling recap
*The error and explanatory variables are multiplied by the inverse of a model of spatial dependence, specified for the dependent variable.*
The whole point is to reduce the variance of B, so we should not expect much, if any, change to goodness of fit.
**Inflated variance, even though it is in the denominator, counter-intuitively increases the t statistic, increasing the chance of false positives (the OLS formula for the standard error does not reflect the true, inflated variance, so the reported standard errors are too small).
Stochastic properties of the estimator - remember that:
-there exists a "true" unknown parameter b linking each independent variable to the dependent
-each true parameter b is estimated by the stochastic estimator B
-the stochastic properties of B are guaranteed by the regression assumptions
The result of large variance is that it affects inference!!!
In hypothesis testing, t and F statistics are a function of variance and/or standard deviation. Increasing variance/st dev unnecessarily will falsely increase the t or F statistic!!
Inflated variance increases the tendency to reject the null hypothesis when it is actually true (saying there is a difference/effect when there is none).
We are led to trust unreliable parameters! We tend to retain non-significant variables in the model!!
Spatial regression models
1) Conditional autoregressive model (CAR) - assumes Gaussian
2) Simultaneous autoregressive model (SAR)
3) Spatial moving average (equivalent to temporal moving average)
GWR: some observations
1) Reliability: each local regression is estimated on a small number of observations, therefore each regression has very few degrees of freedom.
2) Additionally, in all likelihood the spatial dependence is very high in each local neighbourhood: as a consequence, in an attempt to reduce the negative consequences of spatial non-stationarity, GWR aggravates those of spatial dependence.
-high autocorrelation because at the local scale, things are likely to be similar
3) An additional, important observation is that spatial dependence and spatial non-stationarity tend to occur simultaneously in empirical geographical phenomena.
-both techniques (SAR and GWR) are "borrowed" from other disciplines
-none offers a comprehensive solution to the problem(s)
4) The estimation method is (locally weighted) OLS. Model selection (backward or forward) is done globally:
-each local regression is likely to contain variables that are not significant locally, and vice versa
-correlation is tested globally, but may vary locally, possibly leading to local multicollinearity
-these are additional concerns for local model reliability and robustness
5) Sometimes GWR "takes care of" spatial autocorrelation, but there is no evidence of if/how GWR can do it. Recall that an appropriate independent variable took care of spatial autocorrelation in the heart disease regression. Sometimes when you apply GWR, spAC gets worse, but sometimes it disappears magically. Can think of spAC and non-stationarity as two sides of the same coin...
6) What good is inference at that scale?
7) How can you interpolate b/w two points? Only if things are stationary, which goes against the whole point.
multilevel models
A combined model at the individual and aggregate levels. A set of parameters and a combined error to account for individual and aggregate behaviours.
Inference and significance in SAR
A commonly used test in logistic regression is the likelihood ratio statistic (G^2). The test is distributed asymptotically as chi-square. Equation on slide.
An alternative formulation of the test is used to evaluate the entire regression model (similar to the F test): comparing a regression on the constant only vs a regression on the entire set of explanatory variables: lambda = L0/Lmax, and the test statistic -2 ln(lambda) is also distributed chi-square, with k degrees of freedom, where k is the number of regressors.
Testing the significance of individual regressors (variables): recall that the t test relies on the normality assumption. The *Wald Statistic* does not require the normality assumption. It is asymptotically distributed chi-square.
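A minimal sketch of the whole-model likelihood ratio test, assuming you have the log-likelihoods of the constant-only model and the full model; the values below are hypothetical. The chi-square tail probability is computed from the power series of the regularized incomplete gamma function. This is adequate for the moderate statistics met in practice, but it is not a replacement for a statistics library.

```python
import math

def chi2_sf(x, k):
    """P(X > x) for X ~ chi-square with k degrees of freedom.

    Computed as 1 - P(k/2, x/2), where P is the regularized lower incomplete
    gamma function, evaluated by its power series.
    """
    if x <= 0:
        return 1.0
    s, z = k / 2.0, x / 2.0
    term = total = 1.0 / s
    n = 1
    while term > total * 1e-16:
        term *= z / (s + n)
        total += term
        n += 1
    lower = total * math.exp(s * math.log(z) - z - math.lgamma(s))
    return max(0.0, 1.0 - lower)

def likelihood_ratio_test(loglik_0, loglik_max, k):
    """G^2 = -2 ln(L0 / Lmax) = -2 (lnL0 - lnLmax), asymptotically chi-square with k df."""
    g2 = -2.0 * (loglik_0 - loglik_max)
    return g2, chi2_sf(g2, k)

# Hypothetical log-likelihoods: constant-only model vs a model with k = 3 regressors
g2, p = likelihood_ratio_test(loglik_0=-120.4, loglik_max=-110.1, k=3)
print(f"G^2 = {g2:.1f}, p = {p:.5f}")
```

Working with log-likelihoods rather than the likelihoods themselves avoids numerical underflow. This is why the ratio L0/Lmax becomes a difference of logs.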
Logistic Regression
A form of regression applicable to categorical data. Commonly used when the dependent variable is binary or categorical. Useful and popular in social sciences (qualitative data) and wildlife studies (presence/absence).
LR models the probability of occurrence of each category, using the logistic (log-odds) function, which is non-linear.
-observations on categorical data are transformed into the probability of occurrence of each category
-probability is a continuous variable defined in the interval 0 to 1
-probability can be modeled as a dependent variable in a regression model
-any independent variable (nominal, ordinal, interval, or ratio) can be used
Many of the known probability density functions (pdfs) have one trait in common:
-prob does not vary linearly b/w 0 and 1
-frequently, or "normally", higher frequencies tend to cluster in the center of the pdf, away from the limits of 0 and 1
Owing to the characteristics of the DV, some assumptions of the OLS model would be violated:
-DV has limited range
-rel'n b/w Y and Xs is not linear
-the error is not normally distributed and is heteroskedastic
*Linear Probability Models* exist, but they violate these assumptions. Instead we use the *Logit Model*, because it:
-goes from 0 to 1
-models a gradual transition
-lower frequencies near extremities
-higher frequencies in the middle
Check L9 S48-49 for formulas.
ESTIMATION
The relationship is non-linear, so not all assumptions are met. Thus, OLS estimates are not appropriate. Usually ML is used. ML involves "speculating" on which alternative values of the set of parameters would "best" reproduce the population parameters. ML consists in estimating the unknown parameters in such a way that the prob of observing the given sample is as high as possible. Under the ML criterion, the parameters are estimated so as to maximize the prob of generating or obtaining the observed sample.
INFORMATION CRITERIA
AIC weights the logLikelihood by the # of ind vars (estimated parameters). AIC is popular w/ logistic regression.
INFERENCE
A commonly used test in logistic regression is the likelihood ratio statistic. The test is distributed asymptotically as chi-square. An alternative formulation of the test is used to evaluate the entire regression model (similar to F test): comparing a regression on the constant only vs a regression on the entire set of explanatory variables. Also distributed as chi-square with k degrees of freedom (k = # regressors). For testing individual regressors, recall that the t-test requires normality. However, the *Wald statistic* does not require normality. It is asymptotically distributed as chi-square.
Logistic regression provides a probability surface that is useful in predicting the dependent variable.
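A minimal sketch of fitting a logit model by maximum likelihood, using Newton-Raphson on the log-likelihood. It is restricted to one predictor so the 2x2 algebra stays explicit. The presence/absence data are hypothetical and deliberately not perfectly separated; with perfect separation the ML estimates would not exist. The same fit also gives the log-likelihood, AIC, and the likelihood ratio statistic against the constant-only model.

```python
import math

def predict_prob(b0, b1, x):
    """Logit model: P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood: sum of y ln p + (1 - y) ln(1 - p)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict_prob(b0, b1, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logit(xs, ys, iters=25):
    """ML estimates of (b0, b1) by Newton-Raphson on the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = predict_prob(b0, b1, x)
            w = p * (1.0 - p)          # Bernoulli variance, the information weight
            g0 += y - p                # score (gradient) with respect to b0
            g1 += (y - p) * x          # score with respect to b1
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical presence (1) / absence (0) data against one environmental variable
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logit(xs, ys)
ll_max = log_likelihood(b0, b1, xs, ys)
ll_0 = len(ys) * math.log(0.5)        # constant-only model: p = mean(y) = 0.5
aic = -2.0 * ll_max + 2 * 2           # 2 estimated parameters (b0, b1)
g2 = -2.0 * (ll_0 - ll_max)           # likelihood ratio statistic, 1 df
print(f"b0={b0:.3f} b1={b1:.3f} lnL={ll_max:.3f} AIC={aic:.3f} G^2={g2:.3f}")
```

Because the logit log-likelihood is concave, Newton-Raphson from zero normally converges in a handful of iterations. The fixed 25 iterations here are just a simple, generous choice.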
Stationarity
A spatial or temporal process is stationary if it is constant, or "still", over its entire extent (or duration). Check out formulas for stationarity requirements!!!
-mean and variance constant and independent of i
-covariance is independent of i, and depends only on the length and orientation of k
For a spatial process, an additional property is required, called isotropy: directional invariance.
How does non-stationarity affect assumptions?
-non-stationarity violates the "identical distribution" sub-assumption (the sub-assumption that the variance of the error is constant and finite (IID))
As with autocorrelation, the identity matrix in Var(e) is replaced by a matrix omega, which inflates variance and makes the model no longer "best" - no longer BLUE - which has implications for inference.
Stationarity and isotropy
A spatial process is stationary if it is constant, or "still", over its entire extent. For a spatial process, in addition to stationarity, a further property is required (isotropy). Isotropy is directional invariance (read about isotropy).
Because it is so hard to achieve perfect stationarity, we have developed two kinds:
1) Second Order, or Weak Stationarity & Isotropy (defined only over a bounded region)
2) First Order, or Strong Stationarity & Isotropy (defined on the entire domain of the process)
Latin (stationary) = still
Greek (isotropic) = equal direction
Conceptually: A stationary and isotropic process has equal intensity and no directional bias over the entire region (but it is different from a Poisson distribution!!!!)
What is anisotropy?
Stochastic Process
A spatial stochastic process is a model of spatial variation. It has a functional form in which things are dependent on one another. *Observations in a stochastic process are not independent*
A stochastic process is a probabilistic model defined by a collection of random variables. They take place in space
y(i) = f(y1, y2, ... , yn)
...and time
y(t) = f(y(t-1), y(t-2), ... , y(t-n))
Or both combined!
Stationarity (time)
A temporal process is stationary if it is constant, or "still", over its entire duration.
i. The expected value of X at time t = the mean mu, for all t
---mean constant and independent of t
ii. Variance is finite and independent of t
iii. Covariance of X at t and X at t+k equals gamma(k) for all t and all k
---covariance is independent of t and depends only on the length of the interval k
If it is non-stationary, one or more of these fail...
-the mean may increase or decrease over time
-the variance may change with time
-the covariance may change (oscillations not all the same distance apart)
For stationarity in space, covariance is a little more complicated - instead of only the length, it depends on the length and orientation of k.
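As an informal illustration of conditions i-iii (not a formal stationarity test), you can split a series into consecutive segments and compare the mean, variance, and lag-1 autocovariance of each segment. The two series below are hypothetical. One oscillates around a constant mean; the other has a linear trend, so its mean drifts.

```python
def mean(xs):
    return sum(xs) / len(xs)

def autocovariance(xs, k):
    """Sample autocovariance at lag k: sum of (x_t - m)(x_{t+k} - m) over t, divided by n."""
    m = mean(xs)
    n = len(xs)
    return sum((xs[t] - m) * (xs[t + k] - m) for t in range(n - k)) / n

def segment_moments(xs, parts=2):
    """(mean, variance, lag-1 autocovariance) for consecutive, equal-length segments."""
    size = len(xs) // parts
    segs = [xs[i * size:(i + 1) * size] for i in range(parts)]
    return [(mean(s), autocovariance(s, 0), autocovariance(s, 1)) for s in segs]

periodic = [[1.0, 0.0, -1.0, 0.0][t % 4] for t in range(40)]   # oscillates around a constant mean
trend = [0.5 * t for t in range(40)]                           # mean drifts upward over time

print(segment_moments(periodic))   # identical moments in both halves
print(segment_moments(trend))      # second-half mean is much larger; the variances match
```

The trend series shows non-stationarity in the mean only: its within-segment variance is the same in both halves. A change in variance or in the lag-k autocovariance between segments would point to violations of conditions ii or iii instead.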
Punctual Kriging
Assumption: stationary regionalized variable (because we are estimating the value at an unknown location based on values at known locations... weighted average).
Estimation of values at unknown locations, based on values at known locations (weighted average).
-want the estimator to be unbiased
-want the mean error to equal zero for large samples
**the crucial variable is the weight, W**
-with regression, can only work with beta; here can only work with the weight
Attempting to find the combination of weights W such that the sum of all W = 1 and the error variance = minimum. Need a solution to a system of equations: a constrained optimization problem. *study the equations!!*
We need to know gamma for each distance h to determine the optimal weights w that yield an unbiased and best IDW interpolator. We add a Lagrange multiplier to impose the constraint (weights sum to 1) while minimizing the error variance. Solving the system for the vector W provides the optimal set of weights for the interpolation equation.
Important! Punctual Kriging assumes stationarity!
-non-stationarity in the mean can be dealt with
-non-stationarity in the variance affects the properties of the kriged surface (by increasing the error variance)
But the key assumption is *stationarity in the covariance*
-the semivariogram is an expression of the covariance over the study area
-the semivariogram function can only be defined if the covariance is stationary
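A minimal sketch of the ordinary (punctual) kriging system. It assumes the semivariogram has already been modelled; the exponential model and its sill and range values are hypothetical stand-ins for a model fitted to an empirical semivariogram. The extra unknown mu is the Lagrange multiplier that enforces the weights summing to 1. The sample locations and values are also made up.

```python
import math

def exponential_gamma(h, sill=1.0, rng=10.0):
    """Assumed exponential semivariogram model with no nugget: gamma(h) = sill * (1 - exp(-3h / rng))."""
    return sill * (1.0 - math.exp(-3.0 * h / rng))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ordinary_kriging(points, values, target, gamma=exponential_gamma):
    """Estimate, kriging variance and weights at target.

    Solves  sum_j w_j gamma(d_ij) + mu = gamma(d_i0)  for every sample i,
            sum_j w_j = 1                              (unbiasedness constraint),
    where mu is the Lagrange multiplier.
    """
    n = len(points)
    A = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0] for i in range(n)]
    A.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    sol = solve(A, b)
    w, mu = sol[:n], sol[n]
    estimate = sum(wi * zi for wi, zi in zip(w, values))
    variance = sum(wi * gi for wi, gi in zip(w, b[:n])) + mu   # kriging (error) variance
    return estimate, variance, w

# Hypothetical samples: (x, y) locations and measured values
pts = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (5.0, 5.0)]
vals = [1.0, 2.0, 3.0, 4.0]
est, var, w = ordinary_kriging(pts, vals, target=(2.0, 2.0))
print(f"estimate={est:.3f} variance={var:.3f} weights={[round(x, 3) for x in w]}")
```

Running the function at a sample location returns that sample's value with zero kriging variance. This is the "exact estimator" property listed for kriging.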
Spatial autoregressive modelling
Autoregressive models are popular in time series analysis. SAR comes from temporal AR.
A spatial autoregressive model is Y = pWY + e
-p (rho) is the autoregressive parameter
-it represents a correlation coefficient (b/w -1 and 1)
-W is the spatial weight matrix
Multivariate SAR models are Y = XB + pWY + e
**understand slides 34-37
*In the spatial autoregressive specification, the error becomes independent. The estimates have minimum variance, so the estimator is "some shade of" blue.
-But the autoregressive model with autocorrelated error is only almost blue, because the dependent variable is on both sides, so it is not linear!
Spatial data
Bi-dimensional distribution
No "natural" order
Not regularly spaced (unless raster)
Varying size and shape (if areas)
Exhibits spatial dependence and non-stationarity
Estimation of SAR model
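Can't use OLS because the spatial lag WY appears as an explanatory variable: it is stochastic (independent variables are not supposed to be), so it is unlikely to be independent of the error, and OLS estimates would be biased and inconsistent. Instead use maximum likelihood. (For the standard, non-spatial regression, the estimates of B from ML and OLS are the same if the distribution is normal - only the variance estimate is slightly different.)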
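A minimal sketch of ML estimation for the simplest SAR with an intercept, Y = pWY + b0 + e. It uses the concentrated log-likelihood: for each candidate rho, b0 and sigma^2 are replaced by their ML values, and the Jacobian term ln|I - pW| is added. Several assumptions keep the code short, and all are for illustration only:
- the data are simulated with rho = 0.6;
- the study area is a regular 10 x 10 lattice that wraps around (a torus) with row-standardized rook contiguity, so the eigenvalues of W are known in closed form;
- rho is found by a simple grid search.

For an irregular W, the eigenvalues (or the log-determinant) would have to be computed numerically.

```python
import math
import random

def torus_neighbours(m):
    """Rook neighbours on an m x m lattice that wraps around, so every cell has 4 neighbours."""
    return [[((r + dr) % m) * m + (c + dc) % m
             for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))]
            for r in range(m) for c in range(m)]

def spatial_lag(nb, y):
    """Wy for a row-standardized W (weight 1/4 for each rook neighbour)."""
    return [sum(y[j] for j in js) / len(js) for js in nb]

def torus_eigenvalues(m):
    """Eigenvalues of that W: (cos(2 pi a / m) + cos(2 pi b / m)) / 2."""
    return [(math.cos(2 * math.pi * a / m) + math.cos(2 * math.pi * b / m)) / 2
            for a in range(m) for b in range(m)]

def simulate_sar(nb, rho, b0, sigma, seed=1):
    """Draw y = (I - rho W)^-1 (b0 + e) by iterating y <- rho W y + (b0 + e)."""
    rng = random.Random(seed)
    v = [b0 + rng.gauss(0.0, sigma) for _ in nb]
    y = v[:]
    for _ in range(300):                       # converges because |rho| < 1 and rows of W sum to 1
        y = [rho * wy + vi for wy, vi in zip(spatial_lag(nb, y), v)]
    return y

def concentrated_loglik(rho, y, wy, eig):
    """SAR log-likelihood with b0 and sigma^2 replaced by their ML values for this rho."""
    n = len(y)
    ay = [yi - rho * wyi for yi, wyi in zip(y, wy)]          # (I - rho W) y
    b0 = sum(ay) / n                                         # ML intercept given rho
    sigma2 = sum((a - b0) ** 2 for a in ay) / n              # ML error variance given rho
    log_det = sum(math.log(1.0 - rho * lam) for lam in eig)  # ln|I - rho W| (Jacobian term)
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1.0) + log_det

m = 10
nb = torus_neighbours(m)
eig = torus_eigenvalues(m)
y = simulate_sar(nb, rho=0.6, b0=10.0, sigma=1.0)
wy = spatial_lag(nb, y)
grid = [k / 100 for k in range(-95, 96)]
rho_hat = max(grid, key=lambda r: concentrated_loglik(r, y, wy, eig))
print(f"rho_hat = {rho_hat:.2f}")
```

The ln|I - pW| term is what OLS leaves out. Without it, regressing Y on WY ignores that WY is itself built from the dependent variable.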
Stationarity in a stochastic process (conceptually and computationally)
Conceptually: A stationary process has equal intensity throughout the entire time period or spatial area.
Computationally: A stochastic process is stationary if 3 properties are met:
-constant mean
-constant variance
-constant covariance (depends only on the lag, not on the position)
For covariance, look at periodicity (can be covariance or autocovariance). We are talking about frequency (distance between waves at different times).
Geostatistics and regionalized variables
Definition (1973, France): Problems that arise when conventional statistical theory is used in estimating changes in ore grade within a mine.
-Geostatistics involves estimating the form of a regionalized variable in one, two, or three dimensions
-the basic statistical measure of geostatistics is the semivariance
*Regionalized Variable*: A variable that has intermediate properties between random and deterministic (natural phenomena with geographic distribution), e.g. elevation of the ground surface, changes in grade within an ore body.
-they are continuous, but changes are too complex to be described by a deterministic function
-they are spatially continuous, but values are known only at samples taken at specific locations
-recall *spatial stochastic processes*
-size, shape, orientation, and spatial arrangement of samples are defined as the *support* for regionalized variables
-changes in any of these properties affect the characteristics of the variable
Lagrange multipliers
Finds the maximum or minimum of a function subject to constraints. In kriging, a Lagrange multiplier is used to minimize the estimation (error) variance subject to the constraint that the weights sum to 1. Lagrange Multiplier (LM) tests, which share the name, are used in spatial regression diagnostics to indicate whether a spatial lag (SAR) model or a spatial error model would be more suitable for correcting for spatial autocorrelation in the model.
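For kriging, the constrained minimization can be written out as follows. This is a sketch in semivariogram notation. gamma_ij is the semivariance between samples i and j, and gamma_i0 is the semivariance between sample i and the unsampled location. The expression for the error variance assumes the weights sum to 1.

```latex
\sigma_E^2 = 2\sum_{i} w_i \gamma_{i0} - \sum_{i}\sum_{j} w_i w_j \gamma_{ij}
\quad \text{subject to} \quad \sum_i w_i = 1

\mathcal{L}(w, \mu) = 2\sum_{i} w_i \gamma_{i0} - \sum_{i}\sum_{j} w_i w_j \gamma_{ij} - 2\mu\Big(\sum_i w_i - 1\Big)

\frac{\partial \mathcal{L}}{\partial w_i} = 2\gamma_{i0} - 2\sum_j w_j \gamma_{ij} - 2\mu = 0
\;\Rightarrow\; \sum_j w_j \gamma_{ij} + \mu = \gamma_{i0}, \qquad i = 1, \dots, n

\frac{\partial \mathcal{L}}{\partial \mu} = 0 \;\Rightarrow\; \sum_j w_j = 1

\sigma_K^2 = \sum_i w_i \gamma_{i0} + \mu
```

The n + 1 equations are linear in the n weights and mu, so the optimal weights come from solving a single linear system. The last line follows from substituting the first-order condition back into the error variance.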
SAR MODEL VS SE MODEL (spatial autoregressive model vs spatial error model)
For both types of model, we are trying to fix the variance-covariance matrix of the error so that it doesn't inflate the model variance during parameter estimation. We are trying to turn it back into an identity matrix by eliminating the spatial dependencies inherent in the data, which become present in the error and produce non-zero off-diagonal terms in what should be an identity matrix.
In SAR, an autoregressive term (pWy) is added to the regression model such that:
y = pWy + XB + error
where p (rho) is an autoregressive coefficient that must be solved for, W is a spatial weights matrix, and the error is assumed to have a mean of zero and be IID. The idea is that this added term counteracts the spatial dependencies in y, the dependent variable, so that the error doesn't end up being non-IID.
Rather than SAR, SEM can be applied when there appears to be significant spatial autocorrelation, but tests (e.g. Lagrange Multiplier) for spatial lag effects do not suggest that including the lag would provide a significant improvement. Here the standard regression model is used, except that the error term contains the spatial weights matrix:
y = XB + error, where error = pW(error) + u
where p and W are the same as above (in the SEM, p is often written as lambda), and u is IID. The idea here is that instead of fixing the dependent variable in order to indirectly fix the error, you're fixing the error directly by assuming that it consists of a vector of "good" IID error (u) and a vector of spatially dependent error (pW(error)).
Spatial autocorrelation indices
GEARY'S C
-a paired comparison of spatial autocorrelation that relates closely to the semivariogram
-if Geary's c statistic is less than 1, similar attributes tend to cluster in geographic space
-if Geary's c statistic is greater than 1, dissimilar attributes tend to cluster in geographic space - checkerboard pattern
-if Geary's c statistic is close to 1, attributes are randomly located in geographic space
MORAN'S I
-Moran's I statistic is the spatial equivalent of the Pearson product moment correlation coefficient
-if Moran's I statistic is close to 1, similar attributes tend to cluster in geographic space
-if Moran's I statistic is close to -1, dissimilar attributes tend to cluster in geographic space; checkerboard pattern, high "competition"
-if Moran's I statistic is close to 0, attributes are randomly located in geographic space
** Both have a "w" in the formula - they are weighted.
**Always perform one of these spatial autocorrelation tests before any spatial analysis!!
The W is spatial proximity (a.k.a. C - contiguity)
-two ways to measure... either using a binary adjacency test (based on polygons), or a binary test based on a distance threshold (based on points)
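A minimal sketch of both global indices, using the standard formulas with a binary contiguity W. The two eight-cell transects are hypothetical. One is clustered (similar values side by side); the other is a checkerboard (alternating values).

```python
def morans_i(x, W):
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean."""
    n = len(x)
    m = sum(x) / n
    z = [xi - m for xi in x]
    s0 = sum(sum(row) for row in W)
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(zi * zi for zi in z)

def gearys_c(x, W):
    """Geary's C = (n - 1) * sum_ij w_ij (x_i - x_j)^2 / (2 * S0 * sum_i z_i^2)."""
    n = len(x)
    m = sum(x) / n
    s0 = sum(sum(row) for row in W)
    num = sum(W[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    return (n - 1) * num / (2.0 * s0 * sum((xi - m) ** 2 for xi in x))

# Eight cells along a transect; binary contiguity with the cells on either side
n = 8
W = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
clustered = [1, 1, 1, 1, 0, 0, 0, 0]
checkerboard = [1, 0, 1, 0, 1, 0, 1, 0]
for name, x in (("clustered", clustered), ("checkerboard", checkerboard)):
    print(f"{name}: I = {morans_i(x, W):.3f}, C = {gearys_c(x, W):.3f}")
# clustered: I = 0.714, C = 0.250
# checkerboard: I = -1.000, C = 1.750
```

Moran's I works with cross-products of deviations from the mean, while Geary's C works with squared differences between neighbours. This is why C behaves like a semivariogram and moves in the opposite direction to I.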
GWR
GWR aims at minimizing the variance induced by spatial non-stationarities. In order to capture the variability of the relationship in space, GWR computes many local regressions, generally as many as the spatial units in the sample. Each local regression is estimated on a subset of neighbouring units around each unit in the sample.
When the same stimulus provokes a different response in different parts of the study region, relationships vary over space.
-GWR takes into account spatial non-stationarities
-by calibrating varying relationships among variables over space
-the study area is split into many sub-regions, and a local regression is estimated over each sub-region
While SAR tries to fix the error, GWR tries to minimize variance.
Non-stationarities involve:
-not simply a spatial process
-or a set of spatial processes
...but most importantly...
-the relationships between spatial processes
-or b/w dependent and independent variables
The reason for spatial autocorrelation is NOT:
-spAC in the error
-or spAC in the dependent variable
*it is the relationship b/w the two
For GWR, the equation remains the same, but each coefficient is specified by coordinates.
-study equations
Determination of W: unlike in the computation of W in spatial regression, these are *simply* local weights! There is no autoregressive parameter rho.
With GWR you can get a map of B, t-value, R2 (local), etc. Each regression area has these parameters.
Before using GWR, test for non-stationarity in the GWR software using clustering tests (Getis-Ord Gi, Ripley's K, etc.) to see if it is necessary.
*Look up and know the Breusch-Pagan test
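A minimal sketch of the GWR idea with one predictor: at every location, fit a weighted least squares regression in which nearby observations count more. Several choices are assumptions for illustration: the Gaussian kernel, the fixed bandwidth of 2 distance units (GWR software usually selects the bandwidth by cross-validation or AICc), and the transect data, whose slope deliberately strengthens from west to east.

```python
import math

def gaussian_kernel(d, bandwidth):
    """Distance-decay weight for a neighbouring observation (an assumed kernel choice)."""
    return math.exp(-0.5 * (d / bandwidth) ** 2)

def gwr_simple(coords, x, y, bandwidth):
    """Local (b0, b1) at each observation for y = b0(u, v) + b1(u, v) * x, by weighted least squares."""
    results = []
    for ci in coords:
        w = [gaussian_kernel(math.dist(ci, cj), bandwidth) for cj in coords]
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw   # locally weighted means
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        b1 = sxy / sxx
        results.append((ym - b1 * xm, b1))
    return results

# Hypothetical transect of 20 locations; the effect of x on y strengthens from west to east
coords = [(float(u), 0.0) for u in range(20)]
x = [float((3 * u) % 7) for u in range(20)]
y = [1.0 + (1.0 + u / 10.0) * xi for u, xi in enumerate(x)]
local = gwr_simple(coords, x, y, bandwidth=2.0)
print("west slope %.2f, east slope %.2f" % (local[0][1], local[-1][1]))
```

The weights here are purely local kernel weights with no autoregressive parameter. If the relationship were the same everywhere, every local slope would equal the global one.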
What is the difference between correlation, autocorrelation, spatial autocorrelation, semivariance, Geary's C, and Moran's I? What does an empirical semivariogram look like compared to a covariogram? Compared to a correlogram?
Good questions for the test... look at formulas!!
Getis - Ord's Gi*
Hotspot/clustering measure that considers both nearby similarity and magnitude. The local mean for Moran's I includes only neighboring features, whereas the local mean for Getis-Ord Gi* includes all features in the neighbourhood, including the one in question. They are very similar measures, with slightly different formulas.
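A minimal sketch of the z-score form of Getis-Ord Gi*, assuming binary weights in which each location counts as its own neighbour (w_ii = 1). This inclusion of the location itself is what distinguishes Gi* from local Moran's I. The transect values are hypothetical, with a hot spot in the middle.

```python
import math

def getis_ord_gi_star(x, neighbours):
    """Gi* z-scores with binary weights; each location's neighbourhood includes itself."""
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - xbar ** 2)
    scores = []
    for i in range(n):
        js = set(neighbours[i]) | {i}    # w_ij = 1 for neighbours and for i itself
        sw = len(js)                     # with binary weights, sum w = sum w^2 = count
        num = sum(x[j] for j in js) - xbar * sw
        den = s * math.sqrt((n * sw - sw ** 2) / (n - 1))
        scores.append(num / den)
    return scores

values = [1, 1, 1, 1, 9, 9, 1, 1, 1, 1]          # hypothetical hot spot at cells 4 and 5
nbrs = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(values)] for i in range(len(values))}
z = getis_ord_gi_star(values, nbrs)
print([round(v, 2) for v in z])
```

Large positive scores mark clusters of high values (hot spots) and large negative scores mark clusters of low values (cold spots). The magnitude of the values matters, not just their similarity.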
SAR review
In SAR we are concerned about error - spatial dependence in the error. We need to estimate parameters, and the assumptions are on the errors.
Good practice to test the dependent variable, because B is made of X and Y, so if dependence is in Y then it will affect B. However, we are concerned mostly with dependence in the error.
*If the error is "clean", assumptions are met, even if spAC exists in the dependent variable.
Spatial dependence in regression
Indication of a spatial stochastic process, not a random one. Intrinsic property of geographic data and processes.
Spatial autocorrelation: statistical measure of SD. Necessary condition for any spatial analysis, but also a property that weakens all statistics.
STATISTICAL INEFFICIENCY
-When observations are independently distributed, n observations provide n units of information. When observations are spatially dependent, n observations provide <n units of information.
When we violate the independence assumption, parameter estimates have larger variance. Best case: blurry (uncertain). Worst case: non-existing relationships (inflated inference test).
Error must have a mean of zero (otherwise estimates are not unbiased) and be normally distributed (for the t and F tests to be valid).
*understand Lec7, slide 24-25
The equation Var(e) = variance times identity matrix is only true if the error is independently (1) and identically (2) distributed (IID). Otherwise, the identity matrix in the formula shall be replaced with a matrix omega where:
-if (1) is violated, the off-diagonal elements are not 0
-if (2) is violated, the diagonal elements are not all 1 (not constant)
Sub-assumptions 1 and 2 tend to be violated by spatial data. (1) is violated by autocorrelation; (2) is violated by non-stationarity.
Localized regressions
Instead of going to the extreme with a GWR, why not try to define "meaningful" subregions? We have to be arbitrary somewhere...
Localized regressions...
-omit non-significant variables in local models
-can expand model selection to include variables that are not significant globally but are significant locally - not done in the example shown
-correlation analysis should be performed locally to guide model selection and test for multicollinearity
Global Modelling
Jones: Global modelling denies geography and history: everywhere and anytime is basically the same! It is an impoverished representation of reality...
Up to now, we have been using global models. In local modelling, the study area is broken into parts and variables can vary throughout the area. A way of making space more important.
-Multilevel models help decide b/w local and global models
-local indices are proportional to global indices of spAC
In local spatial analysis, we are philosophically refuting the search for general laws. Quantitative analysis beyond general laws.
Equation for Global regression, Global SAR, and GWR...
Know these equations!! Is W the same in SAR and GWR?
Kriging introduction/overview
Kriging is a type of IDW... but it is better (it does more). IDW only weights by distance, but not by attribute(?).
Kriging is local (as opposed to global), exact and stochastic. Kriging is about making IDW stochastic to optimize its properties:
-distance-weighted interpolator
-explicit model of spatial dependence
-estimation uncertainty
First calculate spatial autocorrelation (semivariance). Q: What is the difference b/w autocorrelation and semivariance?
Then model the semivariance (semivariance function).
Then find optimal weights for IDW interpolation.
It is an interpolation/contouring technique based on the semivariogram.
Optimal properties:
-exact estimator
-predicts sample points with 0 error
-provides a measure of uncertainty of the contoured surface
Land-use regression
LUR models are unlike dispersion models.
LUR models estimate local-scale variability in urban air pollution. Recommended by the Health Effects Institute for exposure assessment.
LUR models yield fine spatial resolution estimates by regressing measured pollutant concentrations against land-use characteristics of the sampling locations, such as:
-traffic volume
-industrial sources
-population density
GW-WIND MODEL
-residual spAC as a sign of a missing variable
-wind speed and direction capture spAC
-the wind model is linear and has as many variables as a SAR
-the GWR specification of the wind model practically addresses both sources of inflated variance, in a local linear model
*Sometimes when you add a new variable to "fix" autocorrelation, you have to change the entire model to avoid cross correlation (e.g. heart disease model). But sometimes it doesn't affect/interact with other variables and it just works!!
LISA
Local Indicators of Spatial Association
LISA is a set of tools for analyzing and visualizing spatial association. It uses local indicators to assess significant spatial clustering. The sum of LISAs for all observations is proportional to global indicators of spatial association (i.e. spatial autocorrelation).
LISAs
-Local Moran's I
-Getis and Ord's G*
-Boxplots, Histograms, LISA Maps
---local pockets of non-stationarity
---contribution of each observation to the global index
---descriptive, exploratory analysis (Arc, GeoDa, R)
RIPLEY'S K-FUNCTION
-the K function for a distance d is the average number of events found in a circle of radius d around an event, divided by the mean intensity of the process (# events divided by area)
GETIS AND ORD'S G*
-tests if a point event and the regions surrounding it are clustered (above-average G* on a variable X) or not clustered (below-average values)
LOCAL MORAN'S I (check out formula)
In all cases when LISA vary over the study area...
-the spatial process displays non-stationarity (otherwise LISA would be the same everywhere and coincide with the global index)
Non-stationarity can affect a single process or the relationships among processes...
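A minimal sketch of local Moran's I, showing the link between local and global indices. With a row-standardized W (an assumed but common choice, which makes the sum of all weights S0 equal to n), the local values average exactly to the global Moran's I. The transect values are hypothetical.

```python
def local_morans_i(x, W):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j, with z = x - mean and m2 = sum(z^2) / n."""
    n = len(x)
    m = sum(x) / n
    z = [xi - m for xi in x]
    m2 = sum(zi * zi for zi in z) / n
    return [z[i] / m2 * sum(W[i][j] * z[j] for j in range(n)) for i in range(n)]

def global_morans_i(x, W):
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(x)
    m = sum(x) / n
    z = [xi - m for xi in x]
    s0 = sum(sum(row) for row in W)
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(zi * zi for zi in z)

# Hypothetical values along a transect, row-standardized contiguity W (so S0 = n)
x = [3, 4, 5, 9, 8, 2, 1, 1, 6, 7]
n = len(x)
W = []
for i in range(n):
    nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
    W.append([1.0 / len(nbrs) if j in nbrs else 0.0 for j in range(n)])

lisa = local_morans_i(x, W)
print([round(v, 2) for v in lisa])          # each observation's contribution
print(sum(lisa) / n, global_morans_i(x, W))  # the mean of the local values equals the global I
```

Positive local values mark observations surrounded by similar values (high-high or low-low), and negative ones mark local outliers. If the process were stationary, the local values would hover around the global index everywhere.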
Maximum likelihood estimator
Maximum likelihood is a technique that involves "speculating" on which alternative values of the set of parameters would "best" reproduce the population parameters. The maximum likelihood method consists in estimating the unknown parameters in such a way that the probability of observing the given sample is as high as possible (or maximum). Under the max likelihood criterion, the parameters are estimated so as to maximize the probability of generating or obtaining the observed sample.
The ML estimator relies on less stringent assumptions than OLS:
1) it does not need the assumption of normality (once a distribution is assumed, the parameters can be estimated)
2) it does not need the IID assumption on the error variance
*The ML estimator for the unknown parameters is derived from the multivariate probability density function of e.*
For example, if we assume that e follows a multivariate normal distribution, e ~ N(0, sigma^2 I), then knowing the normal probability density function, we can express the pdf of e. Given the parameters B and sigma^2, the multivariate pdf for y can be written as L(B, sigma^2; Y) - known as the likelihood function, which expresses the unknown parameters B and sigma^2 in terms of the vector Y, or given the observed vector Y. We need to set the first (partial) derivatives of ln(L) with respect to B and sigma^2 to zero. The solution to this system of equations is B = (X'X)^-1 X'Y and sigma^2 = (e'e)/n.
**Assuming normality of the distribution, B(ML) = B(OLS). However, sigma^2(ML) is biased for small samples. ML estimators are unbiased only for large samples (asymptotically unbiased), and have minimum variance, so they are asymptotically BLUE! (Johnston 1984)
The ML method does not provide a measure of goodness of fit for a model. The "logLikelihood" parameter is generally negative and large in absolute value. In general, as the logLikelihood goes up (becomes less negative), the model is better. However, this parameter doesn't tell us how well the model fits the data.
Anselin proposed a pseudo R^2, computed as the squared correlation b/w the fitted and observed values: Pseudo R2 = (cor(Yfitted, Yobserved))^2. Pseudo R2 should be considered only as an indication of the model's goodness of fit, not as a test!!
Statistically more sound indicators are indices known as Information Criteria, such as Akaike's (AIC) and Schwarz's (SIC or SC) - equations on slide
-they are based on the idea of imposing a penalty for adding regressors to the model
-AIC weights the logLikelihood by the number of independent variables (estimated parameters). AIC is popular with logistic regression
-all ICs should be applied and interpreted in conjunction with other model diagnostics
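A minimal sketch of the two information criteria and the pseudo R^2. It assumes the common convention that k counts all estimated parameters. The log-likelihoods, parameter counts, and sample size below are hypothetical. For both criteria, lower values are preferred.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: -2 lnL + 2k (k = number of estimated parameters)."""
    return -2.0 * loglik + 2.0 * k

def sic(loglik, k, n):
    """Schwarz criterion: -2 lnL + k ln(n); a harsher penalty than AIC once n > e^2 (about 7.4)."""
    return -2.0 * loglik + k * math.log(n)

def pseudo_r2(fitted, observed):
    """Squared Pearson correlation between fitted and observed values."""
    n = len(observed)
    mf, mo = sum(fitted) / n, sum(observed) / n
    sfo = sum((f - mf) * (o - mo) for f, o in zip(fitted, observed))
    sff = sum((f - mf) ** 2 for f in fitted)
    soo = sum((o - mo) ** 2 for o in observed)
    return sfo * sfo / (sff * soo)

# Hypothetical comparison: the larger model gains a little log-likelihood for 3 extra parameters
n = 120
for name, ll, k in (("model A", -110.1, 4), ("model B", -109.5, 7)):
    print(f"{name}: AIC = {aic(ll, k):.1f}, SIC = {sic(ll, k, n):.1f}")   # lower is better
```

Model B has the higher log-likelihood, but both criteria prefer model A. The small gain in fit does not pay for the three extra parameters.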
Ripley's k-function
Measure of clustering of point events (rather than of attribute values, as with Moran's I), computed over a range of distances d, so it can be used at multiple scales.
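A minimal sketch of Ripley's K following the definition in these notes: the average number of other events within distance d of an event, divided by the intensity (number of events / area). Edge correction is deliberately left out for brevity, whereas real software normally applies one. The point pattern and study area are hypothetical. Under complete spatial randomness, K(d) is expected to be about pi*d^2, so values well above that suggest clustering at that scale.

```python
import math

def ripley_k(points, d, area):
    """Ripley's K without edge correction.

    Average number of other events within distance d of an event, divided by the
    intensity n / area:  K(d) = area * (ordered pairs i != j with d_ij <= d) / n^2.
    """
    n = len(points)
    count = sum(1 for i in range(n) for j in range(n)
                if i != j and math.dist(points[i], points[j]) <= d)
    return area * count / (n * n)

# Two tight hypothetical clusters of 4 events in a 10 x 10 study area
pts = [(2.0, 2.0), (2.1, 2.0), (2.0, 2.1), (2.1, 2.1),
       (8.0, 8.0), (8.1, 8.0), (8.0, 8.1), (8.1, 8.1)]
for d in (0.5, 1.0, 2.0):
    k = ripley_k(pts, d, area=100.0)
    print(f"d={d}: K={k:.2f}, CSR expectation pi*d^2={math.pi * d * d:.2f}")
```

Evaluating K over a sequence of distances is what gives the multi-scale view. A single Moran's I value is tied to one choice of W.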
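Likelihood ratio statistic G^2
Used in logistic regression (and other models estimated by ML) to compare nested models, e.g. a regression on the constant only vs a regression on the entire set of explanatory variables: G^2 = -2 ln(L0/Lmax), asymptotically distributed chi-square with k degrees of freedom (k = # regressors).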
ordinary vs universal kriging
Ordinary kriging assumes a stationary mean, while universal kriging accounts for a non-stationary mean by removing the drift using a trend surface analysis.
Spatial stochastic process example
People living in the same or nearby neighbourhoods tend to have similar age, income, and access to health care; hence similar rates of disease incidence: positive spatial autocorrelation within a distance.
Non-Stationarity
i. The disease incidence is not constant over space: incidence rates vary from young and wealthy neighbourhoods to retirement communities (i.e., inconstant mean);
ii. The variability in young neighbourhoods is higher than in older ones (i.e., inconstant variance);
iii. The spatial range of similarity (spatial dependence) varies from densely populated central areas to suburban communities, to areas with mixed residential and commercial land use (i.e., inconstant covariance).
Anisotropy
iv. The spatial extent of the similarity (spatial dependence) varies in different directions, driven by the road network, residential patterns, physical barriers such as rivers, or other features such as parks (i.e., anisotropy).
Classification of interpolation procedures
Point/Areal Global/Local Exact/Approximate Stochastic/Deterministic Gradual/Abrupt Interpolation is deterministic when you don't have error. With stochastic you assume an error and try to minimize it.
Multilevel (hierarchical) models
Regression at many nested scales. The models are linked through the error - think of it as a system. Have to solve all equations at once - instead of a variance/covariance matrix, you get a cube... think of the multicollinearity and non-stationarity problems you get in multidimensional space. These models typically include a Bayesian specification.
Universal Kriging
Relax the assumption of a mean-stationary variable. Find a linear estimator that is unbiased in the presence of a trend.
A non-stationary variable has 2 components: drift and residual (check out formulas).
Universal Kriging estimation is a stepwise procedure:
1) estimate and remove the drift (using TSA) *note the different meaning of the word "residual" in Kriging vs TSA
2) krige the stationary residuals (estimate the stationary variable)
3) combine the estimated residuals with the drift to estimate the actual surface
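A minimal sketch of step 1 of the stepwise procedure: fitting a first-order (planar) trend surface by least squares and computing the residuals to be kriged. The linear drift and the sample values are assumptions for illustration; higher-order trend surfaces are also used. Steps 2 and 3 are only indicated in comments.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(A, b):
    """Solve a 3 x 3 linear system by Cramer's rule."""
    d = det3(A)
    return [det3([[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]) / d
            for k in range(3)]

def fit_linear_drift(points, z):
    """Least-squares first-order trend surface z = a + b*u + c*v (normal equations X'X beta = X'z)."""
    rows = [(1.0, u, v) for u, v in points]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xtz = [sum(r[i] * zi for r, zi in zip(rows, z)) for i in range(3)]
    return solve3(XtX, Xtz)

def drift_at(beta, u, v):
    a, b, c = beta
    return a + b * u + c * v

# Step 1: estimate and remove the drift (hypothetical samples)
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
z = [5.1, 5.4, 4.9, 5.3, 5.9, 5.0]
beta = fit_linear_drift(pts, z)
residuals = [zi - drift_at(beta, u, v) for (u, v), zi in zip(pts, z)]
# Step 2 would krige these residuals (with a semivariogram fitted to the residuals);
# Step 3 adds drift_at(beta, u0, v0) back to the kriged residual at a target (u0, v0).
print([round(c, 3) for c in beta], [round(r, 3) for r in residuals])
```

These least-squares residuals are the "residuals" in the TSA sense. They are the values fed to the kriging step, which is where the two meanings of the word meet.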
Semivariance
Semivariance is a measure of the degree of spatial dependence between samples at a specific support.
-it is half the average of the squared differences between pairs of points separated by the distance h
-with n points along a regular transect, the number of pairs compared at lag h is n-h (check out equation)
The semivariance is half the variance of the differences between values separated by h. The terms inside the expression, X, are attribute values taken at intervals of size or distance h.
The semivariance (gamma) is a function of distance, h. It measures the difference between attribute values as a function of their spatial separation.
A *semivariogram* is obtained by calculating and plotting the semivariance for different values of h.
-beyond a certain distance (the range), distance doesn't matter anymore: the semivariance levels off at the sill
The *empirical semivariogram* provides a description of how the data are related (correlated) with distance. The *semivariogram function* was originally defined by Matheron (1971) as half the average squared difference between points separated by the distance h.
Semivariance is like variance, but not using the mean... it takes the difference between each attribute and a second attribute a distance h away.
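A minimal sketch of the empirical semivariogram for the simplest case: a regularly spaced one-dimensional transect, where lag h means h sample spacings and there are n - h pairs at each lag. For irregularly spaced points, pairs would instead be grouped into distance bins. The elevation values are hypothetical.

```python
def semivariance(x, h):
    """gamma(h) = sum over the n - h pairs of (x_i - x_{i+h})^2, divided by 2 (n - h)."""
    pairs = len(x) - h
    return sum((x[i] - x[i + h]) ** 2 for i in range(pairs)) / (2.0 * pairs)

def empirical_semivariogram(x, max_lag):
    """(h, gamma(h)) for lags 1..max_lag along a regularly spaced transect."""
    return [(h, semivariance(x, h)) for h in range(1, max_lag + 1)]

# Hypothetical elevations sampled at equal spacing along a transect
elev = [10, 12, 13, 13, 15, 18, 17, 16, 18, 21, 22, 21]
for h, g in empirical_semivariogram(elev, 5):
    print(h, round(g, 2))
```

No mean is subtracted anywhere: each value is compared only with the value h steps away. This is the contrast with variance noted above.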
SAR considerations and issues
Spatial autoregressive modelling: the error and the explanatory variables are multiplied by the inverse of a model of spatial dependence, specified for the dependent variable. Statistically, the SAR solution is flawless: -the cause of inefficiency is removed -the optimality of the estimates is restored. In the case of spatial data, -the cause of inefficiency is spatial dependence -i.e. the most fundamental property of spatial phenomena. Removing spatial dependence is not geographically flawless: -ridding the model of its inefficiency implies ridding it of its spatial component. The focus of the analysis should then shift to the process of constructing the matrix omega^-1, because constructing the matrix omega means specifying a model of spatial dependence. Problems: -we still don't know the cause of the spatial autocorrelation! -you "fixed" everything, but at a high cost: the only way to fix the model is by taking out its most important part (the baby goes out with the bathwater). Can we just accept the increased variance? NO, because of its effect on inference. *At least with SAR you have parameters you can trust.
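A minimal sketch of what "multiplied by the inverse of a model of spatial dependence" means for the lag form y = ρWy + Xβ + ε. Solving for y gives y = (I - ρW)^-1 (Xβ + ε), so both the explanatory part and the error pass through (I - ρW)^-1. The contiguity layout (units on a line), the helper names and the numbers are hypothetical.

```python
import numpy as np

def chain_weights(n):
    """Hypothetical row-standardized contiguity matrix for n >= 2 units along a line."""
    W = np.zeros((n, n))
    for i in range(n - 1):
        W[i, i + 1] = W[i + 1, i] = 1.0          # binary contiguity: shared border
    return W / W.sum(axis=1, keepdims=True)     # each row sums to 1

def sar_lag_reduced_form(X, beta, eps, W, rho):
    """Solve y = rho*W*y + X*beta + eps, i.e. y = (I - rho*W)^-1 (X*beta + eps)."""
    A = np.eye(len(eps)) - rho * W
    return np.linalg.solve(A, X @ beta + eps)    # avoids forming the inverse explicitly

n = 6
W = chain_weights(n)
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
eps = np.random.default_rng(0).normal(size=n)
y = sar_lag_reduced_form(X, np.array([1.0, 0.5]), eps, W, rho=0.4)
```

With a row-standardized W, any |ρ| < 1 keeps I - ρW invertible; if ρ is close to zero, (I - ρW)^-1 is close to the identity and the model barely changes.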
Spatial dependence
Spatial dependence is defined as "the existence of a functional relationship between what happens at one point in space and what happens elsewhere". Spatial dependence is the basic property of spatial data, and the condition for the definition of spatial processes.
Spatial regression and spatial econometrics
Spatial regression is roughly the same as spatial econometrics; it originated in the late 1970s. Econometrics: measures the parameters of an economic relationship based on empirical data, usually time series; it is the empirical testing of economic theory. Spatial econometrics: adapting econometric methods and techniques to spatial data. Spatial regression: unlike econometrics, focuses on data (less on theory), a data-driven approach common in geography. It is applied to other branches of geography (not economics) and focuses on spatial relationships.
Spatial regression models and reliability
Standard regression models of spatial data are often uncertain or unreliable. In the best cases, unreliable models give decision makers a blurry picture of the factors they need to manage, leading to ineffective decisions. In the worst cases, the picture is so blurry that it may lead to management decisions that are not just ineffective but harmful. When we violate the independence assumption, the parameter estimates have larger variance. Best case: a blurry (uncertain) picture. Worst case: relationships that do not exist appear significant (inflated inference tests).
Choosing the weighting matrix in GWR
The *bandwidth* defines the threshold distance, or the number of nearest neighbours, that determines which observations enter each local regression. The kernel can be fixed (constant distance) or adaptive (constant number of nearest neighbours). The bandwidth is often chosen by the software using cross-validation or AIC.
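A minimal sketch of the two kernel types, assuming a Gaussian shape for the fixed kernel and a bisquare shape for the adaptive one (both common choices, picked here for illustration). Bandwidth selection by cross-validation or AIC is not shown, and all names and coordinates are hypothetical. `local_wls` fits one local regression by weighted least squares.

```python
import numpy as np

def gaussian_fixed(d, bandwidth):
    """Fixed kernel: the same distance bandwidth is used at every regression point."""
    return np.exp(-0.5 * (np.asarray(d, dtype=float) / bandwidth) ** 2)

def bisquare_adaptive(d, k):
    """Adaptive kernel: the bandwidth is the distance to the k-th nearest neighbour
    (d includes the regression point itself at distance 0); weights drop to 0 there."""
    d = np.asarray(d, dtype=float)
    b = np.sort(d)[k]
    w = (1.0 - (d / b) ** 2) ** 2
    return np.where(d < b, w, 0.0)

def local_wls(X, y, w):
    """One local regression: beta = (X' W X)^-1 X' W y with W = diag(w)."""
    XtW = X.T * w                                # scales column j of X' by w[j]
    return np.linalg.solve(XtW @ X, XtW @ y)

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 4.0]])
d = np.linalg.norm(coords - coords[0], axis=1)   # distances from regression point 0
w_fixed = gaussian_fixed(d, bandwidth=2.0)
w_adaptive = bisquare_adaptive(d, k=3)
```

Repeating `local_wls` at every regression point, each time with its own weights, gives the set of local GWR estimates.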
Testing for spatial autocorrelation in the SAR model residuals
The Moran's I index is not appropriate here because the residuals are estimated from the model: they are not data. A more appropriate approach is the Lagrange multiplier (LM) test. Refer to Anselin 1988 (Lagrange multiplier test diagnostics for spatial dependence and spatial heterogeneity), Anselin 2003 (An introduction to spatial regression analysis in R).
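A minimal sketch of how an LM test works, using the simplest member of the family: the LM-error statistic on OLS residuals, LM = [e'We / (e'e/n)]² / tr(W'W + WW), approximately chi-square with 1 df under no spatial error dependence. LM tests on the residuals of an already fitted SAR model use different formulas, not shown here; `lm_error` is a hypothetical helper.

```python
import math
import numpy as np

def lm_error(resid, W):
    """LM-error statistic on OLS residuals and its chi-square(1) p-value."""
    e = np.asarray(resid, dtype=float)
    W = np.asarray(W, dtype=float)
    n = len(e)
    s2 = (e @ e) / n                             # ML estimate of the error variance
    T = np.trace(W.T @ W + W @ W)
    lm = float((e @ W @ e / s2) ** 2 / T)
    p_value = math.erfc(math.sqrt(lm / 2.0))     # upper tail of chi-square with 1 df
    return lm, p_value
```

Unlike Moran's I on residuals, the LM statistic has a known asymptotic distribution, so no extra moment calculations are needed for inference.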
Poisson and Logistic regression - SPATIAL
The Poisson regression (PR) model assumes independence of observations. It does not assume homoscedasticity or stationarity of observations. If the assumption of independence is not met, spatial models should be used (e.g. SAR or GWR). SAR versions of the Poisson and logistic models can be computed using R scripts. GWR software can calculate local Poisson and logistic models. The spatial neighbourhood (contiguity, bandwidth, etc.) requires different calculations for different types of data (e.g. points, polygons, centroids, raster cells, etc.).
Poisson Regression
The dependent variable consists of non-negative integers (counts), with a few low non-zero values and a large proportion of zeros. Such variables are best represented by the Poisson distribution, which describes rare events, independently distributed in time and space. With a Poisson-distributed dependent variable, some assumptions of the linear model would be violated: -normality of the error distribution -linearity of the relationship b/w dependent and independent variables -identical error distribution -moreover, the linear model could predict negative values.
As an alternative to the linear model, a common practice is to log-transform the dependent variable and then estimate a linear model on the transformed dependent. The log of the dependent variable changes linearly with equal-increment increases in the independent variables: the log of the DV is expressed as a linear function of the predictors, so changes in the DV resulting from the combined effects of different independent variables are multiplicative. Applying a simple mathematical transformation, this is best expressed as μ = e^(b0 + b1X1 + ... + bkXk) = e^b0 × e^(b1X1) × ... × e^(bkXk), where μ is the expected value of the DV (see formula on L9 S37).
The Poisson distribution is characterized by the property known as equidispersion (mean = variance). Hence, Poisson regression assumes equidispersion of the dependent variable. If this assumption is violated, other models or appropriate corrections should be applied.
Given the properties of the Poisson model, the classical regression assumptions are violated, so we use the MAXIMUM LIKELIHOOD ESTIMATOR. The log-likelihood value alone does not tell us how well the model fits the data. For the Poisson model, McFadden's pseudo R2 is often used to indicate the model's goodness of fit. It is computed from the ratio between the log-likelihoods of the model with the selected set of predictors and of the null model (constant only): R2 = 1 - lnL(model)/lnL(null). It can be adjusted to take into account the number of predictors K, as 1 - (lnL(model) - K)/lnL(null), and, similar to AIC, favour more parsimonious models.
Inference on the whole regression and on individual parameters is guided by the same principle as in the logistic model. The Wald statistic is often used to assess the significance of individual parameters.
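A minimal sketch of maximum likelihood estimation for the Poisson model with a log link, fitted by Newton-Raphson (equivalent to iteratively reweighted least squares for this model), followed by McFadden's pseudo R2 and Wald z statistics (z = b / SE, with SE from the inverse Fisher information). The helper names and the toy data are hypothetical, and X is assumed to have a constant first column.

```python
import math
import numpy as np

def poisson_loglik(y, mu):
    """Poisson log-likelihood: sum of y*log(mu) - mu - log(y!)."""
    return float(np.sum(y * np.log(mu) - mu)) - sum(math.lgamma(v + 1.0) for v in y)

def fit_poisson(X, y, max_iter=50, tol=1e-10):
    """ML Poisson regression (log link) by Newton-Raphson; X has a constant first column."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                   # start from the null (constant-only) model
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)                   # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])           # Fisher information X' diag(mu) X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    se = np.sqrt(np.diag(np.linalg.inv(X.T @ (X * mu[:, None]))))
    return beta, se, mu

def mcfadden_r2(y, mu_model, k=None):
    """1 - lnL(model)/lnL(null); with k parameters, 1 - (lnL(model) - k)/lnL(null)."""
    ll_model = poisson_loglik(y, mu_model)
    ll_null = poisson_loglik(y, np.full(len(y), y.mean()))
    penalty = 0.0 if k is None else k
    return 1.0 - (ll_model - penalty) / ll_null

x = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=float)      # hypothetical predictor
y = np.array([0, 1, 1, 2, 3, 5, 6, 9], dtype=float)      # hypothetical counts
X = np.column_stack([np.ones(len(x)), x])
beta, se, mu = fit_poisson(X, y)
wald_z = beta / se              # Wald z; the Wald chi-square (1 df) is wald_z ** 2
print(beta, se, wald_z, mcfadden_r2(y, mu))
```

Each exp(b) is the multiplicative change in the expected count for a one-unit increase in that predictor, matching the multiplicative form above.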
drift
The drift is the average of a regionalized variable within a neighbourhood: it is the slowly varying, non-stationary part of the surface. The residual is the difference b/w the actual measurement and the drift. If the drift is subtracted, the regionalized variable is stationary in the mean (this is how you account for non-stationarity in the mean, given the stationarity assumption of Kriging).
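A minimal sketch for a hypothetical 1-D transect, taking the drift as the mean within a moving neighbourhood of ± `half_window` samples (the window size is an assumption, and the window shrinks at the ends) and the residual as measurement minus drift.

```python
import numpy as np

def moving_average_drift(x, half_window):
    """Drift = mean of the values within +/- half_window samples of each position."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    drift = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        drift[i] = x[lo:hi].mean()
    return drift

transect = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical measurements
drift = moving_average_drift(transect, half_window=1)
residual = transect - drift                        # Kriging "residual" = measurement - drift
```

A larger window gives a smoother drift and leaves more of the variation in the residual.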
Spatial Interpolation
The rationale behind spatial interpolation is the observation that points close together in space are more likely to have similar values than points far apart (think spatial dependence). Spatial interpolation is the procedure of estimating the value of properties at unsampled sites within the area covered by existing observations: a process of determining the characteristics of objects from those of nearby objects. Objects can be points, lines, or areas. Generally the attribute is interval or ratio.
Modelling the semivariogram
The semivariogram is known only at discrete points (lags that are multiples of delta-h). For analytical purposes, the experimental semivariogram can be modelled with a continuous function that can be evaluated at any required distance. The semivariogram is characterized by 3 parameters: *sill*: the flat region (reached at the range), characterized by constant and high variance (measured in gamma). *range*: the distance at which the curve approaches the process variance, i.e. the sill (measured in distance). *nugget*: the erratic variance over short distances (the value of gamma as the lag distance approaches 0). -nuggets are due to very large differences over very small distances. Different models say different things about the points on a semivariogram... -a parabolic form near the origin shows excellent continuity, while a linear form shows moderate continuity: with a parabolic start, differences between nearby points grow very slowly with distance (a smooth surface), whereas with a linear start they grow in proportion to distance (a continuous but rougher surface) -with a horizontal line, there is no spatial autocorrelation -a nugget effect means high variance over small distances. The parameters of the semivariogram (sill, range, nugget) are used as functional parameters in the various models. There are many different models... linear, exponential, power, Gaussian, spherical... know the difference!
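Sketches of four of the models listed above, parameterized by nugget, sill and range a. Assumptions: `sill` is the total sill (nugget included); the exponential and Gaussian models use the "practical range" convention (factor 3, so they reach about 95% of the sill at a); and gamma(0) = 0, so the nugget appears as a jump just above lag 0. Near the origin the Gaussian model is parabolic, while the spherical and exponential models are linear.

```python
import numpy as np

def spherical(h, nugget, sill, a):
    """Rises as 1.5(h/a) - 0.5(h/a)^3 and reaches the sill exactly at the range a."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3)
    g = np.where(h >= a, sill, g)
    return np.where(h == 0, 0.0, g)

def exponential(h, nugget, sill, a):
    """Approaches the sill asymptotically; linear near the origin."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / a))
    return np.where(h == 0, 0.0, g)

def gaussian(h, nugget, sill, a):
    """Parabolic near the origin: a very smooth (continuous) surface."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * (h / a) ** 2))
    return np.where(h == 0, 0.0, g)

def linear(h, nugget, slope):
    """No sill: semivariance keeps growing with distance."""
    h = np.asarray(h, dtype=float)
    return np.where(h == 0, 0.0, nugget + slope * h)
```

Fitting one of these functions to the empirical semivariogram (for example by least squares on the binned values) gives a curve that can be evaluated at any distance.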
covariograms and correlograms and semivariograms
They all have distance (lag) on the x axis. A covariogram has covariance on the y axis, which goes up to however high the covariance is. A correlogram has correlation on the y axis, and it stays between -1 and 1. In theory, both should approach zero as distance increases. In a semivariogram, semivariance is on the y axis; it should approach the sill (the semivariance at which distance no longer matters).
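For a second-order stationary process with covariance function C(h), the three plots carry the same information. C(0) is the process variance, i.e. the sill, which is why the semivariogram rises towards the sill exactly as the covariogram and correlogram fall towards zero:

```latex
\gamma(h) = C(0) - C(h), \qquad
\rho(h) = \frac{C(h)}{C(0)} = 1 - \frac{\gamma(h)}{C(0)}
```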
Analyzing the semivariogram
When the change in h is...
-0: X - X = 0, so gamma = 0
-small: X1 and X2 are similar, so gamma is small
-large: the difference is large, so gamma is large
-critical (the range): relatedness = 0, so gamma = process variance (the maximum, i.e. the sill)
Non-stationarity vs autocorrelation.. which to address?
When we talk about non-stationarity or spatial autocorrelation, we could be talking about either the dependent variable or the error. Non-stationarity in the error is the actual problem, but it doesn't occur until we actually have a model. We can look at non-stationarity in the dependent variable before we have a model, as an indication of whether it will exist in the error. No single model solves both non-stationarity and spatial autocorrelation. If you have both, there are a number of things you can do. First, see which one is worse and address that one preferentially. Try different things too: fit different kinds of models and compare the outcomes. You can try adding a variable that fixes everything (e.g. the heart disease example); this can potentially remove all spatial autocorrelation and leave only non-stationarity.
SAR models ignore non-stationarity, leaving a portion of the 'spatial' variance unaddressed. GWR models ignore spatial autocorrelation, leaving a portion of the 'spatial' variance unaddressed. *Both techniques are only partial solutions*: SAR models 'get rid of' spatial variation; GWR models yield 'too many' models; GWR models may correct or aggravate spatial autocorrelation. At some point you need to make arbitrary decisions and trade-offs when attempting to correct for spatial dependence and non-stationarity.
equidispersion
mean = variance
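A quick descriptive check of equidispersion on a sample of counts: the ratio of sample variance to sample mean is about 1 under equidispersion and above 1 under overdispersion. This is an informal sketch with a hypothetical helper; for Poisson regression the relevant check is conditional on the predictors, not on the raw counts.

```python
import numpy as np

def dispersion_ratio(counts):
    """Sample variance / sample mean: ~1 equidispersed, > 1 overdispersed, < 1 underdispersed."""
    y = np.asarray(counts, dtype=float)
    return float(y.var(ddof=1) / y.mean())

print(dispersion_ratio([0, 0, 0, 10]))   # many zeros and one large count: 10.0
```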
atomistic fallacy
modelling spatial behaviour purely at the individual level. Results in missing the context in which the individual behaves.
Breusch-Pagan test
Test for heteroskedasticity in a linear regression model. The null hypothesis for this test is that the error variances are all equal. The alternative hypothesis is that the error variances are not equal; more specifically, that they increase (or decrease) as a function of the independent variables. The Breusch-Pagan-Godfrey test (sometimes shortened to the Breusch-Pagan test) is a test for heteroskedasticity of the errors in regression. Heteroskedasticity means "differently scattered"; its opposite is homoskedasticity, which means "same scatter". Homoskedasticity is an important regression assumption: if it is violated, the OLS estimates remain unbiased but are no longer efficient, and the usual standard errors (and therefore the t and F tests) are unreliable. A weakness of the BP test is that it assumes the heteroskedasticity is a linear function of the independent variables. Failing to find evidence of heteroskedasticity with the BP test doesn't rule out a nonlinear relationship between the independent variable(s) and the error variance. Additionally, the BP test isn't useful for determining how to correct or adjust the model for heteroskedasticity.
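A minimal sketch of the studentized (Koenker) variant of the test: regress the squared OLS residuals on the explanatory variables and use n·R², approximately chi-square under the null with df equal to the number of non-constant regressors. The original Breusch-Pagan statistic uses half the explained sum of squares from regressing scaled squared residuals instead. The auxiliary regression is linear in Z, which is exactly the linearity weakness noted above. Helper names and data are hypothetical; p-values are left out to avoid extra dependencies.

```python
import numpy as np

def ols_residuals(X, y):
    """OLS residuals e = y - X b (X includes a constant column)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ b

def breusch_pagan_koenker(resid, Z):
    """n * R^2 from regressing the squared residuals on Z (Z includes a constant)."""
    e2 = np.asarray(resid, dtype=float) ** 2
    g, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    ss_res = np.sum((e2 - Z @ g) ** 2)
    ss_tot = np.sum((e2 - e2.mean()) ** 2)
    stat = len(e2) * (1.0 - ss_res / ss_tot)
    df = Z.shape[1] - 1
    return stat, df

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 50)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 0.5 * x + rng.normal(scale=0.2 * x)     # error spread grows with x
stat, df = breusch_pagan_koenker(ols_residuals(X, y), X)
```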