Structural Equation Modelling (SEM)
What is SEM?
A procedure in which we: (1) measure an important variable, such as a dependent variable (DV); (2) specify some theoretical relationship; (3) estimate the variables that fit the relationship; (4) then check how strongly the estimated variables agree with the actual variables (model fit, assessed with chi-square). It is an extension of path analysis.
Power and Sample Size
The sample size needed for SEM depends on the number of variables used. The number of subjects (N) should exceed the number of variables (observations, v): N > v. If N < v, the result is singularity. Rule of thumb: N >= 50 + 8v (eight people for every variable, plus 50).
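The rule of thumb above can be turned into a quick check. This is a minimal sketch: the function names `minimum_sample_size` and `is_sample_adequate` are hypothetical, and the code simply takes the stricter of the two thresholds stated in the notes (N > v and N >= 50 + 8v).

```python
def minimum_sample_size(v: int) -> int:
    """Smallest N that satisfies both N > v and N >= 50 + 8*v."""
    if v < 1:
        raise ValueError("v must be a positive number of variables")
    # N > v means N >= v + 1; the rule of thumb is N >= 50 + 8v.
    return max(v + 1, 50 + 8 * v)


def is_sample_adequate(n: int, v: int) -> bool:
    """True if a sample of n subjects meets the rule of thumb for v variables."""
    return n >= minimum_sample_size(v)


# Illustrative call: six measured variables.
print(minimum_sample_size(6))      # 98
print(is_sample_adequate(90, 6))   # False
```

The estimator-specific minimums given elsewhere in these notes (for example 500 subjects for some methods) are separate requirements and are not covered by this check.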
Model Misspecification
The model chi-square increases with the degree to which the model is misspecified. Changing a single path can change the chi-square a lot. If the theory puts a relationship in the wrong direction, the theory no longer fits the data.
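A small numerical sketch of how chi-square grows with misspecification. The notes do not say how chi-square is computed, so this example assumes the standard maximum likelihood discrepancy function, F = ln|Sigma| - ln|S| + tr(S Sigma^-1) - p, with chi-square = (N - 1)F. It uses the simplest possible misspecified model: two standardised variables whose covariance is wrongly fixed to zero, for which F reduces to -ln(1 - r^2) with df = 1 (three observations minus two estimated variances). The function name is hypothetical.

```python
import math


def independence_chi_square(r: float, n: int) -> float:
    """Chi-square (df = 1) for a model that fixes the covariance of two
    standardised variables to zero when their observed correlation is r.

    Assumes the ML discrepancy F = -ln(1 - r^2) and chi2 = (n - 1) * F.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 2:
        raise ValueError("n must be at least 2")
    return (n - 1) * math.log(1.0 / (1.0 - r * r))


# The further the true correlation is from the (wrong) value of zero,
# the larger the chi-square for the same sample size.
for r in (0.0, 0.1, 0.3, 0.5):
    print(r, round(independence_chi_square(r, 200), 2))
```

With r = 0 the model is correct and the chi-square is zero; as r increases the omitted relationship matters more and the chi-square rises steeply.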
Types of Identification
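A model consists of observations (measured variables), v, and parameters, q. The number of observations available is the number of unique variances and covariances among the v measured variables, v(v+1)/2. Over-identified: more observations than parameters, v(v+1)/2 > q. Just-identified: as many observations as parameters, v(v+1)/2 = q. Under-identified: fewer observations than parameters, v(v+1)/2 < q.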
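The counting rule above can be sketched in a few lines. The function name `identification_status` is hypothetical, and the check is only the counting condition: a model can pass it and still fail to be identified if some individual parameter cannot be uniquely estimated.

```python
from typing import Tuple


def identification_status(v: int, q: int) -> Tuple[int, str]:
    """Return (df, status) from v measured variables and q free parameters."""
    if v < 1 or q < 0:
        raise ValueError("v must be >= 1 and q must be >= 0")
    observations = v * (v + 1) // 2  # unique variances and covariances
    df = observations - q
    if df > 0:
        status = "over-identified"
    elif df == 0:
        status = "just-identified"
    else:
        status = "under-identified"
    return df, status


# Illustrative values: 4 measured variables give 10 observations.
print(identification_status(4, 8))   # (2, 'over-identified')
print(identification_status(3, 6))   # (0, 'just-identified')
print(identification_status(2, 5))   # (-2, 'under-identified')
```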
Why use SEM
Allows the use of esoteric designs not available in normal techniques, i.e. things you could not otherwise imagine doing. It's a better way to do things. Gives clearer, purer estimates of variances, covariances and, if asked, means and intercepts. Can have multiple DVs. Can include multiple intervening/mediating variables while regressing these variables onto multiple DVs. Allows the researcher to hypothesise how items of scales load onto factors. Allows the researcher to model latent variables explicitly.
Power and Sample Size: Regression Approach, Estimation Techniques
Asymptotically Distribution Free criterion: for non-parametric data; requires at least 500 subjects, better with 2000. Satorra-Bentler approach: for biased (non-normal) or non-parametric data; requires at least 500 subjects. Maximum Likelihood, Ordinary Least Squares, Generalised Least Squares: require at least 100 subjects.
Model Specification
Features to be estimated can include: means and intercepts; variances and error covariances; factor loadings. Models consist of: inferences (latent variables); errors (variance of observed variables); disturbances (variance of latent or unobserved variables). Variables may be: observed or unobserved; exogenous (unexplained variables) or endogenous (explained variables).
Identification
Observations: things we measure. Parameters: things we specify (b-weights, correlations, etc.). If we have more parameters than observations, we are trying to estimate more than we know. A model with more parameters than observations has df < 0 and cannot be estimated; such a model is not identified. A model requires sufficient observations to enable it to be identified. A model also cannot be identified if some of its parameters cannot be uniquely estimated: one observation with two parameters to be estimated cannot be resolved.
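The final point can be made concrete with a small worked example. The numbers are purely illustrative and do not come from the notes: a and b stand for two unknown parameters, and each equation stands for one observation.

```latex
% One observation, two unknown parameters: no unique solution.
\[
a + b = 10
\qquad\text{is satisfied by } (a, b) = (1, 9),\ (2, 8),\ (5, 5),\ \dots
\]
% A second, independent observation makes the solution unique.
\[
\begin{aligned}
a + b &= 10 \\
a - b &= 2
\end{aligned}
\quad\Longrightarrow\quad a = 6,\; b = 4
\]
```

With one equation there are infinitely many equally good answers, which is the sense in which an under-identified model cannot be resolved.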
Model Structure
The measured model: all the measured variables and their error terms. The structural model: all the latent variables and the relationships between them, and between them and the measured model, i.e. everything inferred.
Specifying the model - relationships
Three classes of relationships exist between variables: (1) covariances or correlations; (2) direct factor loadings (regression weights); (3) reciprocal factor loadings.
Degrees of freedom
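Degrees of freedom (df) are the number of observations minus the number of parameters (q). Under-identified: fewer observations than q, so df < 0. Over-identified: more observations than q, so df = 1 or more. Just-identified: observations equal q, so df = 0.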
Model Components
Unobserved-exogenous: unmeasured, inferred variables, typically errors, residuals or disturbances. Observed-endogenous: measured variables that contribute to other variables and are explained by reference to an exogenous variable. Unobserved-endogenous: unmeasured variables that contribute to other variables and are explained by reference to an exogenous variable. Observed-exogenous: measured variables that are explained no further, for instance an IV.
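The four component types can be derived mechanically from a path diagram. The sketch below assumes the usual convention that a variable is endogenous if at least one directed arrow points at it and exogenous otherwise; the function name `classify_variables` and the example model (Age, Ability, Test1, Test2, with errors e1, e2 and disturbance d1) are hypothetical illustrations, not taken from the notes.

```python
from typing import Dict, Iterable, Set, Tuple


def classify_variables(paths: Iterable[Tuple[str, str]],
                       observed: Set[str]) -> Dict[str, str]:
    """Label each variable as observed/unobserved and exogenous/endogenous.

    paths    -- (cause, effect) pairs, one per directed arrow in the diagram
    observed -- names of the measured variables
    """
    names = set(observed)
    targets = set()  # variables with at least one incoming arrow
    for cause, effect in paths:
        names.update((cause, effect))
        targets.add(effect)

    labels = {}
    for name in sorted(names):
        measured = "observed" if name in observed else "unobserved"
        role = "endogenous" if name in targets else "exogenous"
        labels[name] = f"{measured}-{role}"
    return labels


# Hypothetical model: Age predicts a latent Ability, which is measured by
# two tests; each explained variable has its own error or disturbance.
paths = [
    ("Age", "Ability"), ("d1", "Ability"),
    ("Ability", "Test1"), ("e1", "Test1"),
    ("Ability", "Test2"), ("e2", "Test2"),
]
labels = classify_variables(paths, observed={"Age", "Test1", "Test2"})
for name, label in labels.items():
    print(name, label)
```

In this example Age comes out as observed-exogenous, Ability as unobserved-endogenous, the two tests as observed-endogenous, and the error and disturbance terms as unobserved-exogenous.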
When would it be used?
When we have complex questions to answer. When we want to confirm or explore a theory, i.e. to ask how well the model (hypotheses) fits the data collected. When we have latent variables.