Chapter 6: Quantitative Data Analysis
1. The regression model will not permit the examination of time-constant independent variables. 2. Standard errors of fixed effects estimates may be larger than those for random effects estimates.
What are the 2 disadvantages of the fixed effects model?
parametric and non-parametric statistics
What are the 2 major categories that statistical tests can be classified into?
T-test for independent samples; T-test for paired cases; Sign test; Mann-Whitney U-test; Pearson correlation; Spearman coefficient
A variety of tests are available to test differences of means (6)
Ordinary least-squares regression (for standard causal relationships); Ordinary least-squares regression with dummy variables (for simple conditional relationships); Moderated regression (for moderating variables); Path analysis (partial regression for intervening variables)
Alternative forms of regression analysis (4)
the fixed effects model should be used
If the Hausman tests are significant, so that we reject the null hypothesis, then which model should be used?
heteroskedasticity
If the variance is unequal across values of the explanatory variable, the situation is known as what?
serial correlation
In a panel data set, which kind of correlation is likely to have a more substantial influence on the estimated covariance matrix of the least squares estimator than heteroskedasticity?
t test for paired cases
In this case we can observe non-zero differences between the two samples, but are these differences in the same direction and big enough not to be attributed to chance? which test?
-The groups being identified are clearly separate; -The explanatory variables are close to being normally distributed, or can be transformed to be so, -There is no multicollinearity between the explanatory variables.
Linear discriminant analysis model is used when (3)
-lack of formal training of most practitioners; -criticism of the statistical assumptions underlying the models; -failure to include non-financial variables widely accepted as useful discriminators.
Simnett and Trotman (1992) identify three key reasons for non-use of financial distress models in practice:
regression analysis
The degree of association between variables and any causal relationships between those variables, in order to develop an explanatory relationship which allows us to show how and why key variables are changing.
the measurement level
The most appropriate measure of association is again determined by what?
the percentage of the variation in one variable explained by changes in the other.
The square of the Pearson correlation coefficient is called the coefficient of determination (R²), and indicates what?
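Note: the link between the correlation coefficient and the coefficient of determination can be checked by hand. The sketch below is a minimal illustration using only the Python standard library; the helper name `pearson_r` and the data are invented for the example and do not come from the chapter.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products and squared deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2  # share of the variation in y explained by x

print(f"r = {r:.4f}, R^2 = {r_squared:.2f}")
```

With these numbers R² is 0.60, so about 60% of the variation in one variable is explained by changes in the other.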
Hausman test
This test posits a null hypothesis that the random effects estimates are identical to the fixed effects estimates.
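Note: for a single coefficient, the Hausman statistic reduces to the squared gap between the two estimates divided by the gap between their variances, and it is compared with a chi-square critical value. The sketch below assumes that one-coefficient case; the function name and all four input numbers are hypothetical.

```python
def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic for one coefficient.

    b_fe, b_re: fixed and random effects estimates of the same coefficient.
    var_fe, var_re: their estimated variances (var_fe is expected to be larger).
    """
    return (b_fe - b_re) ** 2 / (var_fe - var_re)

# Hypothetical estimates, invented for illustration only
h_stat = hausman_scalar(b_fe=2.0, b_re=1.5, var_fe=0.04, var_re=0.02)

critical_value = 3.841  # chi-square, 1 degree of freedom, 5% level
use_fixed_effects = h_stat > critical_value

print(h_stat, use_fixed_effects)
```

Here the statistic exceeds the critical value, so the null hypothesis that the two sets of estimates are identical is rejected and the fixed effects model would be preferred.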
discriminant analysis
We use this model when the variables we want to explain are not of a continuous nature. These can be quantified by assigning dummy variables of the (1,2,3) or (0,1) variety to reflect the alternative states, but in each case these are the only values that the dependent variable can take.
We want to compare the observed value with what we expected to find, and judge whether the difference is big enough not to be attributed to chance.
What are we trying to compare in statistical testing?
t test for paired cases
Where the samples drawn are not independent, but represent a before-and-after situation involving the same subjects (usually people), then we have a repeated measures situation for which more powerful statistical tests are available. Which test?
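Note: a paired t-test works on the before-and-after differences for each subject. The sketch below is a minimal standard-library version; the function name and the scores are invented for illustration, and the critical value is the usual two-tailed 5% figure from a t table.

```python
import math

def paired_t_statistic(before, after):
    """Return (t, df) for the mean of the paired differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator)
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    standard_error = math.sqrt(var_d / n)
    return mean_d / standard_error, n - 1

# Hypothetical scores for the same five subjects, invented for illustration
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 14]

t_stat, df = paired_t_statistic(before, after)
critical_value = 2.776  # two-tailed t, 5% level, df = 4
reject_h0 = abs(t_stat) > critical_value

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

All five differences point the same way and the statistic (about 6.53) is well above the critical value, so the change would not be attributed to chance.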
unrepresentative
Where the sampling is not random, the resulting samples will potentially be what?
cross-tabulation
Which analytical tool is this? It is the most popular analytical tool. It allows us to compare "observed" with "expected". We need the following data: mean, standard deviation, CI and range.
cross-tabulation
Which analytical tool is this? If the "observed" value is outside the range, reject H0. Find the equivalent Z score and compare it with the CV; reject H0 if Z score > CV.
4
Which step is this in the test procedure? Choose the appropriate test statistic
5
Which step is this in the test procedure? Compare the "observed" and "expected" values and compute the test statistic
7
Which step is this in the test procedure? Compare the Test Statistic with the Critical Value (if TS>CV reject H0)
6
Which step is this in the test procedure? Identify CV, df (n-1)
1
Which step is this in the test procedure? State the H0
3
Which step is this in the test procedure? Choose the level of significance (i.e. 0.05, 0.01 etc.)
2
Which step is this in the test procedure? Identify the most appropriate test
Use non-parametric statistics
Which tests do we use when there is doubt about the quality of the data or the underlying assumptions?
inefficient (x) & biased (y)
With the presence of heteroskedasticity, consistent estimates of the regression coefficients can still be produced; nevertheless, these estimates are "x" and the standard errors of the estimates will be "y".
Conduct a single test: a one-way analysis of variance
In ANOVA, how do we determine whether the variation between the samples is greater than the variation evident within the samples?
significant & cannot
If the variation between 2 samples is bigger than the variation within the samples, then we observe a difference that is statistically "significant/not significant", and it "can/cannot" be attributed to a chance occurrence.
Spearman correlation
Based on the ranked values for each variable rather than the raw data. Often used to evaluate relationships involving ordinal variables.
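Note: one way to see what "based on ranked values" means is to rank each variable and then apply the ordinary Pearson formula to the ranks. The sketch below does this with the standard library only; the helper names and the data are invented for illustration, and ties are handled by average ranks.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data, invented for illustration only
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]

rho = spearman_rho(x, y)
print(f"rho = {rho:.2f}")
```

Because only the rank order matters, replacing `x` with any other strictly increasing set of values would leave `rho` unchanged.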
= z statistic at 5% significance
What is the critical value in the sign test?
LISREL
embraces path analysis and structural equation modelling, among others, and can handle dependent and independent variables, which may be nominal, ordinal, interval or ratio scale measures.
MANOVA
Evaluates the differences between the multivariate means (centroids) of several populations, on the null hypothesis of equality, using either: 1. univariate F-tests run on each of the dependent variables, or 2. multiple discriminant analysis. Which analysis?
Pearson correlation
evaluates the linear relationship between two continuous variables. A relationship is linear when a change in one variable is associated with a proportional change in the other variable.
the samples are drawn from different populations, and reject the null hypothesis.
For independent samples, if the difference between the samples is greater than that within the samples, then we would infer that:
We answer this by testing the difference between the two sample means, relative to the variation that exists within the samples.
How do we answer the question for independent samples?
7 steps
how many steps in a test procedure?
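Note: the seven steps can be walked through in order with a small example. The sketch below assumes a one-sample t-test is the appropriate test; the sample, the hypothesised mean and the variable names are all invented for illustration, and the critical value is the usual two-tailed 5% figure from a t table.

```python
import math

# Hypothetical sample and hypothesised mean, invented for illustration
sample = [52, 48, 55, 51, 49, 53, 50, 54]
mu_0 = 50.0

# Step 1: state H0 (here: the population mean equals mu_0)
# Step 2: identify the most appropriate test (assumed: one-sample t-test)
# Step 3: choose the level of significance
alpha = 0.05

# Step 4: choose the test statistic: t = (observed - expected) / standard error
n = len(sample)
observed_mean = sum(sample) / n
sample_var = sum((v - observed_mean) ** 2 for v in sample) / (n - 1)
standard_error = math.sqrt(sample_var / n)

# Step 5: compare the observed and expected values and compute the statistic
t_stat = (observed_mean - mu_0) / standard_error

# Step 6: identify the critical value for df = n - 1
df = n - 1
critical_value = 2.365  # two-tailed t, alpha = 0.05, df = 7

# Step 7: compare the test statistic with the critical value
reject_h0 = abs(t_stat) > critical_value

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

Here the statistic (about 1.73) falls short of the critical value, so H0 is not rejected at the 5% level.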
Measurement level and the way in which the sample has been drawn is how to determine which test to use
how to determine which test to use
Multivariate analysis of variance (MANOVA)
Similar to ANOVA, but it can handle more than one dependent variable
the between-sample variance is sufficiently greater than the within-sample variance for us to reject the null hypothesis and infer that the samples are indeed drawn from different populations. Smith, Malcolm. Research Methods in Accounting (p. 87). SAGE Publications. Kindle Edition.
In ANOVA, when do we reject the null hypothesis?
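Note: the comparison of between-sample and within-sample variance is summarised by the F statistic. The sketch below computes it from first principles with the standard library; the function name and the three samples are invented for illustration, and the critical value is the usual 5% figure from an F table.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k

# Hypothetical samples, invented for illustration only
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f_stat, df_between, df_within = one_way_anova_f(samples)
critical_value = 5.14  # F(2, 6) at the 5% level
reject_h0 = f_stat > critical_value

print(f"F = {f_stat:.2f}, df = ({df_between}, {df_within}), reject H0: {reject_h0}")
```

With these samples the between-sample variance is 13 times the within-sample variance, which exceeds the critical value, so the samples would be judged to come from different populations.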
When the test statistic exceeds the critical value, leading us to reject the null hypothesis and infer that the samples are indeed drawn from separate populations.
In the Kruskal-Wallis test, when do you reject the null hypothesis?
Mann-Whitney U test
Which test is the non-parametric alternative where we have independent samples?
able to distinguish between failed and non-failed firms with very high degrees of accuracy
purpose of LDA equation
could these three samples conceivably have been drawn from the same population, or are the differences between them too large for that to be realistic?
question we want to answer in ANOVA
PARAMETRIC STATISTICS
require that data be drawn from normal distributions, which are smooth, bell-shaped symmetrical curves, defined by mean and standard deviation measures.
Random Effects Model
which model? Homoskedasticity refers to the assumption that dependent variables have equal levels of variance across the range of explanatory variables.
NONPARAMETRIC STATISTICS
statistics make no such assumptions regarding the underlying distribution; they describe relationships in terms of frequencies, rankings, and directional signs, rather than means and standard deviations. Smith, Malcolm. Research Methods in Accounting (p. 74). SAGE Publications. Kindle Edition.
= (observed - expected) / standard deviation
What is the test statistic in the sign test?
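Note: for the sign test, "observed" is the number of positive differences, and under H0 positives and negatives are equally likely. The sketch below assumes the normal approximation to the binomial with no continuity correction, so expected = n/2 and standard deviation = sqrt(n)/2; the function name and the data are invented for illustration.

```python
import math

def sign_test_z(differences):
    """z = (observed - expected) / standard deviation for the sign test."""
    nonzero = [d for d in differences if d != 0]  # zero differences are dropped
    n = len(nonzero)
    observed = sum(1 for d in nonzero if d > 0)  # count of positive signs
    expected = n / 2            # mean of a binomial with p = 0.5
    std_dev = math.sqrt(n) / 2  # sqrt(n * 0.5 * 0.5)
    return (observed - expected) / std_dev

# Hypothetical paired differences: 14 positive, 6 negative
differences = [1] * 14 + [-1] * 6

z_stat = sign_test_z(differences)
critical_value = 1.96  # z statistic at 5% significance, two-tailed
reject_h0 = abs(z_stat) > critical_value

print(f"z = {z_stat:.3f}, reject H0: {reject_h0}")
```

Only the signs of the differences are used, so the sizes of the individual changes have no effect on the result.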
If a model fails to perform well for the hold-out sample
Two possibilities exist if what? (1) The model is sample-specific, or (2) the hold-out sample is not representative of the population from which it was drawn. Since data are difficult to collect, especially matched samples, hold-out samples are often seen as a luxury.
panel data
Typically provide time observations for a number of different individual variables (a cross-sectional dimension and a time-series dimension). This refers to what type of data?
Pearson correlation and Spearman correlation
what are the two measures of association?
1. creating dummy variables or 2. constructing mean deviations (simpler)
what are two alternative methods to achieve a fixed effects approach?
longitudinal data
what is panel data often referred to as?
Are these samples similar enough to each other for us to judge that they could conceivably have been drawn from the same population?
What is the question we want to answer for independent samples?
descriptive studies
what type of studies often record simple proportions, cross-tabulations and measures of association, even where there is no formal hypothesis testing or model building. Smith, Malcolm. Research Methods in Accounting (p. 74). SAGE Publications. Kindle Edition.
descriptive/simple statistics
what type of study? We want to know if the observed values differ significantly from what we would expect if, in fact, no relationship existed at all, Smith, Malcolm. Research Methods in Accounting (p. 74). SAGE Publications. Kindle Edition.
Reject the null hypothesis when test statistic > critical value. In the example given, the null hypothesis cannot be rejected, because the critical value (1.96) comfortably exceeds the test statistic (1.38).
When do you reject the null hypothesis in a Mann-Whitney U test?
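Note: the U statistic can be obtained by counting, over all cross-sample pairs, how often a value from the first sample exceeds a value from the second. The sketch below converts U to a z score using the usual normal approximation without a tie correction, which is rough for samples this small; the function names and data are invented for illustration.

```python
import math

def mann_whitney_u(a, b):
    """U for sample a: pairs where a beats b, counting ties as one half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

def u_to_z(u, n_a, n_b):
    """Normal approximation for U (assumes no tie correction is needed)."""
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (u - mean_u) / sd_u

# Hypothetical independent samples, invented for illustration only
sample_a = [12, 15, 11, 18, 14]
sample_b = [16, 19, 13, 20, 17]

u_stat = mann_whitney_u(sample_a, sample_b)
z_stat = u_to_z(u_stat, len(sample_a), len(sample_b))
critical_value = 1.96  # z at the 5% level, two-tailed
reject_h0 = abs(z_stat) > critical_value

print(f"U = {u_stat}, z = {z_stat:.3f}, reject H0: {reject_h0}")
```

Here the absolute z score (about 1.57) is below 1.96, so the null hypothesis cannot be rejected.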
when test statistic>critical value
when do you reject null hypothesis in a sign test?
when drawing individuals randomly from a large population in order to make inferences about the characteristics of the population
When is the random effects model appropriate?
Ordinary least-squares regression
Which alternative form of regression analysis is this? It measures the strength of any linear relationship, Y = a + bX, using the vertical deviation of points away from the fitted line.
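Note: for a single explanatory variable, the least-squares line Y = a + bX has a closed-form solution that minimises the squared vertical deviations. The sketch below computes it with the standard library; the function name and the data are invented for illustration.

```python
def ols_line(x, y):
    """Return (a, b) for the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b

# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = ols_line(x, y)

# Vertical deviations of each point from the fitted line
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(f"a = {a:.2f}, b = {b:.2f}")
```

The residuals sum to zero by construction, and their squared sum is the quantity that the fitted line minimises.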
parametric preferred
which is preferred, parametric or nonparametric stats?
The random effects model is simpler because it will lead to more efficient estimates.
Which is simpler, the random effects model or the fixed effects model, and why?
Random Effects Model
which model? Assumption- Unobserved variable is uncorrelated with each explanatory variable, regardless of whether they are fixed over time.
Fixed effects model
which model? Form of regression that can control variables that have not been measured or which cannot be measured.
Random Effects Model
which model? An increasing panel-data literature indicates that these models are likely to have substantial cross-sectional dependence in the errors, because of the presence of common shocks & unobserved components.
Fixed effects model
which model? Mean deviation method: for each time-varying variable, the means over time are computed and then subtracted from the observed values of each variable.
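Note: the mean deviation method can be illustrated on a tiny panel. In the sketch below each firm has its own unobserved level, and subtracting each firm's mean over time removes it; the firm labels, the data and the helper names are invented for illustration.

```python
def demean(values):
    """Subtract the mean over time from each observation."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def slope_through_origin(x, y):
    """Least-squares slope for already-demeaned data."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical panel: two firms, three periods each.
# Firm A follows y = 1 + 2x; firm B follows y = -10 + 2x.
panel = {
    "A": {"x": [1, 2, 3], "y": [3, 5, 7]},
    "B": {"x": [4, 5, 6], "y": [-2, 0, 2]},
}

# Mean deviation method: demean within each firm, then pool
x_within, y_within = [], []
for firm in panel.values():
    x_within += demean(firm["x"])
    y_within += demean(firm["y"])
within_slope = slope_through_origin(x_within, y_within)

# For comparison: pooled regression that ignores the firm-level differences
x_all = panel["A"]["x"] + panel["B"]["x"]
y_all = panel["A"]["y"] + panel["B"]["y"]
pooled_slope = slope_through_origin(demean(x_all), demean(y_all))

# A time-constant variable has zero deviations, so it drops out
time_constant = demean([7, 7, 7])

print(within_slope, round(pooled_slope, 3), time_constant)
```

The within estimate recovers the common slope of 2, while the pooled estimate is negative because it confuses the difference in firm levels with the effect of x.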
Fixed effects model
which model? Treats unobserved time-invariant differences of individuals as a set of fixed parameters, which can be directly estimated.
Fixed effects model
which model? Where zero values result, the time-constant explanatory variables are dropped from the equation and are controlled. This is a simple method.
Random Effects Model
which model? Advantage- Time constant explanatory variables need not be eliminated because this model assumes that the unobserved variable is uncorrelated with each explanatory variable
kruskal wallis test
Which test is used when circumstances dictate that we treat the data as ordinal, rather than ratio, and employ non-parametric methods?
kruskal wallis test
Which test uses the rank order of the data, rather than actual values?
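Note: the Kruskal-Wallis statistic is built from the rank sums of each sample after all observations are ranked together. The sketch below assumes there are no tied values and uses the chi-square approximation, which is rough for samples this small; the function name and the data are invented for illustration.

```python
def kruskal_wallis_h(groups):
    """H statistic from rank sums; assumes no tied values."""
    pooled = sorted(v for g in groups for v in g)
    rank_of = {v: i + 1 for i, v in enumerate(pooled)}  # rank 1 = smallest
    n_total = len(pooled)
    # Sum over groups of (rank sum squared / group size)
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

# Hypothetical samples with no ties, invented for illustration only
samples = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

h_stat = kruskal_wallis_h(samples)
critical_value = 5.991  # chi-square, df = groups - 1 = 2, 5% level
reject_h0 = h_stat > critical_value

print(f"H = {h_stat:.2f}, reject H0: {reject_h0}")
```

Because only ranks enter the calculation, multiplying every observation by a positive constant would leave H unchanged.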
sign test
Which test? A non-parametric alternative that determines the number of positive and negative differences we have in paired cases.
Because it assumes no correlation between the unobserved variables and the observed variables: assumptions that may introduce bias if they are shown to be false.
Why did Allison (2009) suggest that the random effects model does not really control for unobserved heterogeneity?
To determine whether the biases inherent in the random effects approach are small enough to ignore, or whether the less restrictive fixed effects model is more appropriate.
Why does Allison suggest the use of the Hausman specification test?