Quantitative Analysis
Nonparametric statistics
procedures that allow inferences to be derived from small samples that were not drawn from a normally distributed population
when r = -1 or +1
relationship between two variables is deterministic
Q1 and Q3 position
compute (n + 1)/4 and 3(n + 1)/4, then round each result to the nearest position in the sorted data
Q1 and Q3 using interpolation
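same position calculation, but instead of rounding, interpolate: start at the lower neighboring data point and add the fractional part of the position times the difference between the two neighboring data points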
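A minimal Python sketch contrasting the rounding and interpolation methods for Q1 and Q3. The function names and the ten-point dataset are hypothetical, and rounding a position that ends in .5 upward is an assumed convention.

```python
import math

def quartile_position(n, q):
    """1-based position of quartile q (1 or 3) in sorted data: q(n + 1)/4."""
    return q * (n + 1) / 4

def quartile_rounded(data, q):
    """Quartile found by rounding the position to the nearest data point."""
    xs = sorted(data)
    pos = quartile_position(len(xs), q)
    idx = math.floor(pos + 0.5)       # round half up (assumed convention)
    idx = min(max(idx, 1), len(xs))   # keep the position inside the data
    return xs[idx - 1]

def quartile_interpolated(data, q):
    """Quartile found by interpolating between the two neighboring data points."""
    xs = sorted(data)
    pos = min(max(quartile_position(len(xs), q), 1), len(xs))
    lower = math.floor(pos)
    frac = pos - lower                # fractional part of the position
    upper = min(lower + 1, len(xs))
    return xs[lower - 1] + frac * (xs[upper - 1] - xs[lower - 1])

data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]  # hypothetical, n = 10
print(quartile_rounded(data, 1), quartile_rounded(data, 3))            # 6 16
print(quartile_interpolated(data, 1), quartile_interpolated(data, 3))  # 5.5 16.5
```

Here the Q1 position is 11/4 = 2.75 and the Q3 position is 33/4 = 8.25. Python's statistics.quantiles(data, n=4), whose default "exclusive" method uses the same (n + 1) positions, also returns 5.5 and 16.5 for this data.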
X with line over it
sample mean
n
sample size
S
sample standard deviation
S^2
sample variance
Calculate R Square if it is left blank in the regression output
SSR/SST, or (Multiple R)^2
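A minimal Python sketch of filling in a blank R Square cell two ways. The SSR and SSE values are hypothetical, and SST is formed from the standard identity SST = SSR + SSE.

```python
import math

def r_squared(ssr, sst):
    """R Square = SSR / SST (regression sum of squares over total sum of squares)."""
    return ssr / sst

def r_squared_from_multiple_r(multiple_r):
    """R Square is also the square of Multiple R."""
    return multiple_r ** 2

# Hypothetical ANOVA values (not taken from any real output)
ssr, sse = 800.0, 200.0
sst = ssr + sse  # SST = SSR + SSE
r2 = r_squared(ssr, sst)
print(r2)  # 0.8
```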
Two model building statistics that can be read directly from Excel's regression analysis output are:
Se (labeled "Standard Error" in the output) and adjusted R-squared
µ
population mean
N
population size
σ
population standard deviation
σ^2
population variance
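The sample and population symbols above differ mainly in the divisor: sample variance divides the sum of squared deviations by n - 1, population variance by N. A minimal sketch using Python's standard statistics module on a hypothetical five-value dataset:

```python
import statistics

data = [4, 8, 6, 5, 7]  # hypothetical observations

x_bar = statistics.mean(data)        # sample mean (X with a line over it)
s2 = statistics.variance(data)       # sample variance S^2: divides by n - 1
s = statistics.stdev(data)           # sample standard deviation S
sigma2 = statistics.pvariance(data)  # population variance σ^2: divides by N
sigma = statistics.pstdev(data)      # population standard deviation σ

# The same sample variance computed by hand from squared deviations
ss = sum((x - x_bar) ** 2 for x in data)
manual_s2 = ss / (len(data) - 1)
```

With these values the squared deviations sum to 10, so S^2 = 10/4 = 2.5 while σ^2 = 10/5 = 2.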
Z score calculation
(data point - mean)/standard deviation
Calculate a coefficient if it is left blank in the regression output
(standard error of the coefficient) x (t Stat of the coefficient), since t Stat = coefficient / standard error
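A short Python sketch of the rearrangement: because the t Stat is the coefficient divided by its standard error, multiplying them back recovers the coefficient. The input values and function names are hypothetical.

```python
def coefficient_from_t(std_error, t_stat):
    """Recover a missing coefficient: coefficient = standard error x t Stat."""
    return std_error * t_stat

def t_from_coefficient(coefficient, std_error):
    """The forward direction: t Stat = coefficient / standard error."""
    return coefficient / std_error

print(coefficient_from_t(2.5, 4.0))  # 10.0
```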
The entire area under a probability density function equates to
1
Adjusted R-squared formula
1 - [SSE/(n - k - 1)] / [SST/(n - 1)], where n = sample size and k = number of predictor variables
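A minimal Python sketch of adjusted R-squared next to ordinary R-squared, plus Se, the other model-building statistic Excel reports. The SSE, SST, n, and k values are hypothetical.

```python
import math

def adjusted_r_squared(sse, sst, n, k):
    """1 - [SSE/(n - k - 1)] / [SST/(n - 1)]; n = observations, k = predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

def standard_error_of_estimate(sse, n, k):
    """Se = sqrt(SSE / (n - k - 1)), reported by Excel as "Standard Error"."""
    return math.sqrt(sse / (n - k - 1))

# Hypothetical ANOVA values: SSE = 200, SST = 1000, 31 observations, 2 predictors
adj = adjusted_r_squared(200.0, 1000.0, n=31, k=2)
plain = 1 - 200.0 / 1000.0  # ordinary R-squared for comparison
print(round(plain, 4), round(adj, 4))  # 0.8 0.7857
```

The degrees-of-freedom divisors are what penalize extra predictors: adding a variable raises k, so adjusted R-squared only improves if SSE falls enough to compensate.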
Possible consequences of overfitting a regression model
1) an inflated estimate of population variance 2) excessively wide prediction confidence intervals 3) multicollinearity
Characteristics of a standard normal probability density function
1) extends to infinity in both directions 2) mean is 0 3) variance is 1
3 reasons a client might value statistical analysis
1) greater confidence in the value opinion 2) understanding of reasonable price ranges 3) understanding of the appraisal work product
5 reasons appraisers should care about statistics
1) may improve work product 2) may be valued by clients 3) understanding of AVMs 4) keeping up to date with valuation literature 5) responding to and reviewing the work of others
3 criteria for assessing normality
1) mean and median are about equal 2) IQR is about 1.33 times S 3) distribution is bell shaped
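The first two normality criteria can be computed directly; the third (bell shape) is judged visually from a histogram. A minimal Python sketch using the standard statistics module, with a hypothetical dataset; statistics.quantiles with its default "exclusive" method uses (n + 1)-based quartile positions.

```python
import statistics

def normality_checks(data):
    """Compute the two numeric normality criteria: mean vs. median, IQR vs. S."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # (n + 1)-based quartiles
    iqr = q3 - q1
    return {"mean": mean, "median": median, "iqr_over_s": iqr / s}

data = [10, 12, 13, 14, 15, 15, 16, 17, 18, 20]  # hypothetical
result = normality_checks(data)
```

A ratio well away from 1.33 hints at non-normality, though small samples like this one naturally drift from the theoretical value.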
five assumptions underlying linear regression modeling are:
1) the relationship between y and x is linear 2) expected value of regression errors is 0 3) variance of regression errors is constant 4) regression errors are normally distributed 5) regression errors are independent
A sample dataset will optimally include ________ observations for each predictor variable included in a linear regression model
10 to 15
How many classes are generally recommended for histograms
5 to 15
1S, 2S, 3S percentages on a normal distribution
68%, 95%, 99.7% (approximately)
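These percentages can be checked with Python's standard statistics.NormalDist, a minimal sketch that takes the area of the standard normal curve between -k and +k standard deviations:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k}S: {share:.1%}")
# within 1S: 68.3%
# within 2S: 95.4%
# within 3S: 99.7%
```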
coefficient
Adjustment factors in the sales comparison approach are analogous to regression
Population; sample
Consists of all items being studied; is selected from a population
In regression output, how do you calculate regression error?
Difference between an actual data point and the value the regression equation predicts for the same x input
one-tailed statistical test
Ho: µ ≤ 0 (tested against Ha: µ > 0)
In regression output, how do you determine if the correlation between y and x is direct or inverse?
If the coefficient on x is positive, the relationship is direct; if it is negative, the relationship is inverse
Three most typically used measures of central tendency
Mean, Median, Mode (mean is most apt to be impacted by an extreme value)
ratio of SSR to SST =
R-squared
t statistic is used in lieu of the z statistic whenever:
S is used to estimate σ
What is the correlation between the data's actual and predicted values?
The "Multiple R"
Representative Data
The primary determinant of the validity of an opinion of value derived from a linear regression model
If x = 20, µ = 10, and σ = 8
Z = (x - µ)/σ = (20 - 10)/8 = 1.25
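A one-line Python helper for the Z score formula, checked against the x = 20, µ = 10, σ = 8 example; the function name is hypothetical.

```python
def z_score(x, mean, std_dev):
    """Z = (x - mean) / standard deviation."""
    return (x - mean) / std_dev

print(z_score(20, 10, 8))  # 1.25
```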
panel data
a combination of cross-sectional data (common point in time) and time-series data (multiple points in time)
probability density function (or probability distribution)
a mathematical function that defines a continuous curve where the total area under the curve equals one and the area under the curve within a given interval equates to the probability of an outcome being within the interval
subjective probability
a nonscientific personal evaluation of the relative likelihood of unknown events
time trend
a simple linear regression line fit to a time series scatter plot
trendline
a straight or curved line superimposed on a scatter plot indicating the nature of the relationship between two variables
using indicator variables and interaction variables together
allows an analyst to derive multiple equations having different intercepts and slopes
using indicator variables to account for a time construct allows an analyst to:
account for market conditions in panel data
Indicator (dummy) variables
allow an analyst to derive multiple equations having similar slopes but different intercepts
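A minimal Python sketch of how indicator and interaction coefficients translate into separate equations, assuming a hypothetical model y = b0 + b1*x + b2*D + b3*(D*x) where D is a 0/1 indicator. All coefficient values are made up for illustration.

```python
def equation_for_group(d, b0, b1, b2, b3=0):
    """Intercept and slope implied by y = b0 + b1*x + b2*D + b3*(D*x) for D = d.

    D is a 0/1 indicator; b3 multiplies the interaction term D*x.
    """
    intercept = b0 + b2 * d
    slope = b1 + b3 * d
    return intercept, slope

# Hypothetical coefficients, e.g. x = building size, D = 1 for a waterfront site
b0, b1, b2, b3 = 50_000, 100, 15_000, 20

# Indicator only: same slope, different intercepts
print(equation_for_group(0, b0, b1, b2), equation_for_group(1, b0, b1, b2))
# (50000, 100) (65000, 100)

# Indicator plus interaction: intercepts and slopes both differ
print(equation_for_group(0, b0, b1, b2, b3), equation_for_group(1, b0, b1, b2, b3))
# (50000, 100) (65000, 120)
```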
time series data
consist of observations tracking change in a variable of interest over time
alternative hypothesis and research hypothesis are
always the same
At a given level of confidence, prediction intervals are:
always wider than prediction confidence intervals
coefficient of variation (COV)
an expression of the sample standard deviation as a percentage of the sample mean
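A minimal Python sketch of the COV using the standard statistics module; the dataset is hypothetical.

```python
import statistics

def coefficient_of_variation(data):
    """COV = sample standard deviation / sample mean, as a percentage."""
    return statistics.stdev(data) / statistics.mean(data) * 100

data = [4, 8, 6, 5, 7]  # hypothetical: mean 6, S = sqrt(2.5)
cov = coefficient_of_variation(data)
print(f"{cov:.1f}%")  # 26.4%
```

Because it is unitless, the COV lets an analyst compare dispersion across datasets measured on different scales.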
P(-0.5 ≤ Z ≤ 1)
the probability that Z falls between -0.5 and 1; the typical question gives a mean, a standard deviation, and two data points, each converted with Z = (data point - mean)/standard deviation
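A minimal Python sketch, assuming a hypothetical mean of 100, standard deviation of 20, and data points 90 and 120, which convert to Z = -0.5 and Z = 1. The probability is the difference of two standard normal cumulative areas, computed with the standard statistics.NormalDist.

```python
from statistics import NormalDist

mean, sd = 100, 20            # hypothetical
z_low = (90 - mean) / sd      # -0.5
z_high = (120 - mean) / sd    # 1.0

std_normal = NormalDist()     # mean 0, standard deviation 1
p = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(round(p, 4))  # 0.5328
```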
cross-sectional data
consists of observations on similar events at a common point in time
When variables are not correctly being measured they lack
construct validity
scatter plot
can provide a pictorial illustration of the strength of a relationship between two variables, variable ranges, and how the subject property value conclusion comports with the data
which sampling is appropriate for situations involving naturally occurring geographic groups
cluster sampling
Multicollinearity
correlation among two or more explanatory variables (x, independent, predictor) in a multiple regression model
Standard A refers to production of a (an) ___________ appraisal
credible
squared deviation
deviation^2
constant error variance
the variance of the regression errors should be constant across the range of x values; a scatter plot that fans out as x increases contradicts this assumption
Data that are not representative of the population being studied lack
external validity
What type of validity is threatened when the data is not representative of the population being studied?
external validity
Use two NORM.DIST() functions in Excel to:
find the area under a distribution curve between two input values (by subtracting the cumulative result for the lower value from the result for the higher value)
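A Python sketch of the same two-call approach. Excel's NORM.DIST(x, mean, standard_dev, TRUE) returns a cumulative area; statistics.NormalDist(mu, sigma).cdf(x) is the standard-library counterpart used here. The sale-price parameters are hypothetical.

```python
from statistics import NormalDist

def prob_between(low, high, mean, std_dev):
    """Area under a normal curve between low and high.

    Mirrors =NORM.DIST(high, mean, sd, TRUE) - NORM.DIST(low, mean, sd, TRUE).
    """
    dist = NormalDist(mu=mean, sigma=std_dev)
    return dist.cdf(high) - dist.cdf(low)

# Hypothetical: prices with mean 250,000 and standard deviation 25,000
print(round(prob_between(225_000, 300_000, 250_000, 25_000), 4))  # 0.8186
```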
why are forecasts inherently more uncertain than predictions?
forecasts predict outside the data range
way to reduce the prospect of sampling error
increase sample size
deviation
individual data point - population mean
In a multiple linear regression model, the t statistic is used to:
individually test the null hypotheses that each of the regression coefficients is 0
it is unethical to create an _________ misleading chart
intentionally
IQR
interquartile range: Q3 - Q1
linear regression is considered "best fitting" when:
it minimizes the sum of squared errors
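A minimal ordinary least squares sketch in plain Python, using hypothetical data. It computes the slope and intercept that minimize the sum of squared errors, then shows that nudging the slope raises SSE; the intercept formula also places the point of means on the line.

```python
def fit_simple_regression(xs, ys):
    """Ordinary least squares: the slope and intercept that minimize SSE."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept; puts (x_bar, y_bar) on the line
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """SSE: sum of (actual y - predicted y)^2, i.e. squared regression errors."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = fit_simple_regression(xs, ys)
best = sum_squared_errors(xs, ys, a, b)
print(round(a, 2), round(b, 2), round(best, 2))  # 2.2 0.6 2.4

# Nudging the slope away from the least squares value raises SSE
assert sum_squared_errors(xs, ys, a, b + 0.1) > best
```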
valid measures
lack bias and capture the true meaning of what is being measured
ideal chart uses the ______ ink possible
least
if data is left-skewed, mean is _____ than median
less
probability
likelihood that a particular event will occur
An outcome variable has been growing exponentially, and growth was estimated to conform to the equation ln y = 1.5 + 0.2t. What is the extrapolated forecast y value for future time period t = 10?
ln y = 1.5 + 2 = 3.5. y = e^3.5 = 33.12
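A minimal Python sketch of the back-transformation: evaluate the log-linear equation at t, then exponentiate to return to the original units. The function name is hypothetical.

```python
import math

def forecast_exponential(intercept, growth, t):
    """Forecast y from ln y = intercept + growth * t by exponentiating."""
    ln_y = intercept + growth * t
    return math.exp(ln_y)

print(round(forecast_exponential(1.5, 0.2, 10), 2))  # 33.12
```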
in a simple linear regression, prediction confidence intervals are always narrowest at:
the mean of x (x̄), where the fitted line passes through the point of means (x̄, ȳ)
We assess model improvement using the Se and adjusted R-square by
minimizing Se and maximizing adjusted R-squared
five-number summary
minimum, Q1, median, Q3, maximum
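A minimal Python sketch of the five-number summary, relying on statistics.quantiles with its default "exclusive" method, which places quartiles at (n + 1)-based positions and interpolates between neighbors. The dataset is hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    xs = sorted(data)
    q1, median, q3 = statistics.quantiles(xs, n=4)  # default method="exclusive"
    return xs[0], q1, median, q3, xs[-1]

summary = five_number_summary([7, 15, 36, 39, 40, 41])  # hypothetical
print(summary)  # (7, 13.0, 37.5, 40.25, 41)
iqr = summary[3] - summary[1]  # Q3 - Q1 = 27.25
```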
histogram classes should be
mutually exclusive and collectively exhaustive
unlike histograms, bar charts and pie charts represent:
nominal categories
Frequency distribution tables are used to:
organize numerical data by ordinal categories called "classes"
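A minimal Python sketch of a frequency distribution table with equal-width classes. Each class is half-open [lower, upper), so classes are mutually exclusive, and a value outside every class raises an error, which enforces that they are collectively exhaustive. The function name, class width, and data are hypothetical.

```python
def frequency_table(data, lower, width, num_classes):
    """Count observations in equal-width classes [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * num_classes
    for x in data:
        i = int((x - lower) // width)
        if not 0 <= i < num_classes:
            raise ValueError(f"{x} falls outside the classes")
        counts[i] += 1
    return [(lower + i * width, lower + (i + 1) * width, c)
            for i, c in enumerate(counts)]

table = frequency_table([12, 15, 21, 22, 28, 33, 35, 41], lower=10, width=10, num_classes=4)
for lo_edge, hi_edge, count in table:
    print(f"{lo_edge} to under {hi_edge}: {count}")
```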
sample standard deviation
square root of sample variance
In regression output, what is the y, x correlation coefficient
square root of the "R Square" cell (the "Multiple R" value); in a simple regression, give it the same sign as the x coefficient
which sampling is appropriate for situations where the analyst wants to assure proportional sample representation for subpopulations
stratified random sampling
sample variance
sum of the squared deviations from the mean, divided by n-1
In a multiple linear regression model, the F statistic is used to:
test the null hypothesis that all of the regression (slope) coefficients are 0
population variance
the average of the squared deviations in a population
Reliability
the extent to which the same results would have been obtained in repeated trials
correlation; correlation coefficient
the extent to which two variables move together; can range from -1 to +1
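A minimal Python sketch of the Pearson correlation coefficient computed from deviations around the means; the datasets are hypothetical. Perfectly linear data give r = +1 or -1, matching the deterministic case.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r, which ranges from -1 to +1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical, r = 6 / sqrt(60)
print(round(r, 4))                                 # 0.7746
print(correlation([1, 2, 3], [2, 4, 6]))           # 1.0 (direct, deterministic)
print(correlation([1, 2, 3], [6, 4, 2]))           # -1.0 (inverse, deterministic)
```

In a simple regression, r squared here (0.6) equals the R Square of the fitted line.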
Forecasting
the primary exception to not predicting outside of the data range
conditional probability
the probability of an event occurring when the probability of occurrence DOES depend on the outcome of a prior event
simple probability
the probability of an event occurring when the probability of occurrence does NOT depend on the outcome of a prior event
P-value reported in linear regression output indicates:
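the probability of obtaining a coefficient estimate at least as far from 0 as the one observed if the true population coefficient were 0 (a lower P-value is stronger evidence that the true coefficient is not 0)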
ratio of SSR to SST = R-squared means:
the proportion of variance in the response variable accounted for by the regression equation model
Central Limit Theorem
the sampling distribution of the mean is approximately normal when n >= 30
The research design process is similar to
the valuation process
extrapolative regression model
uses past values of the dependent variable to generate forecasts
Sampling error
variance from the expectation that the central tendency and shape of a sample will mirror the parent population
a simple linear regression line always passes through
the point (x̄, ȳ), i.e., the means of x and y
In the linear equation y = a + bx + e
y = dependent (outcome, response) variable; x = independent (predictor, explanatory) variable; a = y-intercept; b = slope; e = regression error
uniform probability distribution
y = f(x) = 1/(b-a), when a <= x <= b and 0 elsewhere
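A minimal Python sketch of the uniform density and the probability of an interval, which is just the area of a rectangle of height 1/(b - a). The function names and the range 0 to 10 are hypothetical.

```python
def uniform_pdf(x, a, b):
    """f(x) = 1/(b - a) for a <= x <= b, and 0 elsewhere."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_prob_between(low, high, a, b):
    """P(low <= X <= high): rectangle area under the density, clipped to [a, b]."""
    left, right = max(low, a), min(high, b)
    return max(right - left, 0) / (b - a)

print(uniform_pdf(4, 0, 10))                # 0.1
print(uniform_prob_between(2, 5, 0, 10))    # 0.3
print(uniform_prob_between(0, 10, 0, 10))   # 1.0 (total area under the curve)
```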
A variable is considered significant in a regression model when:
you can reject the null hypothesis that the coefficient is 0 (e.g., when its 95% confidence interval does not include 0)