Quantitative Analysis

Nonparametric statistics

procedures that allow inferences to be derived from small samples that were not drawn from a normal population

when r = -1 or +1

relationship between two variables is deterministic

Q1 and Q3 position

compute the positions (n + 1)/4 and 3(n + 1)/4 in the sorted data and round each to the nearest whole position

Q1 and Q3 using interpolation

same position calculation, but instead of rounding, interpolate between the two adjacent data points using the fractional part of the position
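The position rule with interpolation can be sketched as follows. This is a minimal illustration, not a prescribed method: the function name and the data values are hypothetical, and the data are assumed to contain at least three observations so that both positions fall inside the sorted list.

```python
import math

def quartile_by_interpolation(sorted_values, fraction):
    """Value at 1-based position fraction * (n + 1), interpolating between neighbors."""
    n = len(sorted_values)
    position = fraction * (n + 1)
    lower = math.floor(position)          # whole part of the position
    weight = position - lower             # fractional part of the position
    lower = min(max(lower, 1), n)         # keep the position inside the data
    upper = min(lower + 1, n)
    low_value = sorted_values[lower - 1]
    high_value = sorted_values[upper - 1]
    return low_value + weight * (high_value - low_value)

# Hypothetical data, used only to show the arithmetic.
data = sorted([12, 15, 18, 20, 22, 25, 30, 34])
q1 = quartile_by_interpolation(data, 0.25)   # position 2.25
q3 = quartile_by_interpolation(data, 0.75)   # position 6.75
print(q1, q3)
```

With n = 8, Q1 sits a quarter of the way from the 2nd to the 3rd value and Q3 three quarters of the way from the 6th to the 7th.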

X with line over it

sample mean

n

sample size

S

sample standard deviation

S^2

sample variance

calculate R square if left blank in regression model

SSR/SST, or "Multiple R"^2

Two model building statistics that can be read directly from Excel's regression analysis output are:

Se and adjusted R-squared

µ

population mean

N

population size

σ

population standard deviation

σ^2

population variance

Z score calculation

(data point - mean)/standard deviation

calculate coefficient if left blank in regression model

(standard error of variable) x (t stat of variable)

The entire area under a probability density function equates to

1

Adjusted R-squared formula

1 - [SSE/(n - k - 1)] / [SST/(n - 1)], where n is the sample size and k is the number of predictor variables
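As a rough sketch of the adjusted R-squared arithmetic, assuming n observations and k predictor variables: the function name and the input figures below are hypothetical and chosen only to make the calculation easy to follow.

```python
def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R-squared from the error and total sums of squares.

    n is the number of observations, k the number of predictor variables.
    """
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))

# Hypothetical figures, not taken from any real regression output.
r_squared = 1.0 - 20.0 / 100.0
adj = adjusted_r_squared(sse=20.0, sst=100.0, n=25, k=4)
print(r_squared, round(adj, 4))
```

The adjusted figure comes out below the unadjusted R-squared because each sum of squares is divided by its degrees of freedom.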

Possible consequences of overfitting a regression model

1) an inflated estimate of population variance 2) excessively wide prediction confidence intervals 3) multicollinearity

Characteristic of a standard normal probability density function

1) extends to infinity in both directions 2) mean is 0 3) variance is 1

3 reasons a client might value statistical analysis

1) greater confidence in the value opinion 2) understanding of reasonable price ranges 3) understanding of the appraisal work product

5 reasons appraisers should care about statistics

1) may improve work product 2) may be valued by clients 3) understanding of AVMs 4) keeping up to date with valuation literature 5) responding to and reviewing the work of others

3 criteria for assessing normality

1) mean and median are about equal 2) IQR is 1.33 times S 3) distribution is bell shaped

five assumptions underlying linear regression modeling are:

1) the relationship between y and x is linear 2) expected value of regression errors is 0 3) variance of regression errors is constant 4) regression errors are normally distributed 5) regression errors are independent

A sample dataset will optimally include ________ observations for each predictor variable included in a linear regression model

10 to 15

How many classes are generally recommended for histograms

5 to 15

1S, 2S, 3S percentages on a normal distribution

68%, 95%, 99.7%
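These percentages can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `share_within` is made up for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k}S: {share:.1%}")
```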

Adjustment factors in the sales comparison approach are analogous to regression ________

coefficients

Population; sample

Consists of all items being studied; is selected from a population

In regression output, how do you calculate regression error?

Difference between actual data point and regression output based on an x input

one-tailed statistical test

Ho: µ ≤ 0

In regression output, how do you determine if the correlation between y and x is direct or inverse?

If coefficient on x is positive it's direct

Three most typically used measures of central tendency

Mean, Median, Mode (mean is most apt to be impacted by an extreme value)

ratio of SSR to SST =

R-squared

t statistic is used in lieu of the z statistic whenever:

S is used to estimate σ

What is the correlation between the data's actual and predicted values?

The "Multiple R"

Representative Data

The primary determinant of the validity of an opinion of value derived from a linear regression model

If x = 20, µ = 10, and σ = 8

Z score is 1.25: (x - µ)/σ = (20 - 10)/8
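The same arithmetic as a short sketch, using the values given on this card; the function name is hypothetical.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev

z = z_score(x=20, mean=10, std_dev=8)
print(z)
```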

panel data

a combination of cross-sectional data (common point in time) and time-series data (multiple points in time)

probability density function (or probability distribution)

a mathematical function that defines a continuous curve where the total area under the curve equals one and the area under the curve within a given interval equates to the probability of an outcome being within the interval

subjective probability

a nonscientific personal evaluation of the relative likelihood of unknown events

time trend

a simple linear regression line fit to a time series scatter plot

trendline

a straight or curved line superimposed on a scatter plot indicating the nature of the relationship between two variables

using indicator variables and interaction variables together

allows an analyst to derive multiple equations having different intercepts and slopes

using indicator variables to account for a time construct allows an analyst to:

account for market conditions in panel data

Indicator (dummy) variables

allow an analyst to derive multiple equations having similar slopes but different intercepts

time series data

consist of change in a variable of interest over time

alternative hypothesis and research hypothesis are

always the same

At a given level of confidence, prediction intervals are:

always wider than prediction confidence intervals

coefficient of variation (COV)

an expression of the sample standard deviation as a percentage of the sample mean
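A minimal sketch of the COV calculation using the standard `statistics` module. The sale prices are invented for illustration, and `stdev` is the sample (n - 1) standard deviation.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a fraction of the sample mean."""
    return stdev(values) / mean(values)

# Hypothetical sale prices in thousands.
sale_prices = [200, 220, 240, 260, 280]
cov = coefficient_of_variation(sale_prices)
print(f"{cov:.1%}")
```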

P(-0.5 ≤ Z ≤ 1)

answer to a probability question where you are given a mean, a standard deviation, and two data points where: Z = (data point-mean)/standard deviation

cross-sectional data

consists of observations on similar events at a common point in time

When variables are not correctly being measured they lack

construct validity

scatter plot

can provide a pictorial illustration of the strength of a relationship between two variables, variable ranges, and how the subject property value conclusion comports with the data

which sampling is appropriate for situations involving naturally occurring geographic groups

cluster sampling

Multicollinearity

correlation among two or more explanatory variables (x, independent, predictor) in a multiple regression model

Standard A refers to production of a (an) ___________ appraisal

credible

squared deviation

deviation^2

constant error variance

the variance of the regression errors should be constant across the range of x values; a scatter plot that fans out as x increases contradicts this assumption

Data that are not representative of the population being studied lack

external validity

What type of validity is threatened when the data is not representative of the population being studied?

external validity

use two norm.dist() functions in excel to:

find the area under a distribution curve between two input values (by calculating difference between results)
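The same difference-of-two-cumulative-values idea can be sketched in Python, where `NormalDist.cdf` plays the role of Excel's NORM.DIST with the cumulative argument set to TRUE. The mean, standard deviation, and bounds below are hypothetical and were picked so that the bounds correspond to Z = -0.5 and Z = 1.

```python
from statistics import NormalDist

# Hypothetical distribution: mean 100, standard deviation 20.
distribution = NormalDist(mu=100.0, sigma=20.0)
lower_bound, upper_bound = 90.0, 120.0   # Z = -0.5 and Z = 1

# Area between the bounds = cumulative area up to the upper bound
# minus cumulative area up to the lower bound.
area_between = distribution.cdf(upper_bound) - distribution.cdf(lower_bound)
print(round(area_between, 4))
```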

why are forecasts inherently more uncertain than predictions?

forecasts predict outside the data range

way to reduce the prospect of sampling error

increase sample size

deviation

individual data point - population mean

In a multiple linear regression model, the t statistic is used to:

individually test the null hypotheses that each of the regression coefficients is 0

it is unethical to create an _________ misleading chart

intentionally

IQR

interquartile range: Q3 - Q1

linear regression is considered "best fitting" when:

it minimizes the sum of squared errors

valid measures

lack bias and capture the true meaning of what is being measured

ideal chart uses the ______ ink possible

least

if data is left-skewed, mean is _____ than median

less

probability

likelihood that a particular event will occur

An outcome variable has been growing exponentially, and growth was estimated to conform to the equation ln y = 1.5 + 0.2t. What is the extrapolated forecast y value for future time period t = 10?

ln y = 1.5 + 0.2(10) = 3.5, so y = e^3.5 = 33.12
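The forecast can be reproduced with a few lines. The intercept 1.5 and slope 0.2 come from the card; the function name and default arguments are only an illustration.

```python
import math

def forecast_from_log_trend(t, intercept=1.5, slope=0.2):
    """Convert a fitted ln(y) = intercept + slope * t trend back to y."""
    log_y = intercept + slope * t
    return math.exp(log_y)

y_forecast = forecast_from_log_trend(t=10)
print(round(y_forecast, 2))
```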

in a simple linear regression, prediction confidence intervals are always narrowest at:

mean of x and y

We assess model improvement using the Se and adjusted R-square by

minimizing Se and maximizing adjusted R-squared

five-number summary

minimum, Q1, median, Q3, maximum

histogram classes should be

mutually exclusive and collectively exhaustive

unlike histograms, bar charts and pie charts represent:

nominal categories

Frequency distribution tables are used to:

organize numerical data by ordinal categories called "classes"

sample standard deviation

square root of sample variance

In regression output, what is the y, x correlation coefficient

square root of the "R square" cell

which sampling is appropriate for situations where the analyst wants to assure proportional sample representation for subpopulations

stratified random sampling

sample variance

sum of the squared deviations from the mean, divided by n-1
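As a sketch, the definition can be computed by hand and compared with `statistics.variance`, which also divides by n - 1. The data values are hypothetical.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    sample_mean = sum(values) / n
    squared_deviations = [(value - sample_mean) ** 2 for value in values]
    return sum(squared_deviations) / (n - 1)

data = [4, 8, 6, 5, 7]                     # hypothetical observations
s_squared = sample_variance(data)
s = math.sqrt(s_squared)                   # sample standard deviation
print(s_squared, statistics.variance(data), round(s, 4))
```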

In a multiple linear regression model, the F statistic is used to:

test the null hypothesis that all of the regression coefficients are 0

population variance

the average of the squared deviations in a population

Reliability

the extent to which the same results would have been obtained in repeated trials

correlation; correlation coefficient

the extent to which two variables move together; can range from -1 to +1

Forecasting

the primary exception to not predicting outside of the data range

conditional probability

the probability of an event occurring when the probability of occurrence DOES depend on the outcome of a prior event

simple probability

the probability of an event occurring when the probability of occurrence does NOT depend on the outcome of a prior event

P-value reported in linear regression output indicates:

the probability of obtaining a coefficient estimate at least as extreme as the one observed if the true population coefficient were 0 (a lower P-value indicates stronger evidence that the coefficient is not 0)

ratio of SSR to SST = R-squared means:

the proportion of variance in the response variable accounted for by the regression equation model
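To make the ratio concrete, the sketch below fits a simple least-squares line by hand to a small hypothetical dataset and splits the total sum of squares (SST) into the regression part (SSR) and the error part (SSE). All names and data values are illustrative assumptions.

```python
def fit_simple_regression(xs, ys):
    """Least-squares intercept a and slope b for y = a + b * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

xs = [1, 2, 3, 4, 5]            # hypothetical predictor values
ys = [2, 4, 5, 4, 5]            # hypothetical response values
a, b = fit_simple_regression(xs, ys)

y_mean = sum(ys) / len(ys)
predictions = [a + b * x for x in xs]
sst = sum((y - y_mean) ** 2 for y in ys)                         # total variation
ssr = sum((p - y_mean) ** 2 for p in predictions)                # explained by the line
sse = sum((y - p) ** 2 for y, p in zip(ys, predictions))         # left in the errors
r_squared = ssr / sst
print(round(r_squared, 4))
```

For a least-squares fit with an intercept, SSR and SSE add up to SST, so SSR/SST and 1 - SSE/SST give the same value.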

Central Limit Theorem

the sampling distribution of the mean is approximately normal when n >= 30, regardless of the shape of the parent population

The research design process is similar to

the valuation process

extrapolative regression model

uses past values of the dependent variable to generate forecasts

Sampling error

variance from the expectation that the central tendency and shape of a sample will mirror the parent population

a simple linear regression line always passes through

the point defined by the means of x and y (x and y, each with a line over it)

In the linear equation y = a + bx + e

y = dependent (outcome, response) x = independent (predictor, explanatory) a = y intercept b = slope e = regression error

uniform probability distribution

y = f(x) = 1/(b-a), when a <= x <= b and 0 elsewhere
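A small sketch of this density, with hypothetical endpoints a = 2 and b = 6. Because the curve is flat, the probability of an interval inside [a, b] is simply its width times the height 1/(b - a).

```python
def uniform_pdf(x, a, b):
    """Height of the uniform density on [a, b]; zero outside the interval."""
    if b <= a:
        raise ValueError("b must be greater than a")
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_interval_probability(low, high, a, b):
    """P(low <= X <= high) for X uniform on [a, b]."""
    overlap = max(0.0, min(high, b) - max(low, a))
    return overlap / (b - a)

height_inside = uniform_pdf(3.0, a=2.0, b=6.0)
height_outside = uniform_pdf(7.0, a=2.0, b=6.0)
prob_3_to_5 = uniform_interval_probability(3.0, 5.0, a=2.0, b=6.0)
print(height_inside, height_outside, prob_3_to_5)
```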

A variable is considered significant in a regression model when:

you can reject the null hypothesis that the coefficient is 0 (when the 95% confidence interval for the coefficient does not include 0)

