bstat multiple choice
required sample size
n = (z_(α/2) × σ / E)^2, where σ is the population standard deviation and E is the desired margin of error
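A minimal sketch of the sample-size formula above, using only the Python standard library. The function name and the numbers (σ = 15, E = 2, 95% confidence) are illustrative assumptions, not values from these notes.

```python
import math
from statistics import NormalDist

def required_sample_size_mean(sigma, margin_of_error, confidence=0.95):
    """Smallest n for which z_(alpha/2) * sigma / sqrt(n) <= margin_of_error."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
    # Always round up: rounding down would give a margin larger than desired.
    return math.ceil((z * sigma / margin_of_error) ** 2)

# Hypothetical inputs chosen only for illustration.
n_mean = required_sample_size_mean(sigma=15, margin_of_error=2)
print(n_mean)
```

Halving the margin of error roughly quadruples the required n, because E is squared in the denominator.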
4 characteristics of mean
-we need at least *interval* data -unique -sum of deviations = 0 -can be affected by outliers
calculated by dividing a data set's standard deviation by its mean,
CV is a unitless measure that allows for direct comparisons of mean-adjusted dispersion across different data sets
the squaring of differences from the mean emphasizes larger differences more than small ones while
MAD weighs large and small differences equally
interquartile range
Q3 - Q1
computing x for given probabilities
x = μ + zσ
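As a sketch of the inverse transformation above: find the z value for the given cumulative probability, then convert it back to the original scale. The mean, standard deviation, and probability below are made-up illustration values.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0   # hypothetical population mean and standard deviation
p = 0.90                  # hypothetical cumulative probability P(X <= x)

z = NormalDist().inv_cdf(p)   # z value with area p to its left
x = mu + z * sigma            # x = mu + z * sigma

print(round(z, 4), round(x, 2))
```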
alternative hypothesis Ha
contradicts the default state or status quo
range is NOT a good measure of dispersion because
it only focuses on the extremes
reject the null if the p-value is less than α
do not reject the null if the p-value is greater than or equal to α
weighted mean
relevant when some observations contribute more than others
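A small sketch of a weighted mean, assuming the weights are nonnegative and do not all equal zero. The scores and weights are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical course: homework 20%, midterm 30%, final 50%.
scores = [80, 90, 70]
weights = [0.2, 0.3, 0.5]
course_grade = weighted_mean(scores, weights)
print(course_grade)
```

The unweighted mean of the same scores is 80; the weighted mean is lower because the lowest score carries the largest weight.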
the higher the sharpe ratio,
the better the investment compensates its investors for risk
covariance is 0 when
y and x have no linear relationship
test statistic
z = (estimated value - hypothesized value) / standard error
normal distribution is
-completely described by two parameters (mean and variance)
empirical rule
within 1 standard deviation of the mean: about 68%; within 2 standard deviations: about 95%; within 3 standard deviations: almost 100% (about 99.7%)
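The empirical-rule percentages can be checked against the standard normal distribution. This is only a verification sketch; the variable names are arbitrary.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# P(-k <= Z <= k) for k = 1, 2, 3 standard deviations
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, prob in within.items():
    print(k, round(prob, 4))
```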
the variability between samples is measured by the standard error of
x-bar; if the standard error is small, the sample means are not only close to one another, they are also close to the unknown population mean
t distribution
t = (x-bar - μ) / (s / √n)
z distribution
z = (x-bar - μ) / (σ / √n)
standard transformation
z = (x - μ) / σ
sharpe ratio
the difference between the mean return and the risk-free rate, divided by the asset's standard deviation; used to characterize how well the return of an asset compensates for the risk that the investor takes
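A sketch of the Sharpe ratio as defined above, using the sample standard deviation of the returns. The two return series and the 2% risk-free rate are invented for illustration.

```python
from statistics import mean, stdev

def sharpe_ratio(returns, risk_free_rate):
    """(mean return - risk-free rate) / sample standard deviation of returns."""
    return (mean(returns) - risk_free_rate) / stdev(returns)

# Hypothetical annual returns: same mean (8%), different variability.
steady_fund = [0.12, 0.04, 0.08]
volatile_fund = [0.20, -0.04, 0.08]
risk_free = 0.02

sharpe_steady = sharpe_ratio(steady_fund, risk_free)
sharpe_volatile = sharpe_ratio(volatile_fund, risk_free)
print(round(sharpe_steady, 2), round(sharpe_volatile, 2))
```

Both funds earn the same average return, but the steadier one has the higher Sharpe ratio because it takes less risk to earn it.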
standard normal distribution
a special case of the normal distribution with a mean of 0 and a standard deviation of 1
exponential distribution
a useful nonsymmetric continuous probability distribution for nonnegative random variables; used when we're interested in times or distances
calculating the pth percentile
a. sort the data from smallest to largest b. compute the location Lp = (n + 1)(p/100) c. if Lp is an integer, it denotes the location of the pth percentile d. if not, interpolate between the two neighboring observations
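The four steps above can be sketched as follows. Clamping to the smallest or largest observation when the location falls outside 1..n is an assumption of this sketch, not something stated in the notes; the data set is hypothetical.

```python
def percentile(data, p):
    """pth percentile using the location L_p = (n + 1) * p / 100."""
    ordered = sorted(data)               # step a: smallest to largest
    n = len(ordered)
    location = (n + 1) * p / 100         # step b: 1-based location
    lower = int(location)                # whole part of the location
    fraction = location - lower          # 0 when the location is an integer
    if lower < 1:                        # assumption: clamp below the first value
        return ordered[0]
    if lower >= n:                       # assumption: clamp above the last value
        return ordered[-1]
    # steps c and d: take the value at the location, interpolating if needed
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

sample = [12, 3, 21, 7, 18, 8, 15]       # hypothetical data, n = 7
q1 = percentile(sample, 25)
median = percentile(sample, 50)
q3 = percentile(sample, 75)
iqr = q3 - q1                            # interquartile range = Q3 - Q1
p40 = percentile(sample, 40)             # non-integer location, interpolated
print(q1, median, q3, iqr, p40)
```

With n = 7 the 25th, 50th, and 75th percentiles land on integer locations (2, 4, and 6), while the 40th percentile has location 3.2 and is interpolated between the third and fourth observations.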
margin of error
accounts for the standard error of the estimator and the desired confidence level of the interval
mean, median, and mode
all equal at the center of a symmetric, bell-shaped distribution such as the normal
covariance
an objective numerical measure that reveals the direction of the linear relationship between two variables
we interpret the geometric mean return as the
annualized return
transformation of normal random variables
any normally distributed random variable can be transformed into the standard normal variable with mean of zero and sd of 1
continuous uniform distribution
appropriate when the underlying random variable has an equally likely chance of assuming a value within a specified range
t distribution consists of a family of distributions where the actual shape of each one depends on the degrees of freedom
as df increases, the t distribution becomes more similar to the z distribution; the two are identical as df approaches infinity
it is also
asymptotic which means the tail gets closer and closer to the axis but never touches it
Mean Absolute Deviation
average of all absolute differences between the observations and the mean
variance (s^2 and σ^2)
average of the squared differences between the observations and the mean
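A short sketch contrasting MAD with variance on a made-up data set. Note that the sample variance s^2 divides the sum of squared deviations by n - 1, while the population variance σ^2 divides by n.

```python
from statistics import mean, pvariance, variance

data = [2, 4, 4, 10]                       # hypothetical observations
center = mean(data)

# MAD: every absolute deviation counts in proportion to its size.
mad = sum(abs(x - center) for x in data) / len(data)

# Variance: squaring makes the large deviation (10 - 5 = 5) dominate.
sample_var = variance(data)                # divides by n - 1
population_var = pvariance(data)           # divides by n

print(mad, sample_var, population_var)
```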
chapter 6 a continuous random variable
characterized by uncountable values because it can take on any value within an interval
main difference between these two rules
chebyshev's applies to all data sets whereas empirical is appropriate when the distribution is symmetric and bell-shaped
normal distributions serve as the
cornerstone of statistical inference
lognormal distribution
defined with reference to the normal distribution; positively skewed and relevant for a positive random variable; useful for describing variables such as income, real estate values, and asset prices
variance describes
dispersion (shape)
most researchers favor the p-value approach since
every statistical software package reports p-values
normal probability distribution also referred to as the "gaussian distribution"
familiar bell-shaped distribution closely approximates the probability distribution for a wide range of random variables of interest
if a random sample is taken from a normal population with a finite variance, then the t statistic
follows the t distribution with (n-1) degrees of freedom
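A sketch of computing the t statistic and its degrees of freedom from a small sample. The data and the hypothesized mean are invented. The Python standard library has no t distribution, so the probability itself would come from a t table or a statistics package such as SciPy.

```python
import math
from statistics import mean, stdev

sample_values = [12, 15, 11, 14, 13]   # hypothetical sample
mu_0 = 12                              # hypothetical hypothesized mean

n = len(sample_values)
x_bar = mean(sample_values)
s = stdev(sample_values)               # sample standard deviation (n - 1)

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
df = n - 1                             # degrees of freedom

print(round(t_stat, 4), df)
```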
approximately 100-p percent have values
greater than the pth percentile
examples of random variables that follow a normal distribution
heights and weights of newborn babies, SAT scores, cumulative debt of college graduates, advertising expenditure of firms, rate of return on an investment
a z-score is a unitless measure since
its numerator and denominator have the same units, which cancel each other out
approximately p percent of observations have values
less than the pth percentile
mean describes
location
finding a z value for a given probability
look up probability in the body of the table
finding a probability for a given z value
looking at the z-chart
positive skewness / negative skewness / symmetric
mean > mode (+) / mode > mean (-) / mode = mean
the mode's usefulness seems to diminish with data sets that have
more than three modes
two or more modes
multimodal
geometric mean
multiplicative average as opposed to an additive average
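A sketch of the geometric mean return as a multiplicative average: compound the growth factors (1 + r), take the nth root, and subtract 1. The two yearly returns are hypothetical.

```python
import math

def geometric_mean_return(returns):
    """nth root of the compounded growth factor, minus 1."""
    growth = math.prod(1 + r for r in returns)   # total growth over all periods
    return growth ** (1 / len(returns)) - 1

yearly_returns = [0.10, -0.10]                   # hypothetical: +10%, then -10%
arithmetic = sum(yearly_returns) / len(yearly_returns)
geometric = geometric_mean_return(yearly_returns)

print(arithmetic, round(geometric, 6))
```

The arithmetic mean is 0, yet the investment ends slightly below where it started (1.10 × 0.90 = 0.99), which the negative geometric mean return reflects.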
selecting n to estimate p
n = (z_(α/2) / E)^2 × p(1 - p)
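A sketch of the sample-size formula for a proportion. The function name, the default planning value of 0.5, and the 5-percentage-point margin are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def required_sample_size_proportion(margin_of_error, confidence=0.95, p_guess=0.5):
    """n = (z_(alpha/2) / E)^2 * p(1 - p), rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z / margin_of_error) ** 2 * p_guess * (1 - p_guess))

n_conservative = required_sample_size_proportion(0.05)             # p = 0.5
n_with_guess = required_sample_size_proportion(0.05, p_guess=0.2)  # hypothetical prior estimate
print(n_conservative, n_with_guess)
```

Because p(1 - p) peaks at p = 0.5, planning with 0.5 gives the largest, most conservative sample size.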
4 levels of measurement ( lowest - highest)
nominal: placing into categories, not measuring; ordinal: ranking involved, one has more or less *must be mutually exclusive*; interval: equal interval between categories; ratio: showing absence of what is being measured (a true zero)
the p-value is the
observed probability of making a type I error
if the minimum and maximum values of the population are available, a rough approximation for the population standard deviation is given by
range/4
implementing a two-tailed test using a confidence interval
reject the null if the hypothesized value of the mean does not fall within the confidence interval
hypothesis test for a population proportion
test statistic for p: z = (p-bar - p0) / √(p0(1 - p0)/n), where p-bar is the sample proportion and p0 is the hypothesized value
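A sketch of the test statistic for a population proportion, with a right-tailed p-value added for illustration. The sample proportion, hypothesized value, and sample size are made up.

```python
import math
from statistics import NormalDist

def proportion_z_statistic(p_bar, p_0, n):
    """z = (sample proportion - hypothesized value) / standard error under the null."""
    return (p_bar - p_0) / math.sqrt(p_0 * (1 - p_0) / n)

# Hypothetical right-tailed test of H0: p <= 0.50 against Ha: p > 0.50.
z_prop = proportion_z_statistic(p_bar=0.56, p_0=0.50, n=100)
p_value_right = 1 - NormalDist().cdf(z_prop)
print(round(z_prop, 2), round(p_value_right, 4))
```

Note that the standard error uses the hypothesized value p0, not the sample proportion, because the test statistic is computed under the assumption that the null is true.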
chebyshev's theorem
the proportion of observations that lie within k standard deviations from the mean is at least 1 - 1/k^2, where k is any number greater than 1
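A sketch of Chebyshev's lower bound for a few values of k. The function name is arbitrary.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

bounds = {k: chebyshev_lower_bound(k) for k in (1.5, 2, 3)}
for k, bound in bounds.items():
    print(k, round(bound, 4))
```

For k = 2 the bound is at least 75%, noticeably weaker than the roughly 95% given by the empirical rule; that is the price of applying to any data set regardless of shape.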
if no other reasonable estimate of the population proportion is available, we can use .5
the required sample is largest when p=.5
range
the simplest measure of dispersion: greatest value - smallest value
the margin of error in a confidence interval depends on
the standard error of the estimator and the desired confidence level
unlike cumulative probabilities in the z table,
the t table provides probabilities in the upper tail of the distribution
chapter 3 central location relates to
the way quantitative data tend to cluster around some middle or central value
for a given confidence level and population standard deviation σ, *the smaller the sample size n*,
the wider the confidence interval
for a given confidence level and sample size n, *the larger the population standard deviation σ*,
the wider the confidence interval
for a given sample size n and population standard deviation σ, the greater the confidence level,
the wider the confidence interval
the precision is directly linked with the width of the confidence interval
the wider the interval, the lower the precision
width of a confidence interval
two times the margin of error
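The three width statements above can be checked numerically for an interval for the mean with known σ. This is a sketch with invented inputs; the helper names are not from the notes.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence):
    """z_(alpha/2) * sigma / sqrt(n) for a mean with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

def interval_width(sigma, n, confidence):
    """Width of the confidence interval: two times the margin of error."""
    return 2 * margin_of_error(sigma, n, confidence)

# Hypothetical baseline, then one factor changed at a time.
base_width = interval_width(sigma=10, n=25, confidence=0.95)
smaller_n_width = interval_width(sigma=10, n=16, confidence=0.95)
larger_sigma_width = interval_width(sigma=20, n=25, confidence=0.95)
higher_conf_width = interval_width(sigma=10, n=25, confidence=0.99)

print(round(base_width, 2), round(smaller_n_width, 2),
      round(larger_sigma_width, 2), round(higher_conf_width, 2))
```

Each change (smaller n, larger σ, higher confidence level) produces a wider and therefore less precise interval than the baseline.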
one mode
unimodal
z-scores
used to find the relative position of a sample value within the data set by dividing the deviation of the sample value from the mean by the standard deviation: z = (x - x-bar) / s
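A sketch of standardizing a small, hypothetical data set. After conversion, the z-scores have mean 0 and sample standard deviation 1.

```python
from statistics import mean, stdev

raw_scores = [60, 70, 80, 90, 100]     # hypothetical observations
x_bar = mean(raw_scores)
s = stdev(raw_scores)                  # sample standard deviation

# z = (x - x_bar) / s for each observation
z_scores = [(x - x_bar) / s for x in raw_scores]

print([round(z, 4) for z in z_scores])
```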
characteristics of a mode
-useful for *nominal* + level data -not affected by outliers -not unique (you can have more than one)
rejecting the null at 1% significance level
very strong evidence that it is false
by reducing the likelihood of a type 1 error,
we increase the likelihood of a type 2 error and vice versa
a two-tailed test
when the alternative hypothesis includes "not equal to"
type 2 error
when we do not reject the null when it is false
type 1 error
when we reject a true hypothesis
all t distributions have slightly broader tails than the
z distribution
the probability that a continuous random variable assumes a particular value x is
zero because we can't assign a nonzero probability to each of the uncountable values
rejecting a null at 5% significance level
strong evidence that it is false
confidence coefficient (1 - α)
the probability that the estimation procedure will generate an interval that contains μ
for a continuous random variable,
it is only meaningful to calculate the probability that the value of a random variable falls within some interval
significance level, alpha
the probability that the estimation procedure will generate an interval that does not contain μ
chapter 8 when a statistic is used to estimate a parameter,
it is referred to as a point estimator and a particular value of the estimator is called a point estimate
characteristics of the median
-at least *ordinal* level data -not affected by outliers -unique (only one)
correlation coefficient
-describes both the direction and the strength of a linear relationship between x and y -unit-free -value falls between -1 and 1
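A sketch of the sample covariance and correlation coefficient computed by hand, so it does not depend on a particular Python version. The paired data are hypothetical.

```python
import math
from statistics import mean

x = [1, 2, 3, 4, 5]      # hypothetical paired observations
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = mean(x), mean(y)

# Sample covariance: average cross-product of deviations, dividing by n - 1.
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of x and y.
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Correlation: covariance rescaled to be unit-free and between -1 and 1.
r_xy = cov_xy / (s_x * s_y)

print(round(cov_xy, 4), round(r_xy, 4))
```

The covariance shows only the direction (positive here), while the correlation coefficient also conveys the strength of the linear relationship.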
probability density function f(x) has the following properties
-f(x) ≥ 0 for all possible values x of X -the area under f(x) over all values x of X equals one
three steps when formulating the competing hypothesis
1. identify the relevant population parameter of interest 2. determine whether it's a one-sided or two-sided test 3. include some form of the equality sign in the null hypothesis and use the alternative hypothesis to establish a claim
the median is also called the
50th percentile
selecting the required sample size
if we are able to increase the size of the sample, the larger n reduces the margin of error for the interval estimates
a positive value indicates a positive linear relationship
if x is above its mean, then y tends to be above its mean and vice versa
negative value indicates negative linear relationship
if x is above the mean, y tends to be below and vice versa
a one-tailed test
involves a null hypothesis that can only be rejected on one side of the hypothesized value
the allowed probability of making a type 1 error (rejecting a true hypothesis)
is α, the significance level
like the z distribution, the t distribution
is bell shaped and symmetric around 0 with asymptotic tails
main advantage of chebyshev's theorem
it applies to all data sets, regardless of the shape of the distribution
informally, we can report with 95% confidence that μ lies in the interval
it is not correct to say that there is a 95% chance that μ lies in the given interval
confidence interval for population proportion
p-bar ± z_(α/2) × √(p-bar(1 - p-bar)/n), where p-bar is the sample proportion
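A sketch of the confidence interval for a population proportion. The sample proportion of 0.40 and sample size of 100 are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(p_bar, n, confidence=0.95):
    """p_bar +/- z_(alpha/2) * sqrt(p_bar * (1 - p_bar) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - margin, p_bar + margin

ci_lower, ci_upper = proportion_confidence_interval(p_bar=0.40, n=100)
print(round(ci_lower, 3), round(ci_upper, 3))
```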
hypothesis test for the population mean when the population standard deviation is known
p value approach and the critical value approach
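A sketch showing that the p-value approach and the critical value approach lead to the same decision in a two-tailed z test with known σ. All inputs are invented for illustration.

```python
import math
from statistics import NormalDist

def z_test_two_tailed(x_bar, mu_0, sigma, n, alpha=0.05):
    """Return z, the two-tailed p-value, and the decision under each approach."""
    std_normal = NormalDist()
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - std_normal.cdf(abs(z)))       # area in both tails
    critical = std_normal.inv_cdf(1 - alpha / 2)     # z_(alpha/2)
    reject_by_p_value = p_value < alpha
    reject_by_critical_value = abs(z) > critical
    return z, p_value, reject_by_p_value, reject_by_critical_value

# Hypothetical test of H0: mu = 50 against Ha: mu != 50.
z_stat, p_val, reject_p, reject_cv = z_test_two_tailed(
    x_bar=52, mu_0=50, sigma=8, n=64, alpha=0.05
)
print(round(z_stat, 2), round(p_val, 4), reject_p, reject_cv)
```

With the same inputs the null would not be rejected at α = 0.01, since the p-value is above 0.01.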
the confidence interval for the population mean and the population proportion is constructed as
point estimate +- margin of error
mean-variance analysis
postulates that the performance of an asset is measured by its rate of return, and this rate of return is evaluated in terms of its mean (reward) and variance (risk); higher average returns are generally associated with higher risk
percentiles
provide detailed information about how data are spread over the interval from smallest to largest values
confidence interval or interval estimate
provides a range of values that with a certain level of confidence contains a population parameter of interest
chapter 9 we use hypothesis testing to
resolve conflicts between 2 competing hypotheses on a particular population parameter of interest
sample mean vs population mean
sample: x-bar, a statistic; population: μ, a parameter
coefficient of variation (s / x-bar) or (σ / μ)
serves as a relative measure of dispersion and adjusts for differences in the magnitudes of the means
rejecting a null at the 10% significance level
some evidence that it is false
standard deviation (s and σ)
square root of variance
standard error
standard deviation / square root of sample size
converting sample data into z-scores is called
standardizing the data
the degrees of freedom determine
the extent of broadness of the tails of the distribution; the fewer degrees of freedom, the broader the tails
unlike the exponential distribution whose failure rate is constant,
the failure rate of the lognormal distribution may increase or decrease over time
p-value
the likelihood of obtaining a sample mean that is at least as extreme as the one derived from the given sample, under the assumption that the null is true
null hypothesis Ho
the presumed default state of nature or status quo
the arithmetic mean
the primary measure of central location "average"
for a discrete random variable, we can compute
the probability that it assumes a particular value x.