Descriptive Statistics

A ________ can be displayed using a table or a diagram

distribution

The ____ variable is labeled X and is plotted on the _____ axis; Y is the _______ variable and is plotted on the _____ axis, i.e. Y = f(X)

independent, horizontal, dependent, vertical

Negative kurtosis

indicates a relatively flat distribution

Positive kurtosis

indicates a relatively peaked distribution

We _______ something about the _________ using the _________ through statistical ________.

infer, population, sample, inference

Statistical inference

is the process of making an estimate, prediction, or decision about a population based on a sample.

Volatility clustering

"Large changes tend to be followed by large changes, of either sign, and small changes tend to be followed by small changes." (Mandelbrot 1963)

For small changes in p

$\ln(p_t) - \ln(p_{t-1}) \approx \frac{p_t - p_{t-1}}{p_{t-1}}$
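A minimal sketch with hypothetical prices, comparing the simple return with the log return. The two nearly coincide for a 1% move but drift apart for a 50% move, which is why the approximation is stated only for small changes in p.

```python
import math


def simple_return(p_prev: float, p_now: float) -> float:
    """Simple return R_t = (p_t - p_{t-1}) / p_{t-1}."""
    return (p_now - p_prev) / p_prev


def log_return(p_prev: float, p_now: float) -> float:
    """Log return r_t = ln(p_t) - ln(p_{t-1})."""
    return math.log(p_now) - math.log(p_prev)


# Hypothetical price pairs: a small move (+1%) and a large move (+50%).
for p_prev, p_now in [(100.0, 101.0), (100.0, 150.0)]:
    R = simple_return(p_prev, p_now)
    r = log_return(p_prev, p_now)
    print(f"{p_prev} -> {p_now}: simple = {R:.5f}, log = {r:.5f}, gap = {R - r:.5f}")
```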

Stock prices are assumed to follow a _______; as a result, the log of the price is normally distributed and ________

lognormal distribution, log returns are normally distributed

A ________ is some characteristic of a population or sample, e.g. inflation, gender, stock price

variable

Statistical analysis plays an important role in _____

virtually all aspects of business and economics.

How to calculate b0 and b1?

$\hat{y} = b_0 + b_1 x$, where $b_1 = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)} = \frac{s_{xy}}{s_x^2}$ and $b_0 = \bar{y} - b_1 \bar{x}$
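The two formulas translate directly into code. A minimal plain-Python sketch, assuming the sample covariance and sample variance (both with divisor n − 1, which cancels in b_1); the data are hypothetical.

```python
def least_squares(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (b0, b1) of the least squares line y_hat = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance s_xy and sample variance s_x^2.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    b1 = s_xy / s_xx          # slope: Cov(x, y) / Var(x)
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")  # y_hat = 0.05 + 1.99 x
```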

Interval-ratio variables

• Values are real numbers.
• All calculations are valid.
• Data may be treated as ordinal or nominal.

Nominal variables

• Values are the arbitrary numbers that represent categories.
• Only calculations based on the frequencies of occurrence are valid.
• Data may not be treated as ordinal or interval.

Ordinal variables

• Values must represent the ranked order of the data.
• Calculations based on an ordering process are valid.
• Data may be treated as nominal but not as interval.

Mean

μ (population); x̄ (sample)

Coefficient of correlation

ρ_xy (population); r_xy (sample)

Standard deviation

σ (population); s (sample)

Variance

σ² (population); s² (sample)

Covariance

σ_xy (population); s_xy (sample)
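The population and sample symbols differ in more than notation: under the usual convention (assumed here), the population parameters σ² and σ_xy divide by N, while the sample statistics s² and s_xy divide by n − 1. A short sketch with hypothetical data, cross-checked against Python's `statistics` module:

```python
import statistics


def variance(data: list[float], sample: bool = True) -> float:
    """s^2 (divisor n - 1) if sample is True, else sigma^2 (divisor N)."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((v - mean) ** 2 for v in data)
    return ss / (n - 1) if sample else ss / n


def covariance(x: list[float], y: list[float], sample: bool = True) -> float:
    """s_xy (divisor n - 1) if sample is True, else sigma_xy (divisor N)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sp = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return sp / (n - 1) if sample else sp / n


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical values
print(variance(data, sample=False), variance(data))  # sigma^2 vs s^2
# Cross-check against the standard library's pvariance / variance.
print(statistics.pvariance(data), statistics.variance(data))
```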

Sample

A sample is a set of data drawn from the population. It can be very large, but it is smaller than the population. E.g. a sample of 765 voters exit polled on election day.

Population

A population is the group of all items of interest to a statistics practitioner. It is frequently very large and sometimes infinite. E.g. all 5 million Florida voters.

Nature of time series data II

• A sequence of random variables indexed by time is called a stochastic process or time series process (data generating process).
• Stationary time series: a time series whose statistical properties do not depend on time:
  - the process generating the data has a constant mean;
  - the variability of the time series is constant over time.
• What is the population in the case of time series data?
  - When we collect time series data, we obtain one possible outcome (realisation) of the stochastic process.
  - We can only see a single realisation out of all the possible realisations that might have occurred if certain conditions in history had been different.
  - The set of all possible realisations of a time series process plays the role that the population plays in cross-sectional analysis.

Systematic and firm-specific risk

• Beta (the slope coefficient) is a measure of the stock's market-related (or systematic) risk: it measures the volatility of the stock price that is related to the overall market volatility.
• The coefficient of determination (R²) measures the proportion of the total risk (= market-related risk + firm-specific risk) that is market related.
  - R² = 0.65 → 65% of GE's total risk is market related and 35% is firm-specific (or nonsystematic, or idiosyncratic) risk.
  - The firm-specific risk is attributable to variables that are not included in the market model (e.g. GE's managerial competencies).
  - The firm-specific risk can be diversified away by creating a diversified portfolio of stocks; the market-related risk cannot.
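The split of total risk into market-related and firm-specific parts can be checked numerically. A minimal sketch using hypothetical return series (not the GE data): beta is the least squares slope of the stock return on the market return, the market-related variance is β²·s²_m, the firm-specific variance is the variance of the market-model residuals, and the market-related share of the total variance equals R².

```python
def risk_decomposition(r_stock: list[float], r_market: list[float]) -> dict[str, float]:
    """Split the sample variance of a stock's returns into market-related and
    firm-specific parts using the market model R_i = alpha + beta * R_m + e."""
    n = len(r_stock)
    m_bar = sum(r_market) / n
    s_bar = sum(r_stock) / n
    s_mm = sum((m - m_bar) ** 2 for m in r_market) / (n - 1)
    s_ss = sum((s - s_bar) ** 2 for s in r_stock) / (n - 1)
    s_sm = sum((s - s_bar) * (m - m_bar) for s, m in zip(r_stock, r_market)) / (n - 1)
    beta = s_sm / s_mm
    alpha = s_bar - beta * m_bar
    residuals = [s - (alpha + beta * m) for s, m in zip(r_stock, r_market)]
    market_var = beta ** 2 * s_mm                        # market-related (systematic) risk
    firm_var = sum(e ** 2 for e in residuals) / (n - 1)  # firm-specific risk
    return {
        "beta": beta,
        "total": s_ss,
        "market": market_var,
        "firm": firm_var,
        "r_squared": market_var / s_ss,  # share of total risk that is market related
    }


# Hypothetical monthly returns (in %)
market = [1.2, -0.8, 2.5, 0.3, -1.9, 1.1, 0.7, -0.4]
stock = [2.0, -1.1, 3.9, 0.1, -3.5, 2.2, 0.6, -0.2]
d = risk_decomposition(stock, market)
print(f"beta = {d['beta']:.3f}, R^2 = {d['r_squared']:.3f}")
print(f"market-related share = {d['market'] / d['total']:.3f}, "
      f"firm-specific share = {d['firm'] / d['total']:.3f}")
```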

Coefficient of Correlation I

• The coefficient of correlation (r_xy) measures the strength of the linear association between two numerical variables.
• r_xy lies in [−1, +1], and r_xy = r_yx.
• The correlation coefficient is defined as the covariance divided by the product of the standard deviations of the two variables:
  - Population coefficient of correlation: $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
  - Sample coefficient of correlation: $r_{xy} = \frac{s_{xy}}{s_x s_y}$
• Correlation between assets is highly relevant for diversification:
  - Most correlations in finance are positive, as companies share a common dependence on the economy (business cycle).
  - The correlation between most pairs of companies is between 0.2 and 0.3.
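A minimal sketch of the sample formula r_xy = s_xy / (s_x s_y), using hypothetical data. It shows the two extreme values for exact linear relationships and a curvilinear case (y = x² on values symmetric around 0) where r is exactly 0 even though y is fully determined by x.

```python
import math


def correlation(x: list[float], y: list[float]) -> float:
    """Sample coefficient of correlation r_xy = s_xy / (s_x * s_y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(correlation(x, [3 * v + 1 for v in x]))   # perfect positive linear
print(correlation(x, [-2 * v + 5 for v in x]))  # perfect negative linear
print(correlation(x, [v ** 2 for v in x]))      # curvilinear (y = x^2)
```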

Inference statistics: two techniques

•Estimation and hypothesis testing are the two techniques of inference statistics.

Correlation vs causation

• If two variables are linearly related, it does not mean that X causes Y: correlation is not causation.
• Establishing causality in the social sciences is very challenging.
• The most convincing way to search for causal effects of X on Y is to run an experiment with a treatment group and a control group.
• A well-designed experiment controls for confounding variables.
• The opposite of experimental studies are observational studies, which use nonexperimental (observational) data.

Log-returns

• In quantitative finance, returns are usually calculated using the natural log (continuously compounded return): $r_t = \ln(p_t) - \ln(p_{t-1}) = \Delta \ln(p_t)$
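One practical property of log returns, sketched with hypothetical prices: log returns over consecutive periods add up to the log return over the whole period, whereas simple returns compound multiplicatively. `math.prod` assumes Python 3.8 or later.

```python
import math

prices = [100.0, 105.0, 98.0, 110.0]  # hypothetical closing prices

# r_t = ln(p_t) - ln(p_{t-1}) and R_t = (p_t - p_{t-1}) / p_{t-1}
log_rets = [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]
simple_rets = [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

# Log returns are additive over time ...
print(sum(log_rets), math.log(prices[-1] / prices[0]))
# ... while simple returns compound multiplicatively, so simply adding them
# does not, in general, give the total return.
print(math.prod(1 + R for R in simple_rets) - 1, sum(simple_rets))
```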

Coefficient of Determination (R²)

• In the case of simple linear regression, R² is calculated by squaring the coefficient of correlation, i.e. r².
• The coefficient of determination measures the proportion of the variation in the dependent variable that is explained by the variation in the independent variable.
• Example: R² = 0.65, i.e. 65% of the variation in Y is explained by the variation in X.
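A quick numerical check that, for simple linear regression, R² computed as the explained share of variation (1 − SSE/SST, the standard definition assumed here) equals the squared coefficient of correlation. The data are hypothetical.

```python
import math


def r2_and_r(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (R^2 of the least squares line, correlation r) for paired data."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)  # total variation in y (SST)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained variation
    r_squared = 1 - sse / syy        # explained share of the variation in y
    r = sxy / math.sqrt(sxx * syy)   # coefficient of correlation
    return r_squared, r


# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.8, 4.1, 3.9, 6.5, 6.1, 8.4]
r_squared, r = r2_and_r(x, y)
print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```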

Autocorrelation

• In time series data, the value of $Y_t$ in one period is typically correlated with its past value $Y_{t-1}$ and its future value $Y_{t+1}$.
• The correlation of a series with its own lagged values is called autocorrelation or serial correlation.
• The first (j-th) lag of a time series $Y_t$ is $Y_{t-1}$ ($Y_{t-j}$).
• The first (second) autocorrelation coefficient $r_1$ ($r_2$) is the correlation between $Y_t$ and $Y_{t-1}$ ($Y_{t-2}$).
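A minimal sketch of the j-th autocorrelation coefficient, computed literally as the ordinary correlation between the pairs (Y_t, Y_{t−j}). Textbook ACF estimators often use the full-sample mean and variance instead, which gives slightly different numbers. Both series are hypothetical.

```python
import math


def autocorrelation(series: list[float], j: int) -> float:
    """Lag-j autocorrelation r_j: the correlation between Y_t and Y_{t-j}."""
    if not 1 <= j <= len(series) - 2:
        raise ValueError("need 1 <= j <= len(series) - 2")
    y_t = series[j:]     # Y_t for t = j+1, ..., T
    y_lag = series[:-j]  # Y_{t-j}
    n = len(y_t)
    m_t, m_lag = sum(y_t) / n, sum(y_lag) / n
    s_tl = sum((a - m_t) * (b - m_lag) for a, b in zip(y_t, y_lag))
    s_tt = sum((a - m_t) ** 2 for a in y_t)
    s_ll = sum((b - m_lag) ** 2 for b in y_lag)
    return s_tl / math.sqrt(s_tt * s_ll)


trend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical trending series
zigzag = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]      # hypothetical alternating series
print(autocorrelation(trend, 1), autocorrelation(zigzag, 1), autocorrelation(zigzag, 2))
```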

What is Statistics?

"Statistics is a way to get information from data"

Measures of reliability in statistical inference

• Because conclusions and estimates based on a sample are not always correct, we build "measures of reliability" into statistical inference, namely the confidence level and the significance level.
• Confidence level: the proportion of times an estimating procedure will be correct (e.g. 95%).
• Significance level: measures how frequently a conclusion (the result of a hypothesis test) about the population will be wrong (e.g. 5%).

Why use statistical inference?

• Large populations make investigating each member impractical and expensive.
• It is easier and cheaper to take a sample and make estimates about the population from the sample.

Applications in finance

• Sharpe ratio
• Market model

Cons of statistical inference

• Conclusions and estimates based on a sample are not always going to be correct.

Symmetrical distribution

Skewness = 0

Normal distribution

Skewness = 0 (and excess kurtosis = 0)

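Empirical rule

For a bell-shaped distribution, approximately 68%, 95%, and 99.7% of observations lie within one, two, and three standard deviations of the mean, respectively.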
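For the normal distribution, the three percentages can be computed from the error function, since P(|Z| < k) = erf(k/√2) for a standard normal Z. A short sketch using only the standard library:

```python
import math


def prob_within(k: float) -> float:
    """P(|Z| < k) for a standard normal Z, i.e. the share of a normal
    distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sd: {prob_within(k):.4f}")
```

This prints 0.6827, 0.9545, and 0.9973, the exact normal values behind the rounded 68%, 95%, and 99.7%.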

Covariance

A measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship

Anscombe's quartet

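Four data sets with nearly identical summary statistics (means, variances, correlation, least squares line) but very different scatter diagrams: a straight line is not always the best way to represent a bivariate distribution, so the data should always be plotted.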

Market model for General Electric

Beta = 1.6166: a 1 percentage point increase in the S&P index return is associated with a 1.6166 percentage point increase in the GE return. Beta > 1 → GE is more volatile, and hence riskier, than the S&P index (volatile security).

Measures of Central Location

Arithmetic Mean, Geometric Mean, Median, Mode
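A minimal sketch of the four measures using Python's `statistics` module (`geometric_mean` and `multimode` assume Python 3.8 or later), applied to the student marks from the "Data" card:

```python
import statistics

# Student marks from the "Data" card: {67, 74, 71, 83, 93, 55, 48}
marks = [67, 74, 71, 83, 93, 55, 48]

print("arithmetic mean:", statistics.mean(marks))
print("geometric mean:", statistics.geometric_mean(marks))
print("median:", statistics.median(marks))
# Every mark occurs exactly once, so every value is a mode; multimode lists them all.
print("mode(s):", statistics.multimode(marks))
```

The median is 71 and the arithmetic mean is 491/7 ≈ 70.14; the mode is uninformative here because no mark repeats.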

________ can take on any value within a range and are not confined to specific numbers. Their values are limited only by the precision of measurement: e.g. the rental yield on a property could be 6.2%, 6.24%, or 6.238%.

Continuous data

Measures of Linear Relationship

Covariance, Correlation, Coefficient of Determination, Least Squares Line

________ are the observed values of a variable, e.g. student marks: {67, 74, 71, 83, 93, 55, 48}

Data

Two modes of statistical analysis

Descriptive statistics, Inference statistics

________ can only take on certain values, which are usually integers: e.g. the number of people in a particular underground carriage or the number of shares traded during a day.

Discrete data

Distribution of a categorical variable:

Groups data into categories and records the number of observations that fall into each category

Distribution of a quantitative variable:

Groups data into intervals (bins, classes) and records the number of observations that fall into each interval

The ________ of a distribution measures how much mass is in its tails; the greater the ______, the more likely outliers are

Kurtosis

Numerical descriptive techniques

Measures of Central Location, Measures of Variability, Measures of Relative Standing, Measures of Linear Relationship

Size

N (population); n (sample)

Descriptive statistics

• One form of descriptive statistics uses graphical techniques.
• Another form of descriptive statistics uses numerical techniques to summarize data (e.g. mean, variance).

Population

Parameters

Measures of Relative Standing

Percentiles, Quartiles
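A minimal sketch of quartiles with `statistics.quantiles`, applied to the student marks from the "Data" card. Its default "exclusive" method places the P-th percentile at position (n + 1)·P/100 of the sorted data and interpolates between neighbours; other software may use a different convention and give slightly different cut points.

```python
import statistics

marks = [67, 74, 71, 83, 93, 55, 48]  # student marks from the "Data" card

# Sorted: 48, 55, 67, 71, 74, 83, 93. With n = 7, the quartiles sit at
# positions (n + 1) * 25/100 = 2, 4 and 6 of the sorted data.
q1, q2, q3 = statistics.quantiles(marks, n=4)
print("quartiles:", q1, q2, q3)        # quartiles: 55.0 71.0 83.0
print("interquartile range:", q3 - q1)  # interquartile range: 28.0
```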

Skewed right distribution

Positive skewness: mean > median

Measures of linear relationship

Provide information as to the strength & direction of a linear relationship between two variables (1) Covariance (2) Coefficient of correlation (3) Coefficient of determination

Simple return of an asset

$R_t = \frac{p_t - p_{t-1}}{p_{t-1}}$

Measures of Variability

Range, Standard Deviation, Variance, Coefficient of Variation
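A minimal sketch of the four measures of variability for the student marks from the "Data" card, assuming sample (divisor n − 1) variance and standard deviation. The coefficient of variation is the standard deviation relative to the mean, so it has no units.

```python
import statistics

marks = [67, 74, 71, 83, 93, 55, 48]  # student marks from the "Data" card

data_range = max(marks) - min(marks)
s2 = statistics.variance(marks)   # sample variance (divisor n - 1)
s = statistics.stdev(marks)       # sample standard deviation
cv = s / statistics.mean(marks)   # coefficient of variation (unit-free)
print(f"range = {data_range}, s^2 = {s2:.2f}, s = {s:.2f}, CV = {cv:.3f}")
```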

i.i.d.

Simple random sampling results in independent and identically distributed (i.i.d.) random variables, i.e. the variables are mutually independent and all have the same probability distribution

_________ measures the lack of symmetry of a distribution

Skewness
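A sketch of the simple moment-based measures, assuming the definitions skewness = m₃/m₂^(3/2) and excess kurtosis = m₄/m₂² − 3, where m_k is the k-th central moment with divisor n. Textbooks and software often apply small-sample corrections, so their numbers can differ slightly. Both data sets are hypothetical.

```python
def central_moment(data: list[float], k: int) -> float:
    """k-th central moment m_k = (1/n) * sum((x - mean)^k)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n


def skewness(data: list[float]) -> float:
    """Moment coefficient of skewness m3 / m2^(3/2)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data: list[float]) -> float:
    """m4 / m2^2 - 3, so the normal distribution scores 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


right_skewed = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 10.0]  # hypothetical, long right tail
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]                     # hypothetical, flat and symmetric
print(skewness(right_skewed), skewness(symmetric))
print(excess_kurtosis(symmetric))  # negative: flatter than the normal distribution
```

For the right-skewed data the mean (3.5) exceeds the median (3.0), in line with the "Skewed right distribution" card.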

_________ measures the typical distance of the values from the mean (it is the square root of the variance).

Standard deviation

Sample

Statistic

The __________ are the range of possible values for a variable, e.g. student marks (0..100)

values of the variable

Geometric mean

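The geometric mean return, $R_g = \sqrt[n]{(1 + R_1)(1 + R_2) \cdots (1 + R_n)} - 1$, is also known as the compound annual growth rate (CAGR) when the periods are years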
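A minimal sketch with hypothetical annual returns, showing that the geometric mean return equals the CAGR computed from start and end values, and that it is lower than the arithmetic mean return when returns vary. `statistics.geometric_mean` and `math.prod` assume Python 3.8 or later.

```python
import math
import statistics

annual_returns = [0.10, -0.05, 0.20]  # hypothetical yearly returns

growth_factors = [1 + r for r in annual_returns]
geo_mean_return = statistics.geometric_mean(growth_factors) - 1

# Same number via CAGR = (end value / start value) ** (1 / years) - 1
start = 100.0
end = start * math.prod(growth_factors)
cagr = (end / start) ** (1 / len(annual_returns)) - 1

arith_mean_return = sum(annual_returns) / len(annual_returns)
print(f"geometric = {geo_mean_return:.4%}, CAGR = {cagr:.4%}, "
      f"arithmetic = {arith_mean_return:.4%}")
```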

Scatter diagram

a graph that shows the degree and direction of relationship between two variables

A distribution (or frequency distribution) of a variable shows

all the possible values of the variable and how often they occur

Financial econometrics

application of statistical techniques to problems in finance

Ordinal and nominal variables are also called

categorical or qualitative variables.

The _______ measures the amount of variation in the dependent variable that is explained by the variation in the independent variable

coefficient of determination

panel data

Data collected over time for the same statistical units (at least two) on one or more variables, i.e. repeated cross-sections over time

cross-sectional data

data collected at the same or approximately the same point in time for one or more statistical unit(s) on one or more variable(s)

time-series data

data collected over several time periods

Descriptive statistics

deals with methods of organizing, summarizing, and presenting data in a convenient and informative way.

Skewed left distribution

Negative skewness: mean < median

Interval variables are also known as

numerical or quantitative variables.

Coefficient of Correlation II

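• Positive linear relationship: r_xy > 0 (+)
• Negative linear relationship: r_xy < 0 (−)
• Independent variables: r_xy = 0
• Curvilinear relationship: r_xy can be 0 even though the variables are related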

Finance

A quantitatively oriented field of research

Statistical inference is only valid if the sample data are collected via _________

random sampling.

A ________ _________ is a variable whose outcome is uncertain

random variable

The Gaussian distribution gives a poor estimate of the occurrence of

rare events

A random sample is said to be a _________ __________

representative sample

Risk on the bell curve

Risk is low when the bell curve is tall and skinny (outcomes cluster around the average). The wider the bell, and the further outcomes lie from the average, the riskier the investment.

The _____ is a subset of a ________.

sample, population

In the case of __________, R² is calculated by squaring the coefficient of correlation, i.e. r²

simple linear regression

A _____________ ________ _________ is a sample selected in such a way that every possible sample of the same size is equally likely to be chosen

simple random sample

Quantitative Finance

the application of probability and statistics to finance

Very large losses/gains occur much more frequently than predicted by

the normal distribution ("fat tails")

With random sampling, the value of

the variable for the next random draw is uncertain

Random sampling does not work in the same way with _____________

time series data

With ______ _____ _____, we usually do not know future values; hence they are uncertain

time series data

Statistics is a ______ for creating ____ ________ from a set of numbers.

tool, new understanding

The least squares method

• The LS method gives the intercept and slope of the straight line that minimizes the sum of squared deviations between the data points and the line: $\min \sum (y - \hat{y})^2 = \min \sum \text{residuals}^2$
• The estimated line is $\hat{y} = b_0 + b_1 x$, with $b_0$ and $b_1$ estimated by the LS method:
  - $b_0$ is the intercept, i.e. the value of $\hat{y}$ when $x = 0$;
  - $b_1$ is the slope and shows the change in $\hat{y}$ when x increases by one unit: $b_1 = d\hat{y}/dx$.

The Market Model

• The market model is the empirical counterpart of the theoretical CAPM: $E(R_i) - r_f = \beta_i \left(E(R_m) - r_f\right)$
• The market model assumes that the return on a stock i is linearly related to the rate of return on the stock market index: $R_i = \alpha + \beta_i R_m + \varepsilon$
• If the CAPM holds, α should be zero.
• The market model says that the return on a stock depends on:
  (1) the return on the market portfolio (stock market index) and the extent of the stock's responsiveness to changes in the overall market, as measured by beta;
  (2) conditions that are unique to the firm.

Nature of time series data I

• The sample size of time series data is the number of periods over which we observe the variable of interest.
• Data frequency denotes the interval at which time series data are collected (yearly, quarterly, monthly, daily, real time).
• In contrast to cross-sectional data, where the assumption of independently and identically distributed data is key, time series data are serially correlated (autocorrelated): there are dependencies between past and future values.
• If the past behavior of the time series is expected to continue in the future, it can be used as a guide in selecting an appropriate forecasting method.

Covariance matrix (variance-covariance matrix)

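• Square, symmetric array of numbers: the variances of the variables lie on the main diagonal and the covariances between pairs of variables lie off the diagonal.
• The covariance matrix generalizes the notion of variance to multiple dimensions.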
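A minimal plain-Python sketch of a sample covariance matrix (divisor n − 1 assumed) for three hypothetical asset return series. Entry (i, j) is the covariance of series i and j, so the diagonal holds the variances and the matrix is symmetric.

```python
def covariance_matrix(columns: list[list[float]]) -> list[list[float]]:
    """Sample covariance matrix: entry (i, j) is s_ij with divisor n - 1.
    Diagonal entries are the variances s_i^2."""
    k = len(columns)
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    return [
        [
            sum((a - means[i]) * (b - means[j]) for a, b in zip(columns[i], columns[j])) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


# Hypothetical monthly returns (in %) of three assets
asset_a = [1.0, 2.0, -1.0, 0.5, 1.5]
asset_b = [0.5, 1.5, -0.5, 0.0, 1.0]
asset_c = [-0.2, 0.3, 0.8, -0.1, 0.4]
cov = covariance_matrix([asset_a, asset_b, asset_c])
for row in cov:
    print([f"{v:8.4f}" for v in row])
```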

Sharpe Ratio

• The Sharpe ratio (named after Nobel laureate William Sharpe) is used to characterize how well the return of an asset compensates for the risk that the investor takes.
• Investors are often advised to pick investments that have high Sharpe ratios: the higher the Sharpe ratio, the better the investment compensates its investors for risk.
• The Sharpe ratio (Sr) measures the extra reward per unit of risk: $S_r = \frac{\bar{x}_I - \bar{R}_f}{s_I}$
  - $\bar{x}_I$: mean return for the investment
  - $\bar{R}_f$: mean return for a risk-free asset (short-term government bonds, e.g. T-bills)
  - $\bar{x}_I - \bar{R}_f$: excess return, which measures the extra reward investors receive for the added risk taken
  - $s_I$: standard deviation of the investment's returns, which measures the amount of risk
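A minimal sketch of the ratio as defined above, assuming the sample standard deviation for s_I; the fund and T-bill returns are hypothetical.

```python
import statistics


def sharpe_ratio(returns: list[float], risk_free: list[float]) -> float:
    """Sr = (mean investment return - mean risk-free return) / s_I."""
    excess = statistics.mean(returns) - statistics.mean(risk_free)
    return excess / statistics.stdev(returns)  # s_I: sample standard deviation


# Hypothetical monthly returns (in %) for two investments and a T-bill
fund_a = [2.0, 3.0, -1.0, 4.0]
fund_b = [1.0, 1.5, 0.5, 1.0]
t_bill = [0.5, 0.5, 0.5, 0.5]
print(f"Sr(A) = {sharpe_ratio(fund_a, t_bill):.3f}, Sr(B) = {sharpe_ratio(fund_b, t_bill):.3f}")
```

Here fund B has the lower mean return but the higher Sharpe ratio, because it earns more excess return per unit of risk.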

Beta of a portfolio

• To estimate the beta of a portfolio, we take the weighted average of the betas of the portfolio's stocks, using each stock's share of the portfolio value as its weight.
• If an investor believes that she is in a bull (bear) market, a portfolio with a beta greater (smaller) than 1 is desirable.
• Risk-averse investors may prefer portfolios with a beta below 1, i.e. portfolios made up of defensive securities.
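A minimal sketch of portfolio beta as a value-weighted average of stock betas. GE's beta of 1.6166 is taken from the market-model card; the other two betas and all weights are hypothetical.

```python
def portfolio_beta(weights: list[float], betas: list[float]) -> float:
    """Weighted average of the stock betas; weights are shares of portfolio value."""
    if len(weights) != len(betas):
        raise ValueError("need one weight per beta")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * b for w, b in zip(weights, betas))


betas = [1.6166, 0.8, 1.0]  # GE (from the card), then two hypothetical stocks
print(portfolio_beta([0.5, 0.3, 0.2], betas))        # tilted towards GE
print(portfolio_beta([1 / 3, 1 / 3, 1 / 3], betas))  # equal weights: simple average
```

With equal weights the weighted average reduces to the simple average of the betas.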

Inference statistics

•is a body of methods used to draw conclusions or inferences about characteristics of populations based on sample data.

