Exam 1 - Econ Stats

empirical rule

for a symmetrical, bell-shaped frequency distribution, about 68% of observations lie within +/- one standard deviation of the mean, about 95% lie within +/- two standard deviations, and practically all (about 99.7%) lie within +/- three standard deviations
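A quick Python check of where the 68/95/99.7 figures come from, assuming the data follow a normal distribution exactly. `normal_within_k` is a hypothetical helper name; it uses only the standard library's `math.erf`:

```python
import math


def normal_within_k(k: float) -> float:
    """Probability that a normal value lies within +/- k standard deviations of the mean."""
    # For a standard normal Z, P(-k <= Z <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(normal_within_k(k), 4))  # 0.6827, 0.9545, 0.9973
```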

ratio

named, natural order, equal intervals between values, has a "true zero" (zero means absence) --> ratios can be calculated (ex: height)

experiment

observation of some activity or act of taking some measurement

independent

occurrence of one event does not affect the occurrence of another event; three equivalent statements hold: P(A|B) = P(A); P(B|A) = P(B); P(A and B) = P(A)P(B)
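A small Python sketch that checks all three statements on a made-up example: rolling two fair dice, with A = "first die is even" and B = "second die shows 5 or 6". The event choice and helper names are illustrative assumptions; probabilities are computed classically as exact fractions:

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely results of rolling two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event) -> Fraction:
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))


def first_even(o):   # event A
    return o[0] % 2 == 0


def second_high(o):  # event B: second die shows 5 or 6
    return o[1] >= 5


p_a = prob(first_even)
p_b = prob(second_high)
p_a_and_b = prob(lambda o: first_even(o) and second_high(o))
p_a_given_b = p_a_and_b / p_b  # conditional probability P(A|B)

print(p_a, p_b, p_a_and_b, p_a_given_b)  # 1/2 1/3 1/6 1/2
```

Since P(A and B) = 1/6 = (1/2)(1/3) and P(A|B) = P(A), the two events are independent.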

combination

order is *not important*

outcome

particular result of experiment

measures of location

pinpoint the center of a distribution of data. only describes the center; does not tell us *how closely* the data are concentrated around the center (that is a measure of dispersion), so we need to consider both (there are 5: arithmetic mean, weighted mean, geometric mean, median, mode)

sample

portion, or part, of population of interest. used to obtain reliable estimates of population parameters

histogram

presents numerical data. no gaps btwn bars, represents frequency distribution of continuous variables. based on quantitative data. useful for large data sets

P(A)

probability of A

P(A and B)

probability of A and B

P(A|B)

probability of A given B has happened

P(A or B)

probability of A or B

P(~A)

probability of not A

discrete

quantitative variable that can assume only certain values, with "gaps" between them (ex: number of children)

continuous

quantitative variable that can take an infinite number of values within a particular range (ex: outside temp, air pressure in tire)

correlation coefficient

r, used to measure direction and strength of linear relationship between two variables (ranges from -1 to 1)
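A from-scratch Python sketch of the Pearson correlation coefficient r, the standard way r is computed from paired data. The data are made up, and `pearson_r` is a hypothetical name (Python 3.10+ also ships `statistics.correlation`):

```python
import math


def pearson_r(x, y):
    """Correlation coefficient r for paired data x, y (same length, n >= 2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]
print(pearson_r(hours, [2, 4, 6, 8, 10]))   # 1.0  (perfect positive)
print(pearson_r(hours, [10, 8, 6, 4, 2]))   # -1.0 (perfect negative)
```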

coefficient of variation (CV)

ratio of *population* standard deviation σ to population mean μ

coefficient of variation (cv)

ratio of *sample* standard deviation s to sample mean x̅
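A minimal Python sketch of the sample CV, expressed as a percentage, for a made-up sample; `sample_cv` is a hypothetical helper. The population version would use `statistics.pstdev` and divide by μ instead:

```python
import statistics


def sample_cv(data):
    """Sample coefficient of variation s / x-bar, as a percentage."""
    return statistics.stdev(data) / statistics.mean(data) * 100


print(round(sample_cv([3, 5, 7, 9, 11]), 2))  # 45.18
```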

qualitative variable

recorded as nonnumeric characteristic (ex: eye color)

general rule of multiplication

refers to events that are *not* independent: P(A and B) = P(A)*P(B|A)

special rule of multiplication

refers to events that are independent P(A and B) = P(A)*P(B)

rule of addition

refers to probability that any of two or more events can occur (special rule, general rule, complement rule)

statistics

science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions (two types: descriptive, inferential)

coefficient of skewness

measures the shape of the data, in particular how asymmetric the distribution is (four shapes: symmetric, positively skewed, negatively skewed, bimodal)

box plot

shows general shape of variable's distribution (based on 5 descriptive statistics: max and min, first and third quartiles, and median)

frequency polygon

shows shape of distribution. consists of line segments formed by intersections of class *midpoints* and class frequencies

sample standard deviation

square root of sample variance

bivariate

studying relationship btwn two variables

sample variance

s², computed from the sample mean (x̅) and sample size (n): s² = Σ(x - x̅)² / (n - 1). dividing by n - 1 instead of n keeps s² from underestimating the population variance
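A worked Python example of the n - 1 formula on a made-up sample, compared with dividing by n; the standard library's `statistics.variance` (sample) and `statistics.pvariance` (population) give the same two values:

```python
import statistics

data = [3, 5, 7, 9, 11]                    # hypothetical sample
n = len(data)
x_bar = sum(data) / n                      # sample mean: 7.0
ss = sum((x - x_bar) ** 2 for x in data)   # sum of squared deviations: 40.0

s2 = ss / (n - 1)       # sample variance: 40 / 4 = 10.0
s = s2 ** 0.5           # sample standard deviation
biased = ss / n         # dividing by n instead: 8.0 (too small on average)

print(s2, round(s, 3), biased)                             # 10.0 3.162 8.0
print(statistics.variance(data), statistics.pvariance(data))  # same values from the stdlib
```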

stem-and-leaf plot

technique used to display info in condensed form while providing more info than frequency distribution (get identity of each value, can see distribution) two parts: stem (leading digit, vertical axis); leaves (trailing digit, horizontal axis)

complement rule

to determine probability of event happening by subtracting probability of event not happening from 1 P(A) = 1 - P(~A)

Chebyshev's theorem and empirical rule

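two results that allow us to characterize how data are dispersed around the mean (Chebyshev's theorem holds for any data set; the empirical rule applies to symmetric, bell-shaped distributions)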

hypergeometric distribution

used for problems with a fixed n where the probability of success changes from trial to trial (because sampling is without replacement from a finite population). use it when the sample is a relatively large part of the population (common rule of thumb: n is more than 5% of N). binomial is easier and gives a good approximation when the population is large compared to the sample (ex: 30 people apply for two jobs; what is the probability both positions are filled by women?)
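A Python sketch of the hypergeometric formula P(x) = C(S, x) C(N - S, n - x) / C(N, n). The card's example does not say how many of the 30 applicants are women, so the 12 below is a made-up number; `hypergeom_pmf` is a hypothetical helper:

```python
from math import comb


def hypergeom_pmf(x, N, S, n):
    """P(x successes in a sample of n drawn without replacement
    from a population of N items containing S successes)."""
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)


# Hypothetical numbers: 30 applicants, 12 of them women, 2 positions filled at random.
p_both_women = hypergeom_pmf(2, N=30, S=12, n=2)
print(round(p_both_women, 4))  # 0.1517  (= 66 / 435)
```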

binomial distribution

used for problems with a fixed number of trials n and a known probability of success p that is constant from trial to trial (trials are independent, as when sampling with replacement). used when you know the *exact* probability of an event happening and want the probability of that event happening k times out of n (ex: number of defects in a box of 1,000 factory-produced widgets)
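A Python sketch of the binomial formula P(x) = C(n, x) p^x (1 - p)^(n - x). The 10% defect rate and sample of 5 are made-up numbers, and `binomial_pmf` is a hypothetical helper:

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Hypothetical: 10% of widgets are defective; inspect 5 at random.
print(round(binomial_pmf(1, n=5, p=0.1), 5))  # 0.32805  (exactly one defective)
```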

continuous probability distribution

any value in interval (3 types: uniform, normal, exponential)

rule of multiplication

applied when two or more events occur simultaneously (special, general)

population and sample mean are examples of...

arithmetic mean

permutation

arrangement in which order of objects selected from specific pool of objects *is important*
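A short Python comparison of permutations (order matters) and combinations (order does not), using `math.perm` and `math.comb` (Python 3.8+) next to the factorial formulas; n = 5 and r = 3 are arbitrary example values:

```python
from math import comb, factorial, perm

n, r = 5, 3
# Permutation: order matters -> n! / (n - r)!
print(perm(n, r), factorial(n) // factorial(n - r))                    # 60 60
# Combination: order does not matter -> n! / (r! (n - r)!)
print(comb(n, r), factorial(n) // (factorial(r) * factorial(n - r)))   # 10 10
```

Each combination of 3 objects can be ordered in 3! = 6 ways, which is why 60 / 6 = 10.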

geometric mean

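used to find the average of percentages, ratios, indexes, or growth rates over time: GM = nth root of (product of the n values). rate of increase (average percentage change over a period) = nth root of (value at end of period / value at start of period) - 1, where n is the number of periods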
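A minimal Python sketch of both forms of the geometric mean; the helper names and the 100 to 121 growth figures are hypothetical (the standard library also has `statistics.geometric_mean`):

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))


def avg_rate_of_increase(start, end, periods):
    """Average percentage change per period, as a decimal."""
    return (end / start) ** (1 / periods) - 1


print(geometric_mean([2, 8]))                        # 4.0
# Hypothetical: a value grows from 100 to 121 over 2 periods.
print(round(avg_rate_of_increase(100, 121, 2), 4))   # 0.1  (10% per period)
```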

Chebyshev's theorem

for any set of observations (sample or population), the proportion of values that lie within k standard deviations of the mean is at least 1 - 1/k² (k is any value greater than 1)
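A tiny Python sketch evaluating the Chebyshev bound for a few values of k; `chebyshev_min_proportion` is a hypothetical name. Note how much weaker the guarantee is than the empirical rule (at least 75% within 2 standard deviations, versus about 95% for bell-shaped data), because it must hold for any distribution:

```python
def chebyshev_min_proportion(k: float) -> float:
    """Minimum proportion of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2


for k in (1.5, 2, 3):
    print(k, round(chebyshev_min_proportion(k), 3))  # 0.556, 0.75, 0.889
```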

interquartile range

IQR = Q3 - Q1

arithmetic mean

1. data must be measured at interval/ratio level 2. influenced by extremely high and low values 3. all values included when computing 4. there is only one 5. sum of deviations of each value from mean is 0

subjective probability

based on whatever subjective info is available, relies on individual knowledge and assessment

harmonic mean

calculate avg value when value involves rates (value/unit) or ratios (index) (ex: speed in km/hr or price-earnings ratio)
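A Python sketch of the harmonic mean, n / Σ(1/x), on a made-up trip; `harmonic_mean` is a hypothetical helper (`statistics.harmonic_mean` does the same). It applies here because each speed is held over the same distance:

```python
def harmonic_mean(rates):
    """n divided by the sum of the reciprocals of the values."""
    return len(rates) / sum(1 / r for r in rates)


# Hypothetical trip: 60 km at 30 km/hr, then another 60 km at 60 km/hr.
# Total: 120 km in 2 + 1 = 3 hours, so the true average speed is 40 km/hr.
speeds = [30, 60]
print(round(harmonic_mean(speeds), 6))  # 40.0  (the arithmetic mean, 45, overstates it)
```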

quantitative variable

can be recorded numerically (ex: number of children in family, outside temp, balance in checking account) (two types: discrete & continuous)

measures of dispersion

capture variation or spread in data. two distributions can have the same average but different spreads (range, variance, coefficient of variation, Chebyshev's theorem & empirical rule)

relative class frequencies

captures relationship between class frequency and total number of observations (fraction)

variable

characteristic of the statistical unit being observed that may assume more than one of a set of values to which a numerical measure or a category from a classification can be assigned (2 types: qualitative & quantitative)

discrete probability distribution

characterized by all values x and associated probabilities (3 types: binomial, hypergeometric, Poisson) 1. sum of all probabilities is 1 2. probability of particular outcome is [0,1] 3. outcomes are mutually exclusive

normal distribution

characterized by mean (mu) and variance; useful for determining probabilities for any normally distributed random variable; find z value for particular value x of random variable based on mean and standard deviation of distribution
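A Python sketch of the z value z = (x - μ) / σ and the resulting probability. The mean of 1000 and standard deviation of 100 are made-up numbers; `statistics.NormalDist` (Python 3.8+) is used as a cross-check of the `math.erf` formula:

```python
import math
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma


# Hypothetical: a normally distributed variable with mu = 1000, sigma = 100.
mu, sigma, x = 1000, 100, 1100
z = z_score(x, mu, sigma)                          # 1.0
p_below = 0.5 * (1 + math.erf(z / math.sqrt(2)))   # P(X <= 1100) via the standard normal CDF
print(z, round(p_below, 4))                        # 1.0 0.8413
print(round(NormalDist(mu, sigma).cdf(x), 4))      # 0.8413
```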

pie chart

chart that shows the proportion/percentage of the total number of frequencies that each class represents. shows qualitative info

3 approaches to computing probabilities

classical, empirical, subjective

levels of measurement

classify data according to levels. level determines type of statistical analysis we can perform on data. 4 levels: nominal, ordinal, interval, ratio

event

collection of one or more outcomes of an experiment

weighted mean

compute arithmetic mean when we have several observations of same value
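A Python sketch of the weighted mean Σ(w·x) / Σw with made-up prices and counts; `weighted_mean` is a hypothetical helper. It matches the ordinary mean of the fully listed observations:

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical: 5 items sold at $10, 3 at $20, 2 at $30.
prices = [10, 20, 30]
counts = [5, 3, 2]
print(weighted_mean(prices, counts))  # 17.0
```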

contingency table

cross-tabulation that simultaneously summarizes two variables of interest (enables classification of data according to 2 identifiable characteristics)

mean

describes central value of data (5 types: population, sample, geometric, weighted, harmonic)

measures of position

describes spread of data by determining position of values that divide observations into equal parts (quantiles)

range

difference btwn max and min values in data set (only considers max & min --> leaves out info)

dot plot

displays a dot for each observation along a horizontal number line indicating possible values of the data; shows the shape of the distribution, the value about which data tend to cluster, and the largest and smallest values; helpful for smaller data sets (when we organize data into classes with a histogram, we lose the exact values of the observations) *if observations are identical or too close to be shown separately, dots are stacked on top of each other

population

entire set of individuals or objects of interest / measurements obtained from all individuals or objects of interest

bar chart

graph that shows qualitative classes on horizontal axis and class frequencies on vertical axis. class frequencies are proportional to heights of bars. most common graphic form to present qualitative variable. presents categorical data

scatter diagram

graphical technique used to show relationship between two variables measured with interval or ratio scales. one variable on vertical axis and other on horizontal (bivariate)

frequency table

grouping of qualitative data into mutually exclusive and collectively exhaustive classes showing number of observations in each class

frequency distribution

grouping of quantitative data into mutually exclusive and collectively exhaustive classes showing number of observations in each class (decide on number of classes, determine class interval, set individual class limits, tally the data (e.g., vehicle profits) into classes and count the number of observations in each class)

collectively exhaustive

events are collectively exhaustive if at least one of them must occur when an experiment is conducted; if they are also mutually exclusive, their probabilities sum to 1

mutually exclusive

if one event happens, the other cannot

multiplication rule

if there are m ways one event can happen and n ways another event can happen, then there are mn ways that two events can happen

conditional probability

likelihood that event will happen, given that another event has already happened

joint probability

likelihood that two or more events will happen at same time

negatively skewed

mean < median and mode

positively skewed

mean > median and mode

parameter

measurable characteristic of population (we rely on sample data to learn about population parameter)

statistic

measurable characteristic of sample (sample mean = best estimate of population mean)

variance

measures how much the values in a population or sample vary from the mean: the average of the squared deviations from the mean (two types: population & sample)

symmetric

median = mode = mean

descriptive statistics

methods of organizing, summarizing, and presenting data in an informative way (data collection, data presentation, summarizing data - surveys, graphs, tables)

inferential statistics

methods used to estimate a property (mean, proportion, etc) of a population on basis of sample (estimation and hypothesis testing). limited set of data

median

midpoint of values after they have been ordered from min to max values (2 midpoints--> find mean of two numbers) 1. unique for each data set 2. not affected by extremely large or small values (measure of location when such values do occur) 3. can be computed on ordinal, interval, and ratio level

uniform distribution

models events that are equally likely to occur within a given range/interval; characterized by min value a and max value b; the probability density is constant at 1/(b - a) over that range (a height, not the probability of any single value), so P(c ≤ x ≤ d) = (d - c)/(b - a); rectangular in shape and symmetric
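A Python sketch of uniform probabilities as "width times height" on a made-up 0 to 30 minute interval; `uniform_prob` is a hypothetical helper that also handles ranges falling partly or wholly outside [a, b]:

```python
def uniform_prob(c, d, a, b):
    """P(c <= X <= d) for X uniform on [a, b]: overlap width times height 1 / (b - a)."""
    lo, hi = max(c, a), min(d, b)
    return max(0.0, hi - lo) / (b - a)


a, b = 0, 30   # hypothetical: a bus arrives uniformly between 0 and 30 minutes
print(round(1 / (b - a), 4))                 # 0.0333  (height of the density)
print(round(uniform_prob(10, 20, a, b), 4))  # 0.3333
print((a + b) / 2)                           # 15.0    (mean)
```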

exponential distribution

models time btwn occurrences of event in sequence; actions occur independently at constant rate per unit of time/length; nonnegative, positively skewed, declines steadily to right, asymptotic
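A Python sketch of exponential probabilities using P(X ≤ x) = 1 - e^(-x/mean), i.e. rate λ = 1/mean. The 10-minute average time between arrivals is a made-up number, and `exponential_cdf` is a hypothetical helper:

```python
import math


def exponential_cdf(x, mean):
    """P(X <= x) for an exponential distribution with the given mean (rate = 1 / mean)."""
    if x < 0:
        return 0.0
    return 1 - math.exp(-x / mean)


# Hypothetical: customers arrive on average every 10 minutes.
print(round(exponential_cdf(10, mean=10), 4))      # 0.6321  (wait of at most 10 min)
print(round(1 - exponential_cdf(20, mean=10), 4))  # 0.1353  (wait of more than 20 min)
```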

nominal

named (ex: eye color)

ordinal

named, natural order (ex: level of satisfaction)

interval

named, natural order, equal intervals between values, but no true zero (ex: temperature)

Poisson distribution

used when the number of trials n is unknown (a random variable, potentially infinite) and p for each trial is unknown, but the *mean* number of occurrences μ per interval is known; use it to find the probability of x events happening in that interval (ex: number of innocent people convicted of a crime)
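A Python sketch of the Poisson formula P(x) = μ^x e^(-μ) / x!. The mean of 2 occurrences per interval is a made-up value, and `poisson_pmf` is a hypothetical helper:

```python
import math


def poisson_pmf(x, mu):
    """P(exactly x occurrences) when the mean number of occurrences is mu."""
    return mu**x * math.exp(-mu) / math.factorial(x)


# Hypothetical: an average of 2 occurrences per interval.
for x in range(4):
    print(x, round(poisson_pmf(x, mu=2), 4))  # 0.1353, 0.2707, 0.2707, 0.1804
```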

special rule of addition

used when events are mutually exclusive P(A or B) = P(A) + P(B)

general rule of addition

used when events are not mutually exclusive P(A or B) = P(A) + P(B) - P(A and B)

three counting rules

useful in determining number of outcomes in experiment (multiplication rule, permutation, combination)

probability

value btwn 0 and 1 inclusive that represents likelihood a particular event will happen

mode

value of observation that appears most frequently 1. not always unique mode for each data set (can have multiple) 2. not affected by extremely large or small values (measure of location when such values do occur) 3. can be computed for nominal, ordinal, interval, ratio levels

outlier

value that is more than 1.5x IQR smaller than Q1 or larger than Q3
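A Python sketch of the 1.5 × IQR fence rule using `statistics.quantiles` with its default "exclusive" method. Textbooks and calculators use several slightly different quartile conventions, so hand-computed Q1 and Q3 may not match exactly. The data are made up:

```python
import statistics


def iqr_outliers(data):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _median, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]
print(statistics.quantiles(sample, n=4))  # [4.0, 6.0, 8.5]
print(iqr_outliers(sample))               # [30]
```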

empirical probability

when number of times event happens is divided by number of observations

classical probability

when there are n equally likely outcomes to an experiment: P(event) = number of favorable outcomes / total number of possible outcomes

sample mean

x̅, sum of all values of x in sample divided by number of values in sample n

population mean

μ, sum of all values of x in population divided by number of values in population N

population standard deviation

σ, square root of population variance; a larger standard deviation means more spread (dispersion) in the data

population variance

σ², arithmetic mean of squared deviations from mean (μ) when population size is N