Business Analytics (Evans text) Quiz 2

correlation coefficient formula

(Cov of A and B) / [(STD of A) x (STD of B)], aka the Pearson product-moment correlation coefficient.
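
As a minimal sketch of the formula above, the following computes the correlation coefficient from a covariance and two standard deviations. The function names and the sample data are illustrative assumptions, not taken from the text; population (divide-by-n) formulas are used throughout.

```python
import math

def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def std_dev(xs):
    """Population standard deviation (square root of the population variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    return math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)

def correlation(xs, ys):
    """Cov(A, B) / (STD(A) * STD(B))."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

# Hypothetical data: b is an exact linear function of a, so r should be 1.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]
r = correlation(a, b)
```

Reversing the order of `b` makes the relationship perfectly decreasing, which gives a coefficient of -1.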

Triangular Distribution

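A continuous distribution defined by three values: a minimum, a most likely value, and a maximum (for example, pessimistic, most likely, and optimistic scenarios).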

ogive

A chart that displays the cumulative relative frequency

event

A collection of one or more outcomes from a sample space

Exponential Distribution

A continuous distribution that models the time between randomly occurring events (p. 203). Example: the number of hits on a news subject, which grows fast at first and then levels out. In probability theory and statistics, the exponential distribution is the probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate.
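
A short sketch of how the exponential distribution answers "how likely is it that the next event arrives within x time units?" It uses the standard cumulative distribution function 1 - e^(-rate * x); the function name and the rate of 2 events per hour are assumptions chosen for illustration.

```python
import math

def exponential_cdf(x, rate):
    """P(time until next event <= x) when events occur at a constant average rate."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-rate * x)

# Hypothetical example: events arrive at an average rate of 2 per hour.
rate_per_hour = 2.0
p_within_half_hour = exponential_cdf(0.5, rate_per_hour)  # 1 - e^(-1)
mean_wait_hours = 1.0 / rate_per_hour                     # mean of the distribution
```

The probability rises quickly for small x and then levels out toward 1.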

contingency table aka cross-tabulation

A data matrix that displays the frequency of some combination of possible responses to multiple variables; cross tabulation results

cumulative distribution function

A function giving the probability that a random variable is less than or equal to a specified value.

Unimodal

A histogram with one peak (mode). A bell curve is unimodal but a skewed curve could be too.

Data profile (fractile)

A way of dividing sorted data into sets, for example percentiles and quartiles.

coefficient of variation

A measure of variability relative to the mean, computed as (Std dev / mean) and often multiplied by 100 to express it as a percentage. Unlike the standard deviation alone, it can be compared easily across populations. Its reciprocal is "return to risk".
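
The sketch below compares two data sets whose standard deviations differ a lot in absolute terms but whose relative variability is easy to compare once divided by the mean. The function names and both data sets are made up for illustration; sample standard deviations are assumed.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

def return_to_risk(values):
    """Reciprocal of the coefficient of variation: mean divided by standard deviation."""
    return statistics.mean(values) / statistics.stdev(values)

# Hypothetical data on very different scales.
small_scale = [8, 10, 12]      # mean 10, sample std dev 2
large_scale = [90, 100, 110]   # mean 100, sample std dev 10

cv_small = coefficient_of_variation(small_scale)   # 0.2, i.e. 20%
cv_large = coefficient_of_variation(large_scale)   # 0.1, i.e. 10%
```

Although `large_scale` has the bigger standard deviation, it is less variable relative to its mean.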

Coefficient of skewness

A measure of the degree of asymmetry of observations around the mean (calculation on p. 144). If the number is positive, the distribution is positively skewed (long tail to the right). If the number is negative, the distribution is negatively skewed (long tail to the left). If it's between -0.5 and 0.5, the skew is low and there is relative symmetry.

covariance

A measure of the linear relationship between two variables, x and y, that DOES depend on the units of measurement. Correlation is easier to use because it doesn't depend on the units of measurement and always lies between -1 and 1.

Correlation

A measure of the linear relationship between two variables, x and y, which (unlike covariance) does not depend on the units of measurement. The value of a correlation coefficient ranges between -1 and 1. You could have a very strong relationship and still get a very low correlation coefficient because the relationship is not LINEAR.
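
To illustrate the point about non-linear relationships, here is a small sketch in which y is completely determined by x (y = x squared) yet the correlation coefficient is zero. The helper `pearson_r` and the data are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations over the product of deviation norms."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (norm_x * norm_y)

# y depends perfectly on x, but the dependence is a symmetric curve, not a line.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
r_curve = pearson_r(xs, ys)
```

The positive and negative cross-deviations cancel exactly, so the linear measure reports no relationship.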

interval estimate

A method that provides a range for a population characteristic based on a sample

estimation

A method used to assess the value of an unknown population parameter such as a population mean, population proportion, or population variance using sample data.

continuous metric

A metric that is based on a continuous scale of measurement. There are no jumps in the data.

Judgment sampling

A nonprobability method of sampling whereby elements are selected for the sample based on the judgment of the person doing the study.

standard normal distribution

A normal distribution with a mean of 0 and a standard deviation of 1. (z scores)

probability density function

A function that represents a continuous probability distribution. A probability distribution is a list of outcomes and their associated probabilities; a function that represents a discrete probability distribution is called a probability mass function. It's like a histogram on steroids.

frequency distribution

A table that shows the number of observations in each of several non-overlapping groups. When data are summarized in a frequency distribution, we can use the frequencies to compute the mean and variance (p. 147).

cumulative relative frequency distribution

A tabular summary of cumulative relative frequencies

Outlier

A value much greater or much less than the others in a data set

Discrete Uniform Distribution

A variation of the uniform distribution for which the random variable is restricted to integer values between a and b (also integers). A good example of a discrete uniform distribution would be the possible outcomes of rolling a 6-sided die. The possible values would be 1, 2, 3, 4, 5, or 6. In this case, each of the six numbers has an equal chance of appearing.

empirical probability distribution

An approximation of the probability distribution of the associated random variable, based on observed data: the ratio of the number of outcomes in which a specified event occurs to the total number of trials, not in a theoretical sample space but in an actual experiment.

Sample

A subset of the population. In ALL statistical analysis we study the characteristics of the SAMPLE so that we can state something useful about the POPULATION (which is usually too large to study). Statistics of a sample don't use Greek letters like mu, sigma, and pi for the mean, std dev, and proportion; instead we use x-bar, s, and p.

relative frequency distribution

a table that presents the relative frequency of each category

metric

a unit of measurement that provides a way to objectively quantify performance

"kth" percentile

a value at or below which k percent of the observations lie

proportion

The fraction of data having a certain characteristic: p = x/n. Proportions are always between zero and one. A key descriptive statistic for categorical data.

Coefficient of Kurtosis (CK)

Measures the degree of kurtosis of a population (kurtosis is the "peakedness" or flatness of a histogram); formula on p. 145. If the CK is less than three, the distribution is relatively flat with a wide degree of dispersion. If the CK is greater than three, the distribution is peaked and has less dispersion. Note that Excel subtracts three, so there a value less than zero means relatively flat and a value greater than zero means peaked.

descriptive statistics

measures used to describe and summarize data using tabular, visual, and quantitative techniques

estimators

measures used to estimate population parameters

Ratio Data

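Data that have a natural zero, so ratios are meaningful, e.g. length, width, height. The text describes ratio data as continuous, but discrete counts also have a natural zero, so ratio data can be either continuous or discrete.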

degrees of freedom

n-1: the number of scores that can vary in the calculation of a statistic. Video: https://www.youtube.com/watch?v=VIlVWeUQ0vs For categorical values, if k = the number of categories, df = k-1, and n-k is the df for the errors. The general formula for degrees of freedom in ANY field of math is d.f. = (# of variables) - (# of constraints).

measure

numerical value associated with a metric

Discrete vs. Continuous distributions

One is drawn as a histogram, the other as a curve. "How many countries have you been to?" (answers are countable) vs. "How much do you weigh?" (answers can be any value in a range). Video: https://www.youtube.com/watch?v=bPFNxD3Yg6U

Dispersion

the degree of variation in the data, spread

sampling error

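The difference between a sample result and the true population value that arises because only a sample is observed; results also differ from one random sample to the next. There will always be sampling error, but LARGER sample sizes have less sampling error.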

relative frequency

the fraction or percent of the time that an event occurs

degrees of freedom (df)

The number of independent pieces of information remaining after estimating one or more parameters (n-1 when one parameter is estimated). If there are a lot of df, there are many possible lines. Video: https://www.youtube.com/watch?v=Cm0vFoGVMB8

Marginal Probability

the probability of a single event without consideration of any other event

Joint Probability

the probability of the intersection of two events

Conditional Probability

the probability that one event happens given that another event is already known to have happened
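
Marginal, joint, and conditional probabilities can all be read off a contingency table of counts. The sketch below uses an entirely hypothetical 2 x 2 table (group by preferred brand); the variable names and counts are assumptions for illustration.

```python
# Hypothetical contingency table of counts: (group, preferred brand) -> frequency.
counts = {
    ("girl", "brand1"): 30,
    ("girl", "brand2"): 20,
    ("boy", "brand1"): 25,
    ("boy", "brand2"): 25,
}
total = sum(counts.values())

# Marginal probability: one event, ignoring the other variable.
p_girl = sum(c for (group, _), c in counts.items() if group == "girl") / total

# Joint probability: the intersection of two events.
p_girl_and_brand1 = counts[("girl", "brand1")] / total

# Conditional probability: P(brand1 | girl) = P(girl and brand1) / P(girl).
p_brand1_given_girl = p_girl_and_brand1 / p_girl
```

Here the conditional probability (0.6) differs from the marginal probability of preferring brand 1 (0.55), which hints that the two variables may not be independent.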

confidence interval

The range of values within which a population parameter is estimated to lie, along with a probability that the interval correctly estimates the true (unknown) population parameter. NOT correct: "there is a 90% probability that the true population mean is within the interval". CORRECT: "there is a 90% probability that any given confidence interval from a random sample will contain the true population mean".

Population

The set of all elements of interest in a particular study. Parameters of a population use Greek letters like mu and sigma for the mean and standard deviation, unlike samples, where we use x-bar and s.

sample space

the set of all possible outcomes of an experiment

standard deviation

The square root of the variance. Its units of measure are the same as the units of the data, so it is easier to interpret than the variance.

cumulative relative frequency

The sum of the relative frequencies up through, and including, the category of interest, e.g. 20% of customers account for 80% of total sales.
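
A minimal sketch of building relative and cumulative relative frequencies from a frequency distribution; plotting the cumulative values against the group boundaries would give an ogive. The group counts are hypothetical.

```python
from itertools import accumulate

# Hypothetical frequency distribution: number of observations in each group.
frequencies = [5, 10, 20, 15]
total = sum(frequencies)

# Relative frequency: each group's share of all observations.
relative = [f / total for f in frequencies]

# Cumulative relative frequency: running total of counts, divided by the grand total.
cumulative_relative = [running / total for running in accumulate(frequencies)]
```

The last cumulative value is always 1, because every observation falls in some group.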

rules for normal distribution

1. Skew is zero. 2. Mean = median = mode. 3. x has no bounds; the tails go to infinity. 4. Almost all data will fall within +/- 3 SD of the mean (the Empirical Rule): about 68% of the data falls within one standard deviation, 95% within two standard deviations, and 99.7% within three standard deviations of the mean. The actual percentages may be higher or lower. The Empirical Rule is an approximation that applies only to data sets with a bell-shaped relative frequency histogram. Chebyshev's Theorem is a fact that applies to all possible data sets. It describes the minimum proportion of the measurements that must lie within one, two, or more standard deviations of the mean.

what affects the WIDTH (or range) of the confidence interval

1. The size of the sample. 2. The amount of variation in the population from which we drew our sample. 3. The LEVEL of confidence, e.g. 90% or 95%. Bigger samples and less variation mean our confidence INTERVAL can be narrower.
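
The three factors can be seen in the half-width of a z-based interval for a mean, z x sigma / sqrt(n). This is a sketch under the assumption that the population standard deviation is known; the function name and all numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def ci_half_width(sigma, n, confidence):
    """Half-width of a z-based confidence interval for a mean (sigma assumed known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)

# Hypothetical baseline: sigma = 10, n = 100, 95% confidence.
base = ci_half_width(sigma=10, n=100, confidence=0.95)

# Change one factor at a time and compare with the baseline.
smaller_sample = ci_half_width(sigma=10, n=25, confidence=0.95)
more_variation = ci_half_width(sigma=20, n=100, confidence=0.95)
higher_confidence = ci_half_width(sigma=10, n=100, confidence=0.99)
```

Each of the three changes widens the interval; quartering the sample size doubles the width.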

Ordinal data example

Data that can be ranked according to some relationship, e.g. 1 = tallest, 2 = next tallest, 3 = third tallest.

Types of Discrete Probability Distributions (histogram)

Bernoulli, binomial, Poisson

Chi-square test

Chi-square test for independence: tests whether two categorical variables are independent, e.g. whether the same proportion of girls and boys prefer brand 1 over brand 2. H0: they are independent. H1: they are dependent. Specifically, it tests for the equality of two frequencies or proportions.
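
A sketch of the chi-square statistic for independence, computed by hand from a table of observed counts: each expected count is (row total x column total) / grand total, and the statistic sums (observed - expected)^2 / expected. The counts are hypothetical and the function name is an assumption.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom

# Hypothetical counts: rows = girls, boys; columns = brand 1, brand 2.
observed_counts = [[30, 20], [25, 25]]
chi2, df = chi_square_statistic(observed_counts)

# Standard table value for df = 1 at the 5% significance level.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841
reject_independence = chi2 > CRITICAL_VALUE_DF1_ALPHA_05
```

With these made-up counts the statistic is well below the critical value, so H0 (independence) would not be rejected.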

Interval Data

Data that are ordinal but have constant differences between observations and an arbitrary zero point, e.g. temperature in degrees Fahrenheit or Celsius. By contrast, examples of ordinal (not interval) variables include: socio-economic status ("low income", "middle income", "high income"), education level ("high school", "BS", "MS", "PhD"), income level ("less than 50K", "50K-100K", "over 100K"), satisfaction rating ("extremely dislike", "dislike", "neutral", "like", "extremely like").

categorical data

Data that consist of names, labels, or other nonnumerical values. The measurement scales are hierarchical: categorical data cannot be converted into ratio data (p. 118).

mutually exclusive

Events with no outcomes in common.

Chebyshev's formula

For any set of data, the proportion of values that are within k standard deviations of the mean is at least 1 - (1/k^2), for k > 1. So at least 75% of data are within +/- 2 std dev of the mean and at least 89% are within +/- 3 std dev of the mean. Unlike the Empirical Rule, which is an approximation for bell-shaped data only, Chebyshev's Theorem applies to all possible data sets and gives the minimum proportion.
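
The sketch below evaluates Chebyshev's lower bound for a few values of k and, for comparison, the exact proportion a normal distribution places within k standard deviations. The function names are assumptions for illustration.

```python
from statistics import NormalDist

def chebyshev_lower_bound(k):
    """Minimum proportion of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def normal_proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist()
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

bound_2 = chebyshev_lower_bound(2)       # 0.75
bound_3 = chebyshev_lower_bound(3)       # about 0.889
normal_2 = normal_proportion_within(2)   # about 0.954
normal_3 = normal_proportion_within(3)   # about 0.997
```

The normal proportions are higher than the Chebyshev bounds, as expected: the bound must hold for every possible data set, so it is conservative for bell-shaped data.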

cluster sampling example

Cluster sampling is a method of probability sampling that is often used to study large populations, particularly those that are widely geographically dispersed. Instead of drawing a random sample of individuals, researchers usually use pre-existing units such as schools or cities as their clusters.

uniform distribution

Like the probability of rolling a 1, 2, 3, 4, 5, or 6, the uniform distribution has the same probability for measuring each value. EQUIPROBABLE

Types of Continuous Probability Distributions (curve)

Normal, Student's t distribution, chi-squared (no negative values; starts at zero), logistic

ORDINAL data vs INTERVAL data

Ordinal data are most concerned about the order and ranking while interval data are concerned about the differences of value within two consecutive values. ... Ordinal data place an emphasis on the position on a scale while interval data are on the value differences of two values in a scale.

Is the Poisson distribution a discrete function

The Poisson distribution is a discrete function, meaning that the variable can only take specific values. It is used to model the number of occurrences in some unit of measure, like the number of customers arriving between noon and 1, or machine failures per month.
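
A sketch of the Poisson probability mass function, P(X = k) = e^(-lambda) x lambda^k / k!, applied to the customer-arrival example. The average of 3 arrivals per hour is a made-up figure, and the function name is an assumption.

```python
import math

def poisson_pmf(k, average_rate):
    """Probability of exactly k occurrences when the average count per interval is average_rate."""
    return math.exp(-average_rate) * average_rate ** k / math.factorial(k)

# Hypothetical: on average 3 customers arrive between noon and 1.
average_arrivals = 3.0
p_no_customers = poisson_pmf(0, average_arrivals)
p_exactly_three = poisson_pmf(3, average_arrivals)

# The probabilities over all counts sum to 1 (checked here over the first 50 counts).
total_probability = sum(poisson_pmf(k, average_arrivals) for k in range(50))
```

Only whole-number counts have a probability, which is what makes the distribution discrete.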

Variance

The average of the squared differences from the mean. A common measure of dispersion. The variance for a population sums up all the squared differences between each observation and the mean, and then divides by n. The variance for a sample sums up all the squared differences between each observation and the mean, and then divides by (n-1), so for the same data the sample formula always gives the larger value.
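
The difference between the two divisors can be checked with Python's `statistics` module, where `pvariance` divides by n and `variance` divides by n-1. The data set is hypothetical.

```python
import statistics

# Hypothetical data with mean 5 and squared deviations summing to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_variance = statistics.pvariance(data)  # 32 / 8
sample_variance = statistics.variance(data)       # 32 / 7

# The standard deviation is the square root of the variance.
population_std_dev = statistics.pstdev(data)
```

The sample variance is the larger of the two because the same sum of squares is divided by a smaller number.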

complement

The complement of an event E, denoted E′​, is the set of outcomes in the sample space that are not in E. For example, suppose we are interested in the probability that a horse will lose a race. If event W is the horse winning the race, then the complement of event W is the horse losing the race

sample correlation coefficient

The correlation coefficient is (Cov of A and B) / [(STD of A) x (STD of B)]. The sample correlation coefficient, r, estimates the population correlation coefficient, ρ.

multiplicative law of probability

The multiplication law of probability states that if events A and B are independent, then the probability of A and B happening together is simply P(A) × P(B).

sample correlation coefficient formula

The only difference vs. the POPULATION correlation coefficient is that you use the STD of the samples: (Cov of A and B) / [(STD of A sample) x (STD of B sample)].

return to risk

The reciprocal of the coefficient of variation. It equals Mean / Std Dev, while the coefficient of variation equals Std Dev / Mean.

statistics

The science of uncertainty and the technology of extracting information from data; an important element of business given the large growth of data.

Std dev and variance

The standard deviation is the square root of the variance.

expected value

The weighted average of all of the possible outcomes of a probability distribution, where the weights are the probabilities. Also called the mean or average.
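
As a small worked example, the expected value of a fair six-sided die is each outcome times its probability, summed. The function name is an assumption for illustration.

```python
def expected_value(outcomes, probabilities):
    """Probability-weighted average of the outcomes."""
    return sum(x * p for x, p in zip(outcomes, probabilities))

# A fair die: six equally likely outcomes (a discrete uniform distribution).
die_outcomes = [1, 2, 3, 4, 5, 6]
die_probabilities = [1 / 6] * 6
die_mean = expected_value(die_outcomes, die_probabilities)  # approximately 3.5
```

Note that the expected value (3.5) need not be an outcome that can actually occur.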

cluster sampling

a sampling technique in which clusters of participants that represent the population are used

Binomial Distribution (Conditions)

Video: https://www.youtube.com/watch?v=b9a27XN_6tg A binomial distribution models n independent repetitions of a Bernoulli experiment, each with probability p of success. 1. Binary: trials can be classified as success/failure, BUT, unlike Bernoulli, we have many iterations, so a single coin toss is Bernoulli and a series of them follows a binomial distribution. 2. Independent: trials must be independent. 3. Number: the number of trials (n) must be fixed in advance. 4. Success: the probability of success (p) must be the same for each trial.
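
When the four conditions hold, the probability of exactly k successes in n trials is C(n, k) x p^k x (1-p)^(n-k). The sketch below applies this to a series of fair coin tosses; the function name and the choice of 10 tosses are assumptions.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical: 10 tosses of a fair coin, counting heads as successes.
n_tosses = 10
p_heads = 0.5
p_five_heads = binomial_pmf(5, n_tosses, p_heads)  # 252 / 1024

# Probabilities over every possible number of successes sum to 1.
total_probability = sum(binomial_pmf(k, n_tosses, p_heads) for k in range(n_tosses + 1))
```

Setting n to 1 reduces this to the Bernoulli case, where the only outcomes are 0 and 1.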

discrete random variable

A variable where the number of outcomes can be counted and each outcome has a measurable and positive probability, e.g. the number of eggs that a hen lays in a given day (it can't be 2.3; a variable that could take such in-between values would be a CONTINUOUS random variable).

Union

a composition of all outcomes that belong to either of TWO events

intersection

a composition with all outcomes belonging to both events

tree diagram

a diagram used to show the total number of possible outcomes in a probability experiment

t distribution

A family of bell-shaped curves based on degrees of freedom, similar to the standard normal distribution except that the variance is greater than 1; used when you are testing small samples and when the population standard deviation is unknown. t distributions have more probability in the tails (fat tails) and less in the center than the standard normal. The bigger your sample and the more degrees of freedom, the closer the t curve gets to the normal (p. 232).

Histogram

A graphical representation of a frequency distribution in columns. It can help make better decisions than just using the average; see p. 136 on repair times, where "what % of them are under ____ weeks" is more useful than the average.

combination

A grouping of items in which order does not matter. A combination is a selection of all or part of a set of objects, without regard to the order in which the objects are selected. For example, suppose we have a set of three letters: A, B, and C. We might ask how many ways we can select 2 letters from that set. For a combination, selecting A and then B is the same as selecting B and then A. For a permutation, order matters, so selecting A and then B is different from selecting B and then A. See p. 177 for formulas.
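
The A, B, C example can be checked directly with Python's standard library, which can both count and list combinations and permutations. The variable names are illustrative.

```python
import math
from itertools import combinations, permutations

letters = ["A", "B", "C"]

# Counting: choose 2 of 3 letters, ignoring order vs. respecting order.
num_combinations = math.comb(3, 2)   # 3
num_permutations = math.perm(3, 2)   # 6

# Listing: (A, B) appears once as a combination, but both (A, B) and (B, A)
# appear among the permutations.
all_combinations = list(combinations(letters, 2))
all_permutations = list(permutations(letters, 2))
```

Each combination of 2 letters corresponds to 2! = 2 permutations, which is why there are twice as many permutations here.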

z-score

A measure of how many standard deviations a value is from the mean: z = (x - mean) / std dev. Also known as a standardized value.
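
A one-line sketch of the z-score formula; the function name and the test-score numbers are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (positive) or below (negative) the mean."""
    return (x - mean) / std_dev

# Hypothetical: a score of 85 when the mean is 70 and the standard deviation is 10.
z_high = z_score(85, mean=70, std_dev=10)   # 1.5 standard deviations above the mean
z_low = z_score(60, mean=70, std_dev=10)    # 1 standard deviation below the mean
```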

Skewness

a measure of the degree to which a distribution is asymmetrical

discrete metric

a metric derived from counting something

random variable

a numerical description of the outcome of an experiment

metric vs measure

A metric is a unit of measurement that provides a way to objectively quantify performance; a measure is the numerical value associated with a metric.

statistical thinking

a philosophy of learning and action based on the following fundamental principles: all work occurs in a system of interconnected processes, variation exists in all processes, and understanding and reducing variation are keys to success

goodness of fit

A procedure that attempts to draw a conclusion about the nature of a distribution. The chi-square goodness-of-fit test determines whether sample data are representative of some probability distribution. When testing for normality, if the chi-square statistic is <= the critical value, then the data can reasonably be assumed to come from a normal distribution having the same mean and std dev as the sample; otherwise, the normal distribution is not appropriate to model the data. Normality is a requirement for the chi-square test that a variance equals a specified value, but many other tests are called chi-square because their asymptotic null distribution is chi-square, such as the chi-square test for independence in contingency tables (WHICH YOU USE ON NOMINAL DATA) and the chi-square goodness-of-fit test.

experiment

a process that results in an outcome

continuous random variable

A random variable that may assume any numerical value in an interval or collection of intervals. A continuous variable is a variable whose value is obtained by measuring. Examples: height of students in a class, weight of students in a class, time it takes to get to school, distance traveled between classes.

midrange

average of the lowest and highest values in a data set (Max + Min)/2

probability distribution

list of possible outcomes with associated probabilities

chi-square statistic

Used in goodness-of-fit tests (p. 208) and to compare two categorical variables to see if they are related. Used for data with no negative values (frequency counts). After calculating the statistic, you compare it with a critical value looked up in the chi-square table. The chi-square table is similar to other distribution tables: to look up the critical value, you need to know the degrees of freedom and the probability (both of which are usually supplied in the question).

confidence interval video

https://www.youtube.com/watch?v=tFWsuO9f74o

Process Capability Index

An index that measures the potential for a process to generate defective outputs relative to either upper or lower specifications: Cp = (upper specification - lower specification) / total variation.
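
A sketch of the Cp calculation. It assumes, as is common practice, that the total variation is taken to be six process standard deviations; that assumption, the function name, and the specification limits are not from the card itself.

```python
def process_capability_index(upper_spec, lower_spec, process_std_dev):
    """Cp = (upper spec - lower spec) / total variation, with total variation assumed to be 6 sigma."""
    total_variation = 6 * process_std_dev  # assumption: six standard deviations
    return (upper_spec - lower_spec) / total_variation

# Hypothetical part dimension: specs from 9.5 to 10.5, process std dev 0.1.
cp = process_capability_index(upper_spec=10.5, lower_spec=9.5, process_std_dev=0.1)
```

A Cp above 1 means the specification range is wider than the assumed process spread, so the process has the potential to stay within specifications.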

IQR

interquartile range, Q3-Q1 where Q3= 75th percentile (the median of the 'top' half) and Q1= 25th percentile (the median of the 'bottom' half), gives the spread of the central (middle) 50% of the data set aka "Midspread"
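
The "median of each half" description can be sketched as follows. Quartile conventions differ between textbooks and software, so this is one possible method, not the only one; for an odd number of values it assumes the middle value is left out of both halves. The function name and data are illustrative.

```python
import statistics

def iqr_median_of_halves(values):
    """IQR using Q1 = median of the bottom half and Q3 = median of the top half."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2          # for odd n, the middle value is excluded from both halves
    q1 = statistics.median(data[:half])
    q3 = statistics.median(data[-half:])
    return q3 - q1

# Hypothetical data: bottom half 1, 3, 5, 7 (median 4); top half 9, 11, 13, 15 (median 12).
sample = [1, 3, 5, 7, 9, 11, 13, 15]
midspread = iqr_median_of_halves(sample)
```

Spreadsheet and library quartile functions often interpolate instead, so they may return slightly different values for the same data.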

ratio data

is continuous and has a natural zero, like dollars and time - ratio data can be converted into interval data, ordinal data or categorical data

Bernoulli distribution

The simplest case of a binomial distribution (a single trial): a random variable with only 2 possible outcomes, with probabilities p and 1 - p, e.g. a favorable (40%) vs. unfavorable (60%) view of the president. To do calculations, assign 1 and 0 to these categories. https://www.khanacademy.org/math/statistics-probability/random-variables-stats-library/binomial-mean-standard-dev-formulas/v/mean-and-variance-of-bernoulli-distribution-example

Central Limit Theorem

If the sample size is large enough, the sampling distribution of the mean will be approximately normally distributed even if the population is not, AND the mean of the sampling distribution will equal the mean of the population (see p. 228). If the population is normally distributed, then the sampling distribution of the mean will be normal for any sample size. Video: https://www.youtube.com/watch?v=_YOr_yYPytM Let's say you have a population of 1 million and you keep taking samples of 100. Each time you take a sample, you calculate the average of that sample. Those sample averages will form an approximately normal distribution, whatever the shape of the population's distribution. x-bar is the mean of a sample (we don't know the mean of the population), n is the sample size, and s is the std dev of the sample.
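
The repeated-sampling idea can be tried out with a small simulation. This sketch draws from an exponential population (strongly skewed, mean 1, standard deviation 1) rather than a literal population of 1 million; the seed, sample size, and number of samples are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(42)        # fixed seed so the run is repeatable
SAMPLE_SIZE = 100      # n: observations in each sample
NUM_SAMPLES = 2000     # how many samples we draw

def sample_mean():
    """Mean of one sample of SAMPLE_SIZE draws from a skewed (exponential) population."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(SAMPLE_SIZE))

sample_means = [sample_mean() for _ in range(NUM_SAMPLES)]

# The sample means should centre on the population mean (1.0), with a spread
# close to population std dev / sqrt(n) = 1 / 10 = 0.1.
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
```

A histogram of `sample_means` would look roughly bell-shaped even though the underlying population is far from normal.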

statistic

summary measure of data

Measurement

the act of obtaining data associated with a metric

