Business Analytics Final exam


Correlation is always between

-1 and +1

Time series plot

A display of values against time

Uniform

A distribution whose histogram doesn't appear to have any mode and in which all the bars are approximately the same height

Unimodal

A distribution whose histogram has one main peak

Positive pattern

A pattern running from the lower left to the upper right

Negative pattern

A pattern that runs from the upper left to the lower right

Always pair the median with

IQR

Skewed

One tail stretches out farther than the other; the distribution is skewed toward the side of the longer tail

Linear

If there is a straight-line relationship, it will appear as a cloud or swarm of points stretched out in a generally consistent, straight form

Modes

Peaks or humps seen in a histogram

Data=

Predicted + Residual

How to determine outliers

Any value above Q3 + 1.5(IQR) or below Q1 - 1.5(IQR)
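
As a rough illustration of the 1.5(IQR) rule, the fences can be computed directly from the quartiles. This is a minimal Python sketch; the function names `outlier_fences` and `is_outlier` are made up for this example, and Q1 and Q3 are assumed to be known already.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences from the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value, q1, q3):
    """True if the value falls outside either fence."""
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper


# Hypothetical quartiles, chosen only for illustration
lower_fence, upper_fence = outlier_fences(10, 20)
```

With Q1 = 10 and Q3 = 20 the IQR is 10, so the fences sit 15 units beyond each quartile.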

Before using correlation, you must check

Quantitative Variables Condition, Linearity Condition, Outlier Condition

Correlation coefficient

Since the x's and y's are paired, multiply each standardized value of x by the standardized value of y it is paired with and add up those cross products, then divide by n - 1. That is, r is the ratio of the sum of the products zx*zy over every point in the scatterplot to n - 1: r = sum(zx*zy) / (n - 1)
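
The recipe on this card can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`mean`, `sample_sd`, `correlation`) are invented here, and the standard deviation uses the n - 1 divisor so that it matches the variance card.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Sum the products of paired z-scores, then divide by n - 1."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = sample_sd(xs), sample_sd(ys)
    cross_products = (
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    )
    return sum(cross_products) / (n - 1)
```

A perfectly increasing straight-line pattern should give a value of about +1, and a perfectly decreasing one about -1, which matches the "-1 and +1" range card.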

leptokurtic

sharply peaked, with values concentrated around the center and heavier tails than a normal curve

What is affected by outliers/skewness?

correlation, mean, and standard deviation

Kurtosis

describes the shape of the distribution's curve: how peaked it is and how heavy its tails are

Evaluate: "Outliers do not affect the correlation." a. True. Correlation coefficients remain the same if outliers are removed or kept b. False. Correlation coefficients remain the same if outliers are removed or kept c. True. Correlation coefficients change if outliers are removed or kept d. False. Correlation coefficients change if outliers are removed or kept e. All of the above answers are correct

d

If mean > median, the distribution is

skewed right

A value is more unusual when it has

a z-score farther from zero

Boxplot

highlights several features of the distribution of the variable, including the quartiles, the median, and any outlying values

Strength

how much scatter or cluster

The five-number summary of a distribution

reports its median, quartiles, and extremes (maximum and minimum)

The median is

resistant

Correlation does not imply..

causation; an association alone does not show that one variable produces results or change in the other

Mean is larger than the median if skewed

right

The point contributing least to the sum of squared residuals under the least squares criterion is the one with..

the smallest residual (in absolute value)

We place the explanatory or predictor variable on

the x-axis

We place the response variable on

the y-axis

If the correlation were 1.0

then the model predicts y perfectly, the residuals would all be zero and have no variation.

Multimodal

three or more peaks

The mean is a natural summary for

unimodal, symmetric distributions

Outlier

unusual observation, standing away from the overall pattern of the scatterplot

quartiles

values that frame the middle 50% of the data. One quarter of the data lies below the lower quartile, Q1, and one quarter lies above the upper quartile, Q3

Stationary

when a time series is without a strong trend or change in variability

Explanatory variable

x in the regression line equation

Response variable

y in the regression line equation; y hat is the model's predicted value of it

A linear model can be written in the form

y hat = b0 + b1x, where b0 and b1 are numbers estimated from the data and y hat is the predicted value

Residual

(e) The difference between the observed value, y, and the predicted value, y hat: e = y - y hat

Variance

(s^2) roughly the average of the squared deviations from the mean: sum of (y - mean)^2 / (n - 1)
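
A small Python sketch of this formula follows, with the standard deviation taken as the square root of the variance. The function names and the sample data are made up for illustration.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))


# Illustrative data: mean is 5, squared deviations sum to 32, n - 1 is 7
data = [2, 4, 4, 4, 5, 5, 7, 9]
variance = sample_variance(data)
```

For this data the variance works out to 32/7, a little above 4.57.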

A value is often treated as an outlier if its z-score is beyond

+ or - 3 (more than 3 standard deviations from the mean)

Tukey method

1. Order the values. 2. Split the data at the median (if n is odd, include the median in both halves). 3. Find the median of each half (if a half has an even number of values, average the two middle values). 4. The medians of the lower and upper halves are Q1 and Q3.
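
The steps on this card can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation; the names `median_of_sorted` and `tukey_quartiles` are invented here, and other software may use a different quartile convention.

```python
def median_of_sorted(sorted_values):
    """Median of an already-sorted list; averages the two middle values if even."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def tukey_quartiles(values):
    """Return (Q1, Q3) by splitting the ordered data at the median."""
    ordered = sorted(values)              # step 1: order the values
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        lower = ordered[: half + 1]       # step 2: odd n, median goes in both halves
        upper = ordered[half:]
    else:
        lower = ordered[:half]
        upper = ordered[half:]
    # steps 3 and 4: the median of each half is a quartile
    return median_of_sorted(lower), median_of_sorted(upper)
```

For example, `tukey_quartiles([1, 2, 3, 4, 5])` splits the data into `[1, 2, 3]` and `[3, 4, 5]`, so Q1 is 2 and Q3 is 4.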

Platykurtic

Amodal; flat

Quantitative Data Condition

Before making a histogram or stem-and-leaf display, check that the data are values of a quantitative variable whose units are known

Judgment call

Characterizing the shape of a distribution

Quantitative Variables Condition

Correlation applies only to quantitative variables

Linearity Condition

Correlation measures the strength only of the linear association. If the underlying relationship is curved, summarizing its strength with a correlation would be misleading.

Residual=

Data - Predicted
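
To make the relationship concrete, here is a short Python sketch that computes residuals for a small made-up data set. The data, the helper names, and the coefficients `b0 = 2.2` and `b1 = 0.6` are assumptions for illustration; those coefficients happen to be the least squares values for this toy data.

```python
def predict(b0, b1, x):
    """Predicted value from the linear model y hat = b0 + b1 * x."""
    return b0 + b1 * x


def residuals(b0, b1, xs, ys):
    """Residual = Data - Predicted, one per observation."""
    return [y - predict(b0, b1, x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6  # least squares coefficients for this toy data
res = residuals(b0, b1, xs, ys)
```

Adding each residual back to its predicted value recovers the original data, and for a least squares line the residuals sum to (approximately) zero.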

Tails

The thinner ends of a distribution

Smooth trace

A smoothed curve drawn through a time series plot to better understand the trend in time series data

Bimodal

Two main peaks

The subway runs every 15 minutes. You arrive at the station and cannot locate the timetable or the current time. The probability that it will arrive in the next minute can be calculated based on a ___ model. a. binomial b. uniform c. geometric d. Poisson e. the answer is not above

Uniform
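
Under a uniform model, the waiting time is equally likely to fall anywhere in the 15-minute interval, so the probability for any sub-interval is its length divided by 15. A minimal Python sketch, with a made-up helper name `uniform_probability`:

```python
def uniform_probability(a, b, start, end):
    """P(start <= X <= end) for X uniform on [a, b]."""
    low = max(a, start)
    high = min(b, end)
    if high <= low:
        return 0.0
    return (high - low) / (b - a)


# Waiting time assumed uniform on [0, 15] minutes; chance the train comes within 1 minute
p_next_minute = uniform_probability(0, 15, 0, 1)
```

Here the result is 1/15, or roughly 0.067.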

Outlier Condition

Unusual observations can distort the correlation. When you see an outlier, it's often a good idea to report the correlation both with and without the point.
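
The effect of a single unusual point can be seen with a small experiment. The sketch below uses invented data and a hypothetical helper `pearson_r`, which computes the usual correlation coefficient in an algebraically equivalent form, then reports the correlation with and without one outlying point.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient from sums of cross products and squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_without = pearson_r(xs, ys)

# Add one made-up point far from the overall pattern
r_with = pearson_r(xs + [10], ys + [0])
```

In this toy data the five original points show a positive association, while adding the single point (10, 0) flips the correlation to a negative value.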

According to Crovitz, how can big data be misused? a. when governments use big data to prevent protests and arrest dissidents b. when health providers and governments use big data to find out what time of year people are sick most c. when health providers use big data to identify treatments for premature babies d. when governments use big data to reduce fire hazards e. all of the above

a

The R^2 ranges from ___. a. 0 to 1.00 b. -1.00 to +1.00 c. -100 to +100 d. none of the above is correct

a

Shape

a distribution in terms of its modes, its symmetry, and whether it has any gaps or outlying values

Histogram

a graph for a quantitative variable; we usually slice up all the possible values into bins and then count the number of cases that fall in each bin

Lurking variable

a third variable that is simultaneously affecting both of the variables you have observed

If the distribution is symmetric or fairly symmetric, the mean will be..

about the same as the median

A linear model is just

an equation of a straight line through the data

Stem-and-leaf displays

are like histograms, but they also give the individual values

There are two independent variables X and Y with the respective means of 40 and 20, and the respective standard deviations of 3 and 5. If you added 3 to Y, what would be the standard deviation of the new distribution? a. 3 b. 5 c. 6 d. 8 e. 4.24

b
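
Adding a constant shifts every value by the same amount, so the center moves but the spread does not. A quick Python check on made-up data (the values below are illustrative and are not the Y from the question):

```python
import statistics

y = [12, 18, 20, 25, 25]
y_shifted = [value + 3 for value in y]   # add 3 to every value

sd_before = statistics.stdev(y)
sd_after = statistics.stdev(y_shifted)
mean_shift = statistics.mean(y_shifted) - statistics.mean(y)
```

The mean moves up by 3 while the standard deviation is unchanged, which is why the answer stays at 5.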

When reviewing a scatterplot, it is noted that the independent variable is the Banner Identification Number and the dependent variable is height in inches. One can conclude that the ____ assumption has been violated, and calculating a correlation is not appropriate. a. Qualitative data b. Quantitative data c. Homoscedasticity d. Linearity e. None of the above are correct

b

Slope of the least squares line

b1= r(sy/sx)
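
As a sanity check on this formula, the sketch below computes the slope as r(sy/sx) and compares it with the slope obtained directly from the sums of cross products and squares. The data and function names are invented for illustration.

```python
import math


def slope_from_r(xs, ys):
    """Slope computed as b1 = r * (s_y / s_x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (
        (n - 1) * s_x * s_y
    )
    return r * s_y / s_x


def slope_direct(xs, ys):
    """Slope computed as sum of cross products over sum of squared x deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

Both routes give the same slope; for `xs = [1, 2, 3, 4, 5]` and `ys = [2, 4, 5, 4, 5]` it is 0.6.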

mesokurtic

bell curve; popeye

Central 50% of values is

between Q1 and Q3

Which of the following correctly reflects the condition of the outcomes "gender of customers at the ATM?" a. independent because the outcome of one event does influence the outcome of another b. not independent because the outcome of one event does influence the outcome of another c. independent because the outcome of one event does not influence another event d. not independent because the outcome of one event does not influence the outcome of another e. the answer does not appear above.

c

Equal Spread Condition

check a residual plot for equal scatter for all x-values

In the text the New England Journal of Medicine published a report saying that eating chocolate could improve one's intelligence. What misconceptions did Velickovic suggest were present in their report? a. the idea that correlation implies causality b. it is possible to generalize a correlation found on a group level to an individual level c. it is necessary to infer from a correlation found on one group level to all other groups on any level d. both A and B e. All; A,B,C

d

There are two independent distributions: Distribution A N(54.8, 4.3) and Distribution B N(45.6, 5.1). If the two distributions are added to each other, the mean of the new distribution is ___. a. 50.2 b. 8.9 c. -8.9 d. 100.4 e. not listed above

d
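
The mean of a sum is the sum of the means, so the arithmetic is simply 54.8 + 45.6. The sketch below also computes the standard deviation of the sum, under the assumption (not stated on the card) that the second number in N(...) is a standard deviation; for independent variables the variances add.

```python
import math

mean_a, sd_a = 54.8, 4.3   # Distribution A (second value assumed to be the SD)
mean_b, sd_b = 45.6, 5.1   # Distribution B (second value assumed to be the SD)

mean_sum = mean_a + mean_b                   # means add
sd_sum = math.sqrt(sd_a ** 2 + sd_b ** 2)    # variances add for independent variables
```

The mean comes out to 100.4, matching answer d; under the stated assumption the standard deviation of the sum is about 6.67.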

The interquartile range (IQR)

defined to be the difference between the two quartile values; Q3-Q1

y variable

dependent

Consultants from Southpark Corp have been analyzing consumer behaviors and noted that there was a strong negative correlation between satisfaction with checkout (1 being highly satisfied and 100 being highly dissatisfied) and the length of forms required by transactions, measured in number of blanks to be filled in. They have interpreted this to reflect that customer satisfaction increases when more blanks in the entailed paperwork of the associated transactions must be filled in. As a diligent scholar of analytic interpretation, you a. Disagree, because it is not a perfect correlation b. Agree, because correlation proves causation c. Disagree, because it is fallacious reasoning of causality d. Concur that they could be correct in their interpretation e. Disagree, because the relationship is the opposite of stated

e

Merce Motors large equipment operators know that with their Model 3 front loader the mean Tire Tread Depth is 20 cm with a standard deviation of 3.5 cm, and the mean Miles Traveled is 1820 miles with a standard deviation of 25.3 miles. The slope between Tire Tread Depth (dependent variable) and Miles Traveled is -0.1333. The intercept is ___. a. -242.06 b. -202.06 c. +222.06 d. -222.06 e. the answer does not appear above

e
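
The least squares line passes through the point of means, so the intercept can be found as b0 = (mean of y) - b1 * (mean of x). That relationship is a standard fact rather than something stated on the card; the short Python check below applies it to the numbers in the question.

```python
mean_depth = 20.0      # mean Tire Tread Depth (y), in cm
mean_miles = 1820.0    # mean Miles Traveled (x)
slope = -0.1333        # b1 given in the question

# b0 = mean of y minus slope times mean of x
intercept = mean_depth - slope * mean_miles
```

The result is about 262.61, which is not among options a through d, consistent with answer e.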

x variable

independent

When describing a distribution, attention should be paid to

its shape, center, spread

The point contributing most to the sum of squared residuals under the least squares criterion is the one with..

the largest residual (in absolute value)

Median is larger than the mean if skewed

left

Next step of analysis for a skewed distribution

a logarithmic transformation might make the distribution more symmetric

Range

max-min; not resistant to unusual observations

Correlation

measures the strength of the linear association between two quantitative variables

R^2

the percent of the variance in y that is accounted for by the regression; represents the strength of the model

Relative frequency histograms

show the percentage of cases in each bin of the histogram rather than the count

Scatterplot

plots one quantitative variable against another, is an effective display to look for trends, patterns, and relationships between two quantitative variables

Always pair the mean with

standard deviation

Form

straight, curved, exotic

Mean

sum of the y values (or x values) / number of values (n)

Correlation treats x and y

symmetrically

Standard deviation

takes into account how far each value is from the mean; appropriate for symmetric distributions; square root of the variance

Correlation is not affected by changes in

the center or scale of either variable

Symmetric

the halves on either side of the center look, at least approximately, like mirror images

The x- and y-variables are sometimes referred to as

the independent and dependent variables

Line of best fit/ least squares line

the line for which the sum of the squared residuals is smallest

The more symmetrical..

the closer the mean is to the median, and the more appropriate the mean and standard deviation are as summaries

If the shape is unimodal and symmetric

the mean and standard deviation and possibly the median and IQR should be reported

If a distribution is skewed, contains gaps, or contains outliers, then it is better to use

the median

If the shape is skewed

the median and IQR should be reported

If the correlation were 0

the model would predict the mean for all x-values. The residuals would have the same variability as the original data

Z-score

the standardized value; tells how many standard deviations a value is above or below the overall mean: z = (x - mean) / standard deviation
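
A z-score is a one-line calculation. The sketch below also flags values more than 3 standard deviations from the mean, the cutoff used on the outlier card; the function names and the example numbers are made up for illustration.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def is_unusual(x, mean, sd, threshold=3.0):
    """Flag values whose z-score is farther than `threshold` from zero."""
    return abs(z_score(x, mean, sd)) > threshold


# Hypothetical values: mean 70, standard deviation 5
z = z_score(85, 70, 5)
```

Here 85 sits exactly 3 standard deviations above the mean, so it is on the boundary and is not flagged, while a value of 54 (z = -3.2) would be.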

Correlation measures

the strength of the linear association between the two variables.

