Business Analytics

Examples of Unbiased Questions

Do you believe that current popular music is better, worse, or about the same quality as popular music from 20 years ago? Do you think women should be drafted into the military? How often do you eat spinach, kale, or other leafy green vegetables?

Mean

Equal to sum of all data points in the set divided by the number of data points

Standard Deviation

Equal to the square root of the variance. It has the same units as the data itself.
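As a rough illustration of how the mean, variance, and standard deviation relate, here is a minimal Python sketch. The data values are hypothetical, and the variance uses the sample form (dividing by n - 1), which is an assumption chosen to match Excel's STDEV.S.

```python
import math

# Hypothetical sample; not taken from the course material.
data = [4.0, 8.0, 6.0, 5.0, 7.0]

# Mean: sum of all data points divided by the number of data points.
mean = sum(data) / len(data)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

# Standard deviation: square root of the variance, in the same units as the data.
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)
```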

Skewness

Measures the degree of a graph's asymmetry.

Variance

Equal to the average of the squared deviations from the mean, and therefore to the square of the standard deviation. It is measured in squared units of the data.

Several probability expressions for a normal distribution

P(μ−σ≤x≤μ+σ)≈68%
P(μ−2σ≤x≤μ)≈47.5%
P(μ+2σ≤x)≈2.5%
P(μ−2σ≤x≤μ−σ)≈13.5%

95% of the probability is contained in the range reaching two standard deviations (1.96 to be exact) away from the mean on either side, that is,

P(μ−2σ≤x≤μ+2σ)≈95%

99.7% of the probability is contained in the range reaching three standard deviations away from the mean on either side, that is,

P(μ−3σ≤x≤μ+3σ)≈99.7%

68% of the probability is contained in the range reaching one standard deviation away from the mean on either side, that is,

P(μ−σ≤x≤μ+σ)≈68%
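These rule-of-thumb percentages can be checked numerically. The sketch below uses Python's standard-library NormalDist; the helper name prob_within is made up for illustration. Because the probability depends only on the number of standard deviations, the standard normal (mean 0, standard deviation 1) is enough.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k):
    """P(mu - k*sigma <= x <= mu + k*sigma) for any normal distribution."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(k, round(prob_within(k), 4))
```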

Z Value

The z-value of a point x is the distance x lies from the mean, measured in standard deviations.
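A minimal sketch of the calculation, assuming the mean and standard deviation are already known. The function name z_value and the numbers are hypothetical.

```python
def z_value(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# Hypothetical example: a score of 85 when the mean is 70 and the standard deviation is 10.
z = z_value(85.0, 70.0, 10.0)
print(z)  # 1.5
```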

Cross sectional data

Provides a snapshot of data across multiple groups at a given point in time

Correlation Coefficient

We can quantify the strength of a linear relationship between two variables by calculating the ....

Distribution of Sample Means

closely approximates a normal curve as we increase the number of samples and/or the sample size. The mean of the Distribution of Sample Means equals the mean of the population distribution. The standard deviation of the Distribution of Sample Means equals the standard deviation of the population distribution divided by the square root of the sample size. Thus, increasing the sample size decreases the width of the Distribution of Sample Means.
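The two properties above (same mean, standard deviation shrunk by the square root of the sample size) can be seen in a small simulation. This is only a sketch: the skewed exponential population, the sample size of 50, and the 2,000 repetitions are all arbitrary assumptions.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical, strongly right-skewed population.
population = [random.expovariate(1.0) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 50  # sample size
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(2_000)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = pop_sd / math.sqrt(n)  # population SD / sqrt(sample size)

print("mean of sample means:", mean_of_means, "population mean:", pop_mean)
print("SD of sample means:", sd_of_means, "predicted:", expected_se)
```

Rerunning with a larger n should give a narrower spread of sample means.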

Central Limit Theorem

If we take enough sufficiently large samples from any population, the means of those samples will be approximately normally distributed, regardless of the shape of the underlying population.

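Heteroskedasticity

The variability of the residuals is not constant across the range of the independent variable; for example, there is more variability at the lower values than at the higher values, or the reverse.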

How to avoid biased results?

- Phrase questions neutrally.
- Ensure that the sampling method is appropriate for the demographic of the target population.
- Pursue high response rates.
- It is often better to have a smaller sample with a high response rate than a larger sample with a low response rate.

Calculate the Mean in Excel

=AVERAGE(B2:B193) or, equivalently, =SUM(B2:B193)/192 (the cell range is only an example; B2:B193 contains 192 cells)

Conditional Mean Excel

=AVERAGEIF(range, criteria, [average_range])

Correlation Coefficient

=CORREL(array1, array2)
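For intuition about what CORREL computes, here is a minimal Python sketch of the Pearson correlation coefficient. The function name correlation and the two data series are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of paired deviations from each mean.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    sq_dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(sq_dev_x * sq_dev_y)

# Hypothetical data: advertising spend and sales that rise together almost linearly.
ads = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 4.1, 5.9, 8.2, 9.8]
r = correlation(ads, sales)
print(r)
```

A value near +1 or -1 indicates a strong linear relationship; a value near 0 indicates a weak one.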

Normal Distribution Excel Formula

=NORM.DIST(x, mean, standard_dev, TRUE)

Percentile Formula Excel

=PERCENTILE.INC(array, k)
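The following sketch shows one way to reproduce the inclusive percentile calculation in Python, using linear interpolation between the sorted values. The function name percentile_inc and the scores are hypothetical, and k is given as a fraction between 0 and 1, as in the Excel function.

```python
def percentile_inc(values, k):
    """Inclusive percentile with linear interpolation; k is between 0 and 1."""
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1 inclusive")
    ordered = sorted(values)
    # Fractional position of the percentile within the sorted data (0-based).
    position = k * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

scores = [15, 20, 35, 40, 50]  # hypothetical data
print(percentile_inc(scores, 0.5))  # the median
print(percentile_inc(scores, 0.4))
```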

Standard Deviation Calculation on Excel

=STDEV.S(number1, [number2], ...)

Histogram

A histogram's x-axis represents bins corresponding to ranges of data; its y-axis indicates the frequency of observations falling into each bin.

Low R-squared, Low p-value

A low R-squared and a low p-value indicate that the independent variable explains little of the variation in the dependent variable, but that the linear relationship between the two variables is statistically significant.

Normal Distribution

A unique symmetrical shape whose center and width are determined by its mean and standard deviation, respectively. Because of the normal distribution's symmetric shape, 50% of the probability lies below the mean and 50% lies above the mean. For every normal distribution, the probability of being within a specified number of standard deviations of the mean is the same.

Outlier

A value that falls far from the rest of the data

According to the Central Limit Theorem, the means of random samples from which of the following distributions will be normally distributed, assuming the samples are sufficiently large?

According to the Central Limit Theorem, if we take large enough samples, the distribution of sample means will be normally distributed regardless of the shape of the underlying population.

ALL of the ways you can reduce the width of the confidence interval

- Increase the sample size.
- Decrease the confidence level.

Examples of Biased Questions

Isn't Daft Punk a better band than Oasis? Research has linked carbon emissions to global warming. Do you think the US government should enact legislation to limit carbon emissions? Do you enjoy the work of such literary giants as William Shakespeare? Do you think people benefit from taking overpriced diet supplements?

How do you make sound inferences?

Make sure the sample is representative of the population by choosing members randomly to ensure that each member of the population is equally likely to be included in the sample.

Confidence Interval Formula

Margin of Error = CONFIDENCE.NORM(alpha, standard_dev, size), where alpha is 1 minus the confidence level (0.05 for a 95% confidence interval). The lower bound of the 95% confidence interval is the mean minus the margin of error. The upper bound of the 95% confidence interval is the mean plus the margin of error.
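As a sketch of what the margin-of-error calculation does, the Python below multiplies the z-value for the chosen confidence level by the standard deviation divided by the square root of the sample size. The function name margin_of_error and the sample figures (mean, standard deviation, size) are hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(alpha, std_dev, size):
    """Normal-based margin of error: z * std_dev / sqrt(size)."""
    # Two-sided interval, so alpha is split between the two tails.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * std_dev / math.sqrt(size)

# Hypothetical sample: mean 6.7, standard deviation 2.0, 100 observations.
sample_mean, sample_sd, n = 6.7, 2.0, 100
moe = margin_of_error(0.05, sample_sd, n)  # alpha = 0.05 for 95% confidence
lower, upper = sample_mean - moe, sample_mean + moe
print(lower, upper)
```

Increasing n or raising alpha (lowering the confidence level) makes the interval narrower.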

Let's return to the movie theater example and focus on the sample taken after the manager changes the theater's artistic focus. Suppose the average satisfaction rating of the sample is 9.9 out of 10. Which of the following do you think would be the correct conclusion? Remember that H0:μ=6.7 and Ha:μ≠6.7.

Reject the null hypothesis. The null hypothesis is that the average satisfaction rating has not changed, that is, that the population mean μ is still equal to 6.7. Drawing a sample with an average satisfaction rating of 9.9 from a population that has an average rating of 6.7 is extremely unlikely, so we would almost certainly reject the null hypothesis and conclude that the average satisfaction rating is no longer 6.7.

If outliers exist in a data set, one should...

Research the data points and then make a decision based on the findings. National Museum of American History example (Quiz 1, Q6): The consultant should delete or change data points only if careful examination of the data and the data sources indicates that the data points are incorrect or irrelevant to the research at hand. The consultant must use his or her experience and knowledge of the research question to make decisions on a case-by-case basis. Doing business analytics effectively requires judgment. In this case, the National Museum of American History underwent renovations that significantly reduced the number of visits to the museum in 2007 and 2008. The data points for 2007 and 2008 are correct and should not be changed. However, the fact that the museum was closed during most of that two-year period should be considered when drawing conclusions from this data set.

Scatter Plot

Reveals the relationships between two variables, or data sets.

Coefficient of Variation Formula Excel

Standard Deviation/Mean, for example =STDEV.S(range)/AVERAGE(range)

Coefficient of variation

Standard Deviation/Mean

Which of the following formulas would calculate the statistic that is MOST APPROPRIATE for comparing the variability of two data sets with different distributions?

Standard Deviation/Mean. Explanation: This is the formula for the coefficient of variation, the best statistic to compute to compare the variability of two data sets with different distributions. Dividing by the mean provides a measure of the distribution's variation relative to the mean.
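To see why dividing by the mean helps when comparing data sets, here is a small Python sketch. The function name coefficient_of_variation and both data sets are hypothetical; the second set is simply the first multiplied by 100.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)

small_scale = [9.0, 10.0, 11.0]
large_scale = [900.0, 1000.0, 1100.0]

# The standard deviations differ by a factor of 100 ...
print(statistics.stdev(small_scale), statistics.stdev(large_scale))
# ... but the variability relative to the mean is the same.
cv_small = coefficient_of_variation(small_scale)
cv_large = coefficient_of_variation(large_scale)
print(cv_small, cv_large)
```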

How to find the standard deviation with incomplete information, and then the cumulative probability?

Step 1: Subtract the lower bound of the 95% range from the mean and divide by 1.96 to get the standard deviation.
Step 2: Use the Excel function NORM.DIST(x, mean, standard_dev, TRUE) to get the cumulative probability. Example: x can be a student's score.
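The two steps can be sketched in Python as follows. The numbers are hypothetical: they assume normally distributed scores with a mean of 70 and a 95% range whose lower bound is 50.4. NormalDist(...).cdf plays the role of NORM.DIST with the cumulative flag set to TRUE.

```python
from statistics import NormalDist

# Hypothetical inputs: mean score and lower bound of the 95% range.
mean = 70.0
lower_bound = 50.4

# Step 1: the lower bound sits 1.96 standard deviations below the mean.
std_dev = (mean - lower_bound) / 1.96

# Step 2: cumulative probability that a score is at or below x.
x = 80.0
cumulative = NormalDist(mean, std_dev).cdf(x)
print(std_dev, cumulative)
```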

Alternative Hypothesis

The alternative hypothesis (the opposite of the null hypothesis) is the theory or claim we are trying to substantiate. If our data allow us to nullify the null hypothesis, we substantiate the alternative hypothesis.

Which of the following is the MOST LIKELY result of using a survey with biased questions?

The data in your sample will differ in a systematic way from data based on unbiased random selections from the population.

Conditional Mean

The mean of a subset of the data. We apply a condition and calculate the mean for values that meet that condition.
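A minimal Python sketch of a conditional mean, loosely mirroring the arguments of AVERAGEIF. The function name average_if, the regions, and the revenue figures are all hypothetical.

```python
def average_if(criteria_values, criterion, average_values):
    """Mean of average_values whose paired criteria_values satisfy criterion."""
    selected = [
        value
        for key, value in zip(criteria_values, average_values)
        if criterion(key)
    ]
    return sum(selected) / len(selected)

regions = ["East", "West", "East", "West"]
revenue = [100.0, 80.0, 120.0, 60.0]

# Mean revenue for rows where the region is "East".
east_mean = average_if(regions, lambda region: region == "East", revenue)
print(east_mean)  # 110.0
```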

Null Hypothesis

The null hypothesis is a statement about a topic of interest. It is typically based on historical information or conventional wisdom. We always start a hypothesis test by assuming that the null hypothesis is true and then test to see if we can nullify it—that's why it's called the "null" hypothesis. The null hypothesis is the opposite of the hypothesis we are trying to prove (the alternative hypothesis).

One-Sided P-Value

The one-sided p-value is half of the two-sided p-value. Since the two-sided p-value is 0.0040, the one-sided p-value is 0.0040/2=0.0020.
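The halving relationship can be illustrated with a normal-based test statistic. In this sketch the z statistic of 2.878 is a hypothetical value, picked only because it gives a two-sided p-value close to 0.0040; the function name p_values is also made up. It assumes the sample result falls on the side predicted by the alternative hypothesis.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one_sided, two_sided) p-values for a z test statistic."""
    # Probability in one tail beyond |z| under the standard normal curve.
    one_sided = 1 - NormalDist().cdf(abs(z))
    # The two-sided p-value counts both tails.
    return one_sided, 2 * one_sided

one_sided, two_sided = p_values(2.878)  # hypothetical z statistic
print(round(one_sided, 4), round(two_sided, 4))
```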

Type 1 Error

The probability of a type I error is equal to the significance level, which is 1-confidence level. A 90% confidence level indicates that the significance level is 10%. Therefore there is a 10% chance of making a type I error.

What happens to the sample mean and standard deviation as you take new samples of equal size?

The sample mean and standard deviation vary but remain fairly close to the population mean and standard deviation. Explanation: Since each sample is randomly selected, the mean and standard deviation vary from one sample to the next. However, since the sample size is fairly large, each sample's mean and standard deviation are fairly close to the population mean and standard deviation.

Confidence interval

The sample mean is only a point estimate. We can construct a range around the sample mean, called a _______, which contains the true population mean with a certain level (e.g., 95%) of confidence. For a 95% confidence interval, on average, 95% of samples drawn from the population will produce an interval that contains the population mean. Note that a confidence interval's level of confidence does not tell us the chance, probability, or likelihood that an individual confidence interval contains the true population mean. The width of the confidence interval depends on the level of confidence, our best estimate of the population standard deviation, and the sample size. We control only the level of confidence and the sample size.

By removing outliers from the data set, the standard deviation....

The standard deviation decreases. The standard deviation gives more weight to observations that are farther from the mean. Therefore, removing the outliers would decrease the standard deviation.
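A quick numerical check of this effect, using hypothetical data in which one value (5.0) lies far from the rest. This only demonstrates the arithmetic; whether an outlier should actually be removed is a separate judgment.

```python
import statistics

# Hypothetical data set with one obvious outlier.
values = [52.0, 48.0, 50.0, 51.0, 49.0, 5.0]
without_outlier = [v for v in values if v != 5.0]

sd_with = statistics.stdev(values)
sd_without = statistics.stdev(without_outlier)
print(sd_with, sd_without)
```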

Mode

Value that occurs most frequently in the data set. A data set may have multiple modes.

Time Series

When one of the variables is time, the relationship is known as a....

Bimodal

When the distribution of data is bimodal, the mean may be less than, equal to, or greater than the median.

Skewed Left

When the distribution of data is skewed to the left, the mean is most likely less than the median. The extreme values in the left tail pull the mean towards them.

Skewed Right

When the distribution of data is skewed to the right, the mean is most likely greater than the median. The extreme values in the right tail pull the mean towards them.

Data Symmetric

When the distribution of data is symmetric, the mean and median are equal.

Percentile

Another value of interest: the kth percentile is the value below which k% of the data fall.

Median

Middle value of the data set; the 50th percentile of the data set.

