Business Analytics 2 - Final Exam

Confidence Interval

An interval that encloses an unknown population parameter with a certain level of confidence

ANOVA Basic Idea

- Compares two types of variation to test equality of means
- Comparison basis is the ratio of variances
- If treatment variation is significantly greater than random variation, then the means are not equal
- Variation measures are obtained by partitioning total variation

ANOVA F-test

- Tests the equality of two or more (k) population means
- Variables: one nominal-scaled independent variable with two or more (k) treatment levels or classifications, and one interval- or ratio-scaled dependent variable
- Used to analyze completely randomized experimental designs
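The partitioning idea behind the ANOVA F-test can be sketched in a few lines. This is a minimal illustration, assuming a completely randomized design; the function name `one_way_anova_f` and the three small samples are hypothetical and not taken from the course material.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_treatment, df_error) for a list of samples, one per treatment."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Treatment variation: how far each group mean sits from the grand mean.
    sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Random (error) variation: spread of observations around their own group mean.
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    mst = sst / (k - 1)
    mse = sse / (n_total - k)
    return mst / mse, k - 1, n_total - k

# Hypothetical data: three treatments, three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_treatment, df_error = one_way_anova_f(groups)
print(f_stat, df_treatment, df_error)
```

A large F (treatment mean square much bigger than error mean square) is evidence that the means are not all equal; the F value would still need to be compared with an F distribution on the two degrees of freedom.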

The central limit theorem says that, for large samples, the sampling distribution of the sample mean is:

- approximately normal
- centered at the population mean
- spread with a standard deviation equal to the population standard deviation divided by the square root of the sample size

It is central to most hypothesis testing and confidence interval construction.
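A small simulation can make the central limit theorem concrete. The sketch below assumes a skewed (exponential) population, a sample size of 40, and 2,000 repeated samples; all of these values are arbitrary choices for illustration.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population (exponential with mean 1).
population = [random.expovariate(1.0) for _ in range(20_000)]
n = 40

# Draw many samples and record each sample mean.
sample_means = [mean(random.sample(population, n)) for _ in range(2_000)]

pop_mean = mean(population)
pop_sd = pstdev(population)
se_theory = pop_sd / n ** 0.5        # sigma / sqrt(n)
se_observed = pstdev(sample_means)   # spread of the simulated sample means

print(pop_mean, mean(sample_means))
print(se_theory, se_observed)
```

The two printed pairs should be close to each other: the sample means center on the population mean, and their spread is close to the population standard deviation divided by the square root of n.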

Observed value of the dependent variable = 25.7. Predicted value = 38.1. Mean of 29.8 and standard deviation of 2.3, for an independent variable value of 45. What is the residual?

-12.4 (observed minus predicted: 25.7 - 38.1)
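The residual is the observed value minus the predicted value; the mean and standard deviation given in the question are not needed. A minimal sketch, with `residual` as a hypothetical helper name:

```python
def residual(observed, predicted):
    """Residual = observed y minus predicted y."""
    return observed - predicted

# Values from the question above; rounding removes floating-point noise.
r = round(residual(25.7, 38.1), 1)
print(r)  # -12.4
```

A negative residual means the model over-predicted for this observation.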

Use the finite population correction factor when n/N >

.05

Conditions required for a valid large sample confidence interval for μ

1. A random sample is selected from the target population.
2. The sample size n is large. Due to the central limit theorem, this condition guarantees that the sampling distribution of x̄ is approximately normal. Also, for large n, s will be a good estimator of σ.

Conditions required for a valid small-sample confidence interval for μ

1. A random sample is selected from the target population.
2. The population has a relative frequency distribution that is approximately normal.

Interval Estimation Points

1. Provides a range of values
2. Gives information about closeness to the unknown population parameter
3. Example: the unknown population mean lies between 50 and 70 with 95% confidence

Point Estimator points

1. Provides a single value based on observations from one sample
2. Gives no information about how close the value is to the unknown population parameter
3. Example: the sample mean x̄ = 3 is the point estimate of the unknown population mean

5 Step Hypothesis Testing

1. Specify the null hypothesis
2. Specify the alternative hypothesis
3. Set the significance level (α)
4. Calculate the test statistic and corresponding p-value
5. Draw a conclusion
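The five steps can be traced in a short large-sample z-test. This is a sketch under stated assumptions: a two-tailed test of H0: μ = 2,400 against Ha: μ ≠ 2,400, with a hypothetical sample (x̄ = 2,450, s = 200, n = 100). The function name `z_test_two_tailed` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_tailed(xbar, mu0, s, n, alpha=0.05):
    """Large-sample z-test of H0: mu = mu0 against Ha: mu != mu0."""
    # Step 4: test statistic and two-tailed p-value.
    z = (xbar - mu0) / (s / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: reject H0 when the p-value falls below alpha.
    return z, p_value, p_value < alpha

# Steps 1-3: H0: mu = 2400, Ha: mu != 2400, alpha = 0.05 (hypothetical sample values).
z, p_value, reject_h0 = z_test_two_tailed(xbar=2450, mu0=2400, s=200, n=100)
print(z, p_value, reject_h0)
```

With these assumed numbers the test statistic is 2.5 standard errors above the hypothesized mean, so H0 is rejected at α = 0.05.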

Parameter

A numerical descriptive measure of a population. Because it is based on all the observations in the population, its value is almost always unknown.

Sample Statistic

A numerical descriptive measure of a sample. It is calculated from the observations in the sample.

Type 2 Error

occurs if the researcher accepts the null hypothesis when, in fact, H0 is false. The probability of committing this error is denoted by β.

Type 1 Error

occurs if the researcher rejects the null hypothesis in favor of the alternative hypothesis when, in fact, H0 is true. The probability of committing this error is denoted by α.

Point Estimator

of a population parameter is a rule or formula that tells us how to use the sample data to calculate a single number that can be used as an estimate of the population parameter

Sampling Distribution

of a sample statistic calculated from a sample of n measurements is the probability distribution of the statistic.

Rejection Region

of a statistical test is the set of possible values of the test statistic for which the researcher will reject H0 in favor of Ha.

Treatments

of an experiment are the factor level combinations used

Confidence interval for population proportion: The mean of the sampling distribution of p̂ is p; that is, p̂ is an unbiased estimator of ____.

p

Sampling distribution of a statistic

the theoretical probability distribution of the statistic in repeated sampling

Target Parameter

the unknown population parameter (e.g., mean or proportion) that we are interested in estimating

Two way analysis of variance

There is evidence of differences between the weeks, but not between the days, at α = 0.05. Week 5 shows the largest means, and the Tukey HSD results should be investigated.

Confidence interval for population proportion: The standard deviation of the sampling distribution of p̂ is

$\sigma_{\hat{p}} = \sqrt{pq/n}$, where $q = 1 - p$

Large sample confidence interval for p

$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$, where $\hat{p} = x/n$ and $\hat{q} = 1 - \hat{p}$
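The large-sample interval for a proportion is the sample proportion plus or minus a z critical value times the estimated standard error, sqrt(p̂q̂/n). A minimal sketch, assuming a hypothetical sample of 60 successes in 200 trials; `proportion_ci` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # z critical value that leaves alpha/2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical sample: 60 successes out of 200.
low, high = proportion_ci(x=60, n=200)
print(round(low, 4), round(high, 4))
```

The interval is only trustworthy when the sample is large enough for p̂ to be approximately normal.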

Sample size determination for 100(1-α)% confidence interval for μ

$n = \dfrac{(z_{\alpha/2})^2 \sigma^2}{(SE)^2}$, where SE is the sampling error (half-width of the interval)
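The required sample size for estimating μ comes from solving the half-width z·σ/√n = SE for n, then rounding up. A sketch assuming a hypothetical σ of 10 and a desired sampling error of 2; `sample_size_for_mean` is an illustrative name.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, sampling_error, confidence=0.95):
    """Smallest n whose confidence interval half-width is at most sampling_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # n = (z * sigma / SE)^2, always rounded up to a whole observation.
    return ceil((z * sigma / sampling_error) ** 2)

# Hypothetical planning values: sigma = 10, desired half-width = 2.
n_required = sample_size_for_mean(sigma=10, sampling_error=2)
print(n_required)  # 97
```

In practice σ is rarely known, so a pilot-sample s or a conservative guess usually stands in for it.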

When σ is unknown and n is large (n > 30), the confidence interval is approximately equal to ________, where s is the sample standard deviation.

$\bar{x} \pm z_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$

Factors

those variables whose effect on the response is of interest to the experimenter - also referred to as independent variables

Small sample confidence interval for μ

$\bar{x} \pm t_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$, where $t_{\alpha/2}$ is based on (n-1) degrees of freedom
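The small-sample interval has the same shape as the large-sample one, with a t critical value in place of z. The sketch below takes the critical value as an input rather than computing it; the value 2.262 is the standard table entry for 95% confidence with 9 degrees of freedom, and the sample numbers are hypothetical.

```python
from math import sqrt

def small_sample_mean_ci(xbar, s, n, t_crit):
    """Confidence interval xbar +/- t_crit * s / sqrt(n); t_crit uses n - 1 df."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# t critical value for 95% confidence and n - 1 = 9 degrees of freedom (from a t table).
T_CRIT_95_DF9 = 2.262

# Hypothetical sample: n = 10, mean 3.0, standard deviation 0.5.
low, high = small_sample_mean_ci(xbar=3.0, s=0.5, n=10, t_crit=T_CRIT_95_DF9)
print(round(low, 4), round(high, 4))
```

Because 2.262 is larger than the z value of 1.96, the interval is wider than a z-based interval on the same data, which reflects the extra variability of the t-statistic.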

p̂ =

x/n

Null Hypothesis

H0, represents the hypothesis that will be accepted unless the data provides convincing evidence that it is false. This usually represents the "status quo" or some claim about the population parameter that the researcher wants to test.

Correct answer about pizza ->

H0: β (dollars off) = 0; Ha: β (dollars off) ≠ 0; predictor variable: dollars off

Alternative Hypothesis

Ha, represents the hypothesis that will be accepted only if the data provides convincing evidence of its truth. This usually represents the values of a population parameter for which the researcher wants to gather evidence to support.

Why does the CI output require (1) count ≥ 15 and (2) total - count ≥ 15?

If either condition fails, we cannot construct a valid CI because the sample size is insufficient

σ2 =

Population Variance

contingency table: which has most unusual results

Processors & lawsuits, much lower

Coefficient of Determination =

R^2
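R² can be computed as one minus the ratio of unexplained variation to total variation. A minimal sketch with hypothetical observed and fitted values; `r_squared` is an illustrative name.

```python
from statistics import mean

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / SS_total."""
    y_bar = mean(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - ss_error / ss_total

# Hypothetical observed values and model predictions.
y = [1, 2, 3, 4]
y_hat = [1.5, 1.5, 3.5, 3.5]
r2 = r_squared(y, y_hat)
print(r2)
```

Here the predictions leave 1 unit of squared error out of 5 units of total variation, so the model explains 80% of the variation in y.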

x bar =

Sample Mean

In general, we express the reliability associated with a confidence interval for the population mean μ by specifying the ______ ______ within which we want to estimate μ with 100(1-α)% confidence. The _____ _____ then is equal to the half-width of the confidence interval.

Sampling Error

How do you check for the linearity condition in a simple linear reg. model?

Scatterplot of the dependent variable against the independent variable

Check for linearity in multiple regression?

Scatterplots of y against each of the predictors

Degrees of Freedom

The actual amount of variability in the sampling distribution of t depends on the sample size n. A convenient way of expressing this dependence is to say that the t statistic has (n-1) DF.

If the mean of the sampling distribution is not equal to the parameter, the statistic is said to be a __________ __________ of the parameter.

biased estimate

Assumptions

clear statements of any assumptions made about the population(s) being sampled

When in doubt about outliers, the most conservative approach to take is...

create and report two linear regression models: one with the outliers and one without

Regression to mean ->

each predicted y tends to be closer to its mean (the mean of y) than the corresponding x was to the mean of x, measured in standard deviations

One-tailed, lower tailed

ex. Ha: μ < 2,400

One-tailed, upper tailed

ex. Ha: μ > 2,400

Two tailed

ex. Ha: μ ≠ 2,400

In order for chi square to have sufficient sample size, the

expected value for any cell must be at least 5
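The expected count for a cell is (row total × column total) / grand total, and the condition is checked on those expected counts, not the observed ones. A sketch with hypothetical 2×2 tables; the function names are illustrative.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_sample_size_ok(table, minimum=5):
    """True when every expected count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(table) for e in row)

# Hypothetical contingency tables of observed counts.
big_enough = [[10, 20], [30, 40]]
too_sparse = [[1, 2], [3, 40]]
print(expected_counts(big_enough))
print(chi_square_sample_size_ok(big_enough), chi_square_sample_size_ok(too_sparse))
```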

T-Statistic

has a sampling distribution very much like that of the z-statistic: mound shaped, symmetric, with mean 0. The primary difference between the sampling distributions of t and z is that the t-statistic is more variable than the z-statistic.

Confidence Interval/Interval Estimator

is a formula that tells us how to use the sample data to calculate an interval that estimates the target parameter

Test Statistic

is a sample statistic computed from information provided in the sample, that the researcher uses to decide between the null and alternative hypothesis

Statistical Hypothesis

is a statement about the numerical value of a population parameter

Confidence Level

is the confidence coefficient expressed as a percentage

Experimental Unit

is the object on which the response and factors are observed or measured

Response variable

is the variable of interest to be measured in the experiment - also known as the dependent variable; quantitative

w/o pooled variance ->

used when there is no reason to assume the group variances are equal. Generally, do not pool

Confidence interval for population proportion: For large samples, the sampling distribution of p̂ is approximately normal. A sample size is considered large if both

np̂ ≥ 15 and nq̂ ≥ 15
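Since np̂ is the number of successes and nq̂ is the number of failures, the large-sample condition amounts to requiring at least 15 of each. A minimal sketch; `is_large_sample` and the example counts are hypothetical.

```python
def is_large_sample(successes, n, threshold=15):
    """Check n*p_hat >= threshold and n*q_hat >= threshold using raw counts."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

# Hypothetical samples: 60 of 200 passes the check, 10 of 200 does not.
print(is_large_sample(60, 200))
print(is_large_sample(10, 200))
```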

SE =

Width/2

Factor Levels

are the values of the factor used in the experiment

Qualitative Factors

are those that are not (naturally) measured on a numerical scale

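Proportion: of 2,204,000 people in jail, 13.6% are there for murder. How many are in jail for murder?

178,976 as recorded (uncertain). Note that 13.6% of 2,204,000 is 299,744, while 178,976 is 13.6% of 1,316,000, so the original question may have used a different base.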

If our confidence level is 95%, then in the long run, 95% of our confidence intervals will contain μ and

5% will not.

If you are concerned, as a student, about whether the instructor's proportional assignment of letter grades is the same for both genders, which test should you use?

Chi square of homogeneity

Finite Population Correction Factor

In some sampling situations, the sample size n may represent 5% or perhaps 10% of the total number N of sampling units in the population. When the sample size is large relative to the number of measurements in the population, the standard errors of the estimators of μ and p should be multiplied by this factor
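The correction can be sketched as a multiplier on the usual standard error. The form sqrt((N - n) / N) used here is an assumption, since the cards do not show the formula; some texts use sqrt((N - n) / (N - 1)) instead. The 0.05 cutoff follows the rule stated earlier in this set, and the function names and example values are hypothetical.

```python
from math import sqrt

def finite_population_correction(n, N):
    """Correction factor, assuming the sqrt((N - n) / N) form."""
    return sqrt((N - n) / N)

def standard_error_of_mean(s, n, N):
    """Standard error of x-bar, corrected only when n / N exceeds 0.05."""
    se = s / sqrt(n)
    if n / N > 0.05:
        se *= finite_population_correction(n, N)
    return se

# Hypothetical cases: the sample is 10% of the population, then 1%.
se_corrected = standard_error_of_mean(s=10, n=100, N=1_000)
se_uncorrected = standard_error_of_mean(s=10, n=100, N=10_000)
print(round(se_corrected, 4), se_uncorrected)
```

The correction always shrinks the standard error, because sampling a large share of a finite population leaves less uncertainty about it.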

Advantages of ANOVA

The investigator can look at the impact of several factors on the dependent variable

Design Study

Is one for which the analyst controls the specification of the treatments and the method of assigning the experimental units to each treatment

Confidence Coefficient

Is the probability (1-α) that a randomly selected confidence interval encloses the true value of the population parameter

In order to determine whether you should calculate an independent or dependent sample t-test, you must know

The data collection methods and research design (the data values alone are not enough)

If conditions are not satisfied for ANOVA, one should use a __________ _________ _________ such as the Kruskal-Wallis H test.

Non-parametric stat method

Which hypothesis is currently believed to be true -> "published" or "historically"

Null Hypothesis

Observational Study

One for which the analyst simply observes the treatments and the response on a sample of experimental units

For a confidence coefficient of 95%, the area in the two tails is .05. To choose a different confidence coefficient we increase or decrease the ________ (called α) assigned to the tails.

area

p-value

The observed significance level for a specific statistical test is the probability (assuming H0 is true) of observing a value of the test statistic that is at least as contradictory to the null hypothesis, and supportive of the alternative hypothesis, as the actual one computed from the sample data

Test that is most robust to violations of the nearly normal, unimodal population condition

Two-tailed

If the sampling distribution of a sample statistic has a mean equal to the population parameter the statistic is intended to estimate, the statistic is said to be an _____________ _________ of the parameter.

Unbiased Estimate

Completely Randomized design

a design in which the experimental units are randomly assigned to the k treatments or in which independent random samples of experimental units are selected for each treatment
- subjects are assumed homogeneous
- one factor or independent variable
- two or more treatment levels or classifications
- analyzed by one-way analysis of variance (ANOVA)

Unbiased Estimator

a statistic with a sampling distribution mean equal to the population parameter being estimated

Quantitative Factors

are measured by numerical scale

_______ of a test is the probability of observing a value that is at least as extreme as the computed test statistic

p-value

μ =

population mean

σ =

population standard deviation

p =

proportion

Confidence Interval of proportions center their sampling distributions on ....

mean =

quantitative

proportion =

quantitative

0 is not included in CI, so there are no ____________

reservations

Mean of the sampling distribution equals mean of ...

sampled population

Standard deviation of the sampling distribution equals

standard deviation of sampled population / square root of sample size

MLR results ->

strongest variable = age: smallest p-value, largest t-statistic

