Business Analytics 2 - Final Exam
Confidence Interval
An interval that encloses an unknown population parameter with a certain level of confidence
ANOVA Basic Idea
- Compares two types of variation to test equality of means - Comparison basis is ratio of variances - If treatment variation is significantly greater than random variation then means are not equal - Variation measures are obtained by partitioning total variation
ANOVA F-test
- Test the equality of two or more (k) population means - Variables have one nominal scaled independent variable, two or more (k) treatment levels or classifications, one interval or ratio scaled dependent variable - Used to analyze completely randomized experimental designs
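The partitioning idea behind the F-test can be sketched in a few lines of Python. This is a minimal illustration, not a full ANOVA routine: the function name `one_way_anova_f` and the three small samples are made up for the example, and no p-value is computed.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_treatment, df_error) for a list of samples (hypothetical helper)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Treatment (between-group) variation: group means around the grand mean
    sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Random (within-group) variation: observations around their own group mean
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_treatment = k - 1
    df_error = n_total - k
    # F is the ratio of the two mean squares
    f_stat = (sst / df_treatment) / (sse / df_error)
    return f_stat, df_treatment, df_error

# Made-up data: three treatment levels, three observations each
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df1, df2 = one_way_anova_f(groups)
```

For this made-up data the treatment mean square is 13 and the error mean square is 1, so F is 13 (up to floating-point rounding) with 2 and 6 degrees of freedom.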
The central limit theorem says that, for a sufficiently large sample size, the sampling distribution of the sample mean is:
- approximately normal - centered at the population mean - with a standard deviation equal to the population standard deviation divided by the square root of the sample size - It is central to most hypothesis testing and confidence interval construction
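A small simulation can make these properties concrete. The sketch below assumes a skewed (exponential) population and arbitrary choices of sample size and repetition count; all names and numbers are illustrative.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population (exponential), so normality is not built in
population = [random.expovariate(1.0) for _ in range(20_000)]
mu = mean(population)
sigma = pstdev(population)

n = 50  # sample size
sample_means = [mean(random.sample(population, n)) for _ in range(2_000)]

observed_center = mean(sample_means)    # should be close to mu
observed_spread = pstdev(sample_means)  # should be close to sigma / sqrt(n)
expected_spread = sigma / n ** 0.5
```

Comparing `observed_center` with `mu` and `observed_spread` with `expected_spread` shows how closely the simulated sampling distribution follows the theorem.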
Observed value of the dependent variable = 25.7; predicted value = 38.1; mean = 29.8; standard deviation = 2.3; independent variable value = 45. What is the residual?
-12.4 (residual = observed - predicted = 25.7 - 38.1)
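As a quick check, the residual uses only the observed and predicted values; the mean, standard deviation, and x value in the question are distractors. The variable names here are just for illustration.

```python
observed_y = 25.7
predicted_y = 38.1

# Residual = observed value minus predicted value
residual = observed_y - predicted_y
print(round(residual, 1))  # -12.4
```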
Use the finite population correction factor when n/N >
.05
Conditions required for a valid large sample confidence interval for μ
1. A random sample is selected from target population 2. The sample size n is large. Due to the central limit theorem, this condition guarantees that the sampling distribution of x bar is approximately normal. Also, for large n, s will be a good estimator of σ.
Conditions required for a valid small-sample confidence interval for μ
1. A random sample is selected from the target population 2. The population has a relative frequency distribution that is approximately normal
Interval Estimation Points
1. Provides a range of values 2. Gives information about closeness to unknown population parameter 3. Example: Unknown population mean lies between 50 and 70 with 95% confidence
Point Estimator points
1. Provides a single value based on observations from one sample 2. Gives no information about how close the value is to unknown population parameter 3. Example: sample mean xbar= 3 is the point estimate of the unknown population mean
5 Step Hypothesis Testing
1. Specify the Null Hypothesis 2. Specify the Alternative Hypothesis 3. Set the Significance Level (α) 4. Calculate the Test Statistic and Corresponding P-Value 5. Draw a Conclusion
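The five steps can be sketched as a large-sample z-test. The hypothesized mean of 2,400 echoes the examples on the tailed-test cards; the sample size, sample mean, and sample standard deviation are invented for illustration.

```python
from statistics import NormalDist

# Steps 1 and 2: H0: mu = 2400 versus Ha: mu != 2400 (two-tailed)
mu_0 = 2400

# Step 3: significance level
alpha = 0.05

# Hypothetical sample summary (not from the course material)
n, x_bar, s = 50, 2460.0, 200.0

# Step 4: test statistic and two-tailed p-value (large n, so z is used)
z = (x_bar - mu_0) / (s / n ** 0.5)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: conclusion
reject_h0 = p_value < alpha
```

With these made-up numbers z is about 2.12 and the p-value is about 0.034, so H0 would be rejected at α = 0.05.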
Parameter
A numerical descriptive measure of a population. Because it is based on all the observations in the population, its value is almost always unknown.
Sample Statistic
A numerical descriptive measure of a sample. It is calculated from the observations in the sample.
Type 2 Error
occurs if the researcher accepts the null hypothesis when, in fact, H0 is false. The probability of committing this error is denoted by β.
Type 1 Error
occurs if the researcher rejects the null hypothesis in favor of the alternative hypothesis when, in fact, H0 is true. The probability of committing this error is denoted by α.
Point Estimator
of a population parameter is a rule or formula that tells us how to use the sample data to calculate a single number that can be used as an estimate of the population parameter
Sampling Distribution
of a sample statistic calculated from a sample of n measurements is the probability distribution of the statistic.
Rejection Region
of a statistical test is the set of possible values of the test statistic for which the researcher will reject H0 in favor of Ha.
Treatments
of an experiment are the factor level combinations used
Confidence interval for population proportion: The mean of the sampling distribution of p̂ is p; that is, p̂ is an unbiased estimator of ____.
p
Sampling distribution of a statistic
the theoretical probability distribution of the statistic in repeated sampling
Target Parameter
the unknown population parameter (e.g., mean or proportion) that we are interested in estimating
Two way analysis of variance
There is evidence of differences between the weeks, but not between the days, at α = 0.05. Week 5 shows the largest means, and the Tukey's HSD results should be investigated
Confidence interval for population proportion: The standard deviation of the sampling distribution of p̂ is
√(pq/n), where q = 1 - p
Large sample confidence interval for p
p̂ ± z_(α/2) · √(p̂q̂/n), where p̂ = x/n and q̂ = 1 - p̂
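A minimal sketch of the large-sample interval for p, assuming the usual form p̂ ± z_(α/2) · √(p̂q̂/n). The function name and the counts (60 successes in 200 trials) are made up for illustration.

```python
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion (illustrative helper)."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # z value that leaves alpha/2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * (p_hat * q_hat / n) ** 0.5
    return p_hat - half_width, p_hat + half_width

low, high = proportion_ci(x=60, n=200)
```

Here p̂ = 0.30 and the 95% interval runs from roughly 0.236 to 0.364.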
Sample size determination for 100(1-α)% confidence interval for μ
n = (z_(α/2))² · σ² / (SE)², where SE is the sampling error (the half-width of the interval)
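The sample size calculation can be sketched as below, assuming the usual form n = (z_(α/2) · σ / SE)², rounded up to the next whole unit. The function name and the values σ = 10 and SE = 2 are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, sampling_error, confidence=0.95):
    """Smallest n whose interval half-width is at most sampling_error (illustrative helper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Always round up: rounding down would give a wider interval than requested
    return math.ceil((z * sigma / sampling_error) ** 2)

n_required = sample_size_for_mean(sigma=10, sampling_error=2)
```

With these made-up inputs the formula gives about 96.04, so 97 observations would be needed.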
When σ is unknown and n is large (n>30), the confidence interval is approximately equal to
x̄ ± z_(α/2) · (s/√n), where s is the sample standard deviation
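A short sketch of the large-sample interval for μ, assuming the form x̄ ± z_(α/2) · s/√n and working from summary statistics. The function name and the numbers (x̄ = 50, s = 8, n = 64) are invented.

```python
from statistics import NormalDist

def large_sample_mean_ci(x_bar, s, n, confidence=0.95):
    """Large-sample (n > 30) confidence interval for mu, with s standing in for sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * s / n ** 0.5
    return x_bar - half_width, x_bar + half_width

low, high = large_sample_mean_ci(x_bar=50.0, s=8.0, n=64)
```

The standard error here is 8/√64 = 1, so the 95% interval is roughly 48.04 to 51.96.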
Factors
those variables whose effect on the response is of interest to the experimenter - also referred to as independent variables
Small sample confidence interval for μ
x̄ ± t_(α/2) · (s/√n), where t_(α/2) is based on (n-1) degrees of freedom
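A minimal sketch of the small-sample interval, assuming the form x̄ ± t_(α/2) · s/√n. To stay within the standard library, the critical value 2.262 (9 degrees of freedom, 95% confidence) is copied from a standard t table rather than computed; the ten data values are made up.

```python
from statistics import mean, stdev

# Hypothetical sample of n = 10 measurements
data = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]

n = len(data)
x_bar = mean(data)
s = stdev(data)  # sample standard deviation (divides by n - 1)

# t value for alpha/2 = .025 with n - 1 = 9 degrees of freedom, from a t table
t_crit = 2.262

half_width = t_crit * s / n ** 0.5
low, high = x_bar - half_width, x_bar + half_width
```

For this data x̄ = 13.5 and s/√n = 0.5, so the interval is roughly 12.37 to 14.63. A different sample size would need a different table value.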
p̂ =
x/n
Null Hypothesis
H0, represents the hypothesis that will be accepted unless the data provides convincing evidence that it is false. This usually represents the "status quo" or some claim about the population parameter that the researcher wants to test.
Correct answer about pizza ->
H0: β(dollars off) = 0; Ha: β(dollars off) ≠ 0. Predictor variable: dollars off
Alternative Hypothesis
Ha, represents the hypothesis that will be accepted only if the data provides convincing evidence of its truth. This usually represents the values of a population parameter for which the researcher wants to gather evidence to support.
Why does the confidence interval output for a proportion require: 1. count ≥ 15 2. total - count ≥ 15
If either condition fails, the sample is too small for the large-sample (normal approximation) confidence interval to be valid
σ2 =
Population Variance
contingency table: which has most unusual results
Processors & lawsuits, much lower
Coefficient of Determination =
R^2
x bar =
Sample Mean
In general, we express the reliability associated with a confidence interval for the population mean μ by specifying the ______ ______ within which we want to estimate μ with 100(1-α)% confidence. The _____ _____ then is equal to the half-width of the confidence interval.
Sampling Error
How do you check for the linearity condition in a simple linear reg. model?
Scatterplot of the dependent variable against the independent variable
Check for linearity in multiple regression?
Scatterplots of y against each of the predictors
Degrees of Freedom
The actual amount of variability in the sampling distribution of t depends on the sample size n. A convenient way of expressing this dependence is to say that the t statistic has (n-1) DF.
If the mean of the sampling distribution is not equal to the parameter, the statistic is said to be a __________ __________ of the parameter.
biased estimate
Assumptions
clear statements of any assumptions made about the population(s) being sampled
When in doubt about outliers, the most conservative approach to take is...
create and report two linear regression models: one with the outliers and one without
Regression to mean ->
each predicted y tends to be closer to its mean (the mean of y), in standard deviation units, than the corresponding x was to the mean of x
One-tailed, lower tailed
ex. Ha: μ < 2,400
One-tailed, upper tailed
ex. Ha: μ > 2,400
Two tailed
ex. Ha: μ ≠ 2,400
In order for chi square to have sufficient sample size, the
expected value for any cell must be at least 5
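The expected-count condition can be checked directly from a contingency table, since each expected count is (row total × column total) / grand total. The helper names and both tables below are made up for illustration.

```python
def expected_counts(table):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def meets_expected_count_rule(table, minimum=5):
    """True when every expected cell count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(table) for e in row)

balanced = [[20, 30], [30, 20]]  # every expected count is 25
sparse = [[2, 3], [3, 40]]       # top-left expected count is 25/48

ok_balanced = meets_expected_count_rule(balanced)
ok_sparse = meets_expected_count_rule(sparse)
```

Note that the rule applies to the expected counts, not the observed ones.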
T-Statistic
has a sampling distribution very much like that of the z-statistic: mound shaped, symmetric, with mean 0. - The primary difference between the sampling distributions of t and z is that the t-statistic is more variable than the z-statistic.
Confidence Interval/Interval Estimator
is a formula that tells us how to use the sample data to calculate an interval that estimates the target parameter
Test Statistic
is a sample statistic, computed from information provided in the sample, that the researcher uses to decide between the null and alternative hypotheses
Statistical Hypothesis
is a statement about the numerical value of a population parameter
Confidence Level
is the confidence coefficient expressed as a percentage
Experimental Unit
is the object on which the response and factors are observed or measured
Response variable
is the variable of interest to be measured in the experiment - also known as the dependent variable - quantitative
w/o pooled variance ->
used when there is no reason to assume the two populations have equal variability. Generally, do not pool
Confidence interval for population proportion: For large samples, the sampling distribution of p̂ is approximately normal. A sample size is considered large if both
np̂ ≥ 15 and nq̂ ≥ 15
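Because np̂ is just the number of successes x and nq̂ is the number of failures n - x, the large-sample condition can be checked on the raw counts. This sketch assumes the usual threshold of 15 for both; the function name and counts are hypothetical.

```python
def is_large_sample(successes, n, threshold=15):
    """Check the large-sample condition for a proportion using raw counts."""
    failures = n - successes
    # n * p_hat equals successes and n * q_hat equals failures
    return successes >= threshold and failures >= threshold

enough = is_large_sample(60, 200)            # 60 successes, 140 failures
too_few_successes = is_large_sample(10, 200)
too_few_failures = is_large_sample(190, 200)
```

Both counts must pass; a sample with plenty of successes but only 10 failures still fails the condition.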
SE =
Width/2
Factor Levels
are the values of the factor used in the experiment
Qualitative Factors
are those that are not (naturally) measured on a numerical scale
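Proportion: of 2,204,000 people in jail, 13.6% are in for murder. How many are in for murder?
0.136 × 2,204,000 = 299,744. (The figure 178,976 equals 0.136 × 1,316,000, so it is correct only if the 13.6% applies to a subgroup of 1,316,000.)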
If our confidence level is 95%, then in the long run, 95% of our confidence intervals will contain μ and
5% will not.
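The long-run interpretation can be illustrated by simulation: build many 95% intervals from repeated samples and count how often they contain μ. The population values (μ = 100, σ = 15), sample size, and number of trials are arbitrary choices for this sketch.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

mu, sigma = 100.0, 15.0  # hypothetical normal population
n, trials = 40, 2_000
z = NormalDist().inv_cdf(0.975)  # z value for 95% confidence

hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    half_width = z * stdev(sample) / n ** 0.5
    # Does this particular interval contain the true mean?
    if abs(mean(sample) - mu) <= half_width:
        hits += 1

coverage = hits / trials
```

The value of `coverage` should land near 0.95. It can fall slightly short because the interval uses z with an estimated s rather than a t value.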
If you are concerned, as a student, about whether the instructor's proportional assignment of letter grades is the same for both genders, which test should you use?
Chi-square test of homogeneity
Finite Population Correction Factor
In some sampling situations, the sample size n may represent 5% or perhaps 10% of the total number N of sampling units in the population. When the sample size is large relative to the number of measurements in the population, the standard errors of the estimators of μ and p should be multiplied by this factor
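A rough sketch of applying the correction. Textbooks differ on the exact form of the factor; this example assumes √((N - n)/(N - 1)), and some texts use N rather than N - 1 in the denominator. The function names and numbers are hypothetical.

```python
import math

def fpc(n, N):
    """Finite population correction factor, assuming the sqrt((N - n)/(N - 1)) form."""
    return math.sqrt((N - n) / (N - 1))

def corrected_standard_error(s, n, N, threshold=0.05):
    """Standard error of the mean, corrected only when n/N exceeds the threshold."""
    se = s / math.sqrt(n)
    if n / N > threshold:
        se *= fpc(n, N)
    return se

se_small_pop = corrected_standard_error(s=10, n=100, N=1_000)    # n/N = 0.10
se_large_pop = corrected_standard_error(s=10, n=100, N=100_000)  # n/N = 0.001
```

With n/N = 0.10 the standard error shrinks from 1.0 to about 0.949; with n/N = 0.001 it is left unchanged.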
Advantages of ANOVA
The investigator can look at the impact of several factors on the dependent variable
Designed Study
Is one for which the analyst controls the specification of the treatments and the method of assigning the experimental units to each treatment
Confidence Coefficient
Is the probability (1-α) that a randomly selected confidence interval encloses the true value of the population parameter
In order to determine whether you should calculate an independent or dependent sample t-test, you must know
The data collection methods and the research design
If conditions are not satisfied for ANOVA, one should use a __________ _________ _________ such as the Kruskal-Wallis H test.
Nonparametric statistical method
Which hypothesis is currently believed to be true -> "published" or "historically"
Null Hypothesis
Observational Study
One for which the analyst simply observes the treatments and the response on a sample of experimental units
For a confidence coefficient of 95%, the area in the two tails is .05. To choose a different confidence coefficient we increase or decrease the ________ (called α) assigned to the tails.
area
p-value
The observed significance level for a specific statistical test is the probability (assuming H0 is true) of observing a value of the test statistic that is at least as contradictory to the null hypothesis, and supportive of the alternative hypothesis, as the actual one computed from the sample data
Test that is most robust against violation of the nearly normal, unimodal population condition
Two-tailed
If the sampling distribution of a sample statistic has a mean equal to the population parameter the statistic is intended to estimate, the statistic is said to be an _____________ _________ of the parameter.
Unbiased Estimate
Completely Randomized design
a design in which the experimental units are randomly assigned to the k treatments or in which independent random samples of experimental units are selected for each treatment - subjects are assumed homogeneous - one factor or independent variable - two or more treatment levels or classifications - analyzed by one-way analysis of variance (ANOVA)
Unbiased Estimator
a statistic with a sampling distribution mean equal to the population parameter being estimated
Quantitative Factors
are measured by numerical scale
_______ of a test is the probability, assuming H0 is true, of observing a value of the test statistic that is at least as extreme as the one computed from the sample
p-value
μ =
population mean
σ =
population standard deviation
p =
proportion
Confidence Interval of proportions center their sampling distributions on ....
p̂
mean =
quantitative
proportion =
qualitative
0 is not included in CI, so there are no ____________
reservations
Mean of the sampling distribution equals mean of ...
sampled population
Standard deviation of the sampling distribution equals
standard deviation of sampled population / square root of sample size
MLR results ->
strongest variable = age (smallest p-value, largest t)