Business Analytics 1 Final part 3

Linearity

- examine scatter diagram (should appear linear) - examine residual plot (should appear random)

What does simple linear regression do?

-Finds a linear relationship between: one independent variable x and one dependent variable y
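
As a rough illustration of what "finding a linear relationship" involves, the sketch below fits a least-squares line to a small made-up data set. The function name `fit_line` and the data are assumptions for illustration only; Excel's regression tool performs the same calculation.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: co-variation of x and y divided by variation of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the line passes through the point of means
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data that lie exactly on y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
b0, b1 = fit_line(xs, ys)
print(b0, b1)
```

Here b0 is the intercept and b1 is the slope of the fitted line.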

•Test whether the average age of respondents is equal to 35. What are H0 and H1?

-H0: mean age = 35 - H1: mean age <> 35

What are your two options with hypothesis?

-reject the null and conclude the sample data provides sufficient evidence to support H1 or - fail to reject the null and conclude the sample data does not support H1

Selecting the Proper Excel Procedure: •Population variances are unknown but assumed equal:

-t-Test: Two-Sample Assuming Equal Variances

Selecting the Proper Excel Procedure: •Population variances are unknown and assumed unequal:

-t-Test: Two-Sample Assuming Unequal Variances

confidence coefficient

1 - α = P(not rejecting H0 | H0 is true). The value of α can be controlled; common values are 0.01, 0.05, and 0.10.

What is the Systematic Model Building Approach?

1. Construct a model with all available independent variables. Check for significance of the independent variables by examining the p-values.
2. Identify the independent variable having the largest p-value that exceeds the chosen level of significance.
3. Remove the variable identified in step 2 from the model and evaluate adjusted R^2. Remove variables one at a time.
4. Continue until all variables are significant.

What is the Hypothesis testing procedure?

1. Identify the population parameter and formulate the hypotheses to test.
2. Select a level of significance.
3. Determine the decision rule on which to base a conclusion.
4. Collect data and calculate a test statistic.
5. Apply the decision rule and draw a conclusion.

Assumptions of ANOVA

The groups being compared represent populations whose outcome measures: 1. are randomly and independently obtained, 2. are normally distributed, and 3. have equal variances

What is the level of significance?

The risk of drawing an incorrect conclusion; specifically α, the probability of rejecting H0 when it is true (a Type 1 error)

How do you find chi-square degrees of freedom?

df = (r-1)(c-1), where r and c are the numbers of rows and columns in the cross-tabulation

Type 1 error

= α (level of significance) = P(rejecting H0 | H0 is true)

Type 2 error

= β = P(not rejecting H0 | H0 is false)

Excel function CHISQ.INV.RT(probability, deg of freedom)

Returns the chi-square value that has a right-tail area equal to probability for the specified degrees of freedom. Setting probability equal to the level of significance gives the critical value for the hypothesis test.

Residual formula =

Actual Y value - Predicted Y value

Chi-square test calculations, Step 3:

Compare the chi-square statistic to the critical value, at level of significance α, from a chi-square distribution with (r-1)(c-1) degrees of freedom, where r and c are the numbers of rows and columns in the cross-tabulation table, respectively

Chi-square test calculations, Step 2:

Compute a test statistic called the chi-square statistic, which is the sum, over all cells, of the squared differences between the observed frequency fo and the expected frequency fe, each divided by the expected frequency fe
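
The chi-square steps described in these cards (expected frequencies, test statistic, degrees of freedom, comparison with a critical value) can be sketched as follows. The 2x2 table is made-up data, and the critical value 3.841 is the rounded value that CHISQ.INV.RT(0.05, 1) returns, hard-coded here as an assumption rather than computed.

```python
# Hypothetical 2x2 cross-tabulation of observed frequencies
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Step 1: expected frequency under independence = row total * column total / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Step 2: chi-square statistic = sum of (fo - fe)^2 / fe over all cells
chi_square = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

# Step 3: compare with the critical value for (r-1)(c-1) degrees of freedom
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.841  # assumed: CHISQ.INV.RT(0.05, 1), rounded
reject_h0 = chi_square > critical_value
print(chi_square, df, reject_h0)
```

With these numbers every expected frequency is 25, so each cell contributes 1 to the statistic.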

Procedure: Two-sample test for equality of variances

Excel F-Test: Two-Sample for Variances

Procedure: Two-sample test for means, σ^2 unknown, assumed equal

Excel t-Test: Two-Sample Assuming Equal Variances

Procedure: Two-sample test for means, σ^2 unknown, assumed unequal

Excel t-Test: Two-Sample Assuming Unequal Variances

Procedure: Two-sample test for means, σ^2 known

Excel z-Test: Two Sample for Means

The principle of Parsimony

Good models are as simple as possible

What is the assumption of hypothesis testing?

It assumes H0 is true and uses the sample data to determine whether H1 is more likely to be true

•CadSoft sampled 44 customers and asked them to rate the overall quality of a software package. Sample data revealed that 35 respondents (a proportion of 35/44 = 0.795) thought the software was very good or excellent. In the past, this proportion has averaged about 75%. Is there sufficient evidence to conclude that this satisfaction measure has significantly exceeded 75% using a significance level of 0.05?

Hypotheses: H0: pi <= 0.75; H1: pi > 0.75. Test statistic: z = 0.69. Critical value = NORM.S.INV(0.95) = 1.645. P-value = 1 - NORM.S.DIST(0.69, TRUE). Since z < 1.645, do not reject H0.
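
The CadSoft proportion test can be reproduced with Python's standard library as a check on the Excel calculation. This is a minimal sketch: `NormalDist().inv_cdf` and `NormalDist().cdf` stand in for NORM.S.INV and NORM.S.DIST, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 44            # customers sampled
successes = 35    # rated the software very good or excellent
pi0 = 0.75        # hypothesized proportion
alpha = 0.05      # level of significance

p_hat = successes / n
# z statistic for a one-sample test of a proportion
z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)

std_normal = NormalDist()
critical_value = std_normal.inv_cdf(1 - alpha)  # like NORM.S.INV(0.95)
p_value = 1 - std_normal.cdf(z)                 # like 1 - NORM.S.DIST(z, TRUE)

reject_h0 = z > critical_value                  # upper-tailed decision rule
print(round(z, 2), round(critical_value, 3), round(p_value, 3), reject_h0)
```

Because z falls below the critical value (equivalently, the p-value exceeds 0.05), the sample does not provide sufficient evidence that satisfaction exceeds 75%.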

Using the t-statistic

If |t| < 1, the standard error will decrease and adjusted R^2 will increase if the variable is removed. If |t| > 1, the opposite will occur. In effect, you are using t-values instead of p-values.

Simple Linear Regression

Involves a single independent variable

Are you proving anything with hypothesis testing?

No you are not

H0=? H1=?

H0: null hypothesis (describes an existing theory). H1: alternative hypothesis (the complement of the null).

What do you do if T is smaller than the lower critical value?

Reject the null hypothesis

What is the rule of thumb for standard residual?

Standard residuals outside of +/-2 or +/-3 are potential outliers
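
A small sketch of the rule of thumb: compute residuals (actual minus predicted), divide each by the standard deviation of the residuals, and flag any value outside +/-2. The data are made up, and dividing by the sample standard deviation of the residuals is an assumption about how the standard residual is scaled.

```python
from statistics import stdev

# Hypothetical actual and predicted Y values; the last point is unusual
actual = [11, 9, 11, 9, 11, 9, 11, 9, 10, 18]
predicted = [10] * 10

# Residual = actual Y value - predicted Y value
residuals = [a - p for a, p in zip(actual, predicted)]

# Standard residual = residual / standard deviation of the residuals
s = stdev(residuals)
standard_residuals = [e / s for e in residuals]

# Indices of observations whose standard residual is outside +/-2
flagged = [i for i, z in enumerate(standard_residuals) if abs(z) > 2]
print(flagged)
```

Only the last observation is flagged as a potential outlier in this example.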

Independence of Errors:

Successive observations should not be related. This is important when the independent variable is time.

What Excel function gives the critical value for a two-tailed test using the t-distribution?

T.INV(1-α/2, n-1) or T.INV.2T(α, n-1)

What does an increase in adjusted R^2 indicate?

That the model has improved

what is the test statistic used for?

The decision to reject or fail to reject a null hypothesis - depends on the type of hypothesis test

What happens the further away the true mean is from the hypothesized value?

The smaller the value of β

Power of the test

The value of 1 - β = P(rejecting H0 | H0 is false). The value of β cannot be specified in advance and depends on the value of the (unknown) population parameter.
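
To make the idea of power concrete, here is a minimal sketch for one specific case: an upper-tailed z-test of a mean with known population standard deviation. The function name and all numbers are hypothetical. It illustrates that power rises with sample size and with the distance between the true and hypothesized means.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail_z(mu0, mu_true, sigma, n, alpha=0.05):
    """P(reject H0 | true mean is mu_true) for H1: mu > mu0, sigma known."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)          # critical value
    shift = (mu_true - mu0) / (sigma / sqrt(n))      # distance in standard errors
    return 1 - std_normal.cdf(z_alpha - shift)

small_n = power_upper_tail_z(mu0=25, mu_true=27, sigma=10, n=25)
large_n = power_upper_tail_z(mu0=25, mu_true=27, sigma=10, n=100)
far_mean = power_upper_tail_z(mu0=25, mu_true=30, sigma=10, n=25)
print(round(small_n, 3), round(large_n, 3), round(far_mean, 3))
```

When the true mean equals the hypothesized mean, the same formula returns α, the probability of a Type 1 error.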

What is the problem with higher order polynomials?

They are generally not very smooth and are hard to interpret visually. Going beyond the 3rd order is not recommended.

Excel output: If the test statistic is nonnegative, then the one-tailed p-value...

is the correct p-value for an upper-tailed test, but you must subtract it from 1 for a lower-tailed test

Excel output: If the test statistic is negative, then the one-tailed p-value...

is the correct p-value for a lower-tailed test, but for an upper-tailed test you must subtract it from 1

What is a 2nd order polynomial shaped like?

U-shaped

Chi-square test calculations, Step 1:

Using a cross-tabulation of the data, compute the expected frequency if the two variables are independent.

Standard Error

Variability between observed and predicted Y values... "Standard Error of the Estimate"

Improving the power of the test

We would like the power of the test to be high (equivalently, we would like the probability of a type 2 error to be low) to allow us to make a valid conclusion

What do you do in situations where the data is naturally paired/matched?

A paired t-test is more accurate than assuming that the data come from independent populations. μD is the mean difference between the paired samples.
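
A minimal sketch of the paired calculation: take the difference within each pair, then test the mean of those differences against zero. The before/after values are made up, and the formula t = mean(d) / (s_d / sqrt(n)) is the usual paired t statistic rather than something stated in this set.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired measurements on the same five subjects
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 14]

# Work with the within-pair differences, not the two samples separately
differences = [a - b for a, b in zip(after, before)]
n = len(differences)

d_bar = mean(differences)     # sample estimate of the mean difference
s_d = stdev(differences)      # sample standard deviation of the differences
t = d_bar / (s_d / sqrt(n))   # paired t statistic with n - 1 degrees of freedom
print(d_bar, round(t, 3))
```

In Excel, the corresponding tool is t-Test: Paired Two Sample for Means.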

chi-square test for independence

A test to determine whether two classifications are independent. H0: the two categorical variables are independent. H1: the two categorical variables are dependent.

adjusted R square

adjusts R^2 for sample size and number of X variables

For an upper-tailed test, if the confidence interval falls entirely above the hypothesized value, we

reject the null hypothesis

What does adjusted R^2 reflect?

both the number of independent variables and the sample size, and it may either increase or decrease when an independent variable is added or dropped. An increase in adjusted R^2 indicates that the model has improved.
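
The usual formula for adjusted R^2 is 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the sample size and k is the number of independent variables. The formula is not given in this set, so treat the sketch below as an assumption; the R^2 values are hypothetical. It shows how adding a variable can raise R^2 slightly yet lower adjusted R^2.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjust R^2 for sample size n and number of independent variables k."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical model with 3 variables, then with a weak 4th variable added
before = adjusted_r_squared(0.800, n=25, k=3)
after = adjusted_r_squared(0.805, n=25, k=4)
print(round(before, 4), round(after, 4))
```

Here R^2 rises from 0.800 to 0.805, but adjusted R^2 falls, suggesting the extra variable does not improve the model.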

how can we test for interactions?

by defining a new variable as the product of the two variables, X3=X1*X2 and testing whether this variable is significant, leading to an alternative model
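
Building the interaction variable is a single element-wise product. The sketch below uses made-up columns; the regression itself would then be rerun with X3 included to check its significance.

```python
# Hypothetical values of two independent variables
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.0, 1.0, 1.0, 0.0]

# Interaction term: X3 = X1 * X2, computed row by row
x3 = [a * b for a, b in zip(x1, x2)]

# Rows of the expanded data set (X1, X2, X3) for the alternative model
rows = list(zip(x1, x2, x3))
print(rows)
```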

For a lower-tailed test, what must you do to the one-tailed critical value from Excel?

change the sign

What is chi-square distribution?

characterized by degrees of freedom - is a sampling distribution

What is multiple R squared called?

coefficient of multiple determination

The conclusion to reject or fail to reject H0 is based on....

comparing the value of the test statistic to a "critical value" from the sampling distribution of the test statistic when the null hypothesis is true and the chosen level of significance α

What should you do if you choose a small level of significance?

compensate by having a large sample size

ANOVA

Conducts an F-test to determine whether variation in Y is due to varying levels of X. Used to test the significance of regression: H0: population slope coefficient = 0; H1: population slope coefficient <> 0

What does the critical value do to the sampling distribution?

divides the sampling distribution into two parts, a rejection region and a non-rejection region. If the test statistic falls into the rejection region, we reject the null hypothesis; otherwise, we fail to reject it.

What is the base of natural log functions and what is it used for?

e = 2.71828; it is often used as the base b in exponential models such as y = ab^x

Procedure: Paired two - sample test for means

excel t-test: paired two sample for means

What do small sample sizes result in?

a low value of 1 - β (low power)

for a one tail test if H1 is stated as >, the rejection region is...

in the upper tail

For ANOVA what does rejecting H0 mean?

indicates that X explains variation in Y

b0=

intercept

Hypothesis Testing

involves drawing inferences about two contrasting propositions (each called a hypothesis) relating to the value of one or more population parameters

Multiple Regression

involves two or more independent variables

variance inflation factor

is a better indicator of multicollinearity but is not available in Excel
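
Since Excel does not report it, a rough sketch of the standard definition may help: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing independent variable j on the other independent variables. That formula is not stated in this set and is included as an assumption. With only two independent variables, R_j^2 is the squared correlation between them.

```python
def vif(r_squared_j):
    """Variance inflation factor from the R^2 of X_j regressed on the other X's."""
    return 1 / (1 - r_squared_j)

# Two independent variables correlated at the 0.7 rule-of-thumb threshold
r = 0.7
vif_at_threshold = vif(r ** 2)

# A much stronger correlation inflates the variance far more
vif_strong = vif(0.95 ** 2)
print(round(vif_at_threshold, 2), round(vif_strong, 2))
```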

What happens when significant Multicollinearity is present?

it becomes difficult to isolate the effect of one independent variable on the dependent variable, the signs of coefficients may be the opposite of what they should be, making it difficult to interpret regression coefficients, and p-values can be inflated.

If the test statistic is nonnegative, then the one-tailed p-value...

is correct for an upper-tailed test, but for a lower-tailed test you must subtract this number from 1.0 to get the correct p-value

What does a correlation matrix show when values exceed the recommended threshold of +/- 0.7?

large correlations exist

What is significant about correlations exceeding +/- 0.7?

may indicate multicollinearity

What is R^2?

A measure of the fit of the line to the data. The value is between 0 and 1; the larger the value, the better the fit, and 1 is a perfect fit.
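
One common way to compute R^2 is 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares about the mean of Y. The sketch below uses made-up actual values and hypothetical fitted values from a regression line.

```python
from statistics import mean

# Hypothetical observed Y values and the values predicted by a fitted line
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

y_bar = mean(actual)

# SSE: variation left unexplained by the line
sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
# SST: total variation of Y about its mean
sst = sum((a - y_bar) ** 2 for a in actual)

r_squared = 1 - sse / sst
print(round(r_squared, 3))
```

In this example the line explains 60% of the variation in Y.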

What is Multiple R called

multiple correlation coefficient

Linear regression model with more than one independent variable is called what?

Multiple linear regression model, where:
Y = dependent variable
X1, ..., Xk = independent (explanatory) variables
B0 = intercept term
B1, ..., Bk = regression coefficients for the independent variables
E = error term

What does regression analysis require?

numerical data

interaction

occurs when the effect of one variable is dependent on another variable

Multicollinearity

occurs when there are strong correlations among the independent variables and they can predict each other better than the dependent variable

types of hypothesis tests

one-sample test for mean, σ known; one-sample test for mean, σ unknown

When would you reject H0 based on the p-value?

When the p-value < α

What does the excel function CHISQ.TEST(actual range, expected range) compute?

p-value for the chi- squared test

One Sample Tests for Proportions

pi0 is the hypothesized value and p-hat is the sample proportion

•Test whether the average age of respondents is equal to 35. -H0: mean age = 35 - H1: mean age <> 35 •n = 34; sample mean = 38.677; sample standard deviation = 7.858. What is the test statistic?

t = (38.677 - 35) / (7.858 / sqrt(34)) = 2.73 (approximately); reject H0

For a lower tailed test if the confidence interval falls entirely below the hypothesized value we...

reject the null hypothesis

Partial Regression Coefficients

represent the expected change in the dependent variable when the associated independent variable is increased by one unit while the values of all other independent variables are held constant.

Standard Residual formula =

residual/standard deviation

What does ANOVA test for?

Significance of the entire model; that is, it computes an F-statistic testing the hypotheses H0: B1 = B2 = ... = Bk = 0; H1: at least one Bj is not 0

B1=

slope

•In the CadSoft example, sample data for 44 customers revealed a mean response time of 21.91 minutes and a sample standard deviation of 19.49 minutes.

t = −1.05 indicates that the sample mean of 21.91 is 1.05 standard errors below the hypothesized mean of 25 minutes.
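
The t statistic quoted here can be checked with a short calculation: t = (sample mean - hypothesized mean) / (s / sqrt(n)). The helper name `t_statistic` is illustrative; the inputs are the figures given in these cards.

```python
from math import sqrt

def t_statistic(sample_mean, mu0, s, n):
    """Number of standard errors the sample mean lies from the hypothesized mean."""
    standard_error = s / sqrt(n)
    return (sample_mean - mu0) / standard_error

# CadSoft response time: n = 44, mean 21.91, s = 19.49, hypothesized mean 25
t_cadsoft = t_statistic(21.91, 25, 19.49, 44)

# Respondent age: n = 34, mean 38.677, s = 7.858, hypothesized mean 35
t_age = t_statistic(38.677, 35, 7.858, 34)
print(round(t_cadsoft, 2), round(t_age, 2))
```

A negative t means the sample mean is below the hypothesized value; a positive t means it is above.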

What is an alternative way of testing whether or not the slope is 0?

t-test

What does the F-test do?

Tests for equality of variances between two samples. You MUST assume that both samples are drawn from normal populations.

Excel Output if the test statistic is negative

the one-tailed p-value is the correct p-value for a lower - tail test

What happens to the R^2 value as the order of the polynomial increases?

R^2 increases; that is, a 4th order polynomial will provide a better fit than a 3rd order, and so on.

Where is the rejection region for a one-tailed test if H1 is stated as <?

the rejection region is in the lower tail

What happens if we are not able to reject the null with a certain variable in an ANOVA for Multiple Regression?

Then that independent variable is not significant and probably should not be included in the model. Remove it and rerun the model until you can reject the null for every remaining variable.

factor

variable of interest

Homoscedasticity

variation about the regression line is constant - examine the residual plot

Normality of Errors

view a histogram of standard residuals, regression is robust to departures from normality

Multiple R

|r|, where r is the sample correlation coefficient. The value of r varies from -1 to +1 (r is negative if the slope is negative)

Linear

y=a+bx

Exponential

y=ab^x

Polynomial (2nd order)

y=ax^2+bx+c

Polynomial (3rd order)

y=ax^3+bx^2+cx+d

Power

y=ax^b

Logarithmic

y=ln(x)

Simple Linear Regression Model

The error term E is calculated separately and is not used in estimating the parameters

Excel output: upper tail p-value test and the test statistic is negative

you must subtract this number from one to get the correct p-value

Selecting the Proper Excel Procedure: •Population variances are known:

z-test: two-sample for means

Two Sample Hypothesis Test: Upper tailed Test

•This test seeks evidence that the difference between population parameter (1) and population parameter (2) is greater than some value, D0. When D0 = 0, the test simply seeks to conclude whether population parameter (1) is larger than population parameter (2).

Two sampled Hypothesis Tests: Lower-tailed test H0

•This test seeks evidence that the difference between population parameter (1) and population parameter (2) is less than some value D0 When D0= 0, the test simply seeks to conclude whether population parameter (1) is smaller than population parameter (2).

Two sampled Hypothesis Tests: Lower-tailed test H1

•This test seeks evidence that the difference between population parameter (1) and population parameter (2) is less than some value D0 When D0= 0, the test simply seeks to conclude whether population parameter (1) is smaller than population parameter (2).

Two-Sample Hypothesis tests: Two Tailed Test

•This test seeks evidence that the difference between the population parameters is equal to D0. When D0 = 0, we are seeking evidence that population parameter (1) differs from population parameter (2). In most applications D0 = 0 and we are simply seeking to compare the population parameters.

Residuals

•are the observed errors associated with estimating the value of the dependent variable using the regression line: residual = actual Y value - predicted Y value

Overfitting means...

•fitting a model too closely to the sample data at the risk of not fitting it well to the population in which we are interested.

Statistical Inference

•focuses on drawing conclusions about populations from samples. •Statistical inference includes estimation of population parameters and hypothesis testing, which involves drawing conclusions about the value of the parameters of one or more populations.

Regression Analysis

•is a tool for building mathematical and statistical models that characterize relationships between a dependent (ratio) variable and one or more independent, or explanatory variables (ratio or categorical), all of which are numerical.

p-value

•is the probability of obtaining a test statistic value equal to or more extreme than that obtained from the sample data when the null hypothesis is true.

What does ANOVA measure?

•measures variation between groups relative to variation within groups.
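
As a sketch of "variation between groups relative to variation within groups", the F statistic for a one-way ANOVA is the between-group mean square divided by the within-group mean square. The three groups below are made-up data and the function name is illustrative.

```python
from statistics import mean

def one_way_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical observations for three levels of a factor
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_f(groups)
print(round(f_stat, 2))
```

A large F suggests the group means differ by more than the within-group variation alone would explain.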

