STAA TRUE or FALSE and why


tα/2 is a critical value which depends entirely on the significance level α.

False, it also depends on the degrees of freedom. Note, however, that for the usual t-based confidence interval for the mean, the degrees of freedom is determined by the sample size (d.f. = n − 1), so in that case the critical value depends only on n and, of course, α.

The standard error for the estimator r of the population correlation coefficient ρ is √((1 − r²)/(n − 1)).

False: the denominator should be n − 2, not n − 1. The standard error is √((1 − r²)/(n − 2)), which is what appears in the t test for the correlation coefficient.

In simple linear regression, the confidence interval for the mean value Y given X is wider than the prediction interval for an individual Y given X, since an additional unit term is included to add uncertainty for an individual case.

False, it is the other way around: the prediction interval for an individual Y is wider, because the additional unit term adds the uncertainty of an individual case on top of the uncertainty about the mean.

With k=2 (X1, X2) in a multiple linear regression, the contribution of X2 in explaining SST is captured by SSR(X2| X1) = SSR(X1, X2) - SSR(X2)

False, it is the other way around: we are measuring the contribution of X2, so we have to remove what X1 alone explains. The correct expression is SSR(X2 | X1) = SSR(X1, X2) − SSR(X1).

In the Tukey-Kramer procedure, the critical value is computed for each pairwise comparison of absolute difference of group means, when the group sizes are unequal.

True: when the group sizes are unequal, a critical range is computed for each pairwise comparison of the absolute difference of group means. The critical range is calculated from the studentized range value Q (with df1 = c, the number of groups, and df2 = n − c, the number of observations minus the number of groups), MSW, and the sample sizes of the two groups being compared.
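A minimal numeric sketch of that critical range, assuming made-up values for MSW and the group sizes (requires SciPy ≥ 1.7 for studentized_range):

```python
# Tukey-Kramer critical range for one pairwise comparison (illustrative values).
from math import sqrt
from scipy.stats import studentized_range

alpha, c, n = 0.05, 3, 27        # assumed: 3 groups, 27 total observations
msw = 4.2                        # assumed Mean Square Within from the ANOVA
n_j, n_jp = 8, 10                # assumed (unequal) sizes of the two groups compared

q_crit = studentized_range.ppf(1 - alpha, c, n - c)   # df1 = c, df2 = n - c
critical_range = q_crit * sqrt((msw / 2) * (1 / n_j + 1 / n_jp))
print(critical_range)  # compare |mean_j - mean_j'| against this value
```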

H1: μ ≥ 120 is an example of an alternative hypothesis in a one-tail test.

False. The alternative hypothesis in a one-tail test always uses a strict inequality (< or >). The statement in this example could instead serve as a null hypothesis.

It is possible that adj r2 can attain negative values.

This is true: adjusted r² = 1 − (1 − r²)(n − 1)/(n − k − 1), which goes negative when r² is small relative to the number of predictors.

With a sample size greater than 100 and given the point estimate and standard error, the 95% confidence interval for μ is wider than a 90% confidence interval, but narrower than a 99% confidence interval.

True: a higher confidence level requires a larger critical value and hence a wider interval.

In one-way ANOVA, when the Mean Square Among Groups is much larger than the Mean Square Within Groups, it is likely that the statistic F is greater than the critical value.

True, because F = MSA/MSW, so the statistic will be large in this case and likely exceed the critical value.

Assume that m variables are added to a multiple linear regression model which has q independent variables. In testing the hypothesis that the set of m variables improves the model, the partial F statistic has a denominator df = n-q-m-1 and numerator df = m.

(Remember k is the number of independent variables.) This is actually true: the full model has k = q + m independent variables, so the denominator df is n − k − 1 = n − q − m − 1, and the numerator df is m, the number of variables being tested.
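A small sketch of the partial F computation with hypothetical sums of squares, just to show where the degrees of freedom go:

```python
# Partial F test for adding m variables to a model with q variables (hypothetical numbers).
from scipy.stats import f

n, q, m = 50, 3, 2
ssr_full, ssr_reduced, sse_full = 820.0, 760.0, 410.0   # assumed sums of squares

f_stat = ((ssr_full - ssr_reduced) / m) / (sse_full / (n - q - m - 1))
p_value = f.sf(f_stat, m, n - q - m - 1)   # numerator df = m, denominator df = n-q-m-1
print(f_stat, p_value)
```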

Type 1 error is the kind of error in hypothesis testing which occurs when H0 is rejected when it is actually false, and Type 2 error in hypothesis testing occurs when H0 is rejected when in fact it is true?

False, it is the other way around. A Type I error is rejecting a true null hypothesis (considered the more serious type of error); its probability is α. A Type II error is failing to reject a false null hypothesis; its probability is β. The power of the test is 1 − β.

In the Levene's test for homogeneity of variance, a one-way ANOVA is done on the transformed data for the groups, where each observation is replaced by its absolute difference from the corresponding group mean.

False: each observation is replaced by its absolute difference from the corresponding group median, not the group mean.

Multicollinearity is a violation of a regression model assumption, where some independent variables are highly correlated with other independent variables.

False. Multicollinearity is not a violation of the model assumptions; however, it can lead to unstable coefficients (large standard errors and low t-values). It is true, though, that by definition multicollinearity means some independent variables are highly correlated with other independent variables.

Logarithmic transformation of nonlinear multiplicative and exponential regression models can lead to an additive structure similar to the multiple linear format.

True. Taking logarithms turns a multiplicative or exponential model into an additive one: for example, Y = β0 · X1^β1 · X2^β2 · ε becomes ln Y = ln β0 + β1 ln X1 + β2 ln X2 + ln ε, which has the multiple linear format.

In testing Ho: π1 - π2 = 0, p1 and p2 are used as separate estimators for π1 and π2, respectively, in estimating the standard error.

False. For this hypothesis test, p1 and p2 are combined into the pooled estimate p̄ = (X1 + X2)/(n1 + n2), which is what goes into the standard error of the Z statistic for two population proportions. (Had this concerned a confidence interval, the statement would have been correct: there p1 and p2 are used separately.)

The population proportion π is a statistic computed from a random sample.

False: π is a population parameter. The statistic used to estimate the proportion from a random sample is p.

If the Durbin-Watson D statistic is equal to 1.78 for the case where n=100 and k=3. From the Durbin-Watson table, dL=1.61 and dU=1.74, and hence, positive autocorrelation is present.

False. With D = 1.78 above dU = 1.74, we are past the inconclusive zone and do not reject the null hypothesis, so there is no evidence of positive autocorrelation. (H0: errors are not correlated; H1: positive autocorrelation is present.)
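A tiny helper spelling out the decision zones for the positive-autocorrelation test (dL and dU come from the Durbin-Watson table, as in the question):

```python
# Durbin-Watson decision rule for positive autocorrelation.
def dw_decision(d, d_lower, d_upper):
    if d < d_lower:
        return "reject H0: positive autocorrelation present"
    if d > d_upper:
        return "do not reject H0: no evidence of positive autocorrelation"
    return "inconclusive"

print(dw_decision(1.78, 1.61, 1.74))  # -> do not reject H0
```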

In testing for the significance of a multiple linear regression model, it suffices to test that at least one βj is equal to 0.

False, almost a trick question: we test whether at least one βj is not equal to zero.

The sample size n=40 is sufficient for estimating the confidence interval for π if p=0.90.

False, because n(1 − p) = 40 × 0.1 = 4, which is below the required minimum of 5.

A 95% confidence interval for the mean, [25.6, 39.3], implies that the probability that the population mean is between 25.6 and 39.3 is 0.95

False. A confidence interval estimate provides a range of values based on the variability of the statistic; the population mean μ is fixed, not random. If we draw 100 samples of size n = 25, 95% confidence means that about 95 of the 100 resulting intervals will contain the true value μ. Thus, based on one random sample, you can be 95% confident that your estimated interval is one of those intervals that contain μ.

If the associated probability of a test statistic is larger than the level of significance α, then one rejects the corresponding H0.

False: if the associated probability of a test statistic is less than the significance level α, then we reject the null hypothesis; if it is larger, we do not.

If a subset regression model deviates from the true/full model primarily with random differences, the mean value of the Mallows Cp statistic for the subset regression is equal to k, the number of independent variables in the said subset regression model.

False: for such a subset model the mean value of Cp is k + 1, not k. (The related rule of thumb for screening models is Cp ≤ k + 1.)

In testing for the ratio of two population variances, the statistic used is F = S1²/S2², where S2² > S1².

False: S1² must be the larger sample variance (so that F ≥ 1). The formula itself is correct; the stated condition is reversed.

Parsimony in multiple linear regression analysis is synonymous to optimizing r2 while taking as many candidate predictors as possible.

False: parsimony means using as few predictors as possible while still fitting the data adequately, not as many as possible.

The paired difference test for μ1 - μ2 entails getting only 1 random sample.

False: the paired difference test involves two related samples (repeated measurements on the same items), not a single random sample. (Ch. 10)

Sp is the pooled sample standard deviation used in estimating the confidence interval for the difference of two population means when the different variances are large.

False: the pooled-variance method is used when the population variances are assumed equal, not when they are large or different.

Homoscedasticity is an assumption of linear regression wherein the error terms are independent.

False. Homoscedasticity is the assumption of equal variance of the error terms; independence of the errors is a separate assumption.

The coefficient of determination is equal to 0.80 in an estimated simple linear regression model. This means that (0.80)^2 = 0.64 => 64% of the variation in Y can be explained by its linear relationship with X.

False. The coefficient of determination r² is already a squared quantity, so r² = 0.80 means that 80% of the variation in Y can be explained by its linear relationship with X.

For model validation in regression, one can do split-data approach where after the estimation process, one has to gather new data to check if the results are consistent.

False. In the split-data approach you split the data you already have: one portion is used for estimation and the held-out portion is used to check whether the results are consistent. No new data is gathered.

Zα is always less than or equal to tα, and Zα approaches the tα value as the sample size increases.

False on the second clause, which is the other way around: it is tα that approaches Zα as the sample size (degrees of freedom) increases, because the t distribution converges to the standard normal. The first clause is actually true: Zα ≤ tα for any finite degrees of freedom.

The confidence interval for μ coincides with the rejection region for the corresponding hypothesis test of Ho: μ=0 vs H1: μ≠0.

False: the confidence interval corresponds to the non-rejection region, not the rejection region. H0: μ = 0 is rejected exactly when 0 falls outside the confidence interval.

The Deviance statistic follows an F-distribution with df=n-k-1, where n is the number of observations and k is the number of independent variables.

False: the deviance statistic follows a χ² distribution with df = n − k − 1. Apart from the distribution, the statement would have been correct.

In the simple linear regression model, β1 is the expected value of the dependent variable Y when the independent variable X = 1.

False. β1 is the slope: the expected change in Y per unit change in X. The expected value of Y when X = 1 is β0 + β1; the intercept β0 cannot be dropped, and the random error component has expectation zero, so it disappears when taking the mean.

The computed 95% confidence interval for β1 in a simple linear regression is: [-2.51, 3.84]. This indicates that the coefficient β1 is significant.

False: because zero is included in the interval, β1 is not significantly different from zero.

The Cook's distance D is a function of residual, mean square error, and hi. Observations with extremely small D values must be investigated for possible influential impact on the regression function.

The first part is true: D is a function of the residual, the mean square error, and the leverage hi. The second part is false: it is extremely large D values that must be investigated; observations whose D exceeds the critical value from the F distribution with α = 0.5 are flagged as potentially highly influential.

With σ known, the standard error for 𝑋̅ increases as the sample size decreases.

True, look at the formula: the standard error is σ/√n. Because n is in the denominator, the standard error decreases as n increases and increases as n decreases. It makes sense that having more data gives less variation (and more precision) in your results.

In the simple linear regression model, β0 is the expected value of the dependent variable Y when the independent variable X = 0.

True. β0 is the intercept: the expected (mean) value of Y when X = 0. The random error term has expectation zero, so it drops out when taking the expected value.

The scatter plot of residuals versus fitted values can reveal violation of model assumption of homoscedasticity.

True, and this is one of the reasons we make this plot; the other is to check for violations of the linearity assumption.

With S1=2 and S2=1, we reject Ho: no difference between variances, with n1=n2=25 and α=0.05.

True, we reject. This is a two-tail test: Fstat = 2²/1² = 4, while the upper critical value F0.025(24, 24) ≈ 2.27 (even against the one-tail value F0.05(24, 24) = 1.9838 the conclusion is the same). We definitely reject the null hypothesis of equal variances.
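A quick check with SciPy, using the numbers from the question:

```python
# Two-tail F test for equal variances (values from the question).
from scipy.stats import f

s1, s2, n1, n2, alpha = 2.0, 1.0, 25, 25, 0.05
f_stat = (s1 ** 2) / (s2 ** 2)                    # larger variance on top -> 4.0
f_crit = f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)     # upper critical value, about 2.27
print(f_stat, f_crit, f_stat > f_crit)            # True -> reject H0
```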

In computing for the required sample size for the confidence interval for proportion, using the worst-case scenario π =0.50 gives a sample size n which is larger than for any other value of π.

True: π(1 − π) is maximized at π = 0.5. Compare 0.5 × 0.5 = 0.25 with, say, 0.9 × 0.1 = 0.09; the first yields a larger required sample size.
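A quick check at 95% confidence, assuming an illustrative margin of error of 0.05:

```python
# Required sample size for a CI for a proportion at several planning values of pi.
from math import ceil
from scipy.stats import norm

z = norm.ppf(0.975)          # 95% confidence
e = 0.05                     # assumed margin of error
for pi in (0.5, 0.7, 0.9):
    n = ceil(z ** 2 * pi * (1 - pi) / e ** 2)
    print(pi, n)             # n is largest at pi = 0.5 (385 vs 323 vs 139)
```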

If the population standard deviation is 4 times the margin of error, the required sample size for the 95% confidence interval for the mean is n = (4Z)².

True, once the square is worked out: n = (Zσ/e)², and with σ = 4e this becomes n = (4Z)². For example, σ = 8 is 4 times e = 2.

If there is a problem with heteroscedasticity, a possible remedial measure is to use a square root transformation.

True. Both the square root transformation and the log transformation are used for this purpose: to overcome violations of equal variance (and of linearity).

The odds ratio in logistic regression indicates the ratio of the probability of the event of interest to its complement.

Yes, this is true.

If we were to include a particular categorical variable, which has 4 categories, in a multiple regression model, we have to create 3 dummy variables.

True. You always create one dummy variable fewer than the number of categories, so 4 categories require 3 dummy variables.

𝑋̅ is a statistic used in the computation for the confidence interval of μ.

True: X̄ is the point estimate (sample statistic) that appears in the confidence interval formula for μ.

With σ = 10 and error margin = 2, the required sample size for the 99% confidence interval for the mean is larger than n = 100.

True: the required sample size is n = (Zσ/e)² = (2.576 × 10/2)² ≈ 166 (or 167 with the rounded value Z = 2.58), which is larger than 100 either way.
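The computation, for checking:

```python
# Required n for the 99% CI for the mean with sigma = 10 and margin of error e = 2.
from math import ceil
from scipy.stats import norm

z = norm.ppf(0.995)                 # 2.5758...
n = ceil((z * 10 / 2) ** 2)         # 166; with the rounded table value z = 2.58, n = 167
print(n)
```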

Influence analysis: if LEV is greater than 2(k+1)/n (where k=total number of independent variables; n=total number of observations), the observation is potentially an influential observation.

True: LEV is the leverage hi (the hat-matrix diagonal), and the rule of thumb flags an observation as potentially influential when hi > 2(k + 1)/n, which is exactly the rule stated.

For the confidence interval for π1 − π2, the standard error for p1 − p2 is based on: p̄ = (X1 + X2)/(n1 + n2).

False. The pooled estimate p̄ = (X1 + X2)/(n1 + n2) is used in the hypothesis test, not the confidence interval. The standard error for the confidence interval uses p1 and p2 separately: √(p1(1 − p1)/n1 + p2(1 − p2)/n2).

The critical value for the Cook's D statistic is Fα=0.05(df1=k+1; df2=n-k-1), where k is the number of independent variables and n is the sample size.

False, a trick question: the critical value uses α = 0.5, not 0.05.

The df for a sample mean is equal to the number of observations which are not free to vary.

False, almost a trick question, so read it carefully. Definition: the degrees of freedom is the number of observations that are free to vary after the sample mean has been calculated.

One method of measuring multicollinearity is to determine the VIF for each independent variable. A rule of thumb is: if VIFj for Xj is greater than 5, then it can be associated with multicollinearity. This also means that Rj², the coefficient of multiple determination for a regression model using Xj as the dependent variable and all other X variables as independent variables, is greater than 0.80.

True. VIFj = 1/(1 − Rj²), so VIFj > 5 is equivalent to Rj² > 0.80. Note that in this auxiliary regression Xj really is treated as the dependent variable, regressed on all the other X variables; Y is not involved.
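The algebra behind the equivalence, verified numerically:

```python
# The VIF rule of thumb: VIF_j = 1 / (1 - R_j^2), so VIF_j > 5 iff R_j^2 > 0.80.
for r2 in (0.79, 0.80, 0.81):
    vif = 1 / (1 - r2)
    print(r2, round(vif, 2), vif > 5)   # VIF passes 5 only once R_j^2 exceeds 0.80
```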

In simple linear regression, the test statistic t for Ho: β1=0 has the same associated probability as the corresponding F from the ANOVA table for regression.

True: in simple linear regression t² = F, so the two statistics have the same associated probability.

The least squares method in regression is used to find the estimators of the regression coefficients by minimizing the sum of the squared residuals.

True: the least squares estimators b0 and b1 are the values that minimize the sum of squared residuals Σ(Yi − Ŷi)².

In multiple linear regression, the r2 is always greater than adj r2.

True: adjusted r² = 1 − (1 − r²)(n − 1)/(n − k − 1), which penalizes for the number of predictors and is therefore smaller than r².

Stepwise regression uses the associated p-values of t and F statistics in adding and removing candidate independent variables.

True. The p-values of t statistics are used for evaluating individual slopes and those of F statistics for the significance of adding or removing a variable; stepwise regression uses these in deciding which candidate independent variables to add and remove.

Values computed from a population are called parameters.

True: values computed from a population are parameters, while values computed from a sample are statistics.

A critical value for a confidence interval for μ solely depends on the sample size.

False: the critical value depends on both α and the sample size (via the degrees of freedom), as the t table shows.

The coefficient of partial determination for k-variable linear regression model, with reference to the independent variable Xj, measures the proportion of variation in the dependent variable Y that is explained by Xj.

False. The definition is missing one part: the proportion of variation in Y explained by Xj while holding all the other independent variables constant.

A point estimate is a single value of a parameter?

False: a point estimate is a single value of a statistic. We estimate a population parameter with a sample statistic (the point estimate).

If the associated p of a test statistic is larger than α, then in terms of absolute values, one should expect the computed test statistic to be greater than the corresponding critical value.

False. If the p-value is larger than α, we do not reject the null hypothesis, which means the computed test statistic is less than the critical value in absolute terms, not greater.

In inferential statistics, a standard error is the associated probability for a given test statistic.

False. The standard error is not a probability: it is the standard deviation of a point estimate (for example, σ/√n for X̄). The associated probability of a test statistic is its p-value, which is something else entirely.

Logistic regression: if e^(bj) > 0, the independent variable Xj increases the odds ratio of the event of interest, given that bj is the estimated coefficient for Xj.

False: e^(bj) has to be greater than 1, not 0, for Xj to increase the odds ratio (e^(bj) is always positive). It is correct that bj is the estimated coefficient for Xj.

In the pooled-variance t-test for the difference of 2 means, with S1 = 2.5, S2 = 2.8, n1 = 20, and n2 = 25, the degrees of freedom is v = 43.

False: v denotes the degrees of freedom for the separate-variance t test (the Satterthwaite approximation), not the pooled-variance test, and here it works out to about 42. For the pooled-variance test the degrees of freedom is simply n1 + n2 − 2 = 43.
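Checking both degrees-of-freedom computations with the numbers given:

```python
# Satterthwaite df (separate-variance test) vs pooled df for the values given.
s1, s2, n1, n2 = 2.5, 2.8, 20, 25
a, b = s1 ** 2 / n1, s2 ** 2 / n2
v = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
print(round(v))        # ~42, the separate-variance df
print(n1 + n2 - 2)     # 43, the pooled-variance df
```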

In the one-way ANOVA, the associated probability of the corresponding F statistic depends on the degrees of freedom df1 and df2, as well as α.

False, and sneaky: df1 = c − 1 and df2 = n − c do determine the associated probability, but the p-value does not depend on α.

Assume that SSR in regression ANOVA is thrice the value of SSE. With SST = 1000, the coefficient of determination is equal to 0.70.

False, as a simple calculation shows: SST = SSR + SSE = 4·SSE = 1000, so SSE = 250 and SSR = 750, giving r² = SSR/SST = 0.75, not 0.70.

