PE220 ANOVA
Predictor variable populations do not match.
When the Type I and Type III SS (sums of squares) do not match, what does this indicate? A. Type I means are lower than expected. B. Predictor variable populations do not match. C. Type III means are 3x the population counts.
False The Tukey method and the pairwise t-tests are two methods you learned about that compare all possible pairs of means, so they can be used only when you make pairwise comparisons. The Dunnett method compares all categories to a control group.
The Dunnett method compares all possible pairs of means, so it can be used only when you make pairwise comparisons. a. true b. false
DFFITS and CooksD only The variable Summary_i compresses the indicator variables RStud_i, DFits_i, and CookD_i into a single variable, with values in the order shown in the assignment statement that defines Summary_i. Therefore, the Summary_i value 011 means that the RStudent value did not exceed the cutoff, but the values for DFFITS and CooksD did.
The observation below is from the data set InfluentialBF. Assume that these assignment statements were used in creating the data set: CutDFFits=2*(sqrt(&numparms/&numobs)); CutCooksD=4/&numobs; RStud_i=(abs(RStudent)>3); DFits_i=(abs(DFFits)>CutDFFits); CookD_i=(CooksD>CutCooksD); Summary_i=compress(RStud_i||DFits_i||CookD_i); For which statistics did this observation exceed the cutoff criteria? a. RStudent, DFFITS, and CooksD b. RStudent and DFFITS only c. RStudent and CooksD only d. DFFITS and CooksD only
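The cutoff logic in these assignment statements can be sketched in Python. This is only an illustration: the function name and the sample observation are hypothetical, not values from InfluentialBF.

```python
import math

def influence_summary(rstudent, dffits, cooksd, num_parms, num_obs):
    """Return a three-character flag string in the order RStudent, DFFITS, CooksD."""
    cut_dffits = 2 * math.sqrt(num_parms / num_obs)  # CutDFFits
    cut_cooksd = 4 / num_obs                         # CutCooksD
    flags = [
        abs(rstudent) > 3,         # RStud_i
        abs(dffits) > cut_dffits,  # DFits_i
        cooksd > cut_cooksd,       # CookD_i
    ]
    # Concatenate the 0/1 indicators, as Summary_i does
    return "".join(str(int(flag)) for flag in flags)

# Hypothetical observation: 5 parameters, 100 observations
summary = influence_summary(rstudent=2.1, dffits=0.9, cooksd=0.05,
                            num_parms=5, num_obs=100)
print(summary)
```

With these made-up inputs, the RStudent value stays under 3 while DFFITS and CooksD exceed their cutoffs, which is the 011 pattern described in the answer.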
the simplest model with the best performance on the validation data The best model is the simplest (the most parsimonious) model that has the best performance on the validation data. The training data is used to fit the model and generate the possible models to be assessed.
When you use honest assessment, which of the following would be considered the best model? a. the simplest model with the best performance on the training data b. the simplest model with the best performance on the validation data c. the most complex model with the best performance on the training data d. the most complex model with the best performance on the validation data
proc glm data=SASUSER.MLR; class c1; model y=c1 x1-x3 /solution; run; D is correct. In A, the continuous variables would be treated as categorical because they are listed in the CLASS statement. In B, SOLUTION is not a valid option in the PROC REG MODEL statement. In C, the CLASS statement is not valid in PROC REG.
A linear model has the following characteristics: a dependent variable (y) three continuous predictor variables (x1-x3) one categorical predictor variable (c1 with 3 levels) Which SAS program fits this model? A. proc glm data=SASUSER.MLR; class c1 x1 x2 x3; model y=c1 x1-x3 /solution; run; B. proc reg data=SASUSER.MLR; model y=c1 x1-x3 /solution; run; C. proc reg data=SASUSER.MLR; class c1; model y=c1 x1-x3; run; D. proc glm data=SASUSER.MLR; class c1; model y=c1 x1-x3 /solution; run;
CLASS Statement
How do you tell PROC TTEST that you want to do a two-sample TTEST? A. SAMPLE=2 OPTION B. CLASS STATEMENT C. GROUPS=2 OPTION D. PAIRED STATEMENT
Oxygen consumption decreases by 3.31085 for each 1-unit increase in runtime (the parameter estimate is -3.31085).
If runtime increases by 1 unit, what happens to the response variable (oxygen consumption)? A. The parameter estimate for the predictor variable (runtime) is negative, so there is no effect. B. Oxygen consumption decreases by 3.31085 for each 1-unit increase in runtime (the parameter estimate is -3.31085).
No
If there is no correlation among the predictor variables, can there still be collinearity in the model? Yes No
that the errors are normally distributed The Residuals versus Quantile plot is a normal quantile plot of the residuals. Using this plot, you can verify that the errors are normally distributed, which is one of our assumptions. Here the residuals follow the normal reference line pretty closely, so we can conclude that the errors are normally distributed.
In the diagnostic plots below, what does the Residual versus Quantile plot indicate about the model? a. that the errors are normally distributed b. that the data set contains many influential observations c. that the model is inadequate because the spread of the residuals is less than the spread of the centered fit d. that the model is inadequate because patterns occur in the spread around the reference line
The row percentages indicate that the distribution of Survived changes when the value of Age changes. To see a possible association, you look at the row percentages. The percent of children who survived (52.29) is much higher than the percent of adults who survived (31.26)
In this crosstabulation table, what evidence indicates a possible association between the variables Age and Survived? a. The frequency statistics indicate that the values of each variable are equally distributed across levels. b. The row percentages indicate that the distribution of Survived changes when the value of Age changes. c. The column percentages indicate that most of the people on the boat were children.
all of the above All six steps are important for developing good regression models. You might need to perform some steps iteratively to produce the best possible model.
Which of the following is suggested for developing good regression models? a. getting to know your data by performing preliminary analyses b. identifying good candidate models c. checking and validating your assumptions using residual plots and other statistical tests d. identifying any influential observations or collinearity e. revising the model if needed f. validating the model with data not used to build the model g. all of the above h. a, c, and d only
A Pearson correlation coefficient is a measure of linear association.
Which of the following statements is/are true? a. A Pearson correlation coefficient is a measure of linear association. b. A nonsignificant p-value for a Pearson correlation means no relationship. c. A negative Pearson correlation indicates a low degree of linear association. d. A random cloud of data implies a negative correlation.
proc reg data=statdata.bodyfat2 plots(only)= (RSTUDENTBYPREDICTED(LABEL) COOKSD(LABEL) DFFITS(LABEL) DFBETAS(LABEL)); PREDICT: model PctBodyFat2 = Abdomen Weight Wrist Forearm / r influence; id Case; run; quit; Program a specifies the R and INFLUENCE options, which request diagnostic statistics.
Which of these programs requests diagnostic statistics as well as diagnostic plots? a. proc reg data=statdata.bodyfat2 plots(only)= (RSTUDENTBYPREDICTED(LABEL) COOKSD(LABEL) DFFITS(LABEL) DFBETAS(LABEL)); PREDICT: model PctBodyFat2 = Abdomen Weight Wrist Forearm / r influence; id Case; run; quit; b. proc reg data=statdata.bodyfat2 plots(only)= (QQ RESIDUALBYPREDICTED RESIDUALS); PREDICT: model PctBodyFat2 = Abdomen Weight Wrist Forearm; id Case; run; quit; c. both of the above
DRUGDOSE*DISEASE The interaction can also be requested with bar notation: DRUGDOSE | DISEASE expands to both main effects plus the DRUGDOSE*DISEASE interaction.
Which one of the predictor variable(s) produces an interaction effect? PROC GLM DATA=STAT1.DRUG; CLASS DRUGDOSE DISEASE; MODEL BLOODP=DRUGDOSE DISEASE DRUGDOSE*DISEASE; FORMAT DRUGDOSE DOSEF.; RUN; QUIT; TITLE; A. CLASS DRUGDOSE DISEASE B. DRUGDOSE*DISEASE C. BLOODP=DRUGDOSE
The variance inflation factors indicate that collinearity is present in the model. Several variance inflation factors are above 10 (Abdomen, Weight, Height, Chest, Hip, Density, Adiposity, and FatFreeWt). This indicates that collinearity among the predictor variables is present in the model.
View this PROC REG output. What does the output indicate about the model? a. The p-value for the overall model is not significant. b. The model does not fit the data well. c. The p-values for the parameter estimates indicate that collinearity is present in the model. d. The variance inflation factors indicate that collinearity is present in the model. e. none of the above
Several observations exceed the cutoff values, so these observations might be influential. The gray horizontal lines mark the +2 and -2 cutoff values of the RSTUDENT residuals. Several observations fall outside these lines, so these observations might be influential.
View this plot of RSTUDENT residuals versus predicted values of PctBodyFat2. What does it indicate? a. The model does not fit the data well. b. The residuals have a cyclical shape, so the independence assumption is being violated. c. Several observations exceed the cutoff values, so these observations might be influential. d. none of the above
within-group sample means The one-way ANOVA model states that the dependent variable is equal to a within-group population mean (mu_i) plus a deviation from the population mean. The within-group population mean (mu_i) is estimated by using the within-group sample mean (ybar_i).
What are the 'predicted values' that result from fitting a one-way analysis of variance (ANOVA) model? A. within-group sample variances B. between-group sample variances C. within-group sample means D. between-group mean differences
Sums of Squares you obtain from fitting the effects in the order listed within the model coding.
What does the Type I SS represent? A. Sums of Squares you obtain from fitting the effects in the order listed within the model coding. B. Sums of Squares from fitting each effect after all the other terms in the model. SS of effects corrected after the initial terms have been fitted. (Adjusted)
AIC = n*log(SSE/n) + 2p + n + 2 (the form reported by PROC GLMSELECT), where p is the number of parameters and the 2p term is the complexity penalty. The model with the smaller information criterion is considered to be better.
What does the AIC (Akaike's information criterion) selection method produce? A. It produces Akaike's information criterion and searches for the most parsimonious model (the model that minimizes unexplained variability with as few effects, or predictor variables, as possible). It does this by adding a penalty that represents the complexity of the model. B. It produces Akaike's information criterion and searches for the largest model (the model that maximizes unexplained variability with as few effects, or predictor variables, as possible). It does this by adding a penalty that represents the complexity of the model.
both of the above An influential observation is an observation that strongly affects the linear model's fit to the data. If the influential observation weren't there, the best fitting line to the rest of the data would most likely be very different.
What is an influential observation? a. an unusual observation that can sometimes have a large residual compared to the rest of the points b. an observation so far away from the rest of the data that it influences the slope of the regression line c. both of the above d. neither of the above
All of the y mean values are the same. However, if the y values themselves are all the same, then there might be problems in your data collection or measurement.
What is the visual cue in a scatter plot that there is no association between the response variable and the explanatory variables? a. All of the y values fall on a straight line. b. None of the y values fall on a straight line. c. All of the y mean values fall on a straight line. d. None of the y mean values fall on a straight line. e. All of the y values are the same. f. All of the y mean values are the same.
a table of correlations and a scatter plot matrix with histograms along its diagonal By default, PROC CORR produces a table of correlations (which can be a correlation matrix, depending on your program). The NOSIMPLE option suppresses printing of the simple descriptive statistics for each variable, and PLOTS=MATRIX requests a scatter plot matrix instead of individual scatter plots. The HISTOGRAM option displays histograms of the variables in the VAR statement along the diagonal of the scatter plot matrix.
What output does this program produce? proc corr data=statdata.bodyfat2 nosimple plots=matrix(nvar=all histogram); var Age Weight Height; run; a. individual correlation plots and simple descriptive statistics b. a scatter plot matrix only, with histograms along its diagonal c. a table of correlations and a scatter plot matrix with histograms along its diagonal d. can't tell from the information given
Pearson Chi-Square The Pearson chi-square statistic is affected by sample size: if the sample size doubles or triples while the cell proportions stay the same, the statistic also doubles or triples.
What test performs a formal test of association between two categorical variables, based on the differences between observed frequencies and expected frequencies, and results in a p-value? A. Hocking's Criterion B. Pearson Chi-Square
Cramer's V Cramer's V is not affected by sample size: if the sample size increases while the cell proportions stay the same, the statistic does not change.
What statistic measures the strength of association between two categorical variables, is based on the differences between observed frequencies and expected frequencies, and ranges between -1 and 1 for 2 by 2 tables or between 0 and 1 for larger tables? A. Cramer's V B. Pearson Chi-Square C. Wald
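The sample-size behavior of the two statistics can be checked with a small Python sketch. The function name and the 2 by 2 counts are made up for illustration, and the function returns the unsigned magnitude of Cramer's V rather than the signed value SAS reports for 2 by 2 tables.

```python
import math

def chi_square_and_cramers_v(table):
    """Pearson chi-square and (unsigned) Cramer's V for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))  # smaller of the row and column counts
    v = math.sqrt(chi2 / (n * (k - 1)))
    return chi2, v

# Hypothetical counts, then the same proportions with twice the sample size
small = [[30, 20], [10, 40]]
doubled = [[2 * count for count in row] for row in small]

chi2_small, v_small = chi_square_and_cramers_v(small)
chi2_doubled, v_doubled = chi_square_and_cramers_v(doubled)
print(chi2_small, chi2_doubled)  # the chi-square statistic doubles
print(v_small, v_doubled)        # Cramer's V is unchanged
```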
Within Group Variation
When calculating Sums of Squares, what does SSE measure? A. Between Group Variation B. Within Group Variation C. Total Variation
0.50
With a fair coin, your probability of getting heads on one flip is 0.5. If you flip a coin once and get heads, what is the probability of getting heads on the second try? a. 0.50 b. 0.25 c. 0.00 d. 1.00 e. 0.75
0.75
With a fair coin, your probability of getting heads on one flip is 0.5. If you flip a coin twice, what is the probability of getting at least one head out of two? a. 0.50 b. 0.25 c. 0.00 d. 1.00 e. 0.75
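Both coin-flip answers can be verified by enumerating the four equally likely outcomes of two flips. This is a small Python sketch; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two fair flips: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# P(at least one head in two flips)
at_least_one_head = Fraction(
    sum("H" in outcome for outcome in outcomes), len(outcomes)
)

# P(second flip is heads | first flip was heads): the flips are independent
first_is_head = [outcome for outcome in outcomes if outcome[0] == "H"]
second_given_first = Fraction(
    sum(outcome[1] == "H" for outcome in first_is_head), len(first_is_head)
)

print(at_least_one_head)   # 3/4
print(second_given_first)  # 1/2
```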
It tests the effect of one variable at different slices (levels or value ranges) of another variable.
Within PROC PLM, what does the SLICE statement do when your model contains an interaction? A. It segments the model into multiple components. B. It tests the effect of one variable at different slices (levels or value ranges) of another variable.
the assumption of equal variances You use Levene's Test for Homogeneity in PROC GLM to verify the assumption of equal variances in a one-way ANOVA model.
You can examine Levene's Test for Homogeneity to more formally test which of the following assumptions? a. the assumption of errors being normally distributed b. the assumption of independent observations c. the assumption of equal variances d. the assumption of treatments being randomly assigned
proc logistic data=MYDIR.EMPLOYMENT descending; class education (param=ref ref='3'); model Hired = Salary Education; run; By default, PROC LOGISTIC reports odds ratios that compare each level against the last class level, but it estimates the parameters using effect coding. Specifying PARAM=REF with REF='3' makes the parameter estimates consistent with those odds ratios.
A Human Resource manager fits a logistic regression model with the following characteristics: binary target HIRED continuous predictor SALARY categorical predictor EDUCATION (levels=1,2,3) The default odds ratio compares each level of the variable EDUCATION against the last level. Which SAS program gives parameter estimates for EDUCATION that are consistent with the default odds ratio? A. proc logistic data=MYDIR.EMPLOYMENT descending; class education (param=ref ref='3'); model Hired = Salary Education; run; B. proc logistic data=MYDIR.EMPLOYMENT descending; class education; model Hired = Salary Education; run; C. proc logistic data=MYDIR.EMPLOYMENT descending; class education salary (param=ref ref='3'); model Hired = Salary Education; run;
proc logistic data=MYDIR.CAMPAIGN descending; class Homeowner; model Respond=Income Homeowner; run; Homeowner (Y/N) is a character variable and therefore must be listed in a CLASS statement.
A marketing manager attempts to determine those customers most likely to purchase additional products as the result of a nation-wide marketing campaign. The manager possesses a historical dataset (CAMPAIGN) of a similar campaign from last year. It has the following characteristics: target variable RESPOND (0,1) continuous predictor INCOME categorical predictor HOMEOWNER (Y,N) Which SAS program performs the analysis? A. proc logistic data=MYDIR.CAMPAIGN descending; class Homeowner; model Respond=Income Homeowner; run; B. proc logistic data=MYDIR.CAMPAIGN descending; by Homeowner; model Respond=Income Homeowner; run; C. proc logistic data=MYDIR.CAMPAIGN descending; model Respond=Income Homeowner; run; D. proc logistic data=MYDIR.CAMPAIGN descending; class Income Homeowner; model Respond=Income Homeowner; run;
Randomly Assigned, Constant
Along with the three original ANOVA assumptions of independent observations, normally distributed errors, and equal variances across treatments, you make two more assumptions when you include a blocking factor in the model. You assume that the treatments are _______________ within each block, and you assume that the effects of the treatment factor are _______________ across levels of the blocking factor. A. Normally Distributed , Constant B. Randomly Assigned, Constant
The model had the highest c statistic on the validation data. The c statistic measures model performance (area under the ROC curve), with higher values better. You need to use the validation data because the c statistic will show better fit on the training data with over-fit models.
An analyst compared many different models to predict the binary Purchase variable and selected one particular model. What rationale supports this decision? A. The model had the highest c statistic on the training data. B. The model had the highest c statistic on the validation data. C. The model had the lowest c statistic on the training data. D. The model had the lowest c statistic on the validation data.
Adj R-Square Adj R-Sq is the proportion of variance in the response explained by the model, adjusting for the number of parameters. A larger model would not generally have a smaller R-Square, but might have a smaller Adj R-Sq.
An analyst has selected this model as a champion because it shows better model fit than a competing model with more predictors. Which statistic justifies this rationale? A. R-Square B. Coeff Var C. Adj R-Square D. Error DF
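The penalty behind this answer can be illustrated with the usual adjusted R-square formula for a model with an intercept, 1 - (1 - R2)(n - 1)/(n - p - 1), where p is the number of predictors. The Python sketch below uses hypothetical sample sizes and R-square values, not output from the card.

```python
def adjusted_r_square(r_square, num_obs, num_predictors):
    """Adjusted R-square for a model with an intercept and num_predictors predictors."""
    return 1 - (1 - r_square) * (num_obs - 1) / (num_obs - num_predictors - 1)

# Hypothetical comparison on 30 observations:
# the larger model has a slightly higher R-square but many more predictors
champion = adjusted_r_square(r_square=0.70, num_obs=30, num_predictors=3)
challenger = adjusted_r_square(r_square=0.71, num_obs=30, num_predictors=8)

print(round(champion, 4), round(challenger, 4))
```

In this made-up case the larger model wins on R-square but loses on Adj R-Sq, which is the situation the card describes.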
You calculate the residuals from ANOVA by taking each observation and subtracting its group mean.
How do you calculate residuals in ANOVA? A. Take each observation and subtract the group mean. B. Take the group means and subtract the total of all observations.
Large wrist size is significantly different than small wrist size. Large is significantly different than small because its bar exceeds the decision limit. Medium does not exceed the limit. The different widths of the decision limits are due to unbalanced data and are not an issue.
Given alpha=0.02, which conclusion is justified regarding percentage of body fat, comparing small(S), medium (M) and large (L) wrist sizes? A. Medium wrist size is significantly different than small wrist size. B. Large wrist size is significantly different than small wrist size.
No The p-value of 0.2942 is greater than 0.05, so you fail to reject the null hypothesis of equal variances; there is not sufficient evidence that the variances differ.
Given a p-value of 0.2942, is there sufficient evidence to reject the assumption of equal variances? A. Yes B. No
Yes The p-value of <.001 is less than 0.05, so you would reject the null hypothesis and conclude that the means between the two groups are significantly different.
Given a p-value of <.001, is there sufficient evidence to reject the null hypothesis of equal means? A. Yes B. No
-2 and 2
Given the properties of the standard normal distribution, you would expect about 95% of the studentized residuals to be between which two values? a. -3 and 3 b. -2 and 2 c. -1 and 1 d. 0 and 1 e. 0 and 2 f. 0 and 3
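The 95% figure is a rounding of the standard normal probability between -2 and 2, which can be checked with Python's standard library. This sketch treats the studentized residuals as approximately standard normal, which is the assumption the card makes.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability that a standard normal value falls within each pair of limits
within_1 = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2 = standard_normal.cdf(2) - standard_normal.cdf(-2)
within_3 = standard_normal.cdf(3) - standard_normal.cdf(-3)

print(round(within_1, 4), round(within_2, 4), round(within_3, 4))
```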
36.1680 and 52.3021 The CLI option, which displays the 95% CL Predict column in the Output Statistics table, produces confidence limits for an individual predicted value. In this table, the third observation, for Kate, contains the value 55 for Performance. Therefore, the values in her 95% CL Predict column are the lower and upper confidence limits for a new individual value at the same value of Performance. In contrast, the CLM option displays the values in the 95% CL Mean column, which are the lower and upper confidence limits for a mean predicted value for each observation.
Here is a table of output statistics from PROC REG. If you sample a new value of the dependent variable when Performance equals 55, what are the lower and upper prediction limits for this newly sampled individual value? a. 44.7500 and 44.2351 b. 40.1023 and 48.3678 c. 36.1680 and 52.3021 d. can't tell from the information given
The effect of one variable differs at different levels of another. A stratified plot of the effect of one variable against the dependent variable shows different patterns across different levels of the other.
How can you recognize an interaction? a. The effect of one variable differs at different levels of another. b. The joint effect of two variables is greater than the sum of their individual effects. c. A stratified plot of the effect of one variable against the dependent variable shows different patterns across different levels of the other. d. Two variables are talking at a party. e. One variable shows statistical significance without the other in the model, but no significance with the other variable in the model.
Add 1 to the Number in Model value because the value does not include the intercept.
How do you find P using Mallows' Cp within PROC REG? A. Subtract 1 from Number in Model value because the value already includes the intercept. B. Add 1 to the Number in Model value because the value does not include the intercept.
Use a CLASS statement The test of equal variance (Folded F Test) is produced by default whenever a two-sample t-test is requested. A two-sample t-test is requested using the CLASS statement in PROC TTEST.
How do you get PROC TTEST to display the test for equal variance? A. Use the option EV B. Use the MEANS statement with a HOVTEST option. C. Request a plot of the residuals D. Use a CLASS Statement
10
How many observations did you find that might substantially influence parameter estimates as a group? a. 0 b. 1 c. 4 d. 5 e. 7 f. 10
proc reg data=SASUSER.MLR; model y=x1-x4; run; B is correct. In A, the VAR statement is not available in PROC REG. In C, for MLR, all independent vars should be specified in the same model statement. In D, SOLUTION is not a valid option in PROC REG.
Identify the correct SAS program for fitting a multiple linear regression model with a dependent variable (y) and four predictor variables (x1-x4). A. proc reg data=SASUSER.MLR; var y x1 x2 x3 x4; model y=x1-x4; run; B. proc reg data=SASUSER.MLR; model y=x1-x4; run; C. proc reg data=SASUSER.MLR; model y=x1; model y=x2; model y=x3; model y=x4; run; D. proc reg data=SASUSER.MLR; model y=x1 x2 x3 x4 /solution; run;
0
If you have 20 observations in your ANOVA and you calculate the residuals, to which of the following would they sum? a. -20 b. 0 c. 20 d. 400 e. Unable to tell from the information given
Unable to tell from the information given
If you have 20 observations in your ANOVA and you calculate the squared residuals, to which of the following would they sum? a. -20 b. 0 c. 20 d. 400 e. Unable to tell from the information given
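The two residual cards can be checked numerically. The Python sketch below uses a small made-up data set (not 20 observations) to show that one-way ANOVA residuals, each observation minus its group mean, always sum to zero, while the sum of squared residuals depends on the data.

```python
# Hypothetical one-way layout with three groups
groups = {
    "A": [4.0, 6.0, 5.0],
    "B": [10.0, 12.0],
    "C": [7.0, 9.0, 8.0, 8.0],
}

residuals = []
for values in groups.values():
    group_mean = sum(values) / len(values)
    # Residual = observation minus its own group mean
    residuals.extend(value - group_mean for value in values)

sum_of_residuals = sum(residuals)
sum_of_squared_residuals = sum(r * r for r in residuals)  # this is the SSE

print(sum_of_residuals)          # 0.0 for any data set
print(sum_of_squared_residuals)  # depends on the data; 6.0 here
```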
one-way ANOVA You can use one-way ANOVA because you're comparing two samples, males and females. Another option is the two-sample t-test, which is a special case of one-way ANOVA. You can use two-way ANOVA when you have more than one predictor variable.
If you want to compare the average monthly spending for males versus females, which statistical method should you choose? a. one-sample t-test b. one-way ANOVA c. two-way ANOVA
Report the F value and possibly remove the blocking factor from future studies. Your only choice is to report the F value, and if you plan future studies, do not include the blocking variable. The blocking factor must be included in all ANOVA models that you calculate with the sample that you've already collected.
If your blocking variable has a very small F value in the ANOVA report, what would be a valid next step? a. Remove it from the MODEL statement and re-run the analysis. b. Test an interaction term. c. Report the F value and possibly remove the blocking factor from future studies.
It provides separate estimates and significance tests for each model parameter.
In PROC REG, what does the parameter estimate table describe? A. It provides the overall fit for the model. B. It provides separate estimates and significance tests for each model parameter (the intercept and each predictor variable).
Yes, you need to add 1 to the Number in Model value (4 + 1 = 5) to account for the intercept, then you evaluate the criterion using the values in the record: Cp (4.0004) <= p (5).
In the PROC REG output, does the record (model index=1) meet Mallows' Cp criterion? A. Yes, you need to add 1 to the Number in Model value (4 + 1 = 5) to account for the intercept, then you evaluate the criterion using the values in the record: Cp (4.0004) <= p (5). B. No, the Number in Model value (4) already includes the intercept, and it is less than the Cp value (4.0004).
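The check in this card is simple arithmetic, sketched here in Python. The function name is illustrative; the numbers are the ones quoted in the card.

```python
def meets_mallows_cp(cp, number_in_model):
    """Mallows' criterion for prediction: Cp <= p, where p counts the intercept."""
    p = number_in_model + 1  # Number in Model excludes the intercept
    return cp <= p

# Values from the card: Cp = 4.0004 with 4 predictors in the model
print(meets_mallows_cp(cp=4.0004, number_in_model=4))  # True, since 4.0004 <= 5
```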
cp, because it is listed first.
In the PROC REG statement below, which of the three selection criteria determines the sort order of the models? A. cp, because it is listed first. B. adjrsq, because it is the last selection method and re-sorts the models as the final step.
The estimated value of the response variable(oxygen consumption) when the predictor variable (runtime) is equal to 0.
In the Parameter Estimates table below, what does the value of 82.42494 in the Intercept variable indicate? proc reg data=fitness; model oxygen_consumption=runtime; title ' predicting oxygen consumption from runtime'; run; quit; A. The estimated value of the response variable(oxygen consumption) when the predictor variable (runtime) is equal to 0. B. Estimated value of the response variable when the x variable is not equal to zero.
The F Value indicates the ratio (Model Mean Square/Error Mean Square).
In the REG ANOVA output, what does the F Value produce? A. Correlation Strength B. The F Value indicates the ratio (Model Mean Square/Error Mean Square).
the SCORE= option The SCORE= option specifies the data set that contains the parameter estimates. PROC SCORE reads the parameter estimates from this data set, scores the observations in the data set that the DATA= option specifies, and writes the scored observations to the data set that the OUT= option specifies.
In this PROC SCORE step, which option specifies the data set containing the parameter estimates that are used to score observations? proc score data=dataset1 score=dataset2 out=dataset3 type=parms; var Performance; run; a. the DATA= option b. the SCORE= option c. the OUT= option
Where p is the number of parameters in the model + intercept.
Mallows' Cp for parameter estimation (Hocking's criterion): Cp <= 2p - p(full) + 1, where the parameter counts include the intercept. What makes a good candidate model? A. Where p is the number of parameters in the model + intercept. B. Where p is the number of parameters in the model excluding the intercept.
The smallest model where CP<=p.
Mallows' CP for Prediction Cp<=p which includes the number of parameters in the model (includes intercept). What makes a good candidate model? A. The largest model where CP>=p and includes the intercept. B. The smallest model where CP<=p.
False
Predictor variables are assumed to be normally distributed in linear regression models. True False
the mean of y
Run PROC REG with this MODEL statement: model y=x1;. If the parameter estimate (slope) of x1 is 0, then the best guess (predicted value) of y when x1=13 is which of the following? a. 13 b. the mean of y c. a random number d. the mean of x1 e. 0
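The reason is that the least-squares intercept equals the mean of y minus the slope times the mean of x, so a zero slope leaves only the mean of y. The Python sketch below fits a simple regression by hand on made-up data chosen so that the slope is exactly 0.

```python
def fit_simple_ols(x, y):
    """Least-squares slope and intercept for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data with no linear trend (the cross-products cancel out)
x1 = [1, 2, 3, 4, 5]
y = [3, 5, 4, 5, 3]

slope, intercept = fit_simple_ols(x1, y)
prediction_at_13 = intercept + slope * 13
mean_of_y = sum(y) / len(y)

print(slope, prediction_at_13, mean_of_y)
```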
the odds of the event are 1.142 times as large for each one thousand dollar increase in salary. This odds ratio corresponds to a one-unit increase in salary, and salary is recorded in thousands of dollars.
Salary data are stored in thousands of dollars. What is a correct interpretation of the estimate? A. the odds of the event are 1.142 times as large for each one dollar increase in salary. B. the odds of the event are 1.142 times as large for each one thousand dollar increase in salary. C. the probability of the event is 1.142 times as large for each one dollar increase in salary.
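An odds ratio is the exponentiated logistic coefficient, so it multiplies the odds once per unit of the predictor. The Python sketch below recovers the coefficient implied by the 1.142 odds ratio and shows how the multiplier compounds; the function name and the 5-unit example are illustrative assumptions.

```python
import math

odds_ratio_per_unit = 1.142                  # one unit = 1,000 dollars of salary
coefficient = math.log(odds_ratio_per_unit)  # implied logistic regression slope

def odds_multiplier(salary_increase_in_thousands):
    """Factor by which the odds change for a given salary increase."""
    return math.exp(coefficient * salary_increase_in_thousands)

print(odds_multiplier(1))  # about 1.142 for a 1,000 dollar increase
print(odds_multiplier(5))  # 1.142 ** 5 for a 5,000 dollar increase
```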
model purchase = gender age region; model purchase = gender | age | region @1; B gives all interactions, up to three-way. D gives all interactions up to two-way.
Select the equivalent LOGISTIC procedure model statements (select two). A. model purchase = gender age region; B. model purchase = gender | age | region; C. model purchase = gender | age | region @1; D. model purchase = gender | age | region @2;
the equal variance assumption When a residuals plot displays a funnel shape, it indicates that the variance of the residuals is not constant. That is, the variance increases toward the wide end of the "funnel." This shows you that your model violates the equal variance assumption.
Suppose you have a residuals plot that shows a funnel shape for the residuals, such as in the plot below. Which assumption of linear regression is being violated? a. the linearity assumption b. the independence assumption c. both the linearity assumption and the independence assumption d. the equal variance assumption e. both the linearity assumption and the equal variance assumption
Larger
The SSM and SSE represent pieces of the total variability. If the SSM is _________than the SSE, you reject the null hypothesis that all of the group means are equal. A. Smaller B. Equal To C. Larger
False
The STEPWISE, BACKWARD, and FORWARD strategies result in the same final model if the same significance levels are used in all three. True False
Experimentwise, EER
The ________________ error rate, or _________, is the probability of making at least one Type I error when performing all of the pairwise comparisons. The _____________ increases as the number of pairwise comparisons increases. A. Experimentwise, EER B. Comparisonwise, CER
Constant variance, because the interquartile ranges are different in different ad campaigns. The interquartile range is closely related to variance, not normality. And the p-value is for testing equality of means, not for testing normality or constant variance.
The box plot was used to analyze daily sales data following three different ad campaigns. The business analyst concludes that one of the assumptions of ANOVA was violated...Which assumption has been violated and why? A. Normality, because Prob >F<.0001. B. Normality, because the interquartile ranges are different in different ad campaigns. C. Constant variance, because Prob>F<.0001. D. Constant variance, because the interquartile ranges are different in different ad campaigns.
Comparisonwise, CER
The chance that you make a Type I error increases each time you conduct a statistical test. The _______________ error rate, or ________, is the probability of a Type I error on a single pairwise test. A. Experimentwise, EER B. Comparisonwise, CER
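How quickly the experimentwise rate grows can be sketched with the common approximation EER = 1 - (1 - alpha)^c for c comparisons. This Python sketch assumes the comparisons are independent, which pairwise tests on shared groups are not, so treat the result as an upper-bound approximation rather than an exact rate.

```python
from math import comb

def experimentwise_error_rate(num_groups, alpha=0.05):
    """Number of pairwise comparisons and the approximate experimentwise error rate."""
    num_comparisons = comb(num_groups, 2)  # all possible pairs of groups
    eer = 1 - (1 - alpha) ** num_comparisons
    return num_comparisons, eer

for groups in (2, 3, 5):
    comparisons, eer = experimentwise_error_rate(groups)
    print(groups, comparisons, round(eer, 4))
```

With a single comparison the two error rates coincide at alpha; with five groups there are ten comparisons and the approximate EER is about 0.40.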
None of the above.
The correlation between tuition and rate of graduation at U.S. colleges is 0.55. What does this mean? a. The way to increase graduation rates at your college is to raise tuition. b. Increasing graduation rates is expensive, causing tuition to rise. c. Students who are richer tend to graduate more often than poorer students. d. None of the above.
the predicted value of the response when all predictors=0 The algebraic meaning of an intercept is the value of the response when all predictors=0.
The intercept estimate is interpreted as: A. the predicted value of the response when all the predictors are at their current values. B. the predicted value of the response when all predictors are at their means. C. the predicted value of the response when all predictors=0
Linear
Variables are correlated if there is a positive or negative __________________ relationship. A. Linear B. Logistic
a two-sided t-test Because the cereal manufacturer is interested in determining whether the two processes produce a different mean cereal weight, he needs to perform a two-sided t-test.
The manufacturer for a cereal company uses two different processes to package boxes of cereal. He wants to be sure the two processes are putting the same amount of cereal in each box. He plans to perform a two-sample t-test to determine whether the mean weight of cereal is significantly different between the two processes. What type of test should he run? A. an upper-tailed t-test B. a two-sided t-test C. a lower-tailed t-test
The errors are independent, normally distributed with zero mean and constant variance. Assumptions on errors are independence, normality, zero mean and constant variance.
The standard form of a linear regression model is: Y=B0+B1X+E Which statement best summarizes the assumptions placed on the errors? A. The errors are correlated, normally distributed with constant mean and zero variance. B. The errors are correlated, normally distributed with zero mean and constant variance. C. The errors are independent, normally distributed with zero mean and constant variance.
TTEST
To perform the two-sample t-test and the one-sided test, you can use PROC ________. You add the PLOTS option to the PROC _________ statement to control the plots that ODS produces. You add the SIDES=U or SIDES=L option to specify an upper or lower one-sided test. A. GLM B. GLMSELECT C. TTEST
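The cereal scenario and the SIDES= option can be sketched together. The data set work.cereal and the variables Process and Weight are hypothetical names chosen for illustration.

```sas
/* Two-sided test (the default, SIDES=2): do the two process means differ? */
proc ttest data=work.cereal plots(only)=qq;
   class Process;     /* two-level grouping variable */
   var Weight;        /* cereal weight per box */
run;

/* Upper-tailed version. Which group counts as "upper" depends on
   the ordering of the CLASS levels, so check the output header. */
proc ttest data=work.cereal plots(only)=qq sides=u;
   class Process;
   var Weight;
run;
```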
-1.0414 With effect coding, the level 'L' (large) has both design variables set to -1, so the estimated logit is the intercept minus each of the other estimates: -0.1310 - 0.2130 - 0.6974 = -1.0414.
Using effect coding, what is the estimated logit for a person with large wrist size? A. -0.1310 B. -1.0414 C. 0.5664 D. 0.7794
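The arithmetic can be written out as follows. The card does not say which level each non-intercept estimate belongs to, so the subscripts 1 and 2 are placeholders. The last line converts the logit to a probability, which the card does not ask for but which makes the result easier to read.

```latex
\begin{aligned}
\operatorname{logit}(\hat{p}_{L})
  &= \hat\beta_0 + (-1)\,\hat\beta_1 + (-1)\,\hat\beta_2 \\
  &= -0.1310 - 0.2130 - 0.6974 = -1.0414 \\
\hat{p}_{L} &= \frac{1}{1 + e^{1.0414}} \approx 0.26
\end{aligned}
```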
Interaction Plots
What does plots(only)=intplot produce within a proc glm step? A. Integer Plots B. Interval Plots C. Interaction Plots
The model sum of squares divided by the model degrees of freedom, that is, the average variability explained by the predictor variable(s).
What does the Model Mean Square value of 633.01458 indicate? A. Strength of the model B. The model sum of squares divided by the model degrees of freedom, that is, the average variability explained by the predictor variable(s).
The proportion of total sum of squares accounted for by the model. Something to do with variability.
What does the R-Square value measure? a. The correlation between the independent and dependent variables. b. The proportion of total sum of squares accounted for by the model. c. Model sum of squares over error sum of squares. d. Something to do with variability.
Square Root of the Mean Square Error (MSE) in the ANOVA table. The Root MSE estimates the standard deviation of the response variable (oxygen consumption) at each value of the predictor variable (runtime).
What does the Root MSE calculate? A. Square Root of the Mean Square Error (MSE) in the ANOVA table. B. Root percentage of variance in the ANOVA table.
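The last three cards all read values from the same ANOVA table. Using the usual abbreviations (SSM, SSE, SST for the model, error, and total sums of squares, and df for degrees of freedom), the quantities relate as follows:

```latex
\begin{aligned}
\mathrm{MSM} &= \frac{\mathrm{SSM}}{df_{M}}, &
\mathrm{MSE} &= \frac{\mathrm{SSE}}{df_{E}}, &
F &= \frac{\mathrm{MSM}}{\mathrm{MSE}} \\[4pt]
R^{2} &= \frac{\mathrm{SSM}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, &
\text{Root MSE} &= \sqrt{\mathrm{MSE}}
\end{aligned}
```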
Sums of Squares from fitting each effect after all the other terms in the model. SS of effects corrected after the initial terms have been fitted. (Adjusted)
What does the Type III SS represent? A. Sums of Squares you obtain from fitting the effects in the order listed within the model coding. B. Sums of Squares from fitting each effect after all the other terms in the model. SS of effects corrected after the initial terms have been fitted. (Adjusted)
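Both kinds of sums of squares can be requested side by side to compare them. This is a minimal sketch with a hypothetical data set work.study, class variables A and B, and response Y.

```sas
/* Hypothetical names; SS1 and SS3 request Type I and Type III tables */
proc glm data=work.study;
   class A B;
   model Y = A B A*B / ss1 ss3;
run;
quit;
```

With balanced data the two tables agree. When cell counts are unequal, the Type I values depend on the order of effects in the MODEL statement, while the Type III values do not.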
The number of levels or distinct classifications (Ex, Fa, Gd, TA) of the predictor variable (Heating QC). The predictor variable Heating QC has four levels (Excellent, Fair, Good, Typical/Average). The degrees of freedom for a categorical predictor equal the number of levels minus one, so the four levels (one reference level plus three others) give 3 DF.
What do the degrees of freedom (DF) represent in the image below? A. The number of levels or distinct classifications (Ex, Fa, Gd, TA) of the predictor variable (Heating QC). B. The number of degrees multiplied by the Season Sold factor.
AICC uses the penalty term n(n + p) / (n - p - 2). The model with the smaller information criterion is considered to be better.
What does the AICC selection method produce? A. It produces the corrected Akaike information criterion, which favors the most parsimonious model (the model that minimizes unexplained variability with as few effects (predictor variables) as possible). It does this by adding a penalty that represents the complexity of the model. B. It produces the corrected Akaike information criterion, which favors the largest model (the model that maximizes unexplained variability with as few effects (predictor variables) as possible). It does this by adding a penalty that represents the complexity of the model.
BIC uses the penalty term 2(p + 2)q - 2q^2. The model with the smaller information criterion is considered to be better.
What does the BIC selection method produce? A. It produces the Sawa Bayesian information criterion, which favors the most parsimonious model (the model that minimizes unexplained variability with as few effects (predictor variables) as possible). It does this by adding a penalty that represents the complexity of the model. B. It produces the Sawa Bayesian information criterion, which favors the largest model (the model that maximizes unexplained variability with as few effects (predictor variables) as possible). It does this by adding a penalty that represents the complexity of the model.
SBC uses the penalty term p log(n). The model with the smaller information criterion is considered to be better.
What does the SBC selection method produce? A. It produces the Schwarz Bayesian information criterion, which favors the most parsimonious model (the model that minimizes unexplained variability with as few effects (predictor variables) as possible). It does this by adding a penalty that represents the complexity of the model. B. It produces the Schwarz Bayesian information criterion, which favors the largest model (the model that maximizes unexplained variability with as few effects (predictor variables) as possible). It does this by adding a penalty that represents the complexity of the model.
Only cases with variables that are fully populated are used. PROC LOGISTIC ignores observations with missing data.
What is the default method in the LOGISTIC procedure to handle observations with missing data? A. Missing data is imputed B. Parameters are estimated accounting for the missing values. C. Parameter estimates are made on all available data. D. Only cases with variables that are fully populated are used.
H0: u=u0 H0: u-u0=0
What is the null hypothesis for a one-sample TTEST? A. H0: u=u0 B. H0: u0=0 C. H0: u-u0=0 D. H0: u0-0=0
Between Group Variation
When calculating Sums of Squares, what does SSM produce? A. Between Group Variation B. Within Group Means C. Total Variation
Total Variation
When calculating Sums of Squares, what does SST produce? A. Between Group Variation B. Within Group Means C. Total Variation
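For a one-way ANOVA, the pieces named in the last two cards add up as shown below. Here y_ij is observation j in group i, n_i is the size of group i, ȳ_i is the group mean, and ȳ is the overall mean; SSE is the within-group (error) sum of squares.

```latex
\underbrace{\sum_{i}\sum_{j} \left(y_{ij} - \bar{y}\right)^{2}}_{\mathrm{SST}\ (\text{total})}
=
\underbrace{\sum_{i} n_{i}\left(\bar{y}_{i} - \bar{y}\right)^{2}}_{\mathrm{SSM}\ (\text{between groups})}
+
\underbrace{\sum_{i}\sum_{j} \left(y_{ij} - \bar{y}_{i}\right)^{2}}_{\mathrm{SSE}\ (\text{within groups})}
```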
No, because they are involved in the significant interaction effect.
When the p-value for the interaction in the Type III SS table is significant, do you remove the predictor variables whose own p-values are not significant? A. Yes, as they will cloud the interaction values. B. No, because they are involved in the significant interaction effect.
The simplest model with the best performance on the validation data
When using honest assessment, which of the following would be considered the best model? a. The simplest model with the best performance on the training data b. The simplest model with the best performance on the validation data c. The most complex model with the best performance on the training data d. The most complex model with the best performance on the validation data
model Health=Drug Disease Drug*Disease; In the MODEL statement, you first specify the main effect variables as they exist in the two-way ANOVA model. You then define the interaction term by separating the two main effect variables with an asterisk in the MODEL statement.
When you perform a two-way ANOVA in SAS, which of the following statements correctly defines the model that includes the interaction between the two main effect variables? a. class Drug*Disease; b. class Drug=Disease; c. model Drug*Disease; d. model Health=Drug Disease Drug*Disease;
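Placed in a full step, the MODEL statement from this card might look like the sketch below. The data set name work.trial is hypothetical; the variable names come from the card. PLOTS(ONLY)=INTPLOT limits the graphics to the interaction plot, and the SLICE= option is one common way to follow up a significant interaction.

```sas
/* work.trial is a hypothetical data set name */
proc glm data=work.trial plots(only)=intplot;
   class Drug Disease;                       /* both main effects are categorical */
   model Health = Drug Disease Drug*Disease; /* main effects plus interaction */
   lsmeans Drug*Disease / slice=Disease;     /* test Drug within each Disease level */
run;
quit;
```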
proc GLMSELECT data=SASUSER.MLR; model y=x1-x10 /selection=backward select=bic; run; The backward elimination method is requested with the SELECTION= option. The choice of BIC as the criterion is conveyed using SELECT=.
Which SAS program will correctly use backward elimination with BIC selection criteria within the GLMSELECT procedure? A. proc GLMSELECT data=SASUSER.MLR; model y=x1-x10 /selection=backward select=bic; run; B. proc GLMSELECT data=SASUSER.MLR; model y=x1-x10 /selection=backward choose=bic; run; C. proc GLMSELECT data=SASUSER.MLR; model y=x1-x10 /select=backward selection=bic; run;
Records with the Number in Model value of 4 because they satisfy Mallows' CP criterion and contain the fewest parameter estimates.
Which of the first 5 records would make the best candidate model(s) with the fewest parameters when evaluating Mallows' CP? A. Records with the Number in Model value of 5 because they include the most parameter estimates, which would make the strongest model candidate. B. Records with the Number in Model value of 4 because they satisfy Mallows' CP criterion and contain the fewest parameter estimates.
None of the above
Which of the following affects alpha? A. The p-value of the test B. The sample size C. The number of Type I errors. D. All of the above E. Answers A and B only F. None of the above
None of the above
Which of the following assumptions does collinearity violate? a. Independent errors b. Constant variance c. Normally distributed errors d. None of the above
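Collinearity does not break a model assumption, but it inflates the standard errors of the estimates, so it is still worth checking. One way to do that is the VIF option on the MODEL statement in PROC REG. The sketch below borrows the data set and variables that appear in another card of this deck; a VIF above 10 is a common rule of thumb for concern, not a hard limit.

```sas
/* VIF requests variance inflation factors in the parameter estimates table */
proc reg data=statdata.bodyfat2 plots=none;
   model PctBodyFat2 = Abdomen Weight Wrist Forearm / vif;
run;
quit;
```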
STUDENT residuals You can use STUDENT residuals to detect outliers. To detect influential observations, you can use RSTUDENT residuals and the DFFITS and Cook's D statistics.
Which of the following can you use to detect outliers? a. DFFITS statistics b. Cook's D statistics c. STUDENT residuals d. RSTUDENT residuals
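The statistics named in this card can all be written to a data set with the OUTPUT statement in PROC REG. The sketch below reuses the body fat data set and model from another card in this deck; the output data set names and the cutoff of 3 for STUDENT residuals are assumptions made for illustration.

```sas
proc reg data=statdata.bodyfat2 plots=none;
   model PctBodyFat2 = Abdomen Weight Wrist Forearm;
   id Case;
   /* keyword=name pairs save each diagnostic as a new variable */
   output out=work.diag student=Student rstudent=RStudent
          cookd=CooksD dffits=DFFits;
run;
quit;

/* Keep only observations whose STUDENT residual looks like an outlier.
   The cutoff of 3 is a rule of thumb, not a fixed standard. */
data work.outliers;
   set work.diag;
   if abs(Student) > 3;
run;
```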
all of the above All of these statements are available for use within PROC PLM for postprocessing. Recall that this postprocessing will be performed using the item store.
Which of the following is available for use in postprocessing within PROC PLM? a. LSMEANS b. LSMESTIMATE c. SLICE d. all of the above
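As a rough illustration of the item-store workflow, the sketch below fits a two-way model, saves it with the STORE statement, and then runs postprocessing in PROC PLM. The data set work.trial and the item store work.trialstore are hypothetical names.

```sas
/* Fit once and save the fitted model to an item store */
proc glm data=work.trial plots=none;
   class Drug Disease;
   model Health = Drug Disease Drug*Disease;
   store out=work.trialstore;
run;
quit;

/* Postprocess later without refitting the model */
proc plm restore=work.trialstore;
   slice Drug*Disease / sliceby=Disease diff;  /* compare drugs within each disease */
   lsmeans Drug / diff adjust=tukey;           /* Tukey-adjusted pairwise differences */
run;
```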
The observations are dependent. In an ANOVA model, you assume that the errors are normally distributed for each treatment, the errors have equal variances across treatments, and the observations are independent. When you add a blocking factor to your ANOVA model, you also assume that the treatments are randomly assigned within each block and that the effects of the treatment are the same within each block.
Which of the following is not an assumption you make when including a blocking factor in an ANOVA randomized block design? a. The treatments are randomly assigned within each block. b. The errors are normally distributed. c. The effects of the treatment factor are constant across the levels of the blocking variable. d. The observations are dependent.
ods output RSTUDENTBYPREDICTED=Rstud COOKSDPLOT=Cook DFFITSPLOT=Dffits DFBETASPANEL=Dfbs; proc reg data=statdata.bodyfat2 plots(only)= (RSTUDENTBYPREDICTED(label) COOKSD(label) DFFITS(label) DFBETAS(label)); PREDICT: model PctBodyFat2 = Abdomen Weight Wrist Forearm; id Case PctBodyFat2 Abdomen Weight Wrist Forearm; title; run; quit; Program b is almost correct, but the images must be created for the data sets to be saved. Program c tells SAS to create the images and save them into their own data sets.
Which program correctly saves statistics from the OutputStatistics ODS output object to an output data set? a. proc reg data=statdata.bodyfat2; PREDICT: model PctBodyFat2 = Abdomen Weight Wrist Forearm / r influence; id Case PctBodyFat2 Abdomen Weight Wrist Forearm; run; quit; b. ods output RSTUDENTBYPREDICTED=Rstud COOKSDPLOT=Cook DFFITSPLOT=Dffits DFBETASPANEL=Dfbs; proc reg data=statdata.bodyfat2 plots=none; PREDICT: model PctBodyFat2 = Abdomen Weight Wrist Forearm; id Case PctBodyFat2 Abdomen Weight Wrist Forearm; title; run; quit; c. ods output RSTUDENTBYPREDICTED=Rstud COOKSDPLOT=Cook DFFITSPLOT=Dffits DFBETASPANEL=Dfbs; proc reg data=statdata.bodyfat2 plots(only)= (RSTUDENTBYPREDICTED(label) COOKSD(label) DFFITS(label) DFBETAS(label)); PREDICT: model PctBodyFat2 = Abdomen Weight Wrist Forearm; id Case PctBodyFat2 Abdomen Weight Wrist Forearm; title; run; quit; d. ods output outputstatistics; proc reg data=statdata.bodyfat2 plots(only)= (RSTUDENTBYPREDICTED(LABEL) COOKSD(LABEL) DFFITS(LABEL) DFBETAS(LABEL)); PREDICT: model PctBodyFat2 = Abdomen Weight Wrist Forearm; id Case PctBodyFat2 Abdomen Weight Wrist Forearm; run; quit;
SBC
Which selection method uses this formula? p log(n) A. AIC B. AICC C. BIC D. SBC
BIC
Which selection method uses this formula? 2(p + 2)q - 2q^2 A. AIC B. AICC C. BIC D. SBC
AIC
Which selection method uses this formula? 2p + n + 2 A. AIC B. AICC C. BIC D. SBC
AICC
Which selection method uses this formula? n(n + p) / (n - p - 2) A. AIC B. AICC C. BIC D. SBC
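The four cards above quote only the penalty term of each criterion. For reference, the full forms as I understand them from the PROC GLMSELECT documentation are below, where n is the number of observations, p is the number of parameters including the intercept, SSE is the error sum of squares, and σ̂² is the error variance estimate from the full model. Verify against the documentation for your SAS release before relying on them.

```latex
\begin{aligned}
\mathrm{AIC}  &= n \ln\!\left(\tfrac{\mathrm{SSE}}{n}\right) + 2p + n + 2 \\
\mathrm{AICC} &= n \ln\!\left(\tfrac{\mathrm{SSE}}{n}\right) + \frac{n(n + p)}{n - p - 2} \\
\mathrm{BIC}  &= n \ln\!\left(\tfrac{\mathrm{SSE}}{n}\right) + 2(p + 2)q - 2q^{2},
  \qquad q = \frac{n\hat{\sigma}^{2}}{\mathrm{SSE}} \\
\mathrm{SBC}  &= n \ln\!\left(\tfrac{\mathrm{SSE}}{n}\right) + p \ln(n)
\end{aligned}
```

All four share the same fit term, so they differ only in how heavily they penalize extra parameters.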
Gender should not be removed due to its involvement in the significant interaction. Gender cannot be removed due to its involvement in the significant interaction.
Which statement is correct at an alpha level of 0.05? A. School*Gender should be removed because it is non-significant. B. Gender should not be removed because it is non-significant. C. School should be removed because it is significant. D. Gender should not be removed due to its involvement in the significant interaction.
F test in the ANOVA table.
Which statistic(s) is/are used to test the null hypothesis that all regression slopes are zero, against the alternative hypothesis that they are not all zero? a. F test in the ANOVA table. b. F test in the Regression table. c. The Global t-test in the parameter estimates table. d. R square e. Adjusted R square
R square
Which value tends to increase (can never decrease) as you add predictor variables to your regression model? a. R square b. Adjusted R square c. Mallows' Cp d. Both a and b e. F statistic f. All of the above
You use the F-test for equality of variances to evaluate the assumption of equal variances in the two populations. You calculate the F statistic, which is the ratio of the maximum sample variance of the two groups to the minimum sample variance of the two groups. If the p-value of the F-test is greater than your alpha, you fail to reject the null hypothesis and can proceed as if the variances are equal between the groups. If the p-value of the F-test is less than your alpha, you reject the null hypothesis and can proceed as if the variances are not equal.
You use the _-test for equality of variances to evaluate the assumption of equal variances in the two populations. A. F B. T
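The statistic described in this card can be written as below. The sample variances in the numeric line are made-up values used only to show the arithmetic.

```latex
F = \frac{\max\left(s_{1}^{2},\, s_{2}^{2}\right)}{\min\left(s_{1}^{2},\, s_{2}^{2}\right)},
\qquad \text{e.g. } s_{1}^{2} = 4.0,\ s_{2}^{2} = 2.5
\;\Longrightarrow\; F = \frac{4.0}{2.5} = 1.6
```

Because the larger variance always goes in the numerator, F is never below 1.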
7 The number of all possible subset models that can be created using k predictor variables is 2^k. One of those models is the intercept-only model (the null model), and SAS does not assess that model when you use the option SELECTION=ADJRSQ in PROC REG. Therefore, the total number of subset models assessed is 2^k - 1. In this case, where k = 3, 2^3 - 1 = 7.
proc reg data=sashelp.fish; model weight=length1 height width / selection=adjrsq; run; How many possible subset models will be assessed by SAS? A. 3 B. 4 C. 6 D. 7
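The count of 7 can also be checked by grouping the candidate models by size. With the three predictors length1, height, and width there are three one-variable models, three two-variable models, and one three-variable model:

```latex
\sum_{j=1}^{3} \binom{3}{j} = \binom{3}{1} + \binom{3}{2} + \binom{3}{3} = 3 + 3 + 1 = 7 = 2^{3} - 1
```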