SAS
A best practice in a two-way ANOVA is to plot the data to identify possible interactions between the variables. Which statement is true when you consider an interaction plot?
An interaction occurs when the difference between group means of one variable changes at different levels of another variable. This causes non-parallel lines in the interaction plot.
When you perform a two-way ANOVA in SAS, which of the following statements correctly defines the model that includes the interaction between the two main effect variables?
model Health=Drug Disease Drug*Disease;
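The MODEL statement above can be tried in a complete step. This is a minimal sketch, assuming a small made-up data set (work.trial and its values are hypothetical, not from the course data). The PLOTS(ONLY)=INTPLOT request asks PROC GLM for the interaction plot, where non-parallel lines suggest an interaction.

```sas
/* Hypothetical data: Health score by Drug (A, B) and Disease (X, Y). */
data work.trial;
  input Drug $ Disease $ Health @@;
  datalines;
A X 12 A X 14 A X 13 A Y 20 A Y 22 A Y 21
B X 15 B X 16 B X 14 B Y 11 B Y 10 B Y 12
;
run;

ods graphics on;

/* Two main effects plus their interaction, as in the answer above. */
proc glm data=work.trial plots(only)=intplot;
  class Drug Disease;
  model Health = Drug Disease Drug*Disease;
run;
quit;
```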
Given the following output, is there sufficient evidence to reject the assumption of equal variances?
Levene's Test for Homogeneity of Weight Variance
ANOVA of Squared Deviations from Group Means
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Brand     1   9.237E-7         9.237E-7      1.12      0.2942
Error    78   0.000065         8.283E-7
no
If you want to compare the average monthly spending for teenagers, adults, and senior citizens, which statistical method should you choose?
one-way ANOVA
Which of the following PROC GLMSELECT steps splits the original data set into a training data set that contains 80% of the original data and a validation data set that contains 20% of the original data?
proc glmselect data=housing; class fireplace lot_shape; model Sale_price = fireplace lot_shape; partition fraction(test=0 validate=.20); run;
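The PARTITION statement can be exercised on a data set that ships with SAS. This is a sketch under stated assumptions: SASHELP.CARS stands in for the housing data, and the predictors, the seed value, and the SELECT=/CHOOSE= settings are illustrative choices. Only VALIDATE= is given, so the remaining 80% of observations are used for training.

```sas
/* Assumption: SASHELP.CARS replaces the housing data set. */
/* SEED= makes the random 80/20 split reproducible.        */
proc glmselect data=sashelp.cars seed=8675309;
  class Type DriveTrain;
  model MSRP = Type DriveTrain Horsepower Weight
        / selection=stepwise select=sbc choose=validate;
  /* 20% validation; the rest is training. */
  partition fraction(validate=0.20);
run;
```

With CHOOSE=VALIDATE, the final model is the candidate with the smallest validation average squared error.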
Suppose you ran a PROC GLMSELECT step that saved the context and results of the statistical analysis in an item store named homestore. Which of the following programs scores new observations in a data set named new and saves the predictions in a data set named new_out?
proc plm restore=homestore; score data=new out=new_out; run;
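The store-then-score workflow can be sketched end to end. The names work.carstore, work.new, and work.new_out are hypothetical, the model on SASHELP.CARS is only a stand-in for the one in the question, and the two new observations are made up.

```sas
/* Fit a model and save its context in an item store. */
proc glmselect data=sashelp.cars;
  model MSRP = Horsepower Weight EngineSize / selection=stepwise;
  store out=work.carstore;
run;

/* Hypothetical new observations to score. */
data work.new;
  input Horsepower Weight EngineSize;
  datalines;
200 3500 3.0
150 2800 2.0
;
run;

/* Restore the item store and write predictions to a new data set. */
proc plm restore=work.carstore;
  score data=work.new out=work.new_out;
run;
```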
With a large enough data set, observations can be divided into three subset data sets for use in honest assessment. Which of the following is not the name of one of these three subset data sets?
score
In a PROC FREQ step, which of the following statements creates a frequency table for Country, a frequency table for Size, and a crosstabulation table for Country by Size?
tables Country Size Country*Size;
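The same TABLES pattern can be run on sample data. As an assumption, Origin and Type from SASHELP.CARS stand in for Country and Size here.

```sas
/* Two one-way frequency tables and one crosstabulation. */
proc freq data=sashelp.cars;
  tables Origin Type Origin*Type;
run;
```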
Suppose you're testing for an association between student ratings of teachers and student grades. The Rating variable has the values 1 (for poor), 2 (for fair), 3 (for good), and 4 (for excellent). The Grade variable has the values A, B, C, D, and F.
tables Rating*Grade / chisq measures;
You can examine Levene's test for homogeneity to more formally test which of the following assumptions?
the assumption of equal variances
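Levene's test can be requested with the HOVTEST= option in the MEANS statement of PROC GLM. This is a minimal sketch, assuming SASHELP.CARS as stand-in data, with MPG_City as the response and Origin as the grouping variable. HOVTEST= applies to one-way models only.

```sas
/* A large p-value in Levene's test gives no evidence  */
/* against the equal-variance assumption.              */
proc glm data=sashelp.cars;
  class Origin;
  model MPG_City = Origin;
  means Origin / hovtest=levene;
run;
quit;
```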
Suppose you have a residuals plot that shows a funnel shape for the residuals. Which assumption of linear regression is being violated?
the equal variance assumption
The location and spread of a normal distribution depend on the value of which two parameters?
the mean (μ) and the standard deviation (σ)
How do you define the term power?
the measure of the ability of the statistical hypothesis test to reject the null hypothesis when it is actually false
Honest assessment might generate multiple candidate models that have the same (or nearly the same) validation assessment values. In this situation, which model should be selected?
the most parsimonious model
In general, you can say that a model fits the data well when the values of which of the following are higher?
the percentage of concordant pairs
In the simple linear regression model, what does β1 represent?
the slope parameter
Which of the following does PROC GLMSELECT use to select a model from the candidate models when a validation data set is provided?
the smallest overall validation average squared error
Which of the following phrases describes the model sums of squares, or SSM, in one-way ANOVA?
the variability between the groups
Which of the following statements describes a positive linear relationship between two variables? (1) The more I eat, the less I want to exercise. (2) The more salty snacks I eat, the more water I want to drink. (3) No matter how much I exercise, I still weigh the same.
2 only
Suppose you're analyzing the relationship between hot dog ingredients and taste. Which of the following statistics provides evidence of a relatively strong association between the variables Type (which has the values Beef, Meat, and Poultry) and Taste (which has the values Bad and Good)?
A Cramer's V statistic that is close to 1
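Cramer's V appears in the output of the CHISQ option in PROC FREQ. The sketch below uses made-up cell counts (work.hotdogs is hypothetical) only to show where the statistic comes from.

```sas
/* Hypothetical cell counts for Type by Taste. */
data work.hotdogs;
  input Type $ Taste $ Count;
  datalines;
Beef Bad 6
Beef Good 14
Meat Bad 9
Meat Good 8
Poultry Bad 12
Poultry Good 5
;
run;

/* CHISQ prints the chi-square tests along with Cramer's V. */
proc freq data=work.hotdogs;
  weight Count;
  tables Type*Taste / chisq;
run;
```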
Which PROC TTEST option would you use to change the confidence level in a confidence interval plot?
ALPHA=
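As an illustration, ALPHA=0.10 requests 90% confidence limits instead of the default 95%. SASHELP.CLASS and the Height variable are stand-ins chosen for this sketch.

```sas
ods graphics on;

/* 90% confidence interval plot for the mean of Height. */
proc ttest data=sashelp.class alpha=0.10 plots(only)=interval;
  var Height;
run;
```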
Which statement about the backward elimination method is false?
All main effects and interactions that remain in the final model must be significant.
Which of the following statements is true about information criteria such as AIC, AICC, BIC, and SBC?
All of the above
In predictive modeling, your goal is to create the best possible model to score new data. The model you choose should be which of the following?
It should be flexible enough to fit the training data well but generalize to new data sets.
According to the goodness-of-fit statistics shown in the table below, which multiple logistic regression model would be the best to use?
Statistic   Model 1   Model 2   Model 3
AIC         501.5     520.4     501.5
SC          501.5     520.4     501.5
c           0.675     0.675     0.655
Model 1
You want to use PROC PLM to analyze the item store named mystore which was created in the stat1 library. Which of the following statements uses the correct syntax?
PROC PLM RESTORE=stat1.mystore;
When you add predictor variables to a model, which of the following values tend to increase or stay the same (and can never decrease)?
R-square
A multiple regression analysis shows that at least one slope in the population regression is not 0, and at least one predictor variable explains a significant amount of variability in the response variable. What should you do?
Reject the null hypothesis.
Given the information in this summary of variable selection, which stepwise selection method was specified in the PROC REG step?
Step   Variable Entered   Variable Removed   Number Vars In   Partial R-Square   Model R-Square   C(p)     F Value   Pr > F
1      RunTime                               1                0.7434             0.7434           3.3432   84.00     <.0001
1      Age                                   2                0.0213             0.7647           2.8192   2.54      <.1222
STEPWISE
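A summary table with both "Variable Entered" and "Variable Removed" columns comes from a step like the one sketched here. SASHELP.CLASS and its variables are assumptions, since the fitness data from the question is not available. SLENTRY= and SLSTAY= are written out at 0.15, which is the default for this method.

```sas
/* Stepwise selection: variables can enter and later be removed. */
proc reg data=sashelp.class;
  model Weight = Height Age
        / selection=stepwise slentry=0.15 slstay=0.15;
run;
quit;
```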
Which of the following can you use to detect outliers?
STUDENT residuals
Examine this plot of RSTUDENT residuals versus predicted values of PctBodyFat2. What does it indicate?
Several observations exceed the cutoff values, so these observations might be influential.
Given the following PROC REG output and assuming a significance level of 0.05, which of the following statements is true? Select all that apply.
Analysis of Variance
Source            DF    Sum of Squares   Mean Square   F Value   Pr > F
Model               1   119.72668        119.72668     2.00      0.1585
Error             250   14959            59.83716
Corrected Total   251   15079
Root MSE         7.73545    R-Square   0.0079
Dependent Mean   18.93849   Adj R-Sq   0.0040
Coeff Var        40.84511
Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1   32.16542             9.36350          3.44      0.0007
Height       1   -0.18856             0.13330          -1.41     0.1585
The model explains less than 1% of the variation in the response variable.
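The "less than 1%" reading follows from R-square, which is the model sum of squares divided by the corrected total sum of squares. A short DATA step can redo that arithmetic with the two values from the output above. The variable names are illustrative.

```sas
/* R-square = SS(Model) / SS(Corrected Total), values from the output. */
data _null_;
  ss_model = 119.72668;
  ss_total = 15079;
  r_square = ss_model / ss_total;
  put r_square= 6.4;   /* written to the SAS log */
run;
```

The result should agree with the R-Square of 0.0079 reported in the output, that is, under 1% of the variation.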
Select the statement below that incorrectly interprets a 95% confidence interval (15.02, 15.04) for the population mean, if the sample mean is 15.03 ounces.
The probability is .95 that the true average weight is between 15.02 and 15.04 ounces.
Which statement about binary logistic regression is false?
The response variable can have more than two levels if one of the levels is coded as 0.
This table shows frequency statistics for the variables Country and Size from a data set that contains data about people and the cars that they drive. What evidence in the table indicates a possible association?
The row percentages indicate that the distribution of size changes when the value of country changes.
Which of the following is not an assumption in a two-way ANOVA?
The sample is large.
A bank manager is concerned that the percent of loans processed that contain errors has increased above the acceptable amount of 1%. A significance test is conducted to test his concern (H0: loan error rate<=0.01, Ha: loan error rate>0.01). The manager concludes that the rate is indeed above 1%, when in reality it is not. What type of error did the manager make?
Type I
Consider a table of individual effects. Which statistic adjusts for all other effects in the table?
Type III sum of squares
Which of the following statements about scoring is true?
When you score data, you apply the score code (the equations obtained from the final model) to the scoring data.
Which of the following statements is true about the SEED= option in PROC GLMSELECT?
You can reproduce your results if you specify an integer that is greater than zero in the SEED= option and then rerun the code using the same SEED= value.
Based on the following correlation matrix, what type of relationship do Performance and RunTime have?
Pearson Correlation Coefficients, N = 31
Prob > |r| under H0: Rho=0 (shown in parentheses)
              Performance         RunTime             Age
Performance   1.00000             -0.82049 (<.0001)   -0.71257 (<.0001)
RunTime       -0.82049 (<.0001)   1.00000             0.19523 (0.2926)
Age           -0.71257 (<.0001)   0.19523 (0.2926)    1.00000
a fairly strong, negative linear relationship
What output does the following program produce? proc corr data=stat1.bodyfat2 nosimple plots(only)=scatter(nvar=all); var Age Weight Height; run;
a table of correlations and individual scatter plots for each variable in the VAR statement
Which of these programs requests diagnostic statistics as well as diagnostic plots?
a. proc reg data=stat1.bodyfat2 plots(only)=(RSTUDENTBYPREDICTED(LABEL) COOKSD(LABEL) DFFITS(LABEL) DFBETAS(LABEL)); PREDICT: model PctBodyFat2 = Abdomen Weight Wrist Forearm / vif; id Case; run; quit;
When you interpret p-values from models that are chosen using any automated variable selection technique, which of the following should you be cautious about?
all of the above
Which of the following is suggested for developing good regression models?
all of the above
What is an influential observation?
an observation so far away from the rest of the data that it influences the slope of the regression line
Which of the following is not a characteristic of predictive modeling?
answers the question "How is X related to Y?"
Predictive models can be based on which of the following?
both parametric and non-parametric models
Which program correctly saves statistics from the output statistics ODS output object to an output data set?
c. ods output RSTUDENTBYPREDICTED=Rstud COOKSDPLOT=Cook DFFITSPLOT=Dffits DFBETASPANEL=Dfbs; proc reg data=stat1.bodyfat2 plots=(RSTUDENTBYPREDICTED(label) COOKSD(label) DFFITS(label) DFBETAS(label)); PREDICT: model PctBodyFat2 = Abdomen Weight Wrist Forearm; id Case PctBodyFat2 Abdomen Weight Wrist Forearm; title; run; quit;
If you're trying to understand the relationship between age, weight, and running time, what is your goal?
explanatory analysis
Collinearity decreases the variance of the parameter estimates, and increases the prediction error of the model.
false
Dunnett's method compares all possible pairs of means.
false
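The difference can be seen by requesting both adjustments in one step. Dunnett's method compares each group only against a control, whereas Tukey's method compares all pairs. This sketch assumes SASHELP.CARS as stand-in data and arbitrarily treats 'USA' as the control level.

```sas
proc glm data=sashelp.cars;
  class Origin;
  model MPG_City = Origin;
  /* Dunnett: each level versus the control level only. */
  lsmeans Origin / pdiff=control('USA') adjust=dunnett;
  /* Tukey: all pairwise comparisons. */
  lsmeans Origin / pdiff=all adjust=tukey;
run;
quit;
```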
In forward selection, after a variable is added to the model, it can be removed if it becomes non-significant later.
false
The adjusted R-square increases for every term that is added to the model.
false
You know you'll need to do postprocessing analysis, so you use the statement below to create an item store. Later, you can start a new SAS session and perform additional analysis on the item store, proj1results. STORE OUT=proj1results;
false
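A one-level name such as proj1results is stored in the temporary Work library, which is cleared when the session ends. A two-level name that points to a permanent library keeps the item store available later. In this sketch the libref proj, its path, and the model are all hypothetical, and the path must be replaced with a real directory before the code will run.

```sas
/* Hypothetical permanent library; replace the path with a real one. */
libname proj '/path/to/project';

proc glmselect data=sashelp.cars;
  model MSRP = Horsepower Weight / selection=stepwise;
  store out=proj.proj1results;   /* two-level name: permanent */
run;

/* In a later session, after reissuing the same LIBNAME statement: */
proc plm restore=proj.proj1results;
  show parameters;
run;
```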
Suppose you want to investigate the relationship between two different high schools and the student's interests in school. The variable School indicates the name of each school. The variable Focus identifies each student's main focus in school as Grades or Sports.
model Focus(event='Sports')=School;
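The MODEL statement above can be placed in a full PROC LOGISTIC step. The data set work.schools, the school names, and the counts are made up for this sketch. The FREQ statement lets each row stand for a group of students.

```sas
/* Hypothetical counts of students by school and main focus. */
data work.schools;
  input School $ Focus $ Count;
  datalines;
North Grades 60
North Sports 40
South Grades 45
South Sports 55
;
run;

/* EVENT='Sports' makes Sports the modeled outcome. */
proc logistic data=work.schools;
  class School / param=ref;
  freq Count;
  model Focus(event='Sports') = School;
run;
```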
A sample from a population should be which of the following?
representative
In the diagnostic plots below, what does the Residual versus Quantile plot indicate about the model?
that the errors are normally distributed
Given the following SAS output, is there sufficient evidence to reject the hypothesis of equal means?
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Brand              1   0.03033816       0.03033816    51.02     <.001
Error             79   0.04638442       0.00059467
Corrected Total   80   0.07672257
yes
View this PROC REG output. What does the output indicate about the model?
The variance inflation factors indicate that collinearity is present in the model.
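Variance inflation factors are requested with the VIF option in the MODEL statement of PROC REG. This sketch assumes SASHELP.CARS with three related predictors. A commonly cited rule of thumb treats VIF values above 10 as a sign of collinearity.

```sas
/* The VIF option adds a Variance Inflation column to the */
/* Parameter Estimates table.                             */
proc reg data=sashelp.cars;
  model MSRP = Horsepower EngineSize Weight / vif;
run;
quit;
```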
A department store is deploying a chosen model to make predictions for an upcoming sales period. They have the necessary data and are ready to proceed. Which of the following methods can be used for scoring?
any of the above
Suppose you want to fit a multiple logistic regression model to determine which of two rehabilitation programs is more effective. The categorical response variable Relapsed (Yes or No) indicates whether study participants stayed clean after one year. The categorical predictor variables are Program (1 or 2) and Gender (Male or Female). Age is a continuous predictor variable. Assume that you want to use reference cell coding with the default reference levels. Which of the following CLASS statements correctly completes the PROC LOGISTIC step for this analysis? proc logistic data=program.rehabilitation; _______________________________ model Relapsed (event='Yes') = Program | Gender | Age @2; run;
class Program(param=ref ref='2') Gender(param=ref ref='Male');
A CLASS statement is required in a two-sample t test.
true
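In a two-sample t test, the CLASS statement names the variable that defines the two groups. As a minimal sketch, this uses SASHELP.CLASS, where Sex has two levels, as stand-in data.

```sas
/* Compare mean Height between the two levels of Sex. */
proc ttest data=sashelp.class;
  class Sex;
  var Height;
run;
```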
The standard error measures the variability associated with the sample mean, x̄.
true
To reject a test with Student's t statistic, the t statistic should be far from 0 and have a small corresponding p-value.
true