BAS 320 Conceptual Midterm

In general there are two types of predictions.

- Average value of Y among all entities with the same value of X.
- Value of Y for one entity with some value of X.

What number can summarize the typical value of Y?

- average (mean)
- median

The following is a small set of data that has been collected. X = time on site (minutes). Y = cart value.
X     Y
3     43.31
14    12.44
22    54.90
23    52.10
29    67.33
33    68.54
When the permutation procedure is run, a "permuted dataset" is created. Mark all options that contain possible collections of Y values in the permuted dataset.

- c(43.31, 12.44, 54.90, 52.10, 67.33, 68.54)
- c(67.33, 68.54, 52.10, 43.31, 12.44, 54.90)
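
A minimal sketch of how one permuted dataset could be built, assuming the procedure simply shuffles the Y column while leaving X in place. Base R's sample() is used here for illustration; the course's own tooling may do this internally, and the seed is arbitrary.

```r
# Data from the question above
x <- c(3, 14, 22, 23, 29, 33)
y <- c(43.31, 12.44, 54.90, 52.10, 67.33, 68.54)

set.seed(320)            # arbitrary seed so the shuffle is reproducible
y_perm <- sample(y)      # same six values, in a random order
permuted <- data.frame(x = x, y = y_perm)

# A permuted Y column always contains exactly the original values
same_values <- identical(sort(y_perm), sort(y))
same_values
```

Any ordering of the six original Y values is a possible permuted column, including the original ordering itself.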

If you make a scatterplot, what features indicate that Spearman's rank correlation MUST be used? (Mark all that apply.)

- curvature of the stream of points (monotonic nonlinearity)
- extreme outliers
- heteroscedasticity (vertical spread in y varies depending on x)
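
As a rough illustration of the first feature, the sketch below compares the two correlations on made-up data that is monotonic but strongly curved. The x and y values are hypothetical and chosen only to make the contrast visible.

```r
# Hypothetical data: y always increases with x, but along a curve
x <- 1:10
y <- exp(x)

r_pearson  <- cor(x, y, method = "pearson")   # measures linear association
r_spearman <- cor(x, y, method = "spearman")  # measures monotonic association via ranks

r_pearson
r_spearman   # the ranks of x and y agree perfectly here
```

Because Spearman's correlation works on ranks, it reaches 1 for any strictly increasing relationship, while Pearson's stays below 1 when the stream of points is curved.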

There are multiple ways of creating a data frame ("spreadsheet") in R. Select all commands that can be used to create/import data frames into R.

- data.frame()
- read.csv()
- read.table()
- load() (selecting an .RData file)
- data() (for a pre-existing dataset in R)
- library() then data() (for datasets that live in other packages)
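
A short sketch of three of these routes. The inline CSV text stands in for a file on disk (an assumption made so the example is self-contained), and mtcars is a dataset that ships with base R.

```r
# 1. Build a data frame by hand
DF <- data.frame(x = c(3, 14, 22), y = c(43.31, 12.44, 54.90))

# 2. Import CSV data; text= is used here in place of a file path
DF2 <- read.csv(text = "x,y\n3,43.31\n14,12.44")

# 3. Load a dataset that comes with R
data(mtcars)

is.data.frame(DF)
is.data.frame(DF2)
is.data.frame(mtcars)
```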

When using read.csv() to read in a data file, you should specify the additional argument stringsAsFactors=TRUE or stringsAsFactors=FALSE. What are the consequences of each decision? Mark all that apply.

- stringsAsFactors=TRUE will treat text columns as categorical variables (you can convert a column to text later by redefining it using the left-arrow along with the as.character() function)
- stringsAsFactors=FALSE will keep text columns as text (you can convert a column to a categorical variable later by redefining it using the left-arrow along with the factor() function)
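
The two choices can be compared directly. This is a small sketch using inline CSV text with hypothetical pet data rather than a real file.

```r
csv_text <- "pet,age\ndog,3\ncat,5\ndog,2"   # hypothetical data

as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_text   <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$pet)   # "factor"
class(as_text$pet)     # "character"

# Converting afterwards by redefining the column with the left-arrow
as_factor$pet <- as.character(as_factor$pet)   # categorical -> text
as_text$pet   <- factor(as_text$pet)           # text -> categorical
```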

What is the expected count of Females who sit in the Middle assuming no association between Location and Gender?

131.8
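
The table behind this answer is not reproduced here, so the sketch below uses hypothetical counts purely to show the calculation: under no association, the expected count for a cell is (row total × column total) / grand total.

```r
# Hypothetical counts (NOT the course data): rows = Gender, columns = Location
tab <- matrix(c(30, 20, 50,
                20, 30, 50),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("Female", "Male"),
                              Location = c("Back", "Front", "Middle")))

# Expected counts assuming no association between the two variables
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

expected["Female", "Middle"]   # 100 * 100 / 200 = 50
```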

What is the Reduction in SSE? Report your answer to 1 decimal place (e.g. 3.2)
Analysis of Variance Table
Response: CollegeGPA
            Df  Sum Sq  Mean Sq  F value     Pr(>F)
HSGPA        1    19.7     19.7   99.641  < 2.2e-16 ***
Residuals  605   105.8

19.7
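
The arithmetic can be sketched from the two "Sum Sq" entries in the table, assuming the usual decomposition in which the total sum of squares is the model row plus the residual row.

```r
ss_model <- 19.7    # "Sum Sq" for HSGPA
sse      <- 105.8   # "Sum Sq" for Residuals (error left after using the model)

sst <- ss_model + sse     # total sum of squares with no predictor
reduction <- sst - sse    # how much the model lowered the error

round(reduction, 1)
```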

What is the 10th element in the sequence seq( from=100, by=20, length=25)?

280
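
A quick check in R. The full argument name length.out is spelled out here; the shorter length= in the question works through R's partial argument matching.

```r
s <- seq(from = 100, by = 20, length.out = 25)

s[10]       # 100 + 9 * 20 = 280
length(s)   # 25
```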

What is the third element in the vector c( 8, 6, 9)?

9

Which is the marginal distribution for Location?

B 18%, F 37%, M 40%, V 4%

A data set containing information on Tips, Bill Amount, Party Size and Smokers is being analyzed. The R output below was run to determine if there was an association between "Size_of_Party" and being a Smoker (Yes, No). It was pre-determined to test for a difference in means. Can statistical significance be determined?

Cannot determine. Need more iterations.

If we ask a class of students to tell us their favorite pet, we will be collecting a ________ variable.

Categorical

When studying association between a Categorical variable and a Quantitative variable, we assign the role of X to be the __________________ variable and Y to be the __________________ variable.

Categorical; Quantitative

For the following scenario, which variable is the "x" variable? Color of background vs Time spent on Website

Color of background

A vector x <- c( 3, 8, 9) has

Elements 3, 8, and 9 and positions 1, 2, and 3

An association that is not statistically significant can have practical significance.

False

If a test for association has a p-value of .32, the association is statistically significant.

False

If a test for association has a p-value of .32, the association is still considered to have practical significance.

False

It's desired to add a new column named "PricePerSqFt" which is the price column divided by the sqft_living column. Which command will do this?

HOUSE$PricePerSqFt <- HOUSE$price / HOUSE$sqft_living
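
A small sketch of the command in action. The two-row HOUSE data frame below is hypothetical and only mimics the column names mentioned in the question.

```r
# Hypothetical stand-in for the HOUSE data frame
HOUSE <- data.frame(price = c(300000, 450000),
                    sqft_living = c(1500, 2000))

# Division is element-by-element, so each row gets its own ratio
HOUSE$PricePerSqFt <- HOUSE$price / HOUSE$sqft_living

HOUSE$PricePerSqFt   # 200 225
```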

Based on the scatterplot, which measure of association would be the most appropriate for describing and assessing the relationship between the following numeric variables.

Neither Pearson nor Spearman

For the following scenario, which variable is the "x" variable? Waiting Time in line vs. Number of open cashiers

Number of open cashiers

Is Pearson's correlation ok for measuring the strength of this relationship or should Spearman's rank correlation be used instead?

Pearson (point cloud is elliptical and has no outliers)

Based on the scatterplot, which measure of association would be the most appropriate for describing and assessing the relationship between the following numeric variables.

Pearson's R Correlation

The p-value for a statistical test can be found by the ________________________.

Permutation procedure
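
A minimal sketch of a permutation p-value for a correlation, written in base R. This is an illustration of the idea only: the course's associate() function may compute its p-value differently, and the data, seed, and number of permutations here are assumptions.

```r
x <- c(3, 14, 22, 23, 29, 33)
y <- c(43.31, 12.44, 54.90, 52.10, 67.33, 68.54)

set.seed(320)                      # arbitrary seed for reproducibility
observed <- cor(x, y)              # association in the real data

# Shuffle y many times to see what correlations arise by chance alone
n_perm <- 500
permuted <- replicate(n_perm, cor(x, sample(y)))

# Fraction of shuffles at least as extreme as what was observed (two-sided)
p_value <- mean(abs(permuted) >= abs(observed))
p_value
```

Because the p-value is estimated from a finite number of shuffles, it carries its own uncertainty, which shrinks as the number of permutations grows.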

The command which( x > 5) will return

Positions of elements in vector x that are greater than 5.
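
A short example of positions versus elements, using a three-element vector:

```r
x <- c(3, 8, 9)

pos <- which(x > 5)   # positions where the condition is TRUE
pos                   # 2 3

x[pos]                # the elements at those positions: 8 9
x[3]                  # a single element by position: 9
```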

For the following scenario, which variable is the "Y" variable? Price of item vs. Probability of purchasing

Probability of purchasing

We use ________________ to assess Normality of a variable.

QQ Plot

What is the typical miss the model makes when making predictions?

RMSE
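
A sketch of the calculation on R's built-in mtcars data, assuming the convention that RMSE is the residual standard error (SSE divided by the residual degrees of freedom, then square-rooted). Some texts divide by n instead, which gives a slightly smaller number.

```r
M <- lm(mpg ~ wt, data = mtcars)   # example model on a built-in dataset

sse  <- sum(residuals(M)^2)        # sum of squared errors
rmse <- sqrt(sse / df.residual(M)) # typical size of a prediction miss

# Matches the "Residual standard error" reported by summary()
all.equal(rmse, summary(M)$sigma)
```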

What is our goal with regression, with respect to SSE (sums of squares error)?

Reduce it

When you are instructed "Print the contents of `DF` to the screen", what does that mean?

Run DF as a command so its contents are shown in the Console/Rmd

Based on the scatterplot, which measure of association would be the most appropriate for describing and assessing the relationship between the following numeric variables.

Spearman's Rank Correlation

Why can't Spearman's rank correlation be used to describe/analyze the following association?

The curvature in the stream is non-monotonic, so Spearman's can't be used

F statistic = 7

The observed variance in group averages is 7 times larger than what was expected if there was no association.

The association between y and x is to be studied. The scatterplot showed quite a few outliers. The associate() command was run and the following output was obtained. What do you conclude about the statistical significance of the association?

The test is inconclusive; it's necessary to re-run associate() and add in the permutations= argument and give it a number much greater than the default of 500

If a test for association has a p-value of .003, the association is statistically significant.

True

R² computed from the simple linear regression gives the percentage of the variability in y that can be attributed to the differences in x.

True
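
One way to see this is to compute R² by hand as the fraction of the total variability removed by the model. The sketch uses the built-in mtcars data as a stand-in example.

```r
M <- lm(mpg ~ wt, data = mtcars)

sse <- sum(residuals(M)^2)                      # variability left unexplained
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total variability in y

r2 <- 1 - sse / sst                             # share attributed to x

all.equal(r2, summary(M)$r.squared)
```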

To find the fifth element of vector V we write ___

V[5]

When we are interpreting the 95% Confidence Interval we say:

We are 95% confident that the true population average of Y for a given value of X is inside the interval.

p-value of ANOVA test is 0.04.

We cannot conclude anything about whether the association is statistically significant until we look at the 95% CI for the p-value.

95% Confidence Interval for the p-value is (0.03, 0.06).

We need to increase the number of permutations and rerun the test.

The Standard error of the predicted average is _____________ the Standard error of the prediction.

always lower than
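
A sketch of why this holds for a simple linear regression, using predict() on the built-in mtcars data (chosen only as an example). The standard error for an individual prediction adds the residual variation on top of the uncertainty in the average.

```r
M <- lm(mpg ~ wt, data = mtcars)
new <- data.frame(wt = 3)          # value of X to predict at

p <- predict(M, newdata = new, se.fit = TRUE)
se_avg <- p$se.fit                                  # SE of the predicted average
se_ind <- sqrt(p$se.fit^2 + p$residual.scale^2)     # SE of an individual prediction

# The matching intervals: confidence (average) vs prediction (individual)
ci_avg <- predict(M, newdata = new, interval = "confidence", level = 0.95)
pi_ind <- predict(M, newdata = new, interval = "prediction", level = 0.95)

se_avg < se_ind
```

Both intervals are centered on the same fitted value; only their widths differ.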

Distributions of Y for each level of X are roughly symmetric with no extreme outliers. We can compare ________ .

averages (means)

The center of the interval when calculating a confidence interval for the slope is ________________.

b1
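
This can be checked with confint(), which for a linear model builds the interval as the estimate plus or minus a margin of error. The mtcars model below is just an example.

```r
M <- lm(mpg ~ wt, data = mtcars)

b1 <- unname(coef(M)["wt"])            # estimated slope
ci <- confint(M, "wt", level = 0.95)   # lower and upper limits for the slope

midpoint <- mean(ci)                   # center of the interval
all.equal(midpoint, b1)
```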

If the command levels(colors) returns "dog", "cat", "bird", then all of the code lines below are acceptable except one.

colors[1] <- "turtle" (not acceptable)
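
A sketch of what goes wrong. The factor is built here with its levels in the order given in the question, which is an assumption about how colors was created.

```r
colors <- factor(c("dog", "cat", "bird"), levels = c("dog", "cat", "bird"))
levels(colors)          # "dog" "cat" "bird"

colors[2] <- "bird"     # fine: "bird" is an existing level

# "turtle" is not one of the levels, so R issues a warning
# and stores a missing value instead
colors[1] <- "turtle"
is.na(colors[1])        # TRUE
```

To store a new category, the level would first have to be added, for example by redefining the column with factor() and an extended levels= argument.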

We say that the test is inconclusive when the 95% Confidence Interval for the p-value

contains 0.05

Making predictions outside X variable range is an example of

extrapolation

head()

get the first few rows

tail()

get the last few rows

ncol()

get the number of columns

nrow()

get the number of rows

dim()

get the number of rows and columns
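
These inspection commands can be tried on any data frame; the built-in mtcars dataset is used here as an example.

```r
first_rows <- head(mtcars)      # first 6 rows by default
last_rows  <- tail(mtcars, 3)   # last 3 rows

nrow(mtcars)   # 32
ncol(mtcars)   # 11
dim(mtcars)    # 32 11  (rows, then columns)
```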

Distributions of Y for each level of X are noticeably skewed and/or have extreme outliers. We can compare ________ .

medians

To find the Confidence Interval limits we can use R function ____

predict()

View()

see the data in read-only spreadsheet form

A larger standard error will increase the width of a confidence interval.

True

