Chapter 4


How do we figure out the boundary for outliers that are too low?

Q1 - [1.5 * (Q3 - Q1)]
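
As a rough illustration of the boundary rule, the sketch below computes the low boundary (Q1 minus 1.5 times the IQR) and the matching high boundary in base R. The data vector is made up for this example and does not come from any course data set; quantile() is used with its default method, so other software may report slightly different quartiles.

```r
# Hypothetical scores (invented for illustration), with one low and one high extreme value
x <- c(2, 10, 11, 12, 13, 14, 15, 16, 30)

q1 <- unname(quantile(x, 0.25))  # 25th percentile
q3 <- unname(quantile(x, 0.75))  # 75th percentile
iqr <- q3 - q1                   # same value that IQR(x) returns

lower_boundary <- q1 - 1.5 * iqr # points below this are "too low"
upper_boundary <- q3 + 1.5 * iqr # points above this are "too high"

# Keep only the values that fall outside the two boundaries
outliers <- x[x < lower_boundary | x > upper_boundary]

print(c(lower = lower_boundary, upper = upper_boundary))
print(outliers)
```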

If you wanted to explore a situation in which the explanatory variable is categorical, which of the following is/are true?

-A quantitative outcome variable would lead you to use a faceted histogram.
-A categorical outcome variable would lead you to use a tally.

Which of the following are quantitative variables? (Check all that apply.)

-CognitionZscore
-Happiness

The big box with the thick line depicts the

-Data that fall between Q3 and Q1 cut points
-Data that fall between the 75th percentile and the 25th percentile
-The middle 50 percent of data points
-Data that are in the 2nd and 3rd quartile

If you have a quantitative outcome variable and a categorical explanatory variable, which visualization(s) could you use?

-Faceted histogram (gf_facet_grid)
-Box plot (gf_boxplot)
-Jitter plot (gf_jitter)
-Scatterplot (gf_point)

If you have a single, categorical variable, which visualization(s) could you use?

-Frequency table (tally)
-Bar graph (gf_bar)

Construct a scatterplot to explore the relationship between GPA and Happiness among participants in the SleepStudy. What seems to be true?

-GPA does not appear to predict happiness.
-The participants with the lowest GPA are NOT the least happy.
-The participants with the highest GPA are NOT universally happy.

If you have a single, quantitative variable, which visualization(s) could you use?

-Histogram (gf_histogram)
-Box plot (gf_boxplot)

If you have a quantitative outcome variable and a quantitative explanatory variable, which visualization(s) could you use?

-Jitter plot (gf_jitter)
-Scatterplot (gf_point)

Which of the following variables do you think would be worse than Sex at explaining variation in Thumb length?

-Job
-Year
-RaceEthnic

The scatterplot above shows Thumb as the outcome variable (on the y-axis) and Height (in inches) as the explanatory variable (on the x-axis). Which of the following relationships can you see in the graph?

-Taller people tend to have longer thumbs.
-Shorter people tend to have shorter thumbs.
-If you know someone's height, you can make a more accurate prediction of their thumb length than if you didn't know their height.

What is IQR?

-The distance between Q3 and Q1
-The height of the box

How should we interpret this boxplot?

-There's more variability in happiness among high stress individuals than there is in individuals with normal stress levels.
-Individuals with normal stress levels look to be happier than individuals with high stress levels.
-A person with a median level of happiness within the normal stress group is happier than about 75% of all individuals in the high stress group.

The scatter plot of Happiness by GPA is below. The mean is drawn in orange. What does the scatter plot tell you about the relationship between Happiness and GPA? (Assume that the maximum Happiness score was 36.)

-Whether an individual has a high or low GPA, our best prediction of their happiness isn't very different.
-Happiness is unrelated to GPA.
-There are more individuals that have a Happiness score greater than 20, compared to less than 20.

Maybe considering yourself a morning person (a "lark") or an evening person (an "owl") is related to variation in GPA. Which of the following plots would help us see whether variation in GPA is related to variation in LarkOwl?

-gf_histogram(~ GPA, data = SleepStudy) %>% gf_facet_grid(LarkOwl ~ .)
-gf_boxplot(GPA ~ LarkOwl, data = SleepStudy)
-gf_point(GPA ~ LarkOwl, data = SleepStudy)

If we created a boxplot and then chained a jitter plot onto it, what proportion of the points would fall inside the box?

50%
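
A quick way to check the 50% figure is to count how many values sit between Q1 and Q3. The sketch below does this in base R on made-up data (the integers 1 to 100), not on a course data set. With small samples or many tied values the share can drift a little from one half.

```r
# Hypothetical data: the integers 1 through 100
x <- 1:100

q1 <- quantile(x, 0.25)  # bottom edge of the box
q3 <- quantile(x, 0.75)  # top edge of the box

# Proportion of points that fall inside the box (between Q1 and Q3)
inside_box <- mean(x >= q1 & x <= q3)
print(inside_box)
```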

To examine the distribution of Happiness, which would be more useful?

A Histogram

Where on the density histogram would you look to see evidence of between-group variation in Happiness?

Across the two histograms

Why do you think the box for tall is wider than the box for medium?

Because height varies more for people in the tall group.

What kind of explanatory variable does it include?

Categorical

What kind of outcome variable does it include?

Categorical

What kind of variables should go in tally()?

Categorical

You have learned to make some pretty fancy histograms now. Let's take a moment to reflect. What kind of variables should go in gf_facet_grid()?

Categorical

Imagine that you wrote the following code. What would it do? gf_boxplot(Happiness ~ Stress, data = SleepStudy, color = "orange") %>% gf_jitter()

Create a single plot (a boxplot with an overlaid jitter plot)

True or False: The sample distribution would be almost perfectly rectangular if only we had rolled a 20-sided die.

False

In a study designed to find out if smoking habits explain variation in fat consumption, _______ would be the outcome variable and ______ would be the explanatory variable.

Fat; EverSmoke

Using the SleepStudy data frame, produce a jitter plot to examine ClassesMissed by Gender (coded 0 for female, 1 for male). Among students who missed no classes, were there more females or more males?

Females

If you have a categorical outcome variable and a categorical explanatory variable, which visualization(s) could you use?

Frequency table (tally)
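
For two categorical variables, a frequency table is a cross-tabulation of counts. The flashcards use tally(); the sketch below uses base R's table() as a stand-in so that it runs without extra packages. The data frame and its variable names (Smoker, PoorSleep) are invented for illustration.

```r
# Hypothetical data frame; names and values are made up for this example
d <- data.frame(
  Smoker    = c("yes", "no", "no", "yes", "no", "no"),  # explanatory variable
  PoorSleep = c("yes", "no", "yes", "yes", "no", "no")  # outcome variable
)

# Rows are the explanatory groups, columns are the outcome categories
counts <- table(d$Smoker, d$PoorSleep, dnn = c("Smoker", "PoorSleep"))
print(counts)

# Row proportions make groups of different sizes easier to compare
print(prop.table(counts, margin = 1))
```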

You suspect that in the SleepStudy, Gender can be used to explain sleep quality (PoorSleepQuality). Produce a jitter plot to explore whether your suspicion might be right. Which of the following is true?

Gender does not appear to predict sleep quality.

Let's say a researcher hopes to explore the hypothesis that knowing about someone's stress level can help to predict their happiness. What word equation best captures this idea?

Happiness = Stress + other stuff

In a study designed to find out what explains variation in Happiness, _____ would be the outcome variable and _____ would be the explanatory variable.

Happiness; Stress

Consider the following model: Thumb = Sex + other stuff. What is the best visualization to use?

Histogram

Here is a density histogram of self-reported Happiness faceted by Stress (high vs. normal). Where on the density histogram would you look to see evidence of within-group variation in Happiness?

Horizontally, along the x-axis

How do we find IQR?

IQR = Q3 - Q1

In the jitter plot above, which makes use of transparency, what visual feature indicates a higher frequency of data points?

Less transparency

Use gf_point() to examine ClassesMissed by Gender (coded 0 for females, 1 for males). Locate the SleepStudy participant who missed the most classes. Is it a female or a male?

Male

Construct a boxplot using data from the NutritionStudy that illustrates how females and males (coded in the variable Gender) differ in daily consumption of Calories. Which of the following is true?

More than half of females consume less than 2,000 calories per day.

Someone has a hypothesis that younger people drink more alcohol than older people. Based on the scatterplot of number of drinks per week by age, which of the following observations is true?

Most people do not drink more than five drinks per week.

Consider the following model: Thumb = Height + other stuff. What is the best visualization to use?

Neither

Of the two variables used to create the faceted histograms above (year in college and race/ethnicity), which does a better job predicting thumb length?

Neither seems like a very good predictor

Based on the data shown in the boxplot, can we conclude that smoking causes changes in fat consumption?

No, because these data are the result of a correlational study, not an experiment.

What does the distance between the two points (shown in the red rectangle) mean?

Nothing

Use the DataCamp window above to construct a faceted histogram of Fat by EverSmoke in the NutritionStudy data frame. Which of the three EverSmoke groups looks like the panel below?

Patients who CURRENTLY smoke

Still curious about Stress as an explanatory variable in the SleepStudy, you construct a boxplot to see if it's related to DepressionScore. Which of the following is true?

Q3 for participants with normal stress levels is roughly the same as Q1 for participants with high stress levels.

What kind of outcome variable does it include?

Quantitative

What kind of explanatory variable does it include?

Quantitative

What kind of outcome variable does it include?

Quantitative

What kind of variables should go in gf_histogram()?

Quantitative

We now have explored examples of three of the following kinds of variables. Which have we not yet explored?

Quantitative explanatory

You decide to conduct a study of energy drinks using undergraduates from your school. You select participants by randomly choosing ID numbers from among all ID numbers of current students. Once chosen, you randomly pick one of two energy drinks for students to consume weekly, throughout the school term. The first step is an example of _____ and the second is an example of _____.

Random selection; random assignment

Let's split GPA into three groups—low, medium, and high—and then create a faceted histogram. What goes in the blanks in the following code? SleepStudy$GPA3Group <- ntile(_____, 3) gf_dhistogram(~ Happiness, data = _____) %>% gf_facet_grid(GPA3Group ~ .)

SleepStudy$GPA; SleepStudy
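
To see what splitting a quantitative variable into three equal-sized groups involves, here is a rank-based sketch in base R. It is a stand-in written for illustration, not the implementation behind ntile(), and the GPA values are made up rather than taken from SleepStudy.

```r
# Hypothetical GPA values (invented, not from SleepStudy)
gpa <- c(2.1, 3.9, 3.0, 2.5, 3.4, 3.7, 2.8, 3.2, 2.3)
n_groups <- 3

# Rank each value from lowest (1) to highest, breaking ties by position
ranks <- rank(gpa, ties.method = "first")

# Map the ranks onto groups 1..n_groups of roughly equal size
group <- ceiling(ranks * n_groups / length(gpa))

print(data.frame(gpa, group))
print(table(group))  # how many values landed in each group
```

The resulting group variable is categorical, which is why it can then be used for faceting.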

Consider the following model: WtLost = Condition + other stuff. What is the best visualization to use?

Tally

In your study, you tested two types of energy drinks (SuperBuzz and StayFocused). You found that students who consumed SuperBuzz rated themselves as more alert on average than did those who drank StayFocused. Your roommate suspects that you are being fooled by chance (also called Type 1 error). What's her concern?

The difference you found was the result of sampling variation.

Below is a boxplot of Calories consumed per day by Gender. On the right you see the distribution for males. The two rectangles that compose the "box" portion of the plot have different heights. What does that mean?

The distribution of Calories consumed by males is skewed.

In the plot below, what does the point circled in red represent?

The happiness of a student with high stress

Using the SleepStudy data frame, create a boxplot to explore whether Stress (coded as normal or high) might be used to explain GPA. Which of the following statements are true?

The sole outlier is a participant with a normal level of stress.

Where should you look in the histogram to notice within-group variation?

The spread of the distribution

Which Height variable (the three-category variable or the two-category variable) explains more variation in Thumb length?

The three-category variable

In the jitter plot below, we've put a green box around a dense row of data and a red box around a less dense row of data. What does the density of dots represent?

There are a lot of individuals who have the same value on the y-axis.

Based on this histogram, which of the following observations are true?

There are fewer people who consider themselves larks than those who consider themselves neither a Lark nor an Owl.

In the SleepStudy, might Stress be a predictor of Happiness? What do you see in the boxplot?

There are more outliers among participants with normal stress than there are among participants with high stress.

True or False: Even a truly random process like a 20-sided fair die would rarely result in a perfectly rectangular distribution.

True

True or False: It's very unlikely that a computer-generated random sample of 211 numbers would be perfectly rectangular because there are random fluctuations.

True

True or False: Larger samples would probably look more rectangular than smaller samples. So, a sample of 10,000 would look more rectangular than a sample of 1,000, and a sample of 1,000 would look more rectangular than a sample of 200.

True
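
The claim about sample size can be checked with a small simulation. The sketch below rolls a simulated 20-sided die in base R and measures how far the observed proportions stray from the expected 1/20 per face. The helper name max_gap and the seed are arbitrary choices for this example.

```r
set.seed(1)  # arbitrary seed so the run is repeatable

# Largest gap between any face's observed proportion and the expected 1/20.
# max_gap is a hypothetical helper written for this sketch.
max_gap <- function(n) {
  rolls <- sample(1:20, size = n, replace = TRUE)  # n rolls of a fair 20-sided die
  props <- tabulate(rolls, nbins = 20) / n         # proportion of rolls per face
  max(abs(props - 1 / 20))
}

gap_small <- max_gap(200)
gap_large <- max_gap(10000)

# A smaller gap means the histogram would look closer to rectangular
print(c(n200 = gap_small, n10000 = gap_large))
```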

In the boxplot+jitter plot above, how many variables are depicted?

Two: Thumb and Sex

If we made a histogram of 211 random numbers generated from a computer or 20-sided die, what would the resulting distribution look like? What shape would you expect?

Uniform

Where on this boxplot would you look to see evidence of within-group variation in fat consumed?

Vertically, within each boxplot

Based on what you see in the histograms, scatterplots, and boxplots, is some of the variation in Thumb length explained by Height?

Yes

Try to compare the two distributions above. Do thumb lengths vary by sex?

Yes

What R code produced the plot below?

gf_point(Happiness ~ Stress, data = SleepStudy)

Which of these functions might help us recode Height as a categorical variable?

ntile()

If we are mostly going to put the outcome variable on the y-axis (and the explanatory variable on the x-axis), what order would you expect in our R code?

outcome ~ explanatory
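
In R, the outcome ~ explanatory pattern is a formula object, and its two sides can be inspected directly. The short sketch below uses the Thumb and Sex names from the cards above purely as labels; no data set is needed to build or inspect a formula.

```r
# A formula with the outcome on the left of ~ and the explanatory variable on the right
model <- Thumb ~ Sex

vars <- all.vars(model)  # variable names in the order they appear
outcome <- vars[1]       # left-hand side
explanatory <- vars[2]   # right-hand side

print(c(outcome = outcome, explanatory = explanatory))
```

The plotting calls quoted in these cards, such as gf_boxplot(GPA ~ LarkOwl, data = SleepStudy), follow the same left-to-right order.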

