Chapter 2


Which of the following from FloridaLakes are quantitative variables? (Check all that apply.)

- pH
- NumSamples
- MinMercury

Tally up the number of lakes for which the variable AgeData is 0. How many are there?

10

Which of the following lines of R code would save only the patients with fewer than 200 drinks per week into a new data frame called NutriStudy?

NutriStudy <- filter(NutritionStudy, Alcohol<200)
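A minimal sketch of how filter() keeps only the rows that meet a condition. It assumes the dplyr package (which provides filter()) and uses a small made-up data frame standing in for NutritionStudy; the values are illustrative, not the real study data.

```r
suppressPackageStartupMessages(library(dplyr))

# Toy stand-in for NutritionStudy (hypothetical values, not the real data)
NutritionStudy <- data.frame(
  Age     = c(35, 45, 52, 29),
  Alcohol = c(0.0, 250.0, 3.5, 12.0)
)

# Keep only patients with fewer than 200 drinks per week
NutriStudy <- filter(NutritionStudy, Alcohol < 200)

nrow(NutriStudy)  # 3 of the 4 toy patients remain
```

The original NutritionStudy is untouched; only the new data frame NutriStudy holds the filtered rows.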

What does the second row of numbers in the output of the tally() function tell you?

The frequency with which each number appears in your list

What does TRUE mean in this context?

This person's ring finger is longer (>) than their index finger.

If you wanted to generalize to all lakes in Florida, but only included lakes within a 50 km radius of the research center in your study, what should concern you? (Check all that apply.)

- The sample is not random.
- The sample may not represent the population you want to know about.

Use the DataCamp window above to write some code that will show you the values for Age and Alcohol for patients in the NutritionStudy data frame. The last study participant is 45 years old. How many alcoholic drinks does she consume per week?

0.2

Use ntile() to create groups of lakes that are low, medium, and high in Chlorophyll. Save this in the FloridaLakes data frame as a new variable called Chlorophyll3Group. If you then use the head() and select() commands to print out the first six rows of Chlorophyll3Group, what do you get as a result?

1, 1, 3, 1, 1, 3

What's true about data?

1. They require that you've selected a sample.
2. They are the result of measurement.
3. They represent something about the world.

The FloridaLakes data frame includes a variable called Calcium. How many lakes have a Calcium level that exceeds 5.0? (Hint: Try using the tally command.)

35
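The tally() command used in the course presumably comes from the mosaic package. As a base-R alternative (a swapped-in technique, not the course's method), a comparison produces one TRUE/FALSE per row and sum() counts the TRUEs. The data frame below is a toy stand-in for FloridaLakes with made-up lake names and values.

```r
# Toy stand-in for FloridaLakes (hypothetical names and values)
FloridaLakes <- data.frame(
  Lake    = c("Lake A", "Lake B", "Lake C", "Lake D"),
  Calcium = c(3.0, 1.9, 44.1, 16.4),
  AgeData = c(1, 0, 1, 0)
)

# Each comparison yields one TRUE/FALSE per lake; sum() counts the TRUEs
sum(FloridaLakes$Calcium > 5.0)  # 2 lakes with Calcium above 5.0
sum(FloridaLakes$AgeData == 0)   # 2 lakes with AgeData equal to 0

# table() shows the counts of FALSE and TRUE side by side
table(FloridaLakes$Calcium > 5.0)
```

The same pattern answers the AgeData question: count the rows where the comparison is TRUE.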

How many variables are shown in this snippet of data?

4

Using the DataCamp window above, determine how many variables there are in the FloridaLakes data frame.

5

The FloridaLakes data frame includes information collected by researchers when they analyzed samples of water (collected in standardized test tubes) from a number of lakes. Using the DataCamp window above, determine how many lakes are included in the data frame.

53
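In a tidy data frame each row is a case (here, a lake) and each column is a variable, so the base-R functions nrow() and ncol() answer "how many lakes?" and "how many variables?" directly, and str() gives an overview of both. The sketch below uses a toy three-lake data frame with hypothetical values, not the real FloridaLakes data.

```r
# Toy stand-in for FloridaLakes (hypothetical values)
FloridaLakes <- data.frame(
  Lake       = c("Lake A", "Lake B", "Lake C"),
  AvgMercury = c(1.23, 0.04, 0.44),
  NumSamples = c(5L, 7L, 6L)
)

nrow(FloridaLakes)  # rows = cases (lakes): 3
ncol(FloridaLakes)  # columns = variables: 3
str(FloridaLakes)   # number of observations, variable names, and their types
```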

Why might a land developer be interested in collecting data on lakes in Florida?

A land developer might be interested in collecting data on lakes in Florida to determine which areas are most stable for property development and for growing crops. Soil that is too moist can be loose or muddy, which might discourage future investors. The developer might also ask whether the area's water is clean enough to attract future investors.

You run the following command: RandomLakes <- sample(FloridaLakes, 10). What will be the result?

A new data frame of 10 lakes drawn randomly from your FloridaLakes data frame
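A caution: in base R, sample() applied to a data frame picks columns, not rows, so sample(FloridaLakes, 10) returning 10 lakes presumably relies on the course setup (the mosaic package supplies a version of sample() that draws rows from a data frame). Assuming only base R, the sketch below draws random rows by sampling row numbers. The 20-lake data frame and the seed value are hypothetical.

```r
# Toy stand-in for FloridaLakes: 20 hypothetical lakes
FloridaLakes <- data.frame(
  Lake       = paste("Lake", 1:20),
  AvgMercury = round(seq(0.04, 1.33, length.out = 20), 2)
)

set.seed(42)  # makes the random draw reproducible

# Pick 10 distinct row numbers at random, then keep those rows (all columns)
rows <- sample(nrow(FloridaLakes), 10)
RandomLakes <- FloridaLakes[rows, ]

nrow(RandomLakes)  # 10
```

Because sampling is without replacement by default, no lake appears twice in RandomLakes.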

You run the following command: RandomPatients <- sample(NutritionStudy, 10). What will be the result?

A new data frame of 10 patients drawn randomly from the NutritionStudy data frame

In the NutritionStudy data frame, the number 6.3 appears in the column labeled Fiber. This 6.3 is an example of:

A value

Which of the following from the NutritionStudy is a quantitative variable?

Alcohol (number of alcoholic drinks consumed per week)

Look at the image below. What are examples of a research unit (or case), a variable, and a value, respectively?

Annie, AvgMercury, 1.33

What's true of sampling variation?

Because of it, no sample will perfectly reflect the population.

The arrange() function prints out the data set sorted by Age. But if you then print out MindsetMatters, it isn't sorted by Age. Why?

Because we didn't save the sorted result back into MindsetMatters.
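A minimal sketch of why the sort doesn't stick: arrange() (from the dplyr package, assumed to be installed) returns a sorted copy and leaves its input alone, so the result must be assigned back with <-. The ages are hypothetical values in a toy stand-in for MindsetMatters.

```r
suppressPackageStartupMessages(library(dplyr))

# Toy stand-in for MindsetMatters (hypothetical values)
MindsetMatters <- data.frame(Age = c(43, 29, 51, 35))

arrange(MindsetMatters, Age)  # prints a sorted copy...
MindsetMatters$Age            # ...but the original is unchanged: 43 29 51 35

# Assigning the result back is what makes the change stick
MindsetMatters <- arrange(MindsetMatters, Age)
MindsetMatters$Age            # now 29 35 43 51
```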

From looking at the R code, what kind of value is in the variable RingLonger?

Boolean (called logical in R): TRUE if the ring finger is longer than the index finger, FALSE otherwise.
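A comparison such as > returns a logical (TRUE/FALSE) value for every row, which is how a variable like RingLonger can be built. This sketch assumes hypothetical column names Index and Ring and made-up finger lengths; the real Fingers data frame may name its columns differently.

```r
# Hypothetical finger lengths (column names Index and Ring are assumptions)
Fingers <- data.frame(
  Index = c(7.0, 6.5, 7.2),
  Ring  = c(7.4, 6.3, 7.2)
)

# One TRUE/FALSE per student: is the ring finger strictly longer?
Fingers$RingLonger <- Fingers$Ring > Fingers$Index

Fingers$RingLonger         # TRUE FALSE FALSE
class(Fingers$RingLonger)  # "logical"
```

Note that equal lengths (the third student) give FALSE, because > is a strict comparison.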

If you wanted to know how many students were in Fingers, which commands could help you?

Fingers, tail(Fingers), str(Fingers)

The data frame is currently organized alphabetically by lake. What if you'd like to see it ordered by average mercury level, with the most polluted lake appearing first on the list? Save the result back into FloridaLakes.

FloridaLakes <- arrange(FloridaLakes, desc(AvgMercury))
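A small sketch of ascending versus descending order with arrange() and desc(), both from the dplyr package (assumed installed). The three lakes and their mercury levels are hypothetical.

```r
suppressPackageStartupMessages(library(dplyr))

# Toy stand-in for FloridaLakes (hypothetical values)
FloridaLakes <- data.frame(
  Lake       = c("Lake A", "Lake B", "Lake C"),
  AvgMercury = c(0.44, 1.23, 0.04)
)

# Highest AvgMercury first, saved back into the data frame
FloridaLakes <- arrange(FloridaLakes, desc(AvgMercury))
FloridaLakes$Lake  # "Lake B" "Lake A" "Lake C"

# Without desc(), arrange() sorts from lowest to highest
arrange(FloridaLakes, AvgMercury)$Lake[1]  # "Lake C", the least polluted
```

The ascending form is also a quick way to find the lake with the lowest value of any variable, such as Calcium.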

You'd like to divide the original data frame into three groups with low, medium, and high levels of average mercury. What R function would you use to do this and save the result as a new variable called MercGroup?

FloridaLakes$MercGroup <- ntile(FloridaLakes$AvgMercury, 3)
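A sketch of what ntile() (from the dplyr package, assumed installed) does: it ranks the values and splits them into the requested number of groups of nearly equal size, numbered from lowest to highest. The six mercury values are hypothetical.

```r
suppressPackageStartupMessages(library(dplyr))

# Toy stand-in for FloridaLakes (hypothetical values)
FloridaLakes <- data.frame(AvgMercury = c(0.04, 1.23, 0.44, 0.87, 0.15, 0.60))

# 1 = lowest third, 2 = middle third, 3 = highest third
FloridaLakes$MercGroup <- ntile(FloridaLakes$AvgMercury, 3)

FloridaLakes$MercGroup         # 1 3 2 3 1 2
table(FloridaLakes$MercGroup)  # two lakes in each group
```

Changing the second argument changes the number of groups; ntile(x, 2) splits the cases into a low half and a high half, as in the Cholesterol question.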

How would this variable be represented in a tidy data frame?

IndexRingRatio would be represented as a new column in the same data frame.

In the FloridaLakes data frame, what kind of variable is AgeData?

Integer

In this snippet of data, there are 6 ages listed: 35, 45, 52, 29, 38, and 39. Does this mean there are six variables for Age?

No

You'd like to divide the patients in the data frame into two equal groups: those who consume relatively low amounts of Cholesterol per day, and those who consume relatively high amounts of Cholesterol per day. You want to save this categorization in a variable called CholesterolGroup. What R code could you use to do this?

NutritionStudy$CholesterolGroup <- ntile(NutritionStudy$Cholesterol, 2)

Arrange the FloridaLakes data frame by the variable called Calcium. What is the name of the lake with the lowest amount of Calcium?

Ocheese Pond

Which of the following comparison statements would be TRUE if the student had entered in some number for SSLast?

SSLast != "NA"

A researcher was interested in the fat content in patients' diets. What does the 50.1 (in the green box) represent?

The number of grams of fat consumed per day by this patient.

The nutrition study included 315 patients. Where is this information represented in the data frame?

The number of rows in the data frame

The study included 53 lakes in Florida. Where is this information in the data frame?

The number of rows of data

What do you think the first row of numbers in the output of the tally() function tells you?

The numbers that were in your list of numbers
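The course's tally() output has two rows: the distinct values, then how often each appears. Base R's table() (a swapped-in function, shown here because it needs no extra package) prints the same two-row layout. The list of numbers is hypothetical.

```r
x <- c(3, 1, 3, 2, 3, 1)

table(x)
# x
# 1 2 3
# 2 1 3
```

The first row lists the values that occur in x; the second row gives the frequency of each.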

You've been commissioned to do a study of all lakes with average mercury levels above 1. You want to save the data of the lakes that meet this criterion to a new data frame called HighMercury. What's wrong with the following code?

There are capitalization errors.

If you're told that there's random measurement error in how one of your variables was recorded, what do you know for sure?

There will be more variation than you might expect.

Why might the Environmental Protection Agency (the EPA) be interested in collecting data on lakes in Florida?

The EPA might be interested in collecting data on lakes in Florida because the state is largely surrounded by large bodies of water and is home to many different kinds of animals that might be worth studying.

What's the name of the LAST lake in the FloridaLakes data frame?

Yale

Would it be okay if, in another data set, we decided that 20 meant male and 10 meant female?

Yes

You'd like to see the first 10 rows of FloridaLakes, so you run head(FloridaLakes). It doesn't give you what you wanted. Why not?

You didn't indicate that you wanted to see 10 rows.
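By default, head() returns the first 6 rows; its second argument sets how many rows to show, and tail() works the same way from the end. A sketch with a hypothetical 20-row data frame:

```r
# Toy stand-in for FloridaLakes: 20 hypothetical lakes
FloridaLakes <- data.frame(Lake = paste("Lake", 1:20))

nrow(head(FloridaLakes))      # 6, the default
nrow(head(FloridaLakes, 10))  # 10
tail(FloridaLakes, 1)         # only the last row
```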

If you want to quickly see the name of the lake with the lowest average mercury level, what R command might you run?

arrange(FloridaLakes, AvgMercury)

Let's say you want to filter your data so that you do NOT include lakes that have missing data for Chlorophyll. What line of code will do that?

filter(FloridaLakes, Chlorophyll != "NA")

Here are some data from a study of mercury levels in Florida lakes. Researchers analyzed samples of water (collected in standardized test tubes) from each lake. The study included 53 lakes in Florida, and the data were saved in a data frame called FloridaLakes. What R command produced the printout below?

head()

The rows in this data frame represent _____ and columns represent _____.

lakes; qualities of the lake

The rows in the NutritionStudy data frame represent _____ and the columns represent _____.

patients; variables

If you'd like to see an overview of what's in the data frame—a list of your variables, whether they're numeric or factors, and so forth—what command would you use?

str()

How would you quickly find the total number of water samples (or test tubes) collected across all of the lakes in your study?

sum(FloridaLakes$NumSamples)
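A sketch of the summing step with hypothetical sample counts: sum() adds up every value in the NumSamples column, giving the total number of test tubes across all lakes.

```r
# Toy stand-in for FloridaLakes (hypothetical values)
FloridaLakes <- data.frame(
  Lake       = c("Lake A", "Lake B", "Lake C"),
  NumSamples = c(5, 7, 12)
)

sum(FloridaLakes$NumSamples)  # 24
```

If the column contained missing values, sum() would return NA; sum(FloridaLakes$NumSamples, na.rm = TRUE) skips them.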

