Chapter 2


C) The number of grams of fat consumed per day by this patient.

A researcher was interested in the fat content in patients' diets. What does the 50.1 (in the green box) represent? A) The number of grams the typical 76-year-old patient in this population consumes per day. B) The percent of patients in the sample that ate more fatty food than this patient in the current week. C) The number of grams of fat consumed per day by this patient. D) The percentage of calories that came from fat for this sample.

Use ntile(), which splits a quantitative variable into n equal-sized groups (tiles). eg) ntile(Fingers$Height, 2)

How do you create Categorical Variables by Cutting Quantitative Variables? What does that function do?
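
As a minimal sketch of how ntile() assigns groups, assuming the dplyr package is installed (the heights below are made up and are not taken from the Fingers data frame):

```r
library(dplyr)

# Hypothetical heights in inches (illustration only).
heights <- c(60, 72, 65, 70, 62, 68)

# ntile(x, 2) ranks the values and cuts the ranks into 2 equal-sized groups:
# group 1 holds the lower half, group 2 the upper half.
height_group <- ntile(heights, 2)
print(height_group)
```

Each value keeps its original position, so the result can be stored as a new column next to the variable it was cut from.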

Missing data are marked with NA.

How do you identify missing data?
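
One common way to locate NA values in base R is is.na(). A small sketch on a made-up vector:

```r
# Hypothetical measurements with two missing values.
x <- c(4.2, NA, 7.1, NA, 3.3)

# is.na() returns TRUE wherever a value is missing.
missing_flags <- is.na(x)
print(missing_flags)

# Summing the logical vector counts the missing values.
n_missing <- sum(missing_flags)
print(n_missing)
```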

as.factor() and as.numeric()

How do you specify the type of a variable (factor or numeric) in R?
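
A short base R sketch of both conversions, using made-up values (the variable names are hypothetical):

```r
# Hypothetical job codes stored as numbers.
job_code <- c(1, 2, 2, 3)

# as.factor() tells R to treat the codes as categories.
job_factor <- as.factor(job_code)
print(class(job_factor))

# Hypothetical ages that were read in as text.
age_text <- c("21", "34", "19")

# as.numeric() converts them so that arithmetic works.
age_num <- as.numeric(age_text)
print(mean(age_num))
```

Note that calling as.numeric() directly on a factor returns its internal level codes, which may not be the original numbers.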

head(select(Fingers, RaceEthnic, Job))

How would you combine head() and select()?
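
The nesting can be tried on a small stand-in data frame. This sketch assumes dplyr is installed; fingers_demo and its values are invented for illustration:

```r
library(dplyr)

# Hypothetical stand-in for the Fingers data frame.
fingers_demo <- data.frame(
  RaceEthnic = c(1, 2, 3, 1, 2, 3, 1, 2),
  Job        = c(1, 1, 2, 3, 2, 1, 3, 2),
  Height     = c(60, 72, 65, 70, 62, 68, 66, 71)
)

# select() runs first and keeps two columns;
# head() then shows the first six rows of that result.
first_rows <- head(select(fingers_demo, RaceEthnic, Job))
print(first_rows)
```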

B) sum(FloridaLakes$NumSamples)

How would you quickly find the total number of water samples (or test tubes) collected across all of the lakes in your study? A) sample(sum, FloridaLakes) B) sum(FloridaLakes$NumSamples) C) tally(~NumSamples, data = FloridaLakes) D) arrange(FloridaLakes, NumSamples)

C) arrange(FloridaLakes, AvgMercury)

If you want to quickly see the name of the lake with the lowest average mercury level, what R command might you run? A) arrange(FloridaLakes) B) tally(FloridaLakes$AvgMercury) C) arrange(FloridaLakes, AvgMercury) D) str(FloridaLakes)

A) The sample is not random D) The sample may not represent the population you want to know about

If you wanted to generalize to all lakes in Florida but only included lakes within a 50 km radius of the research center in your study, what should concern you? (check all that apply) A) The sample is not random B) The sample is not convenient C) The sample will not have variation D) The sample may not represent the population you want to know about

B) str()

If you'd like to see an overview of what's in the data frame -- a list of your variables, whether they're numeric or factors, and so forth -- what command would you use? A) tally() B) str() C) c() D) sort()

C) There will be more variation than you might expect.

If you're told that there's random measurement error in how one of your variables was recorded, what do you know for sure? A) Your data are biased. B) A mistake was made when the data were either recorded or entered. C) There will be more variation than you might expect. D) All of the above

D) A value

In the NutritionStudy data frame, the number 6.3 appears in the column labeled Fiber. This 6.3 is an example of A) A variable B) A condition C) A unit sampled D) A value

C) filter(FloridaLakes, Chlorophyll != "NA")

Let's say you want to filter your data so that you do NOT include lakes that have missing data for Chlorophyll. What line of code will do that? A) filter(FloridaLakes, Chlorophyll != "0") B) filter(FloridaLakes, Chlorophyll == "0") C) filter(FloridaLakes, Chlorophyll != "NA") D) filter(FloridaLakes, Chlorophyll == "NA")
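
Outside this course, a widely used alternative for dropping missing rows is to test with is.na(). The sketch below assumes dplyr is installed and uses an invented three-lake data frame rather than the real FloridaLakes data:

```r
library(dplyr)

# Hypothetical lakes; lake "B" has no Chlorophyll measurement.
lakes_demo <- data.frame(
  Lake        = c("A", "B", "C"),
  Chlorophyll = c(0.7, NA, 3.2)
)

# Keep only the rows where Chlorophyll is not missing.
complete_lakes <- filter(lakes_demo, !is.na(Chlorophyll))
print(complete_lakes)
```

The quiz answer reaches the same rows because filter() drops any row whose condition evaluates to NA.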

D) FloridaLakes <- arrange(FloridaLakes, desc(AvgMercury))

The data frame is currently organized alphabetically by lake. What if you'd like to see it ordered by average mercury level, with the most polluted lake appearing first on the list? Save the result back into FloridaLakes. A) FloridaLakes <- sample(FloridaLakes, desc(AvgMercury)) B) FloridaLakes <- sum(AvgMercury) C) FloridaLakes <- tally(~ NumSamples, AvgMercury) D) FloridaLakes <- arrange(FloridaLakes, desc(AvgMercury))
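
A small sketch of ascending versus descending order, assuming dplyr is installed. The lake names and mercury values are made up:

```r
library(dplyr)

# Hypothetical lakes and average mercury levels.
lakes_demo <- data.frame(
  Lake       = c("A", "B", "C", "D"),
  AvgMercury = c(0.50, 1.40, 0.05, 0.90)
)

# Default order is ascending: the lowest mercury level comes first.
lowest_first <- arrange(lakes_demo, AvgMercury)

# Wrapping the variable in desc() reverses the order.
# Assigning the result back to the same name would overwrite the original.
highest_first <- arrange(lakes_demo, desc(AvgMercury))
print(highest_first)
```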

C) The number of rows in the data frame

The nutrition study included 315 patients. Where is this information represented in the data frame? A) One of the values in the data frame B) The number of variables in the data frame C) The number of rows in the data frame D) The number of columns in the data frame

A) patients; variables

The rows in the NutritionStudy data frame represent _____ and the columns represent _____. A) patients; variables B) variables; values C) variables; patients D) None of the above

A) lakes; qualities of the lake

The rows in this data frame represent _____ and columns represent _____. A) lakes; qualities of the lake B) each test tube; qualities of the test tube C) mercury level; qualities of mercury D) none of the above

C) The number of rows of data

The study included 53 lakes in Florida. Where is this information in the data frame? A) One of the values in the data frame B) The number of variables in the data frame C) The number of rows of data D) The number of columns of data

Categorical variables take category or label values, and place an individual into one of several groups. They are also called nominal or qualitative variables.

What are categorical variables?

A) Annie, AvgMercury, 1.33

What are examples of a research unit (or case), a variable, and a value, respectively? A) Annie, AvgMercury, 1.33 B) AvgMercury, Annie, 1.33 C) Annie, AvgMercury, pH D) ID, AvgMercury, 1.33

Quantitative variables are also called continuous variables. They are measured on a scale in which a value could be placed between any two numbers (can be measured to the decimal place).

What are quantitative variables?

$ picks out a variable (column) from a data frame: the data frame's name goes before the $ and the variable's name after it.

What does "$" in R mean? eg) MindsetMatters$Age
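
A quick base R illustration with an invented stand-in for MindsetMatters:

```r
# Hypothetical data frame; the values are made up.
mindset_demo <- data.frame(
  Age = c(43, 42, 41, 40),
  Wt  = c(137, 150, 124, 173)
)

# The $ pulls the Age column out as a plain vector.
ages <- mindset_demo$Age
print(ages)

# The vector can then be passed to other functions.
mean_age <- mean(ages)
print(mean_age)
```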

Sorts data frames.

What does arrange() do? eg) arrange(MindsetMatters, Age)

Combines its arguments into a vector.

What does c() do?
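
For instance, in base R (all names and values here are hypothetical):

```r
# Combine three numbers into a numeric vector.
scores <- c(3, 8, 5)
print(length(scores))

# Combine text values into a character vector.
groups <- c("low", "high", "medium")
print(groups)

# Mixing types forces everything to a single type (here, character).
mixed <- c(1, "a")
print(class(mixed))
```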

Arranges values in descending order

What does desc() do? eg) arrange(MindsetMatters, desc(Age))

Shows you just the first few rows of a data frame.

What does head() do? eg) head(MindsetMatters)

It replaces each value of a variable with a new value that you specify (old value = new value).

What does recode() do? eg) recode(Fingers$Job, "1" = 0, "2" = 50, "3" = 100)
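
A sketch of the same mapping on a made-up vector, assuming the dplyr version of recode() is the one in use:

```r
library(dplyr)

# Hypothetical job codes: 1, 2, and 3.
job <- c(1, 2, 2, 3)

# Each old value (left of =) is replaced by the new value (right of =).
job_percent <- recode(job, "1" = 0, "2" = 50, "3" = 100)
print(job_percent)
```

The result is a new vector; the original variable changes only if the result is assigned back to it.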

Takes a random sample of the given dataset.

What does sample() do?

Lets you look at just a few specific variables (columns) of a data frame.

What does select() do? eg) select(Fingers, RaceEthnic, Job)

Sorts a vector in ascending order.

What does sort() do?
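
A base R sketch with invented ages, including the decreasing argument for the reverse order:

```r
# Hypothetical ages.
ages <- c(43, 21, 35, 28)

# Default: smallest to largest.
ascending <- sort(ages)
print(ascending)

# decreasing = TRUE flips the order.
descending <- sort(ages, decreasing = TRUE)
print(descending)
```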

Shows you just the last few rows of a data frame.

What does tail() do?

Creates a frequency table.

What does tally() do? eg) tally(~ Age, data = MindsetMatters); tally(MindsetMatters$Age); tally(my_vector)
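
The tally() shown here appears to be the one from the mosaic package. As a stand-in that needs no extra packages, base R's table() builds a similar frequency table; the ages below are made up:

```r
# Hypothetical ages with repeats.
ages <- c(21, 35, 21, 28, 35, 21)

# table() counts how many times each distinct value occurs.
age_counts <- table(ages)
print(age_counts)
```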

R lets us assign labels to different levels of a categorical variable.

What does the factor() function do? eg) Fingers$Sex <- factor(Fingers$Sex, levels = c(1,2), labels = c("female", "male"))
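
The same call can be tried on a small made-up vector in base R:

```r
# Hypothetical numeric codes: 1 and 2.
sex_code <- c(1, 2, 2, 1)

# levels lists the codes; labels gives the name for each code, in the same order.
sex <- factor(sex_code, levels = c(1, 2), labels = c("female", "male"))
print(sex)

# The labels now appear wherever the variable is summarized.
print(table(sex))
```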

measuring in MM instead of CM

What is a mistake in measurement?

The two fundamental processes of statistics: measurement and sampling.

What is data the result of?

Independence in the context of sampling means that the selection of one object for a study has no effect on the selection of another object; if the two are selected, they are selected independently, just by chance.

What is independent sampling?

The difference between a measured value and the true value; Error caused by the natural fluctuation in most real-world measurements.

What is measurement error?

The process by which we represent some attribute of an object with a number or place it into a category.

What is measurement?

Everyone in the population has an equal chance of being studied.

What is random sampling?

Samples will differ from one another, and no sample will be perfectly representative of the population. A sample can be biased or unbiased.

What is sampling variation or sampling error?

The process by which we choose which people to study.

What is sampling?

It creates a new data frame of summary values (such as means) computed separately for each group.

What is the aggregate() function? eg) aggregate(Happiness ~ Region, data = HappyPlanetIndex, FUN = mean)
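
A base R sketch of the formula interface, using an invented four-row stand-in for HappyPlanetIndex:

```r
# Hypothetical data: two regions, two countries each.
hpi_demo <- data.frame(
  Region    = c(1, 1, 2, 2),
  Happiness = c(6, 8, 5, 7)
)

# Happiness ~ Region reads as "Happiness, grouped by Region".
# FUN = mean computes one mean per region.
region_means <- aggregate(Happiness ~ Region, data = hpi_demo, FUN = mean)
print(region_means)
```

The result is itself a data frame with one row per group, so it can be saved and used like any other data frame.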

D) All of the above.

What is true about data? A) They require that you've selected a sample. B) They are the result of measurement. C) They represent something about the world D) All of the above.

C) Because of it, no sample will perfectly reflect the population.

What's true of sampling variation? A) It's almost purely theoretical. We rarely encounter it. B) It leads to bias. C) Because of it, no sample will perfectly reflect the population. D) All of the above.

The rows represent the cases sampled. The columns represent variables, or the attributes of each case that were measured.

When organizing data in a data frame, what does the row represent? Column?

B) pH C) NumSamples D) MinMercury

Which of the following from FloridaLakes are quantitative variables? (check all that apply) A) Lake B) pH C) NumSamples D) MinMercury

A) NutriStudy <- filter(NutritionStudy, Alcohol < 200)

Which of the following lines of R code would save only the patients with less than 200 drinks per week into a new data frame called NutriStudy? A) NutriStudy <- filter(NutritionStudy, Alcohol < 200) B) NutriStudy <- arrange(NutritionStudy, Alcohol) C) filter(NutritionStudy, Alcohol != 200) D) NutriStudy <- tally(NutritionStudy$Alcohol)
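
To see the filter-and-save pattern in action, here is a sketch on a made-up stand-in for NutritionStudy (assumes dplyr is installed):

```r
library(dplyr)

# Hypothetical patients; one reports an implausibly high alcohol value.
nutrition_demo <- data.frame(
  ID      = 1:4,
  Alcohol = c(0, 3, 203, 14)
)

# Keep the rows where Alcohol is below 200 and save them under a new name.
NutriStudy <- filter(nutrition_demo, Alcohol < 200)
print(NutriStudy)
```

Without the assignment arrow the filtered rows would only be printed, not saved.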

C) A new data frame of 10 lakes drawn randomly from your FloridaLakes data frame.

You run the following command: RandomLakes <- sample(FloridaLakes, 10). What will be the result? A) A printout of random lakes B) A new data frame of 10 lakes drawn randomly from the population C) A new data frame of 10 lakes drawn randomly from your FloridaLakes data frame. D) None of the above
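
The sample() call in the question appears to rely on the mosaic package, which lets sample() take a data frame. A base R sketch of the same idea draws random row numbers and then picks those rows; the data below are simulated, not the real FloridaLakes values:

```r
set.seed(42)  # makes the random draw repeatable

# Hypothetical stand-in with 53 lakes.
lakes_demo <- data.frame(
  ID         = 1:53,
  AvgMercury = round(runif(53, min = 0, max = 1.5), 2)
)

# Draw 10 distinct row numbers, then keep just those rows.
picked_rows <- sample(nrow(lakes_demo), 10)
RandomLakes <- lakes_demo[picked_rows, ]
print(RandomLakes)
```

The rows come from the data frame already in hand, not from the wider population of lakes.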

C) a new data frame of 10 patients drawn randomly from the NutritionStudy data frame

You run the following command: RandomPatients <- sample(NutritionStudy, 10). What will be the result? A) a printout of random patients B) a new data frame of 10 patients drawn randomly from the population C) a new data frame of 10 patients drawn randomly from the NutritionStudy data frame D) none of the above

A) FloridaLakes$MercGroup <- ntile(FloridaLakes$AvgMercury, 3)

You'd like to divide the original data frame into 3 groups with low, medium, and high levels of average mercury. What R function would you use to do this and save the result as a new variable called MercGroup? A) FloridaLakes$MercGroup <- ntile(FloridaLakes$AvgMercury, 3) B) MercGroup <- sort(AvgMercury, 3) C) arrange(FloridaLakes, 3) D) ntile(FloridaLakes$AvgMercury, 3)

A) NutritionStudy$CholesterolGroup <- ntile(NutritionStudy$Cholesterol, 2)

You'd like to divide the patients in the data frame into two equal groups, those who consume relatively low amounts of Cholesterol per day and those who consume relatively high amounts of Cholesterol per day. You want to save this categorization in a variable called CholesterolGroup. What R code could you use to do this? A) NutritionStudy$CholesterolGroup <- ntile(NutritionStudy$Cholesterol, 2) B) ntile(NutritionStudy$Cholesterol, 2) C) arrange(Cholesterol, 2) D) NutritionStudy$CholesterolGroup <- str(NutritionStudy$Cholesterol, 2)

A) You didn't indicate that you wanted to see 10 rows.

You'd like to see the first 10 rows of FloridaLakes, so you run head(FloridaLakes). It doesn't give you what you wanted. Why not? A) You didn't indicate that you wanted to see 10 rows. B) head() displays variable names C) head() can only be applied to vectors D) This is an odd request, so there's no R command for it
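
In base R, head() takes the number of rows as a second argument. A sketch on a made-up 53-row data frame:

```r
# Hypothetical stand-in with 53 rows.
lakes_demo <- data.frame(ID = 1:53)

# With no second argument, head() returns 6 rows.
default_rows <- head(lakes_demo)
print(nrow(default_rows))

# Passing 10 (or n = 10) asks for the first 10 rows.
ten_rows <- head(lakes_demo, 10)
print(nrow(ten_rows))
```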

C) There are capitalization errors

You've been commissioned to do a study of all lakes with average mercury levels above 1. You want to save the data of the lakes that meet this criterion to a new data frame called HighMercury. What's wrong with the following code? HighMercury <- filter(floridalakes, avgmercury > 1) A) It's missing quotation marks around the number 1 B) It doesn't appropriately name the new data frame C) There are capitalization errors D) Nothing
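
R treats upper- and lower-case letters as different characters, so a misspelled case refers to an object that does not exist. A base R sketch with a hypothetical data frame name:

```r
# Hypothetical stand-in; the name uses capital letters.
FloridaLakesDemo <- data.frame(AvgMercury = c(0.5, 1.2, 1.6))

# The exact name is found; the all-lowercase version is not.
print(exists("FloridaLakesDemo"))
print(exists("floridalakesdemo"))

# With correct capitalization, the rows above 1 can be saved.
HighMercury <- FloridaLakesDemo[FloridaLakesDemo$AvgMercury > 1, , drop = FALSE]
print(HighMercury)
```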

Shows us the overall structure of the data frame, including number of observations, number of variables, names of variables and so on

What does str() do? eg) str(MindsetMatters)

It keeps only the rows that meet a condition; it filters in, not out.

What is filter()? eg) filter(Fingers, SSLast != "NA")

