7. Data Analysis with R Programming
A data analyst inputs the following code in RStudio: sales_1 <- (3500.00 * 12) Which of the following types of operators does the analyst use in the code? Select all that apply. 1. Arithmetic 2. Logical 3. Relational 4. Assignment
1. Arithmetic 4. Assignment
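Here is the same line run in R, showing both operators at work. The two comparison lines are an illustrative addition that shows relational and logical operators, which the quiz code does not use. Their 40000 and 50000 thresholds are made up.

```r
# Arithmetic operator (*) computes 3500.00 * 12; assignment operator (<-) stores the result
sales_1 <- (3500.00 * 12)
print(sales_1)  # prints: [1] 42000

# For contrast (not part of the quiz code): a relational operator (>) ...
meets_target <- sales_1 > 40000
# ... and a logical operator (&) combining two relational tests
in_range <- sales_1 > 40000 & sales_1 < 50000
```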
The read_csv() function is a part of the dplyr package. 1. True 2. False
2. False
Which of the following functions let you display smaller groups, or subsets, of your data? 1. ggplot() 2. facet_wrap() 3. geom_bar() 4. geom_point()
2. facet_wrap()
Which of the following examples is the proper syntax for calling a function in R? 1. <-20 2. print() 3. #first 4. data_1
2. print()
RStudio includes which of the following panes? Select all that apply. 1. R console pane 2. Environment pane 3. Command pane 4. Source editor pane
1. R console pane 2. Environment pane 4. Source editor pane
What information does a data analyst usually find in the header section of an R Markdown document? Select all that apply. 1. Date 2. File type 3. Title and author 4. Conclusions
1. Date 2. File type 3. Title and author
What are the benefits of using a programming language to work with your data? Select all that apply. 1. Easily reproduce and share your work 2. Choose a business task for analysis 3. Save time 4. Clarify the steps of your analysis
1. Easily reproduce and share your work 3. Save time 4. Clarify the steps of your analysis
Which of the following are benefits of adding labels and annotations to your plot? Select all that apply. 1. Highlighting important data in your plot 2. Choosing a geom for your plot 3. Helping stakeholders quickly understand your plot 4. Indicating the main purpose of your plot
1. Highlighting important data in your plot 3. Helping stakeholders quickly understand your plot 4. Indicating the main purpose of your plot
What type of software application is RStudio? 1. Integrated development environment 2. Source editor 3. Database 4. Data visualization tool
1. Integrated development environment
A data analyst writes two hashtags next to their header. What will this do to the header font in the .rmd file? 1. Make it smaller 2. Make it bigger 3. Make it centered 4. Make it a different color
1. Make it smaller
Which of the following are standards of tidy data? Select all that apply. 1. Observations are organized into rows 2. Columns are named 3. Each value has its own cell 4. Variables are organized into columns
1. Observations are organized into rows 3. Each value has its own cell 4. Variables are organized into columns
How do data analysts refer to the words and symbols they use to write instructions for computers? 1. Programming languages 2. Variable languages 3. Code languages 4. Syntax languages
1. Programming languages
Which of the following are included in R packages? Select all that apply. 1. Reusable R functions 2. Tests for checking your code 3. Sample datasets 4. Naming conventions for R variable names
1. Reusable R functions 2. Tests for checking your code 3. Sample datasets
Which of the following aesthetics attributes can you map to the data in a scatterplot? Select all that apply. 1. Size 2. Color 3. Text 4. Shape
1. Size 2. Color 4. Shape
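This is a minimal ggplot2 sketch that maps size, color, and shape in a scatterplot. It assumes ggplot2 is installed and uses the built-in mtcars dataset for illustration. The choice of which column drives each aesthetic is arbitrary. Printing `p` draws the plot.

```r
library(ggplot2)

# Built-in mtcars data: each point is one car
p <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg,
                           color = factor(cyl),  # color by cylinder count
                           shape = factor(am),   # shape by transmission type (0/1)
                           size = hp))           # size by horsepower
```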
When an analyst installs a package that is not in Base R, where does R call the package from? 1. The CRAN archive 2. The RStudio website 3. The tidyverse 4. Python
1. The CRAN archive
In ggplot2, what symbol do you use to add layers to your plot? 1. The plus sign (+) 2. The equal sign (=) 3. The ampersand symbol (&) 4. The pipe operator (%>%)
1. The plus sign (+)
Why do analysts use comments in R programming? Select all that apply. 1. To make an R script more readable 2. To explain their code 3. To act as functions 4. To provide names for variables
1. To make an R script more readable 2. To explain their code
A nested function is a function contained within code that performs a broader function. 1. True 2. False
1. True
If you write code directly in R source editor, RStudio can save your code when you close your current session. 1. True 2. False
1. True
In ggplot2, you use the plus sign (+) to add a layer to your plot. 1. True 2. False
1. True
Programming languages can be used to reproduce and share your analysis. 1. True 2. False
1. True
The rename_with() function can be used to reformat column names to be upper or lower case. 1. True 2. False
1. True
Tidy data is a way of standardizing the organization of data within R. 1. True 2. False
1. True
Tidyverse is a collection of packages in R with a common design philosophy. 1. True 2. False
1. True
A data analyst wants to embed a link in their R Markdown document. They write (click here!)(www.rstudio.com) but it doesn't work. What should they write instead? 1. [click here!](www.rstudio.com) 2. "click here!"(www.rstudio.com) 3. <click here!> (www.rstudio.com) 4. click here!(www.rstudio.com)
1. [click here!](www.rstudio.com)
In ggplot2, an ________ is a visual property of an object in your plot. 1. aesthetic 2. argument 3. annotation 4. alpha
1. aesthetic
A data analyst is cleaning their data in R. They want to be sure that their column names are unique and consistent to avoid any errors in their analysis. What R function can they use to do this automatically? 1. clean_names() 2. rename() 3. rename_with() 4. select()
1. clean_names()
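This is a small sketch of clean_names(), which comes from the janitor package and is assumed to be installed. The messy column names are hypothetical. Exact output names depend on janitor's defaults, so the comments describe the pattern rather than a guaranteed result.

```r
library(janitor)

# Hypothetical messy names: a space, upper case, and two names that differ only by case
raw <- data.frame(`First Name` = "Ana", ID = 1, id = 2, check.names = FALSE)

# clean_names() converts names to snake_case and makes repeated names unique
clean <- clean_names(raw)
colnames(clean)
```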
Which of the following functions can a data analyst use to get a statistical summary of their dataset? Select all that apply. 1. cor() 2. sd() 3. ggplot2() 4. mean()
1. cor() 2. sd() 4. mean()
Programming involves __________ a computer to perform an action or set of actions. 1. instructing 2. updating 3. training 4. filtering
1. instructing
A data analyst writes the following code chunk to return a statistical summary of their dataset: quartet %>% group_by(set) %>% summarize(mean(x), sd(x), mean(y), sd(y), cor(x,y)). Which function will return the average value of the y column? 1. mean(y) 2. mean(x) 3. cor(x,y) 4. sd(x)
1. mean(y)
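The quartet data in this command comes from an add-on package. This sketch substitutes a small hypothetical data frame with the same set, x, and y columns, and uses base R in place of the dplyr pipeline.

```r
# Hypothetical stand-in for `quartet`: two sets with identical x values
quartet_demo <- data.frame(
  set = rep(c("I", "II"), each = 4),
  x   = c(1, 2, 3, 4, 1, 2, 3, 4),
  y   = c(2, 4, 6, 8, 8, 6, 4, 2)
)

# Base R version of group_by(set) %>% summarize(...):
# split the rows by set, then compute the same five statistics for each group
summaries <- lapply(split(quartet_demo, quartet_demo$set), function(d) {
  c(mean_x = mean(d$x), sd_x = sd(d$x),
    mean_y = mean(d$y), sd_y = sd(d$y),
    cor_xy = cor(d$x, d$y))
})

# One row per set, one column per statistic
summary_table <- do.call(rbind, summaries)
print(summary_table)
```

Both sets have a mean_y of 5. However, cor_xy is 1 for set I (y rises with x) and -1 for set II (y falls as x rises). This is why cor(x, y), not mean(y), tells you how strongly two variables are related.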
The bias() function compares the actual outcome of the data with the _______ outcome to determine whether or not the model is biased. 1. predicted 2. desired 3. final 4. probable
1. predicted
Which of the following functions return a summary of the data frame, including the number of columns and rows? Select all that apply. 1. skim_without_charts() 2. clean_names() 3. rename() 4. glimpse()
1. skim_without_charts() 4. glimpse()
Which of the following examples can you use in R for date/time data? Select all that apply. 1. seven-24-2018 2. 2019-04-16 3. 2018-12-21 16:35:28 UTC 4. 06:11:13 UTC
2. 2019-04-16 3. 2018-12-21 16:35:28 UTC 4. 06:11:13 UTC
What do the labs() and annotate() functions do? 1. Choose a geom 2. Display subsets of your data 3. Customize the look and feel of your plots 4. Load a dataset
3. Customize the look and feel of your plots
What are ggplot2, tidyr, dplyr, and forcats all a part of? 1. A collection of commonly used, CRAN-based data sets 2. A collection of core tidyverse packages 3. A list of variables for use in programming in RStudio 4. A list of functions that clean data efficiently
2. A collection of core tidyverse packages
A data analyst creates an interactive version of their R Markdown document to share with other users that allows them to execute code the analyst wrote. What did they create? 1. A code chunk 2. An R notebook 3. A markdown 4. An HTML report
2. An R notebook
What is the role of the x argument in the following code? ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut)) 1. A variable 2. An aesthetic 3. A function 4. A dataset
2. An aesthetic
Which of the following are benefits of using ggplot2? Select all that apply. 1. Automatically clean data before creating a plot 2. Combine data manipulation and visualization 3. Customize the look and feel of your plot 4. Easily add layers to your plot
2. Combine data manipulation and visualization 3. Customize the look and feel of your plot 4. Easily add layers to your plot
Which of the following are best practices for creating data frames? Select all that apply. 1. Rows should be named 2. Each column should contain the same number of data items 3. Columns should be named 4. All data stored should be the same type
2. Each column should contain the same number of data items 3. Columns should be named
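This base R sketch applies these practices to a hypothetical inventory table. Every column is named, every column has the same length, and types differ across columns, which is allowed. The second call shows why equal lengths matter. R cannot build a data frame from columns of length 3 and 2, because 3 is not a multiple of 2.

```r
# Hypothetical inventory table: every column is named and holds three items
inventory <- data.frame(
  item     = c("bolts", "nuts", "screws"),  # character
  count    = c(12L, 40L, 7L),               # integer
  in_stock = c(TRUE, TRUE, FALSE)           # logical: types may differ by column
)
str(inventory)

# Columns of length 3 and 2 cannot line up into rows, so data.frame() raises an error
result <- tryCatch(
  data.frame(item = c("bolts", "nuts", "screws"), count = c(12L, 40L)),
  error = function(e) conditionMessage(e)
)
print(result)
```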
A data analyst wants to convert their R Markdown file into another format. What are their options? Select all that apply. 1. JPEG, PNG, and GIF 2. HTML, PDF, and Word 3. Slide presentation 4. Dashboard
2. HTML, PDF, and Word 3. Slide presentation 4. Dashboard
A data analyst has finished editing their R Markdown file and wants to save it as an HTML report. What tool will they use? 1. Output 2. Knit 3. Save 4. Hashtags
2. Knit
__________ code is freely available and may be modified and shared by the people who use it. 1. Open-syntax 2. Open-source 3. Open-access 4. Open-ended
2. Open-source
A data analyst writes the code summary(penguins) in order to show a summary of the penguins dataset. Where in RStudio can the analyst execute the code? Select all that apply. 1. Files tab 2. R console pane 3. Source editor pane 4. Environment pane
2. R console pane 3. Source editor pane
A data analyst is creating a plot for a presentation to stakeholders. The analyst wants to add a caption to the plot to help communicate important information. What function could the analyst use? 1. The geom_point() function 2. The labs() function 3. The geom_bar() function 4. The facet_wrap() function
2. The labs() function
A data analyst uses the bias() function to compare the actual outcome with the predicted outcome to determine if the model is biased. They get a score of 0.8. What does this mean? 1. Bias cannot be determined 2. The model is biased 3. Bias can be determined 4. The model is not biased
2. The model is biased
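In the course materials, the bias() function comes from an add-on package (SimDesign). The sketch below is a hypothetical base R stand-in. It assumes bias is the average difference between actual and predicted values, and the temperature readings are illustrative. A result near 0 suggests little bias, and the further the value sits from 0, the more biased the predictions.

```r
# Hypothetical stand-in for a package bias() function:
# the average of (actual - predicted) across all observations
simple_bias <- function(actual, predicted) {
  mean(actual - predicted)
}

# Illustrative temperature readings and model predictions
actual_temp    <- c(68.3, 70, 72.4, 71, 67, 70)
predicted_temp <- c(67.9, 69, 71.5, 70, 67, 69)

simple_bias(actual_temp, predicted_temp)  # about 0.717: predictions run low on average
```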
A data analyst is working with a dataset in R that has more than 50,000 observations. Why might they choose to use a tibble instead of the standard data frame? Select all that apply. 1. Tibbles can automatically change the names of variables 2. Tibbles automatically only preview the first 10 rows of data 3. Tibbles can create row names 4. Tibbles automatically only preview as many columns as fit on screen
2. Tibbles automatically only preview the first 10 rows of data 4. Tibbles automatically only preview as many columns as fit on screen
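This sketch shows how a large tibble prints. It assumes the tibble package is installed, and the 50,000-row data frame is synthetic. Printing the tibble shows its dimensions, column types, and only the first 10 rows, so the console is not flooded.

```r
library(tibble)

set.seed(1)  # reproducible synthetic values
big_df <- data.frame(id = 1:50000, value = rnorm(50000))

# Convert to a tibble; printing it shows dimensions, column types, and 10 rows
big_tbl <- as_tibble(big_df)
print(big_tbl)
```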
The R programming language can be used for which of the following tasks? Select all that apply. 1. Gaming 2. Visualization 3. Statistical analysis 4. Data analysis
2. Visualization 3. Statistical analysis 4. Data analysis
A data analyst wants to mark the beginning of their code chunk. What delimiter should they type in their .rmd file? 1. +++{r } 2. ```{r } 3. ***{r } 4. ==={r }
2. ```{r }
A delimiter is a character that marks the beginning and end of ________. 1. an HTML report 2. a data item 3. an .rmd file 4. a command line
2. a data item
The ________ aesthetic makes some points on a plot more transparent, or see-through, than others. 1. fill 2. alpha 3. linetype 4. color
2. alpha
In R, the ______ is information that a function needs to run. 1. comment 2. argument 3. variable 4. operator
2. argument
A data analyst inputs the following command: quartet %>% group_by(set) %>% summarize(mean(x), sd(x), mean(y), sd(y), cor(x,y)). Which of the functions in this command can help them determine how strongly related their variables are? 1. sd(x) 2. cor(x,y) 3. sd(y) 4. mean(y)
2. cor(x,y)
In RStudio, the ________ is where you can find all the data you currently have loaded, and can easily organize and save it. 1. plots pane 2. environment pane 3. R console pane 4. source editor pane
2. environment pane
An analyst is organizing a dataset in RStudio using the following code: arrange(filter(Storage_1, inventory >= 40), count) Which of the following examples is a nested function in the code? 1. inventory 2. filter 3. arrange 4. count
2. filter
A data analyst creates a scatterplot with a lot of data points. It is difficult for the analyst to distinguish the individual points on the plot because they overlap. What function could the analyst use to make the points easier to find? 1. geom_bar() 2. geom_jitter() 3. geom_point() 4. geom_line()
2. geom_jitter()
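This ggplot2 sketch (ggplot2 assumed installed) applies geom_jitter() to the built-in mtcars data. Because cyl and gear take only a few distinct values, geom_point() would stack many cars on the same spot. Jittering adds a small random offset to each point. The width and height values are illustrative choices.

```r
library(ggplot2)

# cyl and gear have few distinct values, so many cars would share a position;
# geom_jitter() nudges each point randomly within the given width and height
p <- ggplot(data = mtcars) +
  geom_jitter(mapping = aes(x = cyl, y = gear), width = 0.2, height = 0.2)
```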
Packages in R include ___________. Select all that apply. 1. visualizations 2. reusable R functions 3. tests for checking your code 4. sample datasets
2. reusable R functions 3. tests for checking your code 4. sample datasets
What should you use to assign a value to a variable in R? 1. A comment 2. An argument 3. An operator 4. A vector
3. An operator
A data analyst adds a section of executable code to their .rmd file so users can execute it and generate the correct output. What is this section of code called? 1. Documentation 2. YAML 3. Code chunk 4. Data plot
3. Code chunk
A data analyst previously created a series of nested functions that carry out multiple operations on some data in R. The analyst wants to complete the same operations but make the code easier to understand for their stakeholders. Which of the following can the analyst use to accomplish this? 1. Argument 2. Vector 3. Pipe 4. Comment
3. Pipe
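This sketch takes the nested call from the Storage_1 question, arrange(filter(Storage_1, inventory >= 40), count), and rewrites it with a pipe. The piped version gives the same result but reads top to bottom. It assumes dplyr is installed and invents a small Storage_1 table with item, inventory, and count columns.

```r
library(dplyr)

# Hypothetical inventory data standing in for Storage_1
Storage_1 <- data.frame(
  item      = c("bolts", "nuts", "screws", "washers"),
  inventory = c(55, 20, 40, 80),
  count     = c(3, 9, 7, 1)
)

# Nested version: filter() runs first, inside arrange()
nested <- arrange(filter(Storage_1, inventory >= 40), count)

# Piped version: the same two steps, written in the order they happen
piped <- Storage_1 %>%
  filter(inventory >= 40) %>%
  arrange(count)

identical(nested, piped)  # TRUE
```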
In ggplot2, what function do you use to map variables in your data to visual features of your plot? 1. The ggplot() function 2. The geom_bar() function 3. The aes() function 4. The geom_point() function
3. The aes() function
A data analyst is working with a large data frame. It contains so many columns that they don't all fit on the screen at once. The analyst wants a quick list of all of the column names to get a better idea of what is in their data. What function should they use? 1. str() 2. head() 3. colnames() 4. mutate()
3. colnames()
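Using the built-in mtcars data frame, this sketch compares colnames() with head() and str() as ways to get a quick look at a wide dataset.

```r
# mtcars is a data frame that ships with R
cols <- colnames(mtcars)    # every column name, as a character vector
print(cols)

first_rows <- head(mtcars)  # the first six rows (pass n = to change the count)
print(first_rows)

str(mtcars)                 # column types plus a preview of each column's values
```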
A data frame is a collection of ___________. 1. data 2. tibbles 3. columns 4. cells
3. columns
Which tidyverse package contains a set of functions, such as select(), that help with data manipulation? 1. ggplot2 2. forcats 3. dplyr 4. readr
3. dplyr
In ggplot2, you can use the ________ function to specify the data frame to use for your plot. 1. labs() 2. geom_point() 3. ggplot() 4. aes()
3. ggplot()
Which tidyverse package is used for data visualization? 1. tidyr 2. readr 3. ggplot2 4. dplyr
3. ggplot2
Which R function can be used to make changes to a data frame? 1. str() 2. head() 3. mutate() 4. colnames()
3. mutate()
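This is a minimal mutate() sketch, assuming dplyr is installed. The sales table and its columns are hypothetical. Here mutate() adds a new column computed from an existing one.

```r
library(dplyr)

# Hypothetical monthly sales by region
sales <- data.frame(region = c("East", "West"), monthly = c(3500, 4200))

# mutate() returns the data frame with a new column added
sales <- mutate(sales, annual = monthly * 12)
print(sales)
```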
A data analyst is working with a data frame named cars. The analyst notices that all the column names in the data frame are capitalized. What code chunk lets the analyst change all the column names to lowercase? 1. rename_with(tolower, cars) 2. rename_with(cars, toupper) 3. rename_with(cars, tolower) 4. rename_with(toupper, cars)
3. rename_with(cars, tolower)
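rename_with() takes the data frame first and the renaming function second. That is why rename_with(cars, tolower) works and rename_with(tolower, cars) does not. The sketch assumes dplyr is installed and uses a hypothetical cars_demo table, because R's built-in cars dataset already has lowercase names.

```r
library(dplyr)

# Hypothetical data frame whose column names are all capitalized
cars_demo <- data.frame(SPEED = c(4, 7), DIST = c(2, 4))

# .data comes first, then the function applied to every column name
cars_lower <- rename_with(cars_demo, tolower)
colnames(cars_lower)  # "speed" "dist"
```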
A data analyst wants to quickly create visualizations and then share them with a teammate. They can use ________ for the analysis. 1. a database 2. structured query language 3. the R programming language 4. a dashboard
3. the R programming language
A data analyst is working with customer information from their company's sales data. The first and last names are in separate columns, but they want to create one column with both names instead. Which of the following functions can they use? 1. select() 2. separate() 3. unite() 4. arrange()
3. unite()
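This sketch applies unite() from tidyr (assumed installed) to a hypothetical customer table. By default, unite() drops the source columns, and sep sets the text placed between the combined values.

```r
library(tidyr)

# Hypothetical customer table with names split across two columns
customers <- data.frame(first_name = c("Ana", "Bo"), last_name = c("Diaz", "Lee"))

# Combine the two columns into full_name, separated by a space;
# first_name and last_name are dropped because remove = TRUE by default
customers_full <- unite(customers, "full_name", first_name, last_name, sep = " ")
customers_full$full_name  # "Ana Diaz" "Bo Lee"
```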
In data analytics, what is CRAN? 1. An R interface that has many of the same functions as RStudio 2. A collection of packages that function together to make analysis in R more efficient 3. A function for finding packages to use for analysis in RStudio 4. A commonly used online archive with R packages and other R resources
4. A commonly used online archive with R packages and other R resources
In ggplot2, which of the following concepts refers to the shape, color, and size of data points in a plot? 1. Geoms 2. Facets 3. Annotations 4. Aesthetics
4. Aesthetics
When working in R, for which part of the data analysis process do analysts use the tidyr package? 1. Data visualization 2. Data calculations 3. Data security 4. Data cleaning
4. Data cleaning
A data analyst wants to make one of the headers in their R Markdown document smaller. What should they include in the markdown text to do this? 1. Backticks 2. Semicolons 3. Spaces 4. Hashtags
4. Hashtags
While formatting their R Markdown document, a data analyst decides to make one of the headers smaller. What do they type into the document to do this? 1. Brackets 2. Backticks 3. Parentheses 4. Hashtags
4. Hashtags
A data analyst inserts some code directly into their R Markdown file so that they can refer to it directly in their write-up. What is this called? 1. R notebook 2. YAML header 3. Markdown 4. Inline code
4. Inline code
When using RStudio, what does the installed.packages() function do? 1. Creates code for analysts to use to edit their packages 2. Installs all available packages for use in an RStudio session 3. Selects the best packages to use based on an analyst's current needs 4. Presents a list of packages currently installed in an RStudio session
4. Presents a list of packages currently installed in an RStudio session
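installed.packages() returns a matrix with one row per package installed in your R library. The lines below list a few package names and check for stats, which ships with R. Installed is not the same as loaded. library() attaches a package to the current session, and .packages() lists the attached ones.

```r
# One row per installed package; the "Package" column holds the names
pkgs <- installed.packages()
head(pkgs[, "Package"])

# stats ships with R, so it is always installed
"stats" %in% pkgs[, "Package"]

# Packages attached to the current session (.packages() returns invisibly, so print it)
print(.packages())
```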
What type of plot will the following code create? ggplot(data = penguins) + geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)) 1. Line diagram 2. Bar chart 3. Boxplot 4. Scatterplot
4. Scatterplot
A data analyst has to create a monthly report for their stakeholders. What can they create to help them save time generating these reports? 1. R notebook 2. .rmd file 3. HTML report 4. Template
4. Template
What function can you use to put a text label inside the grid of your plot to call out specific data points? 1. The aes() function 2. The facet_wrap() function 3. The labs() function 4. The annotate() function
4. The annotate() function
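This ggplot2 sketch (ggplot2 assumed installed) combines the functions from the caption and text-label questions. labs() adds a title and caption, and annotate() places a text label inside the plot grid. The data is the built-in mtcars dataset. The label text and its x/y position are illustrative.

```r
library(ggplot2)

p <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg)) +
  # labs() sets text outside the grid: title, caption, axis titles
  labs(title = "Fuel efficiency vs. weight",
       caption = "Data: built-in mtcars dataset") +
  # annotate() draws a text label at data coordinates inside the grid
  annotate("text", x = 5, y = 30, label = "Heavier cars use more fuel")
```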
Why are tibbles a useful variation of data frames? 1. Tibbles make changing the names of variables easier 2. Tibbles can create row names 3. Tibbles can change the data type of inputs 4. Tibbles make printing easier
4. Tibbles make printing easier
To create bullet points in their output document, a data analyst adds ________ to their R Markdown document. 1. hashtags 2. spaces 3. brackets 4. asterisks
4. asterisks
A data analyst is working with the penguins data. The variable species includes three penguin species: Adelie, Chinstrap, and Gentoo. The analyst wants to create a data frame that only includes the Adelie species. The analyst receives an error message when they run the following code: penguins %>% filter(species <- "Adelie") How can the analyst change the filter() call to correct the error? 1. filter(Adelie == species) 2. filter("Adelie" <- species) 3. filter("Adelie") 4. filter(species == "Adelie")
4. filter(species == "Adelie")
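This sketch shows the corrected filter. It assumes dplyr is installed and uses a small hypothetical stand-in for the penguins data. `<-` assigns a value, while `==` compares values and returns TRUE or FALSE for each row, which is what filter() needs.

```r
library(dplyr)

# Small hypothetical stand-in for the penguins data
penguins_demo <- data.frame(
  species        = c("Adelie", "Gentoo", "Adelie", "Chinstrap"),
  bill_length_mm = c(39.1, 46.1, 40.3, 46.5)
)

# `==` tests each row and filter() keeps the rows where the test is TRUE
adelie_only <- penguins_demo %>%
  filter(species == "Adelie")

nrow(adelie_only)  # 2
```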
Which R function should you use if you want to preview just the first six rows of a data frame? 1. colnames() 2. str() 3. mutate() 4. head()
4. head()
An analyst comes across dates listed as strings in a dataset, for example December 10th, 2020. To convert the strings to a date/time data type, which function should the analyst use? 1. now() 2. lubridate() 3. datetime() 4. mdy()
4. mdy()
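This sketch uses mdy() from the lubridate package, which is assumed to be installed. mdy() expects the parts in month, day, year order. Parsing a spelled-out month name like December assumes an English locale. The result is a Date object.

```r
library(lubridate)

# mdy() reads the string as month, day, year and returns a Date
d <- mdy("December 10th, 2020")
class(d)  # "Date"
print(d)
```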
Markdown is a ________ for formatting plain text files. 1. coding language 2. file application 3. guide 4. syntax
4. syntax
The benefits of using ____________ for data analysis include the ability to quickly process lots of data and create high-quality visualizations. 1. a dashboard 2. a spreadsheet 3. structured query language 4. the R programming language
4. the R programming language