D465 - Data Applications
Array
A collection of values in spreadsheet cells
Advantage of storing code in R
Allows reproducibility and collaboration among analysts.
Advantage of tidyverse
Cohesive data manipulation packages in R.
Fill in the blank The SQL command _____ combines table rows with the same values into summary rows. WITH GROUP BY TABLE ORDER BY
GROUP BY
Which HAVING clause indicates to only retrieve products that have been sold more than 100 times? HAVING COUNT(order_items.product_id) > 100 HAVING COUNT(order_items.product_id) < 100 HAVING (order_items.product_id) > 100 HAVING (order_items.product_id > 100)
HAVING COUNT(order_items.product_id) > 100
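A minimal sketch of GROUP BY combined with HAVING, run through Python's built-in sqlite3 module. The table contents are hypothetical sample data, and the threshold is lowered from 100 to 2 so the tiny dataset produces a result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_id INTEGER, product_id INTEGER)")
# Hypothetical sample data: product 1 is sold three times, product 2 once.
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2)],
)

# GROUP BY collapses rows that share a product_id into one summary row;
# HAVING then filters those summary rows by their aggregate count.
rows = conn.execute(
    """
    SELECT product_id, COUNT(order_items.product_id) AS times_sold
    FROM order_items
    GROUP BY product_id
    HAVING COUNT(order_items.product_id) > 2
    """
).fetchall()
print(rows)
```

WHERE filters individual rows before grouping, while HAVING filters the groups afterwards, which is why the count condition has to live in HAVING.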
Output formats for documents
HTML, PDF, Word (docx), Markdown.
Different JOIN functions in SQL
INNER, LEFT, RIGHT, FULL OUTER JOIN types.
Symbol for comments in R
Pound sign (`#`) precedes comments in R.
Programming languages and use cases
Python: web dev, data science, ML, automation.
Nested function usage
Simplifies operations, improves code readability.
R unique challenges
Steeper learning curve, limited web dev capabilities.
Aggregation
The process of collecting or gathering many separate pieces into a whole
Absolute reference
A reference within a function that is locked so that rows and columns won't change if the function is copied
Knit button in R
Compiles R Markdown into desired output formats.
COUNTIF function in spreadsheets
Counts cells meeting a specified condition.
sample() function for biased data
Draws a random sample from a dataset, which helps avoid sampling bias.
dplyr filter() function
Subset rows based on specific conditions in R.
Data validation process
The process of checking and rechecking the quality of data so that it is complete, accurate, secure and consistent
Delimiter for code chunks
Triple backticks open and close each code chunk in R Markdown; the opening line also names the language in braces, e.g. {r}.
Smoothing line usage
Visual representation of trends in data.
Which of the following queries contain subqueries? Select all that apply. SELECT call FROM recordings ORDER BY call.employee_id, call.start_time; SELECT employee_id FROM employees WHERE department_id IN (SELECT department_id FROM departments WHERE location_id = 1000); SELECT product_name, CASE WHEN price < 10 THEN 'Low price' WHEN price >= 10 AND price < 20 THEN 'Medium price' ELSE 'High price' END AS price_category FROM products; SELECT price FROM sales WHERE price = (SELECT MAX (salary) FROM sales);
SELECT price FROM sales WHERE price = (SELECT MAX (salary) FROM sales); SELECT employee_id FROM employees WHERE department_id IN (SELECT department_id FROM departments WHERE location_id = 1000);
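A sketch of the employees/departments subquery, executed with Python's sqlite3 module. The rows inserted here are hypothetical; only the query shape comes from the card above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE departments (department_id INTEGER, location_id INTEGER);
    CREATE TABLE employees (employee_id INTEGER, department_id INTEGER);
    -- Hypothetical sample data
    INSERT INTO departments VALUES (10, 1000), (20, 2000);
    INSERT INTO employees VALUES (1, 10), (2, 20), (3, 10);
    """
)

# The inner SELECT runs first and yields the department ids at location 1000;
# the outer SELECT then keeps only employees in those departments.
rows = conn.execute(
    """
    SELECT employee_id
    FROM employees
    WHERE department_id IN (SELECT department_id
                            FROM departments
                            WHERE location_id = 1000)
    ORDER BY employee_id
    """
).fetchall()
print(rows)
```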
A spreadsheet cell contains the coldest temperature ever recorded in Austria: -37 degrees Celsius. Which function would convert that to Fahrenheit? =CONVERT(-37, F, C) =CONVERT(-37, C, F) =CONVERT(-37, "F", "C") =CONVERT(-37, "C", "F")
=CONVERT(-37, "C", "F")
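Converting Celsius to Fahrenheit follows F = C × 9/5 + 32, which is the arithmetic CONVERT applies when the start unit is "C" and the end unit is "F". A small Python sketch of the same calculation (the function name is illustrative, not a spreadsheet API):

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Apply F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


# Austria's record low from the card above.
austria_record_f = celsius_to_fahrenheit(-37)
print(round(austria_record_f, 1))
```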
A data analyst at an engineering company calculates the number of spreadsheet rows that contain the value turbine. Which function do they use? =COUNTIF(C1:C100,"turbine") =COUNTIF(C1:C100,turbine) =COUNTIF(C1:C100,"=turbine") =COUNTIF(turbine=C1:C100)
=COUNTIF(C1:C100,"turbine")
Which function will return the number of characters in spreadsheet cell F8 in order to confirm it contains exactly 15 characters? =LEN(F8, 15) =LEN(15) =LEN(F8) =LEN(15, F8)
=LEN(F8)
A data analyst works with a spreadsheet containing product information that often has very long text strings. To check for consistency, they use a function to count the number of characters in cell P12. What is the correct syntax of the function? =LEN(P:12) =LEN(P,12) =LEN(P:P12) =LEN(P12)
=LEN(P12)
Which function will calculate the sum of the products of the corresponding items in the arrays M1:M4 and P1:P4? =SUMPRODUCT(M1:M4, P1:P4) =MULTIPLY(M1:M4, P1:P4) =PRODUCT(M1:M4, P1:P4) =ARRAY(M1:M4, P1:P4)
=SUMPRODUCT(M1:M4, P1:P4)
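As a rough illustration of what SUMPRODUCT computes, the Python sketch below multiplies two equal-length lists pairwise and sums the results. The numbers stand in for M1:M4 and P1:P4 and are hypothetical.

```python
# Hypothetical stand-ins for the ranges M1:M4 and P1:P4.
m_values = [2, 3, 4, 5]
p_values = [10, 20, 30, 40]

# Multiply corresponding items, then add the products together.
sum_product = sum(m * p for m, p in zip(m_values, p_values))
print(sum_product)
```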
GROUP BY
A SQL clause that groups rows that have the same values from a table into summary rows
LIMIT
A SQL clause that specifies the maximum number of records returned in a query
OUTER JOIN
A SQL function that combines RIGHT and LEFT JOIN to return all records from both tables, matched where possible
JOIN
A SQL function that is used to combine rows from two or more tables based on a related column
COUNT DISTINCT
A SQL function that counts only the distinct values in a specified range
ROUND
A SQL function that returns a number rounded to a certain number of decimal places
INNER JOIN
A SQL function that returns records with matching values in both tables
RIGHT JOIN
A SQL function that will return all records from the right table and only the matching records from the left.
LEFT JOIN
A SQL function that will return all the records from the left table and only the matching records from the right table
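The difference between INNER JOIN and LEFT JOIN can be sketched with two small tables in sqlite3. Table names, columns, and rows are hypothetical; RIGHT and FULL OUTER JOIN are left out because older SQLite builds do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books (book_id INTEGER, title TEXT);
    CREATE TABLE biographies (book_id INTEGER, subject TEXT);
    -- Hypothetical sample data: only book 1 appears in both tables.
    INSERT INTO books VALUES (1, 'Book A'), (2, 'Book B');
    INSERT INTO biographies VALUES (1, 'Person X'), (3, 'Person Y');
    """
)

# INNER JOIN keeps only rows with a match in both tables.
inner_rows = conn.execute(
    """
    SELECT books.title, biographies.subject
    FROM books
    INNER JOIN biographies ON books.book_id = biographies.book_id
    ORDER BY books.book_id
    """
).fetchall()

# LEFT JOIN keeps every row of the left table (books) and fills
# unmatched right-side columns with NULL (None in Python).
left_rows = conn.execute(
    """
    SELECT books.title, biographies.subject
    FROM books
    LEFT JOIN biographies ON books.book_id = biographies.book_id
    ORDER BY books.book_id
    """
).fetchall()

print(inner_rows)
print(left_rows)
```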
Subquery
A SQL query that is nested inside a larger query
Temporary table
A database table that is created and exists temporarily on a database server
SUMPRODUCT
A function that multiplies arrays and returns the sum of those products
Calculated field
A new field within a pivot table that carries out certain calculations based on the values of other fields
Profit margin
A percentage that indicates how many cents of profit has been generated for each dollar of sale
VALUE
A spreadsheet function that converts a text string that represents a number to a numeric value
MATCH
A spreadsheet function used to locate the position of a specific lookup value
Summary table
A table used to summarize statistical information about data
Logical operators
AND (&&), OR (||), NOT (!).
Plus sign in ggplot2
Adds layers to ggplot objects for customization.
When working with a temporary table in a SQL database, at what point will the table be automatically deleted? After completing all calculations in the table After running a report from the table After ending the session in the SQL database After running the query in the SQL database
After ending the session in the SQL database
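A sketch of temporary-table lifetime using SQLite, where a connection plays the role of the session. The file path and table name are made up for the demonstration, and other database systems may differ in detail.

```python
import os
import sqlite3
import tempfile

# Hypothetical on-disk database so a second session can reopen it.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

session = sqlite3.connect(db_path)
session.execute("CREATE TEMP TABLE scratch (value INTEGER)")
session.execute("INSERT INTO scratch VALUES (42)")
visible_in_session = session.execute("SELECT COUNT(*) FROM scratch").fetchone()[0]
session.close()  # ending the session discards the temporary table

new_session = sqlite3.connect(db_path)
try:
    new_session.execute("SELECT COUNT(*) FROM scratch")
    survived = True
except sqlite3.OperationalError:
    # SQLite reports a missing table as an OperationalError.
    survived = False
new_session.close()

print(visible_in_session, survived)
```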
What will this query return? SELECT * FROM Books_table LEFT JOIN Biography_table; All records in the biography table and any matching rows from the books table All records in both the books table and the biography table All rows from the books table joined together with the biography table All records in the books table and any matching rows from the biography table
All records in the books table and any matching rows from the biography table
Modulo
An operator (%) that returns the remainder when one number is divided by another
Main operators in R
Arithmetic, relational, logical, assignment operators.
R and Python similarities
Both widely used in data science with extensive libraries.
Fill in the blank: A data professional uses the SQL _____ statement to return records that meet conditions by including an if/then statement in a query. CASE HAVING WHEN CONCAT
CASE
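A short sketch of CASE as an if/then construct inside a query, run with sqlite3. The price bands mirror the price_category example used elsewhere in this set; the product rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, price REAL)")
# Hypothetical sample data
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("pen", 2), ("mug", 12), ("lamp", 35)],
)

# Each WHEN is tested in order; the first true branch supplies the label.
rows = conn.execute(
    """
    SELECT product_name,
           CASE
               WHEN price < 10 THEN 'Low price'
               WHEN price >= 10 AND price < 20 THEN 'Medium price'
               ELSE 'High price'
           END AS price_category
    FROM products
    ORDER BY price
    """
).fetchall()
print(rows)
```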
Which SQL function combines groups of text strings from multiple cells in order to create a new string? CONCAT COMBINE CONSOLIDATE CONNECT
CONCAT
Fill in the blank: The spreadsheet function _____ can be used to tally the number of cells in a range that are not empty. RANGE, COUNT, COUNT DISTINCT, RETURN
COUNT (strictly, COUNT tallies cells that hold numeric values; COUNTA tallies every non-empty cell)
You write a SQL query that will count values in a specified range. Which function should you include in your query to only count each value once, even if it appears multiple times? COUNT RANGE, COUNT DISTINCT, COUNT, COUNT VALUES
COUNT DISTINCT
COUNT vs. COUNT DISTINCT in SQL
COUNT: total rows, COUNT DISTINCT: unique values.
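The contrast can be checked with a single query in sqlite3. The orders table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER)")
# Hypothetical sample data: six orders placed by three different customers.
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [(1,), (1,), (2,), (3,), (3,), (3,)],
)

# COUNT tallies every non-NULL value; COUNT(DISTINCT ...) tallies each value once.
total_count, distinct_count = conn.execute(
    "SELECT COUNT(customer_id), COUNT(DISTINCT customer_id) FROM orders"
).fetchone()
print(total_count, distinct_count)
```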
Fill in the blank: The spreadsheet function _____ returns the number of cells within a range that match a specified value. COUNTIF, ARRAY, COUNT DISTINCT, VALUE
COUNTIF
Which spreadsheet tool finds an average value using values generated within a pivot table? Filter Data validation Conditional formatting Calculated field
Calculated field
A data analyst selects Format Cells and the option Text Is Exactly Baseball. This changes the color of all the cells that contain the word "Baseball." What spreadsheet tool is the analyst using? Conditional formatting Filtering Data validation CONVERT
Conditional formatting
A junior data analyst writes the following formula: =AVERAGE($C$1:$C$100). What are the purposes of the dollar signs ($)? Select all that apply. Average the values in cells C1 to C100 regardless of whether the formula is copied. Ensure rows and columns do not change. Create an absolute reference. Perform the calculation more efficiently.
Create an absolute reference. Ensure rows and columns do not change.
You prepare a project tracker spreadsheet. Next to each project is the name of the team member responsible. What spreadsheet tool will create a drop-down list with team member names to save you time when assigning the projects? Data validation Conditional formatting Pop-up menus Find
Data validation
Fill in the blank: A junior data analyst at a healthcare organization uses the spreadsheet _____ function to locate specific characters from insurance provider account numbers. FROM FIND WHERE IDENTIFY
FIND
Why might a data professional add a CREATE TABLE statement to a temporary table? Include metadata about the data in the table Automate calculations in the table Give multiple people access to the table Create a second table within the temporary table
Give multiple people access to the table
What will GROUP BY do in this query? SELECT apartment, AVG(price) AS apt_prices FROM rent_data GROUP BY apartment; Group together the apartment and rent_data tables Group only the rows in the apt_prices table Group together the rent_data by apartment prices Group the rows in the table by apartment
Group the rows in the table by apartment
A data analyst wants to retrieve only records from a database that have matching values in two different tables. Which JOIN function should they use? OUTER JOIN RIGHT JOIN INNER JOIN LEFT JOIN
INNER JOIN
JOIN commands in SQL
INNER, LEFT, RIGHT, FULL OUTER JOIN types.
Common errors in ggplot2
Incorrect aesthetic mappings, syntax misunderstanding.
When working with subqueries, which query will execute first? Rightmost Outermost Innermost Leftmost
Innermost
What SQL clause can be added to this query to ensure only the first 50 results are returned? SELECT * FROM Leaf_Database WHERE tree_type = 'maple' LIMIT 50 FIRST 50 RETURN 50 ONLY 50
LIMIT 50
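A quick sqlite3 sketch of LIMIT capping the number of returned rows. The leaves table and its 200 rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leaves (leaf_id INTEGER, tree_type TEXT)")
# Hypothetical sample data: 200 maple leaves.
conn.executemany(
    "INSERT INTO leaves VALUES (?, ?)",
    [(i, "maple") for i in range(200)],
)

# LIMIT goes at the end of the query and caps the result at 50 rows.
rows = conn.execute(
    "SELECT * FROM leaves WHERE tree_type = 'maple' LIMIT 50"
).fetchall()
print(len(rows))
```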
Underscores
Lines used to underline words and connect text characters
Tibbles vs. data frames
Modernized data frames with improved features.
A data professional writes a query that uses more than one arithmetic operator. What do they add to the query to control the order of the calculations? Dollar sign ($) Parentheses [()] Colon [:] Forward slash [/]
Parentheses [()]
What data will appear in the temporary table created through this query? WITH plant_variety AS ( SELECT * FROM bigquery-public-data.plants.African_species WHERE daily_growth_rate_percentage = 0.05 ) Plant varieties that grow exactly 0.05 percent per day A random subset of African plant species Plant varieties that are equal to 0.05 inches tall All plant species that exist in the public dataset
Plant varieties that grow exactly 0.05 percent per day
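A sketch of the WITH clause in sqlite3, using a local stand-in for the public dataset. The african_species table and its rows are hypothetical; the final SELECT shows how the named result is then queried like a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE african_species (name TEXT, daily_growth_rate_percentage REAL)"
)
# Hypothetical sample data
conn.executemany(
    "INSERT INTO african_species VALUES (?, ?)",
    [("aloe", 0.05), ("baobab", 0.01), ("protea", 0.05)],
)

# WITH names the filtered result plant_variety for the duration of this query.
rows = conn.execute(
    """
    WITH plant_variety AS (
        SELECT *
        FROM african_species
        WHERE daily_growth_rate_percentage = 0.05
    )
    SELECT name FROM plant_variety ORDER BY name
    """
).fetchall()
print(rows)
```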
Locking table array in VLOOKUP
Prevents range changes for formula accuracy.
Data security
Protecting data from unauthorized access or corruption by adopting safety measures
A data professional runs a query that will return a dataset containing numbers out to five decimal places. Which SQL function will limit the records to two decimal places? LEN NUM LIMIT ROUND
ROUND
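For illustration, ROUND with a second argument of 2 trims a five-decimal value to two decimal places. The input number is arbitrary and the query runs in sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ROUND(value, 2) keeps two digits after the decimal point.
rounded = conn.execute("SELECT ROUND(12.34567, 2)").fetchone()[0]
print(rounded)
```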
Which data-validation menu option highlights data entry errors to ensure spreadsheet formulas continue to run correctly? Reject invalid inputs Forbid entry Deny text Remove validation
Reject invalid inputs
SELECT command in SQL
Retrieves data from one or more database tables.
SELECT statement usage in SQL
Retrieving data from one or more tables.
In a SQL query, what is the purpose of the modulo (%) operator? Return the remainder of a division calculation Convert a decimal to a percent Apply an exponent to a value Find the square root of a number
Return the remainder of a division calculation
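A one-query sketch of the modulo operator in sqlite3, with arbitrary operands: 17 divided by 5 is 3 with 2 left over, so the operator returns 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# % returns the remainder of integer division.
remainder = conn.execute("SELECT 17 % 5").fetchone()[0]
print(remainder)
```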
MIN function in spreadsheets
Returns the smallest value in a cell range.
Pivot table elements
Rows, columns, values, filters for data aggregation.
Fill in the blank: A data analyst uses _____ to copy data from one table into a temporary table without adding the new table to the database. TEMP, COPY TO, WITH, SELECT INTO
SELECT INTO
VLOOKUP function in spreadsheets
Searches for values in a vertical column.
Presentation formats
Slides, Dashboards, Interactive web apps.
FROM statement in SQL
Specifies tables for data retrieval in SQL queries.
You use VLOOKUP in a spreadsheet containing weather data. While searching for rainfall levels in Chicago, you encounter an error because your spreadsheet value has a trailing space after the city name. What function should you use to eliminate this space? CUT TRIM NOSPACE VALUE
TRIM
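The failure mode behind this card can be sketched in Python: an exact-match lookup misses when the key carries a trailing space, and succeeds once the whitespace is normalized. The dictionary and rainfall figure are hypothetical; joining the split pieces approximates spreadsheet TRIM, which removes leading, trailing, and repeated spaces.

```python
# Hypothetical lookup table standing in for the VLOOKUP range.
rainfall_by_city = {"Chicago": 38.2}

raw_key = "Chicago "  # note the trailing space

# Approximate spreadsheet TRIM: drop edge whitespace and collapse repeats.
trimmed_key = " ".join(raw_key.split())

found_before = raw_key in rainfall_by_city
found_after = trimmed_key in rainfall_by_city
print(found_before, found_after)
```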
Aliasing
Temporarily naming a table or column in a query to make it easier to read and write
Data aggregation
The process of gathering data from multiple sources and combining it into a single, summarized collection
What will this spreadsheet function return? =SUMIF(K20:K70, ">=50", L20:L70) The sum of all values in cells L20 to L70 that correspond to values in cells K20 to K70 that are greater than or equal to 50. The sum of any values in cells K20 to K70 and cells L20 to L70 that are greater than or equal to 50. The sum of all values in cells K20 to K70 for which the value in cells L20 to L70 is greater than or equal to 50. The count of the number of cells in the array K20:K70 that have a value greater than or equal to 50.
The sum of all values in cells L20 to L70 that correspond to values in cells K20 to K70 that are greater than or equal to 50.
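A Python sketch of the same logic as SUMIF with a separate sum range: the condition is tested on one list and the matching positions are summed from the other. Both lists are hypothetical stand-ins for K20:K70 and L20:L70.

```python
# Hypothetical stand-ins for the criteria range (K) and the sum range (L).
k_values = [40, 50, 60]
l_values = [1, 2, 3]

# Sum the L value wherever the K value in the same position is >= 50.
conditional_sum = sum(l for k, l in zip(k_values, l_values) if k >= 50)
print(conditional_sum)
```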
Which of the following statements accurately describe pivot tables? Select all that apply. The calculated field in a pivot table is used to apply filters based on specific criteria. The values in a pivot table are used to calculate and count data. A pivot table is a data summarization tool. The rows of a pivot table organize and group data horizontally.
The values in a pivot table are used to calculate and count data. A pivot table is a data summarization tool. The rows of a pivot table organize and group data horizontally.
What is an example of an array in a spreadsheet? All cells with number values Cells D7, E14, and F20 The values in cells B2 through B31 All cells with values greater than 100
The values in cells B2 through B31
A data analyst at a recycling company manually recalculates the new column materials_sorter. They want to identify any rows with values that do not match those in the original column, compost_sorter. Which SQL clauses would enable them to do so? Select all that apply. WHERE materials_sorter !! compost_sorter WHERE materials_sorter >< compost_sorter WHERE materials_sorter <> compost_sorter WHERE materials_sorter != compost_sorter
WHERE materials_sorter <> compost_sorter WHERE materials_sorter != compost_sorter
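Both not-equal operators can be tried side by side in sqlite3. The sorting table and its rows are hypothetical; only the two column names come from the card above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sorting (row_id INTEGER, compost_sorter TEXT, materials_sorter TEXT)"
)
# Hypothetical sample data: only row 2 has mismatched values.
conn.executemany(
    "INSERT INTO sorting VALUES (?, ?, ?)",
    [(1, "bin A", "bin A"), (2, "bin A", "bin B"), (3, "bin C", "bin C")],
)

# <> and != are two spellings of the same "not equal" comparison.
angle_rows = conn.execute(
    "SELECT row_id FROM sorting WHERE materials_sorter <> compost_sorter"
).fetchall()
bang_rows = conn.execute(
    "SELECT row_id FROM sorting WHERE materials_sorter != compost_sorter"
).fetchall()
print(angle_rows, bang_rows)
```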
Functions in ggplot2
ggplot(), geom_point(), geom_line(), aes().
Spreadsheet cell D5 contains the decimal .74. Which formula will convert it to a percentage? =D5%100 =D5,100 =D5(100) =D5*100
=D5*100
Fill in the blank: Aliasing involves _____ naming a table or column to make a query easier to read and write. permanently perpetually temporarily privately
temporarily
Fill in the blank: To copy data from one table into a _____, a data professional uses the SELECT INTO statement. temporary table new table defined function table view
temporary table
Basic aesthetic attributes in ggplot2
x-axis, y-axis, color for plot customization.
Fill in the blank: A data professional uses _____ in order to ensure spreadsheet values are static, rather than carrying over a preexisting formula or function. conditional formatting, formatting, paste values only, data validation
paste values only
Fill in the blank: To combine rows from two or more tables based on a _____ column, data professionals use the SQL JOIN clause. unique dissimilar foreign related
related
Fill in the blank: The _____ of a pivot table organize and group the selected data horizontally. columns rows filters values
rows