Sampling bias in data collection happens when a sample isn't representative of _____.
the population as a whole
A data analyst creates many new tables in their company's database. When the project is complete, the analyst wants to remove the tables so they don't clutter the database. What SQL commands can they use to delete the tables?
DROP TABLE IF EXISTS
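Here is a minimal sketch of that cleanup. Python's built-in `sqlite3` module and an in-memory database stand in for the company database, and the table name `temp_sales` is hypothetical. Because of `IF EXISTS`, running the statement a second time does not raise an error.

```python
import sqlite3

# In-memory database stands in for the company's database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_sales (id INTEGER, amount REAL)")  # hypothetical scratch table

# Remove the scratch table once the project is complete.
conn.execute("DROP TABLE IF EXISTS temp_sales")
# A second run is harmless because of IF EXISTS.
conn.execute("DROP TABLE IF EXISTS temp_sales")

# List any tables that are still in the database.
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining)  # []
```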
Return the length of a string of text by counting the number of characters it contains
LENGTH()/LEN()
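Here is a short sketch using SQLite through Python's `sqlite3` module. SQLite names the function `LENGTH()`. Other dialects, such as SQL Server, and spreadsheets use `LEN()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Count the characters in a text string.
(n,) = conn.execute("SELECT LENGTH('Bogota')").fetchone()
print(n)  # 6
```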
A data analyst is working with product sales data. They import new data into a database. The database recognizes the data for product price as text strings. What SQL function can the analyst use to convert text strings to floats?
CAST
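Here is a minimal sketch of that conversion, assuming the prices were imported into a TEXT column of a hypothetical `products` table. SQLite's floating-point type is `REAL`. Other dialects name it differently; BigQuery, for example, uses `FLOAT64`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table where price was imported as text strings.
conn.execute("CREATE TABLE products (name TEXT, price TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("mug", "12.50"), ("shirt", "20.25")],
)
# CAST converts the text strings to floating-point numbers.
rows = conn.execute(
    "SELECT name, CAST(price AS REAL) AS price_float FROM products ORDER BY name"
).fetchall()
print(rows)  # [('mug', 12.5), ('shirt', 20.25)]
```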
Conditional formatting is a spreadsheet tool that changes how _____ appear when values meet a specific condition.
cells
Documentation is the process of tracking _____ during data cleaning. Select all that apply.
changes, additions, deletions, and errors
In data analytics, _____ describes how well two or more datasets are able to work together.
compatibility
A NULL (missing) value for the item "Number of employees per store" is an example of which aspect of data integrity?
completeness
The degree to which all required measures are known
completeness
The degree to which a set of measures is equivalent across systems
consistency
Which process do data analysts use to make data more organized and easier to read?
data manipulation
Which of the following are limitations that might lead to insufficient data? Select all that apply.
data that updates continually, outdated data, and data from a single source
A data analyst wants to determine the length of a text string by counting the number of characters it contains. They can use the MID function.
false
In SQL databases, what data type refers to a number that contains a decimal?
float
Data _____ refers to the accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle.
integrity
TRIM is a function that removes _____ spaces in data. Select all that apply.
leading, trailing, and repeated
LEN is a function data analysts use to determine the _____ of a text string by counting the number of characters it contains.
length
Margin of error is the _____ amount that the sample results are expected to differ from those of the actual population.
maximum
Making sure data is properly verified is an important part of the data-cleaning process. Which of the following tasks are involved in this verification? Select all that apply.
recheck the data-cleaning effort, manually fix errors in the data, and consider whether the data is credible and appropriate for the project.
A research team runs an experiment to determine if a new security system is more effective than the previous version. What type of results are required for the experiment to be statistically significant?
the results should be real and not caused by random chance
Why is it important for a data analyst to document the evolution of a dataset? Select all that apply.
to recover data-cleaning errors, inform other users of changes, and determine the quality of the data
While cleaning data, documentation is used to track _____. Select all that apply.
changes, deletions, and errors
In which of the following situations would a data analyst use SQL instead of a spreadsheet? Select all that apply.
to work with a huge amount of data. SQL can also quickly pull information from many different sources in a database and record queries and changes throughout a project.
When a data _____ is interrupted, it can result in an incomplete dataset.
transfer
The V in VLOOKUP stands for what?
vertical
A data analyst uses the COUNTIF function to count the number of times a value less than 5 occurs between spreadsheet cells A2 through A100. What is the correct syntax?
=COUNTIF(A2:A100,"<5")
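Python has no COUNTIF, so the sketch below uses a hypothetical helper, `countif_less_than`, that approximates what `=COUNTIF(A2:A100,"<5")` computes. It makes one simplifying assumption: only true numbers are counted, while blanks (`None`) and text are skipped.

```python
def countif_less_than(values, threshold):
    """Approximate =COUNTIF(range, "<threshold"): count numbers below threshold."""
    # Only int/float entries are compared; None (blank) and text are skipped.
    return sum(
        1
        for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool) and v < threshold
    )

column_a = [3, 7, 4.5, None, "n/a", 5, 1]  # hypothetical values for A2:A100
result = countif_less_than(column_a, 5)
print(result)  # 3
```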
Describe the difference between a null and a zero in a dataset.
A null indicates that a value does not exist. A zero is a numerical response.
A car manufacturer wants to learn more about the brand preferences of electric car owners. There are millions of electric car owners in the world. Who should the company survey?
A sample of all electric car owners
Describe the relationship between a text string and a substring.
A text string is a group of characters within a cell. A substring is a smaller subset of that text string.
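As a quick illustration with a hypothetical cell value, the substring is a slice taken from the longer string:

```python
text = "Bogota, Colombia"   # a text string, e.g. the contents of one cell
substring = text[0:6]       # a smaller subset of that string
print(substring)            # Bogota
print(substring in text)    # True
```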
Which of the following tasks can data analysts do using both spreadsheets and SQL? Select all that apply.
Analysts can use SQL and spreadsheets to perform arithmetic, use formulas, and join data
To correct a typo in a database column, where should you insert a CASE statement in a query?
As part of the SELECT clause
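Here is a minimal sketch of a CASE expression in the SELECT clause that returns a corrected value for a misspelled city. The `customers` table and the typo "Bogta" are hypothetical, and SQLite runs through Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Bogota"), (2, "Bogta"), (3, "Lima")],  # "Bogta" is a typo
)

# The CASE expression checks a condition and returns a cleaned value.
rows = conn.execute("""
    SELECT id,
           CASE
               WHEN city = 'Bogta' THEN 'Bogota'
               ELSE city
           END AS city_clean
    FROM customers
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 'Bogota'), (2, 'Bogota'), (3, 'Lima')]
```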
In a survey about a new cleaning product, 75% of respondents report they would buy the product again. The margin of error for the survey is 5%. Based on the margin of error, what percentage range reflects the population's true response?
Between 70% and 80%
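The arithmetic is the observed percentage minus and plus the margin of error. The helper name `margin_range` below is hypothetical.

```python
def margin_range(observed_pct, margin_pct):
    """Return the (low, high) range implied by a margin of error."""
    return observed_pct - margin_pct, observed_pct + margin_pct

print(margin_range(75, 5))  # (70, 80)
```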
In SQL databases, the _____ function can be used to convert data from one datatype to another.
CAST
Convert data from one datatype to another
CAST()
The _____ function can be used to return the first non-null value in a list.
COALESCE
Return the first non-null value in a list
COALESCE()
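Here is a small sketch of COALESCE in SQLite through Python's `sqlite3` module. It checks its arguments from left to right and returns the first one that is not NULL. The `contacts` table and phone numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, mobile TEXT, landline TEXT)")
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", [
    ("Ana", None, "555-0101"),
    ("Ben", "555-0202", "555-0303"),
    ("Cy", None, None),
])
# Fall back from mobile to landline to a placeholder string.
rows = conn.execute("""
    SELECT name, COALESCE(mobile, landline, 'no phone') AS phone
    FROM contacts
    ORDER BY name
""").fetchall()
print(rows)  # [('Ana', '555-0101'), ('Ben', '555-0202'), ('Cy', 'no phone')]
```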
Data analysts use which function to add strings together and create new text strings for unique keys?
CONCAT
Add strings together to create new text strings that can be used as unique keys
CONCAT()
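Here is a sketch of building a unique key from two columns. The SQLite bundled with Python may be older than version 3.44, which added `concat()`, so this uses the standard SQL `||` operator instead. In dialects that provide it, `CONCAT(store_id, '-', order_no)` gives the same result. Table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (store_id TEXT, order_no TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [("S01", "1001"), ("S02", "1001")])

# order_no alone repeats across stores; store_id + order_no is unique.
keys = [k for (k,) in conn.execute(
    "SELECT store_id || '-' || order_no FROM orders ORDER BY store_id"
)]
print(keys)  # ['S01-1001', 'S02-1001']
```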
As part of the data-cleaning process, a data analyst creates a rule to highlight any empty cells in a bright blue color. This is an example of data visualization.
False. This is an example of conditional formatting.
A data analyst determines an appropriate sample size for a survey. They can check their work by making sure the confidence level percentage plus the margin of error percentage add up to 100%
False. The confidence level percentage and margin of error percentage do not have to add up to 100%; they are independent of each other.
What should an analyst do if they do not have the data needed to meet a business objective? Select all that apply.
Gather related data on a small scale and request additional time to find more complete data; perform the analysis by finding and using proxy data from other datasets.
What is the last name of the customer that appears in row 10 of your query result?
Hughes
A data analyst is managing a database of customer information for a retail store. What SQL command can the analyst use to add a new customer to the database?
INSERT INTO
Add new data into a database
INSERT INTO
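Here is a minimal sketch of adding one customer with SQLite through Python's `sqlite3` module. The `customers` table and its columns are hypothetical. The `?` placeholders pass values separately from the SQL text, which is the module's recommended way to supply values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, first_name TEXT, last_name TEXT)"
)
# Add a new customer row.
conn.execute(
    "INSERT INTO customers (customer_id, first_name, last_name) VALUES (?, ?, ?)",
    (101, "Maya", "Lee"),
)
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(count)  # 1
```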
If you have to complete your analysis with insufficient data, how should you address this limitation?
Identify trends with the available data
Pull data from any table in a database
SELECT FROM
Pull data from a specific place in a table, typically a table column
SELECT FROM WHERE
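The sketch below contrasts the two patterns on a hypothetical `customers` table. `SELECT ... FROM` returns the whole table, while adding `WHERE` returns only the rows that meet a condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ana", "Bogota"), ("Ben", "Lima"), ("Cy", "Bogota")],
)

all_rows = conn.execute("SELECT name, city FROM customers").fetchall()  # whole table
bogota = conn.execute(
    "SELECT name FROM customers WHERE city = 'Bogota' ORDER BY name"
).fetchall()  # only rows matching the condition
print(len(all_rows), bogota)  # 3 [('Ana',), ('Cy',)]
```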
A data analyst is analyzing medical data for a health insurance company. The dataset contains billions of rows of data. Which of the following tools will handle the data most efficiently?
SQL
Data analysts usually use _____ to deal with very large datasets.
SQL
_____ can process large amounts of data much more quickly than spreadsheets.
SQL
Which of the following are benefits of using SQL? Select all that apply.
SQL can handle huge amounts of data, can be adapted and used with multiple database programs, and offers powerful tools for cleaning data.
Which of the following SQL functions can data analysts use to clean string variables? Select all that apply.
SUBSTR and TRIM
Return a limited number of characters to create substrings from longer strings of text
SUBSTR()
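Here is a short SQLite sketch of SUBSTR. The arguments are the text, a 1-based start position, and the number of characters. The code string "US-CA-90210" is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Take 2 characters starting at position 4 (positions are 1-based).
(state,) = conn.execute("SELECT SUBSTR('US-CA-90210', 4, 2)").fetchone()
print(state)  # CA
```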
A data analyst wants to find out how many people in Utah have swimming pools. It's unlikely that they can survey every Utah resident. Instead, they survey enough people to be representative of the population. This describes what data analytics concept?
Sample
SQL is a language used to communicate with databases. Like most languages, SQL has dialects. What are the advantages of learning and using standard SQL? Select all that apply.
Standard SQL works with a majority of databases and requires a small number of syntax changes to adapt to other dialects.
To remove leading, trailing, and repeated spaces in data, analysts use the ____ function.
TRIM
Remove leading, trailing, and repeated spaces in data
TRIM()
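The sketch below contrasts two behaviors. Collapsing repeated spaces inside the text is what spreadsheet TRIM does. In many SQL dialects, including SQLite, `TRIM()` removes only leading and trailing spaces. The spreadsheet behavior is approximated in Python with `split()`/`join()`. That is a slight superset, since it also treats tabs and newlines as spaces.

```python
import sqlite3

raw = "  Bogota   Colombia  "

# Spreadsheet-style TRIM: drop leading/trailing spaces and collapse repeats.
sheet_trim = " ".join(raw.split())
print(repr(sheet_trim))  # 'Bogota Colombia'

# SQL TRIM (SQLite here) strips only leading and trailing spaces.
conn = sqlite3.connect(":memory:")
(sql_trim,) = conn.execute("SELECT TRIM(?)", (raw,)).fetchone()
print(repr(sql_trim))  # 'Bogota   Colombia'
```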
A data analyst is cleaning a dataset with inconsistent formats and repeated cases. They use the TRIM function to remove extra spaces from string variables. What other tools can they use for data cleaning? Select all that apply.
remove duplicates, and find and replace
In order to have a high confidence level in a customer survey, what should the sample size accurately reflect?
The entire population
Addresses in the business database that are identified as incorrect when compared to the public postal service database are an example of which aspect of data integrity?
accuracy
The degree of conformity of a measure to a standard or a true value
accuracy
Which of the following principles are key elements of data integrity? Select all that apply.
accuracy, completeness, consistency, and trustworthiness
To evaluate how well two or more data sources work together, data analysts use data mapping.
True
A data analyst is cleaning transportation data for a ride-share company. The analyst converts the data on ride duration from text strings to floats. What does this scenario describe?
Typecasting
_____ refers to the process of converting data from one type to another.
Typecasting
Change existing data in a database
UPDATE
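Here is a minimal sketch of UPDATE in SQLite, fixing a hypothetical typo in a `customers` table. The `WHERE` clause limits which existing rows are changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Bogta"), (2, "Lima")])

# Change only the rows whose city is misspelled.
conn.execute("UPDATE customers SET city = 'Bogota' WHERE city = 'Bogta'")
rows = conn.execute(
    "SELECT customer_id, city FROM customers ORDER BY customer_id"
).fetchall()
print(rows)  # [(1, 'Bogota'), (2, 'Lima')]
```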
A data analyst is cleaning a dataset. They want to confirm that users entered five-digit zip codes correctly by checking the data in a certain spreadsheet column. What would be most helpful as the next step?
Using the field length tool to specify the number of characters in each cell in the column
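The same check can be sketched in Python on hypothetical column values. It assumes the zip codes are stored as text, so leading zeros such as in "02139" survive. The check flags anything that is not exactly five digits.

```python
zip_codes = ["90210", "1234", "02139", "123456"]  # hypothetical column values

# Flag entries that are not exactly five characters or that contain non-digits.
invalid = [z for z in zip_codes if len(z) != 5 or not z.isdigit()]
print(invalid)  # ['1234', '123456']
```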
A data analyst wants to search for a certain value in a column, then return a corresponding piece of information. Which function should they use?
VLOOKUP
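Here is a Python sketch of what an exact-match VLOOKUP does. The hypothetical `vlookup` helper searches the first column of a table from top to bottom and returns the value from the requested column, counted from 1 as in the spreadsheet function. Approximate matching is not modeled. Where VLOOKUP would show #N/A, this sketch raises `KeyError`.

```python
def vlookup(search_key, table, index):
    """Exact-match vertical lookup: return column `index` (1-based) of the first matching row."""
    for row in table:
        if row[0] == search_key:
            return row[index - 1]
    raise KeyError(search_key)  # VLOOKUP would return #N/A here

price_table = [
    ("SKU-1", "Mug", 12.5),
    ("SKU-2", "Shirt", 19.99),
]
print(vlookup("SKU-2", price_table, 3))  # 19.99
```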
The concept of using data integrity principles to ensure measures conform to defined business rules or constraints is the definition of what term?
Validity
To count the total number of spreadsheet values within a specified range, a data analyst uses the _____ function.
COUNTA
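Here is a Python sketch of COUNTA's behavior. It counts every non-empty cell, whether it holds a number, text, or a boolean. In the hypothetical data below, `None` stands for an empty cell.

```python
def counta(cells):
    """Count non-empty cells; None represents an empty cell (assumption)."""
    return sum(1 for c in cells if c is not None)

cells = [12, "Bogota", None, True, 0]
print(counta(cells))  # 4
```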
A data analyst at an e-commerce company is working with a spreadsheet containing last month's sales. The most expensive product their company sells costs $49.99, so they want to quickly confirm that all of the data in the Sales column is $49.99 or less. What function can they use?
COUNTIF
An analyst is working on a project involving customers from Bogota, Colombia. They receive a spreadsheet with 5,000 rows of customer information. What function can they use to confirm that the column for City contains the word Bogota exactly 5,000 times?
COUNTIF
Before analysis, a company collects data from countries that use different date formats. Which of the following updates would improve the data integrity?
Change all of the dates to the same format
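Here is a minimal sketch of standardizing dates with Python's `datetime`. It assumes the source format of each record is known, because a value like 03/04/2024 is ambiguous without it. Every date is rewritten in one format (ISO 8601, YYYY-MM-DD).

```python
from datetime import datetime

# Hypothetical raw dates: US-style MM/DD/YYYY and European-style DD.MM.YYYY.
raw_dates = [("03/14/2024", "%m/%d/%Y"), ("14.03.2024", "%d.%m.%Y")]

# Parse each with its known source format, then write it as YYYY-MM-DD.
iso_dates = [datetime.strptime(text, fmt).strftime("%Y-%m-%d") for text, fmt in raw_dates]
print(iso_dates)  # ['2024-03-14', '2024-03-14']
```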
Remove data from a database
DELETE
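Here is a short SQLite sketch of DELETE on a hypothetical `customers` table. Unlike `DROP TABLE`, it removes rows but keeps the table itself. Without a `WHERE` clause it would remove every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "active"), (2, "test"), (3, "active")],
)
# Remove only the test records.
conn.execute("DELETE FROM customers WHERE status = 'test'")
remaining = [cid for (cid,) in conn.execute(
    "SELECT customer_id FROM customers ORDER BY customer_id"
)]
print(remaining)  # [1, 3]
```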
What is the process of combining two or more datasets into a single dataset?
Data merging
A data analyst at a nonprofit organization is working with a dataset about a summer fundraiser. Although they have a lot of useful data by the end of the month, they recognize that the data is insufficient. So, they decide to wait until the end of the season to begin working with the dataset. Which type of insufficient data does this example describe?
Data that keeps updating
A financial analyst imports a dataset to their computer from a storage device. As it's being imported, the connection is interrupted, which compromises the data. Which of the following processes caused the compromise?
Data transfer
A data analyst uses the SPLIT function to divide a text string around a specified character and put each fragment into a new, separate cell. What is the specified character separating each item called?
Delimiter
What is the term for a character that indicates the beginning or end of a data item, such as a comma?
Delimiter
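Here is Python's `str.split` used as an analogue of the spreadsheet SPLIT function on a hypothetical record. The comma is the delimiter, and each fragment becomes a separate item, much as SPLIT places each fragment in its own cell.

```python
record = "Maya,Lee,Bogota"
fields = record.split(",")  # "," is the delimiter
print(fields)  # ['Maya', 'Lee', 'Bogota']
```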
Data and business objectives might not align for a number of reasons. Which of the following issues can prevent alignment? Select all that apply.
Insufficient data and sampling bias
The _____ function retrieves characters from the middle of the text you supply.
MID
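Here is a Python sketch of MID. The hypothetical helper `mid(text, start_num, num_chars)` uses a 1-based start position like the spreadsheet function and converts it to Python's 0-based slicing.

```python
def mid(text, start_num, num_chars):
    """Return num_chars characters starting at 1-based position start_num."""
    return text[start_num - 1 : start_num - 1 + num_chars]

print(mid("2024-03-14", 6, 2))  # 03
```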
A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Which of the following questions can the analyst use this dataset to address? Select all that apply.
average population of a certain country from 2015 through 2020 and the difference in population between two specific countries in 2018.
Every database has its own formatting, which can cause the data to seem inconsistent. Data analysts use the _____ tool to create a clean and consistent visual appearance for their spreadsheets.
clear formats
A data analyst uses the CASE statement to consider one or more _____, then returns a value.
conditions
What is involved in seeing the big picture when verifying data cleaning? Select all that apply.
consider the business problem, the goal, and the data
A store opening date stored in both MM/DD/YYYY and MM/YY formats is an example of which aspect of data integrity?
consistency
What are the most common processes and procedures handled by data warehousing specialists? Select all that apply.
ensuring data is available, secure, and backed up to prevent loss
Data uniformity is not a process, but the degree to which the data uses the _____ unit of measurement.
same
A predetermined structure that includes a function's required information and its proper placement is called _____.
syntax
For a function to work properly, data analysts must follow each function's predetermined structure. What is this structure called?
syntax
A team of analysts is working on a data analytics project. How could data in a SQL database be more useful to the team than data in spreadsheets? Select all that apply.
they can access the data at the same time, use SQL to interact with the database program, and track changes to SQL queries across the team.
Documenting data-cleaning makes it possible to achieve what goals? Select all that apply.
to be transparent about your process, keep team members on the same page, and demonstrate to project stakeholders that you are accountable
What are some of the benefits of using SQL for analysis? Select all that apply
tracking changes across a team, interacting with database programs, and pulling information from different database sources
What are the most common processes and procedures handled by data engineers? Select all that apply.
transform data into a useful format for analysis; give it a reliable infrastructure; and develop, maintain, and test databases and related systems
Conditional formatting is a spreadsheet tool that changes how cells appear when values meet specific conditions
true
The CAST function can be used to convert the DATE datatype to the DATETIME datatype.
true
Data collected five years ago using technology that is not approved or supported by the business is an example of which aspect of data integrity?
validity