4. Process Data from Dirty to Clean
The V in VLOOKUP stands for what? 1. Visual 2. Vertical 3. Variable 4. Virtual
2. Vertical
What is the term for a character that indicates the beginning or end of a data item, such as a comma? 1. Marker 2. Condition 3. Delimiter 4. Substring
3. Delimiter
Which statistical power is typically considered the minimum for statistical significance? 1. 0.8 or 80% 2. 0.9 or 90% 3. 0.6 or 60% 4. 0.5 or 50%
1. 0.8 or 80%
A data analyst uses the COUNTIF function to count the number of times a value less than 5 occurs between spreadsheet cells A2 through A100. What is the correct syntax? 1. =COUNTIF(A2:A100, "<5") 2. =COUNTIF(A2:A100, >5) 3. =COUNTIF(A2:A100, ">5") 4. =COUNTIF(A2:A100, <5)
1. =COUNTIF(A2:A100, "<5")
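For readers who think in code, the COUNTIF call in this answer can be mirrored in Python. This is only a sketch: the `values` list is made-up sample data standing in for cells A2:A100.

```python
# Hypothetical sample data standing in for spreadsheet cells A2:A100.
values = [3, 7, 1, 5, 9, 4, 0, 12]

# Rough equivalent of =COUNTIF(A2:A100, "<5"):
# count the entries that are strictly less than 5.
count_below_five = sum(1 for v in values if v < 5)

print(count_below_five)
```

Note that the condition is strictly "less than", so a cell containing exactly 5 is not counted.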
To count the total number of spreadsheet values within a specified range, a data analyst uses the _________ function. 1. COUNTA 2. WHOLE 3. TOTAL 4. SUM
1. COUNTA
If you have to complete your analysis with insufficient data, how should you address this limitation? 1. Identify trends with the available data 2. Estimate certain findings based on your best guess 3. Use only the data that supports your desired outcome 4. Create new datasets similar to what's available
1. Identify trends with the available data
A research team runs an experiment to determine if a new security system is more effective than the previous version. What type of results are required for the experiment to be statistically significant? 1. Results that are real and not caused by random chance 2. Results that are inaccurate and should be ignored 3. Results that are hypothetical and in need of more testing 4. Results that are unlikely to occur again
1. Results that are real and not caused by random chance
To remove leading, trailing, and repeated spaces in data, analysts use the _______ function. 1. TRIM 2. MID 3. LEFT 4. RIGHT
1. TRIM
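A rough Python imitation of what TRIM does to a cell may make the three cases (leading, trailing, and repeated spaces) concrete. The function name `trim_like_spreadsheet` and the sample string are illustrative assumptions, not part of any spreadsheet API.

```python
import re

def trim_like_spreadsheet(text: str) -> str:
    """Remove leading and trailing spaces, then collapse runs of spaces to one."""
    stripped = text.strip(" ")               # leading and trailing spaces
    return re.sub(r" {2,}", " ", stripped)   # repeated spaces inside the text

cleaned = trim_like_spreadsheet("   New   York  City ")
print(repr(cleaned))
```

Single spaces between words are kept; only the extra ones are removed.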
What are some of the possible challenges associated with using 100% of a population in data analysis? Select all that apply. 1. Using 100% of a population is not representative 2. Using 100% of a population is time-consuming 3. Using 100% of a population is expensive 4. Using 100% of a population is unreliable
2. Using 100% of a population is time-consuming 3. Using 100% of a population is expensive
At what point during the analysis process does a data analyst use a changelog? 1. While gathering the data 2. While cleaning the data 3. While visualizing the data 4. While reporting the data
2. While cleaning the data
Data _______ refers to the accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle. 1. replication 2. integrity 3. analysis 4. sampling
2. integrity
Sampling bias in data collection happens when a sample isn't representative of ________. 1. the population most affected by the data 2. the population as a whole 3. a subset of the population 4. a dataset about the population
2. the population as a whole
Conditional formatting is a spreadsheet tool that changes how cells appear when values meet specific conditions. 1. True 2. False
1. True
When calculating sample size using an online calculator, it's necessary to input the _______. Select all that apply. 1. confidence level 2. statistical power 3. population size 4. margin of error
1. confidence level 3. population size 4. margin of error
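To show how those three inputs interact, here is a minimal sketch of one widely used sample-size formula (Cochran's formula with a finite population correction). Whether a given online calculator uses exactly this formula is an assumption; the 0.5 default for `proportion` is the usual conservative choice, and the `Z_SCORES` table holds standard normal values for common confidence levels.

```python
import math

# Standard two-sided z-scores for common confidence levels.
Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

def sample_size(population: int, confidence: float, margin_of_error: float,
                proportion: float = 0.5) -> int:
    """Estimate the sample size needed for a survey of a finite population."""
    z = Z_SCORES[confidence]
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)
    # Finite population correction shrinks the requirement for small populations.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

print(sample_size(population=500, confidence=0.95, margin_of_error=0.05))
```

Raising the confidence level or shrinking the margin of error increases the required sample; statistical power is not an input here.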
Which function removes leading, trailing, and repeated spaces in data? 1. TIDY 2. TRIM 3. CROP 4. CUT
2. TRIM
As part of the data-cleaning process, a data analyst creates a rule to highlight any empty cells in a bright blue color. 1. True 2. False
2. False
Reviewing version history is an effective way to view a changelog in SQL. 1. True 2. False
2. False
Documentation is the process of tracking _________ during data cleaning. Select all that apply. 1. additions 2. deletions 3. changes 4. inactivity
1. additions 2. deletions 3. changes
When describing a SUM function, the ________ is =SUM(value 1 through value 2) 1. standard 2. structure 3. syntax 4. script
3. syntax
Which process do data analysts use to make data more organized and easier to read? 1. Data manipulation 2. Data replication 3. Data uniformity 4. Data transfer
1. Data manipulation
A data analyst is cleaning a dataset. They want to confirm that users entered five-digit zip codes correctly by checking the data in a certain spreadsheet column. What would be most helpful as the next step? 1. Changing the column width to fit only five digits 2. Formatting the cells in the column as number 3. Using the MAX function to determine the maximum value in the cells in the column 4. Using the field length tool to specify the number of characters in each cell in the column
4. Using the field length tool to specify the number of characters in each cell in the column
A data analyst wants to search for a certain value in a column, then return a corresponding piece of information. Which function should they use? 1. FIND 2. MATCH 3. VALUE 4. VLOOKUP
4. VLOOKUP
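The idea behind VLOOKUP (search down the first column, then return a value from the same row) can be sketched in Python. The `table` contents and the helper name `vlookup` are hypothetical; the sketch only covers exact matching and returns None where a spreadsheet would show an error.

```python
# Hypothetical table: each row is (product_id, product_name, price).
table = [
    ("P001", "Notebook", 2.50),
    ("P002", "Pen", 1.20),
    ("P003", "Stapler", 7.99),
]

def vlookup(search_key, rows, index):
    """Search the first column top to bottom (vertically) for search_key and
    return the value in column `index` (1-based) of the first matching row."""
    for row in rows:
        if row[0] == search_key:
            return row[index - 1]
    return None  # no match found

print(vlookup("P002", table, 3))
```

As with the spreadsheet function, only the first match is returned, so duplicate keys in the first column can hide later rows.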
If a data analyst is using data that has been _________, the data will lack integrity and the analysis will be faulty. 1. wide 2. clean 3. public 4. compromised
4. compromised
A data analyst uses the CASE statement to consider one or more ______, then returns a value. 1. additions 2. identifications 3. changes 4. conditions
4. conditions
Companies usually create sample sizes before analysts get to see the data. 1. True 2. False
1. True
Describe the difference between a null and a zero in a dataset. 1. A null indicates that a value does not exist. A zero is a numerical response. 2. A null represents a value of zero. A zero represents an empty cell 3. A null signifies invalid data. A zero is missing data 4. A null represents a number with no significance. A zero represents the number zero
1. A null indicates that a value does not exist. A zero is a numerical response.
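The practical difference shows up in aggregates. The sketch below uses Python's built-in sqlite3 module with a made-up `survey` table: respondent b answered 0, while respondent c gave no answer at all (null).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (respondent TEXT, purchases INTEGER)")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?)",
    [("a", 4), ("b", 0), ("c", None)],  # None is stored as SQL NULL
)

# AVG and COUNT(column) skip NULLs but include zeros.
avg_purchases, answered = conn.execute(
    "SELECT AVG(purchases), COUNT(purchases) FROM survey"
).fetchone()

print(avg_purchases, answered)
conn.close()
```

The zero counts as a real response and pulls the average down; the null is simply left out of the calculation.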
Which of the following principles are key elements of data integrity? Select all that apply. 1. Accuracy 2. Selectivity 3. Trustworthiness 4. Consistency
1. Accuracy 3. Trustworthiness 4. Consistency
What are the most common processes and procedures handled by data warehousing specialists? Select all that apply. 1. Ensuring data is backed up to prevent loss 2. Ensuring data is secure 3. Ensuring data is available 4. Ensuring data is properly cleaned
1. Ensuring data is backed up to prevent loss 2. Ensuring data is secure 3. Ensuring data is available
SQL is a language used to communicate with databases. Like most languages, SQL has dialects. What are the advantages of learning and using standard SQL? Select all that apply. 1. Standard SQL works with a majority of databases 2. Standard SQL is much easier to learn than other dialects 3. Standard SQL is automatically translated by databases to other dialects 4. Standard SQL requires a small number of syntax changes to adapt to other dialects
1. Standard SQL works with a majority of databases 4. Standard SQL requires a small number of syntax changes to adapt to other dialects
In order to have a high confidence level in a customer survey, what should the sample size accurately reflect? 1. The entire population 2. The trends from other customer surveys 3. The most valuable members of the population 4. The predictions of stakeholders
1. The entire population
A team of analysts is working on a data analytics project. How could data in a SQL database be more useful to the team than data in spreadsheets? Select all that apply. 1. They can use SQL to interact with the database program 2. They can use SQL to pull information from the database at the same time 3. They can track changes to SQL queries across the team 4. They can use SQL to make working with smaller datasets easier
1. They can use SQL to interact with the database program 2. They can use SQL to pull information from the database at the same time 3. They can track changes to SQL queries across the team
Why is it important for a data analyst to document the evolution of a dataset? Select all that apply. 1. To recover data-cleaning errors 2. To inform other users of changes 3. To identify best practices in the collection of data 4. To determine the quality of the data
1. To recover data-cleaning errors 2. To inform other users of changes 4. To determine the quality of the data
In SQL databases, the ________ function can be used to convert data from one datatype to another. 1. LENGTH 2. SUBSTR 3. CAST 4. TRIM
3. CAST
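A small runnable illustration of CAST, using Python's built-in sqlite3 module. The `rides` table and its values are invented for the example; other databases use the same CAST(expression AS type) shape but may name their types differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Durations were loaded as text, which blocks numeric sorting and arithmetic.
conn.execute("CREATE TABLE rides (ride_id INTEGER, duration TEXT)")
conn.executemany("INSERT INTO rides VALUES (?, ?)", [(1, "12.5"), (2, "7.25")])

# CAST converts each text value to a floating-point number in the result.
rows = conn.execute(
    "SELECT ride_id, CAST(duration AS REAL) FROM rides ORDER BY ride_id"
).fetchall()

print(rows)
conn.close()
```

The stored column is unchanged; the conversion applies only to the query result.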
The _________ function can be used to return non-null values in a list. 1. CAST 2. TRIM 3. COALESCE 4. CONCAT
3. COALESCE
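COALESCE returns the first non-null value among its arguments, which makes it useful for filling gaps. A minimal sketch with Python's sqlite3 module follows; the `products` table and its columns are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, nickname TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Widget", None), ("Gadget", "Gad")],
)

# Use the nickname when it exists; otherwise fall back to the full name.
labels = [
    row[0]
    for row in conn.execute(
        "SELECT COALESCE(nickname, name) FROM products ORDER BY name"
    )
]

print(labels)
conn.close()
```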
What tool can a data analyst use to figure out how many identical errors occur in a dataset? 1. CONFIRM 2. CASE 3. COUNTA 4. COUNT
3. COUNTA
Before analysis, a company collects data from countries that use different date formats. Which of the following updates would improve the data integrity? 1. Leave the dates in their current formats 2. Remove data in an unfamiliar date format 3. Change all of the dates to the same format 4. Organize the data by country
3. Change all of the dates to the same format
A financial analyst imports a dataset to their computer from a storage device. As it's being imported, the connection is interrupted, which compromises the data. Which of the following processes caused the compromise? 1. Data manipulation 2. Data analysis 3. Data transfer 4. Data gathering
3. Data transfer
A data analyst is managing a database of customer information for a retail store. What SQL command can the analyst use to add a new customer to the database? 1. CREATE TABLE IF NOT EXISTS 2. UPDATE 3. INSERT INTO 4. DROP TABLE IF EXISTS
3. INSERT INTO
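For reference, a minimal INSERT INTO sketch run through Python's sqlite3 module. The `customers` table, its columns, and the customer details are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

# INSERT INTO names the table and columns, then supplies one value per column.
conn.execute(
    "INSERT INTO customers (customer_id, name, city) VALUES (?, ?, ?)",
    (1, "Ada Park", "Denver"),
)

rows = conn.execute("SELECT * FROM customers").fetchall()
print(rows)
conn.close()
```

By contrast, UPDATE changes rows that already exist, and the CREATE TABLE and DROP TABLE commands act on whole tables rather than individual records.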
A car manufacturer wants to learn more about the brand preferences of electric car owners. There are millions of electric car owners in the world. Who should the company survey? 1. A sample of all electric car owners 2. A sample of car owners who have owned more than one electric car 3. A sample of car owners who most recently bought an electric car 4. The entire population of electric car owners
1. A sample of all electric car owners
What is involved in seeing the big picture when verifying data cleaning? Select all that apply. 1. Consider the goal 2. Consider the reporting 3. Consider the business problem 4. Consider the data
1. Consider the goal 3. Consider the business problem 4. Consider the data
Making sure data is properly verified is an important part of the data-cleaning process. Which of the following tasks are involved in this verification? Select all that apply. 1. Considering whether the data is credible and appropriate for the project 2. Manually fixing any errors found in the data 3. Rechecking the data-cleaning effort 4. Asking stakeholders to check and confirm the data is clean
1. Considering whether the data is credible and appropriate for the project 2. Manually fixing any errors found in the data 3. Rechecking the data-cleaning effort
Which of the following are limitations that might lead to insufficient data? Select all that apply. 1. Data from a single source 2. Outdated data 3. Data that updates continually 4. Duplicate data
1. Data from a single source 2. Outdated data 3. Data that updates continually
A data analyst is working on a project about the global supply chain. They have a dataset with lots of relevant data from Europe and Asia. However, they decide to generate new data that represents all continents. What type of insufficient data does this scenario describe? 1. Data that's geographically limited 2. Data that keeps updating 3. Data that's outdated 4. Data from only one source
1. Data that's geographically limited
Documenting data-cleaning makes it possible to achieve what goals? Select all that apply. 1. Demonstrate to project stakeholders that you are accountable 2. Be transparent about your process 3. Visualize the results of your data analysis 4. Keep team members on the same page
1. Demonstrate to project stakeholders that you are accountable 2. Be transparent about your process 4. Keep team members on the same page
What are the most common processes and procedures handled by data engineers? Select all that apply. 1. Developing, maintaining, and testing databases and related systems 2. Verifying results of data analysis 3. Giving data a reliable infrastructure 4. Transforming data into a useful format for analysis
1. Developing, maintaining, and testing databases and related systems 3. Giving data a reliable infrastructure 4. Transforming data into a useful format for analysis
A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Which of the following questions can the analyst use this dataset to address? Select all that apply. 1. What was the difference in population between two specific countries in 2018? 2. What was the reason for the population increase in a certain country? 3. What was the effect of migration on the population of a certain country? 4. What was the average population of a certain country from 2015 through 2020?
1. What was the difference in population between two specific countries in 2018? 4. What was the average population of a certain country from 2015 through 2020?
In data analytics, _________ describes how well two or more datasets are able to work together. 1. compatibility 2. suitability 3. alignment 4. agreement
1. compatibility
While cleaning data, documentation is used to track ________. Select all that apply. 1. deletions 2. bias 3. errors 4. changes
1. deletions 3. errors 4. changes
Data mapping is the process of _______ fields from one data source to another. 1. matching 2. extracting 3. linking 4. merging
1. matching
Describe the relationship between a text string and a substring. 1. A text string is a column of data within a table. A substring is one cell within that column 2. A text string is a group of characters within a cell. A substring is a smaller subset of that text string 3. A text string is the list of attributes at the top of columns within a table. A substring is a single attribute within that list 4. A text string is a row of data within a table. A substring is one cell within that row
2. A text string is a group of characters within a cell. A substring is a smaller subset of that text string
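A quick Python illustration of the relationship, using an invented cell value: slicing pulls substrings out of the full text string.

```python
# Hypothetical cell contents: a state code and a zip code joined by a hyphen.
text_string = "CO-80202"

# Each slice is a substring, a smaller subset of the full text string.
state_code = text_string[:2]   # first two characters
zip_code = text_string[3:]     # everything after the hyphen

print(state_code, zip_code)
```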
What is one potential problem associated with data manipulation that analysts must be aware of? 1. Data manipulation can make a dataset easier to read 2. Data manipulation can introduce errors 3. Data manipulation can help organize a dataset 4. Data manipulation can separate a dataset among different locations
2. Data manipulation can introduce errors
What is the process of combining two or more datasets into a single dataset? 1. Data validation 2. Data merging 3. Data transferring 4. Data composition
2. Data merging
A data analyst uses the SPLIT function to divide a text string around a specified character and put each fragment into a new, separate cell. What is the specified character separating each item called? 1. Substring 2. Delimiter 3. Partition 4. Unit
2. Delimiter
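Python's `str.split` works on the same principle as the spreadsheet SPLIT function, so it makes a convenient sketch. The cell value below is made up for the example.

```python
# Hypothetical cell contents with a comma separating each item.
cell_value = "Denver,Colorado,80202"
delimiter = ","

# Splitting around the delimiter yields one fragment per item;
# the delimiter itself does not appear in the results.
fragments = cell_value.split(delimiter)

print(fragments)
```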
A data analyst determines an appropriate sample size for a survey. They can check their work by making sure the confidence level percentage plus the margin of error percentage add up to 100%. 1. True 2. False
2. False
A data analyst wants to determine the length of a text string by counting the number of characters it contains. They can use the MID function. 1. True 2. False
2. False
SQL and spreadsheets process large amounts of data at the same speed. 1. True 2. False
2. False
Sometimes during analysis, an analyst discovers that it's necessary to adjust the business objective. When this happens, the analyst should take the initiative to do so without involving others in order to be respectful of their time. 1. True 2. False
2. False
To evaluate how well two or more data sources work together, data analysts use data mapping. 1. True 2. False
2. False
Verification and reporting come directly before the data-cleaning process. 1. True 2. False
2. False
When gathering data through a survey, companies can save money by surveying 100% of a population. 1. True 2. False
2. False
Which of the following tasks can data analysts do using both spreadsheets and SQL? Select all that apply. 1. Process huge amounts of data efficiently 2. Perform arithmetic 3. Use formulas 4. Join data
2. Perform arithmetic 3. Use formulas 4. Join data
Which of the following are benefits of using SQL? Select all that apply. 1. SQL can be used to program microprocessors on database servers 2. SQL can be adapted and used with multiple database programs 3. SQL can handle huge amounts of data 4. SQL offers powerful tools for cleaning data
2. SQL can be adapted and used with multiple database programs 3. SQL can handle huge amounts of data 4. SQL offers powerful tools for cleaning data
What are some of the benefits of using SQL for analysis? Select all that apply. 1. SQL has built-in functionalities 2. SQL interacts with database programs 3. SQL tracks changes across a team 4. SQL can pull information from different database sources
2. SQL interacts with database programs 3. SQL tracks changes across a team 4. SQL can pull information from different database sources
Which of the following SQL functions can data analysts use to clean string variables? Select all that apply. 1. LENGTH 2. SUBSTR 3. COUNTIF 4. TRIM
2. SUBSTR 4. TRIM
Conditional formatting is a spreadsheet tool that changes how cells appear when values meet a specific condition. Data analysts can use conditional formatting to do which of the following tasks? Select all that apply. 1. To sort data in series of cells into a meaningful order 2. To identify blank cells or missing information 3. To calculate mathematical equations 4. To make cells stand out for more efficient analysis
2. To identify blank cells or missing information 4. To make cells stand out for more efficient analysis
Data _______ is a cleaning feature to check the accuracy and quality of data before adding or importing it. 1. security 2. validation 3. governance 4. mapping
2. validation
In a survey about a new cleaning product, 75% of respondents report they would buy the product again. The margin of error for the survey is 5%. Based on the margin of error, what percentage range reflects the population's true response? 1. Between 70% and 75% 2. Between 75% and 80% 3. Between 70% and 80% 4. Between 73% and 78%
3. Between 70% and 80%
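The arithmetic behind this answer is the sample result minus and plus the margin of error. A tiny sketch follows; the helper name `response_range` is an assumption.

```python
def response_range(sample_pct: float, margin_pct: float) -> tuple:
    """Return the (low, high) range implied by a margin of error."""
    return (sample_pct - margin_pct, sample_pct + margin_pct)

low, high = response_range(75, 5)
print(low, high)
```

The margin applies in both directions, which is why the one-sided ranges in the other options are incorrect.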
What should an analyst do if they do not have the data needed to meet a business objective? Select all that apply. 1. Continue with the analysis using data from less reliable sources 2. Create and use hypothetical data that aligns with analysis predictions 3. Gather related data on a small scale and request additional time to find more complete data 4. Perform the analysis by finding and using proxy data from other datasets
3. Gather related data on a small scale and request additional time to find more complete data 4. Perform the analysis by finding and using proxy data from other datasets
A data analyst is cleaning a dataset with inconsistent formats and repeated cases. They use the TRIM function to remove extra spaces from string variables. What other tools can they use for data cleaning? Select all that apply. 1. Import data 2. Protect sheet 3. Remove duplicates 4. Find and replace
3. Remove duplicates 4. Find and replace
A data analyst wants to find out how many people in Utah have swimming pools. It's unlikely that they can survey every Utah resident. Instead, they survey enough people to be representative of the population. This describes what data analytics concept? 1. Statistical significance 2. Confidence level 3. Sample 4. Margin of error
3. Sample
A data analyst is cleaning transportation data for a ride-share company. The analyst converts the data on ride duration from text strings to floats. What does this scenario describe? 1. Visualizing 2. Calculating 3. Typecasting 4. Processing
3. Typecasting
Every database has its own formatting, which can cause the data to seem inconsistent. Data analysts use the __________ tool to create a clean and consistent visual appearance for their spreadsheets. 1. autocorrect 2. spellcheck 3. clear formats 4. conditional formatting
3. clear formats
Margin of error is the _______ amount that the sample results are expected to differ from those of the actual population. 1. median 2. average 3. maximum 4. minimum
3. maximum
To correct a typo in a database column, where should you insert a CASE statement in a query? 1. As an ORDER BY clause 2. As a GROUP BY clause 3. As a FROM clause 4. As a SELECT clause
4. As a SELECT clause
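A sketch of a CASE statement placed in the SELECT list, run with Python's sqlite3 module. The `customers` table and the misspelled name "Tnoy" are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, first_name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Tnoy"), (2, "Maria")]
)

# CASE checks each row's value and returns a corrected one when the
# condition matches; all other values pass through unchanged.
rows = conn.execute(
    """
    SELECT customer_id,
           CASE WHEN first_name = 'Tnoy' THEN 'Tony'
                ELSE first_name
           END AS cleaned_name
    FROM customers
    ORDER BY customer_id
    """
).fetchall()

print(rows)
conn.close()
```

The typo is corrected in the query result only; the underlying table still holds the original value.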
Data analysts use which function to add strings together and create new text strings for unique keys? 1. CAST 2. LENGTH 3. TRIM 4. CONCAT
4. CONCAT
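To see string concatenation build a unique key, here is a sketch using Python's sqlite3 module. One substitution to note: it uses SQLite's `||` operator rather than CONCAT, because older SQLite versions do not provide a CONCAT function. In dialects that do, CONCAT(customer_id, '-', product_code) plays the same role. The `orders` table, its values, and the hyphen separator are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, product_code TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [("C12", "P7"), ("C12", "P9")]
)

# Joining two columns produces a key that distinguishes rows
# which share a customer_id.
keys = [
    row[0]
    for row in conn.execute(
        "SELECT customer_id || '-' || product_code AS order_key "
        "FROM orders ORDER BY product_code"
    )
]

print(keys)
conn.close()
```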
A data analyst at a nonprofit organization is working with a dataset about a summer fundraiser. Although they have a lot of useful data by the end of the month, they recognize that the data is insufficient. So, they decide to wait until the end of the season to begin working with the dataset. Which type of insufficient data does this example describe? 1. Outdated data 2. Data from only one source 3. Geographically limited data 4. Data that keeps updating
4. Data that keeps updating
