Sampling bias in data collection happens when a sample isn't representative of _____.

the population as a whole

A data analyst creates many new tables in their company's database. When the project is complete, the analyst wants to remove the tables so they don't clutter the database. What SQL commands can they use to delete the tables?

DROP TABLE IF EXISTS
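A minimal sketch of this command, run through Python's built-in `sqlite3` module so it is self-contained. The table name `temp_sales` is hypothetical, and SQLite is only one of many dialects that accept this statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical scratch table created during a project.
conn.execute("CREATE TABLE temp_sales (id INTEGER, amount REAL)")

# IF EXISTS keeps the statement from failing when the table is already gone.
conn.execute("DROP TABLE IF EXISTS temp_sales")
conn.execute("DROP TABLE IF EXISTS temp_sales")  # second call is a no-op, not an error

remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining)
```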

Return the length of a string of text by counting the number of characters it contains

LENGTH()/LEN()

A data analyst is working with product sales data. They import new data into a database. The database recognizes the data for product price as text strings. What SQL function can the analyst use to convert text strings to floats?

CAST
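As an illustration, the sketch below stores prices as text and converts them in the query. The `products` table and its values are made up. SQLite names its floating-point type `REAL`; other dialects may spell it `FLOAT` or `FLOAT64`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table where price was imported as text.
conn.execute("CREATE TABLE products (name TEXT, price TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("widget", "19.99"), ("gadget", "5.50")],
)

# CAST converts each text string to a floating-point number.
rows = conn.execute(
    "SELECT name, CAST(price AS REAL) FROM products ORDER BY name"
).fetchall()
print(rows)
```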

Conditional formatting is a spreadsheet tool that changes how _____ appear when values meet a specific condition.

cells

Documentation is the process of tracking _____ during data cleaning. Select all that apply.

changes, additions, deletions, and errors

In data analytics, _____ describes how well two or more datasets are able to work together.

compatibility

NULL/missing value for the item "Number of employees per store" is an example of

completeness

The degree to which all required measures are known

completeness

The degree to which a set of measures is equivalent across systems

consistency

Which process do data analysts use to make data more organized and easier to read?

data manipulation

Which of the following are limitations that might lead to insufficient data? Select all that apply.

data that updates continually, outdated data, and data from a single source

A data analyst wants to determine the length of a text string by counting the number of characters it contains. They can use the MID function.

false

In SQL databases, what data type refers to a number that contains a decimal?

float

Data _____ refers to the accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle.

integrity

TRIM is a function that removes _____ spaces in data. Select all that apply

leading, trailing, and repeated

LEN is a function data analysts use to determine the _____ of a text string by counting the number of characters it contains.

length

Margin of error is the _____ amount that the sample results are expected to differ from those of the actual population

maximum

Making sure data is properly verified is an important part of the data-cleaning process. Which of the following tasks are involved in this verification? Select all that apply.

recheck the data-cleaning effort, manually fix errors in the data, and consider whether the data is credible and appropriate for the project.

A research team runs an experiment to determine if a new security system is more effective than the previous version. What type of results are required for the experiment to be statistically significant?

the results should be real and not caused by random chance

Why is it important for a data analyst to document the evolution of a dataset? Select all that apply

to recover data-cleaning errors, inform other users of changes, and determine the quality of the data

While cleaning data, documentation is used to track _____. Select all that apply.

changes, deletions, and errors

In which of the following situations would a data analyst use SQL instead of a spreadsheet? Select all that apply.

to work with a huge amount of data. SQL can also quickly pull information from many different sources in a database and record queries and changes throughout a project.

When a data _____ is interrupted, it can result in an incomplete dataset

transfer

The V in VLOOKUP stands for what?

vertical

A data analyst uses the COUNTIF function to count the number of times a value less than 5 occurs between spreadsheet cells A2 through A100. What is the correct syntax?

=COUNTIF(A2:A100,"<5")
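For readers who want to see the same logic outside a spreadsheet, here is a rough Python equivalent of counting values below 5. The list is a made-up stand-in for the cell range.

```python
# Hypothetical stand-in for the values in a spreadsheet range.
values = [1, 7, 4, 5, 2, 9]

# Count only the entries that satisfy the condition "< 5".
count_below_5 = sum(1 for v in values if v < 5)
print(count_below_5)
```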

Describe the difference between a null and a zero in a dataset.

A null indicates that a value does not exist. A zero is a numerical response.

A car manufacturer wants to learn more about the brand preferences of electric car owners. There are millions of electric car owners in the world. Who should the company survey?

A sample of all electric car owners

Describe the relationship between a text string and a substring.

A text string is a group of characters within a cell. A substring is a smaller subset of that text string.

Which of the following tasks can data analysts do using both spreadsheets and SQL? Select all that apply.

Analysts can use SQL and spreadsheets to perform arithmetic, use formulas, and join data

To correct a typo in a database column, where should you insert a CASE statement in a query?

In the SELECT clause
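A small sketch of that placement, using Python's `sqlite3` module. The `customers` table and the misspelled name are invented for illustration. The CASE expression sits among the selected columns and leaves the stored data untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table containing one misspelled first name.
conn.execute("CREATE TABLE customers (id INTEGER, first_name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Tnoy"), (2, "Maria")],
)

query = """
SELECT id,
       CASE
           WHEN first_name = 'Tnoy' THEN 'Tony'  -- correct the known typo
           ELSE first_name                       -- keep every other value
       END AS cleaned_name
FROM customers
ORDER BY id
"""
cleaned = conn.execute(query).fetchall()
print(cleaned)
```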

In a survey about a new cleaning product, 75% of respondents report they would buy the product again. The margin of error for the survey is 5%. Based on the margin of error, what percentage range reflects the population's true response?

Between 70% and 80%
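The arithmetic behind that range, as a short Python sketch: subtract the margin of error from the sample result for the lower bound and add it for the upper bound.

```python
sample_pct = 75.0  # share of respondents who would buy again
margin = 5.0       # margin of error, in percentage points

low = sample_pct - margin
high = sample_pct + margin
print(f"Between {low:.0f}% and {high:.0f}%")
```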

In SQL databases, the _____ function can be used to convert data from one datatype to another

CAST

Convert data from one datatype to another

CAST()

The _____ function can be used to return non-null values in a list.

COALESCE

Return non-null values in a list

COALESCE()
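A sketch of COALESCE picking the first non-null value from a list of columns, run in SQLite through Python. The `contacts` table, its columns, and the `'unknown'` fallback are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with optional phone numbers (None becomes NULL).
conn.execute("CREATE TABLE contacts (name TEXT, mobile TEXT, landline TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        ("Ana", None, "555-0100"),
        ("Ben", "555-0199", "555-0101"),
        ("Cy", None, None),
    ],
)

# COALESCE returns the first argument that is not NULL.
phones = conn.execute(
    "SELECT name, COALESCE(mobile, landline, 'unknown') "
    "FROM contacts ORDER BY name"
).fetchall()
print(phones)
```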

Data analysts use which function to add strings together and create new text strings for unique keys?

CONCAT

Add strings together to create new text strings that can be used as unique keys

CONCAT()
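A sketch of building a key by concatenation. Note the assumption: this example runs in SQLite, where the `||` operator is the portable way to join strings, so it stands in for `CONCAT()` here. The `items` table is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the product name alone is not unique.
conn.execute("CREATE TABLE items (product TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("couch", "grey"), ("couch", "blue")],
)

# Joining product and color yields a string that can serve as a unique key.
# In dialects that provide it, CONCAT(product, '-', color) plays the same role.
keys = [
    row[0]
    for row in conn.execute(
        "SELECT product || '-' || color AS item_key FROM items ORDER BY item_key"
    )
]
print(keys)
```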

As part of the data-cleaning process, a data analyst creates a rule to highlight any empty cells in a bright blue color. This is an example of data visualization.

False

A data analyst determines an appropriate sample size for a survey. They can check their work by making sure the confidence level percentage plus the margin of error percentage add up to 100%

False. The confidence level percentage and margin of error percentage do not have to add up to 100%. They are independent of each other.

What should an analyst do if they do not have the data needed to meet a business objective? Select all that apply.

Gather related data on a small scale and request additional time to find more complete data, Perform the analysis by finding and using proxy data from other datasets.

What is the last name of the customer that appears in row 10 of your query result?

Hughes

A data analyst is managing a database of customer information for a retail store. What SQL command can the analyst use to add a new customer to the database?

INSERT INTO

Add new data into a database

INSERT INTO
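A minimal sketch of adding one row, using SQLite through Python. The `customers` table, its columns, and the sample customer are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customer table.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

# INSERT INTO names the table and columns, then supplies one value per column.
conn.execute(
    "INSERT INTO customers (customer_id, name, city) VALUES (?, ?, ?)",
    (1, "A. Rivera", "Bogota"),
)

added = conn.execute("SELECT customer_id, name, city FROM customers").fetchall()
print(added)
```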

If you have to complete your analysis with insufficient data, how should you address this limitation?

Identify trends with the available data

Pull data from any table in a database

SELECT FROM

Pull data from a specific place in a table, typically a table column

SELECT FROM WHERE
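A sketch of how the three keywords fit together: SELECT names the columns, FROM names the table, and WHERE filters the rows. The `orders` table and its values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical orders table.
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "West", 20.0), (2, "East", 35.5), (3, "West", 12.25)],
)

# Pull only the order_id column, only for rows where region is 'West'.
west_ids = [
    row[0]
    for row in conn.execute(
        "SELECT order_id FROM orders WHERE region = 'West' ORDER BY order_id"
    )
]
print(west_ids)
```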

A data analyst is analyzing medical data for a health insurance company. The dataset contains billions of rows of data. Which of the following tools will handle the data most efficiently?

SQL

Data analysts usually use _____ to deal with very large datasets.

SQL

_____ can process large amounts of data much more quickly than spreadsheets

SQL

Which of the following are benefits of using SQL? Select all that apply.

SQL can handle huge amounts of data, can be adapted and used with multiple database programs, and offers powerful tools for cleaning data.

Which of the following SQL functions can data analysts use to clean string variables? Select all that apply

SUBSTR and TRIM

Return a limited number of characters to create substrings from longer strings of text

SUBSTR()
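A short sketch of SUBSTR in SQLite, called from Python. The arguments are the string, a starting position (counted from 1), and the number of characters to return. The sample value is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Take 2 characters starting at position 1 of a hypothetical code.
state = conn.execute("SELECT SUBSTR(?, 1, 2)", ("CA-90210",)).fetchone()[0]
print(state)
```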

A data analyst wants to find out how many people in Utah have swimming pools. It's unlikely that they can survey every Utah resident. Instead, they survey enough people to be representative of the population. This describes what data analytics concept?

Sample

SQL is a language used to communicate with databases. Like most languages, SQL has dialects. What are the advantages of learning and using standard SQL? Select all that apply.

Standard SQL works with a majority of databases and requires a small number of syntax changes to adapt to other dialects.

To remove leading, trailing, and repeated spaces in data, analysts use the ____ function.

TRIM

Remove leading, trailing, and repeated spaces in data

TRIM()
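A rough Python imitation of that behavior, written as a sketch rather than any tool's actual implementation. It drops leading and trailing spaces and collapses repeated spaces to one. Be aware that TRIM in some SQL dialects (SQLite among them) strips only the ends of a string. The helper name `trim_spaces` is hypothetical.

```python
def trim_spaces(text: str) -> str:
    """Remove leading, trailing, and repeated space characters."""
    # Splitting on " " leaves empty strings wherever spaces were doubled
    # or sat at the ends; filtering them out collapses the extra spaces.
    return " ".join(part for part in text.split(" ") if part)


cleaned_city = trim_spaces("  Bogota   Colombia ")
print(repr(cleaned_city))
```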

A data analyst is cleaning a dataset with inconsistent formats and repeated cases. They use the TRIM function to remove extra spaces from string variables. What other tools can they use for data cleaning? Select all that apply

TRIM, remove duplicates, and find and replace for data cleaning

In order to have a high confidence level in a customer survey, what should the sample size accurately reflect?

The entire population

Addresses in the business database that are identified as incorrect when compared to the public postal service database are an example of

accuracy

The degree of conformity of a measure to a standard or a true value

accuracy

Which of the following principles are key elements of data integrity? Select all that apply.

accuracy, completeness, consistency, and trustworthiness

To evaluate how well two or more data sources work together, data analysts use data mapping.

True

A data analyst is cleaning transportation data for a ride-share company. The analyst converts the data on ride duration from text strings to floats. What does this scenario describe?

Typecasting

_____ refers to the process of converting data from one type to another.

Typecasting

Change existing data in a database

UPDATE
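A minimal sketch of changing an existing value, using SQLite through Python. The table and rows are hypothetical. The WHERE clause matters: without it, UPDATE would change every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with one misspelled city.
conn.execute("CREATE TABLE customers (customer_id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Bogota"), (2, "Bogta")],
)

# Change only the row that needs fixing.
conn.execute("UPDATE customers SET city = 'Bogota' WHERE customer_id = 2")

cities = conn.execute(
    "SELECT customer_id, city FROM customers ORDER BY customer_id"
).fetchall()
print(cities)
```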

A data analyst is cleaning a dataset. They want to confirm that users entered five-digit zip codes correctly by checking the data in a certain spreadsheet column. What would be most helpful as the next step?

Using the field length tool to specify the number of characters in each cell in the column

A data analyst wants to search for a certain value in a column, then return a corresponding piece of information. Which function should they use?

VLOOKUP
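To make the idea concrete, here is a Python sketch of an exact-match vertical lookup: scan the first column for the search value, then return the value from another column in the same row. The function name `vlookup` and the sample table are hypothetical, and spreadsheet VLOOKUP has options (such as approximate matching) that this sketch does not cover.

```python
def vlookup(search_key, rows, col_index):
    """Return the value in the 1-based col_index of the first row
    whose first column equals search_key, or None if nothing matches."""
    for row in rows:
        if row[0] == search_key:
            return row[col_index - 1]
    return None


# Hypothetical product table: (product ID, name, price).
table = [
    ("P-100", "Chair", 49.99),
    ("P-200", "Desk", 129.0),
]

price = vlookup("P-200", table, 3)
print(price)
```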

The concept of using data integrity principles to ensure measures conform to defined business rules or constraints is the definition of what term?

Validity

To count the total number of spreadsheet values within a specified range, a data analyst uses the _____ function.

COUNTA

A data analyst at an e-commerce company is working with a spreadsheet containing last month's sales. The most expensive product their company sells costs $49.99, so they want to quickly confirm that all of the data in the Sales column is $49.99 or less. What function can they use?

COUNTIF

An analyst is working on a project involving customers from Bogota, Colombia. They receive a spreadsheet with 5,000 rows of customer information. What function can they use to confirm that the column for City contains the word Bogota exactly 5,000 times?

COUNTIF

Before analysis, a company collects data from countries that use different date formats. Which of the following updates would improve the data integrity?

Change all of the dates to the same format

Remove data from a database

DELETE

What is the process of combining two or more datasets into a single dataset?

Data merging

A data analyst at a nonprofit organization is working with a dataset about a summer fundraiser. Although they have a lot of useful data by the end of the month, they recognize that the data is insufficient. So, they decide to wait until the end of the season to begin working with the dataset. Which type of insufficient data does this example describe?

Data that keeps updating

A financial analyst imports a dataset to their computer from a storage device. As it's being imported, the connection is interrupted, which compromises the data. Which of the following processes caused the compromise?

Data transfer

A data analyst uses the SPLIT function to divide a text string around a specified character and put each fragment into a new, separate cell. What is the specified character separating each item called?

Delimiter
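The same idea in Python, as a sketch: `str.split` divides a string around the delimiter and returns the fragments as separate items, much as SPLIT places each fragment in its own cell. The sample record is made up.

```python
record = "Bogota,Colombia,5000"  # hypothetical cell contents
delimiter = ","                  # the character separating each item

fragments = record.split(delimiter)
print(fragments)
```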

What is the term for a character that indicates the beginning or end of a data item, such as a comma?

Delimiter

Data and business objectives might not align for a number of reasons. Which of the following issues can prevent alignment? Select all that apply.

Insufficient data and sampling bias

The _____ function retrieves characters from the middle of the text you supply.

MID

A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Which of the following questions can the analyst use this dataset to address? Select all that apply.

average population of a certain country from 2015 through 2020 and the difference in population between two specific countries in 2018.

Every database has its own formatting, which can cause the data to seem inconsistent. Data analysts use the _____ tool to create a clean and consistent visual appearance for their spreadsheets.

clear formats

A data analyst uses the CASE statement to consider one or more _____, then returns a value

conditions

What is involved in seeing the big picture when verifying data cleaning? Select all that apply

consider the business problem, the goal, and the data

Date of store opening stored in both MM/DD/YYYY and MM/YY formats is an example of

consistency

What are the most common processes and procedures handled by data warehousing specialists? Select all that apply.

ensuring data is available, secure, and backed up to prevent loss

Data uniformity is not a process, but the degree to which the data uses the _____ unit of measurement.

same

A predetermined structure that includes a function's required information and its proper placement is called _____.

syntax

For a function to work properly, data analysts must follow each function's predetermined structure. What is this structure called?

syntax

A team of analysts is working on a data analytics project. How could data in a SQL database be more useful to the team than data in spreadsheets? Select all that apply.

they can access the data at the same time, use SQL to interact with the database program, and track changes to SQL queries across the team.

Documenting data-cleaning makes it possible to achieve what goals? Select all that apply.

to be transparent about your process, keep team members on the same page, and demonstrate to project stakeholders that you are accountable

What are some of the benefits of using SQL for analysis? Select all that apply

tracking changes across a team, interacting with database programs, and pulling information from different database sources

What are the most common processes and procedures handled by data engineers? Select all that apply.

transform data into a useful format for analysis; give it a reliable infrastructure; and develop, maintain, and test databases and related systems

Conditional formatting is a spreadsheet tool that changes how cells appear when values meet specific conditions

true

The CAST function can be used to convert the DATE datatype to the DATETIME datatype.

true

Data collected five years ago used technology that is not approved or supported by the business is an example of what?

validity

