Course 3: Prepare Data for Exploration

What are the main benefits of open data? Select all that apply.

making good data more widely available and combining data from different fields of knowledge

Using encryption to protect data is an example of what?

Data security

The date and time a photo was taken is an example of which kind of metadata?

Administrative

A data analyst completes a project. They move project files to another location to keep them separate from their current work. This is an example of what process?

Archiving files

A database table is named blueFlowers. What type of case is this?

Camel case
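
As a small illustration of camel case, here is a minimal sketch that converts a snake_case name into camel case. The helper name to_camel_case and the input blue_flowers are hypothetical; only blueFlowers comes from the question above.

```python
def to_camel_case(snake: str) -> str:
    """Convert a snake_case name to camelCase (first word lowercase, rest capitalized)."""
    first, *rest = snake.split("_")
    return first.lower() + "".join(word.capitalize() for word in rest)


table_name = to_camel_case("blue_flowers")
print(table_name)
```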

What is the process of structuring folders broadly at the top, then breaking down those folders into more specific topics?

Creating a hierarchy

A large metropolitan high school gives each of its students an ID number to differentiate them in its database. What kind of metadata are the ID numbers?

Descriptive

_____ are a visual way to understand the relationship between entities in the data model.

Entity Relationship Diagrams (ERDs)

A data analyst chooses not to use external data because it represents diverse perspectives. This is an appropriate decision when working with external data.

False

An employer accesses an employee's credit report without their consent. This is not a violation of the employee's privacy because they work at the company.

False

An individual who provides their data has the right to know and understand all of the data-processing activities and algorithms used on that data. This is called ownership.

False

To keep a header row at the top of a spreadsheet, highlight the row and select _____ from the View menu.

Freeze

A data analyst wants to bring data from a CSV file into a spreadsheet. This is an example of what process?

Importing data

_____ is data about data; in database management, it helps data analysts understand the contents of the data within a database.

Metadata

Data analysts use guidelines to describe a file's version, content, and date created. What are these guidelines called?

Naming conventions

Which of the following are types of data bias often encountered in data analytics? Select all that apply

Observer bias, interpretation bias, and confirmation bias

What aspect of data ethics promotes the free access, usage, and sharing of data?

Openness

What is data privacy?

Preserving a data subject's information and activity for all data transactions

Relational databases contain a series of tables connected to form relationships. Which two types of fields exist in two connected tables?

Primary and foreign keys

Data collected by a researcher from first-hand sources. Examples: data from an interview you conducted, data from a survey returned from 20 participants, and data from questionnaires you got back from a group of workers.

Primary data

You are writing a SQL query to filter data from a database that describes trees in Omaha, Nebraska. You want to only display entries for trees that have a diameter of 30 inches. The name of the table you're using is Nebraska_trees and the name of the column that shows the diameters of the trees is trunk_diameter. What is the correct query syntax that will retrieve and filter data from this table?

SELECT * FROM Nebraska_trees WHERE trunk_diameter = 30
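
The query above can be tried locally. The following is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory table. The sample rows and the tree_id column are made up for illustration; only Nebraska_trees and trunk_diameter come from the question.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not real Omaha data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Nebraska_trees (tree_id INTEGER PRIMARY KEY, trunk_diameter INTEGER)"
)
conn.executemany(
    "INSERT INTO Nebraska_trees (tree_id, trunk_diameter) VALUES (?, ?)",
    [(1, 12), (2, 30), (3, 45), (4, 30)],
)

# The WHERE clause keeps only rows whose trunk_diameter equals 30.
rows = conn.execute(
    "SELECT * FROM Nebraska_trees WHERE trunk_diameter = 30"
).fetchall()
print(rows)
conn.close()
```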

Organizations such as the U.S. Centers for Disease Control (CDC) often use data collected from hospitals. What kind of data is the CDC using if it is collected by hospitals, then sold to the CDC for its own analysis?

Second-party data

What is the process for arranging data into a meaningful order to make it easier to understand, analyze, and visualize?

Sorting

A large company has several data collections across its many departments. What kind of metadata indicates exactly how many collections a piece of data lives in?

Structural

A data analyst at a construction company is working on a report for a quickly approaching deadline. Why might they choose to analyze only historical data?

The project has a very short time frame.

A key benefit of working with normalized databases is that they help lower data redundancy. Which of the following is an example of redundancy?

The same piece of data being stored in two different places

_____ data is sold by a provider that didn't collect the data themselves.

Third-party

When writing a query, the name of the dataset can either be inside two backticks, or not, and the query will still run properly.

True

_____ is inaccurate, incomplete, and biased

Unreliable data

An unbiased sample is representative of the population being measured. Which of the following helps ensure unbiased sampling?

Using random sampling during data collection
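
Random sampling can be sketched with Python's standard random module. The population of 50 numbered IDs and the sample size of 15 are assumptions chosen for illustration; the fixed seed only makes the sketch repeatable.

```python
import random

population = list(range(1, 51))  # 50 hypothetical ID numbers
rng = random.Random(42)          # fixed seed so repeated runs pick the same sample

# Every member has the same chance of being chosen, and nobody is picked twice.
sample = rng.sample(population, 15)
print(sorted(sample))
```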

Which of the following are usually good data sources? Select all that apply.

Vetted public datasets, academic papers, and governmental agency data are usually good data sources.

The use of external data is particularly valuable in which circumstances?

When analysis depends on as many data sources as possible

A table in a relational database can have only one foreign key.

false

What can a data analyst achieve more easily with a metadata repository? Select all that apply

A metadata repository makes it easier to bring together multiple sources of data, confirm how or when data was collected, and verify that data from an outside source is being used appropriately.

A data analyst is evaluating data to determine whether it is good or bad. Which qualities characterize good data? Select all that apply.

comprehensive, cited and current

Which of the following "C's" describe qualities of good data? Select all that apply.

comprehensive, current, and cited

The tendency to search for or interpret information in a way that validates pre-existing beliefs is _____ bias.

confirmation

The running time of a movie is an example of _____ data.

continuous

A preference in favor of or against a person, group of people, or thing is called _____. It is an error in data analytics that can systematically skew results in a certain direction.

data bias

To reduce clutter, a data analyst hides cells that contain long, complex formulas. To view the formulas again, the analyst will need to adjust the spreadsheet sharing or encryption settings.

false

What is the process of showing only the data that meets a specified criteria while hiding the rest?

filtering

A data analyst is reviewing a national database of real estate sales. They are only interested in sales of condominiums. How can the analyst narrow their scope?

filtering out non-condominium sales

Which of the following are qualities of unreliable data? Select all that apply.

inaccurate, incomplete, and biased

If you create a database table and include a primary key in the table, what must you ensure? Select all that apply.

The primary key must be unique, and its value must not be null or blank.

What are the characteristics of unstructured data? Select all that apply.

not organized, although it may have an internal structure

A data analyst is analyzing sales data for the newest version of a product. They use third-party data about an older version of the product. For what reasons is this inappropriate for their analysis? Select all that apply.

not original or current

A company needs to merge third-party data with its own data. Which of the following actions will help make this process successful? Select all that apply.

standardize the data and evaluate the third-party data's quality and credibility

Structured data enables data to be grouped together to form relations. This makes it easier for analysts to do what with the data? Select all that apply.

store, search, and analyze

Think about data as a student at a high school. In this metaphor, which of the following are examples of metadata? Select all that apply.

The student ID number, enrollment date, and classes the student is enrolled in are all examples of metadata.

A relational database contains a series of _____ that can be connected to form relationships

tables

Structured data is likely to be found in which of the following formats? Select all that apply.

tables and spreadsheets

Consistent naming conventions describe which properties of a file? Select all that apply.

the content, creation date, and version of a file

Data analysts use foldering to achieve what goals? Select all that apply.

to keep project-related files together and organize them into subfolders

A data analyst is working in a spreadsheet application. They use Save As to change the file type from .XLS to .CSV. This is an example of a data transformation.

true

Data anonymization applies to both text and images.

true

Metadata is data about data. What kinds of information can metadata offer about a particular dataset? Select all that apply.

type of data, if it is clean and reliable, and how it can be combined with another dataset

What tasks can data analysts accomplish using metadata? Select all that apply.

use metadata to combine data, evaluate data, and interpret a database.

Continuous data is measured and has a limited number of values.

false

Data security involves using _____ to protect data from unauthorized access or corruption.

safety measures

A _____ is a part of a population that is representative of the population.

sample

A university surveys its student-athletes about their experience in college sports. The survey only includes student-athletes with scholarships. What type of bias is this an example of?

sampling bias

In MySQL, what is acceptable syntax for the SELECT keyword? Select all that apply.

select and SELECT
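
SQL keywords are not case-sensitive, which a quick check can confirm. This sketch uses Python's built-in sqlite3 module as a stand-in, since a MySQL server is not assumed to be available; the expression 1 + 1 is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The same statement written with an uppercase and a lowercase keyword.
upper = conn.execute("SELECT 1 + 1").fetchone()
lower = conn.execute("select 1 + 1").fetchone()
print(upper, lower)
conn.close()
```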

A CSV file makes it easier for data analysts to complete which tasks? Select all that apply.

examine a small part of a large dataset, import data to a new spreadsheet, and distinguish values from one another

Which of the following terms are also ways of describing observer bias? Select all that apply.

experimenter bias or research bias

A table in a relational database is allowed to have multiple _____.

foreign keys

Data _____ is the process of ensuring the formal management of a company's data assets.

governance

If you need an immediate answer, you might not have time to collect new data. In this case, you would need to use _____ that already exists.

historical data

A data analyst reviews a spreadsheet of boat auction sales to find the last five sailboats sold in Kentucky. What steps would they take in order to narrow the scope? Select all that apply.

The analyst can filter out sales outside of Kentucky and sort by date in descending order.
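
The filter-then-sort steps can be sketched in plain Python. The sales records below are invented for illustration, and each tuple is assumed to hold a boat type, a state, and a sale date.

```python
from datetime import date

# Hypothetical auction records: (boat type, state, sale date).
sales = [
    ("sailboat", "Kentucky", date(2021, 1, 5)),
    ("sailboat", "Ohio", date(2021, 6, 1)),
    ("motorboat", "Kentucky", date(2021, 7, 1)),
    ("sailboat", "Kentucky", date(2021, 2, 10)),
    ("sailboat", "Kentucky", date(2021, 3, 15)),
    ("sailboat", "Kentucky", date(2021, 4, 20)),
    ("sailboat", "Kentucky", date(2021, 5, 25)),
    ("sailboat", "Kentucky", date(2021, 6, 30)),
]

# Step 1: filter out everything that is not a sailboat sold in Kentucky.
kentucky_sailboats = [s for s in sales if s[0] == "sailboat" and s[1] == "Kentucky"]

# Step 2: sort by sale date in descending order, then keep the first five.
last_five = sorted(kentucky_sailboats, key=lambda s: s[2], reverse=True)[:5]
for sale in last_five:
    print(sale)
```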

If you have a short time frame for data collection and need an answer immediately, you likely will have to use historical data.

True

A _____ is an identifier that references a database column in which each value is unique.

primary key

Which of the following are commonly used methods for anonymizing data? Select all that apply

Blanking, hashing, and masking
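
The three methods can be sketched on a single record. The record and its field names are hypothetical, and the unsalted SHA-256 hash is shown only for brevity; a real project would likely need a stronger scheme.

```python
import hashlib

# Hypothetical record containing personally identifying information.
record = {"name": "Jane Doe", "email": "jane@example.com", "phone": "555-123-4567"}

anonymized = {
    # Blanking: replace the value with an empty string.
    "name": "",
    # Hashing: replace the value with a fixed-length SHA-256 digest.
    "email": hashlib.sha256(record["email"].encode("utf-8")).hexdigest(),
    # Masking: hide every character except the last four.
    "phone": "*" * (len(record["phone"]) - 4) + record["phone"][-4:],
}
print(anonymized)
```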

A data analyst works for a rental car company. They have a spreadsheet that lists car ID numbers and the dates cars were returned. How can they sort the spreadsheet to find the most recently returned cars?

By return date, in descending order

What does data transformation enable data analysts to accomplish?

Change the structure of the data

A CSV file saves data in a table format. What does CSV stand for?

Comma-separated values
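
To show how commas separate the values, here is a minimal sketch that reads CSV text with Python's standard csv module. The column names car_id and returned and the two rows are made up for illustration.

```python
import csv
import io

# Plain text in which commas separate the values on each line.
text = "car_id,returned\n101,2023-05-01\n102,2023-05-03\n"

# DictReader uses the first line as column names for every later row.
rows = list(csv.DictReader(io.StringIO(text)))
for row in rows:
    print(row["car_id"], row["returned"])
```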

What is the process of protecting people's private or sensitive data by eliminating identifying information?

Data anonymization

Which of the following is a benefit of internal data?

Internal data is more reliable and easier to collect.

Which type of bias is the tendency to always construe ambiguous situations in a positive or negative way?

Interpretation

To determine if a data source is cited, you should ask which of the following questions? Select all that apply.

"Is this dataset from a credible organization?" and "Who created this dataset?"

A data analyst reviews a national database of movie theater showings. They want to find the first movies shown in San Francisco in 2001. How can they organize the data to return the first 10 movies shown at the top of their list? Select all that apply.

The analyst can filter out showings outside of San Francisco in 2001 and sort by date in ascending order

A group of high school students take a survey that asks, "Are you on an athletic team? Please reply yes or no." What kind of data is being collected?

boolean

If you are analyzing trends over time, make sure you use time series data, in other words, data that includes _____.

dates

CSV files use plain text and are _____ by characters, such as a comma.

delineated

Data _____ refers to well-founded standards of right and wrong that dictate how data is collected, shared, and used.

ethics

Universal participation is a standard of open data. What are the key aspects of universal participation? Select all that apply

everyone must be able to use, reuse, and redistribute open data. Also, no one can place restrictions on data to discriminate against a person or group.

In the following FROM clause, what is the table name in the SQL query? FROM bigquery-public-data.sunroof_solar.solar_potential_by_postal_code

solar_potential_by_postal_code
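
The reference in that FROM clause has three dot-separated parts: project, dataset, and table. A minimal sketch that splits the string taken from the question:

```python
# Fully qualified table reference copied from the FROM clause above.
full_name = "bigquery-public-data.sunroof_solar.solar_potential_by_postal_code"

# The parts appear in the order project, dataset, table.
project, dataset, table = full_name.split(".")
print(project, dataset, table)
```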

Second-party data is collected directly by another group and then _____.

sold

In general, the usefulness of data decreases as time passes.

true

To align file naming and storage practices, it's useful to develop metadata practices with your data analytics team.

true

Boolean data has only _____ possible values, such as yes or no.

two

A delineator indicates a boundary or separation between _____.

two things

A clinic surveys a group of male and female patients about their experience with physical therapy. The survey does not include people with disabilities. Is the survey data biased?

yes

The government of a large city collects data on the quality of the city's infrastructure. Any business, nonprofit organization, or person can access the government's databases and re-use or redistribute the data. Is this an example of open data?

yes

In BigQuery, what optional syntax can be removed from the following FROM clause without stopping the query from running? FROM `bigquery-public-data.sunroof_solar.solar_potential_by_postal_code`

Backticks

Data analysts create hierarchies to organize their folders. How are folder hierarchies structured?

Broad topics at the top, then more specific topics below

A data analyst removes personally identifying information from a dataset. What task are they performing?

Data anonymization

_____ is the process of creating a model that is used for organizing data elements and how they relate to one another.

Data modeling

_____ is the process of changing the data's format, structure, or values

Data transformation

Data analysts use naming conventions to help them identify or locate a file. Which of the following is an example of an effective file name?

Elementary_Students_20090221_V03
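
A naming convention like this one can be applied consistently with a small helper. The function name build_file_name is hypothetical, and the pattern (content, then a YYYYMMDD date, then a two-digit version) is inferred from the example file name above.

```python
from datetime import date


def build_file_name(content: str, created: date, version: int) -> str:
    """Join the content label, creation date (YYYYMMDD), and zero-padded version."""
    return f"{content}_{created:%Y%m%d}_V{version:02d}"


name = build_file_name("Elementary_Students", date(2009, 2, 21), 3)
print(name)
```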

The _____ lets you move forward if either one of your two conditions is met.

OR operator

Which method of data-collection is most commonly used by scientists?

Observations

Which of the following statements accurately describes a key difference between wide and long data?

Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes.
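
The difference is easier to see by reshaping the same data. This sketch converts hypothetical wide rows (one column per year) into long rows (one row per subject and year); the bank names and rates are invented.

```python
# Wide: one row per subject, one column per year (hypothetical values).
wide = [
    {"bank": "Bank A", "2020": 1.5, "2021": 1.7},
    {"bank": "Bank B", "2020": 2.0, "2021": 2.1},
]

# Long: one row per subject-year pair; "year" gives the context for "rate".
long_rows = [
    {"bank": row["bank"], "year": year, "rate": rate}
    for row in wide
    for year, rate in row.items()
    if year != "bank"
]
for row in long_rows:
    print(row)
```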

What aspects of a file do file-naming conventions typically describe?

a file's content, creation date, and version number

To separate current from past work and reduce clutter, data analysts create _____. This involves moving files from completed projects to a separate location.

archives

Data analysts use a process called encryption to organize folders into subfolders.

false

Imagine that a company uses your personal data as part of a financial transaction. Before it occurs, you are not made aware of the nature and scale of this transaction. What concept of data ethics does this violate?

openness

An entertainment website displays a star rating for a movie based on user reviews. Users can select from one to five whole stars to rate the movie. The star rating is an example of what type of data? Select all that apply.

ordinal and discrete

In data analytics, a _____ refers to all possible data values in a certain dataset.

population

Which of the following are examples of sampling bias? Select all that apply

A survey of high-school-age students that does not include homeschooled students, a national election poll that only interviews people with college degrees, and a clinical study that includes three times more men than women are not representative of the population.

In long data, separate columns contain the values and the context for the values, respectively. What does each column contain in wide data?

A unique data variable

Successful file naming conventions include information that's useful when trying to locate or update a file. Which of the following is an effective file name?

AirportCampaign_2013_10_09_V01

Before completing a survey, an individual acknowledges reading information about how and why the data they provide will be used. What is this concept called?

Consent

A data analyst creates a spreadsheet with five tabs. They want to share the data in tabs 1-4 with a client. Tab 5 contains private information about other clients. Which of the following tactics will enable them to keep tab 5 private?

Copying tabs 1-4 into a separate spreadsheet, then sharing the new file with the client will keep tab 5 private. In addition, making a copy of the spreadsheet, deleting tab 5, then sharing the new file with the client will keep tab 5 private.

_____ data isn't limited to dollar amounts. Examples of other discrete data are stars and points.

Discrete

Which of the following is an example of unstructured data?

Email message

_____ is preferred when storing a lot of variables about each subject (for example, 60 years' worth of interest rates for each bank) and when performing advanced statistical analysis or graphing.

Long data

Internet search engines are an everyday example of how Boolean operators are used. The Boolean operator _____ expands the number of results when used in a keyword search.

OR
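
The effect of the two operators can be compared on a small list. The records below are hypothetical; the sketch shows that AND narrows the result while OR expands it.

```python
# Hypothetical product listings used only for illustration.
shoes = [
    {"name": "A", "color": "red", "size": 9},
    {"name": "B", "color": "blue", "size": 9},
    {"name": "C", "color": "red", "size": 10},
    {"name": "D", "color": "blue", "size": 10},
]

# AND: both conditions must hold, so fewer items match.
red_and_9 = [s["name"] for s in shoes if s["color"] == "red" and s["size"] == 9]

# OR: either condition is enough, so more items match.
red_or_9 = [s["name"] for s in shoes if s["color"] == "red" or s["size"] == 9]

print(red_and_9, red_or_9)
```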

Foldering may be used by data analysts to organize folders into what?

Subfolders

There are 50 students in a class. A data analyst wants to know if a majority of students like the instructor. They decide to survey the 15 students who earned an A in the class because these students were clearly paying attention to the instructor. Which of the following statements best describes this sample?

This is a biased sample because it only includes students who earned A's. It's not representative of the population.

_____ is preferred when creating tables and charts with a few variables about each subject and when comparing straightforward line graphs.

Wide data

_____ states that all data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data.

Transaction transparency

_____ diagrams are very detailed diagrams that describe the structure of a system by showing the system's entities, attributes, operations, and their relationships

Unified Modeling Language (UML)

Which statements define primary keys and foreign keys and describe their relationship? Select all that apply.

A primary key is an identifier that references a column in which each value is unique. A foreign key is a field within a table that's a primary key in another table. Primary and foreign keys are two connected identifiers within separate tables in a relational database.
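
The relationship between the two kinds of keys can be sketched with Python's built-in sqlite3 module. The customers and orders tables and their columns are hypothetical; SQLite is used only because it needs no server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# customer_id is the primary key of customers and a foreign key in orders.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(customer_id),"
    " item TEXT)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1, 'notebook')")

# Join the two tables through the primary key / foreign key pair.
joined = conn.execute(
    "SELECT o.order_id, c.name, o.item FROM orders AS o"
    " JOIN customers AS c ON o.customer_id = c.customer_id"
).fetchall()

# An order that points at a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 'pen')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(joined, rejected)
conn.close()
```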

When partial measurements (half-stars or quarter-points) aren't allowed, the _____ is discrete. If you don't accept anything other than full stars or points, the data is considered discrete.

data

When using data security measures, analysts can choose between protecting an entire spreadsheet or protecting certain cells within the spreadsheet.

true

Nominal qualitative data has a set order or scale.

false

When discussing structured databases, data analysts refer to the data contained in a row as a record. How do they refer to the data contained in a column?

field

Data that you collect yourself is called _____ data.

first-party

What are the benefits of data modeling? Select all that apply.

keeps data consistent, provides a map of how data is organized, and makes data easier to understand.

The AND operator lets you stack _____ conditions.

multiple

Secondary data is gathered by _____ or from other research. Examples: data you bought from a local data analytics firm's customer profiles, demographic data collected by a university, and census data gathered by the federal government.

other people

Which of the following are protections afforded by data privacy? Select all that apply.

Data privacy protections include preserving a data subject's information and activity for all data transactions. They also include providing users the right to inspect, update, and correct their own data.

What tools can data analysts use to control who can access or edit a spreadsheet?

encryption and sharing permissions

Which of the following are uses of relational databases? Select all that apply.

Relational databases are used to contain and describe a series of tables that can be connected to form relationships. They also present the same information to each collaborator by keeping data consistent regardless of where it's accessed.

Which of the following values are examples of a Boolean data type? Select all that apply.

yes/no and true/false
