Prepare Data for Exploration - Course 3
Data security involves using _________ to protect data from unauthorized access or corruption. 1. safety measures 2. data validation 3. foldering 4. metadata
1. safety measures
A relational database contains a series of _______ that can be connected to form relationships. 1. spreadsheets 2. fields 3. tables 4. cells
3. tables
In long data, separate columns contain the values and the context for the values, respectively. What does each column contain in wide data? 1. A unique format 2. A specific data type 3. A specific constraint 4. A unique data variable
4. A unique data variable
Which of the following situations are examples of bias? Select all that apply. 1. A dancing competition judge who is a close friend of the dancer who wins the competition 2. A scholar who only reads sources that support their argument 3. A researcher who surveys a sample group that is representative of the population 4. A daycare that won't hire men for childcare positions
1. A dancing competition judge who is a close friend of the dancer who wins the competition 2. A scholar who only reads sources that support their argument 4. A daycare that won't hire men for childcare positions
In instances when collecting data from an entire population is challenging, data analysts may choose to use what? 1. A sample 2. A segment 3. A selection 4. A specimen
1. A sample
Which of the following are usually good data sources? Select all that apply. 1. Academic papers 2. Vetted public datasets 3. Governmental agency data 4. Social media sites
1. Academic papers 2. Vetted public datasets 3. Governmental agency data
There are 50 students in a class. A data analyst wants to know if a majority of students like the instructor. They decide to survey the 15 students who earned an A in the class because these students were clearly paying attention to the instructor. Which of the following statements best describes this sample? 1. Biased 2. Impartial 3. Objective 4. Representative
1. Biased
What does data transformation enable data analysts to accomplish? 1. Change the structure of the data 2. Restore the data after it has been lost 3. Retrieve the data faster 4. Inspect the data for accuracy
1. Change the structure of the data
Which of the following 'Cs' describe qualities of good data? Select all that apply. 1. Cited 2. Comprehensive 3. Consequential 4. Current
1. Cited 2. Comprehensive 4. Current
A CSV file saves data in a table format. What does CSV stand for? 1. Comma-separated values 2. Cell-structured variables 3. Compatible scientific variables 4. Calculated spreadsheet values
1. Comma-separated values
Before completing a survey, an individual acknowledges reading information about how and why the data they provide will be used. What is this concept called? 1. Consent 2. Privacy 3. Discretion 4. Currency
1. Consent
A CSV file makes it easier for data analysts to complete which tasks? Select all that apply. 1. Examine a small subset of a large dataset 2. Manage multiple tabs within a worksheet 3. Import data to a new spreadsheet 4. Distinguish values from one another
1. Examine a small subset of a large dataset 3. Import data to a new spreadsheet 4. Distinguish values from one another
Which of the following terms are also ways of describing observer bias? Select all that apply. 1. Experimenter bias 2. Perception bias 3. Spectator bias 4. Research bias
1. Experimenter bias 4. Research bias
What are some key benefits of using external data? Select all that apply. 1. External data has broad reach 2. External data is free to use 3. External data is always reliable 4. External data can provide industry-level perspectives
1. External data has broad reach 4. External data can provide industry-level perspectives
A data analyst reviews a national database of movie theater showings. They want to find the first movies shown in San Francisco in 2001. How can they organize the data to return the first 10 movies shown at the top of their list? Select all that apply. 1. Filter out showings not in 2001 2. Filter out showings outside of San Francisco 3. Sort by date in ascending order 4. Sort by date in descending order
1. Filter out showings not in 2001 2. Filter out showings outside of San Francisco 3. Sort by date in ascending order
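The filter-then-sort approach in this answer can be sketched in Python. This is only an illustration: the showings list, its field names, and the movie titles are hypothetical and not taken from the course.

```python
from datetime import date

# Hypothetical showings; the field names are assumptions for illustration.
showings = [
    {"title": "Movie A", "city": "San Francisco", "shown_on": date(2001, 3, 14)},
    {"title": "Movie B", "city": "Oakland", "shown_on": date(2001, 1, 2)},
    {"title": "Movie C", "city": "San Francisco", "shown_on": date(2001, 1, 5)},
    {"title": "Movie D", "city": "San Francisco", "shown_on": date(2000, 12, 31)},
]


def first_showings(rows, city, year, limit=10):
    """Filter to one city and one year, then sort by date in ascending order."""
    kept = [r for r in rows if r["city"] == city and r["shown_on"].year == year]
    kept.sort(key=lambda r: r["shown_on"])  # ascending: earliest showing first
    return kept[:limit]


result = first_showings(showings, "San Francisco", 2001)
titles = [r["title"] for r in result]
```

Sorting in descending order would put the latest showings at the top instead, which is why that option is not selected.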
To keep a header row at the top of a spreadsheet, highlight the row and select _______ from the View menu. 1. Freeze 2. Lock 3. Set 4. Pin
1. Freeze
Why is internal data considered more reliable and easier to collect than external data? 1. Internal data lives within a company's own systems 2. Internal data circumvents privacy restrictions 3. Internal data comes from people you know 4. Internal data has much larger sample sizes
1. Internal data lives within a company's own systems
To determine if a data source is cited, you should ask which of the following questions? Select all that apply. 1. Is this dataset from a credible organization? 2. Who created this dataset? 3. Is the data relevant to the problem I'm trying to solve? 4. Has this dataset been properly cleaned?
1. Is this dataset from a credible organization? 2. Who created this dataset?
Which of the following is an example of continuous data? 1. Movie run time 2. Movie budget 3. Box office returns 4. Leading actors in movie
1. Movie run time
Which of the following are examples of discrete data? Select all that apply. 1. Number of actors in movie 2. Movie budget 3. Movie running time 4. Box office returns
1. Number of actors in movie 2. Movie budget 4. Box office returns
What are the main benefits of open data? Select all that apply. 1. Open data makes good data more widely available 2. Open data increases the amount of data available for purchase 3. Open data combines data from different fields of knowledge 4. Open data restricts data access to certain groups of people
1. Open data makes good data more widely available 3. Open data combines data from different fields of knowledge
Relational databases illustrate relationships between tables. Which fields represent the connection between these tables? Select all that apply. 1. Primary keys 2. Foreign keys 3. Secondary keys 4. External keys
1. Primary keys 2. Foreign keys
Structured data enables data to be grouped together to form relations. This makes it easier for analysts to do what with the data? Select all that apply. 1. Store 2. Search 3. Analyze 4. Rewrite
1. Store 2. Search 3. Analyze
Think about data as a student at a high school. In this metaphor, which of the following are examples of metadata? Select all that apply. 1. Student's ID number 2. Grades the student earns 3. Student's enrollment date 4. Classes the student is enrolled in
1. Student's ID number 3. Student's enrollment date 4. Classes the student is enrolled in
Structured data is likely to be found in which of the following formats? Select all that apply. 1. Table 2. Audio file 3. Digital photo 4. Spreadsheet
1. Table 4. Spreadsheet
Ownership is a key issue in data ethics. Who owns data? 1. The individual who originally generates the data 2. The organization that invests time and money collecting, processing, and analyzing the data 3. The law enforcement agencies that enforce data protection laws 4. The government that passes data-protection legislation
1. The individual who originally generates the data
Data analysts use metadata for what tasks? Select all that apply. 1. To interpret the contents of a database 2. To evaluate the quality of data 3. To perform data analyses 4. To combine data from more than one source
1. To interpret the contents of a database 2. To evaluate the quality of data 4. To combine data from more than one source
An individual who provides their data has the right to know and understand all of the data-processing activities and algorithms used on that data. This concept refers to which aspect of data ethics? 1. Transaction transparency 2. Consent 3. Currency 4. Ownership
1. Transaction transparency
A data analyst is working in a spreadsheet application. They use Save As to change the file type from .XLS to .CSV. This is an example of a data transformation. 1. True 2. False
1. True
Data anonymization applies to both text and images. 1. True 2. False
1. True
In general, the usefulness of data decreases as time passes. 1. True 2. False
1. True
To align file naming and storage practices, it's useful to develop metadata practices with your data analytics team. 1. True 2. False
1. True
When using data security measures, analysts can choose between protecting an entire spreadsheet or protecting certain cells within the spreadsheet. 1. True 2. False
1. True
What can a data analyst achieve more easily with a metadata repository? Select all that apply. 1. Verify that data from an outside source is being used appropriately 2. Bring together multiple sources of data 3. Confirm how or when data was collected 4. Identify trustworthy third-party data providers
1. Verify that data from an outside source is being used appropriately 2. Bring together multiple sources of data 3. Confirm how or when data was collected
A clinic surveys a group of male and female patients about their experience with physical therapy. The survey does not include people with disabilities. Is the survey data biased? 1. Yes 2. No
1. Yes
The government of a large city collects data on the quality of the city's infrastructure. Any business, nonprofit organization, or person can access the government's databases and re-use or redistribute the data. Is this an example of open data? 1. Yes 2. No
1. Yes
Which of the following values are examples of a Boolean data type? Select all that apply. 1. Yes or no 2. True or false 3. Yes, no, or unsure 4. One, two, or three
1. Yes or no 2. True or false
A ______ is an identifier that references a database column in which each value is unique. 1. primary key 2. foreign key 3. relation 4. field
1. primary key
In the following FROM clause, what is the table name in the SQL query? FROM bigquery-public-data.sunroof_solar.solar_potential_by_postal_code 1. solar_potential_by_postal_code 2. sunroof_solar 3. public-data.sunroof 4. solar.solar
1. solar_potential_by_postal_code
Which of the following are examples of sampling bias? Select all that apply. 1. An online marketing analytics firm stores data in a spreadsheet 2. A national election poll only interviews people with college degrees 3. A clinical study includes three times more men than women 4. A survey of high-school-age students does not include homeschooled students
2. A national election poll only interviews people with college degrees 3. A clinical study includes three times more men than women 4. A survey of high-school-age students does not include homeschooled students
A data analyst completes a project. They move project files to another location to keep them separate from their current work. This is an example of what process? 1. Duplicating files 2. Archiving files 3. Destroying files 4. Renaming files
2. Archiving files
In BigQuery, what optional syntax can be removed from the following FROM clause without stopping the query from running? FROM `bigquery-public-data.sunroof_solar.solar_potential_by_postal_code` 1. Dashes 2. Backticks 3. Underscores 4. Dots
2. Backticks
Which of the following are commonly used methods for anonymizing data? Select all that apply. 1. Deleting 2. Blanking 3. Masking 4. Hashing
2. Blanking 3. Masking 4. Hashing
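The three methods in this answer can be sketched in Python using the standard-library hashlib module. The helper names, the salt, and the sample values are hypothetical; this is a minimal illustration, not a complete anonymization procedure.

```python
import hashlib


def blank(value):
    """Blanking: replace the value with an empty string."""
    return ""


def mask(value, visible=4, fill="*"):
    """Masking: hide everything except the last `visible` characters."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


def hash_value(value, salt="example-salt"):
    """Hashing: replace the value with a fixed-length SHA-256 hex digest.

    The salt here is a placeholder chosen for illustration only.
    """
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


masked = mask("555-123-4567")           # made-up phone number
digest = hash_value("jane@example.com")  # made-up email address
```

The same input always hashes to the same digest, so hashed values can still be matched across tables without exposing the original text.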
When working with data from an external source, what can metadata help data analysts do? Select all that apply. 1. Choose which analyses to run 2. Combine data from more than one source 3. Understand the contents of a database 4. Ensure data is clean and reliable
2. Combine data from more than one source 3. Understand the contents of a database 4. Ensure data is clean and reliable
To track people's online activities and interests, which method of data collection is most effective? 1. Surveys 2. Cookies 3. Observations 4. Interviews
2. Cookies
What is the process of structuring folders broadly at the top, then breaking down those folders into more specific topics? 1. Producing a backup 2. Creating a hierarchy 3. Assigning naming conventions 4. Developing metadata
2. Creating a hierarchy
The right to inspect, update, or correct your own data is part of which aspect of data ethics? 1. Data ownership 2. Data privacy 3. Data consent 4. Data openness
2. Data privacy
A data analyst adds sharing permissions to limit who can edit the data contained within a file. This is an example of what? 1. Data validation 2. Data security 3. Data integrity 4. Data ethics
2. Data security
What tools can data analysts use to control who can access or edit a spreadsheet? Select all that apply. 1. Tabs 2. Encryption 3. Sharing permissions 4. Filters
2. Encryption 3. Sharing permissions
Universal participation is a standard of open data. What are the key aspects of universal participation? Select all that apply. 1. All corporations are allowed to sell open data 2. Everyone must be able to use, re-use, and redistribute open data 3. Certain groups of people must share their private data 4. No one can place restrictions on data to discriminate against a person or group
2. Everyone must be able to use, re-use, and redistribute open data 4. No one can place restrictions on data to discriminate against a person or group
A key aspect of open data is free access to people's personal information. 1. True 2. False
2. False
A table in a relational database can have only one foreign key. 1. True 2. False
2. False
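A table with more than one foreign key can be sketched with Python's built-in sqlite3 module. The table and column names below are hypothetical, and SQLite is used only because it runs in memory without setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, label TEXT)")

# The orders table has two foreign keys, one to each of the tables above.
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        product_id INTEGER NOT NULL REFERENCES products(product_id)
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO products VALUES (10, 'Notebook')")
conn.execute("INSERT INTO orders VALUES (100, 1, 10)")

# Follow both foreign keys back to the tables they reference.
row = conn.execute("""
    SELECT c.name, p.label
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    JOIN products AS p ON p.product_id = o.product_id
""").fetchone()
```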
An employer accesses an employee's credit report without their consent. This is not a violation of the employee's privacy because they work at the company. 1. True 2. False
2. False
An individual who provides their data has the right to know and understand all of the data-processing activities and algorithms used on that data. This is called ownership. 1. True 2. False
2. False
When writing a query, it's necessary for the name of the dataset to be inside two backticks in order for the query to run properly. 1. True 2. False
2. False
When discussing structured databases, data analysts refer to the data contained in a row as a record. How do they refer to the data contained in a column? 1. Point 2. Field 3. Character 4. Subject
2. Field
A data analytics team labels its files to indicate their content, creation date, and version number. The team is using what data organization tool? 1. File-naming verifications 2. File-naming conventions 3. File-naming references 4. File-naming attributes
2. File-naming conventions
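One possible convention that encodes content, creation date, and version number can be sketched in Python. The pattern, the helper name, and the sample values are assumptions for illustration; a team would agree on its own convention.

```python
from datetime import date


def build_filename(content, created, version, extension="csv"):
    """Build a name like Content_YYYYMMDD_vNN.ext (a hypothetical pattern)."""
    return f"{content}_{created:%Y%m%d}_v{version:02d}.{extension}"


name = build_filename("SalesReport", date(2020, 11, 25), 2)
```

Putting the date in year-month-day order means that sorting the names alphabetically also sorts them chronologically.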
What are the characteristics of unstructured data? Select all that apply. 1. Has a clearly identifiable structure 2. Is not organized 3. May have an internal structure 4. Fits neatly into rows and columns
2. Is not organized 3. May have an internal structure
Internet search engines are an everyday example of how Boolean operators are used. The Boolean operator ________ expands the number of results when used in a keyword search. 1. WITH 2. OR 3. AND 4. NOT
2. OR
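Why OR expands results while AND and NOT narrow them can be shown with a small keyword search in Python. The pages and their keywords are made up, and the search function is a toy sketch rather than how a real search engine works.

```python
# Hypothetical pages, each mapped to the set of keywords it contains.
pages = {
    "page1": {"data", "analytics"},
    "page2": {"data", "security"},
    "page3": {"metadata"},
}


def search(term_a, term_b, operator):
    """Return the names of pages that match `term_a <operator> term_b`."""
    hits = set()
    for name, words in pages.items():
        has_a, has_b = term_a in words, term_b in words
        if operator == "AND":
            match = has_a and has_b
        elif operator == "OR":
            match = has_a or has_b
        elif operator == "NOT":
            match = has_a and not has_b
        else:
            raise ValueError(f"unsupported operator: {operator}")
        if match:
            hits.add(name)
    return hits


and_hits = search("data", "security", "AND")
or_hits = search("data", "security", "OR")
not_hits = search("data", "security", "NOT")
```

With this sample data, OR matches every page that AND matches plus the pages containing only one of the two terms.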
What aspect of data ethics promotes the free access, usage, and sharing of data? 1. Privacy 2. Openness 3. Transaction transparency 4. Consent
2. Openness
An entertainment website displays a star rating for a movie based on user reviews. Users can select from one to five whole stars to rate the movie. The star rating is an example of what type of data? Select all that apply. 1. Nominal 2. Ordinal 3. Discrete 4. Continuous
2. Ordinal 3. Discrete
Organizations such as the U.S. Centers for Disease Control (CDC) often use data collected from hospitals. What kind of data is the CDC using if it is collected by hospitals, then sold to the CDC for its own analysis? 1. Third-party data 2. Second-party data 3. Multiple-party data 4. First-party data
2. Second-party data
In what circumstance might a data analyst choose not to use external data in their analysis? 1. The data is too thorough 2. The data cannot be confirmed to be reliable 3. The data is free for anyone to access 4. The data represents diverse perspectives
2. The data cannot be confirmed to be reliable
A data analyst is analyzing sales data for the newest version of a product. They use third-party data about an older version of the product. For what reasons is this inappropriate for their analysis? Select all that apply. 1. The data is biased 2. The data is not current 3. The data is not original 4. The data is not accurate
2. The data is not current 3. The data is not original
If you create a database table and include a primary key in the table, what must you ensure? Select all that apply. 1. The primary key isn't a foreign key in another table 2. The primary key is unique 3. The primary key's value isn't null or blank 4. The primary key has a numeric value
2. The primary key is unique 3. The primary key's value isn't null or blank
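The two requirements in this answer can be checked with Python's built-in sqlite3 module. The table, the column names, and the sample rows are hypothetical. The key column is text on purpose, to show that a primary key does not have to be numeric.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite accepts NULL in a non-INTEGER
# primary key column unless the column is declared NOT NULL.
conn.execute(
    "CREATE TABLE students (student_id TEXT PRIMARY KEY NOT NULL, name TEXT)"
)
conn.execute("INSERT INTO students VALUES ('S001', 'Ada')")


def try_insert(student_id, name):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO students VALUES (?, ?)", (student_id, name))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert("S001", "Grace")  # repeats an existing key
null_ok = try_insert(None, "Alan")          # key is null
new_ok = try_insert("S002", "Edsger")       # new, non-null key
```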
Data analysts use foldering to achieve what goals? Select all that apply. 1. To transfer files from one place to another 2. To keep project-related files together 3. To organize files into subfolders 4. To assign metadata about the folders
2. To keep project-related files together 3. To organize files into subfolders
A company needs to merge third-party data with its own data. Which of the following actions will help make this process successful? Select all that apply. 1. Alter the company's metadata to more closely reflect the incoming metadata 2. Use the metadata to evaluate the third-party data's quality and credibility 3. Replace the incoming data's metadata with its own company metadata 4. Use the metadata to standardize the data
2. Use the metadata to evaluate the third-party data's quality and credibility 4. Use the metadata to standardize the data
To separate current from past work and reduce clutter, data analysts create ________. This involves moving files from completed projects to a separate location. 1. copies 2. archives 3. structures 4. backups
2. archives
The tendency to search for or interpret information in a way that validates pre-existing beliefs is ________ bias. 1. sampling 2. confirmation 3. observer 4. interpretation
2. confirmation
Data transformation enables data analysts to change the ______ of the data. 1. value 2. structure 3. accuracy 4. meaning
2. structure
The date and time a photo was taken is an example of which kind of metadata? 1. Representative 2. Structural 3. Administrative 4. Descriptive
3. Administrative
The data-collection process involves deciding what data to use, determining how much data to collect, and selecting the right data type. Which of the following are also steps in the data-collection process? Select all that apply. 1. Analyzing the data to answer business questions 2. Creating data visualizations 3. Choosing data sources 4. Determining the time frame
3. Choosing data sources 4. Determining the time frame
A large metropolitan high school gives each of its students an ID number to differentiate them in its database. What kind of metadata are the ID numbers? 1. Administrative 2. Representative 3. Descriptive 4. Structural
3. Descriptive
Which of the following is an example of unstructured data? 1. GPS location 2. Contact saved on a phone 3. Email message 4. Rating of a local favorite restaurant
3. Email message
A data analyst wants to bring data from a CSV file into a spreadsheet. This is an example of what process? 1. Editing data 2. Normalizing data 3. Importing data 4. Filing data
3. Importing data
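Importing CSV data can be sketched with Python's standard csv module. The CSV text below is made up and read from a string so the example is self-contained; with a real file, the result of open() would take the place of io.StringIO.

```python
import csv
import io

# Hypothetical CSV content: a header row, then one record per line.
csv_text = "car_id,return_date\n17,2023-04-02\n42,2023-04-05\n"

# DictReader uses the header row as field names and splits values on commas.
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)
```

Every value arrives as text, so numbers and dates still need converting after the import.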
Which of the following questions collects nominal qualitative data? 1. How many times have you dined at this restaurant? 2. How many people do you usually dine with? 3. Is this your first time dining at this restaurant? 4. On a scale of 1 - 10, how would you rate your service today?
3. Is this your first time dining at this restaurant?
Which method of data-collection is most commonly used by scientists? 1. Questionnaires 2. Surveys 3. Observations 4. Interviews
3. Observations
In MySQL, what is acceptable syntax for the SELECT keyword? Select all that apply. 1. 'SELECT' 2. "SELECT" 3. SELECT 4. select
3. SELECT 4. select
A university surveys its student-athletes about their experience in college sports. The survey only includes student-athletes with scholarships. What type of bias is this an example of? 1. Confirmation bias 2. Observer bias 3. Sampling bias 4. Interpretation bias
3. Sampling bias
What is the process for arranging data into a meaningful order to make it easier to understand, analyze, and visualize? 1. Filtering 2. Prioritizing 3. Sorting 4. Reframing
3. Sorting
A large company has several data collections across its many departments. What kind of metadata indicates exactly how many collections a piece of data lives in? 1. Descriptive 2. Representative 3. Structural 4. Administrative
3. Structural
Foldering may be used by data analysts to organize folders into what? 1. Version 2. Tables 3. Subfolders 4. Databases
3. Subfolders
A key benefit of working with normalized databases is that they help lower data redundancy. Which of the following is an example of redundancy? 1. A database that forms two or more relationships 2. Team members in different office locations working with the same data 3. The same piece of data being stored in two different places 4. A database containing two foreign keys
3. The same piece of data being stored in two different places
The use of external data is particularly valuable in which circumstances? 1. When analysis requires a lot of structured data 2. When analysis involves data that hasn't been cleaned 3. When analysis depends on as many data sources as possible 4. When analysis includes data from audio files
3. When analysis depends on as many data sources as possible
CSV files use plain text and are ______ by characters, such as a comma. 1. detailed 2. described 3. delineated 4. defined
3. delineated
In data analytics, a _______ refers to all possible data values in a certain dataset. 1. representation 2. sample 3. population 4. source
3. population
Data analysts create hierarchies to organize their folders. How are folder hierarchies structured? 1. Broad topics at the left, then more specific topics at the right 2. Specific topics at the top, then more broad topics below 3. Broad topics at the right, then more specific topics at the left 4. Broad topics at the top, then more specific topics below
4. Broad topics at the top, then more specific topics below
A data analyst works for a rental car company. They have a spreadsheet that lists car ID numbers and the dates cars were returned. How can they sort the spreadsheet to find the most recently returned cars? 1. By return date, in ascending order 2. By car numerical ID, in descending order 3. By car numerical ID, in ascending order 4. By return date, in descending order
4. By return date, in descending order
A database table is named blueFlowers. What type of case is this? 1. Snake case 2. Lowercase 3. Sentence case 4. Camel case
4. Camel case
If a company uses your personal data as part of a financial transaction, you should be made aware of the nature and scale of the transaction. What concept of data ethics does this refer to? 1. Ownership 2. Privacy 3. Consent 4. Currency
4. Currency
A data analyst removes personally identifying information from a dataset. What task are they performing? 1. Data sorting 2. Data visualization 3. Data collection 4. Data anonymization
4. Data anonymization
A data analyst is reviewing a national database of real estate sales. They are only interested in sales of condominiums. How can the analyst narrow their scope? 1. Sort by non-condominium sales 2. Filter out condominium sales 3. Sort by condominium sales 4. Filter out non-condominium sales
4. Filter out non-condominium sales
Which type of bias is the tendency to always construe ambiguous situations in a positive or negative way? 1. Confirmation 2. Observer 3. Sampling 4. Interpretation
4. Interpretation
Data analysts use guidelines to describe a file's version, content, and date created. What are these guidelines called? 1. Naming attributes 2. Naming verifications 3. Naming references 4. Naming conventions
4. Naming conventions
A data analyst at a book publisher is working on an urgent report for executives. They are using only historical data. What is the most likely reason for choosing to analyze only historical data? 1. The data is constantly changing 2. There is plenty of time to research historical data 3. The data is unknown 4. The project has a very short time frame
4. The project has a very short time frame
______ states that all data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data. 1. Openness 2. Currency 3. Privacy 4. Transaction transparency
4. Transaction transparency
An unbiased sample is representative of the population being measured. Which of the following helps ensure unbiased sampling? 1. Writing survey questions that encourage specific responses 2. Skewing results in a certain direction 3. Storing data in a spreadsheet 4. Using random sampling during data collection
4. Using random sampling during data collection
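Random sampling can be sketched with Python's standard random module. The class of 50 students and the sample size of 15 echo the earlier survey question; the student labels and the seed are assumptions for illustration.

```python
import random

# Hypothetical population: 50 students identified by label.
population = [f"student_{i:02d}" for i in range(1, 51)]

# A fixed seed is used only so the sketch is repeatable.
rng = random.Random(0)

# sample() draws without replacement, and every student is equally likely
# to be chosen, unlike a survey limited to students who earned an A.
sample = rng.sample(population, 15)
```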
Which of the following statements accurately describes a key difference between wide and long data? 1. Every wide data subject has a single column that holds the values of subject attributes. Every long data subject has multiple columns 2. Every wide data subject has multiple columns. Every long data subject has data in a single column 3. Wide data subjects can have multiple rows that hold the values of subject attributes. Long data subjects can have data in multiple columns 4. Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes
4. Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes
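The difference can be shown by reshaping a small wide table into long form in plain Python. The table contents, the column names, and the helper are hypothetical and serve only as an illustration.

```python
# Hypothetical wide table: one row per subject, one column per year.
wide = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def wide_to_long(rows, id_field, var_name="year", value_name="value"):
    """Turn each non-id column of a wide row into its own long row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue
            long_rows.append(
                {id_field: row[id_field], var_name: column, value_name: value}
            )
    return long_rows


long_data = wide_to_long(wide, "country")
```

Each subject occupies one row with several columns in the wide table, and several rows in the long result, one per attribute value.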
The running time of a movie is an example of ________ data. 1. discrete 2. nominal 3. qualitative 4. continuous
4. continuous
Data _________ is the process of ensuring the formal management of a company's data assets. 1. aggregation 2. mapping 3. integrity 4. governance
4. governance