Quiz Questions: Data Prep, Week 2
The date and time a photo was taken is an example of which kind of metadata?
Administrative
Fill in the blank: To separate current from past work and reduce clutter, data analysts create _____. This involves moving files from completed projects to a separate location.
Archives
The question in column E asks, "Was your order accurate? Please respond yes or no." What kind of data is this?
Boolean Data
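As a minimal sketch, assuming the column E responses arrive as free-text strings, each yes/no answer can be mapped onto Python's `bool` type, which has exactly two possible values (True and False). The helper name `parse_yes_no` and the sample responses are hypothetical.

```python
def parse_yes_no(response: str) -> bool:
    """Map a yes/no survey answer to a Boolean value (True or False)."""
    normalized = response.strip().lower()  # tolerate stray spaces and mixed case
    if normalized == "yes":
        return True
    if normalized == "no":
        return False
    # Anything else is not valid Boolean data for this question.
    raise ValueError(f"Expected 'yes' or 'no', got {response!r}")


column_e = ["Yes", "no", " YES ", "No"]
print([parse_yes_no(r) for r in column_e])  # [True, False, True, False]
```

Raising an error on other answers, rather than guessing, keeps the column strictly two-valued.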
A data analyst works for a rental car company. They have a spreadsheet that lists car ID numbers and the dates cars were returned. How can they sort the spreadsheet to find the most recently returned cars?
By return date in descending order
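A short sketch of the same idea outside a spreadsheet, assuming each row is a (car ID, return date) pair; the IDs and dates are made up. Sorting on the date with `reverse=True` gives descending order, so the most recent return comes first.

```python
from datetime import date

# Hypothetical sample rows: (car ID, return date)
returns = [
    ("CAR-101", date(2023, 3, 14)),
    ("CAR-205", date(2023, 5, 2)),
    ("CAR-087", date(2023, 4, 21)),
]

# Descending order (reverse=True) puts the latest return date at the top.
most_recent_first = sorted(returns, key=lambda row: row[1], reverse=True)
for car_id, returned in most_recent_first:
    print(car_id, returned.isoformat())
```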
A database table is named blueFlowers. What type of case is this?
Camel Case
What does data transformation enable data analysts to accomplish?
Change the structure of the data
Fill in the blank: The tendency to search for or interpret information in a way that validates pre-existing beliefs is _____ bias.
Confirmation
Using encryption to protect data is an example of what?
Data Security
Which of the following is an example of unstructured data? -Email message -Contact saved on a phone -GPS location -Rating of a local favorite restaurant
-Email message
TRUE or FALSE: Nominal qualitative data has a set order or scale.
False
A data analyst is reviewing a national database of real estate sales. They are only interested in sales of condominiums. How can the analyst narrow their scope?
Filter out non-condominiums
What process do data analysts use to keep project-related files together and organize them into subfolders?
Foldering
Fill in the blank: Data _____ is the process of ensuring the formal management of a company's data assets.
Governance
A data analyst is working on an urgent traffic study. As a result of the short time frame, which type of data are they most likely to use?
Historical
Reviewing the data enables you to describe how you will use it to achieve your client's goals. First, you notice that all of the data is first-party data. What does this mean?
It's data that was collected by Garden employees using the company's own resources.
Fill in the blank: A data analytics team uses _____ to indicate consistent naming conventions for a project. This is an example of using data about data.
Metadata
Data analysts use archiving to separate current from past work. What does this process involve?
Moving files from completed projects to another location
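Archiving can be scripted as well. The sketch below assumes a hypothetical layout where active work lives in a `Projects` folder and finished projects move to a sibling `Archive` folder; the folder and project names are illustrative only. It uses the standard-library `shutil.move` and `pathlib`.

```python
import shutil
import tempfile
from pathlib import Path


def archive_completed(projects_dir: Path, archive_dir: Path, completed: set) -> list:
    """Move each completed project folder from the working area into the archive."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() reads the whole listing first, so moving folders does not disturb the loop.
    for project in sorted(projects_dir.iterdir()):
        if project.is_dir() and project.name in completed:
            destination = archive_dir / project.name
            shutil.move(str(project), str(destination))
            moved.append(destination)
    return moved


# Demo in a throwaway directory with two hypothetical project folders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Projects" / "Q1_Sales").mkdir(parents=True)
    (root / "Projects" / "Q2_Sales").mkdir(parents=True)
    moved = archive_completed(root / "Projects", root / "Archive", {"Q1_Sales"})
    moved_names = [p.name for p in moved]
    remaining = sorted(p.name for p in (root / "Projects").iterdir())
    archived = sorted(p.name for p in (root / "Archive").iterdir())

print(moved_names, remaining, archived)  # ['Q1_Sales'] ['Q2_Sales'] ['Q1_Sales']
```

Only the completed project leaves the working folder, which is the clutter reduction archiving is meant to provide.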
Data analysts use guidelines to describe a file's version, content, and date created. What are these guidelines called?
Naming Conventions
Ownership is a key issue in data ethics. Who owns data?
The individual who originally generates the data
Fill in the blank: _____ states that all data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data.
Transaction Transparency
TRUE or FALSE: Data transformation can change the structure of the data. An example of this is taking data stored in one format and converting it to another.
True
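One common format conversion is CSV to JSON. This sketch uses a small hypothetical CSV snippet and the standard-library `csv` and `json` modules; the records and their values stay the same, only the format they are stored in changes.

```python
import csv
import io
import json

# Hypothetical CSV data (one header row, two records).
csv_text = "order_id,accurate\n1001,yes\n1002,no\n"

# Read each CSV row into a dictionary keyed by the header names...
rows = list(csv.DictReader(io.StringIO(csv_text)))

# ...then write the same records out as JSON.
json_text = json.dumps(rows, indent=2)
print(json_text)
```

Note that `csv` reads every value as a string, so a real transformation might also convert types (for example, `order_id` to an integer).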
In the following FROM clause, what is the table name in the SQL query? FROM bigquery-public-data.sunroof_solar.solar_potential_by_postal_code
solar_potential_by_postal_code
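A fully qualified BigQuery table path follows the pattern project.dataset.table, so the table name is the last dot-separated part. The helper below is a hypothetical sketch that assumes exactly three parts and strips optional backticks.

```python
def split_table_path(path: str) -> dict:
    """Split a project.dataset.table path into its three named parts."""
    project, dataset, table = path.strip("`").split(".")  # backticks are optional
    return {"project": project, "dataset": dataset, "table": table}


parts = split_table_path("`bigquery-public-data.sunroof_solar.solar_potential_by_postal_code`")
print(parts["table"])  # solar_potential_by_postal_code
```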
What is the process for arranging data into a meaningful order to make it easier to understand, analyze, and visualize?
sorting
Fill in the blank: Internet search engines are an everyday example of how Boolean operators are used. The Boolean operator _____ expands the number of results when used in a keyword search.
OR
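A toy keyword search shows why OR expands results while AND narrows them. This is a simplified sketch over three made-up documents; real search engines rank and match text in far more sophisticated ways.

```python
# Hypothetical mini "index" of documents.
documents = {
    "doc1": "cheap flights to paris",
    "doc2": "paris hotel deals",
    "doc3": "cheap car rentals",
}


def search_and(*keywords: str) -> set:
    """Documents containing every keyword: AND narrows the results."""
    return {name for name, text in documents.items()
            if all(k in text.split() for k in keywords)}


def search_or(*keywords: str) -> set:
    """Documents containing at least one keyword: OR expands the results."""
    return {name for name, text in documents.items()
            if any(k in text.split() for k in keywords)}


print(sorted(search_and("cheap", "paris")))  # ['doc1']
print(sorted(search_or("cheap", "paris")))   # ['doc1', 'doc2', 'doc3']
```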
Which method of data collection is most commonly used by scientists?
Observations
TRUE or FALSE: A key aspect of open data is free access to people's personal information.
False
Fill in the blank: A relational database contains a series of _____ that can be connected to form relationships.
Tables
Successful file naming conventions include information that's useful when trying to locate or update a file. Which of the following is an effective file name? -Data_519 -CampaignData_03 -AirportCampaign_2013_10_09_V01 -May30-2019_AirportAdvertisingCampaignResults_Terminals3-5_InclCustSurveyResponses_PLUS_IdeasforJune
-AirportCampaign_2013_10_09_V01
Which of the following are types of data bias often encountered in data analytics? Select all that apply. -Confirmation bias -Interpretation bias -Observer bias -Educational bias
-Confirmation bias -Interpretation bias -Observer bias
What aspects of a file do file-naming conventions typically describe? Select all that apply. -Collaborators -Creation date -Version number -Content
-Creation date -Version number -Content
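A naming convention is easiest to follow consistently when it is generated rather than typed. This sketch assumes one possible convention (content, then creation date as YYYY_MM_DD, then a two-digit version number); the function name `build_file_name` is hypothetical.

```python
from datetime import date


def build_file_name(content: str, created: date, version: int) -> str:
    """Combine content, creation date (YYYY_MM_DD), and a zero-padded version number."""
    return f"{content}_{created:%Y_%m_%d}_V{version:02d}"


print(build_file_name("AirportCampaign", date(2013, 10, 9), 1))
# AirportCampaign_2013_10_09_V01
```

Zero-padding the month, day, and version keeps files in the right order when a folder is sorted by name.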
Which of the following is an example of continuous data? -Leading actors in movie -Movie budget -Box office returns -Movie run time
-Movie run time
For your final question, your interviewer explains that Sewati Financial Services cares about data privacy. The company needs its clients' trust, and this is an important responsibility for the data analytics team. He asks: What does data privacy involve? Select all that apply. -Putting privacy measures in place to protect people's data -Encryption and sharing permissions -A person's legal right to their data -Preserving a data subject's information and activity any time a data transaction occurs
-Putting privacy measures in place to protect people's data -A person's legal right to their data
A data analyst is analyzing sales data for the newest version of a product. They use third-party data about an older version of the product. For what reasons is this inappropriate for their analysis? Select all that apply. -The data is not current -The data is biased -The data is not original -The data is not accurate
-The data is not current -The data is not original
In BigQuery, what optional syntax can be removed from the following FROM clause without stopping the query from running? FROM `bigquery-public-data.sunroof_solar.solar_potential_by_postal_code`
Backticks
Before completing a survey, an individual acknowledges reading information about how and why the data they provide will be used. What is this concept called?
Consent
Fill in the blank: File-naming conventions are _____ that describe a file's content, creation date, or version.
Consistent guidelines
What is the process of structuring folders broadly at the top, then breaking down those folders into more specific topics?
Creating a hierarchy
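A folder hierarchy can be described as a nested mapping and created from the top down. The layout below is hypothetical: a broad `Projects` folder at the top, one folder per project beneath it, and specific topic subfolders at the bottom.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: broad folders at the top, more specific topics below.
HIERARCHY = {
    "Projects": {
        "AirportCampaign": ["Data", "Analysis", "Reports"],
        "NewCustomerSurvey": ["Data", "Analysis", "Reports"],
    }
}


def create_hierarchy(root: Path, tree: dict) -> None:
    """Create nested folders from a mapping, working from broad to specific."""
    for name, children in tree.items():
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        if isinstance(children, dict):
            create_hierarchy(folder, children)  # recurse into the next, narrower level
        else:
            for child in children:
                (folder / child).mkdir(exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    create_hierarchy(Path(tmp), HIERARCHY)
    created = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*"))

for path in created:
    print(path)
```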
If a company uses your personal data as part of a financial transaction, you should be made aware of the nature and scale of the transaction. What concept of data ethics does this refer to?
Currency
TRUE or FALSE: A social media post is an example of structured data.
False
TRUE or FALSE: When writing a query, you must remove the two backticks around the name of the dataset in order for the query to run properly.
False
Fill in the blank: To keep a header row at the top of a spreadsheet, highlight the row and select _____ from the View menu.
Freeze
Which of the following terms are also ways of describing observer bias? Select all that apply.
Researcher bias & experimenter bias
Structured data enables data to be grouped together to form relations. This makes it easier for analysts to do what with the data? Select all that apply. -Search -Rewrite -Analyze -Store
Search, Analyze, & Store
Organizations such as the U.S. Centers for Disease Control (CDC) often use data collected from hospitals. What kind of data is the CDC using if it is collected by hospitals, then sold to the CDC for its own analysis?
Second-party data
A large company has several data collections across its many departments. What kind of metadata indicates exactly how many collections a piece of data lives in?
Structural
TRUE or FALSE: In general, the usefulness of data decreases as time passes.
True
Which of the following are examples of sampling bias? Select all that apply. -A clinical study includes three times more men than women. -An online marketing analytics firm stores data in a spreadsheet. -A national election poll only interviews people with college degrees. -A survey of high-school-age students does not include homeschooled students.
-A clinical study includes three times more men than women. -A national election poll only interviews people with college degrees. -A survey of high-school-age students does not include homeschooled students.
Which of the following are usually good data sources? Select all that apply. -Academic papers -Governmental agency data -Vetted public datasets -Social media sites
-Academic papers -Governmental agency data -Vetted public datasets
Think about data as driving a taxi cab. In this metaphor, which of the following are examples of metadata? Select all that apply. -Company that owns the taxi -License plate number -Make and model of the taxi cab -Passengers the taxi picks up
-Company that owns the taxi -License plate number -Make and model of the taxi cab
When working with data from an external source, what can metadata help data analysts do? Select all that apply. -Ensure data is clean and reliable -Combine data from more than one source -Understand the contents of a database -Choose which analyses to run
-Ensure data is clean and reliable -Combine data from more than one source -Understand the contents of a database
A data analyst reviews a national database of movie theater showings. They want to find the first movies shown in San Francisco in 2001. How can they organize the data to return the first 10 movies shown at the top of their list? Select all that apply. -Sort by date in descending order -Filter out showings outside of San Francisco -Filter out showings not in 2001 -Sort by date in ascending order
-Filter out showings outside of San Francisco -Filter out showings not in 2001 -Sort by date in ascending order
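The three steps combine naturally in code: filter first, then sort ascending, then take the top 10. The rows below are invented for illustration and assume a (title, city, date) layout.

```python
from datetime import date

# Hypothetical rows: (movie title, city, showing date)
showings = [
    ("Movie A", "San Francisco", date(2001, 3, 2)),
    ("Movie B", "Los Angeles", date(2001, 1, 5)),
    ("Movie C", "San Francisco", date(2000, 12, 30)),
    ("Movie D", "San Francisco", date(2001, 1, 19)),
]

# Filter out showings outside San Francisco and showings not in 2001.
filtered = [s for s in showings if s[1] == "San Francisco" and s[2].year == 2001]

# Ascending sort puts the earliest showings first; keep at most the first 10.
first_ten = sorted(filtered, key=lambda s: s[2])[:10]
print([title for title, _, _ in first_ten])  # ['Movie D', 'Movie A']
```

A descending sort would instead surface the latest 2001 showings, which is why it is not part of the answer.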
Our data analytics team often uses both internal and external data. Describe the difference between the two. -Internal data came from a company's own systems. External data comes from outside the organization. -Internal data is often generated from within the company. External data is generated outside the organization. -External data is often generated from within the company. Internal data is generated outside the organization. -External data came from a company's own systems. Internal data came from the organization.
-Internal data came from a company's own systems. External data comes from outside the organization. -Internal data is often generated from within the company. External data is generated outside the organization.
Which of the following is a benefit of internal data? -Internal data is more reliable and easier to collect. -Internal data is less vulnerable to biased collection. -Internal data is the only data relevant to the problem. -Internal data is less likely to need cleaning.
-Internal data is more reliable and easier to collect.
A data analyst is working with a file from a customer satisfaction survey. The survey was sent to anyone who became a customer between April and June, 2020. Which of the following is an effective name for the file? -Survey_Responses -April_May_June_2020_Responses_to_New_Customer_Survey_ANALYS-SDATA_928310 -Apr-June2020_CustSurvey_V -NewCustomerSurvey_2020-6-20_V03
-NewCustomerSurvey_2020-6-20_V03
Our analysts often work with the same spreadsheet, but for different purposes. How would you use sorting to help in this situation? -Sort data to make it easier to understand, analyze and visualize -Sort the data to arrange data in a meaningful order -Sort data to show only the data that meets a specific criteria while hiding the rest -Sort data to highlight the header row.
-Sort data to make it easier to understand, analyze and visualize -Sort the data to arrange data in a meaningful order
In what circumstance might a data analyst choose not to use external data in their analysis? -The data is free for anyone to access -The data cannot be confirmed to be reliable -The data represents diverse perspectives -The data is too thorough
-The data cannot be confirmed to be reliable
In MySQL, what is acceptable syntax for the SELECT keyword? Select all that apply. -SELECT -"SELECT" -select -'select'
-SELECT -select
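SQL keywords are case-insensitive, but putting quotes around a keyword changes its meaning. This sketch uses Python's built-in `sqlite3` module as a stand-in for MySQL (both treat keywords case-insensitively); the `cars` table is hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (car_id TEXT)")
conn.execute("INSERT INTO cars VALUES ('CAR-101')")

upper = conn.execute("SELECT car_id FROM cars").fetchall()
lower = conn.execute("select car_id from cars").fetchall()
print(upper == lower)  # True

# A quoted 'select' is a string literal, not a keyword, so this is a syntax error.
quoted_failed = False
try:
    conn.execute("'select' car_id FROM cars")
except sqlite3.OperationalError:
    quoted_failed = True
print(quoted_failed)  # True
conn.close()
```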
Fill in the blank: A Boolean data type can have _____ possible values.
2
TRUE or FALSE: Now that you're familiar with the data, you want to build trust with the team at Garden. You decide to impress them by taking the initiative to reach out to your social media followers. You explain that Garden is a new client, and you show them the pictures of Garden's sandwich deliveries from the client file. Then, you ask them if they have any photos of sandwich deliveries that you can evaluate. This is an example of going above and beyond expectations and a great way to build trust.
FALSE
What aspect of data ethics promotes the free access, usage, and sharing of data?
Openness
Fill in the blank: In data analytics, a _____ refers to all possible data values in a certain dataset.
Population
Primary and foreign keys are two connected identifiers within separate tables. These tables exist in what kind of database?
Relational
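A small relational database makes the link between primary and foreign keys concrete. This sketch uses Python's built-in `sqlite3` with two hypothetical tables: `customers.customer_id` is a primary key (every value unique), and `orders.customer_id` is a foreign key pointing back to it. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign-key enforcement in SQLite
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- primary key: unique per row
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- foreign key
        total       REAL
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# The shared key lets the two tables be joined into one result.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Ana', 65.0), ('Ben', 15.5)]

# An order for a customer that does not exist violates the foreign key.
fk_enforced = False
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)  # True
conn.close()
```

Because each customer's name is stored once in `customers` and referenced by ID in `orders`, the design also avoids the redundancy of repeating names in every order row.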
Universal participation is a standard of open data. What are the key aspects of universal participation? Select all that apply. -Certain groups of people must share their private data. -All corporations are allowed to sell open data. -No one can place restrictions on data to discriminate against a person or group. -Everyone must be able to use, re-use, and redistribute open data.
-No one can place restrictions on data to discriminate against a person or group. -Everyone must be able to use, re-use, and redistribute open data
What are the main benefits of open data? Select all that apply. -Open data makes good data more widely available. -Open data combines data from different fields of knowledge. -Open data increases the amount of data available for purchase. -Open data restricts data access to certain groups of people.
-Open data makes good data more widely available. -Open data combines data from different fields of knowledge.
A key benefit of working with normalized databases is that they help lower data redundancy. Which of the following is an example of redundancy? -Team members in different office locations working with the same data -A database that forms two or more relationships -The same piece of data being stored in two different places -A database containing two foreign keys
-The same piece of data being stored in two different places
A large metropolitan high school gives each of its students an ID number to differentiate them in its database. What kind of metadata are the ID numbers?
Descriptive
Fill in the blank: The running time of a movie is an example of _____ data.
continuous
Data analysts use foldering to achieve what goals? Select all that apply. -To organize files into subfolders -To assign metadata about the folders -To transfer files from one place to another -To keep project-related files together
-To organize files into subfolders -To keep project-related files together
Our data analytics team often surveys clients to get their feedback. If you were on the team, how would you ensure the sample is representative of the population as a whole? -Use a randomized sample of the population that includes all genders. -Make sure the sample is chosen at random. -Only include participants who can answer survey questions in a timely manner. -Include clients with disabilities in the survey sample.
-Use a randomized sample of the population that includes all genders. -Make sure the sample is chosen at random. -Include clients with disabilities in the survey sample.
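A simple random sample gives every member of the population the same chance of being chosen. This sketch draws 100 of 1,000 hypothetical client IDs with `random.Random.sample`, which never picks the same member twice; the fixed seed is only there to make the example reproducible.

```python
import random

# Hypothetical population of 1,000 client IDs.
population = [f"client_{i:04d}" for i in range(1000)]

rng = random.Random(42)  # fixed seed for reproducibility only
sample = rng.sample(population, k=100)  # simple random sample without replacement

print(len(sample), len(set(sample)))  # 100 100
```

Random selection reduces sampling bias, but it only reaches groups that are actually in the population list, so the list itself must include every kind of client.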
To determine if a data source is cited, you should ask which of the following questions? Select all that apply. -Who created this dataset? -Is the data relevant to the problem I'm trying to solve? -Is this dataset from a credible organization? -Has this dataset been properly cleaned?
-Who created this dataset? -Is this dataset from a credible organization?
Which of the following statements accurately describes a key difference between wide and long data? -Wide data subjects can have multiple rows that hold the values of subject attributes. Long data subjects can have data in multiple columns. -Every wide data subject has multiple columns. Every long data subject has data in a single column. -Every wide data subject has a single column that holds the values of subject attributes. Every long data subject has multiple columns. -Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes.
-Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes.
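Reshaping from wide to long ("unpivoting") turns each attribute column into its own row. The sketch below uses hypothetical city and year data in plain dictionaries; pandas offers `DataFrame.melt` for the same reshaping on larger tables.

```python
# Wide: one row per subject (city), one column per attribute (year).
wide = [
    {"city": "Austin", "2019": 120, "2020": 135},
    {"city": "Boise", "2019": 80, "2020": 95},
]


def wide_to_long(rows, id_field, var_name="variable", value_name="value"):
    """Unpivot: each non-ID column becomes its own (id, attribute, value) row."""
    long_rows = []
    for row in rows:
        for attribute, value in row.items():
            if attribute != id_field:
                long_rows.append({id_field: row[id_field], var_name: attribute, value_name: value})
    return long_rows


long_rows = wide_to_long(wide, "city", var_name="year")
for r in long_rows:
    print(r)
```

Each city now spans multiple rows, one per year, which matches the long-data description in the answer.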
What is the process of protecting people's private or sensitive data by eliminating identifying information?
Data Anonymization
Fill in the blank: A preference in favor of or against a person, group of people, or thing is called _____. It is an error in data analytics that can systematically skew results in a certain direction.
Data Bias
A data analyst removes personally identifying information from a dataset. What task are they performing?
Data anonymization
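A first step in anonymization is dropping fields that directly identify a person. The set of identifying fields below is a hypothetical choice for this sample record. Removing direct identifiers alone may not be enough, since combinations of remaining fields (such as ZIP code plus other details) can sometimes re-identify people.

```python
# Hypothetical set of directly identifying fields for this dataset.
IDENTIFYING_FIELDS = {"name", "email", "phone"}


def strip_identifiers(records, identifying=IDENTIFYING_FIELDS):
    """Return copies of the records with the identifying fields removed."""
    return [{k: v for k, v in r.items() if k not in identifying} for r in records]


records = [
    {"name": "Ana Ruiz", "email": "ana@example.com", "phone": "555-0100", "zip": "94110", "rating": 5},
]
print(strip_identifiers(records))  # [{'zip': '94110', 'rating': 5}]
```

The function returns new dictionaries, so the original records are left unchanged.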
TRUE or FALSE: An employer accesses an employee's credit report without their consent. This is not a violation of the employee's privacy because they work at the company.
False
TRUE or FALSE: Data analysts create hierarchies to organize their folders. They do this by structuring folders by specific topics at the top, then more broadly below.
False
TRUE or FALSE: To reduce clutter, a data analyst hides cells that contain long, complex formulas. To view the formulas again, the analyst will need to adjust the spreadsheet sharing or encryption settings.
False
TRUE or FALSE: The next thing you review is the file containing pictures of sandwich deliveries over a period of 30 days. This is an example of structured data.
False (the photos are unstructured data)
Fill in the blank: A _____ is an identifier that references a database column in which each value is unique.
Primary Key
The use of external data is particularly valuable in which circumstances? -When analysis includes data from audio files -When analysis involves data that hasn't been cleaned -When analysis requires a lot of structured data -When analysis depends on as many data sources as possible
When analysis depends on as many data sources as possible.