D467 WGU Exploring Data
Which statistical power is typically considered the minimum for statistical significance?
0.8 (80%)
Fill in the blank: Typically, a data professional aims to achieve a statistical power of at least _____ to consider their results statistically significant.
0.8, or 80%
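As a rough illustration of where the 0.8 threshold comes in, the sketch below estimates the power of a two-sided one-sample z-test using only Python's standard library. The effect size, the alpha of 0.05, and the function names are assumptions chosen for illustration, not values from the course.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided one-sample z-test.

    effect_size is the assumed true mean shift in standard-deviation units.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)
    # Probability that the test statistic lands in either rejection region.
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def min_sample_size(effect_size: float, target_power: float = 0.8) -> int:
    """Smallest n whose estimated power reaches target_power."""
    n = 2
    while z_test_power(effect_size, n) < target_power:
        n += 1
    return n


# Assumed medium effect size of 0.5 standard deviations.
required_n = min_sample_size(0.5)
```

Under these assumptions, power grows with the sample size, so the loop stops at the first n where the estimate reaches 0.8.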
To track people's online activities and interests, which method of data collection is most effective?
Cookies
What is the process of structuring folders broadly at the top, then breaking down those folders into more specific topics?
Creating a hierarchy
A data analyst removes personally identifying information from a dataset. What task are they performing?
Data anonymization
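A minimal sketch of anonymization, assuming the dataset is a list of dictionaries and that the hypothetical fields name, email, and phone are the personally identifying ones. Real anonymization often also involves masking, hashing, or generalizing values, which this does not cover.

```python
# Hypothetical column names treated as personally identifying information.
PII_FIELDS = {"name", "email", "phone"}


def anonymize(records):
    """Return copies of the records with the PII fields removed."""
    return [
        {key: value for key, value in record.items() if key not in PII_FIELDS}
        for record in records
    ]


# Made-up sample data for illustration only.
customers = [
    {"name": "Ana", "email": "ana@example.com", "city": "Lisbon", "purchases": 3},
    {"name": "Ben", "phone": "555-0100", "city": "Oslo", "purchases": 1},
]
anonymized = anonymize(customers)
```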
Which process do data analysts use to make data more organized and easier to read?
Data manipulation
Which process may restrict data analysis needs and should be balanced with data access needs?
Data security
Which tool is used by data analysts to store and organize data, making it easier for them to manage and access information?
Database
What is an example of conceptual data modeling that an analyst uses?
Defining the business requirements for a new database
Which of the following questions would enable a data professional to collect nominal qualitative data?
Did anyone recommend our music lessons to you?
What data-security measure uses a unique algorithm to alter data and make it inaccessible without the algorithm?
Encryption
In data analytics, what is the term for data that is generated from, and lives, outside of an organization?
External
When discussing structured databases, data analysts refer to the data contained in a row as a record. How do they refer to the data contained in a column?
Field
A data analyst is reviewing a national database of real estate sales. They are only interested in sales of condominiums. How can the analyst narrow their scope?
Filter out non-condominium sales
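The same narrowing can be sketched in code. The records and the property_type field below are made up for illustration; the filter keeps only the condominium rows and leaves the original list unchanged.

```python
# Made-up sample of a real estate sales dataset.
sales = [
    {"property_type": "condominium", "price": 250_000},
    {"property_type": "single-family", "price": 410_000},
    {"property_type": "condominium", "price": 315_000},
    {"property_type": "townhouse", "price": 298_000},
]

# Keep only the rows in scope; the original list is not modified.
condo_sales = [sale for sale in sales if sale["property_type"] == "condominium"]
```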
What is a feature of the filtering process when applied to spreadsheets?
Filtering hides the data temporarily.
If you have to complete your analysis with insufficient data, how should you address this limitation?
Identify trends with the available data
Bringing data from a .csv file into a spreadsheet is an example of what process?
Importing data
How does an analyst apply the principle of ownership to ethics and privacy in data collection?
Individuals who create the data should own it.
A manager in charge of selling a particular product interprets any ambiguous customer feedback about the product as being positive. What type of bias does this represent?
Interpretation
What is the general rule regarding the suggested length of each line in a query to maintain indentation best practices?
Less than or equal to 100 characters
A data analytics team uses data about data to indicate consistent naming conventions for a project. What type of data is involved in this scenario?
Metadata
What is the term for an identifier that references a database column in which each value is unique?
Primary key
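A small sketch of the uniqueness rule, using Python's built-in sqlite3 module with an in-memory database. The students table and its columns are hypothetical; the point is only that the database rejects a second row with the same primary key value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO students VALUES (1, 'Lee')")
# The same name is allowed because the key values differ.
conn.execute("INSERT INTO students VALUES (2, 'Lee')")

try:
    # Reusing student_id 1 violates the primary key constraint.
    conn.execute("INSERT INTO students VALUES (1, 'Kim')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
```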
Which file name follows formatting conventions?
SalesReport_2021
A university surveys its student-athletes about their experience in college sports. The survey only includes student-athletes with scholarships. What type of bias does this scenario describe?
Sampling
A grocery store chain purchases customer data from a credit card company. The grocer uses this data to identify its most loyal customers and offer them special promotions and discounts. What type of data is being used in this scenario?
Second-party
What are cookies?
Small files stored on computers that contain information about users
What is the process for arranging data into a meaningful order to make it easier to understand, analyze, and visualize?
Sorting
Which type of structured data does an analyst use?
Store inventory
A large company has several databases across its many departments. What kind of metadata describes how many locations contain a certain piece of data?
Structural
When using tokenization as a safety measure, what is replaced with a randomly generated token?
The data elements to be protected
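The idea can be sketched with a toy in-memory vault. The TokenVault class and the sample card number are hypothetical; production tokenization systems keep the mapping in a separately secured store.

```python
import secrets


class TokenVault:
    """Toy in-memory vault mapping sensitive values to random tokens."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        """Return the token for value, generating one on first use."""
        if value not in self._value_to_token:
            token = secrets.token_hex(8)  # random, not derived from the value
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token: str) -> str:
        """Look up the original value; only possible with vault access."""
        return self._token_to_value[token]


vault = TokenVault()
card_number = "4111-1111-1111-1111"  # made-up example value
token = vault.tokenize(card_number)
```

Because the token is random rather than computed from the data, it cannot be reversed without access to the vault's mapping.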
In order to have a high confidence level in a customer survey, what should the sample size accurately reflect?
The entire population
What concept states that all data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data?
Transaction transparency
What is the best practice for naming folders and subfolders to organize data?
Use descriptive names.
A data analyst at a retail company uses a tool to explore the data in its customer database. They learn the definition of each column, the data types contained, and the relationships between different tables. What does this scenario describe?
Using a metadata repository
A data analyst wants to find out how many middle school students in Helsinki have laptops. It is unlikely that they can survey every middle schooler in the city. Instead, they survey enough students to represent all middle schoolers. This describes what data analytics concept?
Using a sample
A political scientist needs to poll all voters in Seoul, South Korea, in order to predict the outcome of an election. Because it would be impossible to collect data from every single person in the city, the political scientist polls a part of the population that is representative of the whole. What does this scenario describe?
Using a sample
What data-security practice enables all collaborators within a file to track changes, such as who made what edits to the file, when they were made, and why?
Version control
Fill in the blank: When using SQL, the _____ clause can be used to filter a dataset of customers to only include people who have made a purchase in the past month.
WHERE
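A minimal sketch of such a filter, run through Python's sqlite3 module so it is self-contained. The customers table, its columns, the sample rows, and the fixed reference date are all assumptions for illustration; date arithmetic syntax differs between SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, last_purchase TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Avery", "2024-05-20"), ("Blake", "2024-03-02"), ("Casey", "2024-05-28")],
)

# A fixed reference date keeps the example reproducible.
# SQLite's date() function computes the cutoff one month earlier.
recent = conn.execute(
    "SELECT name FROM customers "
    "WHERE last_purchase >= date(?, '-1 month') "
    "ORDER BY name",
    ("2024-06-01",),
).fetchall()
```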
What are data ethics?
Well-founded standards of right and wrong that dictate how data is collected, shared, and used
Fill in the blank: A data type is a specific kind of data _____ that tells what kind of value the data is.
attribute
Fill in the blank: Sampling _____ occurs when some members of a population are overrepresented or underrepresented in the data.
bias
Fill in the blank: Bias is a _____ preference in favor of or against a person, group of people, or thing.
conscious or subconscious
Fill in the blank: Naming _____ are consistent guidelines used to describe the content, date, or version of a file.
conventions
Fill in the blank: The number of stars awarded to a product review is an example of _____ data.
discrete
Fill in the blank: For data analytics projects, _____ data is typically preferred because users know it originated within the organization.
first-party
Fill in the blank: A data analyst uses _____ to organize multiple files for a given project so they can be found and accessed in an efficient manner.
foldering
Fill in the blank: Openness refers to _____ access, usage, and sharing of data.
free
Fill in the blank: To keep a header row at the top of a spreadsheet, highlight the row and select _____ from the View menu.
freeze
Fill in the blank: Data _____ is a process data professionals use to ensure the formal management of their organization's data assets.
governance
Fill in the blank: Data professionals use data _____ to handle issues related to internal and external data flows while ensuring data assets are formally managed.
governance
Fill in the blank: To keep files organized, use a logical _____ to organize folders and subfolders.
hierarchy
Fill in the blank: Data _____ involves the accuracy, completeness, consistency, and trustworthiness of data throughout its lifecycle.
integrity
Fill in the blank: Hypothesis testing is a way to see if a survey or experiment has _____ results.
meaningful
What is an acceptable syntax for the SELECT keyword in MySQL?
select
Fill in the blank: A relational database contains a series of _____ that can be connected to form relationships.
tables
You are in charge of your company's weekly accounting spreadsheet. It has 15 sheets, each containing a different employee's purchases. You add restrictions to the spreadsheet to make sure employees can only edit their own sheets. What practice does this scenario describe?
Data security
A large metropolitan high school gives each of its students an ID number to differentiate them in its database. What kind of metadata are the ID numbers?
Descriptive
Which element of a Notepad file would be considered data as opposed to underlying metadata?
File contents
What is an example of administrative metadata for a digital file?
File permission
A data team at a trade school is sending a text alert to all students who have fewer than 10 credits. What spreadsheet tool will enable them to display only the students who meet that condition?
Filter out students with 10 or more credits
Which process utilizes logical and descriptive names for files, making them easier to find and use?
Foldering
A data analyst works on an urgent traffic study. As a result of the short time frame, which type of data might yield the best results?
Historical
In Google Sheets, what function enables a data analyst to specify a range of cells in one spreadsheet to be duplicated in another?
IMPORTRANGE
How does an analyst ensure that a data source is reliable?
It is accurate, complete, and unbiased information.
An expert in query languages searched for month_name = using Vertica. The data set contains variations of the word December, such as dec, Dec, etc. What will the output of this search query be?
It will return all entries that match DEC only.
What process do data professionals use to eliminate data redundancy, increase data integrity, and reduce complexity in a database?
Normalization
When using long data, each subject has data in multiple rows. This is because each row represents what?
One observation per subject
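To make the long layout concrete, the sketch below reshapes a small wide table (one row per subject, one column per year) into long form. The subjects, years, and values are made up for illustration.

```python
# Wide data: one row per subject, one column per year.
wide = [
    {"subject": "A", "2021": 10, "2022": 12},
    {"subject": "B", "2021": 7, "2022": 9},
]
year_columns = ("2021", "2022")

# Long data: one row per subject per year, so each subject spans multiple rows.
long_rows = [
    {"subject": row["subject"], "year": year, "value": row[year]}
    for row in wide
    for year in year_columns
]
```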
A financial institution publishes data about stock prices and market trends, which any business, nonprofit, or citizen can access, reuse, or redistribute through its online databases. What type of data is described in this scenario?
Open
What leads to confirmation bias in data collection?
People search to verify preexisting beliefs.
In data analytics, what term refers to all possible data values in a dataset?
Population
An analyst used a column of a table to uniquely identify each record within a table. Which tool did they use?
Primary key
Legal right to access your data is an element of which aspect of data ethics?
Privacy
What is the difference between raw data and information?
Raw data is unorganized, while information is structured.
A research team conducts an experiment to determine if a new cybersecurity tool is more effective than the previous version. What type of results are required for the experiment to be statistically significant?
Results that are real and not caused by random chance
Fill in the blank: A data model is used to organize _____ and how they relate to one another.
data elements
Fill in the blank: When using a relational database, data analysts write _____ to request data from the related tables.
queries
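As a sketch of a query across related tables, the example below joins two hypothetical tables, customers and orders, on a shared customer_id column using Python's sqlite3 module. The table names, columns, and rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Avery'), (2, 'Blake');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
    """
)

# The shared customer_id column is what relates the two tables.
totals = conn.execute(
    "SELECT c.name, SUM(o.total) "
    "FROM customers AS c "
    "JOIN orders AS o ON o.customer_id = c.customer_id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
```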
Fill in the blank: Data is considered _____ when it is accurate, complete, and unbiased information that has been vetted and proven fit for use.
reliable
Fill in the blank: Data security involves adopting _____ in order to protect data from unauthorized access or corruption.
safety measures
Fill in the blank: The data ethics principle of _____ states that an individual has the right to understand all of the data processing activities and algorithms used on their data.
transaction transparency
Which example shows the use of primary data?
A company's survey data of its customers' satisfaction
What is the preferred method for open data to be made available?
A convenient and modifiable internet download
Which of the following examples would be the most effective file name?
AirportCampaign_2013_10_09_V01
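A naming convention like this one can be generated rather than typed by hand. The helper below is a hypothetical sketch; it assumes the date portion is ordered year, month, day and that versions are zero-padded to two digits.

```python
from datetime import date


def build_file_name(project: str, created: date, version: int) -> str:
    """Combine project name, date, and version into one consistent file name."""
    return f"{project}_{created:%Y_%m_%d}_V{version:02d}"


file_name = build_file_name("AirportCampaign", date(2013, 10, 9), 1)
```

Putting the year first means that an alphabetical sort of the file names is also a chronological sort.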
What can be removed from the following query without preventing it from running? SELECT * FROM `Uni_dataset.new_table` WHERE ID = 'Lawrence'
Backticks (`)
What should an analyst consider at the start of data collection to reduce errors?
Bias and fairness
A data scientist at a tech company records whether users have accepted their company's terms of service or not. What data type is being collected in this scenario?
Boolean
How does an analyst apply the principle of consent to ethics and privacy in data collection?
By disclosing how and why the data will be used before the survey
A data analyst works for a rental car company. They have a spreadsheet that lists car ID numbers and the dates cars were returned. How should they sort the spreadsheet to find the most recently returned cars?
By return date, in descending order
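The same sort can be sketched in code. The car IDs and return dates below are made up; sorting on the date with reverse=True puts the most recent return first.

```python
from datetime import date

# Made-up rows: (car ID, return date).
returns = [
    ("CAR-102", date(2024, 3, 14)),
    ("CAR-087", date(2024, 4, 2)),
    ("CAR-215", date(2024, 1, 30)),
]

# Descending order by return date.
most_recent_first = sorted(returns, key=lambda row: row[1], reverse=True)
```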
How does an analyst apply data ethics to privacy and collection?
By using the collected data responsibly
A database table is named WebTrafficAnalytics. What type of case is this?
Camel case (more precisely, upper camel case, also known as Pascal case)
Before analysis, a company collects data from countries that use different date formats. Which of the following actions would improve the data integrity?
Change all of the dates to the same format
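One way to do this, sketched under the assumption that each record's source country is known, is to parse every date with that country's format and write it back in a single format such as ISO 8601. The country codes and format mapping below are illustrative, not a complete list.

```python
from datetime import datetime

# Assumed mapping from source country to its date format.
KNOWN_FORMATS = {
    "US": "%m/%d/%Y",  # month/day/year
    "DE": "%d.%m.%Y",  # day.month.year
    "JP": "%Y/%m/%d",  # year/month/day
}


def to_iso(raw: str, country: str) -> str:
    """Parse raw using the country's format and return YYYY-MM-DD."""
    return datetime.strptime(raw, KNOWN_FORMATS[country]).date().isoformat()


# The same calendar day written three different ways.
normalized = [
    to_iso("03/04/2024", "US"),
    to_iso("04.03.2024", "DE"),
    to_iso("2024/03/04", "JP"),
]
```

Knowing the source format matters because a string such as 03/04/2024 is ambiguous on its own.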
An analyst needs to show the geographic distribution of customers in the United States by region. Which visual representation should this analyst use?
Charts
A data team at a nature preserve researches the origin of a dataset to confirm it was created by a reputable source, such as a nonprofit research institution. Which aspect of good data are they prioritizing?
Cited
In a data table, where are fields contained?
Columns
What type of file saves data in a table format?
Comma-separated values (.csv)
What is the term for the tendency to search for or interpret information in a way that validates pre-existing beliefs?
Confirmation bias
Before completing a survey, a respondent learns more about how their data will be used. They understand why their data is being collected and how long it will be stored. What data ethics concept does this describe?
Consent
Before completing a survey, an individual acknowledges reading information about how and why the data they provide will be used. What is this concept called?
Consent
What type of data is the height of a skyscraper?
Continuous
Which role does an analyst have in collecting second-party data?
Contracts with an external entity
A junior data analyst learns that the dataset they have been given is six years old. After looking into this further, they also discovered that the age of the data is making the information irrelevant to their project. What good data source principle have they used to evaluate the dataset?
Current