D467 - Exploring Data


A data team at a hospital uses metadata to track the source of medical images. They find out which technician took each image, as well as the date and time they were taken and the type of imaging device used. What type of metadata are they using? Descriptive Structural General Administrative

Administrative

A data analyst runs the following query. What do they want to retrieve from the database? SELECT * FROM Customer_Orders WHERE Country = 'Finland' All fields of orders from customers outside of Finland All fields of the customers who have placed an order All fields of the names of customers living in Finland All fields of orders from customers in Finland

All fields of orders from customers in Finland
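The query above can be tried against a small in-memory table. This is a minimal sketch using Python's built-in sqlite3 module; the table and column names come from the question, while the extra columns and the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical sample rows; only Customer_Orders and Country come from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer_Orders (OrderID INTEGER, Customer TEXT, Country TEXT)"
)
conn.executemany(
    "INSERT INTO Customer_Orders VALUES (?, ?, ?)",
    [(1, "Aino", "Finland"), (2, "Lars", "Sweden"), (3, "Mika", "Finland")],
)

# SELECT * returns every column; WHERE keeps only rows whose Country is 'Finland'.
finland_orders = conn.execute(
    "SELECT * FROM Customer_Orders WHERE Country = 'Finland' ORDER BY OrderID"
).fetchall()
print(finland_orders)
```

Every field of the two Finnish orders comes back, and the Swedish order is excluded.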

A data team at a manufacturing company finishes a project about production rates, so they delete the related files. However, they are later tasked with another production project, which the deleted files could have informed. What should the data team have done with the original project files? Print them out Archive them Save them on a USB device Keep them on their local drive

Archive them

What are some benefits of using external data for an analysis project? Select all that apply. External data is very reliable and comes pre-cleaned. External data, when validated and trusted, provides more data points and helps analysts identify broad insights. External data provides industry-level perspectives. External data is always open and free to use.

External data, when validated and trusted, provides more data points and helps analysts identify broad insights. External data provides industry-level perspectives.

A data team at a trade school is sending a text alert to all students who have fewer than 10 credits. What spreadsheet tool will enable them to display only the students who meet that condition? Filter out students who have fewer than 10 credits Filter out students with more than 10 credits Sort the number of student credits in descending order Sort the number of student credits in ascending order

Filter out students with more than 10 credits

Which SQL statement will return only elementary school students from the Grade column of the Students database table? SELECT * FROM Students WHERE Grade = 'elementary' SELECT * FROM Grade WHERE 'elementary' SELECT * FROM Grade = 'elementary' SELECT * WHERE 'elementary'

SELECT * FROM Students WHERE Grade = 'elementary'

An analyst needs to show the geographic distribution of customers in the United States by region. Which visual representation should this analyst use? Sales reports Field descriptions Charts A data dictionary

Charts

Fill in the blank: A data type is a specific kind of data _____ that tells what kind of value the data is. attribute frame model point

attribute

Fill in the blank: The data ethics principle of _____ states that an individual has the right to understand all of the data-processing activities and algorithms used on their data. transaction transparency consent ownership currency

transaction transparency

Which of the following statements accurately describe first-, second-, and third-party data? Select all that apply. -When using third-party data, it's important to confirm its accuracy. -Second-party data is sold by a trusted partner to another party. -Third-party data is collected by an individual or group using their own resources. -A key benefit of using first-party data is that the user knows where it came from.

-When using third-party data, it's important to confirm its accuracy. -Second-party data is sold by a trusted partner to another party. -A key benefit of using first-party data is that the user knows where it came from.

Which of the following are examples of sampling bias? Select all that apply. A teacher gives higher grades to essays written in their own writing style. A clinical study includes three times more men than women. A survey of students does not include homeschooled students. An election poll only interviews people with college degrees.

A clinical study includes three times more men than women. A survey of students does not include homeschooled students. An election poll only interviews people with college degrees.

Which example shows the use of primary data? U.S. census data used by a university A company's survey data of its customers' satisfaction Data purchased from a market firm containing customer profiles Data from a published journal cited in a student's research paper

A company's survey data of its customers' satisfaction

In a data table, where are fields contained? Rows Columns Favorites Charts

Columns

What are the main benefits of open data? Select all that apply. Combines data from different fields of knowledge Good data is more widely available Restricts data access to certain groups of people Increases the amount of data available for purchase

Combines data from different fields of knowledge Good data is more widely available

What type of file saves data in a table format? Calculated spreadsheet values (.csv) Comma-separated values (.csv) Cell-structured variables (.csv) Compatible scientific variables (.csv)

Comma-separated values (.csv)
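As a rough illustration of how a .csv file holds a table, the sketch below parses a small comma-separated string with Python's csv module. The column names and values are made up for the example.

```python
import csv
import io

# Hypothetical file contents: the first line is the header row, and each later line is one record.
csv_text = "student_id,credits\n1001,8\n1002,12\n"

# DictReader uses the header row as field names and yields one dict per record.
rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    print(row["student_id"], row["credits"])
```

Values are read as text, so a numeric column such as credits would still need converting before any arithmetic.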

In data analytics, what is the term for data that is generated from, and lives, outside of an organization? Peripheral Outer Internal External

External

How does an analyst ensure that a data source is reliable? It includes data needed to answer the research question. It is accurate, complete, and unbiased information. It is validated against the original source. It includes data that is current and relevant to the study.

It is accurate, complete, and unbiased information.

An expert in query languages searched for month_name = DEC using Vertica. The data set contains variations of the word December, such as dec, Dec, etc. What will the output of this search query be? It will return all entries that match DEC only. It will return all entries that match dec only. It will return all entries such as dec, Dec, DEC. It will return all entries that match Dec only.

It will return all entries that match DEC only.
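Case-sensitive matching can be sketched with SQLite standing in for Vertica. This is an assumption made for convenience: SQLite's default text comparison is also case-sensitive, but other engines and collations may behave differently. The table name and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month_name TEXT)")
conn.executemany("INSERT INTO sales VALUES (?)", [("dec",), ("Dec",), ("DEC",)])

# Exact comparison: only the value spelled exactly 'DEC' matches.
exact = [r[0] for r in conn.execute(
    "SELECT month_name FROM sales WHERE month_name = 'DEC'"
)]

# Normalizing case with UPPER() before comparing matches every variation.
any_case = [r[0] for r in conn.execute(
    "SELECT month_name FROM sales WHERE UPPER(month_name) = 'DEC' ORDER BY rowid"
)]
print(exact, any_case)
```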

What process do data professionals use to eliminate data redundancy, increase data integrity, and reduce complexity in a database? Composition Normalization Manipulation Iteration

Normalization

What is the difference between raw data and information? Raw data is universal, while information is specific. Raw data is organized, while information is unstructured. Raw data is corrupt, while information is secure. Raw data is unorganized, while information is structured.

Raw data is unorganized, while information is structured.

Which of the following terms are also ways of describing observer bias? Select all that apply. Perception bias Research bias Spectator bias Experimenter bias

Research bias Experimenter bias

Which of the following are usually good data sources? Select all that apply. Vetted public datasets Social media sites Governmental agency data Academic papers

Vetted public datasets Governmental agency data Academic papers

An analyst wants to extract data from a table by filtering for certain conditions prior to performing data cleansing. Which Structured Query Language (SQL) statement will perform this function? LENGTH statement DISTINCT statement TRIM statement WHERE statement

WHERE statement

Fill in the blank: Bias is a _____ preference in favor of or against a person, group of people, or thing. conscious or subconscious sensible or insensible fair or unfair standard or substandard

conscious or subconscious

What is the preferred method for open data to be made available? A convenient and modifiable internet download A secure password-protected file A compressed file format that keeps file size small A print copy that is easily shared by anyone

A convenient and modifiable internet download

A data analyst removes personally identifying information from a dataset. What task are they performing? Data anonymization Data sorting Data collection Data visualization

Data anonymization

Which process may restrict data analysis needs and should be balanced with data access needs? Data security Data storage Data organization Data normalization

Data security

Which tool is used by data analysts to store and organize data, making it easier for them to manage and access information? Primary key Table Database Foreign key

Database

What is an outcome of the verification step of data cleansing? Compares dirty data with clean data Ensures the data collected and cleansed will address the original purpose Chronologically documents how the data set evolved during the project Helps build trust in the cleansing process and data

Ensures the data collected and cleansed will address the original purpose

What is the general rule regarding the suggested length of each line in a query to maintain indentation best practices? Less than or equal to 50 characters Greater than or equal to 100 characters Less than or equal to 100 characters Greater than or equal to 50 characters

Less than or equal to 100 characters

An analyst used a column of a table to uniquely identify each record within a table. Which tool did they use? Normalization Field Foreign key Primary key

Primary key

Fill in the blank: A data model is used to organize _____ and how they relate to one another. data visualizations database structures data elements spreadsheet fields

data elements

Fill in the blank: The data ethics principle of transaction transparency states that an individual has the right to understand all of the _____ and algorithms used on their data. free access data-processing activities raw data financial transactions

data-processing activities

Fill in the blank: The number of points scored in a basketball game is an example of _____ data. discrete open nominal continuous

discrete

A data scientist at a tech company records whether users have accepted their company's terms of service or not. What data type is being collected in this scenario? Text Boolean String Numerical

Boolean

A data analyst works for a rental car company. They have a spreadsheet that lists car ID numbers and the dates cars were returned. How should they sort the spreadsheet to find the most recently returned cars? By return date, in ascending order By return date, in descending order By car numerical ID, in ascending order By car numerical ID, in descending order

By return date, in descending order
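The same idea outside a spreadsheet: sorting by return date in descending order puts the latest return first. A minimal sketch with invented car IDs and dates:

```python
from datetime import date

# Hypothetical rows of (car ID, return date).
returns = [
    ("CAR-101", date(2023, 3, 4)),
    ("CAR-205", date(2023, 5, 19)),
    ("CAR-150", date(2023, 1, 27)),
]

# reverse=True gives descending order, so the most recent return date comes first.
most_recent_first = sorted(returns, key=lambda row: row[1], reverse=True)
print(most_recent_first[0])
```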

A data professional working on a project about commuters researches the origin of a dataset to confirm it was created by a reputable source, such as a government transportation agency. Which aspect of good data are they prioritizing? Original Comprehensive Cited Reliable

Cited

What is the term for the tendency to search for or interpret information in a way that validates pre-existing beliefs? Sampling bias Confirmation bias Observer bias Interpretation bias

Confirmation bias

What type of data is the height of a skyscraper? Discrete Qualitative Nominal Continuous

Continuous

What is an example of administrative metadata for a digital file? File size File name File contents File permission

File permission

An analyst performing data cleansing on invoice data would like to select and view rows that have an amount paid that is greater than $100. Which spreadsheet functionality should the analyst use? COUNTIF Conditional formatting Filter Remove duplicates

Filter

What leads to confirmation bias in data collection? People experience the same circumstance differently. People view the same object differently. People search to verify preexisting beliefs. People perceive ambiguous situations in a positive or negative way.

People search to verify preexisting beliefs.

An analyst wants to present a high-level summarized version of the data at the end of data cleansing. Which spreadsheet functionality should the analyst use? Conditional formatting Pivot table Find and replace Filters

Pivot table

In data analytics, what term refers to all possible data values in a dataset? Source Representation Population Sample

Population

Which of the following items are examples of structured data? Select all that apply. Price list Scanned medical images Data table Audio recording

Price list Data table

What is the term for an identifier that references a database column in which each value is unique? Field Relation Primary key Foreign key

Primary key

A junior data analyst learns that the dataset they have been given is six years old. After looking into this further, they also discover that the age of the data is making the information irrelevant to their project. What good data source principle have they used to evaluate the dataset? Comprehensive Original Reliable Current

Current

Which file name follows formatting conventions? SalesReport 2021 SalesReport*2021 SalesReport_2021 SalesReport!2021

SalesReport_2021

A large company has several databases across its many departments. What kind of metadata describes how many locations contain a certain piece of data? Structural Administrative Descriptive Representative

Structural

A data analyst is performing analysis on data stored in a big data platform with state-of-the-art analysis tools. Insufficient sample size is rendering the current analysis ineffective. What is the primary challenge of a larger sample size? Collecting a larger sample size is more expensive. Analyzing a larger sample size is complex. Storing a larger sample size is difficult. Cleansing a larger sample size is complicated.

Collecting a larger sample size is more expensive.

Before completing a survey, an individual acknowledges reading information about how and why the data they provide will be used. What is this concept called? Privacy Discretion Currency Consent

Consent

Freedom from inappropriate use of your data is an element of which aspect of data ethics? Consent Transparency Privacy Currency

Privacy

Which data ethics principle gives an individual the right to know why their data is collected and how long it will be stored? Consent Anonymization Credibility Privacy

Consent

What are the key characteristics of a text, or string, data type? Select all that apply. Contains textual information Only two possible values Sequence of characters and punctuation Has numerical percentages

Contains textual information Sequence of characters and punctuation

Which role does an analyst have in collecting second-party data? Contracts with data aggregators Acquires internal data Contracts with an external entity Surveys sample populations

Contracts with an external entity

On very short notice, a data analyst is asked to create a report for stakeholders. Because of the challenging time frame, what type of data might yield the best results? Theoretical Fabricated Unclean Historical

Historical

In Google Sheets, what function enables a data analyst to specify a range of cells in one spreadsheet to be duplicated in another? SPECIFY DUPLICATE IMPORTRANGE CELLRANGE

IMPORTRANGE

What is the most likely reason why a data analyst would use historical data instead of gathering new data? The data is unknown The data is constantly changing The project has a very short time frame The project is unimportant

The project has a very short time frame

Which statement is true about sampling, irrespective of sample size? The sample standard deviation (Stdev) is the same as the population Stdev. The sample distribution approximates to normal distribution. The sample bias is reduced if the sample being selected is representative of the population. The sample mean approaches the population mean

The sample bias is reduced if the sample being selected is representative of the population.

What concept states that all data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data? Ownership Currency Privacy Transaction transparency

Transaction transparency

What are the key characteristics of unstructured data? Select all that apply. Fits neatly into rows and columns Unorganized May have an internal structure Clearly identifiable construction

Unorganized May have an internal structure

What is the best practice for naming folders and subfolders to organize data? Use special character names. Use descriptive names. Use numeric names. Use spaces in the names.

Use descriptive names.

What is the definition of verification in the data cleaning process? A process of ensuring the degree to which a set of measures is equivalent across systems A process of chronologically listing the modifications made to a set of data files during the data cleansing process A process to confirm how accurate and reliable a data set is following data cleaning A process to report on the results of data cleansing efforts to help build trust in the data and cleansing

A process to confirm how accurate and reliable a data set is following data cleaning

A team leader is assigned the task of evaluating the schema of a data set as part of data cleansing. How would the team leader define a schema to the analyst collaborating on the project prior to commencing cleansing? How well two or more data sets work together A way of describing how something is organized A process of combining two or more data sets into a single data set A way of matching fields in separate databases

A way of describing how something is organized

Fill in the blank: For data analytics projects, _____ data is typically preferred because users know it originated within the organization. second-party third-party multi-party first-party

first-party

Fill in the blank: Openness refers to _____ access, usage, and sharing of data. protected limited free disclosed

free

Fill in the blank: To keep a header row at the top of a spreadsheet, highlight the row and select _____ from the View menu. set lock freeze pin

freeze

Fill in the blank: When using a relational database, data analysts write _____ to request data from the related tables. relationships keys queries programs

queries

Fill in the blank: Data is considered _____ when it is accurate, complete, and unbiased information that has been vetted and proven fit for use. original current comprehensive reliable

reliable

What is an acceptable syntax for the SELECT keyword in MySQL? "select" select SELECT_Keyword 'select'

select

Fill in the blank: A relational database contains a series of _____ that can be connected to form relationships. tables cells fields spreadsheets

tables

What is an example of conceptual data modeling that an analyst uses? Using mathematical models for predictive analysis of experiments Defining how individual records are uniquely identified in a database Defining the business requirements for a new database Using table names, column names, and data types for the database

Defining the business requirements for a new database

A large metropolitan high school gives each of its students an ID number to differentiate them in its database. What kind of metadata are the ID numbers? Administrative Structural Descriptive Representative

Descriptive

Which process utilizes logical and descriptive names for files, making them easier to find and use? Safety measures Foldering Normalization Data validation

Foldering

Bringing data from a .csv file into a spreadsheet is an example of what process? Filing data Importing data Editing data Normalizing data

Importing data

What are the key aspects of universal participation? Select all that apply. Certain groups of people must share their private data. No one can place restrictions on data to discriminate against a person or group. Everyone must be able to use, reuse, and redistribute open data. All corporations are allowed to sell open data.

No one can place restrictions on data to discriminate against a person or group. Everyone must be able to use, reuse, and redistribute open data.

When using long data, each subject has data in multiple rows. This is because each row represents what? Data in different formats True or false data points One observation per subject Multiple values

One observation per subject

A magazine conducts research about people's reading preferences. They only include respondents who currently subscribe. What type of bias does this scenario describe? Confirmation Interpretation Sampling Observer

Sampling

What are cookies? Types of malware that can damage computers Small files stored on computers that contain information about users Programs that enable users to access websites Pieces of code that store information about a website

Small files stored on computers that contain information about users

A political scientist needs to poll all voters in Seoul, South Korea, in order to predict the outcome of an election. Because it would be impossible to collect data from every single person in the city, the political scientist polls a part of the population that is representative of the whole. What does this scenario describe? Using a population Choosing a data type Using a sample Choosing quantitative data

Using a sample

An entertainment website displays a star rating for a movie based on user reviews. Users can select from one to five whole stars to rate the movie. The star rating is an example of what type of data? Select all that apply. Continuous Discrete Ordinal Nominal

Discrete Ordinal

Which of the following items are examples of continuous data? Select all that apply. Duration of a customer service call Favorite social media platform Number of employees at a company Temperature of a swimming pool

Duration of a customer service call Temperature of a swimming pool

When discussing structured databases, data analysts refer to the data contained in a row as a record. How do they refer to the data contained in a column? Field Subject Character Point

Field

Which element of a Notepad file would be considered data as opposed to underlying metadata? File contents File description File date File size

File contents

A data analyst is reviewing a national database of real estate sales. They are only interested in sales of condominiums. How can the analyst narrow their scope? Sort by condominium sales Filter out non-condominium sales Filter out condominium sales Sort by non-condominium sales

Filter out non-condominium sales

What is a feature of the filtering process when applied to spreadsheets? Filtering hides the data temporarily. Filtering orders the data temporarily. Filtering removes the data permanently. Filtering orders the data meaningfully.

Filtering hides the data temporarily.

An investor with a background working in the tech industry interprets any pitch from a tech startup as being more promising than others, even if the information is confusing and ambiguous. What type of bias does this scenario describe? Sampling Interpretation Observer Confirmation

Interpretation

Which of the following questions would enable a data professional to collect nominal qualitative data? How many books do you own? How many years of experience do you have? Is this your first time dining at this restaurant? What is your height?

Is this your first time dining at this restaurant?

Why is clean data critical for data analysis? It ensures data drives the decision the analyst intends to communicate. It ensures data used for analysis reflects operational reality. It ensures data is structured to enable effective analysis. It ensures data is visualized to make decisions.

It ensures data used for analysis reflects operational reality.

What are some key benefits of open-data initiatives? Select all that apply. Limit opportunities for collaboration Make government activities more transparent Support innovation and economic growth Help educate citizens about important issues

Make government activities more transparent Support innovation and economic growth Help educate citizens about important issues

A hospital system wants to protect the personally identifiable information of its patients, such as names and medical records. They ask their data team to anonymize the data. What techniques might they use to achieve this goal? Hashing Masking Sorting Blanking

Hashing Masking Blanking
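The anonymization techniques named in the options can be sketched in a few lines. The helper names below are hypothetical, the patient record is invented, and a real project would follow its organization's own privacy requirements rather than this simplified version.

```python
import hashlib

def hash_value(value: str) -> str:
    """Hashing: replace the value with a fixed-length SHA-256 digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def mask_value(value: str, visible: int = 4) -> str:
    """Masking: hide all but the last `visible` characters."""
    if visible <= 0:
        return "*" * len(value)
    return "*" * max(len(value) - visible, 0) + value[-visible:]

def blank_value(value: str) -> str:
    """Blanking: drop the value entirely."""
    return ""

# Hypothetical patient record.
record = {"name": "Jane Doe", "record_number": "123456789"}
anonymized = {
    "name": blank_value(record["name"]),
    "record_number": mask_value(record["record_number"]),
    "patient_key": hash_value(record["record_number"]),
}
print(anonymized)
```

Sorting is the one listed option that does not hide anything: it only reorders the rows.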

What does an UPDATE statement do after execution in Structured Query Language (SQL)? Modifies values in certain cells of a table based on conditions Inserts new rows into the table based on data provided in the query Removes tables based on the query Creates a temporary table to be downloaded for analysis and visualization

Modifies values in certain cells of a table based on conditions
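A small sketch of UPDATE changing only the cells that meet a condition, run through Python's sqlite3 module. The invoices table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, "open", 50.0), (2, "open", 150.0), (3, "paid", 200.0)],
)

# Only rows matching the WHERE clause are modified; no rows are added or removed.
cursor = conn.execute(
    "UPDATE invoices SET status = 'review' WHERE amount > 100 AND status = 'open'"
)
changed = cursor.rowcount
statuses = conn.execute("SELECT id, status FROM invoices ORDER BY id").fetchall()
print(changed, statuses)
```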

A financial institution publishes data about stock prices and market trends, which any business, nonprofit, or citizen can access, reuse, or redistribute through its online databases. What type of data is described in this scenario? Open Allowable Free Closed

Open

A government agency allows any business, nonprofit, or citizen to access its databases and reuse or redistribute the data. What type of data is described in this scenario? Closed Allowable Free Open

Open

A university surveys its student-athletes about their experience in college sports. The survey only includes student-athletes with scholarships. What type of bias does this scenario describe? Sampling Observer Interpretation Confirmation

Sampling

A grocery store chain purchases customer data from a credit card company. The grocer uses this data to identify its most loyal customers and offer them special promotions and discounts. What type of data is being used in this scenario? First-party Multi-party Third-party Second-party

Second-party

What is the process for arranging data into a meaningful order to make it easier to understand, analyze, and visualize? Filtering Prioritizing Reframing Sorting

Sorting

Which type of structured data does an analyst use? Social media posts User-created content Videos of store traffic Store inventory

Store inventory

An international nonprofit organization wants to merge third-party data with its own data. Which of the following actions will help make this process successful? Select all that apply. Use metadata to standardize the datasets. Replace the incoming data's metadata with its own company metadata. Use metadata to evaluate the third-party data's quality and credibility. Alter the internal metadata to more closely reflect the incoming metadata.

Use metadata to standardize the datasets Use metadata to evaluate the third-party data's quality and credibility

Fill in the blank: Data _____ is a process data professionals use to ensure the formal management of their organization's data assets. sourcing governance organization storage

governance

What strategy do data professionals use in order to ensure unbiased sampling? Use random sampling during data collection Write survey questions that encourage specific responses Store data in a spreadsheet Skew results in a certain direction

Use random sampling during data collection
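Random sampling can be sketched with Python's random module. The population of student IDs and the sample size are invented, and the fixed seed is only there to make the example repeatable.

```python
import random

# Hypothetical population: student IDs 1 through 100.
student_ids = list(range(1, 101))

# A seeded generator makes the draw repeatable; sample() picks without replacement,
# and every member of the population has the same chance of being chosen.
rng = random.Random(42)
sample_ids = rng.sample(student_ids, k=10)
print(sorted(sample_ids))
```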

In the next six months, an analyst is expected to analyze and present the effects of monthly promotions on sales of a new product released one month ago. Which solution for insufficient data should this analyst pursue? Look for a new data set Identify trends with available data Speak with stakeholders and adjust the objective Wait for more data

Wait for more data

What are data ethics? Established methods for ensuring data is clean, well-organized, and appropriate for a project Long-standing techniques for confirming that data is always used to benefit society Approved strategies data professionals use to safeguard the privacy and security of a dataset Well-founded standards of right and wrong that dictate how data is collected, shared, and used

Well-founded standards of right and wrong that dictate how data is collected, shared, and used

To determine if a data source is cited, ask which of the following questions? Select all that apply. When was this data last refreshed? Who created this dataset? Is this dataset from a credible organization? Has this dataset been properly cleaned?

When was this data last refreshed? Who created this dataset? Is this dataset from a credible organization?

An analyst wants to combine the fields for city and state into a single field prior to extraction for data cleansing. Which Structured Query Language (SQL) statement will perform this function? CONCAT statement CAST statement ORDER BY statement COALESCE statement

CONCAT statement
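Combining city and state into one field might look like the sketch below. It runs on SQLite, where the || operator is the portable way to concatenate text; engines such as MySQL would express the same thing as CONCAT(city, ', ', state). The addresses table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (city TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO addresses VALUES (?, ?)",
    [("Austin", "TX"), ("Boise", "ID")],
)

# || joins the two fields, with a literal ', ' separator, into a single column.
locations = [row[0] for row in conn.execute(
    "SELECT city || ', ' || state AS location FROM addresses ORDER BY city"
)]
print(locations)
```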

A manager in charge of selling a particular product interprets any ambiguous customer feedback about the product as being positive. What type of bias does this represent? Confirmation Sampling Interpretation Observer

Interpretation

A vendor asks a data team at their partner company to share a spreadsheet. The spreadsheet contains three tabs. Tabs 1 and 2 are meant for the vendor to review, but tab 3 contains sensitive internal information. Which of the following tactics will enable the data team to keep tab 3 private? Select all that apply. Rename tab 3 "Sensitive," then share the spreadsheet with the vendor. Hide tab 3, then share the spreadsheet with the vendor. Copy tabs 1 and 2 into a separate spreadsheet, then share the new file with the vendor. Make a copy of the spreadsheet, delete tab 3, then share the new file with the vendor.

Copy tabs 1 and 2 into a separate spreadsheet, then share the new file with the vendor. Make a copy of the spreadsheet, delete tab 3, then share the new file with the vendor.

What aspects of a file do file-naming conventions typically describe? Select all that apply. Creation date Version number Content description Collaborator names

Creation date Version number Content description

Which of the following statements accurately describe primary and foreign keys in a relational database? Select all that apply. A foreign key uniquely identifies a record in a relational database table. A table can have multiple foreign keys. Primary keys are unique identifiers for each row in a table. Primary keys cannot contain null or blank values.

A table can have multiple foreign keys. Primary keys are unique identifiers for each row in a table. Primary keys cannot contain null or blank values.
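The key rules above can be checked with a short sqlite3 sketch. All table names, column names, and rows are hypothetical. Note that SQLite only enforces foreign keys once the foreign_keys pragma is turned on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE Students (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE Courses (course_id INTEGER PRIMARY KEY, title TEXT)")
# One table with two foreign keys, each pointing at another table's primary key.
conn.execute("""
    CREATE TABLE Enrollments (
        enrollment_id INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL REFERENCES Students(student_id),
        course_id INTEGER NOT NULL REFERENCES Courses(course_id)
    )
""")
conn.execute("INSERT INTO Students VALUES (1, 'Ana')")
conn.execute("INSERT INTO Courses VALUES (10, 'Statistics')")
conn.execute("INSERT INTO Enrollments VALUES (100, 1, 10)")

def is_rejected(sql: str) -> bool:
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# A second student with primary key 1 violates uniqueness.
duplicate_key_rejected = is_rejected("INSERT INTO Students VALUES (1, 'Ben')")
# An enrollment for student 2, who does not exist, violates the foreign key.
missing_parent_rejected = is_rejected("INSERT INTO Enrollments VALUES (101, 2, 10)")
print(duplicate_key_rejected, missing_parent_rejected)
```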

What should an analyst consider at the start of data collection to reduce errors? Bias and fairness Publication of study Timeline and budget Length of survey

Bias and fairness

How does an analyst apply the principle of consent to ethics and privacy in data collection? By including participants in designing survey questions By including participants in all future opportunities By disclosing how and why the data will be used before the survey By addressing participants' concerns after publishing

By disclosing how and why the data will be used before the survey

How does an analyst apply data ethics to privacy and collection? By assuming consent has been received By restricting the distribution of data By using a segment of the data set in analysis By using the collected data responsibly

By using the collected data responsibly

You are the manager of your organization's monthly purchasing spreadsheet. It has eight sheets, each of which denotes a different department's purchases. You apply restrictions on the spreadsheet to ensure each department can only access their own sheet. What practice does this scenario describe? Data preservation Data integrity Data security Data hygiene

Data security

How does an analyst apply the principle of ownership to ethics and privacy in data collection? The third party that promotes the data should own it. The organization that collects the data should own it. The agency that sold the data should own it. Individuals who create the data should own it.

Individuals who create the data should own it.

What are some benefits of using internal data for an analysis project? Select all that apply. Internal data fully represents a universal topic. Internal data is likely to be reliable Internal data represents the entire industry. Internal data is free to access because the company owns it.

Internal data is likely to be reliable Internal data is free to access because the company owns it.

A junior data professional prepares for an analysis project about a very broad and global topic. However, they will only have access to internal data. What are some potential limitations that they should be aware of? Select all that apply. The data may not fully represent the facts. The data is not owned by the company. It may be difficult to gather data from multiple departments. It will be more difficult to confirm the reliability of the data.

The data may not fully represent the facts. It may be difficult to gather data from multiple departments.

A data analytics team uses data about data to indicate consistent naming conventions for a project. What type of data is involved in this scenario? Aggregated data Metadata Long data Big data

Metadata

A grocery store collects inventory data about its produce section. What is an appropriate naming convention for this file? Todays Produce 2022-15-09 Todays_Produce Produce_Inventory_2022-09-15_V01 Inventory_Produce 2022-09-15 V01

Produce_Inventory_2022-09-15_V01
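A naming convention like this one is easy to generate consistently in code. The helper below is a hypothetical sketch that joins a content description, an ISO-formatted date (YYYY-MM-DD), and a two-digit version number with underscores.

```python
from datetime import date

def build_file_name(description: str, created: date, version: int) -> str:
    """Join description, ISO date, and zero-padded version with underscores."""
    return f"{description}_{created.isoformat()}_V{version:02d}"

file_name = build_file_name("Produce_Inventory", date(2022, 9, 15), 1)
print(file_name)  # Produce_Inventory_2022-09-15_V01
```

Because the date is written year first, files named this way also sort chronologically in a folder listing.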

A junior data analyst at a dental care provider uses a tool to explore the data in its patient database. They learn about the data types it contains, the quality of that data, and the table relationships. What does this scenario describe? Designing a database Combining data from more than one source Using a metadata repository Performing data analysis

Using a metadata repository

A data analyst uses _____ to organize multiple files for a given project so they can be found and accessed in an efficient manner. data hygiene foldering version control data grouping

foldering

Fill in the blank: Data professionals use data _____ to handle issues related to internal and external data flows while ensuring data assets are formally managed. governance strategy integrity mapping

governance

Fill in the blank: Broader-topic folders are located at the top of a _____, and more specific subfolders and files are contained within those folders. hierarchy file extension permission encryption

hierarchy

