Data Analytics Quiz #2
Descriptive attributes
Attributes that exist in relational databases that are neither primary nor foreign keys. These attributes provide business information, but are not required to build a database. An example would be "Company Name" or "Employee Address."
Data request form
A method for obtaining data if you do not have access to obtain the data directly yourself.
What are the steps of the IMPACT model?
I) Identify the questions M) Master the data P) Perform test plan A) Address and refine results C) Communicate insights T) Track outcomes
What are the detailed steps of ETL?
Step 1: Determine the purpose and scope of the data request (extract). Step 2: Obtain the data (extract). Step 3: Validating the data for completeness and integrity (transform). Step 4: Cleaning the data (transform). Step 5: Loading the data for data analysis (load).
ETL
The extract, transform, and load process that is integral to mastering the data.
Flat file
a means of storing data in one place, such as in an Excel spreadsheet, as opposed to storing the data in multiple tables, such as in a relational database.
Relational database
a means of storing data in order to ensure that the data are complete, not redundant, and to help enforce business rules. Relational databases also aid in communication and integration of business processes across an organization.
Composite primary key
a special case of a primary key that exists in linking tables. The composite primary key is made up of the two primary keys in the table that it is linking.
Why is Supplier ID considered to be a primary key for a Supplier table? a) it contains a unique identifier for each supplier. b) it is a 10-digit number. c) it can either be for a vendor or miscellaneous provider. d) it is used to identify different supplier categories.
a) it contains a unique identifier for each supplier.
The purpose of transforming data is: a) to validate the data for completeness and integrity. b) to load the data into the appropriate tool for analysis. c) to obtain the data from the appropriate source. d) to identify which data are necessary to complete the analysis.
a) to validate the data for completeness and integrity.
Foreign key
an attribute that exists in relational databases in order to carry out the relationship between two tables. This does not serve as the "unique identifier" for each record in a table. These must be identified when mastering the data from a relational database in order to extract the data correctly from more than one table.
Primary key
an attribute that is required to exist in each table of a relational database and serves as the "unique identifier" for each record in a table.
The metadata that describes each attribute in a database is which of the following? a) composite primary key. b) data dictionary. c) descriptive attributes. d) flat file.
b) data dictionary.
What are attributes that exist in a relational database that are neither primary nor foreign keys? a) nondescript attributes. b) descriptive attributes. c) composite key. d) relational table attributes.
b) descriptive attributes.
As mentioned in the chapter, which of the following is not a common way that data will need to be cleaned after extraction and validation? a) remove headings and subtotals. b) format negative numbers. c) clean up trailing zeros. d) correct inconsistencies across data.
c) clean up trailing zeros.
Mastering the data can also be described via the ETL process. The ETL process stands for: a) extract, total, and load data. b) enter, transform, and load data. c) extract, transform, and load data. d) enter, total, and load data.
c) extract, transform, and load data.
Which attribute is required to exist in each table of a relationship database and serves as the "unique identifier" for each record in a table? a) foreign key. b) unique identifier. c) primary key. d) key attribute.
c) primary key.
Data dictionary
centralized repository of descriptions for all of the data attributes of a dataset.
The goal of the ETL process is to: a) identify which approach to data analytics should be used. b) load the data into a relationship database for storage. c) communicate the results and insights found through the analysis. d) identify and obtain the data needed for solving the problem.
d) identify and obtain the data needed for solving the problem.
Which of these is not included in the five steps of the ETL process? a) determine the purpose and scope of the data request. b) obtain the data. c) validate the data for completeness and integrity. d) scrub the data.
d) scrub the data.
Completeness
ensures that all data required for a business process are included in the dataset.
The advantages of storing data in a relational database include which of the following? a) help in enforcing business rules. b) increased information redundancy. c) integrating business processes. d) all of the above are advantages of a relational database. e) only a and b. f) only b and c. g) only a and c.
g) only a and c.
Business rules are enforced
relational databases can be designed to aid in the placement and enforcement of internal controls and business rules in ways that flat files cannot.
Communication and integration of business processes
relational databases should be designed to support business processes across the organization, which results in improved communication across functional areas and more integrated business processes.
No redundancy
storing redundant data is to be avoided for several reasons: it takes up unnecessary space (which is expensive), it takes up unnecessary processing to run reports to ensure that there aren't multiple versions of the truth, and it increases the risk of data-entry errors. Storing data in flat files yields a great deal of redundancy, but normalized relational databases require there to be one version of the truth and for each element of data to be stored in only one place.
Mastering the data
the second step in the IMPACT cycle; it involves identifying and obtaining the data needed for solving the data analysis problem, as well as cleaning and preparing the data for analysis.