ACC Systems 2 Trinkle Ch 1-4
As mentioned in the chapter, which of the following is not a common way that data will need to be cleaned after extraction and validation?
Clean up trailing zeroes.
Big Data is often described by the three V's, or
Volume, velocity, and variety
__________ mark the split between one class and another.
Decision boundaries
Which approach to Data Analytics attempts to identify similar individuals based on data known about them?
Similarity matching
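A minimal sketch of similarity matching, assuming two made-up numeric attributes per customer and plain Euclidean distance as the similarity measure. The customer records and the function names are hypothetical, and a real analysis would normally scale the attributes first.

```python
import math

# Hypothetical customer records: (annual_spend, visits_per_month)
customers = {
    "A": (1200.0, 4.0),
    "B": (1150.0, 5.0),
    "C": (300.0, 1.0),
}

def euclidean(p, q):
    """Straight-line distance between two attribute tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def most_similar(target, records):
    """Return the key of the record closest to `target` (excluding itself)."""
    others = {k: v for k, v in records.items() if k != target}
    return min(others, key=lambda k: euclidean(records[target], others[k]))

print(most_similar("A", customers))
```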
Models associated with regression and classification data approaches have all but this important part:
Test data.
The observation about the frequency distribution of leading digits in many real-life sets of numerical data is called:
Benford's law.
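Benford's law gives the expected share of leading digit d as log10(1 + 1/d). The sketch below compares that expectation with the observed leading digits of the first 100 powers of 2, a sample chosen here only for illustration; the function names are assumptions.

```python
import math
from collections import Counter

def benford_expected(d):
    """Expected proportion of numbers whose leading digit is d (1-9)."""
    return math.log10(1 + 1 / d)

def leading_digit(n):
    """Leading digit of a positive integer."""
    return int(str(n)[0])

# Illustrative sample: 2**0 through 2**99
sample = [2 ** k for k in range(100)]
counts = Counter(leading_digit(n) for n in sample)
observed = {d: counts[d] / len(sample) for d in range(1, 10)}

for d in range(1, 10):
    print(d, round(benford_expected(d), 3), round(observed[d], 3))
```

In an audit setting, a large gap between the observed and expected columns would be a prompt to look more closely, not proof of a problem.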
Which approach to Data Analytics attempts to assign each unit in a population into a small set of classes where the unit belongs?
Classification
Which skills were not emphasized as ones that analytic-minded accountants should have?
Classification of test approaches & Data and systems analysis and design
The metadata that describes each attribute in a database is which of the following?
Data dictionary
Which of these terms is defined as being a central repository of descriptions for all of the data attributes of the dataset?
Data dictionary
What are attributes that exist in a relational database that are neither primary nor foreign keys?
Descriptive attributes
Mastering the data can also be described via the ETL process. The ETL process stands for:
Extract, transform, and load data.
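A minimal sketch of the extract, transform, and load steps, assuming a tiny in-line CSV as the source and an in-memory SQLite database as the target. The column names, sample rows, and function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract with stray spaces and a formatted number
RAW_CSV = """supplier_id,name,balance
S001, Acme Corp ,"1,250.00"
S002,Globex,300
"""

def extract(text):
    """Read the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Trim whitespace and convert the balance to a number."""
    return [
        (
            row["supplier_id"].strip(),
            row["name"].strip(),
            float(row["balance"].replace(",", "")),
        )
        for row in rows
    ]

def load(rows, conn):
    """Create the target table and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE supplier (supplier_id TEXT PRIMARY KEY, name TEXT, balance REAL)"
    )
    conn.executemany("INSERT INTO supplier VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute("SELECT * FROM supplier ORDER BY supplier_id").fetchall()
print(rows)
```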
The goal of the ETL process is to
Identify and obtain the data needed for solving the problem.
The Fahrenheit scale of temperature measurement would best be described as an example of:
Interval data.
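A short sketch of why Fahrenheit is interval rather than ratio data: equal differences stay equal when the scale changes, but ratios do not, because zero degrees is not a true zero. The temperatures used are arbitrary illustrative values.

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

low_f, mid_f, high_f = 40.0, 60.0, 80.0
low_c, mid_c, high_c = f_to_c(low_f), f_to_c(mid_f), f_to_c(high_f)

# Ratios depend on the scale, so "twice as hot" is not meaningful
ratio_f = high_f / low_f
ratio_c = high_c / low_c

# Equal intervals in Fahrenheit remain equal intervals in Celsius
equal_steps_f = (high_f - mid_f) == (mid_f - low_f)
equal_steps_c = abs((high_c - mid_c) - (mid_c - low_c)) < 1e-9

print(ratio_f, ratio_c, equal_steps_f, equal_steps_c)
```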
Why is Supplier ID considered to be a primary key for a Supplier table?
It contains a unique identifier for each supplier.
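A minimal sketch of a primary key acting as a unique identifier, using Python's built-in sqlite3 module. The table layout, the sample suppliers, and the helper `try_insert` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE supplier (supplier_id TEXT PRIMARY KEY, supplier_name TEXT)"
)
conn.execute("INSERT INTO supplier VALUES ('S001', 'Acme Corp')")

def try_insert(supplier_id, name):
    """Return True if the row was inserted, False if the key already exists."""
    try:
        conn.execute("INSERT INTO supplier VALUES (?, ?)", (supplier_id, name))
        return True
    except sqlite3.IntegrityError:
        # The database rejects a second row with the same primary key
        return False

print(try_insert("S001", "Another Supplier"))  # duplicate key
print(try_insert("S002", "Acme Corp"))         # new key, repeated name is allowed
```

The supplier name can repeat, which is why it would be a poor choice of primary key; only Supplier ID is guaranteed to be unique.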
Which of these is not included in the five steps of the ETL process?
Learn what data is available in the data warehouse.
Which approach to Data Analytics attempts to predict a relationship between two data items?
Link prediction
__________ data would be considered the least sophisticated type of data.
Nominal
The advantages of storing data in a relational database include which of the following: Option A - Help in enforcing business rules. Option B - Increased information redundancy. Option C - Integrating business processes.
Only Option A and Option C.
Gold, silver and bronze medals would be examples of
Ordinal data.
In general, the more complex the model, the greater the chance of
Overfitting the data.
Which attribute is required to exist in each table of a relational database and serves as the "unique identifier" for each record in a table?
Primary key
Line charts are not recommended for what type of data?
Qualitative data
__________ data would be considered the most sophisticated type of data.
Ratio
Which of the following is not a typical example of nominal data?
SAT scores
By the year 2020, about 1.7 megabytes of new information will be created every:
Second.
In the late 1960s, Ed Altman developed a model to predict whether a company was at severe risk of going bankrupt. He called his statistic Altman's Z-score, now a widely used score in finance. Based on the name of the statistic, which statistical distribution would you guess this came from?
Standardized normal distribution
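For reference, a sketch of what "standardized" means: each value is re-expressed as its distance from the mean in standard deviations, z = (x - mean) / standard deviation. This illustrates standardization in general and is not Altman's bankruptcy formula; the sample values are made up.

```python
import statistics

# Hypothetical sample values
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(values)
std = statistics.pstdev(values)  # population standard deviation

# Standardized values have mean 0 and standard deviation 1
z_scores = [(x - mean) / std for x in values]

print(mean, std)
print(z_scores)
```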
Data that are organized and reside in a fixed field within a record or a file. Such data are generally contained in a relational database or spreadsheet and are readily searchable by search algorithms. The term matching this definition is:
Structured data.
__________ is a discriminating classifier that is defined by a separating hyperplane that works first to find the widest margin (or biggest pipe) and then works to find the middle line.
Support vector machines
__________ is a set of data used to assess the degree and strength of a predicted relationship.
Test data
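A minimal sketch of setting test data aside, assuming a simple random split in which 25% of the records are held out. The function name, the fraction, and the seed are illustrative choices.

```python
import random

def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and hold out a fraction as test data."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled records become test data, the rest training data
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(20))  # stand-in for 20 observations
train, test = train_test_split(records)

print(len(train), len(test))
```

The model is built on the training records only, and the held-out test records are then used to check how well the predicted relationship holds on data the model has not seen.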
Justin Zobel suggests that revising your writing requires you to "be egoless—ready to dislike anything you have previously written," suggesting that it is __________ you need to please:
The reader
The purpose of transforming data is:
To validate the data for completeness and integrity.
In general, the simpler the model, the greater the chance of
Underfitting the data.
The IMPACT cycle includes all but the following process:
Visualize the data. & Data preparation.