Chapters 1-4 Multiple Choice Questions
What are attributes that exist in a relational database that are neither primary nor foreign keys? A. Nondescript attributes B. Descriptive attributes C. Composite key D. Relational table attributes
B. Descriptive attributes
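This minimal Python sketch uses the standard-library `sqlite3` module to make the three kinds of attributes concrete. The `Supplier` and `PurchaseOrder` tables and their columns are hypothetical. The helper reads SQLite's own schema metadata and lists the columns that are neither primary nor foreign keys, which are the descriptive attributes.

```python
import sqlite3

# Hypothetical two-table schema: Supplier (parent) and PurchaseOrder (child).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Supplier (
           SupplierID   INTEGER PRIMARY KEY,
           SupplierName TEXT,
           SupplierCity TEXT)"""
)
conn.execute(
    """CREATE TABLE PurchaseOrder (
           PurchaseOrderID INTEGER PRIMARY KEY,
           SupplierID      INTEGER REFERENCES Supplier(SupplierID),
           OrderDate       TEXT,
           OrderAmount     REAL)"""
)


def descriptive_attributes(conn, table):
    """Return the columns of `table` that are neither primary nor foreign keys."""
    # Table names come from this script, not user input, so f-string interpolation is safe here.
    # table_info rows: (cid, name, type, notnull, dflt_value, pk); pk > 0 marks key columns
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    primary = {row[1] for row in columns if row[5] > 0}
    # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    foreign = {row[3] for row in conn.execute(f"PRAGMA foreign_key_list({table})")}
    return [row[1] for row in columns if row[1] not in primary | foreign]


print(descriptive_attributes(conn, "PurchaseOrder"))  # ['OrderDate', 'OrderAmount']
print(descriptive_attributes(conn, "Supplier"))       # ['SupplierName', 'SupplierCity']
```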
Gold, silver, and bronze medals would be examples of: A. Nominal data. B. Ordinal data. C. Structured data. D. Test data.
B. Ordinal data.
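Medals are ordinal rather than nominal because the categories can be ranked. Sorting them and taking a median category therefore make sense, but the distance between gold and silver is not a measurable quantity. The numeric ranks below are an assumed encoding and are not part of the data itself.

```python
# Ordinal data: the categories have a meaningful order, encoded here as ranks.
MEDAL_RANK = {"bronze": 1, "silver": 2, "gold": 3}

results = ["silver", "gold", "bronze", "gold", "silver"]

# Sorting by rank is meaningful for ordinal data...
best_first = sorted(results, key=MEDAL_RANK.__getitem__, reverse=True)

# ...and so is a median category, because it depends only on order.
# Averaging the ranks would not be meaningful: the gaps between medals are not equal units.
ranks = sorted(MEDAL_RANK[m] for m in results)
median_rank = ranks[len(ranks) // 2]
median_medal = next(m for m, r in MEDAL_RANK.items() if r == median_rank)

print(best_first)    # ['gold', 'gold', 'silver', 'silver', 'bronze']
print(median_medal)  # silver
```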
Line charts are not recommended for what type of data? A. Normalized data B. Qualitative data C. Continuous data D. Trend lines
B. Qualitative data
Which of the following *is not* a typical example of nominal data? A. Gender B. SAT scores C. Hair color D. Ethnic group
B. SAT scores
By the year 2020, about 1.7 megabytes of new information will be created every A. Week. B. Second. C. Minute. D. Day.
B. Second.
______________________________ is a discriminating classifier that is defined by a separating hyperplane that works first to find the widest margin (or biggest pipe) and then works to find the middle line. A. Linear classifier B. Support vector machine C. Decision tree D. Multiple regression
B. Support vector machine
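The sketch below does not train a support vector machine. It only illustrates the margin criterion that an SVM optimizes. For a handful of hypothetical labeled points, it scores three hand-picked candidate lines by their geometric margin, meaning the distance from the line to the closest point, and keeps the widest. The winner is x + y = 2. It lies exactly midway between the closest positive points (on x + y = 4) and the closest negative points (on x + y = 0), which is the "middle line" the question describes.

```python
import math

# Toy, linearly separable points labeled +1 / -1 (hypothetical data).
POINTS = [((2.0, 2.0), 1), ((3.0, 1.0), 1), ((0.0, 0.0), -1), ((-1.0, 1.0), -1)]


def geometric_margin(w, b, points):
    """Smallest signed distance from any point to the line w.x + b = 0.

    A negative value means at least one point is on the wrong side.
    """
    norm = math.hypot(*w)
    return min(y * (w[0] * x[0] + w[1] * x[1] + b) / norm for x, y in points)


# Hand-picked candidate separating lines, given as (w, b).
candidates = {
    "x + y = 1": ((1.0, 1.0), -1.0),
    "x = 1": ((1.0, 0.0), -1.0),
    "x + y = 2": ((1.0, 1.0), -2.0),
}

margins = {name: geometric_margin(w, b, POINTS) for name, (w, b) in candidates.items()}
widest = max(margins, key=margins.get)
print(widest, round(margins[widest], 4))  # x + y = 2 1.4142
```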
Justin Zobel suggests that revising your writing requires you to be egoless, "ready to dislike anything you have previously written." This implies that it is ______________ you need to please: A. Yourself B. The reader C. The customer D. Your boss
B. The reader
In general, the simpler the model, the greater the chance of: A. Overfitting the data. B. Underfitting the data. C. Pruning the data. D. The need to reduce the amount of data considered.
B. Underfitting the data.
Big Data is often described by the three Vs, or A. Volume, velocity, and variability. B. Volume, velocity, and variety. C. Volume, volatility, and variability. D. Variability, velocity, and variety.
B. Volume, velocity, and variety.
The observation about the frequency of leading digits in many real-life sets of numerical data is called: A. Leading digits hypothesis. B. Moore's law. C. Benford's law. D. Clustering.
C. Benford's law.
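Benford's law gives the expected share of each leading digit d as log10(1 + 1/d). About 30.1% of values therefore start with 1, and only about 4.6% start with 9. The minimal Python sketch below computes those expectations and compares them with observed leading-digit frequencies. Powers of 2 serve as the test data because they are a well-known sequence that follows Benford's law.

```python
import math
from collections import Counter


def benford_expected():
    """Benford's law: P(d) = log10(1 + 1/d) for leading digit d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant digit of a nonzero number, read from its decimal text."""
    return int(str(abs(x)).lstrip("0.")[0])


def leading_digit_frequencies(values):
    """Share of nonzero values whose first significant digit is 1, 2, ..., 9."""
    digits = Counter(leading_digit(v) for v in values if v != 0)
    n = sum(digits.values())
    return {d: digits.get(d, 0) / n for d in range(1, 10)}


# Powers of 2 are a classic sequence whose leading digits follow Benford's law.
observed = leading_digit_frequencies([2**n for n in range(1000)])
expected = benford_expected()
for d in range(1, 10):
    print(d, round(observed[d], 3), round(expected[d], 3))
```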
Which of the following skills was not emphasized as one that analytic-minded accountants should have? A. Develop an analytics mindset B. Data scrubbing and data preparation C. Classification of test approaches D. Define and address problems through statistical data analysis
C. Classification of test approaches
As mentioned in the chapter, which of the following is not a common way that data will need to be cleaned after extraction and validation? A. Remove headings and subtotals. B. Format negative numbers. C. Clean up trailing zeroes. D. Correct inconsistencies across data.
C. Clean up trailing zeroes.
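A few lines of Python can cover the cleanup tasks named in this question. This minimal sketch assumes two things: heading and subtotal rows can be recognized by the label in their first cell, and negative numbers use the accounting convention of parentheses. Real extracts may need different rules.

```python
from decimal import Decimal


def parse_amount(text):
    """Convert an accounting-formatted amount to a signed Decimal.

    Handles thousands separators, a leading '$', and parentheses for negatives,
    e.g. '(1,234.50)' -> Decimal('-1234.50').
    """
    s = text.strip().replace(",", "").replace("$", "")
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1]
    value = Decimal(s)
    return -value if negative else value


def drop_headings_and_subtotals(rows, labels=("account", "subtotal", "total")):
    """Keep detail rows only; assumes heading/subtotal rows are flagged by their first cell."""
    return [row for row in rows if row[0].strip().lower() not in labels]


raw = [
    ["Account", "Amount"],
    ["Office supplies", "1,250.00"],
    ["Refund", "(75.25)"],
    ["Subtotal", "1,174.75"],
]
clean = [(name, parse_amount(amount)) for name, amount in drop_headings_and_subtotals(raw)]
print(clean)  # [('Office supplies', Decimal('1250.00')), ('Refund', Decimal('-75.25'))]
```

Summing the cleaned detail rows reproduces the 1,174.75 subtotal that was dropped. This offers a quick check that nothing was lost in cleaning.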
Which of these terms is defined as being a central repository of descriptions for all of the data attributes of the dataset? A. Big Data B. Data warehouse C. Data dictionary D. Data Analytics
C. Data dictionary
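A data dictionary can be as simple as a mapping from each attribute name to its description and expected format. The entries below are hypothetical. The `check_record` helper is also hypothetical: it uses the dictionary to flag missing, mistyped, or undocumented attributes in a record.

```python
# Hypothetical data dictionary: one entry per attribute, kept in one central place.
DATA_DICTIONARY = {
    "SupplierID": {
        "type": str,
        "required": True,
        "description": "Unique identifier for each supplier (primary key)",
    },
    "SupplierName": {
        "type": str,
        "required": True,
        "description": "Name of the supplier",
    },
    "CreditLimit": {
        "type": float,
        "required": False,
        "description": "Maximum open balance allowed, in dollars",
    },
}


def check_record(record, dictionary=DATA_DICTIONARY):
    """Return a list of problems found when comparing a record to the data dictionary."""
    problems = []
    for name, spec in dictionary.items():
        if name not in record:
            if spec["required"]:
                problems.append(f"missing required attribute {name}")
        elif not isinstance(record[name], spec["type"]):
            problems.append(f"{name} should be {spec['type'].__name__}")
    for name in record:
        if name not in dictionary:
            problems.append(f"{name} is not described in the data dictionary")
    return problems


print(check_record({"SupplierID": "S01", "SupplierName": "Acme", "CreditLimit": 5000.0}))  # []
print(check_record({"SupplierID": 1, "Region": "West"}))
```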
______________________________ mark the split between one class and another. A. Decision trees B. Identified questions C. Decision boundaries D. Linear classifiers
C. Decision boundaries
Mastering the data can also be described via the ETL process. The ETL process stands for: A. Extract, total, and load data. B. Enter, transform, and load data. C. Extract, transform, and load data. D. Enter, total, and load data.
C. Extract, transform, and load data.
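The three ETL stages can be sketched with only the Python standard library. `csv` handles extraction, plain type conversion handles transformation, and `sqlite3` stands in for the target database. The invoice data, table name, and validation checks are assumptions for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from a file or query.
RAW_CSV = """invoice_id,supplier_id,amount
1001,S01,250.00
1002,S02,125.50
1003, S01 ,74.50
"""


def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast types and tidy values (e.g., stray spaces) to fit the target table."""
    return [(int(r["invoice_id"]), r["supplier_id"].strip(), float(r["amount"])) for r in rows]


def load(records, conn):
    """Load: write the cleaned records into the analysis database."""
    conn.execute(
        "CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, supplier_id TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO invoice VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
rows = extract(RAW_CSV)
load(transform(rows), conn)

# Validate completeness: record count and control total should match the source.
count, total = conn.execute("SELECT COUNT(*), SUM(amount) FROM invoice").fetchone()
source_total = sum(float(r["amount"]) for r in rows)
print(count == len(rows), abs(total - source_total) < 1e-9)  # True True
```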
Which approach to Data Analytics attempts to predict a relationship between two data items? A. Profiling B. Classification C. Link prediction D. Regression
C. Link prediction
Which approach to data analytics attempts to predict a relationship between two data items? A. Similarity matching B. Classification C. Link prediction D. Co-occurrence grouping
C. Link prediction
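Link prediction covers many techniques. One of the simplest is the common-neighbors score. It predicts that two unlinked nodes are more likely to become linked the more neighbors they share. The network below is hypothetical. This heuristic is only one possible approach and is not necessarily the one the chapter has in mind.

```python
from itertools import combinations

# Hypothetical network of existing links between entities A..E.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]


def neighbors(edges):
    """Build an undirected adjacency map: node -> set of linked nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbor_scores(edges):
    """Score every unlinked pair by how many neighbors the two nodes share."""
    adj = neighbors(edges)
    return {
        (u, v): len(adj[u] & adj[v])
        for u, v in combinations(sorted(adj), 2)
        if v not in adj[u]
    }


scores = common_neighbor_scores(EDGES)
print(max(scores, key=scores.get), scores)
# ('A', 'D') {('A', 'D'): 2, ('A', 'E'): 0, ('B', 'E'): 1, ('C', 'E'): 1}
```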
Which attribute is required to exist in each table of a relational database and serves as the "unique identifier" for each record in a table? A. Foreign key B. Unique identifier C. Primary key D. Key attribute
C. Primary key
Which approach to Data Analytics attempts to identify similar individuals based on data known about them? A. Classification B. Regression C. Similarity matching D. Data reduction
C. Similarity matching
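Similarity matching is often implemented as a distance calculation between feature vectors. This minimal sketch assumes two hypothetical numeric features per customer and uses Euclidean distance via `math.dist`. In practice, features are usually rescaled first so that no single feature dominates the distance.

```python
import math

# Hypothetical customer profiles: (orders per month, average order size in $100s).
CUSTOMERS = {
    "C1": (2.0, 1.5),
    "C2": (8.0, 0.5),
    "C3": (2.5, 1.4),
}


def most_similar(target, profiles):
    """Return the profile name with the smallest Euclidean distance to `target`."""
    return min(profiles, key=lambda name: math.dist(target, profiles[name]))


print(most_similar((2.2, 1.6), CUSTOMERS))  # C1
```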
The advantages of storing data in a relational database include which of the following? A. Help in enforcing business rules. B. Increased information redundancy. C. Integrating business processes. D. All of the above are advantages of a relational database. E. Only A and B. F. Only B and C. G. Only A and C.
G. Only A and C.
Which approach to Data Analytics attempts to assign each unit in a population into a small set of classes where the unit belongs? A. Classification B. Regression C. Similarity matching D. Co-occurrence grouping
A. Classification
The Fahrenheit scale of temperature measurement would best be described as an example of: A. Interval data. B. Discrete data. C. Nominal data. D. Continuous data.
A. Interval data.
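A quick numeric check shows why Fahrenheit is interval rather than ratio data. Its zero point is arbitrary, so differences are meaningful but ratios are not. Converting to Kelvin, which has a true zero, shows that 80 °F is only about 8% hotter than 40 °F in absolute terms, not twice as hot.

```python
def fahrenheit_to_kelvin(f):
    """Kelvin has a true zero, so ratios of Kelvin temperatures are meaningful."""
    return (f - 32) * 5 / 9 + 273.15


hot, mild = 80.0, 40.0

# Differences are meaningful on an interval scale: 40 degrees F apart is 40 * 5/9 K apart.
print(hot - mild, fahrenheit_to_kelvin(hot) - fahrenheit_to_kelvin(mild))

# Ratios are not: 80 degrees F is not "twice as hot" as 40 degrees F.
print(hot / mild)                                              # 2.0
print(fahrenheit_to_kelvin(hot) / fahrenheit_to_kelvin(mild))  # about 1.08
```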
Why is Supplier ID considered to be a primary key for a Supplier table? A. It contains a unique identifier for each supplier. B. It is a 10-digit number. C. It can either be for a vendor or miscellaneous provider. D. It is used to identify different supplier categories.
A. It contains a unique identifier for each supplier.
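Uniqueness, not format, is what makes SupplierID a primary key. A 10-digit number, as in option B, could still repeat. This minimal `sqlite3` sketch uses a hypothetical Supplier table to show the database enforcing the rule by rejecting a second row with an existing SupplierID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite, unlike the SQL standard, otherwise allows NULL
# in a primary key column that is not an INTEGER PRIMARY KEY.
conn.execute("CREATE TABLE Supplier (SupplierID TEXT NOT NULL PRIMARY KEY, SupplierName TEXT)")


def add_supplier(conn, supplier_id, name):
    """Insert a supplier; return False if the SupplierID is already taken."""
    try:
        conn.execute("INSERT INTO Supplier VALUES (?, ?)", (supplier_id, name))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_supplier(conn, "1000000001", "Acme Parts"))    # True
print(add_supplier(conn, "1000000002", "Beta Supply"))   # True
print(add_supplier(conn, "1000000001", "Another Acme"))  # False: duplicate key rejected
```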
In general, the more complex the model, the greater the chance of: A. Overfitting the data. B. Underfitting the data. C. Pruning the data. D. The need to reduce the amount of data considered.
A. Overfitting the data.
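Three hypothetical models fit to the same small dataset illustrate the trade-off in this question and in the earlier underfitting question. The first ignores x entirely and is too simple. The second is an ordinary least-squares line. The third is a nearest-neighbor lookup that memorizes every training point and is too complex. The data and noise values are made up for illustration.

```python
# Hypothetical data: y = 2x plus a small, fixed "noise" term on the training set.
NOISE = [0.5, -0.4, 0.3, -0.6, 0.2, -0.3, 0.6, -0.5, 0.4, -0.2]
train = [(float(x), 2.0 * x + e) for x, e in zip(range(10), NOISE)]
test = [(x + 0.5, 2.0 * (x + 0.5)) for x in range(9)]


def mse(model, data):
    """Mean squared error of a model (a function of x) on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def fit_mean(data):
    """Too simple: ignore x and always predict the average y (tends to underfit)."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y


def fit_line(data):
    """Ordinary least-squares line y = a + b*x."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    b = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x, _ in data)
    a = my - b * mx
    return lambda x: a + b * x


def fit_memorize(data):
    """Too complex: return the y of the nearest training x, noise included (tends to overfit)."""
    return lambda x: min(data, key=lambda point: abs(point[0] - x))[1]


for name, fit in [("mean", fit_mean), ("line", fit_line), ("memorize", fit_memorize)]:
    model = fit(train)
    print(f"{name:8s} train MSE {mse(model, train):7.3f}   test MSE {mse(model, test):7.3f}")
```

Run as written, the memorizing model has zero error on the training data but does worse than the straight line on the test points, which is overfitting. The mean-only model has large error on both sets, which is underfitting.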
______________ data would be considered the most sophisticated type of data. A. Ratio B. Interval C. Ordinal D. Nominal
A. Ratio
What is the most appropriate chart when showing a relationship between two variables (according to Exhibit 4-8)? A. Scatter chart B. Bar chart C. Pie graph D. Histogram
A. Scatter chart
The purpose of transforming data is: A. To validate the data for completeness and integrity. B. To load the data into the appropriate tool for analysis. C. To obtain the data from the appropriate source. D. To identify which data are necessary to complete the analysis.
A. To validate the data for completeness and integrity.
The IMPACT cycle includes all of the following processes *except*: A. Visualize the data. B. Identify the questions. C. Master the data. D. Track outcomes.
A. Visualize the data.
The metadata that describes each attribute in a database is which of the following? A. Composite primary key B. Data dictionary C. Descriptive attributes D. Flat file
B. Data dictionary
The IMPACT cycle includes all of the following processes *except*: A. Communicate insights. B. Data preparation. C. Address and refine results. D. Perform test plan.
B. Data preparation.
In the late 1960s, Ed Altman developed a model to predict whether a company was at severe risk of going bankrupt. He called his statistic Altman's Z-score. Which statistical distribution would you guess this came from? A. Normal distribution B. Poisson distribution C. Standardized normal distribution D. Uniform distribution
C. Standardized normal distribution
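Altman's bankruptcy statistic is known as the Z-score, a name that echoes the z-scores of the standardized normal distribution. The sketch below shows only that standardization step, z = (x - mean) / standard deviation, on made-up values. It is not Altman's model and omits his financial ratios and weights.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw values to z-scores: z = (x - mean) / population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]


z = standardize([10, 12, 14, 16, 18])
print([round(v, 3) for v in z])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

After standardization the values have mean 0 and standard deviation 1, the scale on which the standardized normal distribution is expressed.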
Data that are organized and reside in a fixed field within a record or file. Such data are generally contained in a relational database or spreadsheet and are readily searchable by search algorithms. The term matching this definition is: A. Training data. B. Unstructured data. C. Structured data. D. Test data.
C. Structured data.
Which of the following skills was not emphasized as one that analytic-minded accountants should have? A. Data quality B. Descriptive data analysis C. Data visualization D. Data and systems analysis and design
D. Data and systems analysis and design
The goal of the ETL process is to: A. Identify which approach to data analytics should be used. B. Load the data into a relational database for storage. C. Communicate the results and insights found through the analysis. D. Identify and obtain the data needed for solving the problem.
D. Identify and obtain the data needed for solving the problem.
______________ data would be considered the least sophisticated type of data. A. Ratio B. Interval C. Ordinal D. Nominal
D. Nominal
Exhibit 4-8 gives chart suggestions for what data you'd like to portray. Those options include all of the following except: A. Relationship. B. Comparison. C. Distribution. D. Normalization.
D. Normalization.
Which of these is not included in the five steps of the ETL process? A. Determine the purpose and scope of the data request. B. Obtain the data. C. Validate the data for completeness and integrity. D. Scrub the data.
D. Scrub the data.
______________________________ is a set of data used to assess the degree and strength of a predicted relationship. A. Training data B. Unstructured data C. Structured data D. Test data
D. Test data
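Test data is usually set aside from the available data before a model is built. The model is then fit on the training portion and assessed on records it has never seen. Below is a minimal holdout split in Python. The 30% test share and the fixed seed are arbitrary assumptions, chosen so that the split is repeatable.

```python
import random


def holdout_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and set aside a fraction of them as test data."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training data, test data)


train, test = holdout_split(range(100))
print(len(train), len(test))  # 70 30
```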
Models associated with the regression and classification approaches to data analytics have all of the following important parts *except*: A. Identifying which variables (we'll call these independent variables) might help predict an outcome (we'll call this the dependent variable). B. The functional form of the relationship (linear, nonlinear, etc.). C. The numeric parameters of the model (detailing the relative weights of each of the variables associated with the prediction). D. Test data.
D. Test data.