Chapter #3
All of the following are examples of an unsupervised approach to evaluating data except: A) Similarity matching B) Clustering C) Profiling D) Co-occurrence grouping
A
In general, the more complex the model, the greater the chance of ________. A) Overfitting the data B) Underfitting the data C) Pruning the data D) The need to reduce the amount of data considered
A
All of the following are examples of a supervised approach to evaluating data except: A) Causal modeling B) Data reduction C) Link prediction D) Regression
B
Data reduction typically involves the following steps except: A) Identify the attribute you would like to reduce or focus on. B) Identify the parameters of the model. C) Filter the results. D) Interpret the results.
B
Understanding and predicting inventory obsolescence is an important determination for retail companies. When using competitor selling prices to estimate the inventory obsolescence reserve, the inventory obsolescence reserve represents which of the following? A) Independent variable B) Dependent variable C) Function D) Statistical Model
B
When working with a predictive model, underfitting the data is most likely caused by ________. A) an overly complex model B) an overly simple model C) overpruning the data D) a lack of data reduction
B
Which of the following best describes an independent variable? A) Output B) Input C) Application D) Operation
B
Regression analysis typically involves the following steps except: A) Identify the variables that might predict an outcome. B) Identify the parameters of the model. C) Set boundaries or thresholds. D) Determine the functional form of the relationship.
C
Which approach to data analytics attempts to characterize the typical behavior of an individual, group or population by generating summary statistics about the data? A) Classification B) Regression C) Profiling D) Link prediction
C
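As a minimal sketch of profiling, the snippet below summarizes a set of values and flags any that fall far from typical behavior. The expense figures, the function names, and the 1.5 standard deviation threshold are all assumptions made for illustration.

```python
import statistics

# Hypothetical monthly expense-report totals for one employee (illustrative only).
expense_totals = [410.0, 395.0, 430.0, 405.0, 1250.0, 420.0]

def profile(values):
    """Return summary statistics that describe typical behavior."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

def flag_outliers(values, z_threshold=1.5):
    """Flag values more than z_threshold standard deviations from the mean.

    The threshold is an assumed boundary, not a recommended value.
    """
    stats = profile(values)
    limit = z_threshold * stats["stdev"]
    return [v for v in values if abs(v - stats["mean"]) > limit]

summary = profile(expense_totals)
outliers = flag_outliers(expense_totals)
```

Here the one unusually large month is the only value flagged; in practice the boundary would be set by the analyst and the flagged items followed up as exceptions.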
Using social media to look for relationships between related parties that are not otherwise disclosed to identify related party transactions is an example of ________. A) Classification B) Regression C) Profiling D) Link prediction
D
Which approach to data analytics attempts to discover associations between individuals based on transactions involving them? A) Classification B) Regression C) Similarity matching D) Co-occurrence grouping
D
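One way to picture co-occurrence grouping is to count how often two parties appear in the same transaction. The sketch below assumes each transaction is recorded as a set of party names; the names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set lists the parties involved in one transaction.
transactions = [
    {"vendor_a", "buyer_1"},
    {"vendor_a", "buyer_1"},
    {"vendor_b", "buyer_1"},
    {"vendor_a", "buyer_2"},
]

def co_occurrence_counts(records):
    """Count how often each pair of parties appears in the same transaction."""
    counts = Counter()
    for parties in records:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(parties), 2):
            counts[pair] += 1
    return counts

pair_counts = co_occurrence_counts(transactions)
most_common_pair, frequency = pair_counts.most_common(1)[0]
```

Pairs with unusually high counts are candidates for a closer look at the association between the two parties.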
A target is a manually assigned category applied to a record based on an event.
False
Benford's Law is an absolute and all data must conform.
False
A decision tree can be used to divide data into smaller groups.
True
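To illustrate how a decision tree divides data into smaller groups, here is a small hand-written tree. The loan records, the split thresholds, and the group names are assumptions; a real decision tree algorithm would choose its splits from training data rather than have them typed in.

```python
# Hypothetical loan records: (credit_score, debt_to_income_ratio).
loans = [(720, 0.20), (580, 0.45), (690, 0.50), (610, 0.15), (750, 0.35)]

def assign_group(credit_score, debt_to_income):
    """Walk a tiny hand-written tree; each question splits the data further."""
    if credit_score >= 650:            # first split (assumed threshold)
        if debt_to_income <= 0.40:     # second split, high-score branch only
            return "high score, low debt"
        return "high score, high debt"
    return "low score"

# Collect the records that end up in each leaf of the tree.
loan_groups = {}
for loan in loans:
    loan_groups.setdefault(assign_group(*loan), []).append(loan)
```

Each question narrows the records that reach it, so the five loans end up in three smaller groups.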
XBRL is used to facilitate the exchange of financial reporting information between the company and the Securities and Exchange Commission.
True
Which of the following best describes a dependent variable? A) Output B) Input C) Application D) Operation
A
________ is existing data that has been manually evaluated and assigned a class and ________ is existing data used to evaluate the model. A) Test data; Training data B) Training data; Test data C) Structured data; Unstructured data D) Unstructured data; Structured data
B
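The distinction between training data and test data can be sketched as a simple split of labeled records. The records, the 30 percent test share, and the fixed seed below are assumptions chosen for illustration.

```python
import random

# Hypothetical labeled records: (transaction_amount, manually assigned class).
labeled_records = [
    (25.00, "approved"), (980.00, "fraud"), (40.00, "approved"),
    (15.50, "approved"), (1200.00, "fraud"), (60.00, "approved"),
    (875.00, "fraud"), (33.00, "approved"), (52.00, "approved"),
    (1010.00, "fraud"),
]

def split_records(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split it into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    test_size = round(len(shuffled) * test_fraction)
    # Records held out for evaluation come first; the rest train the model.
    return shuffled[test_size:], shuffled[:test_size]

training_data, test_data = split_records(labeled_records)
```

The model is built only from `training_data`; `test_data` is held back so the model can be evaluated on records it has not seen.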
Unaware of data analysis tools available to the internal auditors, a store employee frequently processes cash returns without a receipt for $99, which is just below the amount requiring manager approval of $100. An analysis using which of the following would likely (and quickly) identify the employee's fraudulent behavior? A) Leading digits hypothesis B) Moore's law C) Benford's law D) Clustering
C
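A rough sketch of how such an analysis might look: compare the observed share of each leading digit with the share Benford's law expects. The return amounts and function names are hypothetical, and a real test would use far more records and a formal statistical measure.

```python
import math
from collections import Counter

def benford_expected(digit):
    """Expected share of amounts whose leading digit is `digit` (1-9)."""
    return math.log10(1 + 1 / digit)

def leading_digit(amount):
    """Return the first nonzero digit of a nonzero amount."""
    digits = f"{abs(amount):.2f}".replace(".", "").lstrip("0")
    return int(digits[0])

def leading_digit_shares(amounts):
    """Observed share of each leading digit among the amounts."""
    counts = Counter(leading_digit(a) for a in amounts)
    return {d: counts.get(d, 0) / len(amounts) for d in range(1, 10)}

# Hypothetical cash-return amounts for one employee (illustrative only).
returns = [99.00, 12.50, 99.00, 99.00, 23.10, 99.00, 18.75, 99.00, 99.00, 31.40]

observed = leading_digit_shares(returns)
# A large positive gap means far more 9s than Benford's law would suggest.
excess_nine = observed[9] - benford_expected(9)
```

In this made-up sample, amounts starting with 9 appear far more often than the law predicts, which is the kind of gap that would prompt a closer review.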
Which approach to data analytics attempts to identify similar individuals based on data known about them? A) Classification B) Clustering C) Similarity matching D) Co-occurrence grouping
C
Data reduction is a data approach used to reduce the amount of information that needs to be considered to focus on the most critical items.
True
Fuzzy matching is a computer-assisted technique of finding matches that are less than 100 percent perfect by finding correspondences between portions of the text of each potential match.
True
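As one possible illustration of fuzzy matching, Python's standard `difflib.SequenceMatcher` scores two strings by the portions of text they share. The vendor names and the 0.85 threshold are assumptions, not values from the source.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a 0-1 similarity score after basic normalization."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def fuzzy_matches(name, candidates, threshold=0.85):
    """Return candidates whose similarity to `name` meets the threshold."""
    return [c for c in candidates if similarity(name, c) >= threshold]

# Hypothetical vendor master-file names (illustrative only).
vendors = ["Acme Supply Co.", "ACME Supply Co", "Zenith Paper"]
matches = fuzzy_matches("Acme Supply Co", vendors)
```

Lowering the threshold finds more near-matches but also more false positives, so the cutoff is a judgment call for the analyst.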
Regression is a data approach used to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model.
True
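A minimal sketch of regression, assuming a straight-line relationship between one independent variable and one dependent variable. The claims and expense figures are invented and deliberately perfectly linear so the fitted parameters are easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: prior claims (independent) vs. warranty expense (dependent).
claims = [10, 20, 30, 40]
expense = [1200.0, 2200.0, 3200.0, 4200.0]

slope, intercept = fit_line(claims, expense)
predicted = intercept + slope * 50  # estimate for a period with 50 claims
```

The slope and intercept are the parameters of the model; the straight line is the assumed functional form of the relationship.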
The P in the IMPACT cycle stands for "perform test plan."
True
The data approach used to characterize the typical behavior of an individual, group or population by generating summary statistics about the data is referred to as classification.
False
When considering a question such as "Do our customers form natural groups based on similar attributes?" you would use an unsupervised approach.
True
XBRL is a global standard for exchanging financial reporting information that uses XML.
True
Retail stores often request customers' zip codes at the end of a sales transaction. This is an example of which data approach? A) Clustering B) Regression C) Similarity matching D) Classification
A
Which approach to data analytics attempts to predict, for each unit, the numerical value of some variable? A) Classification B) Regression C) Similarity matching D) Link prediction
B
Which of the following best describes an unsupervised approach to the evaluation of data? A) Data exploration that is free from oversight by a superior B) Data exploration that examines the relationships between variables that are hypothesized to exist C) Data exploration that looks for potential patterns of interest D) Data exploration that is conducted with direct oversight by a superior
C
________ mark the split between one class and another. A) Decision trees B) Identifying questions C) Decision boundaries D) Linear classifiers
C
________ refers to data that is stored in a database or spreadsheet that is readily searchable. A) Training data B) Unstructured data C) Structured data D) Test data
C
________ states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small. A) Leading digits hypothesis B) Moore's law C) Benford's law D) Classification
C
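For reference, the expected frequency of each leading digit under Benford's law is commonly written as follows, where d is the leading digit:

```latex
P(d) = \log_{10}\!\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \ldots, 9
```

This gives roughly 30.1 percent for a leading 1 and about 4.6 percent for a leading 9, which is why small leading digits dominate.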
Co-occurrence grouping is an example of a supervised approach.
False
Alibaba's attempt to identify seller and customer fraud based on various characteristics known about them is an example of similarity matching.
True
An example of classification would be a credit card company flagging a transaction as being approved or potentially being fraudulent and denying payment.
True
Clustering is a data approach used to divide individuals into groups in a useful or meaningful way.
True
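Clustering can be sketched with a very small one-dimensional k-means routine. This is only one of many clustering techniques, and the spend figures, starting centroids, and function name are assumptions for illustration.

```python
def k_means_1d(values, centroids, iterations=10):
    """Simple 1-D k-means: assign each value to the nearest centroid, then update."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical annual spend per customer (illustrative only).
annual_spend = [120, 150, 135, 900, 950, 1010]
centers, customer_clusters = k_means_1d(annual_spend, centroids=[100, 1000])
```

No labels are supplied; the groups emerge from the data alone, which is what makes the approach unsupervised.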
Co-occurrence grouping could be used to match vendors by geographic region.
True
Data profiling is used to assess data quality and internal controls. It typically involves the following steps except: A) Filter the results. B) Identify the objects or activity you want to profile. C) Determine the types of profiling you want to perform. D) Set boundaries or thresholds for the activity.
A
One of the key tasks of bank auditors is to consider the amount of the loan loss reserve. When developing a model to estimate the current year's loan loss reserve amount, which of the following would be least likely to be included as an independent variable? A) Original loan approval amount B) Customer loan history C) Current aged loans D) Collections success
A
The short surveys regarding dining preferences requested at the bottom of the restaurant bill are an example of which data approach? A) Clustering B) Regression C) Similarity matching D) Link prediction
A
Understanding and predicting warranty expense is an important determination for manufacturing firms. When using historical claims data to estimate the current period's warranty expense, the historical claims data represents which of the following? A) Independent variable B) Dependent variable C) Function D) Statistical Model
A
Which approach to data analytics attempts to assign each unit in a population into a small set of categories? A) Classification B) Regression C) Similarity matching D) Co-occurrence grouping
A
Which approach to data analytics attempts to divide individuals into groups in a useful or meaningful way? A) Clustering B) Data reduction C) Similarity matching D) Co-occurrence grouping
A
Which approach to data analytics attempts to forecast a relationship between two data items? A) Link prediction B) Regression C) Similarity matching D) Co-occurrence grouping
A
Which of the following best describes a supervised approach to the evaluation of data? A) Data exploration that is free from oversight by a superior B) Data exploration that is conducted with direct oversight by a superior C) Data exploration that examines the relationships between variables that are hypothesized to exist D) Data exploration that looks for potential patterns of interest
C
While overfitting data could lead to an error rate of 0 (zero), it is unlikely that you would be able to ________ your results. A) define B) specify C) articulate D) generalize
D
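A deliberately extreme sketch of this point: a model that memorizes its training records has a training error of zero but struggles on records it has not seen. The data, class labels, and the 100 cutoff are hypothetical.

```python
# Hypothetical labeled data: (return_amount, class). Illustrative only.
training = [(20, "ok"), (45, "ok"), (150, "review"), (300, "review")]
unseen = [(30, "ok"), (200, "review"), (120, "review"), (60, "ok")]

def memorizer(train):
    """An extreme overfit: memorize every training record exactly."""
    lookup = dict(train)
    # Amounts that were never seen fall back to a default guess of "ok".
    return lambda amount: lookup.get(amount, "ok")

def threshold_rule(amount, cutoff=100):
    """A much simpler model: a single decision boundary (assumed cutoff)."""
    return "review" if amount >= cutoff else "ok"

def error_rate(model, records):
    """Share of records the model labels incorrectly."""
    wrong = sum(1 for amount, label in records if model(amount) != label)
    return wrong / len(records)

overfit_model = memorizer(training)
train_error = error_rate(overfit_model, training)  # memorized, so no errors
unseen_error = error_rate(overfit_model, unseen)
simple_unseen_error = error_rate(threshold_rule, unseen)
```

In this toy data the memorizing model is perfect on the training records yet wrong on half of the unseen ones, while the simpler rule carries over to the new records.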
Data profiling typically involves unstructured data.
False
Existing data that has been manually evaluated and assigned a class is often referred to as test data.
False
Fuzzy matching is a data approach used to identify similar individuals based on data known about them.
False
Link prediction is a data approach used to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model.
False