Chapter 3 - SmartBook
______ are designed to be interactive and adapt to the information collected by the user. Artificial intelligence Machine learning Decision support systems Predictive analytics
Decision support systems
After you have identified the objects or activity you wish to profile, what should you do next? Determine the types of profiling you want to perform. Interpret the results and monitor the activity. Set boundaries or thresholds for the activity. Follow up on exceptions.
Determine the types of profiling you want to perform.
_____ looks for similarities between portions, or segments, of the text of each potential match. Similarity match Fuzzy match Text matching Data matching
Fuzzy match
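A fuzzy match can be sketched with Python's standard `difflib` module, which scores how similar two strings are. This is only an illustration: the vendor names and the 0.8 cutoff are assumptions, not values from the text.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_matches(name: str, candidates: list[str], threshold: float = 0.8) -> list[str]:
    """Return candidates whose similarity to `name` meets the threshold."""
    return [c for c in candidates if similarity(name, c) >= threshold]


# Hypothetical vendor names, invented for illustration.
vendors = ["Acme Supply Co.", "ACME Supply Company", "Zenith Freight"]
print(fuzzy_matches("Acme Supply Co", vendors))
```

Raising the threshold returns fewer, closer matches; lowering it catches more variants at the cost of more false positives.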
True or false: When clustering works well, observations within a segment should be different, and the data across segments should be very similar.
false
A/an __________ approach is used when you don't have a specific question and are simply exploring the data for potential patterns of interest.
unsupervised
After you have identified the attribute you would like to reduce or focus on, what is the next step? Filter the results. Interpret the results. Set boundaries or thresholds for the activity. Follow up on results.
Filter the results.
A/an __________ approach is used when you are performing analysis that uses historical data to predict a future outcome based on a specific question.
supervised
Clustering is a/an __________ method that is used to find natural groupings within the data.
unsupervised
Using a classification model, you can predict __________ a new vendor belongs to one class or another based on the behavior of others.
whether
_____ is an observation about the frequency of leading digits in many real-life sets of numerical data. Benford's law First-digit law Leading digit law
Benford's law
What is the term for models that are useful for ranking observations rather than simply predicting class probability? Classification Regression Linear classifiers Pruning
Linear classifiers
In the following question, what would be the target? Given a set of customer data, we are trying to predict the total transaction amount based on a variety of attributes. Transaction amount The number of customers Customer name The entire dataset
Transaction amount
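To make the idea of a target concrete, here is a minimal sketch that predicts a transaction amount from a single explanatory attribute using a least-squares line. The data and the choice of "items per order" as the attribute are hypothetical; the question above mentions a variety of attributes, and this sketch uses one only to stay short.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one explanatory variable."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


# Hypothetical data: items per order (explanatory) and transaction amount (target).
items = [1, 2, 3, 4]
amount = [30.0, 50.0, 70.0, 90.0]
slope, intercept = fit_line(items, amount)
print(slope * 5 + intercept)  # predicted amount for a 5-item order
```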
Classification predicts a class for a new observation based on the __________ identification of classes from previous observations.
manual
Generally the more complex and complete the model, the higher degree of the model _____ the data. overfitting underfitting
overfitting
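Overfitting can be illustrated with a small sketch that compares a simple one-boundary model against a model that memorizes every training record. The data are hypothetical, and the memorizing model is a nearest-neighbor lookup used here as a stand-in for an overly complex model, not a technique named in the text.

```python
# Hypothetical training data: (value, class). The record (4, 1) is a noisy label.
train = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
test = [(2.5, 0), (3.9, 0), (4.1, 0), (6.5, 1)]


def simple_model(x):
    """One boundary: class 1 above 5.5, class 0 otherwise."""
    return 1 if x > 5.5 else 0


def complex_model(x):
    """Copy the class of the closest training record (memorizes the noise)."""
    return min(train, key=lambda row: abs(row[0] - x))[1]


def accuracy(model, rows):
    """Share of rows the model classifies correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


for name, model in [("simple", simple_model), ("complex", complex_model)]:
    print(name, accuracy(model, train), accuracy(model, test))
```

The memorizing model is perfect on the training records but does worse than the simple model on the test records, which is the pattern overfitting produces.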
Profiling is an unsupervised method that is used to discover __________ of behavior, based on the distance of z-scores from the mean.
patterns
Machine learning, artificial intelligence and decision support systems are all examples of _____ analytics. diagnostic descriptive predictive prescriptive
prescriptive
Decision support systems are an example of _____. predictive analytics diagnostic analytics descriptive analytics prescriptive analytics
prescriptive analytics
Structured data is stored in a database or spreadsheet and is readily __________.
searchable
In the profiling example regarding T&E Expenses, which of the following is NOT one of the areas that the analyst would try to uncover? change in procedures lack of controls individuals more willing to spend excessively significant variances in standard cost
significant variances in standard cost
Benford's law states that in many naturally occurring collections of numbers, the significant leading digit is likely to be ______. large small larger if the data is describing geographic elements larger if the data is describing financial elements
small
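Benford's law gives the expected share of leading digit d as log10(1 + 1/d), which is why small digits dominate. The sketch below computes those expected shares and compares them with observed shares; the function names are illustrative, and the input is assumed to contain at least one nonzero value.

```python
import math
from collections import Counter


def benford_expected(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1-9)."""
    return math.log10(1 + 1 / digit)


def leading_digit(value: float) -> int:
    """First nonzero digit of a nonzero number, ignoring sign."""
    digits = f"{abs(value):.15e}"  # scientific notation puts that digit first
    return int(digits[0])


def observed_shares(values: list[float]) -> dict[int, float]:
    """Share of nonzero values starting with each digit 1-9."""
    counts = Counter(leading_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


for d in range(1, 10):
    print(d, round(benford_expected(d), 3))
```

A leading 1 is expected about 30.1% of the time, and the expected share falls steadily as the digit grows. Large gaps between observed and expected shares are a reason to look closer, not proof of a problem.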
__________ data are existing data that have been manually evaluated and assigned a class. __________ data are existing data used to evaluate the model.
training / test
A decision __________ is a tool used to divide data into smaller groups. Decision __________ is a technique used to mark the split between one class and another.
tree / boundaries
Profiling is a/an __________ method that is used to discover patterns of behavior, based on the distance of z-scores from the mean.
unsupervised
Knowing the mean and standard deviation, and assuming a normal distribution, one can compute which statistic that can be used to identify abnormal transactions? q-score m-score z-score c-score
z-score
True or false: Classification requires that we know a great deal about the observation that we're attempting to place in a class.
false
Place the steps of Data Reduction in order:
1) Identify the attribute you would like to reduce or focus on. 2) Filter the results. 3) Interpret the results. 4) Follow up on the results.
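The data reduction steps above can be sketched as a simple filter. Everything here is hypothetical: the transactions, the approval limit of 500, and the idea of focusing on amounts just under that limit are assumptions chosen for illustration.

```python
# Hypothetical transactions, invented for illustration.
transactions = [
    {"id": 1, "employee": "A", "amount": 480.00},
    {"id": 2, "employee": "B", "amount": 125.50},
    {"id": 3, "employee": "A", "amount": 499.99},
    {"id": 4, "employee": "C", "amount": 750.00},
]

APPROVAL_LIMIT = 500.00  # assumed approval threshold
WINDOW = 25.00           # assumed width of the band just under the limit


def just_under_limit(rows, limit=APPROVAL_LIMIT, window=WINDOW):
    """Step 2: keep only rows whose amount falls just below the limit."""
    return [r for r in rows if limit - window <= r["amount"] < limit]


# Step 1 chose `amount` as the attribute; step 2 filters on it.
flagged = just_under_limit(transactions)
# Steps 3 and 4: review what remains and follow up with the people involved.
for row in flagged:
    print(row["id"], row["employee"], row["amount"])
```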
Any transaction with a z-score of ____ or above would be considered an abnormal transaction. 2 4 1 3
3
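A z-score is a value's distance from the mean measured in standard deviations, z = (x - mean) / standard deviation. The sketch below flags values at or beyond the cutoff of 3. The expense amounts are hypothetical, and two choices are assumptions of this sketch: it uses the population standard deviation, and it compares the absolute z-score so that unusually low values are flagged too.

```python
from statistics import mean, pstdev


def z_scores(values: list[float]) -> list[float]:
    """Z-score of each value: distance from the mean in standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation; must be nonzero
    return [(v - mu) / sigma for v in values]


def abnormal(values: list[float], cutoff: float = 3.0) -> list[float]:
    """Values whose absolute z-score is at or above the cutoff."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) >= cutoff]


# Hypothetical expense amounts, invented for illustration.
amounts = [120, 95, 110, 105, 130, 98, 115, 102, 125, 108,
           99, 112, 118, 101, 107, 122, 96, 111, 104, 900]
print(abnormal(amounts))
```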
A class is a manually assigned __________ applied to a record based on an event.
category
A target is an expected attribute or value that you want to __________.
evaluate
Place the steps of classification into order.
1) Identify the classes you wish to predict. 2) Manually classify an existing set of records. 3) Select a set of classification models. 4) Divide your data into training and testing sets. 5) Generate your model. 6) Interpret the results and select the "best" model.
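The classification steps above can be sketched end to end with a deliberately tiny model. This is a minimal illustration under stated assumptions: the records are hypothetical, the model chosen in step 3 is a single decision boundary on one attribute, and the split in step 4 is a fixed every-third-record holdout rather than a random one.

```python
# Steps 1-2: hypothetical, manually classified records.
# Each pair is (debt-to-income ratio, class), where 1 = rejected, 0 = approved.
records = [
    (0.08, 0), (0.12, 0), (0.18, 0), (0.21, 0), (0.25, 0), (0.30, 0),
    (0.42, 1), (0.45, 1), (0.50, 1), (0.55, 1), (0.61, 1), (0.70, 1),
]

# Step 4: hold out every third record for testing; train on the rest.
test = records[::3]
train = [r for i, r in enumerate(records) if i % 3 != 0]


def accuracy(threshold, rows):
    """Share of rows where 'ratio > threshold' predicts the recorded class."""
    hits = sum((x > threshold) == bool(y) for x, y in rows)
    return hits / len(rows)


def fit_threshold(rows):
    """Step 5: pick the midpoint between neighbors that best fits the rows."""
    xs = sorted(x for x, _ in rows)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: accuracy(t, rows))


threshold = fit_threshold(train)
# Step 6: judge the model on records it has not seen.
print(threshold, accuracy(threshold, test))
```

Evaluating on the held-out test records, not the training records, is what lets step 6 compare candidate models fairly.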
Place the steps of profiling in order, from 1 through 5.
1) Identify the objects or activity you want to profile. 2) Determine the types of profiling you want to perform. 3) Set boundaries or thresholds for the activity. 4) Interpret the results and monitor the activity and/or generate a list of exceptions. 5) Follow up on exceptions.
In the example regarding the LendingClub data in which the analyst is researching loan rejection, they identified three possible indicators for why a loan would be rejected: the debt-to-income ratio, length of employment, and credit [risk] score. Which of the following is/are the explanatory variable(s)? a) Loan rejection b) Credit [risk] score c) Debt-to-income ratio d) Length of employment
b) Credit [risk] score c) Debt-to-income ratio d) Length of employment
In the example regarding the LendingClub data in which the analyst is researching loan rejection, they identified three possible indicators for why a loan would be rejected: the debt-to-income ratio, length of employment, and credit [risk] score. Which is the response variable? Debt-to-income ratio Length of employment Credit [risk] score Loan rejection
Loan rejection
_____ include both unsupervised exploratory analysis and supervised model generation to provide insight and predictive foresight into the business and decisions made by accountants and auditors. Diagnostic analytics Machine learning and artificial intelligence Decision support systems and business intelligence Descriptive analytics
Machine learning and artificial intelligence
After you have identified the classes you wish to predict, what is the next step? Generate your model Interpret the results and select the "best" model. Manually classify an existing set of records. Select a set of classification models.
Manually classify an existing set of records.
_____ might be used to identify areas where there is a lack of controls, changes in procedures, or individuals more willing to spend excessively in potential types of T&E expenses which might be associated with higher risk. Profiling Classification Clustering Data reduction
Profiling
What is the term for removing branches from a decision tree to avoid overfitting the model? Pruning Classification Linear classifiers Segregating
Pruning
What body mandates submission of XBRL to facilitate the exchange of financial reporting information? a) Securities and Exchange Commission b) Public Company Accounting Oversight Board c) New York Stock Exchange d) Financial Accounting Standards Board
Securities and Exchange Commission
In the example of profiling for management accounting regarding Advanced Environmental Recycling Technologies, what are they looking for significant variances in? Feet of Decking Recipe Standards Standard Cost Travel and Entertainment Expenses
Standard Cost
Which of the following is true regarding the profiling approach? a) It is generally performed on data that is readily available. b) It is never as simple as calculating summary statistics. c) It is primarily done using unstructured data. d) It is rarely used to assess internal controls.
a) It is generally performed on data that is readily available.
What is the purpose of clustering? a) To identify groups of similar data elements and the underlying drivers of these groups. b) To reduce the amount of detailed information considered to focus on the most interesting or abnormal items. c) To gain an understanding of a typical behavior of an individual, group, population, or sample. d) It allows analysts to develop models to predict expected outcomes.
a) To identify groups of similar data elements and the underlying drivers of these groups.
What is the purpose of classification? a) To predict which class an observation that we know little about will belong to. b) To gain an understanding of a typical behavior of an individual, group, population, or sample. c) It allows analysts to develop models to predict expected outcomes. d) To reduce the amount of detailed information considered to focus on the most interesting or abnormal items.
a) To predict which class an observation that we know little about will belong to.
In the example provided in the text regarding employee turnover, the analyst is trying to predict employee turnover based on current professional salaries, health of the economy (GDP), and salaries offered by other accounting firms. In this scenario, select the explanatory variable(s). a) health of the economy b) employee turnover c) current professional salaries d) salaries offered by other accounting firms
a) health of the economy c) current professional salaries d) salaries offered by other accounting firms
Which of the following is true regarding the Data Reduction approach? a) It works best when there is not any particular attribute you would like to focus on. b) It primarily uses structured data that is readily searchable. c) It is most useful when performed on a small dataset.
b) It primarily uses structured data that is readily searchable.
What is the purpose of Data Reduction? a) To estimate or predict, for each unit, the numerical value of some variable. b) To reduce the amount of detailed information considered to focus on the most interesting or abnormal items. c) To gain an understanding of a typical behavior of an individual, group, population, or sample. d) To predict the class of a new observation.
b) To reduce the amount of detailed information considered to focus on the most interesting or abnormal items.
Select the correct definition of class. a) Summary statistics, such as minimums, maximums, and averages in a dataset. b) An expected attribute or value that we want to evaluate in a dataset. c) A manually assigned category applied to a record based on an event.
c) A manually assigned category applied to a record based on an event.
When evaluating classifiers, you need to be careful to strike a balance between what two things? explanatory and response variables positive and negative relationships in the model complexity of the model and accuracy of the classification
complexity of the model and accuracy of the classification
What is XBRL used for? a) to look up correspondences between portions, or segments, of a set of text for a potential match. b) a technique used by analysts to develop models to predict expected outcomes. c) to provide a description of each field in the tables of a relational database. d) to facilitate the exchange of financial reporting information between a company and the SEC.
d) to facilitate the exchange of financial reporting information between a company and the SEC.
Variance analysis, a common practice in management accounting, is an example of _____ analytics. predictive prescriptive diagnostic descriptive
diagnostic
In the example provided in the text regarding employee turnover, the analyst is trying to predict employee turnover based on current professional salaries, health of the economy (GDP), and salaries offered by other accounting firms. In this scenario, what is the response variable? health of the economy employee turnover current professional salaries salaries offered by other accounting firms
employee turnover
A specific type of data profiling that is used to look for correspondences between portions, or segments, of text for potential matches is called __________ match.
fuzzy
Clustering is an unsupervised method that is used to find natural __________ within the data.
groupings
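Finding natural groupings can be sketched with a one-dimensional version of k-means using two groups. The text does not name a specific clustering algorithm, so k-means is an assumption here, as are the invoice amounts. The sketch also assumes the input holds at least two distinct values.

```python
def two_means(values, rounds=10):
    """Split numbers into two groups around two centers (1-D k-means, k=2)."""
    centers = [min(values), max(values)]  # simple deterministic start
    groups = [[], []]
    for _ in range(rounds):
        groups = [[], []]
        for v in values:
            # Assign each value to the nearer center.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group.
        new_centers = [sum(g) / len(g) for g in groups]
        if new_centers == centers:
            break
        centers = new_centers
    return groups, centers


# Hypothetical invoice amounts, invented for illustration.
amounts = [12, 15, 11, 14, 210, 198, 205]
groups, centers = two_means(amounts)
print(groups, centers)
```

No classes are supplied in advance; the two groups emerge from the data alone, which is what makes the method unsupervised.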