Chapter 3 Analytics - Performing the Test Plan and Analyzing the Results
Place the steps of Data Reduction in order:
1. Identify the attribute you would like to reduce or focus on
2. Filter the results
3. Interpret the results
4. Follow up on results
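The four Data Reduction steps above can be sketched in a few lines of Python. This is only an illustration: the transaction rows, the employee names, and the 1,000.00 cutoff are hypothetical and do not come from the chapter.

```python
# Hypothetical ledger rows: (transaction_id, employee, amount).
transactions = [
    (101, "Avery", 42.50),
    (102, "Blake", 1875.00),
    (103, "Casey", 15.25),
    (104, "Blake", 2310.40),
]

# Step 1: identify the attribute to focus on (here, the amount).
THRESHOLD = 1000.00  # assumed cutoff, chosen only for illustration

# Step 2: filter the results down to the items of interest.
large_items = [row for row in transactions if row[2] >= THRESHOLD]

# Steps 3 and 4: interpret the filtered rows and decide who to follow up with.
follow_up = sorted({employee for _, employee, _ in large_items})

print(large_items)
print(follow_up)
```

The filter leaves only the two large transactions, so the follow-up list contains a single employee to contact.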
Place the steps of profiling in order:
1. Identify the objects you want to profile
2. Determine the types of profiling you want to perform
3. Set boundaries or thresholds for the activity
4. Interpret the results and monitor the activity and/or generate the list of exceptions
5. Follow up on exceptions
Any transaction that has a z-score of ______ or above would represent an abnormal transaction.
3
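As a rough sketch of how this threshold might be applied, the snippet below computes z-scores as (value - mean) / standard deviation and flags anything at 3 or above. The amounts are made up, and the use of the population standard deviation is an assumption; the chapter does not say which variant to use.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return the z-score of each value: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in values]  # all values identical, nothing stands out
    return [(v - mu) / sigma for v in values]

# Hypothetical transaction amounts: nineteen routine items and one large one.
amounts = [100.0] * 19 + [1000.0]

# Flag transactions whose z-score is 3 or above.
flagged = [a for a, z in zip(amounts, z_scores(amounts)) if z >= 3]
print(flagged)
```

Only the single large amount is flagged; the routine items all sit slightly below the mean.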
Select the correct definition of class.
A manually assigned category applied to a record based on an event.
Select the appropriate definition for regression:
A method used to predict specific values
Select the correct definition of a target.
An expected attribute or value that we want to evaluate.
______ is an observation about the frequency of leading digits in many real-life sets of numerical data.
Benford's law
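Benford's law is commonly stated as P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9. The sketch below compares that expected frequency with the observed leading digits of a list of numbers. The helper names are hypothetical, not from the chapter.

```python
import math
from collections import Counter

def benford_expected(d):
    """Expected share of numbers whose leading digit is d (1 through 9)."""
    return math.log10(1 + 1 / d)

def leading_digit(n):
    """First nonzero digit of a nonzero number, read from scientific notation."""
    return int(f"{abs(n):.15e}"[0])

def observed_frequencies(values):
    """Observed share of each leading digit among the nonzero values."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        return {d: 0.0 for d in range(1, 10)}
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

for d in range(1, 10):
    print(d, round(benford_expected(d), 3))
```

Under this formula a leading 1 is expected about 30 percent of the time and a leading 9 less than 5 percent, which is why large gaps between observed and expected frequencies can merit a closer look.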
______ are designed to be interactive and adapt to the information collected by the user.
Decision support systems
What type of analytics summarizes existing data to determine past performance?
Descriptive analytics
After you have identified the objects or activity you wish to profile, what should you do next?
Determine the types of profiling you want to perform.
An example of time series analysis would be a prediction of future earnings based on past sales.
False
Classification requires that we know a great deal about the observation that we're attempting to place in a class.
False
Dependent variables can only be explained by a maximum of one independent variable.
False
Diagnostic analytics forecast future performance.
False
The co-occurrence grouping data approach is associated with predictive analytics.
False
Time series analysis is a predictive analytics technique used to predict future values based on past values of other variables.
False
When clustering works well, observations within a cluster should be different, and the data across clusters should be very similar.
False
After you have identified the attribute you would like to reduce or focus on, what is the next step?
Filter the results.
______ looks for similarities between portions, or segments, of the text of each potential match.
Fuzzy match
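One way to experiment with the idea is Python's standard `difflib.SequenceMatcher`, which scores two strings by the blocks of text they share. This is a stand-in for illustration, not necessarily the algorithm any particular audit tool uses; the vendor names and the 0.8 cutoff are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Score from 0 to 1 based on matching segments of the two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical vendor names, two of which look like the same company.
vendors = ["Acme Supply Co.", "Northwind Traders", "Acme Supply Company"]

CUTOFF = 0.8  # assumed similarity cutoff for a potential match

matches = [
    (a, b) for a, b in combinations(vendors, 2) if similarity(a, b) >= CUTOFF
]
print(matches)
```

Raising the cutoff returns fewer, closer matches; lowering it returns more candidates that need manual review.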
What is the purpose of regression analysis?
It allows analysts to develop models to predict expected outcomes.
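A minimal sketch of that idea is a one-variable least-squares line fitted by hand and then used to predict an expected outcome. The data and variable names below are hypothetical; real analyses would typically involve several explanatory variables and a statistics package.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical explanatory variable (x) and dependent variable (y).
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)

# Predict the expected outcome for a new value of the explanatory variable.
predicted = intercept + slope * 5.0
print(slope, intercept, predicted)
```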
Which of the following is true regarding the profiling approach?
It is generally performed on data that is readily available.
Which of the following is true regarding the Data Reduction approach?
It primarily uses structured data that is readily searchable.
In the example regarding the LendingClub data in which the analyst is researching loan rejection, they identified three possible indicators for why a loan would be rejected, the debt-to-income ratio, length of employment, and credit [risk] score. Which of the following is/are the explanatory variable(s)?
Length of employment, debt-to-income ratio, credit [risk] score
What is the term for the items that are useful for ranking observations rather than simply predicting class probability?
Linear classifiers
In the example regarding the LendingClub data in which the analyst is researching loan rejection, they identified three possible indicators for why a loan would be rejected, the debt-to-income ratio, length of employment, and credit [risk] score. Which is the dependent variable?
Loan rejection
______ include both unsupervised exploratory analysis and supervised model generation to provide insight and predictive foresight into the business and decisions made by accountants and auditors.
Machine learning and artificial intelligence
After you have identified the classes you wish to predict, what is the next step?
Manually classify an existing set of records.
______ might be used to identify areas where there is a lack of controls, changes in procedures, or individuals more willing to spend excessively in potential types of T&E expenses which might be associated with higher risk.
Profiling
What is the term for removing branches from a decision tree to avoid overfitting the model?
Pruning
XBRL is used to facilitate the exchange of financial reporting information between the company and the ______?
Securities and Exchange Commission
In the example of profiling for management accounting regarding Advanced Environmental Recycling Technologies, what are they looking for significant variances in?
Standard Cost
What is the purpose of clustering?
To identify groups of similar data elements and the underlying relationship of these groups.
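To make the idea concrete, here is a small one-dimensional k-means sketch, one common clustering technique (the chapter does not prescribe a specific algorithm). The values and the starting centers are hypothetical.

```python
def k_means_1d(values, centers, iterations=10):
    """Repeatedly assign each value to its nearest center, then recompute centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical invoice amounts that fall into two natural groups.
amounts = [10, 12, 11, 95, 100, 105]
centers, clusters = k_means_1d(amounts, centers=[0.0, 50.0])
print(centers)
print(clusters)
```

No classes are supplied in advance: the two groups emerge from the data alone, which is what makes the method unsupervised.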
What is the purpose of classification?
To predict which class an observation that we know little about will belong to.
What is the purpose of Data Reduction?
To reduce the amount of detailed information considered to focus on the most interesting or abnormal items.
__________ data are existing data that have been manually evaluated and assigned a class. ____________ data are existing data used to evaluate the model.
Training; Test
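The split between the two data sets can be illustrated with a toy nearest-neighbor classifier: the model learns only from the manually classified training records and is then scored on the test records. The debt-to-income values, labels, and the choice of a nearest-neighbor rule are all assumptions made for this sketch.

```python
# Training data: records already manually evaluated and assigned a class.
# Each record is (debt_to_income_ratio, class); values are hypothetical.
training = [
    (0.10, "approved"),
    (0.15, "approved"),
    (0.45, "rejected"),
    (0.50, "rejected"),
]

# Test data: existing classified records held back to evaluate the model.
test = [
    (0.12, "approved"),
    (0.48, "rejected"),
    (0.35, "approved"),
]

def predict(ratio):
    """Predict the class of the closest training record (1-nearest neighbor)."""
    _, label = min(training, key=lambda record: abs(record[0] - ratio))
    return label

correct = sum(predict(ratio) == label for ratio, label in test)
accuracy = correct / len(test)
print(correct, len(test))
```

Here the model classifies two of the three test records correctly; the miss on the third record shows why a model should be evaluated on data it was not built from.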
In the following question, what would be the target? Given a set of customer data, we are trying to predict the total transaction amount based on a variety of attributes.
Transaction amount
The null hypothesis assumes the hypothesized relationship does not exist.
True
Decision Boundaries
a technique used to mark the split between one class and another
Decision Tree
a tool that is used to divide data into smaller groups
A class is a manually assigned _______ applied to a record based on an event.
category
Using a _________ model, you can predict whether a new vendor belongs to one class or another based on the behavior of others.
classification
When evaluating classifiers, you need to be careful to strike a balance between what two things?
complexity of the model and accuracy of the classification
In the example provided in the text regarding employee turnover, the analyst is trying to predict employee turnover based on current professional salaries, health of the economy (GDP), and salaries offered by other accounting firms. In this scenario, select the explanatory variable(s).
current professional salaries, health of the economy, salaries offered by other accounting firms
Profiling is a/an ______ analytics method that is used to discover patterns of behavior, based on the distance of z-scores from the mean.
diagnostic
Variance analysis, a common practice in management accounting, is an example of ______ analytics.
diagnostic
In the example provided in the text regarding employee turnover, the analyst is trying to predict employee turnover based on current professional salaries, health of the economy (GDP), and salaries offered by other accounting firms. In this scenario, what is the dependent variable?
employee turnover
Training Data
existing data that have been manually evaluated and assigned a class
Test Data
existing data used to evaluate the model
A specific type of data profiling that is used to look for correspondences between portions, or segments, of text for potential matches is called ________ match.
fuzzy
Clustering is an unsupervised method that is used to find _________ of similar data elements and the underlying relationships of those groups.
groups
Classification predicts a class for a new observation based on the ________ identification of classes from previous observations.
manual
Generally, the more complex and complete the model, the higher the degree of the model ______ the data.
overfitting
Profiling is used to discover _____ of behavior, based on the distance of z-scores from the mean.
patterns
Machine learning, artificial intelligence and decision support systems are all examples of ______ analytics.
prescriptive
Decision support systems are an example of ______.
prescriptive analytics
Structured data are stored in a database or spreadsheet and are readily ______.
searchable
In the profiling example regarding T&E Expenses, which of the following is NOT one of the areas that the analyst would try to uncover?
significant variances in standard cost
Benford's law states that in many naturally occurring collections of numbers, the significant leading digit is likely to be ______.
small
A/an _________ approach is used when you are performing analysis that uses historical data to predict a future outcome based on a specific question.
supervised
Regression is a/an ________ method used to predict specific values given an explanatory variable (or variables).
supervised
What is XBRL used for?
To facilitate the exchange of financial reporting information between a company and the SEC.
A/an __________ approach is used when you don't have a specific question and are simply exploring the data for potential patterns of interest.
unsupervised
Clustering is a/an ________ method that is used to find natural groupings within the data.
unsupervised
Knowing the mean and standard deviation, and assuming a normal distribution, which statistic can one compute to identify abnormal transactions?
z-score
