Ch.3
T or F: When clustering works well, observations within a segment should be different, and the data across segments should be very similar
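False; it's the opposite: observations within a segment should be similar, and the data across segments should be very different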
Machine learning, artificial intelligence and decision support systems are all examples of
Prescriptive Analytics
Clustering is an unsupervised method that's used to find natural ______ within the data
groups
A decision ______ is a tool used to divide data into smaller groups. Decision ______ are a technique used to mark the split between one class and another
tree; boundaries
Profiling is a/an ________ method that's used to discover patterns of behavior, based on how far observations fall from the mean as measured by z-scores
unsupervised
_______ data are existing data that have been manually evaluated and assigned a class; ______ data are existing data used to evaluate the model
Training; Test
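A minimal sketch of dividing already-classified records into training and test data. The function name, the 30% test fraction, the fixed seed, and the sample records are all assumptions made for illustration, not something the chapter prescribes.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and hold back a fraction as test data."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible (the seed is arbitrary).
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    test = shuffled[:n_test]
    train = shuffled[n_test:]
    return train, test

# Hypothetical records: (transaction amount, manually assigned class)
records = [(120, "ok"), (95, "ok"), (5000, "fraud"), (110, "ok"), (4800, "fraud"),
           (105, "ok"), (98, "ok"), (115, "ok"), (5200, "fraud"), (102, "ok")]

train, test = train_test_split(records)
print(len(train), "training records;", len(test), "test records")
```

The model is generated from the training records only; the test records are held back to evaluate it.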
A/an _______ approach is used when you're performing analysis that uses historical data to predict a future outcome based on a specific question
supervised
What would be the target? Given a set of customer data, we're trying to predict the total transaction amount based on a variety of attributes
the transaction amount
A/an ______ approach is used when you don't have a specific question and are simply exploring the data for potential patterns of interest
unsupervised
Knowing the mean and standard deviation, and assuming a normal distribution, which statistic can be computed to identify abnormal transactions?
z-score
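A short sketch of the z-score idea: z = (value - mean) / standard deviation, with transactions flagged when |z| exceeds a cutoff. The function names, the sample amounts, and the cutoff of 2.0 are illustrative assumptions; the cutoff in practice is a judgment call.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return the z-score of each value: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]  # all values identical, nothing stands out
    return [(v - mu) / sigma for v in values]

def flag_abnormal(values, threshold=3.0):
    """Return the values whose |z-score| exceeds the threshold (an assumed cutoff)."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Hypothetical transaction amounts with one obvious outlier
amounts = [120, 95, 130, 110, 105, 98, 115, 102, 125, 5000]
print(flag_abnormal(amounts, threshold=2.0))  # [5000]
```

With only ten observations, a single outlier inflates the standard deviation enough that its own z-score stays just under 3, which is why the example uses a lower cutoff.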
Data Reduction Steps
1. Identify the attribute you would like to reduce or focus on 2. Filter the results 3. Interpret the results 4. Follow up on results
Classification Steps
1. Identify the classes you wish to predict 2. Manually classify an existing set of records 3. Select a set of classification models 4. Divide your data into training and testing sets 5. Generate your model 6. Interpret the results and select the "best" model
Data Profiling Steps
1. Identify the objects or activity you want to profile 2. Determine the types of profiling you want to perform 3. Set boundaries or thresholds for the activity 4. Interpret the results and monitor the activity and/or generate a list of exceptions 5. Follow up on exceptions
_____ is an observation about the frequency of leading digits in many real-life sets of numerical data
Benford's Law
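Benford's Law gives the expected share of leading digit d as log10(1 + 1/d), so about 30.1% of values start with 1 and about 4.6% start with 9. The sketch below compares observed leading-digit frequencies to those expectations; the function names and the sample amounts are hypothetical.

```python
import math
from collections import Counter

def benford_expected(d):
    """Expected proportion of values whose leading digit is d (1 through 9)."""
    return math.log10(1 + 1 / d)

def leading_digit(x):
    """First nonzero digit of x, read from scientific notation."""
    return int(f"{abs(x):.15e}"[0])

def observed_frequencies(values):
    """Share of each leading digit 1-9 among the nonzero values."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    if n == 0:
        return {d: 0.0 for d in range(1, 10)}
    return {d: counts.get(d, 0) / n for d in range(1, 10)}

# Hypothetical invoice amounts
amounts = [123.45, 145.00, 250.10, 1890.00, 37.25, 0.0031, 912.00, 118.60]
observed = observed_frequencies(amounts)
for d in range(1, 10):
    print(d, round(observed[d], 3), round(benford_expected(d), 3))
```

Large gaps between the observed and expected columns are a prompt for follow-up, not proof of a problem, and a sample this small is only for illustration.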
__________ are designed to be interactive and adapt to the information collected by the user
Decision Support Systems
Variance analysis, a common practice in management accounting, is an example of
Diagnostic Analytics
What is true regarding the Data Reduction approach?
It primarily uses structured data that is readily searchable
What's the term for models that are useful for ranking observations rather than simply predicting class probability?
Linear Classifiers
______ might be used to identify areas where there is a lack of controls, changes in procedures, or individuals more willing to spend excessively on types of T&E expenses that might be associated with higher risk
Profiling
What's the term for removing branches from a decision tree to avoid overfitting the model?
Pruning
What is XBRL used for?
To facilitate the exchange of financial reporting information between a company and the SEC
What's the purpose of profiling?
To gain an understanding of the typical behavior of an individual, group, population, or sample
What's the purpose of clustering?
To identify groups of similar data elements and the underlying drivers of these groups
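One way to see clustering at work is a tiny k-means on a single attribute. The cards do not name a specific algorithm, so k-means, the function name, the starting centroids, and the sample amounts are all assumptions chosen for illustration.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Group numbers into k clusters by repeatedly assigning each value
    to its nearest centroid and then recomputing the centroids."""
    ordered = sorted(values)
    # Start with centroids spread evenly across the sorted data (an arbitrary choice).
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical purchase amounts with two natural groups
purchases = [10, 12, 11, 95, 100, 98]
centroids, clusters = kmeans_1d(purchases, k=2)
print(centroids)
print(clusters)
```

No classes are assigned in advance, which is what makes this unsupervised: values inside each cluster end up close together, and the two clusters sit far apart.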
A class is a manually assigned _______ applied to a record based on an event
category
Classification predicts a class for a new observation based on the _________ identification of classes from previous observations
manual
Generally, the more complex and complete the model, the greater the risk of the model _____ the data
overfitting
T or F: Classification requires that we know a great deal about the observation that we're attempting to place in a class
False
________ include both unsupervised exploratory analysis and supervised model generation to provide insight and predictive foresight into the business and decisions made by accountants and auditors
Machine Learning and Artificial Intelligence
When evaluating classifiers, you need to be careful to strike a balance between what two things?
complexity of the model and accuracy of the classification
A specific type of data profiling that is used to look for correspondences between portions, or segments, of text for potential matches is called
fuzzy match
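A rough sketch of a fuzzy match using Python's standard-library difflib, which scores how closely two strings correspond on a 0 to 1 scale. The helper names, the 0.8 cutoff, and the vendor names are hypothetical; other similarity measures would work as well.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio between two strings, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def fuzzy_matches(name, candidates, threshold=0.8):
    """Return the candidates whose similarity to name meets the assumed cutoff."""
    return [c for c in candidates if similarity(name, c) >= threshold]

# Hypothetical vendor names: is the new vendor already in the master file?
vendors = ["ACME Supplies Inc.", "Zenith Holdings"]
print(fuzzy_matches("Acme Supplies Inc", vendors))
```

Near-duplicate names such as these are a typical target, for example when comparing a vendor list against employee records.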