ch3: master the data

Place the steps of classification into order.

-identify the classes you wish to predict
-manually classify an existing set of records
-select a set of classification models
-divide your data into training and testing sets
-generate your model
-interpret the results and select the "best" model
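
The six steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the loan records, the "approve"/"reject" classes, the single debt-to-income attribute, and the three candidate thresholds are all hypothetical, and a simple threshold rule stands in for a real classification model.

```python
# Steps 1-2: the classes are "approve" and "reject"; each record below has
# already been classified by hand (hypothetical data: debt-to-income, class).
records = [
    (0.10, "approve"), (0.15, "approve"), (0.20, "approve"), (0.25, "approve"),
    (0.30, "approve"), (0.45, "reject"), (0.50, "reject"), (0.55, "reject"),
    (0.60, "reject"), (0.65, "reject"),
]

def predict(threshold, debt_to_income):
    """Classify a record as "reject" when its ratio reaches the threshold."""
    return "reject" if debt_to_income >= threshold else "approve"

def accuracy(threshold, rows):
    """Share of rows whose predicted class equals the manually assigned class."""
    hits = sum(predict(threshold, x) == label for x, label in rows)
    return hits / len(rows)

# Step 3: select a set of candidate models (here, three thresholds).
candidates = [0.2, 0.4, 0.6]

# Step 4: divide the data; every third record is held out for testing.
train = [r for i, r in enumerate(records) if i % 3 != 0]
test = [r for i, r in enumerate(records) if i % 3 == 0]

# Step 5: generate the model by choosing the candidate that fits the training data best.
best = max(candidates, key=lambda t: accuracy(t, train))

# Step 6: interpret the result by checking the chosen model on the test data.
test_accuracy = accuracy(best, test)
```

Evaluating on held-out test data, rather than on the training data, is what guards against choosing a model that merely memorizes the records it was built from.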

Place the steps of profiling in order, from 1 through 5.

-identify the objects or activity you want to profile
-determine the types of profiling you want to perform
-set boundaries or thresholds for the activity
-interpret the results and monitor the activity and/or generate a list of exceptions
-follow up on exceptions

______ include both unsupervised exploratory analysis and supervised model generation to provide insight and predictive foresight into the business and decisions made by accountants and auditors.

Machine learning and artificial intelligence

After you have identified the classes you wish to predict, what is the next step?

Manually classify an existing set of records.

Decision support systems are an example of ______.

Prescriptive analytics

In the example of profiling for management accounting regarding Advanced Environmental Recycling Technologies, what are they looking for significant variances in?

Standard Cost

Using a ___ model, you can predict whether a new vendor belongs to one class or another based on the behavior of others.

classification

When evaluating classifiers, you need to be careful to strike a balance between what two things?

complexity of the model and accuracy of the classification

In the example provided in the text regarding employee turnover, the analyst is trying to predict employee turnover based on current professional salaries, health of the economy (GDP), and salaries offered by other accounting firms. In this scenario, select the explanatory variable(s). Select all that apply.

-current professional salaries
-health of the economy
-salaries offered by other accounting firms

______ might be used to identify areas where there is a lack of controls, changes in procedures, or individuals more willing to spend excessively in potential types of T&E expenses which might be associated with higher risk.

profiling

What is the term for removing branches from a decision tree to avoid overfitting the model?

pruning

Structured data are stored in a database or spreadsheet and are readily ___.

searchable

XBRL is used to facilitate the exchange of financial reporting information between the company and the ______.

Securities and Exchange Commission

In the profiling example regarding T&E Expenses, which of the following is NOT one of the areas that the analyst would try to uncover?

significant variances in standard cost

____ data are existing data that have been manually evaluated and assigned a class. ____ data are existing data used to evaluate the model.

training; test

A decision _______ is a tool used to divide data into smaller groups. Decision ____ mark the split between one class and another.

tree; boundaries

The null hypothesis assumes the hypothesized relationship does not exist.

true

A _________ approach is used when you don't have a specific question and are simply exploring the data for potential patterns of interest.

unsupervised

A manually assigned category applied to a record based on an event.

a class

An expected attribute or value that we want to evaluate in a dataset.

a target

Any transaction that has a Z-score of ______ or above would represent abnormal transactions.

3
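
A minimal sketch of the z-score rule above, assuming the amounts are roughly normally distributed. The function names, the sample amounts, and the choice to flag large negative scores as well as large positive ones are illustrative assumptions, not taken from the text.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return the z-score of each value: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]  # all values identical, nothing stands out
    return [(v - mu) / sigma for v in values]

def abnormal(values, threshold=3.0):
    """Return the values whose z-score is at least `threshold` in magnitude."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) >= threshold]

# Hypothetical transaction amounts: nineteen routine payments and one large one.
amounts = [100.0] * 19 + [1000.0]
flagged = abnormal(amounts)  # only the 1000.0 payment reaches the threshold
```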

Select the appropriate definition for regression:

A method used to predict specific values

Select the correct definition of a target.

An expected attribute or value that we want to evaluate.

______ is an observation about the frequency of leading digits in many real-life sets of numerical data.

Benford's law
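
Benford's law gives the expected share of values with leading digit d as log10(1 + 1/d), so about 30% of values start with 1 and under 5% start with 9. The sketch below compares that expectation with observed leading digits; the helper names are hypothetical and the code is one possible way to run the check, not a prescribed procedure.

```python
import math
from collections import Counter

def benford_expected(digit):
    """Expected share of values whose leading digit is `digit` (1 through 9)."""
    return math.log10(1 + 1 / digit)

def leading_digit(value):
    """Return the first significant digit of a nonzero number."""
    value = abs(value)
    if value == 0:
        raise ValueError("zero has no leading digit")
    while value >= 10:
        value /= 10
    while value < 1:
        value *= 10
    return int(value)

def observed_shares(values):
    """Share of nonzero values starting with each digit 1 through 9."""
    counts = Counter(leading_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    if total == 0:
        return {d: 0.0 for d in range(1, 10)}
    return {d: counts[d] / total for d in range(1, 10)}

# Expected distribution to compare against observed_shares(your_amounts).
expected = {d: benford_expected(d) for d in range(1, 10)}
```

A large gap between the observed and expected shares for a digit is a reason to look closer at those transactions, not proof of a problem.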

______ are designed to be interactive and adapt to the information collected by the user.

Decision support systems

After you have identified the objects or activity you wish to profile, what should you do next?

Determine the types of profiling you want to perform.

An example of time series analysis would be a prediction of future earnings based on past sales.

False; An example of time series analysis would be a prediction of future earnings based on past earnings.

What is the purpose of regression analysis?

It allows analysts to develop models to predict expected outcomes.
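
As a small illustration of predicting a specific value from one explanatory variable, the sketch below fits a straight line by ordinary least squares. The function names and the data points are hypothetical; a real analysis would typically use several explanatory variables and a statistics package.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Predict the dependent variable for a new value of the explanatory variable."""
    return slope * x + intercept

# Hypothetical data: explanatory variable (xs) and dependent variable (ys).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
forecast = predict(slope, intercept, 5.0)
```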

Which of the following is true regarding the profiling approach?

It is generally performed on data that is readily available.

Which of the following is true regarding the Data Reduction approach?

It primarily uses structured data that is readily searchable.

In the example regarding the LendingClub data in which the analyst is researching loan rejection, they identified three possible indicators for why a loan would be rejected, the debt-to-income ratio, length of employment, and credit [risk] score. Which of the following is/are the explanatory variable(s)?

-Length of employment
-Credit [risk] score
-Debt-to-income ratio

What is the terminology for the items that are useful for ranking observations rather than simply predicting class probability?

Linear classifiers

What is the purpose of clustering?

To identify groups of similar data elements and the underlying relationship of these groups.

What is the purpose of classification?

To predict which class an observation that we know little about will belong to.

What is the purpose of Data Reduction?

To reduce the amount of detailed information considered to focus on the most interesting or abnormal items.

In the following question, what would be the target? Given a set of customer data, we are trying to predict the total transaction amount based on a variety of attributes.

Transaction amount

A class is a manually assigned ________ applied to a record based on an event.

category

A method for simplifying large datasets into obvious categories.

data reduction

What type of analytics summarizes existing data to determine past performance?

descriptive analytics

Variance analysis, a common practice in management accounting, is an example of ______ analytics.

diagnostic

In the example provided in the text regarding employee turnover, the analyst is trying to predict employee turnover based on current professional salaries, health of the economy (GDP), and salaries offered by other accounting firms. In this scenario, what is the dependent variable?

employee turnover

__ is an unsupervised method that is used to discover patterns of behavior, based on the distance of z-scores from the mean.

profiling

Dependent variables can only be explained by a maximum of one independent variable.

false

Diagnostic analytics forecast future performance.

false

True or false: When clustering works well, observations within a cluster should be different, and the data across clusters should be very similar.

false

Time series analysis is a predictive analytics technique used to predict future values based on past values of other variables.

false; Time series analysis is a predictive analytics technique used to predict future values based on past values of the same variable.

True or false: Classification requires that we know a great deal about the observation that we're attempting to place in a class.

false; the power of Classification is that we can know very little about a given observation in order to predict which class it will belong to.

After you have identified the attribute you would like to reduce or focus on, what is the next step?

filter the results

A specific type of data profiling that is used to look for correspondences between portions, or segments, of text for potential matches is called __ match.

fuzzy

______ looks for similarities between portions, or segments, of the text of each potential match.

fuzzy match
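
One way to sketch a fuzzy match is with Python's standard-library `difflib.SequenceMatcher`, which scores how similar two strings are on a scale from 0 to 1. The vendor names, the helper names, and the 0.6 cutoff below are assumptions chosen for illustration; other similarity measures would serve equally well.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a 0-to-1 similarity score, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def fuzzy_matches(name, candidates, threshold=0.6):
    """Return (candidate, score) pairs scoring at or above `threshold`, best first."""
    scored = [(c, similarity(name, c)) for c in candidates]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Hypothetical vendor master list and a name keyed in slightly differently.
vendors = ["Acme Corporation", "Zenith LLC"]
matches = fuzzy_matches("ACME Corp.", vendors)
```

Lowering the threshold finds more potential matches at the cost of more false positives, so the cutoff is a judgment call for the analyst.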

Clustering is an unsupervised method that is used to find ___ of similar data elements and the underlying relationships of those groups.

groups

Place the steps of data reduction in order.

-identify the attribute you would like to reduce or focus on
-filter the results
-interpret the results
-follow up on results
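
The data reduction steps above can be sketched as a simple filter. The records, field names, and the 1000 cutoff are hypothetical and only illustrate the idea of narrowing a dataset to the items worth a closer look.

```python
# Hypothetical transaction records; the field names are illustrative.
transactions = [
    {"id": 1, "employee": "Lee", "amount": 85.00},
    {"id": 2, "employee": "Kim", "amount": 4200.00},
    {"id": 3, "employee": "Lee", "amount": 310.50},
    {"id": 4, "employee": "Ortiz", "amount": 9875.25},
]

def reduce_data(records, attribute, keep):
    """Keep only the records whose `attribute` value satisfies `keep`."""
    return [r for r in records if keep(r[attribute])]

# Step 1: focus on the "amount" attribute. Step 2: filter to the large items.
exceptions = reduce_data(transactions, "amount", lambda amount: amount >= 1000)

# Steps 3-4: review what remains and list the items to follow up on.
follow_up_ids = [r["id"] for r in exceptions]
```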

In the example regarding the LendingClub data in which the analyst is researching loan rejection, they identified three possible indicators for why a loan would be rejected, the debt-to-income ratio, length of employment, and credit [risk] score. Which is the dependent variable?

loan rejection

Classification predicts a class for a new observation based on the ___ identification of classes from previous observations.

manual

Generally, the more complex and complete the model, the higher the degree of the model ______ the data.

overfitting

Profiling is used to discover ___ of behavior, based on the distance of z-scores from the mean.

patterns

Machine learning, artificial intelligence, and decision support systems are all examples of ______ analytics.

prescriptive

Which analytics type works to identify the best possible options given constraints or changing conditions?

prescriptive analytics

Which of the following data approaches are associated with diagnostic analytics?

profiling

Benford's law states that in many naturally occurring collections of numbers, the significant leading digit is likely to be ______

small

Clustering is a/an ___ method that is used to find natural groupings within the data.

unsupervised

Regression is a/an ______ method used to predict specific values given an explanatory variable (or variables).

supervised

A/an ___ approach is used when you are performing analysis that uses historical data to predict a future outcome based on a specific question.

supervised

What is XBRL used for?

To facilitate the exchange of financial reporting information between a company and the SEC.

Knowing the mean and standard deviation, and assuming a normal distribution, which statistic can one compute to identify abnormal transactions?

z-score

