ACCT 3130 Chapter 3


Using social media to look for relationships between related parties that are not otherwise disclosed to identify related party transactions is an example of ________. A) Classification B) Regression C) Profiling D) Link prediction

D

Which approach to data analytics attempts to discover associations between individuals based on transactions involving them? A) Classification B) Regression C) Similarity matching D) Co-occurrence grouping

D

While overfitting data could lead to an error rate of 0 (zero), it is unlikely that you would be able to ________ your results. A) define B) specify C) articulate D) generalize

D

A target is a manually assigned category applied to a record based on an event.

False

Benford's Law is an absolute and all data must conform.

False

Co-occurrence grouping is an example of a supervised approach.

False

Data profiling typically involves unstructured data.

False

Existing data that have been manually evaluated and assigned a class is often referred to as test data.

False

Fuzzy matching is a data approach used to identify similar individuals based on data known about them.

False

Link prediction is a data approach used to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model.

False

________ mark the split between one class and another. A) Decision trees B) Identifying questions C) Decision boundaries D) Linear classifiers

C

________ refers to data that are stored in a database or spreadsheet that is readily searchable. A) Training data B) Unstructured data C) Structured data D) Test data

C

________ states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small. A) Leading digits hypothesis B) Moore's law C) Benford's law D) Classification

C

Regression analysis typically involves the following steps except: A) Identify the variables that might predict an outcome. B) Identify the parameters of the model. C) Set boundaries or thresholds. D) Determine the functional form of the relationship.

C

Unaware of the data analysis tools available to the internal auditors, a store employee frequently processes cash returns without a receipt for $99, which is just below the $100 amount that requires manager approval. An analysis using which of the following would likely (and quickly) identify the employee's fraudulent behavior? A) Leading digits hypothesis B) Moore's law C) Benford's law D) Clustering

C

Which approach to data analytics attempts to characterize the typical behavior of an individual, group or population by generating summary statistics about the data? A) Classification B) Regression C) Profiling D) Link prediction

C

Which approach to data analytics attempts to identify similar individuals based on data known about them? A) Classification B) Clustering C) Similarity matching D) Co-occurrence grouping

C

Assume that you will be up for a promotion next month and you'd like to impress your boss with your data analytic skills. The company you work for normally books the current month's bad debt for the same amount as the prior month's actual accounts receivable write-offs. Using your general accounting knowledge, explain why this process is not the best method. Next, assuming that you will use a regression analysis, explain the process and describe the data/information you would request/include to perform the analysis.

1. GAAP states that the allowance must be established in the same accounting period as the sale, based on an anticipated and estimated figure. Using the prior month's actual write-offs does not follow the matching principle. This method also does not account for year-to-year or month-to-month fluctuations.
2. Regression analysis involves the following process:
   a. Identify the variables that might predict an outcome.
      i. Independent variables: current AR aging, customer payment history, collections success, current month's sales.
   b. Determine the functional form of the relationship.
      i. Produce a scatter plot of sales to actual write-offs over time to determine the prior relationship, which can be used to estimate current and future relationships.
   c. Identify the parameters of the model.
      i. Identify the relative weights of each variable.
      ii. Identify the tables that contain the information you need. You can do this by looking through the data dictionary or the relationship model.
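The regression steps above can be sketched with a single independent variable. This is only a minimal illustration: the monthly figures, the choice of credit sales as the lone predictor, and the function names `fit_line` and `predict` are assumptions made up for the example, not part of the original answer.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x_new):
    """Estimated dependent variable for a new value of the predictor."""
    return intercept + slope * x_new


# Hypothetical history, in thousands of dollars (made-up numbers).
credit_sales = [100.0, 120.0, 90.0, 150.0, 130.0]  # independent variable
write_offs = [2.1, 2.3, 1.9, 3.1, 2.5]             # dependent variable

slope, intercept = fit_line(credit_sales, write_offs)
estimate = predict(slope, intercept, 140.0)  # current month's credit sales
print(f"Estimated bad debt expense: {estimate:.2f}")
```

A fuller model would add the other independent variables listed in the answer (AR aging, payment history, collections success), which calls for multiple regression rather than this one-variable fit.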

All of the following are examples of an unsupervised approach to evaluating data except: A) Similarity matching B) Clustering C) Profiling D) Co-occurrence grouping

A

Data profiling is used to assess data quality and internal controls and typically involves the following steps except: A) Filter the results. B) Identify the objects or activity you want to profile. C) Determine the types of profiling you want to perform. D) Set boundaries or thresholds for the activity.

A

In general, the more complex the model, the greater the chance of ________. A) Overfitting the data B) Underfitting the data C) Pruning the data D) The need to reduce the amount of data considered

A

One of the key tasks of auditors of a bank is to consider the amount of the loan loss reserve. When developing a model to estimate the current year's loan loss reserve amount, which of the following would be least likely to be included as an independent variable? A) Original loan approval amount B) Customer loan history C) Current aged loans D) Collections success

A

Which of the following best describes a supervised approach to the evaluation of data? A) Data exploration that is free from oversight by a superior B) Data exploration that is conducted with direct oversight by a superior C) Data exploration to examine the relationships between variables that are hypothesized to exist D) Data exploration looking for potential patterns of interest

C

Retail stores often request customers' zip codes at the end of a sales transaction. This is an example of which data approach? A) Clustering B) Regression C) Similarity matching D) Classification

A

The short surveys, regarding dining preferences, requested at the bottom of the restaurant bill are an example of which data approach? A) Clustering B) Regression C) Similarity matching D) Link prediction

A

Understanding and predicting warranty expense is an important determination for manufacturing firms. When using historical claims data to estimate the current period's warranty expense, the historical claims data represents which of the following: A) Independent variable B) Dependent variable C) Function D) Statistical Model

A

Which approach to data analytics attempts to assign each unit in a population into a small set of categories? A) Classification B) Regression C) Similarity matching D) Co-occurrence grouping

A

Which approach to data analytics attempts to divide individuals into groups in a useful or meaningful way? A) Clustering B) Data reduction C) Similarity matching D) Co-occurrence grouping

A

Which approach to data analytics attempts to forecast a relationship between two data items? A) Link prediction B) Regression C) Similarity matching D) Co-occurrence grouping

A

Which of the following best describes a dependent variable? A) Output B) Input C) Application D) Operation

A

All of the following are examples of a supervised approach to evaluating data except: A) Causal modeling B) Data reduction C) Link prediction D) Regression

B

Data reduction typically involves the following steps except: A) Identify the attribute you would like to reduce or focus on. B) Identify the parameters of the model. C) Filter the results. D) Interpret the results.

B

Understanding and predicting inventory obsolescence is an important determination for retail companies. When using competitor selling prices to estimate the inventory obsolescence reserve, the inventory obsolescence reserve represents which of the following: A) Independent variable B) Dependent variable C) Function D) Statistical Model

B

When working with a predictive model, underfitting the data is most likely caused by ________. A) an overly complex model B) an overly simple model C) over pruning the data D) a lack of data reduction

B

Which approach to data analytics attempts to predict, for each unit, the numerical value of some variable? A) Classification B) Regression C) Similarity matching D) Link prediction

B

Which of the following best describes an independent variable? A) Output B) Input C) Application D) Operation

B

________ are existing data that have been manually evaluated and assigned a class and ________ are existing data used to evaluate the model. A) Test data; Training data B) Training data; Test data C) Structured data; Unstructured data D) Unstructured data; Structured data

B

Which of the following best describes an unsupervised approach to the evaluation of data? A) Data exploration that is free from oversight by a superior B) Data exploration to examine the relationships between variables that are hypothesized to exist C) Data exploration looking for potential patterns of interest D) Data exploration that is conducted with direct oversight by a superior

C

Benford's Law (be sure to answer all 3 parts): Part A: Briefly describe Benford's Law. Part B: Draw a graph that exemplifies data which conforms to Benford's Law (i.e., what it should look like). Part C: Briefly describe how auditors could utilize Benford's Law while conducting testwork.

Part A:
• The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small. For example, in sets that obey the law, the number 1 appears as the most significant digit about 30% of the time, while 9 appears as the most significant digit less than 5% of the time.
• The real world is more likely to have a low number as the first digit (e.g., 1 or 2) than a high number (e.g., 8 or 9). Benford used this realization to predict the likelihood of digit frequencies across a wide variety of data, including financial data.
Part B: Graph (leading-digit frequency declining from about 30% for the digit 1 to less than 5% for the digit 9).
Part C:
• Benford's Law can help identify unusual accounting activity.
• Auditors can compare the frequency of digits predicted by Benford's Law to the actual frequency in a data file. Any category of numbers outside of expectations (with a margin for error) should be investigated further.
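The comparison described in Part C can be sketched in Python. The expected share of leading digit d under Benford's Law is log10(1 + 1/d). Everything else here is an assumption made up for illustration: the return amounts, the 5-percentage-point tolerance, and the function names. A real audit test would use a far larger population and a formal statistical test.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(amount):
    """First significant digit of a nonzero amount."""
    # Scientific notation puts the first significant digit first, e.g. '9.9...e+01'.
    return int(f"{abs(amount):.15e}"[0])


def leading_digit_shares(amounts):
    """Observed share of each leading digit 1-9, ignoring zero amounts."""
    counts = Counter(leading_digit(a) for a in amounts if a != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


def flag_digits(amounts, tolerance=0.05):
    """Digits whose observed share differs from Benford by more than the tolerance."""
    expected = benford_expected()
    actual = leading_digit_shares(amounts)
    return [d for d in range(1, 10) if abs(actual[d] - expected[d]) > tolerance]


# Hypothetical cash-return amounts; the repeated $99 entries mimic the
# just-below-approval-limit pattern from the store employee question.
returns = [12.50, 18.00, 23.40, 99, 99, 99, 99, 31.00, 14.20, 99]

expected = benford_expected()
actual = leading_digit_shares(returns)
for d in range(1, 10):
    print(f"{d}: expected {expected[d]:.1%}, actual {actual[d]:.1%}")
print("Digits to investigate:", flag_digits(returns))
```

With a sample this small, several digits fall outside the tolerance by chance alone; the point is only that the digit 9 is heavily over-represented relative to expectation.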

A decision tree can be used to divide data into smaller groups.

True

Alibaba and its attempt to identify seller and customer fraud based on various characteristics known about them is an example of similarity matching.

True

An example of classification would be a credit card company flagging a transaction as being approved or potentially being fraudulent and denying payment.

True

Clustering is a data approach used to divide individuals into groups in a useful or meaningful way.

True

Co-occurrence grouping could be used to match vendors by geographic region.

True

Data reduction is a data approach used to reduce the amount of information that needs to be considered to focus on the most critical items.

True

Fuzzy matching is a computer-assisted technique of finding matches that are less than 100 percent perfect by finding correspondences between portions of the text of each potential match.

True
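A rough sketch of fuzzy matching using Python's standard-library `difflib.SequenceMatcher`, which scores how much text two strings share. The vendor names, the 0.8 threshold, and the helper names `similarity` and `fuzzy_matches` are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Score from 0.0 to 1.0 based on shared portions of the two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_matches(name, candidates, threshold=0.8):
    """Candidates that match the name at or above the threshold."""
    return [c for c in candidates if similarity(name, c) >= threshold]


# Hypothetical vendor master file entries (made-up names).
vendors = ["ACME Supply Company", "Zenith Logistics", "Acme Suply Co"]

print(fuzzy_matches("Acme Supply Co.", vendors))
```

The threshold is a judgment call: set it lower and more near-duplicates surface, along with more false matches to review.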

Regression is a data approach used to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model.

True

The P in IMPACT Cycle represents performing test plan.

True

The data approach used to characterize the typical behavior of an individual, group or population by generating summary statistics about the data is referred to as classification.

False

When considering a question such as "Do our customers form natural groups based on similar attributes?" you would use an unsupervised approach.

True

XBRL is a global standard for exchanging financial reporting information that uses XML.

True

XBRL is used to facilitate the exchange of financial reporting information between the company and the Securities and Exchange Commission.

True

Chapter 3 discussed the 5 (five) data analytics approaches or techniques that are most commonly used to address our accounting questions. List and define 3 of the 5 data analytics approaches. Next, describe how each of the 3 data analytics approaches you list could be used by credit card companies to identify fraudulent credit card activity.

• Classification: A data approach used to assign each unit in a population into a few categories, potentially to help with predictions.
o Credit card companies establish models to predict fraud and decide whether to accept or reject a proposed credit card transaction. A potential model may be the following: Transaction approval = f(location of current transaction, location of last transaction, amount of current transaction, prior history of travel of credit card holder, etc.)
• Clustering: A data approach used to divide individuals (like customers) into groups (or clusters) in a useful or meaningful way.
o A heat map could be used to determine whether purchases are outside of the person's "home" region.
• Data reduction: A data approach used to reduce the amount of information that needs to be considered to focus on the most critical items (i.e., highest cost, highest risk, largest impact, etc.).
o Looking only at transactions over a certain dollar threshold.
• Profiling: A data approach used to characterize the "typical" behavior of an individual, group or population by generating summary statistics about the data (including mean, standard deviations, etc.).
o Looking at characteristics such as the amounts, totals, and types of expenditures to identify potential anomalies. For example, depending on the individual's spending history, a $1,000 purchase at Spools and Bolts (a quilting shop) might not be an anomaly.
• Regression: A data approach used to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model.
o Credit card companies establish models to predict fraud and decide whether to accept or reject a proposed credit card transaction. A potential model may be the following: Transaction approval = f(location of current transaction, location of last transaction, amount of current transaction, prior history of travel of credit card holder, etc.)
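The profiling idea (summary statistics that define "typical" behavior) can be sketched as a z-score check against a cardholder's own history. The purchase amounts, the cutoff of 3 standard deviations, and the function names are assumptions chosen for illustration, not a rule from the chapter.

```python
from statistics import mean, stdev


def z_score(history, amount):
    """How many standard deviations the amount sits from the historical mean."""
    return (amount - mean(history)) / stdev(history)


def is_anomaly(history, amount, cutoff=3.0):
    """True when the amount is unusually far from this cardholder's typical spend."""
    return abs(z_score(history, amount)) > cutoff


# Hypothetical past purchase amounts for one cardholder (made-up numbers).
history = [40.0, 55.0, 38.0, 60.0, 47.0]

print(is_anomaly(history, 1000.0))  # far outside this cardholder's pattern
print(is_anomaly(history, 50.0))    # in line with past purchases
```

Because the profile is built per individual, the same $1,000 purchase could be flagged for one cardholder and pass for another whose history includes similar amounts.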

Decision trees are used to divide data into smaller groups by splitting the data at each branch into two or more groups. However, this method could lead to unintended consequences if the decision tree is not pruned. Describe the pruning process, when it can occur and the benefits.

• Pruning removes branches from a decision tree to avoid overfitting the model.
o Prepruning occurs during model generation. The model stops creating new branches when the information usefulness of an additional branch is low.
o Postpruning evaluates the complete model and discards branches after the fact.
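A minimal sketch of a prepruning rule: before adding a branch, measure how much the split improves class purity and skip it when the improvement is small. Using Gini impurity as the measure of "information usefulness", the 0.05 minimum gain, and the function names are all assumptions for this example; the original answer does not name a specific measure.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def split_gain(parent, left, right):
    """Reduction in impurity from splitting parent into left and right."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


def should_split(parent, left, right, min_gain=0.05):
    """Prepruning rule: only create the branch if the gain is large enough."""
    return split_gain(parent, left, right) >= min_gain


# Hypothetical labeled transactions (made-up data).
parent = ["fraud"] * 4 + ["ok"] * 4

useful = (["fraud"] * 4, ["ok"] * 4)          # separates the classes cleanly
useless = (["fraud", "fraud", "ok", "ok"],    # each side is as mixed as the parent
           ["fraud", "fraud", "ok", "ok"])

print(should_split(parent, *useful))
print(should_split(parent, *useless))
```

Postpruning would apply a similar test after the full tree is built, removing branches whose contribution turns out to be low.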

What is the difference between structured data and unstructured data? Provide an example of each.

• Structured data are data that are organized and reside in a fixed field within a record or a file. Examples include relational databases, spreadsheets, and other formats that are readily searchable by search algorithms.
• Unstructured data are data that either do not have a pre-defined data model or are not organized in a pre-defined manner. Examples include photographs, Instagram or Twitter content, and satellite images.

