Chapter 3


25) Which approach to data analytics attempts to assign each unit in a population into a small set of categories? A) Classification B) Regression C) Similarity matching D) Co-occurrence grouping

a

26) Which approach to data analytics attempts to divide individuals into groups in a useful or meaningful way? A) Clustering B) Data reduction C) Similarity matching D) Co-occurrence grouping

a
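A minimal sketch of clustering, assuming a single numeric attribute and exactly two groups. The function name `kmeans_1d` and the sample amounts are hypothetical, and k-means is only one of several clustering techniques; the question does not name a specific one.

```python
def kmeans_1d(values, iterations=10):
    """Split numeric values into two groups around two moving centers."""
    # Start the two centers at the extremes so the result is deterministic.
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to whichever center is closer.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups


# Hypothetical purchase amounts; no labels are supplied in advance.
amounts = [1, 2, 3, 10, 11, 12]
centers, groups = kmeans_1d(amounts)
print(centers, groups)
```

No class labels are provided up front, which is why clustering counts as an unsupervised approach.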

29) Which approach to data analytics attempts to forecast a relationship between two data items? A) Link prediction B) Regression C) Similarity matching D) Co-occurrence grouping

a

22) All of the following are examples of an unsupervised approach to evaluating data except: A) Similarity matching B) Clustering C) Profiling D) Co-occurrence grouping

a

34) Data profiling is used to assess data quality and internal controls. It typically involves the following steps except: A) Filter the results. B) Identify the objects or activity you want to profile. C) Determine the types of profiling you want to perform. D) Set boundaries or thresholds for the activity.

a

38) In general, the more complex the model, the greater the chance of ________. A) Overfitting the data B) Underfitting the data C) Pruning the data D) The need to reduce the amount of data considered

a

41) Which of the following best describes a dependent variable? A) Output B) Input C) Application D) Operation

a

43) Understanding and predicting warranty expense is an important determination for manufacturing firms. When using historical claims data to estimate the current period's warranty expense, the historical claims data represents which of the following? A) Independent variable B) Dependent variable C) Function D) Statistical Model

a

44) One of the key tasks of bank auditors is to consider the amount of the loan loss reserve. When developing a model to estimate the current year's loan loss reserve amount, which of the following would be least likely to be included as an independent variable? A) Original loan approval amount B) Customer loan history C) Current aged loans D) Collections success

a

45) The short surveys regarding dining preferences requested at the bottom of the restaurant bill are an example of which data approach? A) Clustering B) Regression C) Similarity matching D) Link prediction

a

46) Retail stores often request customers' zip codes at the end of a sales transaction. This is an example of which data approach? A) Clustering B) Regression C) Similarity matching D) Classification

a

21) All of the following are examples of a supervised approach to evaluating data except: A) Causal modeling B) Data reduction C) Link prediction D) Regression

b

30) Which approach to data analytics attempts to predict, for each unit, the numerical value of some variable? A) Classification B) Regression C) Similarity matching D) Link prediction

b
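As an illustration of regression, the sketch below fits a straight line by ordinary least squares and uses it to predict a numerical value. The helper `fit_line` and the figures (units sold as the independent variable, warranty claims as the dependent variable) are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: units sold (input) and warranty claims in dollars (output).
units_sold = [100, 200, 300, 400]
warranty_claims = [520, 1010, 1490, 2020]

slope, intercept = fit_line(units_sold, warranty_claims)
# Predict the dependent variable for a new value of the independent variable.
predicted = intercept + slope * 500
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```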

36) Data reduction typically involves the following steps except: A) Identify the attribute you would like to reduce or focus on. B) Identify the parameters of the model. C) Filter the results. D) Interpret the results.

b

37) When working with a predictive model, underfitting the data is most likely caused by ________. A) an overly complex model B) an overly simple model C) overpruning the data D) a lack of data reduction

b

40) Which of the following best describes an independent variable? A) Output B) Input C) Application D) Operation

b

42) Understanding and predicting inventory obsolescence is an important determination for retail companies. When using competitor selling prices to estimate the inventory obsolescence reserve, the inventory obsolescence reserve represents which of the following? A) Independent variable B) Dependent variable C) Function D) Statistical Model

b

47) ________ is existing data that has been manually evaluated and assigned a class and ________ is existing data used to evaluate the model. A) Test data; Training data B) Training data; Test data C) Structured data; Unstructured data D) Unstructured data; Structured data

b
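The split between training data and test data can be sketched as follows. The records, the 6/2 split, and the midpoint-threshold rule are all assumptions chosen to keep the example small; they are not a method prescribed by the source.

```python
# Hypothetical labeled records: (transaction amount, manually assigned class).
labeled = [(20, "ok"), (35, "ok"), (50, "ok"), (900, "fraud"),
           (1100, "fraud"), (40, "ok"), (950, "fraud"), (30, "ok")]

# Training data builds the model; test data is held back to evaluate it.
training, test = labeled[:6], labeled[6:]


def mean(values):
    return sum(values) / len(values)


# "Train": place a threshold halfway between the two class means.
ok_mean = mean([amt for amt, cls in training if cls == "ok"])
fraud_mean = mean([amt for amt, cls in training if cls == "fraud"])
threshold = (ok_mean + fraud_mean) / 2


def classify(amount):
    return "fraud" if amount > threshold else "ok"


# "Test": measure accuracy only on records the model has not seen.
correct = sum(classify(amt) == cls for amt, cls in test)
accuracy = correct / len(test)
print(threshold, accuracy)
```

The threshold here also acts as a simple decision boundary: amounts above it fall in one class and amounts below it in the other.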

23) Which of the following best describes an unsupervised approach to the evaluation of data? A) Data exploration that is free from oversight by a superior B) Data exploration that examines the relationships between variables that are hypothesized to exist C) Data exploration that looks for potential patterns of interest D) Data exploration that is conducted with direct oversight by a superior

c

24) Which of the following best describes a supervised approach to the evaluation of data? A) Data exploration that is free from oversight by a superior B) Data exploration that is conducted with direct oversight by a superior C) Data exploration that examines the relationships between variables that are hypothesized to exist D) Data exploration that looks for potential patterns of interest

c

27) Which approach to data analytics attempts to identify similar individuals based on data known about them? A) Classification B) Clustering C) Similarity matching D) Co-occurrence grouping

c
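One way to picture similarity matching is as a nearest-neighbor lookup. In this sketch the seller names, the two attributes, and the choice of Euclidean distance are assumptions made for illustration.

```python
import math

# Hypothetical sellers described by (return rate, complaint rate), each between 0 and 1.
sellers = {
    "seller_a": (0.02, 0.01),
    "seller_b": (0.30, 0.25),
    "seller_c": (0.03, 0.02),
}

# Profile of a seller already known to be fraudulent (also hypothetical).
known_fraudster = (0.28, 0.27)

# The most similar seller is the one at the smallest distance from the known profile.
most_similar = min(sellers, key=lambda name: math.dist(sellers[name], known_fraudster))
print(most_similar)
```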

31) Which approach to data analytics attempts to characterize the typical behavior of an individual, group or population by generating summary statistics about the data? A) Classification B) Regression C) Profiling D) Link prediction

c
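Profiling can be sketched as computing summary statistics from past behavior and then setting a threshold for new activity. The claim amounts and the three-standard-deviation cutoff below are hypothetical choices.

```python
import statistics

# Hypothetical history of one employee's monthly expense claims.
baseline = [210, 195, 220, 205, 198, 215]
# New claims to compare against the profile.
new_claims = [208, 890]

# Summary statistics that characterize typical behavior.
typical = statistics.mean(baseline)
spread = statistics.stdev(baseline)

# Assumed threshold: anything more than three standard deviations above the mean.
upper_limit = typical + 3 * spread
flagged = [claim for claim in new_claims if claim > upper_limit]
print(round(typical, 2), round(upper_limit, 2), flagged)
```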

32) ________ refers to data that is stored in a database or spreadsheet that is readily searchable. A) Training data B) Unstructured data C) Structured data D) Test data

c

35) Regression analysis typically involves the following steps except: A) Identify the variables that might predict an outcome. B) Identify the parameters of the model. C) Set boundaries or thresholds. D) Determine the functional form of the relationship.

c

48) ________ mark the split between one class and another. A) Decision trees B) Identifying questions C) Decision boundaries D) Linear classifiers

c

49) ________ states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small. A) Leading digits hypothesis B) Moore's law C) Benford's law D) Classification

c

50) Unaware of data analysis tools available to the internal auditors, a store employee frequently processes cash returns without a receipt for $99, which is just below the amount requiring manager approval of $100. An analysis using which of the following would likely (and quickly) identify the employee's fraudulent behavior? A) Leading digits hypothesis B) Moore's law C) Benford's law D) Clustering

c
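Benford's law gives the expected share of leading digit d as log10(1 + 1/d), so a leading 9 should appear in under 5 percent of amounts. The sketch below compares that expectation with a made-up list of refund amounts; `leading_digit`, `benford_expected`, and the data are hypothetical.

```python
import math
from collections import Counter


def leading_digit(amount):
    """First nonzero digit of a nonzero amount."""
    return int(str(abs(amount)).lstrip("0.")[0])


def benford_expected(digit):
    """Expected share of amounts whose leading digit is `digit`."""
    return math.log10(1 + 1 / digit)


# Hypothetical cash refunds; many sit just under a $100 approval limit.
refunds = [12, 18, 25, 31, 99, 99, 14, 99, 47, 99,
           110, 99, 23, 99, 150, 99, 19, 99, 62, 99]

counts = Counter(leading_digit(a) for a in refunds)
for d in range(1, 10):
    observed = counts[d] / len(refunds)
    print(f"{d}: observed {observed:.2f}, expected {benford_expected(d):.2f}")
```

In this invented sample the digit 9 leads far more often than the law predicts, which is the kind of gap an auditor would investigate.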

28) Which approach to data analytics attempts to discover associations between individuals based on transactions involving them? A) Classification B) Regression C) Similarity matching D) Co-occurrence grouping

d
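A small sketch of co-occurrence grouping, assuming each transaction is a set of items bought together. The baskets are invented, and counting pairs is just one simple way to surface associations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: the items that appeared together on each order.
baskets = [
    {"printer", "ink", "paper"},
    {"printer", "ink"},
    {"paper", "stapler"},
    {"printer", "ink", "stapler"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort so each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```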

33) Using social media to look for relationships between related parties that are not otherwise disclosed to identify related party transactions is an example of ________. A) Classification B) Regression C) Profiling D) Link prediction

d

39) While overfitting data could lead to an error rate of 0 (zero), it is unlikely that you would be able to ________ your results. A) define B) specify C) articulate D) generalize

d

1) Benford's law is absolute, and all data must conform to it.

false

17) Data profiling typically involves unstructured data.

false

18) A target is a manually assigned category applied to a record based on an event.

false

20) Co-occurrence grouping is an example of a supervised approach.

false

5) Link prediction is a data approach used to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model.

false

6) Existing data that has been manually evaluated and assigned a class is often referred to as test data.

false

8) Fuzzy matching is a data approach used to identify similar individuals based on data known about them.

false

10) Fuzzy matching is a computer-assisted technique of finding matches that are less than 100 percent perfect by finding correspondences between portions of the text of each potential match.

true
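Fuzzy matching can be tried with Python's standard `difflib` module, which scores how much two strings have in common. The vendor names and the helper `similarity` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Score from 0.0 to 1.0; 1.0 means the strings match exactly (ignoring case)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical vendor master list and a name typed slightly differently.
vendors = ["Acme Supply Co.", "Globex Corporation", "Initech LLC"]
candidate = "ACME Supply Company"

# Pick the vendor whose name has the highest similarity score.
best = max(vendors, key=lambda v: similarity(candidate, v))
print(best, round(similarity(candidate, best), 2))
```

In practice a cutoff score would decide whether a less-than-perfect match is accepted or sent for manual review.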

11) The P in the IMPACT cycle stands for "perform test plan."

true

12) Clustering is a data approach used to divide individuals into groups in a useful or meaningful way.

true

13) An example of classification would be a credit card company flagging a transaction as being approved or potentially being fraudulent and denying payment.

true

14) The data approach used to characterize the typical behavior of an individual, group or population by generating summary statistics about the data is referred to as classification.

false

15) XBRL is a global standard for exchanging financial reporting information that uses XML.

true

16) XBRL is used to facilitate the exchange of financial reporting information between the company and the Securities and Exchange Commission.

true

19) When considering a question such as "Do our customers form natural groups based on similar attributes?" you would use an unsupervised approach.

true

2) A decision tree can be used to divide data into smaller groups.

true

3) Data reduction is a data approach used to reduce the amount of information that needs to be considered to focus on the most critical items.

true

4) Regression is a data approach used to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model.

true

7) Co-occurrence grouping could be used to match vendors by geographic region.

true

9) Alibaba and its attempt to identify seller and customer fraud based on various characteristics known about them is an example of similarity matching.

true

52) Decision trees are used to divide data into smaller groups by splitting the data at each branch into two or more groups. However, this method could lead to unintended consequences if the decision tree is not pruned. Describe the pruning process, when it can occur and the benefits of using it.

• Pruning removes branches from a decision tree to avoid overfitting the model.
o Prepruning occurs during model generation. The model stops creating new branches when the information usefulness of an additional branch is low.
o Postpruning evaluates the complete model and discards branches after the fact.

51) What is the difference between structured data and unstructured data? Provide an example of each.

• Structured data are data that are organized and reside in a fixed field within a record or file. Examples include relational databases, spreadsheets, or other formats that are readily searchable by search algorithms.
• Unstructured data are data that either do not have a predefined data model or are not organized in a predefined manner. Examples include photographs, Instagram or Twitter posts, and satellite images.

