ACTG Data & Analytics Midterm

A specific type of data profiling that is used to look for correspondences between portions, or segments, of text to find potential matches is called a __________ match

Fuzzy
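A fuzzy match gives a similarity score instead of requiring two strings to be identical. The sketch below uses Python's standard-library `difflib.SequenceMatcher` as one possible similarity measure. The vendor names and the 0.8 cutoff are hypothetical choices for illustration, not part of the course material.

```python
from difflib import SequenceMatcher


def fuzzy_matches(target, candidates, threshold=0.8):
    """Return (candidate, score) pairs whose similarity to target meets the threshold."""
    results = []
    for candidate in candidates:
        # ratio() returns a similarity score from 0.0 (no overlap) to 1.0 (identical)
        score = SequenceMatcher(None, target.lower(), candidate.lower()).ratio()
        if score >= threshold:
            results.append((candidate, round(score, 2)))
    return results


# Hypothetical vendor master file containing near-duplicate names
vendors = ["Acme Supply Co", "ACME Supply Co.", "Apex Industries", "Acme Suply Co"]
print(fuzzy_matches("Acme Supply Co", vendors))
```

Lowercasing both strings first means differences in capitalization do not lower the score; the misspelled "Acme Suply Co" still scores well above the cutoff, while "Apex Industries" does not.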

what are the three types of attributes?

Primary keys, foreign keys, descriptive attributes

Clustering is an ______________ method that is used to find natural groupings within the data

unsupervised
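Clustering needs no predefined classes: records are grouped only by how close their attribute values are. The sketch below uses k-means on a single numeric attribute, one common clustering algorithm chosen here for illustration (these notes do not name a specific algorithm). The invoice amounts and k = 3 are hypothetical.

```python
def kmeans_1d(values, k, max_iter=100):
    """Group numbers into k clusters: assign each value to its nearest centroid,
    move each centroid to the mean of its cluster, repeat until nothing changes."""
    s = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data
    centroids = [float(s[round(i * (len(s) - 1) / max(k - 1, 1))]) for i in range(k)]
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty
        new_centroids = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, clusters


# Hypothetical invoice amounts; no class labels are supplied (unsupervised)
amounts = [12, 15, 14, 300, 310, 295, 1000, 1020]
centroids, clusters = kmeans_1d(amounts, 3)
print(clusters)
```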

What approach is used when you don't have a specific question and are simply exploring the data? Example: do our customers form natural groups based on similar attributes?

Unsupervised approach

What is the goal of the classification process?

predict whether an individual we know very little about will belong to one class or another

What is the terminology for removing branches from a decision tree to avoid overfitting the model? segregating pruning classification linear classifiers

pruning

A UML class diagram is used to support and design a ________ database

relational

What does the profiling process do?

relies on gathering summary statistics and identifying outliers

Structured data is stored in a database or spreadsheet and is readily ____________

searchable

A _________ approach is used when you are performing analysis that uses historical data to predict a future outcome based on a specific question

supervised

Regression is a __________ method used to predict specific values given an explanatory variable (or variables)

supervised

What is a response variable?

the focus of a study or experiment

Match the classification terminology with its definition training data test data decision tree decision boundaries A) existing data that have been manually evaluated and assigned a class B) a tool that is used to divide data into smaller groups C) a technique used to mark the split between one class and another D) existing data used to evaluate the model

training data - A test data - D decision tree - B decision boundaries - C

How do we evaluate classifiers?

Try to avoid overfitting (models that fit the training data so closely that they perform poorly on new data) -look for a sweet spot that maximizes accuracy on the testing data

Profiling is a/an _____________ method that is used to discover patterns of behavior, based on the distance of z-scores from the mean

unsupervised

Using a classification model, you can predict _____________ a new vendor belongs to one class or another based on the behavior of others

whether

In a significant paradigm shift, data analytics will allow auditors to -stay engaged with clients beyond the audit -perform an audit in a less expansive manner -be able to perform an audit much quicker

-stay engaged with clients beyond the audit

What is XBRL used for? -a technique used by analysts to develop models to predict expected outcomes -to provide a description of each field in the tables of relational database -to facilitate the exchange of financial reporting info between a company and the SEC -to look up correspondences between portions, or segments, of a set of text for a potential match

-to facilitate the exchange of financial reporting info between a company and the SEC

what is the purpose of classification -to reduce the amount of detailed info considered to focus on the most interesting or abnormal items -it allows analysts to develop models to predict expected outcomes -to gain an understanding of a typical behavior of an individual, group, population, or sample -to predict which class an observation that we know little about will belong to

-to predict which class an observation that we know little about will belong to

Any transaction that has a Z-score of ________ or above would represent abnormal transactions 1 2 3 4 5

3
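A z-score measures how many standard deviations a value sits from the mean: z = (x - mean) / standard deviation. The sketch below flags transactions whose z-score is 3 or more in either direction. It assumes the population standard deviation (`statistics.pstdev`), and the expense amounts are hypothetical.

```python
from statistics import mean, pstdev


def flag_outliers(amounts, threshold=3.0):
    """Return (amount, z-score) for amounts at least `threshold` standard deviations from the mean."""
    mu = mean(amounts)
    sigma = pstdev(amounts)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    flagged = []
    for amount in amounts:
        z = (amount - mu) / sigma
        if abs(z) >= threshold:
            flagged.append((amount, round(z, 2)))
    return flagged


# Hypothetical travel-and-entertainment expense amounts
expenses = [102, 98, 110, 95, 105, 99, 101, 97, 103, 100,
            96, 104, 108, 92, 100, 99, 101, 98, 102, 1000]
print(flag_outliers(expenses))
```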

Flat file database

A database that consists of just one table, with no interrelationships between tables (example: an Excel table)

What is an explanatory variable?

An independent variable: it is not affected by what you do in the study and is used to explain or predict the response variable

When you need to retrieve data that is stored in more than one table, which type of clause should you use in your sql query? combine together concatenate join

Join

SQL can extract data from two related tables. Place the following lines of SQL code in order to create a query that would retrieve all of the data from the Sales_Subset and the Customer tables -INNER JOIN Sales_Subset -FROM Customer -ON Customer.CustomerID = Sales_Subset.CustomerID -SELECT *

SELECT * FROM Customer INNER JOIN Sales_Subset ON Customer.CustomerID = Sales_Subset.CustomerID
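To see what that query returns, the sketch below runs it against an in-memory SQLite database using Python's built-in `sqlite3` module. Only the table names and the `CustomerID` join key come from the question; the other columns (`CustomerName`, `SalesOrderID`, `Amount`) and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Sales_Subset (SalesOrderID INTEGER PRIMARY KEY,
                               CustomerID INTEGER REFERENCES Customer(CustomerID),
                               Amount REAL);
    INSERT INTO Customer VALUES (1, 'Alpha Corp'), (2, 'Beta LLC'), (3, 'Gamma Inc');
    INSERT INTO Sales_Subset VALUES (100, 1, 250.0), (101, 1, 75.5), (102, 2, 40.0);
""")

# The query from the answer above
query = """
    SELECT *
    FROM Customer
    INNER JOIN Sales_Subset ON Customer.CustomerID = Sales_Subset.CustomerID
"""
for row in conn.execute(query):
    print(row)  # Customer columns followed by Sales_Subset columns
```

Because this is an INNER JOIN, Gamma Inc (a customer with no sales) does not appear in the result; only rows with a matching `CustomerID` in both tables are returned.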

What approach is used when you are trying to predict a specific outcome based on historical data? Example: "Will a new customer pay its A/R balance on time?"

Supervised approach

In which step of the IMPACT cycle do data and analytics slice and dice the data, find correlations, ask ourselves further questions, ask colleagues what they think, and revise and rerun the analysis? -track outcomes -perform test plan -address and refine results -communicate insights

address and refine results

Data sets that are too large and complex for businesses' existing systems to handle are called -data analytics -big data -voluminous

big data

In the example regarding the LendingClub data in which the analyst is researching loan rejection, they identified three possible indicators for why a loan would be rejected, the debt-to-income ratio, length of employment, and credit (Risk) score. Which of the following is/are the explanatory variables? debt-to-income ratio credit (risk) score length of employment loan rejection

debt-to-income ratio credit (risk) score length of employment

Place the five steps of the ETL process in order clean the data obtain the data load the data for data analysis validate the data for completeness and integrity determine the purpose and the scope of the data request

determine the purpose and the scope of the data request obtain the data validate the data for completeness and integrity clean the data load the data for data analysis
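The five ETL steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: the scope (`REQUIRED_FIELDS`), the CSV extract, and the `payments` table are all hypothetical, and incomplete rows are set aside for follow-up rather than silently fixed.

```python
import csv
import io
import sqlite3

# Step 1 (purpose and scope): hypothetically, vendor payments with these fields
REQUIRED_FIELDS = ("vendor_id", "vendor_name", "amount")

# Hypothetical raw extract; the second row is missing its amount
RAW_EXTRACT = """vendor_id,vendor_name,amount
1, Acme Supply Co ,100.50
2,Beta LLC,
3,Gamma Inc,75
"""


def obtain(text):
    """Step 2: obtain the data (here, parse a CSV string into dicts)."""
    return list(csv.DictReader(io.StringIO(text)))


def validate(rows):
    """Step 3: keep rows with every required field present; return the rest for follow-up."""
    complete = [r for r in rows if all((r.get(f) or "").strip() for f in REQUIRED_FIELDS)]
    incomplete = [r for r in rows if r not in complete]
    return complete, incomplete


def clean(rows):
    """Step 4: trim stray whitespace and convert fields to proper types."""
    return [(int(r["vendor_id"]), r["vendor_name"].strip(), float(r["amount"])) for r in rows]


def load(records, conn):
    """Step 5: load the cleaned records into a table ready for analysis."""
    conn.execute("CREATE TABLE payments (vendor_id INTEGER, vendor_name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    return conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]


conn = sqlite3.connect(":memory:")
complete, incomplete = validate(obtain(RAW_EXTRACT))
print(load(clean(complete), conn), "rows loaded;", len(incomplete), "row(s) need follow-up")
```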

What does regression allow accountants to do?

Develop models to predict expected outcomes; the result is a prediction, not necessarily evidence of causation

In the example provided in the text regarding regression in auditing, the analyst is trying to predict the allowance for loan losses based on current aged loans, loan type, customer loan history, and collection success. Select the explanatory variables loan type current aged loans customer loan history allowances for loan losses collection success

loan type current aged loans customer loan history collection success

What is a class?

manually assigned category or grouping that a record or target is assigned to (fraud/not fraud - accept/reject loan)

In the profiling ex. regarding T&E expenses, which of the following is NOT one of the areas that the analyst would try to uncover? lack of control change in procedures individuals more willing to spend excessively significant variances in standard cost

significant variances in standard cost

Benford's law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be _____________ -larger if the data is describing geographic elements -larger if the data is describing financial elements -small -large

small
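Benford's law gives the expected share of numbers whose leading digit is d as log10(1 + 1/d): about 30.1% start with 1, falling to about 4.6% for 9. The sketch below computes those expected shares and the observed shares for a list of amounts, so the two can be compared. The helper names and sample amounts are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Benford's law: P(leading digit = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant digit of a nonzero number, read from scientific notation."""
    return int(f"{abs(x):e}"[0])


def observed_shares(amounts):
    """Share of nonzero amounts that start with each digit 1..9."""
    digits = [leading_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


expected = benford_expected()
for d in range(1, 10):
    print(d, f"{expected[d]:.3f}")
```

In practice the observed shares for a large set of transaction amounts would be compared digit by digit against the expected shares, and large gaps would be treated as items to investigate.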

What is a target?

specific attribute or value that we want to evaluate (fraud/credit score)

In the example of profiling for management accounting regarding advanced environmental recycling technologies, what are they looking for significant variances in? feet of decking recipe standards travel and entertainment expenses standard cost

standard cost

What is a relational database?

A structure that recognizes relationships among stored items of information -each table (relation) contains one or more data categories in columns (AKA attributes) -each row (AKA record) contains a unique instance of data for the categories defined by the columns

Normalized database

A relational database that enables users to manage predefined data relationships across many data entities

Name the unsupervised approaches

-Data reduction (aggregate data) -Profiling (typical behavior) -Co-occurrence grouping (events that happen together) -Clustering (undiscovered groups)

Name the supervised approaches

-Similarity matching (natural grouping) -Link prediction (social networking) -Classification (whether or not) -Causal modeling (one event influences another) -Regression (how much)

select the appropriate definition for regression -a method that can be used to predict the class of a new observation -a method used to predict specific values -a method for simplifying large datasets into obvious categories

-a method used to predict specific values

When evaluating classifiers, you need to be careful to strike a balance between what two things? -explanatory and response variables -complexity of the model and accuracy of the classification -positive and negative relationships in the model

-complexity of the model and accuracy of the classification

After you have identified the objects or activity you wish to profile, what should you do next? -interpret the results and monitor the activity -follow up on exceptions -set boundaries or thresholds for the activity -determine the types of profiling you want to perform

-determine the types of profiling you want to perform

The Forbes Insights/KPMG report "Audit 2020: A Focus on Change" found that the vast majority of survey respondents believe that technology will -make auditing more challenging because digital assets are harder to audit than hard assets -enhance the quality, transparency, and accuracy of the audit -be required to audit companies in the near future

-enhance the quality, transparency and accuracy of the audit

What does IMPACT stand for?

-identify the questions -master the data -perform test plan -address and refine results -communicate insights -track outcomes

Place the steps of data reduction in order: -follow up on the results -interpret the results -identify the attribute you would like to reduce or focus on -filter the results

-identify the attribute you would like to reduce or focus on -filter the results -interpret the results -follow up on the results

Place the steps of classification into order -identify the classes you wish to predict -divide your data into training and testing sets -manually classify an existing set of records -generate your model -select a set of classification models -interpret the results and select the "best" model

-identify the classes you wish to predict -manually classify an existing set of records -select a set of classification models -divide your data into training and testing sets -generate your model -interpret the results and select the "best" model

Place the steps of profiling in order, from 1 to 5 -interpret the results and monitor the activity and/or generate a list of exceptions -set boundaries or thresholds for the activity -identify the objects or activity you want to profile -determine the types of profiling you want to perform -follow up on exceptions

-identify the objects or activity you want to profile -determine the types of profiling you want to perform -set boundaries or thresholds for the activity -interpret the results and monitor the activity and/or generate a list of exceptions -follow up on exceptions

What is the 3-step regression process?

-identify the variables -determine the functional form of the relationship -identify the parameters of the model
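The three regression steps can be sketched with one explanatory variable. This sketch assumes a straight-line functional form and estimates its parameters by ordinary least squares, one common estimation method; the past-due loan and allowance figures are hypothetical.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates for y = intercept + slope * x (one explanatory variable)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Step 1 (variables): hypothetical loans 90+ days past due (explanatory, $000s)
# and allowance for loan losses (response, $000s)
past_due = [10, 20, 30, 40]
allowance = [3, 6, 7, 9]

# Step 2 (functional form): assume a straight line
# Step 3 (parameters): estimate the intercept and slope from the data
b0, b1 = fit_simple_regression(past_due, allowance)
print(f"allowance = {b0:.2f} + {b1:.2f} * past_due")
print("predicted allowance at 50:", round(b0 + b1 * 50, 2))
```

The fitted line predicts an expected outcome; on its own it does not show that past-due balances cause the allowance to change.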

When is a foreign key required? -if two tables are related in a relational database, one of the two must have a foreign key -if two tables are related in a relational database, they both must have a foreign key -foreign keys are never required; they are optional attributes -every table in a relational database requires a foreign key

-if two tables are related in a relational database, one of the two must have a foreign key
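In the sketch below, only `Invoice` carries the foreign key that links the two related tables, and the comments label the three attribute types (primary key, foreign key, descriptive). It uses Python's built-in `sqlite3`; the table and column names are hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE Vendor (
        VendorID   INTEGER PRIMARY KEY,  -- primary key: uniquely identifies each vendor
        VendorName TEXT                  -- descriptive attribute
    )""")
conn.execute("""
    CREATE TABLE Invoice (
        InvoiceID INTEGER PRIMARY KEY,                  -- primary key
        VendorID  INTEGER REFERENCES Vendor(VendorID),  -- foreign key to Vendor
        Amount    REAL                                  -- descriptive attribute
    )""")
conn.execute("INSERT INTO Vendor VALUES (1, 'Acme Supply Co')")
conn.execute("INSERT INTO Invoice VALUES (500, 1, 120.0)")  # vendor 1 exists: accepted


def insert_invoice(invoice_id, vendor_id, amount):
    """Return True if the row is accepted, False if the foreign key check rejects it."""
    try:
        conn.execute("INSERT INTO Invoice VALUES (?, ?, ?)", (invoice_id, vendor_id, amount))
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_invoice(501, 99, 80.0))  # no vendor 99 exists, so this prints False
```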

What is the purpose of regression analysis? -to predict the class of a new observation -it allows analysts to develop models to predict expected outcomes -to reduce the amount of detailed info considered to focus on the most interesting or abnormal items -to gain an understanding of a typical behavior of an individual, group, population, or sample

-it allows analysts to develop models to predict expected outcomes

in the example provided in the text regarding employee turnover, the analyst is trying to predict employee turnover based on current professional salaries, health of the economy (GDP), and salaries offered by other accounting firms. In the scenario, select the explanatory variables -salaries offered by other accounting firms -employee turnover -current professional salaries -health of the economy

-salaries offered by other accounting firms -current professional salaries -health of the economy

What is the purpose of data reduction? -to predict the class of a new observation -to gain an understanding of a typical behavior of an individual, group, population, or sample -to estimate or predict, for each unit, the numerical value of some variable -to reduce the amount of detailed info considered to focus on the most interesting or abnormal items

-to reduce the amount of detailed info considered to focus on the most interesting or abnormal items

What are the 4 steps to the data reduction process?

1. identify the attribute 2. filter 3. interpret 4. follow up on the results
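The four data reduction steps can be sketched as a filter followed by a summary. In this minimal example the journal entries are hypothetical, and "posted on a weekend" is a hypothetical choice of attribute to focus on.

```python
# Hypothetical journal entries
journal_entries = [
    {"id": "JE1", "day": "Mon", "amount": 500},
    {"id": "JE2", "day": "Sat", "amount": 12000},
    {"id": "JE3", "day": "Tue", "amount": 800},
    {"id": "JE4", "day": "Sun", "amount": 3000},
]


def reduce_entries(entries, attribute, keep):
    """Steps 1-2: focus on one attribute and filter to the records where keep(value) is true."""
    return [e for e in entries if keep(e[attribute])]


def summarize(entries):
    """Step 3: interpret the reduced set with a count and a total."""
    return {"count": len(entries), "total": sum(e["amount"] for e in entries)}


weekend = reduce_entries(journal_entries, "day", lambda d: d in {"Sat", "Sun"})
print(summarize(weekend))
# Step 4: follow up on the specific entries that survived the filter
print("Follow up on:", [e["id"] for e in weekend])
```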

Classification 6-step process

1. identify the class(es) you wish to predict 2. manually classify an existing set of records 3. select a set of classification models 4. divide your data into training and testing sets 5. generate your model 6. interpret the results and select the best model
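The six classification steps can be sketched with a decision stump, which is a decision tree with a single split. Its threshold acts as the decision boundary between the two classes. A real analysis would compare several candidate models in step 3; this sketch uses only one. The loan records and the 70/30 split are hypothetical.

```python
import random

# Steps 1-2: the classes are "accept"/"reject"; these hypothetical loans were
# already manually classified. Each record is (debt_to_income_ratio, class).
loans = [(0.12, "accept"), (0.18, "accept"), (0.22, "accept"), (0.25, "accept"),
         (0.30, "accept"), (0.33, "accept"), (0.41, "reject"), (0.45, "reject"),
         (0.52, "reject"), (0.60, "reject"), (0.70, "reject")]


def predict(dti, threshold):
    """The threshold is the decision boundary: above it, predict reject."""
    return "reject" if dti > threshold else "accept"


def accuracy(records, threshold):
    return sum(predict(dti, threshold) == label for dti, label in records) / len(records)


def train_stump(train):
    """Steps 3 and 5: try each observed ratio as a cut-off and keep the most accurate one."""
    best_threshold, best_acc = None, -1.0
    for threshold in sorted({dti for dti, _ in train}):
        acc = accuracy(train, threshold)
        if acc > best_acc:
            best_threshold, best_acc = threshold, acc
    return best_threshold


# Step 4: shuffle with a fixed seed, then split about 70/30 into training and testing sets
shuffled = loans[:]
random.Random(42).shuffle(shuffled)
cut = int(len(shuffled) * 0.7)
train, test = shuffled[:cut], shuffled[cut:]

# Step 6: judge the model on the held-out testing data, not the training data
threshold = train_stump(train)
print("threshold:", threshold, "test accuracy:", round(accuracy(test, threshold), 2))
```

Scoring on the testing set matters because a model can look perfect on the records it was trained on and still misclassify new ones.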

What is the 5-step profiling process?

1. identify the objects or activity 2. determine the types of profiling 3. set boundaries 4. interpret results 5. follow up on exceptions

Classify each of the 5 steps of the ETL process as part of extraction, transformation, or loading -load the data for data analysis -determine the purpose and scope of the data request -clean the data -obtain the data -validate the data for completeness and integrity

Extraction: -determine the purpose and scope of the data request -obtain the data Transformation: -validate the data for completeness and integrity -clean the data Loading: -load the data for data analysis

What is data?

Facts and statistics collected together for reference or analysis

a class is a manually assigned ________ applied to a record based on an event

category

After data and analytics slice and dice the data, find correlations, ask ourselves further questions, ask colleagues what they think, and revise and rerun the analysis, what comes next in the IMPACT cycle? -communicate insights -track outcomes -address and refine results -perform test plan

communicate insights

Benefits of relational database

Ensures that data -is complete -is not redundant -follows business rules and internal controls -aids communication and integration of business processes

A target is an expected attribute or value that you want to __________

evaluate

When clustering works well, observations within a segment should be different, and the data across segments should be very similar. True or false?

False: observations within a segment should be similar, and data across segments should be different

