ACTG Data & Analytics Midterm
A specific type of data profiling that is used to look for correspondences between portions, or segments, of text that are potential matches is called a __________ match
Fuzzy
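A minimal sketch of fuzzy matching, assuming Python's standard-library difflib similarity ratio as the scoring method (audit software may use a different algorithm). The vendor and employee names and the 0.8 cutoff are hypothetical; the idea is that near-identical strings still match even when case or punctuation differ.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity score between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_matches(names, candidates, threshold=0.8):
    """Pair each name with every candidate whose similarity meets the threshold."""
    matches = []
    for name in names:
        for cand in candidates:
            score = similarity(name, cand)
            if score >= threshold:
                matches.append((name, cand, round(score, 2)))
    return matches

# Hypothetical check: does any vendor look like an employee?
vendors = ["Acme Supply Co", "Blue Ridge Consulting"]
employees = ["ACME Supply Co.", "Jane Smith"]
print(fuzzy_matches(vendors, employees))
```

An exact comparison would miss "Acme Supply Co" vs. "ACME Supply Co."; the fuzzy score flags the pair for follow-up.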
What are the three types of attributes?
Primary keys, foreign keys, and descriptive attributes
Clustering is an ______________ method that is used to find natural groupings within the data
unsupervised
What approach is used when you don't have a specific question and are exploring the data (e.g., do our customers form natural groups based on similar attributes?)
Unsupervised approach
What is the goal of the classification process?
predict whether an individual we know very little about will belong to one class or another
What is the term for removing branches from a decision tree to avoid overfitting the model? -segregating -pruning -classification -linear classifiers
pruning
A UML class diagram is used to support and design a ________ database
relational
What does the profiling process do?
relies on gathering summary statistics and identifying outliers
Structured data is stored in a database or spreadsheet and is readily ____________
searchable
A _________ approach is used when you are performing analysis that uses historical data to predict a future outcome based on a specific question
supervised
Regression is a __________ method used to predict specific values given an explanatory variable (or variables)
supervised
What is a response variable?
the focus of a study or experiment
Match the classification terminology with its definition: -training data -test data -decision tree -decision boundaries A) existing data that have been manually evaluated and assigned a class B) a tool that is used to divide data into smaller groups C) a technique used to mark the split between one class and another D) existing data used to evaluate the model
training data - A, test data - D, decision tree - B, decision boundaries - C
How do we evaluate classifiers?
-try to avoid overfitting (models that fit the training data so closely that they predict new data poorly) -look for a sweet spot where we maximize the accuracy on the test data
Profiling is a/an _____________ method that is used to discover patterns of behavior, based on how far observations lie from the mean as measured by z-scores
unsupervised
Using a classification model, you can predict _____________ a new vendor belongs to one class or another based on the behavior of others
whether
In a significant paradigm shift, data analytics will allow auditors to -stay engaged with clients beyond the audit -perform an audit in a less expansive manner -be able to perform an audit much quicker
-stay engaged with clients beyond the audit
What is XBRL used for? -a technique used by analysts to develop models to predict expected outcomes -to provide a description of each field in the tables of a relational database -to facilitate the exchange of financial reporting info between a company and the SEC -to look up correspondences between portions, or segments, of a set of text for a potential match
-to facilitate the exchange of financial reporting info between a company and the SEC
What is the purpose of classification? -to reduce the amount of detailed info considered to focus on the most interesting or abnormal items -it allows analysts to develop models to predict expected outcomes -to gain an understanding of a typical behavior of an individual, group, population, or sample -to predict which class an observation that we know little about will belong to
-to predict which class an observation that we know little about will belong to
Any transaction that has a Z-score of ________ or above would represent an abnormal transaction: 1 2 3 4 5
3
Flat file database
A database that consists of just one table, with no interrelationships (example: an Excel table)
What is an explanatory variable?
It is an independent (predictor) variable that is used to predict or explain the response variable; it is not affected by changes in the response variable
When you need to retrieve data that is stored in more than one table, which type of clause should you use in your SQL query? combine together concatenate join
Join
SQL can extract data from two related tables. Place the following lines of SQL code in order to create a query that would retrieve all of the data from the Sales_Subset and the Customer tables: -INNER JOIN Sales_Subset -FROM Customer -ON Customer.CustomerID = Sales_Subset.CustomerID -SELECT *
SELECT * FROM Customer INNER JOIN Sales_Subset ON Customer.CustomerID = Sales_Subset.CustomerID
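A runnable sketch of the Customer / Sales_Subset join using Python's built-in sqlite3 with an in-memory database. The column names other than CustomerID, and all of the rows, are hypothetical; the ORDER BY clause is added only so the output order is stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: CustomerID is the primary key of Customer and a
# foreign key in Sales_Subset, which is what relates the two tables.
cur.executescript("""
CREATE TABLE Customer (
    CustomerID   INTEGER PRIMARY KEY,
    CustomerName TEXT
);
CREATE TABLE Sales_Subset (
    SalesOrderID INTEGER PRIMARY KEY,
    CustomerID   INTEGER REFERENCES Customer(CustomerID),
    Amount       REAL
);
INSERT INTO Customer VALUES (1, 'Acme'), (2, 'Blue Ridge');
INSERT INTO Sales_Subset VALUES (101, 1, 250.0), (102, 1, 75.5), (103, 2, 400.0);
""")

# All columns from both tables, matched on the shared CustomerID key
rows = cur.execute("""
SELECT *
FROM Customer
INNER JOIN Sales_Subset
    ON Customer.CustomerID = Sales_Subset.CustomerID
ORDER BY Sales_Subset.SalesOrderID
""").fetchall()

for row in rows:
    print(row)
```

Each sale row is paired with its customer's row, so customer 1 appears twice because it has two sales.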
What approach is used when you are trying to predict a specific outcome based on historical data (e.g., "Will a new customer pay its A/R balance on time?")
Supervised approach
In which step of the IMPACT cycle do we slice and dice the data, find correlations, ask ourselves further questions, ask colleagues what they think, and revise and rerun the analysis? -track outcomes -perform test plan -address and refine results -communicate insights
address and refine results
Data sets that are too large and complex for businesses' existing systems to handle are called: -data analytics -big data -voluminous
big data
In the example regarding the LendingClub data in which the analyst is researching loan rejection, they identified three possible indicators for why a loan would be rejected: the debt-to-income ratio, length of employment, and credit (risk) score. Which of the following is/are the explanatory variables? -debt-to-income ratio -credit (risk) score -length of employment -loan rejection
-debt-to-income ratio -credit (risk) score -length of employment
Place the five steps of the ETL process in order: -clean the data -obtain the data -load the data for data analysis -validate the data for completeness and integrity -determine the purpose and the scope of the data request
-determine the purpose and the scope of the data request -obtain the data -validate the data for completeness and integrity -clean the data -load the data for data analysis
What does regression allow accountants to do?
It allows accountants to develop models to predict expected outcomes; a regression gives a prediction, not necessarily evidence of causation
In the example provided in the text regarding regression in auditing, the analyst is trying to predict the allowance for loan losses based on current aged loans, loan type, customer loan history, and collection success. Select the explanatory variables: -loan type -current aged loans -customer loan history -allowance for loan losses -collection success
-loan type -current aged loans -customer loan history -collection success
What is a class?
manually assigned category or grouping that a record or target is assigned to (fraud/not fraud - accept/reject loan)
In the profiling example regarding T&E (travel and entertainment) expenses, which of the following is NOT one of the areas that the analyst would try to uncover? -lack of control -change in procedures -individuals more willing to spend excessively -significant variances in standard cost
significant variances in standard cost
Benford's law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be _____________ -larger if the data is describing geographic elements -larger if the data is describing financial elements -small -large
small
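Under Benford's law the expected share of leading digit d is log10(1 + 1/d), so small digits dominate (1 leads about 30% of the time, 9 under 5%). The sketch below computes those expected shares and compares them with a tiny hypothetical list of amounts; real tests use far larger populations than ten values.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(x: float) -> int:
    """First significant digit of a nonzero number, via scientific notation."""
    return int(f"{abs(x):e}"[0])

def observed_shares(amounts):
    """Share of each leading digit 1-9 in the (nonzero) amounts."""
    digits = [leading_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

expected = benford_expected()
sample = [1200, 150, 1890, 230, 310, 1010, 45, 17, 980, 1300]  # hypothetical
observed = observed_shares(sample)
for d in range(1, 10):
    print(f"{d}: expected {expected[d]:.1%}, observed {observed[d]:.1%}")
```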
What is a target?
specific attribute or value that we want to evaluate (fraud/credit score)
In the example of profiling for management accounting regarding Advanced Environmental Recycling Technologies, what are they looking for significant variances in? -feet of decking -recipe standards -travel and entertainment expenses -standard cost
standard cost
What is a relational database?
A database structured to recognize relationships among stored items of information: -each table (relation) contains one or more data categories in columns (AKA attributes) -each row (AKA record) contains a unique instance of data for the categories defined by the columns
Normalized database
A relational database that enables users to manage predefined data relationships across many data entities
Name the unsupervised approaches
-Data reduction (aggregate data) -Profiling (typical behavior) -Co-occurrence grouping (events that happen together) -Clustering (undiscovered groups)
Name the supervised approaches
-Similarity matching (identifying similar individuals) -Link prediction (social networking) -Classification (whether or not) -Causal modeling (one event influences another) -Regression (how much)
Select the appropriate definition for regression: -a method that can be used to predict the class of a new observation -a method used to predict specific values -a method for simplifying large datasets into obvious categories
-a method used to predict specific values
When evaluating classifiers, you need to be careful to strike a balance between what two things? -explanatory and response variables -complexity of the model and accuracy of the classification -positive and negative relationships in the model
-complexity of the model and accuracy of the classification
After you have identified the objects or activity you wish to profile, what should you do next? -interpret the results and monitor the activity -follow up on exceptions -set boundaries or thresholds for the activity -determine the types of profiling you want to perform
-determine the types of profiling you want to perform
The Forbes Insights/KPMG report "Audit 2020: A Focus on Change" found that the vast majority of survey respondents believe that technology will: -make auditing more challenging because digital assets are harder to audit than hard assets -enhance the quality, transparency and accuracy of the audit -be required to audit companies in the near future
-enhance the quality, transparency and accuracy of the audit
What does IMPACT stand for?
-identify the questions -master the data -perform test plan -address and refine results -communicate insights -track outcomes
Place the steps of data reduction in order: -follow up on the results -interpret the results -identify the attribute you would like to reduce or focus on -filter the results
-identify the attribute you would like to reduce or focus on -filter the results -interpret the results -follow up on the results
Place the steps of classification in order: -identify the classes you wish to predict -divide your data into training and testing sets -manually classify an existing set of records -generate your model -select a set of classification models -interpret the results and select the "best" model
-identify the classes you wish to predict -manually classify an existing set of records -select a set of classification models -divide your data into training and testing sets -generate your model -interpret the results and select the "best" model
Place the steps of profiling in order, from 1 to 5 -interpret the results and monitor the activity and/or generate a list of exceptions -set boundaries or thresholds for the activity -identify the objects or activity you want to profile -determine the types of profiling you want to perform -follow up on exceptions
-identify the objects or activity you want to profile -determine the types of profiling you want to perform -set boundaries or thresholds for the activity -interpret the results and monitor the activity and/or generate a list of exceptions -follow up on exceptions
What is the 3-step regression process?
-identify the variables -determine the functional form of the relationship -identify the parameters of the model
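The three regression steps can be sketched with ordinary least squares on one explanatory variable. Everything here is a hypothetical illustration: the data, the choice of a straight-line functional form, and the variable names. As noted above for regression generally, the fitted line is a prediction, not proof of causation.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x (one explanatory variable)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1

# Step 1 - identify the variables (hypothetical data, $ thousands):
# explanatory x = past-due loan balance, response y = allowance for loan losses
past_due = [10, 20, 30, 40, 50]
allowance = [22, 40, 62, 80, 102]

# Step 2 - functional form: assume a straight line, y = b0 + b1 * x
# Step 3 - identify (estimate) the parameters b0 and b1
b0, b1 = fit_simple_regression(past_due, allowance)
print(f"allowance = {b0:.1f} + {b1:.1f} * past_due")
print(f"predicted allowance at 60: {b0 + b1 * 60:.1f}")
```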
When is a foreign key required? -if two tables are related in a relational database, one of the two must have a foreign key -if two tables are related in a relational database, they both must have a foreign key -foreign keys are never required; they are optional attributes -every table in a relational database requires a foreign key
-if two tables are related in a relational database, one of the two must have a foreign key
What is the purpose of regression analysis? -to predict the class of a new observation -it allows analysts to develop models to predict expected outcomes -to reduce the amount of detailed info considered to focus on the most interesting or abnormal items -to gain an understanding of a typical behavior of an individual, group, population, or sample
-it allows analysts to develop models to predict expected outcomes
In the example provided in the text regarding employee turnover, the analyst is trying to predict employee turnover based on current professional salaries, health of the economy (GDP), and salaries offered by other accounting firms. In this scenario, select the explanatory variables: -salaries offered by other accounting firms -employee turnover -current professional salaries -health of the economy
-salaries offered by other accounting firms -current professional salaries -health of the economy
What is the purpose of data reduction? -to predict the class of a new observation -to gain an understanding of a typical behavior of an individual, group, population, or sample -to estimate or predict, for each unit, the numerical value of some variable -to reduce the amount of detailed info considered to focus on the most interesting or abnormal items
-to reduce the amount of detailed info considered to focus on the most interesting or abnormal items
What are the 4 steps to the data reduction process?
1. identify the attribute 2. filter 3. interpret 4. follow up on the results
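The four data reduction steps can be sketched as a simple filter. The journal entries and the choice of "posted on a weekend" as the attribute to focus on are hypothetical; the point is that the detailed list shrinks to a short set of items worth a closer look.

```python
from datetime import date

# Hypothetical journal entries: (entry_id, posting_date, amount)
entries = [
    ("JE-001", date(2023, 3, 6), 1200.00),    # Monday
    ("JE-002", date(2023, 3, 11), 560.00),    # Saturday
    ("JE-003", date(2023, 3, 14), 89.50),     # Tuesday
    ("JE-004", date(2023, 3, 19), 15000.00),  # Sunday
]

# 1. Identify the attribute to focus on: the weekday of the posting date
def is_weekend(d: date) -> bool:
    return d.weekday() >= 5  # Monday=0 ... Saturday=5, Sunday=6

# 2. Filter the detailed data down to the interesting subset
weekend_entries = [e for e in entries if is_weekend(e[1])]

# 3. Interpret: how much of the activity does the subset represent?
share = len(weekend_entries) / len(entries)
print(f"{len(weekend_entries)} of {len(entries)} entries ({share:.0%}) were posted on a weekend")

# 4. Follow up on the results: list each entry for investigation
for entry_id, posted, amount in weekend_entries:
    print(f"Follow up on {entry_id}: posted {posted:%a %Y-%m-%d} for ${amount:,.2f}")
```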
Classification 6-step process
1. identify the class(es) you wish to predict 2. manually classify an existing set of records 3. select a set of classification models 4. divide your data into training and testing sets 5. generate your model 6. interpret the results and select the best model
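The six classification steps can be sketched with the simplest possible decision tree, a single split (a "decision stump"). The loan records, the accept/reject labels, and the two candidate models (split on debt-to-income or on credit score) are hypothetical; each model's threshold is learned on the training set and the models are compared on the held-out test set, which is how the "best" model is picked.

```python
import random

# Step 1: the classes to predict are "accept" and "reject".
# Step 2: hypothetical records already classified manually:
# (debt_to_income_pct, credit_score, class)
records = [
    (12, 720, "accept"), (18, 690, "accept"), (25, 700, "accept"),
    (15, 650, "accept"), (22, 745, "accept"), (30, 710, "accept"),
    (45, 600, "reject"), (38, 640, "reject"), (50, 580, "reject"),
    (41, 690, "reject"), (36, 560, "reject"), (48, 705, "reject"),
]

def predict(record, feature, threshold, reject_if_above):
    """Apply a one-split rule on a single attribute (the decision boundary)."""
    above = record[feature] > threshold
    return "reject" if above == reject_if_above else "accept"

def accuracy(data, feature, threshold, reject_if_above):
    """Share of records whose predicted class matches the manual class."""
    hits = sum(predict(r, feature, threshold, reject_if_above) == r[2] for r in data)
    return hits / len(data)

def fit_stump(train, feature, reject_if_above):
    """Pick the threshold on one attribute that best classifies the training data."""
    best_t, best_acc = None, -1.0
    for t in sorted({r[feature] for r in train}):
        acc = accuracy(train, feature, t, reject_if_above)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Step 3: candidate models - split on DTI (high means reject)
# or on credit score (low means reject)
models = {"dti": (0, True), "credit_score": (1, False)}

# Step 4: divide into training and testing sets (fixed seed so the split repeats)
shuffled = records[:]
random.Random(42).shuffle(shuffled)
train, test = shuffled[:8], shuffled[8:]

# Steps 5-6: generate each model on the training set, then judge it on the test set
results = {}
for name, (feature, reject_if_above) in models.items():
    threshold = fit_stump(train, feature, reject_if_above)
    results[name] = (threshold, accuracy(test, feature, threshold, reject_if_above))
    print(f"{name}: threshold={threshold}, test accuracy={results[name][1]:.0%}")

best = max(results, key=lambda n: results[n][1])
print("best model on test data:", best)
```

Judging on test data rather than training data is what guards against picking an overfit model.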
What is the 5-step profiling process?
1. identify the objects or activity 2. determine the types of profiling 3. set boundaries 4. interpret results 5. follow up on exceptions
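The five profiling steps can be sketched with z-scores, using the "z-score of 3 or above" cutoff from the card on abnormal transactions. The T&E claim amounts are hypothetical, and this sketch flags only unusually high claims; a population standard deviation is assumed.

```python
import statistics

# Step 1 - activity to profile: hypothetical T&E expense claims (claim_id, amount)
claims = [
    ("C01", 120), ("C02", 135), ("C03", 110), ("C04", 140), ("C05", 125),
    ("C06", 130), ("C07", 115), ("C08", 128), ("C09", 122), ("C10", 138),
    ("C11", 119), ("C12", 131), ("C13", 900),
]
amounts = [amount for _, amount in claims]

# Step 2 - type of profiling: summary statistics plus a z-score for each claim
mean = statistics.mean(amounts)
stdev = statistics.pstdev(amounts)  # population standard deviation

def z_score(x):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / stdev

# Step 3 - boundary: a z-score of 3 or above marks an abnormal claim
THRESHOLD = 3

# Step 4 - interpret: generate the list of exceptions
exceptions = [(cid, amt, round(z_score(amt), 2)) for cid, amt in claims if z_score(amt) >= THRESHOLD]
print(f"mean={mean:.2f}, stdev={stdev:.2f}")

# Step 5 - follow up on each exception
for cid, amt, z in exceptions:
    print(f"Follow up on {cid}: ${amt} is {z} standard deviations above the mean")
```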
Classify each of the 5 steps of the ETL process as part of extraction, transformation, or loading: -load the data for data analysis -determine the purpose and scope of the data request -clean the data -obtain the data -validate the data for completeness and integrity
Extraction: -determine the purpose and scope of the data request -obtain the data Transformation: -validate the data for completeness and integrity -clean the data Loading: -load the data for data analysis
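A minimal ETL sketch following the extraction / transformation / loading split above, using only the standard library. The raw export, its columns, and the expected record count and control total (figures one would agree with the data owner when scoping the request) are all hypothetical; step 1, determining purpose and scope, is a planning decision and appears only as a comment.

```python
import csv
import io
import sqlite3

# Extraction, step 1 (planning): scope = all invoices for the period, with
# a hypothetical record count and control total agreed with the data owner.
RAW_EXPORT = """invoice_id,vendor,amount
1001, Acme Supply ,250.00
1002,Blue Ridge,  75.50
1003,acme supply,400
"""

def obtain(raw):
    """Extraction: read the raw export into a list of dicts."""
    return list(csv.DictReader(io.StringIO(raw)))

def validate(rows, expected_rows, expected_total):
    """Transformation: check completeness (record count, no missing IDs)
    and integrity (amounts agree with the control total)."""
    if len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(rows)}")
    if any(not r["invoice_id"].strip() for r in rows):
        raise ValueError("missing invoice_id")
    total = sum(float(r["amount"]) for r in rows)
    if abs(total - expected_total) > 0.005:
        raise ValueError(f"control total {total:.2f} != {expected_total:.2f}")
    return rows

def clean(rows):
    """Transformation: fix types, trim spaces, standardize vendor names."""
    return [
        (int(r["invoice_id"]), r["vendor"].strip().title(), round(float(r["amount"]), 2))
        for r in rows
    ]

def load(records, conn):
    """Loading: put the cleaned records into a table ready for analysis."""
    conn.execute("CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, vendor TEXT, amount REAL)")
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
rows = validate(obtain(RAW_EXPORT), expected_rows=3, expected_total=725.50)
load(clean(rows), conn)
summary = conn.execute(
    "SELECT vendor, SUM(amount) FROM invoices GROUP BY vendor ORDER BY vendor"
).fetchall()
print(summary)
```

Cleaning "acme supply" and " Acme Supply " to one spelling is what lets the two invoices group together after loading.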
What is data?
Facts and statistics collected together for reference or analysis
A class is a manually assigned ________ applied to a record based on an event
category
After we slice and dice the data, find correlations, ask ourselves further questions, ask colleagues what they think, and revise and rerun the analysis, what comes next in the IMPACT cycle? -communicate insights -track outcomes -address and refine results -perform test plan
communicate insights
Benefits of a relational database
It ensures that data: -are complete -are not redundant -follow business rules and internal controls -aid communication and integration of business processes
A target is an expected attribute or value that you want to __________
evaluate
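When clustering works well, observations within a segment should be different, and the data across segments should be very similar. True or false?
false (observations within a segment should be similar, and the segments should differ from one another)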
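A small sketch of what "clustering works well" looks like, using k-means, one common clustering algorithm (not necessarily the one the course tools use). The customer purchase amounts are hypothetical. Members sit close to their own cluster's center, while the two centers are far apart: similar within, different across.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Plain k-means on a single attribute; returns (centroids, clusters)."""
    ordered = sorted(values)
    # Deterministic start: k evenly spaced values from the sorted data
    step = (len(ordered) - 1) / max(k - 1, 1)
    centroids = [float(ordered[round(i * step)]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # assign each value to its nearest centroid
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # move each centroid to the mean of its members
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

# Hypothetical annual purchases per customer ($)
purchases = [120, 150, 135, 160, 140, 900, 950, 1010, 980]
centroids, clusters = kmeans_1d(purchases, k=2)

for c, members in zip(centroids, clusters):
    spread = max(abs(m - c) for m in members)
    print(f"centroid {c:.0f}: members {sorted(members)}, max distance to centroid {spread:.0f}")
print(f"distance between centroids: {abs(centroids[1] - centroids[0]):.0f}")
```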