Data Analytics 6220 - Final Review
CAATs are automated scripts that can be used to validate data, test controls, and enable substantive testing of transaction details or account balances and generate supporting evidence for the audit. What does CAAT stand for? - Computer-aided audit techniques - Computerized audit and accounting techniques - Computer-assisted audit techniques - Computerized audit aids and tests
Computer-assisted audit techniques
When preparing data, analysts use the ETL process. ETL stands for Explore, Transfer, Load. True or False
False
What describes finding correspondences between at least two types of text or entries that may not match perfectly? - Algorithmic matching - Fuzzy matching - Incomplete matching - Incomplete linkages
Fuzzy matching
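As a rough illustration of fuzzy matching, the sketch below scores how similar two vendor names are using Python's standard `difflib` module. The vendor names, the `similarity` and `fuzzy_match` helpers, and the 0.8 cutoff are all hypothetical choices for this example, not values from the course text.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def fuzzy_match(name, candidates, threshold=0.8):
    """Return the candidates whose similarity to `name` meets the threshold.

    The 0.8 default is an assumed cutoff; a real audit test would tune it.
    """
    return [c for c in candidates if similarity(name, c) >= threshold]

# Hypothetical vendor master file entries.
vendors = ["Acme Supply Co.", "ACME Supply Company", "Zenith Tools"]
matches = fuzzy_match("Acme Supply Co", vendors)
print(matches)
```

Entries that differ only in punctuation, case, or abbreviation score high, while unrelated names fall below the cutoff.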
By the year 2024, the volume of data created, captured, copied, and consumed worldwide will be 149 _____. - zettabytes - yottabytes - petabytes - exabytes
zettabytes
As mentioned in the chapter, which of the following is not a common way that data will need to be cleaned after extraction and validation? - Clean up trailing zeroes. - Remove headings and subtotals. - Format negative numbers. - Correct inconsistencies across data.
Clean up trailing zeroes
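The three cleaning tasks that are on the list (removing headings and subtotals, formatting negative numbers, and correcting inconsistencies) might look like the sketch below. The sample rows, the label set, and the `parse_amount` and `clean` helpers are assumptions made for illustration only.

```python
def parse_amount(text: str) -> float:
    """Convert accounting-style text such as '(300.00)' or '45.10-' to a float."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    if cleaned.startswith("(") and cleaned.endswith(")"):
        return -float(cleaned[1:-1])   # parentheses mean negative
    if cleaned.endswith("-"):
        return -float(cleaned[:-1])    # trailing minus sign
    return float(cleaned)

# Assumed labels that mark heading and subtotal rows in this example export.
NON_DATA_LABELS = {"vendor", "subtotal", "total"}

def clean(rows):
    """Drop heading/subtotal rows, standardize names, and parse amounts."""
    result = []
    for vendor, amount in rows:
        label = vendor.strip()
        if label.lower() in NON_DATA_LABELS:
            continue
        result.append((label.title(), parse_amount(amount)))
    return result

# Hypothetical extract with a heading, a subtotal, and inconsistent formats.
raw_rows = [
    ["Vendor", "Amount"],
    ["acme ", "1,200.00"],
    ["Acme", "(300.00)"],
    ["Subtotal", "900.00"],
    ["ZENITH", "45.10-"],
]
cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```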
According to the text, which of the following is NOT true? - Data analytics is the process of analyzing raw data to answer questions or provide insights. - Data are raw figures and facts. - An example of an SSBI tool is PowerPoint. - Information is the knowledge gained from data.
An example of an SSBI tool is PowerPoint.
An observation about the frequency of leading digits in many real-life sets of numerical data is called: - leading digits hypothesis. - Benford's law. - Moore's law. - clustering.
Benford's law.
Which approach to data analytics attempts to assign each unit in a population into a small set of classes where the unit belongs? - Regression - Similarity matching - Classification - Co-occurrence grouping
Classification
Which data approach attempts to assign each unit in a population into a small set of classes (or groups) where the unit best fits? - Co-occurrence grouping - Similarity matching - Regression - Classification
Classification
Which testing approach would be used to predict whether certain cases should be evaluated as having fraud or no fraud? - Classification - Sentiment analysis - Artificial intelligence - Probability
Classification
Which of the following was not emphasized as a skill that analytic-minded accountants should have? - Classification of test approaches - Developed an analytics mindset - Data scrubbing and data preparation - Statistical data analysis competency
Classification of test approaches
Which of the following is NOT one of the basic Excel functions used in foundational analysis? - AVERAGEIF - DISPLAYIF - SUMIF - COUNTIF
DISPLAYIF
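For readers who think in code rather than spreadsheets, the three real functions can be approximated as below. This is only a sketch of the idea behind SUMIF, COUNTIF, and AVERAGEIF, not a reimplementation of Excel's criteria syntax; the `sales` rows and the helper names are hypothetical.

```python
from statistics import mean

# Hypothetical rows standing in for a worksheet range.
sales = [
    {"region": "East", "amount": 100.0},
    {"region": "West", "amount": 250.0},
    {"region": "East", "amount": 300.0},
]

def matching(rows, key, criterion, value_key):
    """Values from value_key for rows where rows[key] equals the criterion."""
    return [r[value_key] for r in rows if r[key] == criterion]

def sumif(rows, key, criterion, value_key):
    return sum(matching(rows, key, criterion, value_key))

def countif(rows, key, criterion):
    return sum(1 for r in rows if r[key] == criterion)

def averageif(rows, key, criterion, value_key):
    return mean(matching(rows, key, criterion, value_key))

east_total = sumif(sales, "region", "East", "amount")
east_count = countif(sales, "region", "East")
east_average = averageif(sales, "region", "East", "amount")
print(east_total, east_count, east_average)
```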
Determining whether the analysis makes sense is associated with: - Neither Data Analysis Interpretation nor Data Exploration - Data Analysis Interpretation - Data Exploration - Both Data Analysis Interpretation and Data Exploration
Data Analysis Interpretation
In which areas were skills not emphasized for analytic-minded accountants? - Data and systems analysis and design - Data visualization and data reporting - Data quality - Descriptive data analysis
Data and systems analysis and design
The metadata that describes each attribute in a database is which of the following? - Composite primary key - Flat file - Descriptive attributes - Data dictionary
Data dictionary
Which of these terms is defined as being a central repository of descriptions for all of the data attributes of the dataset? - Data Analytics - Big Data - Data dictionary - Data warehouse
Data dictionary
Which of the following best describes an unsupervised approach to the evaluation of data? - Data exploration that is free from oversight by a superior - Data exploration looking for potential patterns of interest - Data exploration to examine the relationships between variables that are hypothesized to exist - Data exploration that is conducted with direct oversight by a superior
Data exploration looking for potential patterns of interest
Understanding "why" something is happening in your analysis is called _________ analytics. - Prescriptive - Predictive - Descriptive - Diagnostic
Diagnostic
Auditing financial statements, and its desire to look for errors, anomalies, and possible fraud, is most consistent with which type of analytics? - Prescriptive analytics - Descriptive analytics - Predictive analytics - Diagnostic analytics
Diagnostic analytics
Which type of audit analytics might be used to find hidden patterns or variables linked to abnormal behavior? - Diagnostic analytics - Descriptive analytics - Prescriptive analytics - Predictive analytics
Diagnostic analytics
Which items would be currently out of the scope of Data Analytics? - Evaluation of time stamps to evaluate workflow - Evaluation of phantom vendors - Direct observation of processes - Duplicate payment of invoices
Direct observation of processes
Which of the following questions are NOT suggested by the Institute of Business Ethics to allow a business to create value from data use and analysis, and still protect the privacy of stakeholders? - How does the company use data, and to what extent is it integrated into firm strategy? - Does the company have the appropriate tools to mitigate the risks of data misuse? - Does the data used by the company include personally identifiable information? - Does the company send a privacy notice to individuals when their personal data is collected?
Does the data used by the company include personally identifiable information?
According to the textbook, an example of a tax cost KPI would be: - levels of technology/tax training. - employee turnover of the tax personnel. - levels of late filing or error penalties. - ETR (effective tax rate).
ETR (effective tax rate).
In the data analysis process, "C" in the MOSAIC model stands for "Cleaning". True or False
False
Questions with single dimensions should be answered with pivot tables; questions with multiple dimensions should be answered with Excel functions. True or False
False
Which is the best tool when the desired result is known, but not the input value for a single variable that will achieve that result? - Data Analysis - Scenario Manager - Goal Seek - Linear Regression
Goal Seek
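The idea behind Goal Seek (fix the desired result, solve for one input) can be sketched with a simple bisection search. Bisection is a stand-in chosen for clarity here and is not claimed to be the method Excel uses; the `profit` model and its numbers are hypothetical.

```python
def goal_seek(f, target, low, high, tol=1e-9, max_iter=200):
    """Find x in [low, high] where f(x) is approximately equal to target.

    Assumes f(x) - target changes sign somewhere on the interval.
    """
    g_low = f(low) - target
    g_high = f(high) - target
    if g_low * g_high > 0:
        raise ValueError("target is not bracketed by low and high")
    for _ in range(max_iter):
        mid = (low + high) / 2
        g_mid = f(mid) - target
        if abs(g_mid) < tol or (high - low) / 2 < tol:
            return mid
        if g_low * g_mid <= 0:
            high = mid                  # root lies in the lower half
        else:
            low, g_low = mid, g_mid     # root lies in the upper half
    return (low + high) / 2

def profit(units):
    """Hypothetical model: $25 price, $15 unit cost, $4,000 fixed cost."""
    return units * (25.0 - 15.0) - 4000.0

# How many units must be sold to earn a profit of $1,000?
units_needed = goal_seek(profit, target=1000.0, low=0.0, high=10000.0)
print(round(units_needed, 2))
```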
The advantages of storing data in a relational database include which of the following? - Help in enforcing business rules. - Integrating business processes. - Increased information redundancy. - Help in enforcing business rules and integrating business processes.
Help in enforcing business rules and integrating business processes
Which of the following describes part of the goal of the ETL process? - Communicate the results and insights found through the analysis. - Load the data into a relational database for storage. - Identify and obtain the data needed for solving the problem. - Identify which approach to data analytics should be used.
Identify and obtain the data needed for solving the problem.
Four types of joins are used to link tables together. Which type of join does NOT produce any null values? - Right - Inner - Full - Left
Inner
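A small in-memory SQLite example can make the difference concrete. The `supplier` and `purchase` tables and their rows are made up for this sketch; the point is only that the inner join keeps matched rows, while the left join pads the unmatched supplier with a null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (po_id INTEGER PRIMARY KEY,
                           supplier_id INTEGER, amount REAL);
    INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Zenith'), (3, 'Orion');
    INSERT INTO purchase VALUES (101, 1, 500.0), (102, 2, 75.0);
""")

# Inner join: only suppliers that have a matching purchase row.
inner_rows = conn.execute("""
    SELECT s.name, p.amount
    FROM supplier AS s
    INNER JOIN purchase AS p ON p.supplier_id = s.supplier_id
    ORDER BY s.supplier_id
""").fetchall()

# Left join: every supplier; unmatched ones get NULL (None in Python).
left_rows = conn.execute("""
    SELECT s.name, p.amount
    FROM supplier AS s
    LEFT JOIN purchase AS p ON p.supplier_id = s.supplier_id
    ORDER BY s.supplier_id
""").fetchall()

conn.close()
print(inner_rows)
print(left_rows)
```

Note that an inner join only avoids nulls created by the join itself; nulls already stored in the source columns would still appear.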
Why is Supplier ID considered to be a primary key for a Supplier table? - It is used to identify different supplier categories. - It can either be for a vendor or miscellaneous provider. - It is a 10-digit number. - It contains a unique identifier for each supplier.
It contains a unique identifier for each supplier.
Which of the following analyses can predict a future outcome? - Crosstabulation analysis - Linear Regression - Linear Optimization - Standard deviation
Linear Regression
Which approach to data analytics attempts to predict a relationship between two data items? - Link prediction - Classification - Similarity matching - Co-occurrence grouping
Link prediction
Which data approach attempts to predict connections between two data items? - Link prediction - Regression - Profiling - Classification
Link prediction
_____________ data would be considered the least sophisticated type of data. - Ratio - Ordinal - Nominal - Interval
Nominal
Which attribute is required to exist in each table of a relational database and serves as the "unique identifier" for each record in a table? - Primary key - Foreign key - Unique identifier - Key attribute
Primary key
Line charts are not recommended for what type of data? - Qualitative data - Normalized data - Continuous data - Trend lines
Qualitative data
_____________ data would be considered the most sophisticated type of data. - Ratio - Interval - Nominal - Ordinal
Ratio
Which testing approach would be useful in assessing the value of inventory shrinkage given multiple environmental factors? - Regression - Probability - Sentiment analysis - Applied statistics
Regression
According to the text, the data analysis process comprises three equally important stages. Which of the following is NOT one of those stages? - Plan - Review - Analyze - Report
Review
Which of the following is not a typical example of nominal data? - SAT scores - Hair color - Gender - Ethnic group
SAT scores
Which of the following is the best choice as the unique identifier for each sales order? - Sales_Order# - Purchase_Order# - Item# - Shipping#
Sales_Order#
What is the most appropriate chart when showing a relationship between two variables (according to Exhibit 4-12)? - Histogram - Scatter chart - Pie graph - Bar chart
Scatter chart
What type of analysis would help auditors find missing checks? - Sequence check - Decision support systems - Benford's law analysis - Fuzzy matching
Sequence check
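A sequence check can be sketched in a few lines: list the numbers that should exist between the lowest and highest check number but do not, and flag any that repeat. The check register below and the helper names are hypothetical.

```python
from collections import Counter

def missing_in_sequence(numbers):
    """Return numbers absent between the smallest and largest value present."""
    if not numbers:
        return []
    present = set(numbers)
    return [n for n in range(min(present), max(present) + 1) if n not in present]

def duplicates(numbers):
    """Return values that appear more than once, in ascending order."""
    return sorted(n for n, count in Counter(numbers).items() if count > 1)

# Hypothetical check register.
checks = [1001, 1002, 1004, 1005, 1005, 1008]
missing_checks = missing_in_sequence(checks)
duplicate_checks = duplicates(checks)
print(missing_checks, duplicate_checks)
```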
Which data approach attempts to identify similar individuals based on data known about them? - Classification - Similarity matching - Data reduction - Regression
Similarity matching
In the late 1960s, Ed Altman developed a model to predict if a company was at severe risk of going bankrupt. He called his statistic Altman's Z-score, now a widely used score in finance. Based on the name of the statistic, which statistical distribution would you guess this came from? - Poisson distribution - Normal distribution - Uniform distribution - Standardized normal distribution
Standardized normal distribution
What allows tax departments to view multiple years, periods, jurisdictions (state or federal or international, etc.), and differing scenarios of data, typically through use of a dashboard? - Tax data visualizations - Tax planning - Tax compliance data - Tax data warehouses
Tax data visualizations
_____________ is a set of data used to assess the degree and strength of a predicted relationship. - Unstructured data - Structured data - Test data - Training data
Test data
In a regression model prepared to predict revenue, which of the following is the correct interpretation of an adjusted R-squared of 0.85? - The independent variables in the model can explain 85% of the change in revenue - The adjusted R-squared is too small of a number for us to rely on the model - The dependent variable in the model can explain 85% of the change in independent variables - Revenue will increase by 85% next year
The independent variables in the model can explain 85% of the change in revenue
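For reference, adjusted R-squared is commonly computed from the ordinary R-squared, the number of observations n, and the number of independent variables k as 1 - (1 - R²)(n - 1)/(n - k - 1). The sketch below applies that standard formula; the inputs (R² of 0.87, 30 observations, 3 predictors) are made-up numbers, not figures from the text.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjust R-squared for the number of predictors in the model."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical revenue model: R-squared 0.87, 30 observations, 3 predictors.
adj = adjusted_r_squared(0.87, n_obs=30, n_predictors=3)
print(round(adj, 3))
```

The adjustment penalizes extra predictors, so the adjusted value is never higher than the unadjusted one.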
In which stage of the IMPACT model (introduced in Chapter 1) would the use of tax cockpits fit? - Address and refine results - Perform test plan - Master the data - Track outcomes
Track outcomes
If the objective is to use historical data to identify patterns, which is the best analysis to use? - Linear optimization - Linear regression - Trend analysis - Frequency distribution
Trend analysis
Power BI is considered an example of an SSBI tool. True or False
True
The CPA Exam and the CMA Exam both include topics on data analytics. True or False
True
An appropriate analysis to use to determine how many times an event has occurred would be - a measure of location - linear optimization - a measure of dispersion - a frequency distribution
a frequency distribution
When examining the relationship between two variables, if one variable increases as the other variable decreases the relationship is - uncorrelated - a positive correlation - perfectly correlated - a negative correlation
a negative correlation
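The sign of a correlation can be checked directly with the Pearson coefficient. The sketch below computes it by hand so that it runs on any Python 3 version; the price and unit figures are invented so that units fall steadily as price rises.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: as price goes up, units sold go down.
price = [10, 12, 14, 16, 18]
units = [100, 90, 80, 70, 60]
r = pearson(price, units)
print(r)
```

A value near -1 indicates a strong negative relationship, near +1 a strong positive one, and near 0 little linear relationship.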
Which is consistent with the Data Analytics Mindset? - an open mind for learning new technologies - asking why when interpreting results - having skills like critical thinking, data literacy, technological agility, and communication abilities - all of these
all of these
According to the textbook, an example of a tax efficiency and effectiveness KPI would be: - ETR (effective tax rate) over time. - amount of time spent on compliance versus strategic activities. - number of audits closed. - number of resubmitted tax returns due to errors.
amount of time spent on compliance versus strategic activities.
An anomaly is - always eliminated - an indication of fraud - always an outlier - an observation that deviates from what is normal or expected
an observation that deviates from what is normal or expected
A visualization of a chart that compares actual vs expected monthly revenue would probably be found in the _________ area. - auditing - tax - managerial accounting - financial accounting
auditing
An analysis prepared to support a predetermined belief is an example of - affect bias - confirmation bias - expectation bias - selection bias
confirmation bias
The IMPACT cycle specifically includes all except the following steps: - communicate insights. - data preparation. - address and refine results. - perform test plan.
data preparation
Simultaneously filtering for multiple dimensions is called - data cleansing - data slicing - data manipulation - data managing
data slicing
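In code, slicing amounts to applying several dimension filters at once. The rows and the `slice_rows` helper below are hypothetical and only meant to show the idea.

```python
def slice_rows(rows, **filters):
    """Keep rows that match every dimension=value filter supplied."""
    return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]

# Hypothetical sales records with three dimensions.
records = [
    {"region": "East", "year": 2023, "product": "A", "amount": 100},
    {"region": "East", "year": 2024, "product": "A", "amount": 120},
    {"region": "West", "year": 2024, "product": "A", "amount": 90},
    {"region": "East", "year": 2024, "product": "B", "amount": 60},
]

# Filter on region and year simultaneously.
east_2024 = slice_rows(records, region="East", year=2024)
print([r["amount"] for r in east_2024])
```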
Mastering the data can also be described via the ETL process. The ETL process stands for: - extract, total, and load data. - extract, transform, and load data. - enter, total, and load data. - enter, transform, and load data.
extract, transform, and load data.
A __________ is a bar chart of frequency distributions where the height of the bar represents the count of items in the interval - box & whisker chart - scattergram - histogram - plot
histogram
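The counts behind a histogram can be produced by grouping values into equal-width intervals. The sketch below uses assumed invoice amounts and a bin width of 10, and prints a rough text version of the bars.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count values per interval, keyed by each interval's lower bound."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical invoice amounts.
amounts = [5, 12, 18, 22, 27, 29, 41]
hist = histogram_counts(amounts, 10)

for lower, count in hist.items():
    # Bar length represents the count of items in the interval.
    print(f"{lower:>3}-{lower + 9:<3} {'#' * count}")
```

Intervals with no observations (30 to 39 here) are simply absent from the result; a charting tool would draw them as empty bars.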
The Fahrenheit scale of temperature measurement would best be described as an example of: - continuous data. - discrete data. - nominal data. - interval data.
interval data.
In a relational database table, a primary key - is the same as a foreign key - can be repeated in the table if needed - is not always needed - is a unique value
is a unique value
According to the textbook, an example of a tax risk KPI would be: - levels of technology/tax training. - ETR (effective tax rate). - levels of late filing or error penalties. - employee turnover of the tax personnel.
levels of late filing or error penalties.
Exhibit 4-12 gives chart suggestions for what data you'd like to portray. Those options include all of the following except: - relationship between variables. - normal distribution curves. - outlier detection. - geographic data.
normal distribution curves.
According to the textbook, an example of a tax sustainability KPI would be: - levels of technology/tax training. - number of audits closed and significance of assessment over time. - level of job satisfaction of the tax personnel. - frequency of concerns pertaining to the organization's tax position.
number of audits closed and significance of assessment over time.
Letter grades of A, B, and C would be best described as an example of: - interval data. - ratio data. - nominal data. - ordinal data.
ordinal data.
In general, the more complex the model, the greater the chance of: - a more accurate prediction of the data. - underfitting the data. - overfitting the data. - pruning the data.
overfitting the data.
The determinants for sample size include all of the following except: - potential risk of account. - tolerable misstatement. - estimated misstatement. - confidence level.
potential risk of account.
In preparing data, the process of reviewing the data for possible issues is called - priming - descriptive review - positioning - profiling - concatenate
profiling
An action request made to a database is called a(n) - placement - stripe command - query - extraction
query
Most of the data you will work with will come from - relational databases - private companies - self-generated data sets - tech companies
relational databases
Database elements can be represented in the REA model. The model's elements are: - revenues, elements, actions - resources, expenses, agents - resources, events, agents - revenues, expenses, assets
resources, events, agents
These data are organized and reside in a fixed field with a record or a file. Such data are generally contained in a relational database or spreadsheet and are readily searchable by search algorithms. The term matching this definition is: - structured data. - unstructured data. - test data. - training data.
structured data.
Tax departments interested in maintaining their own data are likely to have their own: - tax analytics. - tax reporting system. - tax data mart. - tax dashboard.
tax data mart.
Predictive analysis of potential tax liability and the formulation of a plan to reduce the amount of taxes paid is defined as: - tax planning - tax compliance data - tax data analytics - tax data warehouses
tax planning
The task of tax accountants and tax departments to minimize the amount of taxes paid in the future is called: - tax minimization. - tax sustainability. - tax planning. - tax compliance.
tax planning.
Models associated with regression and classification data approaches all have these important parts except: - identifying which variables (we'll call these independent variables) might help predict an outcome (we'll call this the dependent variable). - the numeric parameters of the model (detailing the relative weights of each of the variables associated with the prediction). - the functional form of the relationship (linear, nonlinear, etc.). - test data.
test data.
Benford's law suggests that the first digit of naturally occurring numerical datasets follow an expected distribution where: - the leading digit of 8 is more common than 9. - the leading digit of 6 is more common than 5. - the leading digit of 9 is more common than 2. - the leading digit of 4 is more common than 3.
the leading digit of 8 is more common than 9.
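The expected distribution follows from the standard Benford formula, P(d) = log10(1 + 1/d) for leading digits 1 through 9. The sketch below computes those expected frequencies and includes a helper for extracting a leading digit; the helper name and the sample amounts are assumptions for illustration.

```python
import math

def benford_expected(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1-9)."""
    return math.log10(1 + 1 / digit)

def leading_digit(value: float) -> int:
    """First nonzero digit of a number; returns 0 for a zero value."""
    digits = str(abs(value)).lstrip("0.")
    return int(digits[0]) if digits else 0

expected = {d: benford_expected(d) for d in range(1, 10)}
for d, share in expected.items():
    print(d, f"{share:.1%}")

# Hypothetical amounts, to show the digit extraction only.
sample_digits = [leading_digit(v) for v in (425.0, 0.0052, 1890.75)]
print(sample_digits)
```

Each digit is expected to be more common than the next larger one, which is why 8 should lead more often than 9.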
The purpose of transforming data is: - to load the data into the appropriate tool for analysis. - to identify which data are necessary to complete the analysis. - to validate the data for completeness and integrity. - to obtain the data from the appropriate source.
to validate the data for completeness and integrity.
In general, the simpler the model, the greater the chance of: - pruning the data. - overfitting the data. - the need to reduce the amount of data considered. - underfitting the data
underfitting the data
Which of the following is best defined as a measure of dispersion - variance - median - mean - average
variance
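To see the distinction, the sketch below computes two measures of location (mean and median) and two measures of dispersion (variance and standard deviation) using Python's standard `statistics` module. The data values are made up.

```python
from statistics import mean, median, pstdev, pvariance, variance

# Hypothetical observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of location: where the data are centered.
center_mean = mean(data)
center_median = median(data)

# Measures of dispersion: how spread out the data are.
pop_variance = pvariance(data)     # divides by n
sample_variance = variance(data)   # divides by n - 1
pop_std_dev = pstdev(data)

print(center_mean, center_median, pop_variance, sample_variance, pop_std_dev)
```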
Anscombe's Quartet suggests that: - visualizations should be used in tandem with statistics - visualizations should be used instead of statistics - statistics should be used instead of visualizations
visualizations should be used in tandem with statistics
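The point can be checked numerically with the first two of Anscombe's four datasets, which share the same x values. The sketch below shows that their means and correlations nearly coincide even though a scatter plot of each looks very different (one roughly linear, one curved). The `pearson` helper is written out by hand for this example.

```python
from math import sqrt
from statistics import mean

# Anscombe's datasets I and II (they share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

mean_y1, mean_y2 = mean(y1), mean(y2)
r1, r2 = pearson(x, y1), pearson(x, y2)

print(round(mean_y1, 2), round(mean_y2, 2))
print(round(r1, 3), round(r2, 3))
```

Because the summary statistics alone cannot tell the datasets apart, plotting them is what reveals the difference.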
The IMPACT cycle includes all except the following steps: - track outcomes. - master the data. - perform test plan. - visualize the data.
visualize the data.
Big Data is often described by the four Vs, or - volume, volatility, veracity, and variability. - volume, velocity, veracity, and variety. - variability, velocity, veracity, and variety. - volume, velocity, veracity, and variability.
volume, velocity, veracity, and variety
A spreadsheet model that allows evaluating how changes to values and assumptions affect an outcome is called a - regression equation - linear optimization model - what-if analysis - best guess model
what-if analysis
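A what-if analysis can be sketched as one outcome formula evaluated under several sets of assumptions. The profit model, the scenario names, and every number below are hypothetical.

```python
def profit(price, unit_cost, units, fixed_cost):
    """Simple outcome model: contribution margin less fixed cost."""
    return units * (price - unit_cost) - fixed_cost

# Hypothetical assumption sets to compare.
scenarios = {
    "base":        {"price": 25.0, "unit_cost": 15.0, "units": 500, "fixed_cost": 4000.0},
    "optimistic":  {"price": 27.0, "unit_cost": 15.0, "units": 550, "fixed_cost": 4000.0},
    "pessimistic": {"price": 23.0, "unit_cost": 15.0, "units": 450, "fixed_cost": 4000.0},
}

# Evaluate the same model under each scenario.
results = {name: profit(**inputs) for name, inputs in scenarios.items()}
for name, value in results.items():
    print(f"{name:<12} {value:>10,.2f}")
```

Changing one assumption and rerunning shows how sensitive the outcome is to that input.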
The evaluation of the impact of different tax scenarios/alternatives on various outcome measures including the amount of taxable income or tax paid is called: - data warehousing - tax compliance - tax visualizations - what-if scenario analysis
what-if scenario analysis