ACCT 481 Exam 1
Foreign key
A primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables.
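A minimal sketch of a foreign key using Python's built-in sqlite3 module. The Customer and Sales tables and their columns are hypothetical examples, not taken from the course materials. The CustomerID primary key of Customer reappears in Sales as a foreign key, so a sale pointing to a nonexistent customer is rejected.

```python
import sqlite3

# In-memory database; SQLite enforces foreign keys only when this pragma is on.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Customer's primary key (CustomerID) appears in Sales as a foreign key.
conn.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Sales ("
    " SalesOrderID INTEGER PRIMARY KEY,"
    " CustomerID INTEGER REFERENCES Customer(CustomerID),"
    " Amount REAL)"
)

conn.execute("INSERT INTO Customer VALUES (1, 'Acme Corp')")
conn.execute("INSERT INTO Sales VALUES (100, 1, 250.0)")  # valid: customer 1 exists

try:
    # Invalid: customer 99 does not exist, so the foreign key rejects the row.
    conn.execute("INSERT INTO Sales VALUES (101, 99, 75.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```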
Decision tree
A tool that is used to divide data into smaller groups
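A decision tree divides data into smaller groups by repeatedly splitting on a feature. The sketch below shows a single split (a "decision stump"), assuming Gini impurity as the splitting criterion; Gini is a common choice but is not named in the source, and the receivables data are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0 means the group holds a single class."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Try each observed value as a threshold; keep the one with the lowest weighted impurity."""
    best = None
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (t, score)
    return best

# Hypothetical receivables: days outstanding and whether the customer paid
days_outstanding = [10, 20, 35, 95, 120, 150]
outcome = ["paid", "paid", "paid", "default", "default", "default"]
print(best_split(days_outstanding, outcome))  # (35, 0.0)
```

A full tree would apply the same search again inside each resulting group until the groups are pure or small enough.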
False
T/F: Data visualizations can only be used with big data to make the results easier to interpret
True
T/F: Every table in a relational database requires a primary key
False
T/F: Qualitative data is more complex than quantitative data
exploratory
Tableau is more useful than Excel if your data analysis project is more
Data Analytics
A firm's practice of monitoring competitors, customers, and suppliers to better understand its opportunities and threats is called:
Identify the purpose and scope of the data
The first step in the extraction process
Declarative visualizations
The kind of visualizations that present findings to an audience
Classification
The purpose of ___________ is to predict which group an observation that we know little about will belong to
Data reduction
The purpose of _____________ is to reduce the amount of detailed information considered in order to focus on the most interesting or abnormal items.
completeness and integrity
The purpose of comparing the number of records and descriptive statistics for numeric fields is to check extracted data for
66%
The trend line in your chart should take up to __% of your chart
Manually classify an existing set of records
After you have identified the classes you wish to predict, what is the next step?
Determine the types of profiling you want to perform
After you have identified the objects or activity you wish to profile, what should you do next?
Classification
An attempt to assign each unit in a population into a few categories would be called the _______ approach.
Regression
An attempt to estimate or predict the numerical value of some variable for each unit using some type of statistical model would be called the _______ approach.
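A minimal sketch of the regression approach, assuming a one-variable linear model fit by ordinary least squares. The advertising and sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend vs. sales, both in $1,000s
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
slope, intercept = fit_line(ad_spend, sales)
predicted = intercept + slope * 6  # estimate sales at $6,000 of advertising
print(slope, intercept, predicted)  # 2.0 1.0 13.0
```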
Similarity matching
An attempt to identify similar individuals based on data known about them
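A minimal sketch of similarity matching, assuming Euclidean distance as the similarity measure (smaller distance means more similar). The customer IDs and features are hypothetical; in practice, features measured on different scales are usually standardized first.

```python
import math

# Hypothetical customer features: (orders per month, average order in $100s)
customers = {
    "C1": (2.0, 1.5),
    "C2": (8.0, 4.0),
    "C3": (3.0, 1.2),
}

def rank_by_similarity(target, candidates):
    """Return candidate IDs ordered from most to least similar (smallest distance first)."""
    return sorted(candidates, key=lambda cid: math.dist(target, candidates[cid]))

new_customer = (2.5, 1.4)
ranking = rank_by_similarity(new_customer, customers)
print(ranking)  # ['C1', 'C3', 'C2']
```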
Link prediction
An attempt to predict a relationship between two data items. Ex. Amazon shopping cart or mutual Facebook friends
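The mutual-friends idea can be sketched with a "common neighbors" score: two unlinked people who share many connections are predicted to become linked. This is one simple link-prediction technique chosen for illustration; the names and friendships are hypothetical.

```python
# Hypothetical friendship network (each friendship listed in both directions)
friends = {
    "Ann": {"Bob", "Cal", "Dee"},
    "Bob": {"Ann", "Cal"},
    "Cal": {"Ann", "Bob", "Eve"},
    "Dee": {"Ann", "Eve"},
    "Eve": {"Cal", "Dee"},
}

def suggest_links(graph):
    """Score each unlinked pair by the number of mutual friends, highest first."""
    scores = {}
    people = sorted(graph)
    for i, a in enumerate(people):
        for b in people[i + 1:]:
            if b in graph[a]:
                continue  # already linked, nothing to predict
            common = len(graph[a] & graph[b])
            if common:
                scores[(a, b)] = common
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

suggestions = suggest_links(friends)
print(suggestions[0])  # (('Ann', 'Eve'), 2)
```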
Benford's Law
An observation about the frequency of leading digits in many real-life sets of numerical data is called
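A minimal sketch comparing observed leading-digit frequencies with Benford's expected proportions, P(d) = log10(1 + 1/d) for d = 1 to 9. The ten amounts are hypothetical; a real test would use far more records and a formal goodness-of-fit measure.

```python
import math
from collections import Counter

def benford_expected():
    # Benford's Law: P(first digit = d) = log10(1 + 1/d)
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    # Strip leading zeros and the decimal point so 0.052 -> 5
    return int(str(abs(x)).lstrip("0.")[0])

def benford_observed(amounts):
    digits = [first_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    n = len(digits)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}

# Hypothetical transaction amounts
amounts = [1200.50, 187.00, 23.10, 1450.00, 310.25, 96.40, 1.75, 0.052, 5400.00, 17.80]
expected = benford_expected()
observed = benford_observed(amounts)
for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```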
3
Any transaction that has a z-score of _____ or more would represent abnormal transactions
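A minimal sketch of the z-score screen, z = (x - mean) / standard deviation, flagging transactions at least 3 standard deviations from the mean in either direction. Using the population standard deviation is an assumption, and the payments are hypothetical.

```python
import statistics

def flag_abnormal(amounts, threshold=3.0):
    """Return (amount, z-score) pairs whose absolute z-score is at least the threshold."""
    mean = statistics.mean(amounts)
    sd = statistics.pstdev(amounts)  # population standard deviation (an assumption)
    return [(a, round((a - mean) / sd, 2)) for a in amounts if abs(a - mean) / sd >= threshold]

# Hypothetical data: nineteen routine $100 payments and one $1,000 payment
payments = [100] * 19 + [1000]
print(flag_abnormal(payments))  # [(1000, 4.36)]
```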
Help companies refine their operations
Audit firms are increasingly considering operational data such as manufacturing logs, CRM data, and SCM data primarily to _________.
Variety
Big data comes from a lot of different sources
Veracity
Big data's records must be accurate and trustworthy enough to be valuable to analyze and contribute in a meaningful way to the overall results
Velocity
Big data is generated very fast
Volume
Big data requires a lot of storage in order to analyze
Ordinal data
If rank matters, what kind of data are you working with?
Overfitting the data
In general, the more complex the model, the greater chance of
Address and refine results
In which step of the IMPACT cycle do data analysts slice and dice the data, find correlations, ask themselves further questions, ask colleagues what they think, and revise/rerun the analysis?
Z-score
Knowing the mean and standard deviation, which statistic can one compute, using a normal distribution, to identify abnormal transactions?
Sample; population
Traditional audit approaches tested a _________ of the transactions; in contrast, audits that fully integrate big data and analytics will test the full __________ of data.
Composite key
Two or more fields that collectively define the primary key by unique combinations of their values.
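A minimal sketch of a composite key in SQLite. The SalesOrderLine table is hypothetical: neither SalesOrderID nor ProductID is unique on its own, but each combination of the two must be, so a duplicate pair is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite primary key: the pair (SalesOrderID, ProductID) identifies each row.
conn.execute(
    "CREATE TABLE SalesOrderLine ("
    " SalesOrderID INTEGER,"
    " ProductID INTEGER,"
    " Quantity INTEGER,"
    " PRIMARY KEY (SalesOrderID, ProductID))"
)
conn.executemany(
    "INSERT INTO SalesOrderLine VALUES (?, ?, ?)",
    [(100, 1, 5), (100, 2, 3), (101, 1, 7)],  # repeated values, unique combinations
)

try:
    conn.execute("INSERT INTO SalesOrderLine VALUES (100, 1, 9)")  # duplicate pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```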
Primary key
Unique identifier for each record in a table
Structured Query Language (SQL)
Used to create, update, and delete records. We mainly use it to extract data
XBRL (eXtensible Business Reporting Language)
Used to facilitate the exchange of financial reporting information between a company and the SEC
Exploratory visualization
Used to gain insights while you are interacting with the data
data request form
Used to make communication easier between data requester and provider
Inventory, A/R, and goodwill
What are three accounts that financial accounting often has challenges with valuation and estimates?
Heat map
What common visualization is useful for showing a proportion of values by using a color scale?
Symbol Map
What common visualization is useful for showing data across geographic regions?
Word Cloud
What common visualization is useful for showing the frequency of words in a document?
Tree map
What common visualization is useful for showing the proportion of values in a physical space?
Extract, Transform, Load
What does ETL stand for?
A bar chart can easily show comparisons
What is a common difference between a bar chart and pie chart?
Red, yellow, and green traffic lights
What is a common use of color when designing accessible dashboards?
Nominal
What is considered the least sophisticated type of data?
Scatter chart
What is the most appropriate chart for showing a relationship between two variables?
Ratio
What is the most complex type of data?
Relational
What type of database are you most likely to come across when extracting and using accounting and financial data?
Interpretation of results and visualization
What type of information would be useful for communicating a data analysis project to a manager?
Complexity of the model and accuracy of the classification
When evaluating classifiers, you need to be careful to strike a balance between what two things?
data dictionary
When obtaining the data yourself, one of the best tools to use to identify the tables that you could use would be a ______________
a. Accounts Receivable
Which of the following accounts can be easier to estimate using data analytics? a. Accounts Receivable b. Accounts Payable c. Cash d. Revenue
UML class diagram
used to support and design a relational database
4 main categories of data analytics
- Descriptive
- Diagnostic
- Predictive
- Prescriptive
ETL Process
- Determine the purpose and scope of the data request
- Obtain the data
- Validate the data for completeness and integrity
- Clean the data
- Load the data for data analytics
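A minimal sketch of the obtain, validate, clean, and load steps in Python. The CSV extract, its column names, and the control totals are hypothetical stand-ins for what a data provider would supply. Validation compares the record count and a numeric total, mirroring the completeness and integrity checks described elsewhere in these cards.

```python
import csv
import io
import sqlite3

# Obtain the data: a hypothetical extract standing in for a file from the source system
raw_csv = (
    "InvoiceID,Customer,Amount\n"
    "1001,Acme Corp,250.00\n"
    "1002, Beta LLC ,1200.50\n"
    "1003,Gamma Inc,75.25\n"
)
# Hypothetical control totals supplied by the data provider with the extract
EXPECTED_COUNT = 3
EXPECTED_TOTAL = 1525.75

rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Validate for completeness and integrity: record count and a numeric control total
if len(rows) != EXPECTED_COUNT:
    raise ValueError(f"expected {EXPECTED_COUNT} records, got {len(rows)}")
total = round(sum(float(r["Amount"]) for r in rows), 2)
if total != EXPECTED_TOTAL:
    raise ValueError(f"control total {total} does not match {EXPECTED_TOTAL}")

# Clean the data: trim stray whitespace and convert text to numbers
clean_rows = [(int(r["InvoiceID"]), r["Customer"].strip(), float(r["Amount"])) for r in rows]

# Load the data into a database for analysis
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice (InvoiceID INTEGER PRIMARY KEY, Customer TEXT, Amount REAL)")
conn.executemany("INSERT INTO Invoice VALUES (?, ?, ?)", clean_rows)
conn.commit()
```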
Four benefits of relational databases
- No unstructured data
- Business rules are enforced
- Completeness of data
- Communication/integration of business processes
Clean data
- Remove headings or subtotals
- Clean leading zeroes and nonprintable characters
- Format negative numbers
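A minimal sketch of these three cleaning tasks in Python. The helper names are hypothetical, and so are the conventions assumed for the raw export: negatives shown as (1,250.00) or 1,250.00-, and subtotal rows marked by the word "total" in the first column. Stripping leading zeroes is only appropriate for fields that are truly numeric.

```python
def strip_nonprintable(text):
    """Drop tabs, null bytes, and other nonprintable characters, then trim spaces."""
    return "".join(ch for ch in text if ch.isprintable()).strip()

def clean_amount(text):
    """Convert a raw amount such as '(1,250.00)' or '1,250.00-' to a signed float."""
    s = strip_nonprintable(text)
    negative = False
    if s.startswith("(") and s.endswith(")"):
        negative, s = True, s[1:-1]
    elif s.endswith("-"):
        negative, s = True, s[:-1]
    value = float(s.replace(",", "").replace("$", ""))
    return -value if negative else value

def clean_id(text):
    """Remove nonprintable characters and leading zeroes from a numeric ID field."""
    return strip_nonprintable(text).lstrip("0") or "0"

def drop_subtotals(rows):
    """Drop subtotal rows, assumed here to contain 'total' in the first column."""
    return [r for r in rows if "total" not in r[0].lower()]

print(clean_amount("(1,250.00)"), clean_id("000123\t"))  # -1250.0 123
```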
Six steps of classification
1. Identify the classes you wish to predict
2. Manually classify an existing set of records
3. Select a set of classification models
4. Divide your data into training and testing sets
5. Generate your model
6. Interpret the results and select the "best" model
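A minimal sketch that walks through the six classification steps with a nearest-centroid classifier. That model is a swapped-in choice for brevity, not one prescribed by the source, and the receivable records and class labels are hypothetical. The features are left unscaled, so days outstanding dominates the distance.

```python
import math

# Steps 1-2: hypothetical records already classified by hand as "collect" or "write-off".
# Features: (invoice amount in $1,000s, days outstanding)
labeled = [
    ((1.2, 10), "collect"), ((0.8, 15), "collect"), ((2.0, 20), "collect"),
    ((1.5, 25), "collect"), ((0.5, 30), "collect"), ((1.1, 12), "collect"),
    ((0.9, 95), "write-off"), ((1.3, 120), "write-off"), ((0.4, 150), "write-off"),
    ((2.2, 110), "write-off"), ((0.7, 100), "write-off"), ((1.0, 130), "write-off"),
]

# Step 3: the model chosen here is nearest centroid (one class "average" per label).
def train_centroids(records):
    """Step 5: generate the model as the mean feature vector of each class."""
    by_class = {}
    for features, label in records:
        by_class.setdefault(label, []).append(features)
    return {label: tuple(sum(col) / len(col) for col in zip(*rows))
            for label, rows in by_class.items()}

def predict(centroids, features):
    """Assign the class whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

# Step 4: deterministic split, every third record held out as test data
train = [r for i, r in enumerate(labeled) if i % 3 != 0]
test = [r for i, r in enumerate(labeled) if i % 3 == 0]

model = train_centroids(train)
# Step 6: interpret results, here as accuracy on the held-out test data
accuracy = sum(predict(model, f) == y for f, y in test) / len(test)
print(accuracy)  # 1.0
```

For a nearest-centroid model, the decision boundary is the set of points equally distant from the two class centroids.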
five steps of profiling
1. Identify the objects or activity you want to profile
2. Determine the types of profiling you want to perform
3. Set boundaries or thresholds for the activity
4. Interpret results and monitor activity and/or generate a list of exceptions
5. Follow up on exceptions
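A minimal sketch of the profiling steps applied to travel and entertainment (T&E) claims. The employee IDs, amounts, and the boundary rule (flag any employee whose average claim exceeds twice the median of all averages) are hypothetical choices for illustration.

```python
import statistics

# Step 1: the activity to profile, hypothetical T&E claims per employee
expenses = {
    "E01": [120, 95, 130, 110],
    "E02": [105, 98, 115, 122],
    "E03": [480, 520, 610, 455],
    "E04": [100, 90, 125, 118],
}

# Step 2: type of profiling, each employee's average claim
profile = {emp: statistics.mean(claims) for emp, claims in expenses.items()}

# Step 3: boundary, a hypothetical rule of 2x the median employee average
threshold = 2 * statistics.median(profile.values())

# Step 4: list of exceptions; step 5 (follow-up) is a human review of this list
exceptions = sorted(emp for emp, avg in profile.items() if avg > threshold)
print(threshold, exceptions)  # 223.75 ['E03']
```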
Technical jargon
Consider the knowledge and skill of your audience by not overwhelming a nontechnical crowd with
b. stay engaged with clients beyond the audit
Data analytics allow auditors to: a. perform an audit in a less expensive manner b. stay engaged with clients beyond the audit c. be able to perform an audit much quicker
Investment in R&D
Data analytics can be applied to taxes by helping predict the tax consequences of a potential international transaction, a proposed merger or acquisition, or ____________.
Social media
Data analytics may use what source to assess the probability of a goodwill write-down, warranty claims, or the collectability of bad debts?
Continuous data
Data that are represented by values within a range and include decimals, such as measurements in inches.
quantitative data
Data that has a meaningful difference between data points is considered
Communicate insights
Data visualization would be part of which step of the IMPACT cycle?
Decision support systems
Designed to be interactive and adapt to the information collected by the user
Master the data
ETL would be an example of which step in the IMPACT cycle?
declarative
Excel is more useful than Tableau if your data analysis project is more
Qualitative data
Nominal and ordinal data are examples of
Communicate insights
Once the data has been analyzed and the results have been refined, what is the next step in the IMPACT cycle?
Ratio
Qualitative data are most easily expressed as _______ data
Master the data
Scrubbing the data would be an example of which step in the IMPACT cycle?
True. Clustering is an unsupervised method.
T/F: Clustering is an unsupervised method that is used to find natural groupings within the data
True
T/F: Data analytics expand auditors' capabilities in services like testing for fraudulent transactions
d. Identify and obtain the data needed
Which of the following describes part of the goal of the ETL process: a. Identify which approach to data analytics should be used b. Load the data into a relational database for storage c. Communicate the results and insights found through the analysis d. Identify and obtain the data needed
b. Foreign key
Which of the following establishes a relationship between two tables? a. Primary key b. Foreign key c. Composite key d. Descriptive attribute
a. A guide for formatting the way in which data is provided to auditors
Which of the following is an accurate description of the Audit Data Standards? a. A guide for formatting the way in which data is provided to auditors b. The required way for public companies to store data c. A method for extracting data d. All of the above are true
turnover ratio
Which of the following is an example of continuous data? - inventory count - approval value - turnover ratio - hire date
c. All of the data is stored in the same table
Which of the following is not a benefit of storing data in a relational database? a. Business rules are enforced b. Eliminates redundancies c. All of the data is stored in the same table d. Integration of business processes e. Completeness of data
Manufacturing subledger
Which of the following is not an existing Audit Data Standard? - Order-to-cash subledger - General ledger - Procure-to-pay subledger - Inventory subledger - Manufacturing subledger
a. Variability
Which of the following words is not one of the Big V's of data? a. Variability b. Variety c. Velocity d. Volume
Line graph
Which of these charts does NOT do a good job of visually representing qualitative data? - pie charts - bar charts - stacked bar charts - line graph
c. Learn what data is available in the data warehouse
Which of these is not included in the 5 steps of the ETL process? a. Determine the purpose and scope of the data request b. Obtain the data c. Learn what data is available in the data warehouse d. Validate the data for completeness and integrity
Clustering
Which testing approach would be considered an attempt to divide individuals (like customers) into groups in a useful or meaningful way?
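A minimal sketch of clustering using k-means, one common clustering algorithm (not named in the source). No labels are supplied, which is what makes the method unsupervised. The customer points are hypothetical, and initializing with the first k points is a simplifying assumption.

```python
import math

def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning each to its nearest centroid."""
    centroids = points[:k]  # simple deterministic initialization: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (orders per year, average order in $100s)
points = [(2, 1), (3, 1.5), (2.5, 1.2), (20, 8), (22, 9), (21, 8.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids[1])  # (21.0, 8.5)
```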
Co-Occurrence grouping
Which testing approach would be considered to be an attempt to discover associations between individuals based on transactions involving them?
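A minimal sketch of co-occurrence grouping: count how often each pair of items appears in the same transaction and report its support (the share of transactions containing the pair). The receipts are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: the set of items on each sales receipt
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"bread", "milk", "chips"},
]

pair_counts = Counter()
for basket in baskets:
    # sorted() gives each pair a consistent order, so ("bread", "milk") is one key
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = share of transactions in which the pair appears together
for pair, count in pair_counts.most_common(2):
    print(pair, count, count / len(baskets))
```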
Join
Which type of clause should be used in your SQL query when you need to retrieve data that is stored in more than one table?
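A minimal sketch of a JOIN run through Python's sqlite3 module. The Customer and Sales tables are hypothetical; the query matches each sale to its customer through the shared CustomerID key and totals sales per customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Sales (SalesOrderID INTEGER PRIMARY KEY,
                    CustomerID INTEGER REFERENCES Customer(CustomerID),
                    Amount REAL);
INSERT INTO Customer VALUES (1, 'Acme Corp'), (2, 'Beta LLC');
INSERT INTO Sales VALUES (100, 1, 250.0), (101, 2, 1200.5), (102, 1, 75.0);
""")

# JOIN pulls the customer name from one table and the amounts from another
rows = conn.execute("""
    SELECT c.Name, SUM(s.Amount) AS TotalSales
    FROM Sales AS s
    JOIN Customer AS c ON s.CustomerID = c.CustomerID
    GROUP BY c.Name
    ORDER BY c.Name
""").fetchall()
print(rows)  # [('Acme Corp', 325.0), ('Beta LLC', 1200.5)]
```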
b. data description
Which would not be considered as one of the seven skills that analytic-minded accountants should have? a. Data visualization and data reporting b. Data description c. Data scrubbing and data preparation d. Defining and addressing problems through statistical data analysis
interval
With _______ data, the number 0 is just another value on a scale and has no special meaning
Machine learning and artificial intelligence
____________ include both unsupervised exploratory analysis and supervised model generation to provide insight and predictive foresight into the business and decisions made by accountants and auditors.
Profiling might be used to identify
a lack of controls, changes in procedures, or individuals more willing to spend excessively in potential types of T&E expenses which might be associated with higher risk.
Class
a manually-assigned category applied to a record based on an event
Fuzzy match
a specific type of data profiling that is used to look for correspondences between portions, or segments, of text for potential matches is called
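A minimal sketch of a fuzzy match using Python's standard difflib module as a stand-in for the fuzzy-matching features of audit tools. The vendor names and the 0.6 review threshold are hypothetical; the ratio is 2 x matching characters / total characters, compared case-insensitively.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """SequenceMatcher ratio (0 to 1) of two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_matches(names, threshold=0.6):
    """Return name pairs whose similarity meets a (hypothetical) review threshold."""
    matches = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = similarity(a, b)
            if score >= threshold:
                matches.append((a, b, round(score, 2)))
    return matches

vendors = ["Acme Corporation", "ACME Corp.", "Beta Supplies LLC", "Gamma Industries"]
print(fuzzy_matches(vendors))  # [('Acme Corporation', 'ACME Corp.', 0.69)]
```

Flagged pairs such as these could point to duplicate vendor records that deserve follow-up.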
Decision boundaries
a technique used to mark the split between one class and another
Profiling
an attempt to characterize "typical" behavior of a population by generating summary statistics about the data
descriptive attribute
an attribute that is used to describe or record information about the relationship for the business
Profiling
an unsupervised method that is used to discover patterns of behavior based on the distance of z-scores from the mean
Training data
existing data that have been manually evaluated and assigned a class
Test data
existing data used to evaluate the model