Data Analytics for Accounting (Midterm)
Internal Control Benefits of Storing Data in Relational Databases
Many preventive internal controls are easier to enforce. (Such as ensuring suppliers that we are paying are verified in our system). Security around data entry and table access can aid in creating and enforcing data entry internal controls. Reduced redundancy cuts down on errors. Version control reduces the possibility of having more than one version of the truth.
Databases
Most secure method of storing data.
Tools Used in Prescriptive Analytics:
Sensitivity Analytics Capital Budgeting Marginal Analytics Goal Seek Analytics What-if Scenario Analysis
Testing/Tools Used
Sequence checks and sequence analytics Duplicate Transactions Variance analytics- (typically performed in management accounting), used to identify differences from expectations. Benford's Law- used to identify fraud or irregular transactions. Drill-downs and roll-ups used to get detail when needed. PivotTables (crosstabulations) Correlation/Regression Hypothesis testing
Corporate SEC Filings
10-K Required Annual Filing for Publicly Traded Companies 10-Q Required Quarterly Filing for Publicly Traded Companies 8-K Report of Material Events or Corporate Changes at Publicly Traded Company
Simplified Product Tables
A simplified table of data pulled from other sources.
Data analytics moved us into the higher order thinking skills.
Accountants simply cannot analyze, evaluate, and create if they do not already have the basic accounting knowledge and understanding required by the lower level skills.
Predictive Analytics: (addresses the questions of "Will it happen in the future?", "What is the probability something will happen?" or "Is it forecastable?")
Analytics performed to provide foresight by identifying patterns in historical data and assessing likelihood or probability.
Statistical and Summarization Tools for Descriptive Analytics
Counts: Show how frequently an attempt occurs. Totals, Sums, Averages, Subtotals: Summarize measures of performance. Minimums, Maximums, Medians, Standard Deviations: Summarize measures showing extreme values to help explain what happened. Graphs and Histograms Percentage change from one period to the next using vertical analytics, horizontal analytics, or common-size financial statements. Ratio analytics like return on assets, return on sales (profit margin), asset turnover ratios, debt-to-equity ratios: Calculate important financial ratios for comparison.
Dynamic Visualizations
Dashboards
Relational Database Data Dictionary
Data Dictionary for the Complete Relational Database - much larger than a data dictionary for a table
Does the data exhibit data integrity (accurate, valid and consistent)?
Does the data have errors? Is data missing? Is the data biased?
Tuple is a row (record) of a relation.
Each tuple is distinct; there are no duplicate tuples. Order of tuples has no significance, theoretically.
Nonaccounting Data
Economic Data Current and Historical Stock Prices Social Media Analyst Research Reports and Earnings Forecasts
Examples of Accounting Data Analytics
Evaluate the characteristics of the journal entries (e.g., who recorded them, at what amount, who authorized them, etc.) to find errors or fraud in the journal entries. Consider how the product reviews on Walmart's website can help predict which inventory items will be sold and which ones won't be sold to help estimate the value of the inventory. Predict the right level of the Allowance for Doubtful Accounts by classifying which of our customers will be able to pay their debts based on the customer's payment history Audit and verify account balances. Auditors look for evidence supporting the amounts and disclosures in the financial statements. Predict when goodwill is or soon will be impaired using social media, the business press, or recent accounting performance. Estimate the fixed and variable costs to help predict the level of sales when a business will breakeven. Predict the level of future sales returns (items returned, refunds, etc.) using prior sales returns to record net sales.
foreign key (FK)
Exist to create relationships or links between two tables
Accounting Data Sources
Financial Statement Data Corporate SEC Filings Conference Call Transcripts - Website Links XBRL (eXtensible Business Reporting Language) Press Releases Managerial Accounting Data Tax Data Nonaccounting Data
Use SQL to extract precise columns (variables) and rows needed to answer the data analysis problem.
For example, to show date, type, and amount: SELECT Date, Transaction Type, Amount FROM Transaction To show only returns: SELECT * FROM Transaction WHERE Transaction_Type = 'Return'
Economic Data
Gross Domestic Product - as a measure of economy-wide performance Unemployment numbers - as a measure of labor availability Consumer Price Index - as a measure of inflation Housing Market Starts and Price Levels - generally regarded as a key measure of economic status.
Referential Integrity
If foreign key exists in a relation, either foreign key value must match a candidate key value of some tuple in its home relation or foreign key value must be wholly null.
Entity Integrity
In a base relation, no attribute of a primary key can be null.
Tax Data
In a company, tax data comes from the financial reporting system. Tax information applicable to the tax function is kept in a data warehouse or data mart. The IRS maintains all sorts of information on taxpaying companies and individuals to determine if the taxpayer is underpaying.
Conference Call Transcripts - Website Links
Typically CEO and CFO hold a chat to discuss recent quarterly results (or other matters)
primary key (PK)
Unique identifier in each table, Transaction_ID in Transaction Table; CustomerID in Customer Table
Variety
Unstructured Semi-structured Structured
Veracity
Untrusted Uncleansed
Queries
Used to access the data needed for analysis: Using VLOOKUP in Excel to retrieve data from different databases. Understand how SQL might be used to access specific data from large datasets.
Visualizations
Used to communicate data to decision makers using Tableau, Power BI or other tools for communication
Data Integrity Benefits of Storing Data in Relational Database
Useful Accounting Information: -Relevant. -Faithful Representation of Underlying Transaction/Event. -Free from error (contains no mistakes or inaccuracies). -Complete (includes all transactions; isn't missing any). -Neutral (information is not biased).
Spreadsheets
Using Excel or other program as tools for analytics that accountants will perform: Basic formula and functions References - how to connect other data in the spreadsheet Macros - how to automate processes that are repetitive PivotTables - often a very functional tool for many types of summarizations and also serves as a way to help organize the data
The Four V's
Volume, Velocity, Variety, Veracity
Share the Story
What is the best way to communicate what we've found in our data analysis? -Static Visualizations -Dynamic Visualizations
Prescriptive Analytics
What should we do, based on what we expect will happen? How do we optimize our performance based on potential constraints?
Type of Data: Categorical vs. Numerical
What type of analysis does the data allow us to do?
How to deal with automation:
While computers increasingly collect the data, they do not have the accounting expertise that accountants provide. Data Analytics is one way for accountants to develop and exhibit this needed expertise. Data is not the answer. It may help answer question, but that data is a tool of the analyst.
Cost of Acquiring vs. Benefit of Using the Data
Who owns the data? Is the data hard to access?
Diagnostic Analytics
Why did it happen? What are the root causes of past results?
Predictive Analytics
Will it happen in the future? What is the probability something will happen? Is it forecastable?
Ask the Question
Your Data Won't Speak Unless You Ask It The Right Data Analytics Questions" -Diagnostic Analytics -Predictive Analytics -Prescriptive Analytics
Data Dictionary
a centralized repository of information about data containing a separate record for each field/variable in the table/database. It is data about the data.
Relational Database is
a collection of normalized relations with distinct relation names.
Goal Seek Analytics
a form of what-if analytics that tells us what will need to be done (or assumed) in order to reach a desired outcome, output or result.
PivotTables are
a tool that allows reorganization and summarization of certain data using crosstabulations without changing the underlying spreadsheet (or data).
The abundance of data can be helpful in addressing company questions, problem, and challenges to the extent that
accountants can harness and analyze the available data.
Interval Data: (Counting and Grouping, Proportion, Summing, and Averaging)
an equal interval between each observation, so that not only does summing the data make sense, so does multiplication and other more complex numerical calculations. SAT Scores and Fahrenheit Temperature Scale.
Diagnostic Analytics: (addresses the questions of "Why did it happen?")
analytics performed to investigate the underlying cause that cannot be answered by simply looking at the descriptive data
Descriptive Analytics (addresses the questions of "What happened?" or "What is happening?")
analytics performed which characterizes, summarizes, and organizes features and properties of the data to facilitate understanding.
Prescriptive Analytics: (addresses the questions of "What should we do based on what we expect will happen?" and "How do we optimize our performance based on potential constraints?")
analytics performed which identifies best possible options given constraints or changing conditions.
2.5 Quintillion bytes of data are created every day,
and that number continues to grow with no sign of slowing down.
Tables
data organized into sets of columns (fields) and rows (records).
Capital Budgeting
evaluating future cash flows using various analytics techniques including net present value and internal rate of return.
Sensitivity Analytics
evaluation of outcomes based on uncertainty regarding the inputs.
We consider accounting and non-accounting data sources available for data analysis including:
financial statements, macroeconomic statistics, supply chain data, and financial analyst reports. Finding what data is available to answer our question is part of Mastering the Data.
Determine Relations/Patters/ Linkages between Variables
find the extent to which there are patterns in the data, or data moves together.
ER (Entity-Relationship) Diagram
graphical representation of an information system, illustrating relationships among people, objects, places and events within that system.
The increasing amount of data may cause
information overload and hinder the work of the accountant.
Accounting Data Analytics
is encapsulated as the technologies, systems, practices, methodologies, databases, statistics, and applications used to analyze diverse accounting and nonaccounting data to give organizations the information they need to make sound and timely business decisions.
Tableau requires access to another data source
it does not allow input of data!
Perform Drill-Down Analytics
look for patterns in the underlying data set by summarizing data at different levels and uncovering additional details to understand why something happened.
Numerical Data:
meaningful numbers, such as transaction amount, net income, age, or the score on an exam.
Ratio Data: (Counting and Grouping, Proportion, Summing, Averaging, and Ratio Calculations)
numerical data with an equal and definitive ratio between each data point and absolute "zero" (meaning absence of) in ratio data is the point of origin. Height, Weight Most accounting figures, sales, net income, depreciation expense Kelvin Temperature Scale
Domain is the
set of allowable values for one or more attributes.
Accountants have access to a rich set of both structured and unstructured data,
some of which they generate themselves through their financial reporting systems, other internal systems (e.g., cost accounting, supply chain, and customer relationship management systems), and some of which they get from external sources.
Marginal (or incremental) analytics
technique used to determine the change in profit associated typically with the cost or benefit of the next (or the marginal) unit.
Categorical Data
tend to be represented by words - such as categorizing a group of people by gender (male, female, nonbinary), or categorizing transaction types (sales versus returns).
The data that now floods the Internet every second is equivalent to
the data stored on the entire Internet 20 years ago.
Fields
these are the columns that contain descriptive information about the observations in the table (including primary and foreign keys).
Records
these are the rows in a table; each row, or record, corresponds to a unique instance of what is being described in the table.
As accountants become more familiar with the data available to them
they are better able to opine on the data that might be available to address decision-maker needs.
The AMPS Model:
1. Ask the Question 2. Master the Data 3. Perform the Analysis 4. Share the Story
The Data Analytics Process
1. Data Analytics 2.Intro to Accounting Data 3.Types of Data 4.Preparing Data for Analysis 5.Types and Tools of Data Analytics 6.Descriptive Analytics 7.Diagnostic Analytics 8.Predictive Analytics 9.Prescriptive Analytics 10.Share the Story 11.Putting it All Together
Bloom's Taxonomy
1. Remember 2. Understand 3. Apply 4. Analyze 5. Evaluate 6. Create
What Tools do Accountants Need?
1. Spreadsheets 2. Queries 3. Scripting 4. Visualizations
Perform the Analysis
A histogram or a scatterplot might be used to help evaluate journal entries that are excessively big or excessively small (or negative) with the testing of internal controls. Regression analysis might be used to evaluate cost behavior by segregating total costs into fixed and variable cost components. An internal rate of return or net present value analysis might be used to evaluate capital investments made by the firm, or equity investments made by a potential investor. Benford's Law analytics might be used to find and evaluate unusual and potentially fraudulent transactions. A What-If/Goal Seek analysis might be used to perform an analysis of how changing costs and other factors affect the breakeven level for a new product.
Regression
A predictive analytics technique used to predict a specific dependent variable outcome value based on independent variable inputs.
Forecasting Using Series Analytics
A predictive analytics technique used to predict future values based on past values of the same variable.
Classification
A predictive analytics technique used to separate or classify a sample (or population) into two or more groups or classes.
Foreign Key
Attribute, or set of attributes, within one relation that matches candidate key of some (possibly same) relation.
Financial Statement Data
Balance Sheet Income Statement Statement of Cash Flows Statement of Stockholders' Equity General Journal Special Journal Entries General Ledger Fixed Asset Subledger (or Subsidiary Fixed Asset Ledger) Receivable Subledger (or Subsidiary Accounts Receivable Ledger) Inventory Subledger (or Subsidiary Inventory Ledger) to estimate Lower of Cost or Market and/or Inventory Obsolescence.
Excel
Best at data analysis and exploring data. Tool for creating PivotTables, graphs, and performing statistical analysis.
Tableau
Best for data visualization; does not allow original data entry.
Managerial Accounting Data
Budget Data Standard Cost Data - allows variance analysis, comparing budget to actual. Point-of-Sale Transaction Data - all information from sale. Data on Potential Cost Drivers for Allocating Overhead - allows evaluation of best way to allocate overhead. Supply Chain Data - system which tracks products from vendor to final customer. Customer Relationship Management Data - tracks everything about the customer (order, payment, preferences, etc.). Human Resources Data - system which tracks all information regarding employees (pay, hire date, benefits, applicants, retirees, etc.).
Master the Data
Can the data answer/address the question? Does the data exhibit data integrity (accurate, valid and consistent)? Cost of Acquiring vs. Benefit of Using the Data Type of Data: Categorical vs. Numerical
Primary Key
Candidate key selected to identify tuples uniquely within relation.
Alternate Key
Candidate keys that are not selected to be primary key.
Ordinal Data (Counting and grouping, Proportion & Ranking)
Categorical data that allows/implies ranking and sorting: Gold, Silver and Bronze Survey Answers: Agree, Indifferent, Disagree Transaction Dates
Nominal Data (Counting, Grouping & Proportion)
Categorical data that cannot be ranked: Gender - Male or Female Transaction Type - Sale or Return Depreciation Method - Straight-Line Method, Declining Balance Method
Broad Categories of Predictive Analytics:
Classification Regression Forecasting Using Series Analytics
Volume
Click stream Active/passive sensor Log Event Printed corpus Speech Social media Traditional
Identify Anomalies/Outliers
Often a first step in diagnostic analytics is to look for and identify unusual, unexpected results or transactions.
A relation (entity) is a table (file) with columns and rows.
Only applies to logical structure of the database, not the physical structure.
Attribute is a named column (field) of a relation.
Order of attributes has no significance.
Finding Previously Unknown Linkages, Patterns, or Relationships Between and Among Variables
Perform Drill-Down Analytics Determine Relations/Patterns/ Linkages between Variables
Scripting
Programs written to perform analytics to do repetitive tasks
The basic academia accounting curriculum most directly addresses to the three lowest levels of Bloom's taxonomy.
Remember, Understand, and Apply.
Static Visualizations
Reports Graphs Tables
Null
Represents value for an attribute that is currently unknown or not applicable for tuple. Deals with incomplete or exceptional data. Represents the absence of a value and is not the same as zero or spaces, which are values.
Velocity
Speed of generation Rate of analysis
XBRL (eXtensible Business Reporting Language)
Standards-based way of exchanging business information between business systems
Databases often define data types more specifically than the four categories mentioned in the previous sections. Some examples follow:
String text, Short text, or alphanumeric. Date Number Geographic Dimension - any attribute that is characterized as categorical data. Measure - any attribute that is characterized as numerical.