AC 389 Chapter 5
The data analytics skill sets that employers value most in accounting professionals are:
-An "analytic mindset," including the ability to apply critical thinking skills to data (asking a question, transforming data to answer the question, and communicating the answer to leadership)
-Understanding of basic data elements and structures
-Knowledge of data visualization and analytics software
-Ability to use traditional tools such as Excel and Access to help migrate processes to newer analytics tools
Data composition characteristics of a data set include:
-Categorical values, which are descriptive components
-Quantitative values, which are numeric data points that can be summed, counted, or otherwise analyzed using mathematical operations
-Anomalies, which are observations or events outside a data set's normal behavior (revealed by anomaly detection)
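A minimal sketch of anomaly detection on quantitative values, assuming a simple z-score rule (distance from the mean measured in standard deviations). The sales figures, the function name `find_anomalies`, and the threshold of 2.0 are illustrative assumptions, not part of the chapter.

```python
from statistics import mean, stdev

# Hypothetical daily sales totals (quantitative values); one day looks unusual.
daily_sales = [1020, 980, 1010, 995, 1005, 990, 4800, 1000]


def find_anomalies(values, threshold=2.0):
    """Return values whose distance from the mean exceeds `threshold` sample standard deviations."""
    avg = mean(values)
    spread = stdev(values)
    return [v for v in values if abs(v - avg) / spread > threshold]


anomalies = find_anomalies(daily_sales)
print(anomalies)
```

A z-score rule is only one of many ways to define "normal behavior"; it works best when the typical values are roughly symmetric around the mean.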
Good principles of visualization design include:
-Choosing the right type of visualization
-Simplifying the presentation of data
-Emphasizing what is important
-Representing the data ethically
Once we understand the data, we can select the appropriate data visualization techniques to answer these questions:
-Composition of data: What variables are included in the data set?
-Comparison of data points: How is one variable performing compared to other variables in the data set?
-Distribution of data: How often does a variable occur in the data set?
-Relationships between data points: How do the different variables in the data set relate to one another?
-Geospatial location of data points: Where are variables geographically located in the data set?
Monte Carlo Simulation
-Runs a simulation many times with randomly drawn inputs to measure how sensitive the outcome is to uncertainty in those random variables
-ex. Can help project cash flow, which is affected by the uncertainty of the markets
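A minimal sketch of a Monte Carlo cash flow projection. All figures are assumptions chosen for illustration: monthly sales drawn from a normal distribution, a variable cost share drawn uniformly, and a fixed cost of 25,000. The function name `simulate_cash_flow` is hypothetical.

```python
import random


def simulate_cash_flow(trials=10_000, seed=42):
    """Simulate net monthly cash flow many times with random inputs."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    outcomes = []
    for _ in range(trials):
        sales = rng.gauss(100_000, 15_000)      # uncertain sales (assumed distribution)
        cost_ratio = rng.uniform(0.55, 0.70)    # uncertain variable cost share (assumed)
        fixed_costs = 25_000                    # assumed constant
        outcomes.append(sales * (1 - cost_ratio) - fixed_costs)
    return outcomes


outcomes = simulate_cash_flow()
average = sum(outcomes) / len(outcomes)
shortfall_probability = sum(1 for o in outcomes if o < 0) / len(outcomes)
print(f"average cash flow: {average:,.0f}")
print(f"chance of negative cash flow: {shortfall_probability:.1%}")
```

The useful output is the spread of outcomes rather than a single number: the share of trials below zero gives a rough estimate of how often cash flow falls short under these assumptions.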
Data summarization
-Simplifies data to quickly identify trends by compressing the data into smaller, easier-to-understand outputs such as charts or tables
-ex. An Excel pivot table can be used to summarize sales order data based on location, type of product, and date of sale
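The pivot table idea can be sketched in a few lines of Python. This is an illustration only: the order records and the helper `summarize` are made up, and a spreadsheet or analytics tool would normally do this work.

```python
from collections import defaultdict

# Hypothetical sales order records.
orders = [
    {"location": "Austin", "product": "Cookies", "amount": 120.0},
    {"location": "Austin", "product": "Brownies", "amount": 80.0},
    {"location": "Dallas", "product": "Cookies", "amount": 200.0},
    {"location": "Austin", "product": "Cookies", "amount": 60.0},
]


def summarize(rows, row_key, col_key, value_key):
    """Sum `value_key` for every (row_key, col_key) combination, like a pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    return {r: dict(cols) for r, cols in table.items()}


summary = summarize(orders, "location", "product", "amount")
for location, totals in summary.items():
    print(location, totals)
```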
Process mining
-Uses event log data to show what individuals, systems, and machines are doing in a visual format
-ex. Event log data shows that a purchasing agent both creates purchase orders and approves them, which puts the business at high risk for purchasing fraud
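A minimal sketch of the purchasing example, assuming an event log of (purchase order, activity, user) records. It only checks one rule against the log; real process mining tools go further and reconstruct and visualize the whole process flow. The log entries and the name `find_self_approved` are hypothetical.

```python
from collections import defaultdict

# Hypothetical event log: (purchase order, activity, user).
event_log = [
    ("PO-1001", "create", "agent_a"),
    ("PO-1001", "approve", "manager_b"),
    ("PO-1002", "create", "agent_a"),
    ("PO-1002", "approve", "agent_a"),
]


def find_self_approved(log):
    """Return purchase orders created and approved by the same user."""
    creators = defaultdict(set)
    approvers = defaultdict(set)
    for po, activity, user in log:
        if activity == "create":
            creators[po].add(user)
        elif activity == "approve":
            approvers[po].add(user)
    # A shared user between the two sets signals a segregation-of-duties problem.
    return sorted(po for po in creators if creators[po] & approvers[po])


flagged = find_self_approved(event_log)
print(flagged)
```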
Data
-Consists of facts and statistics about a person or object that are collected for reference or analysis
-Can include numbers, words, measurements, observations, and descriptions
Forecasting
-Process of estimating future events based on a combination of past and present data
-A predictive analytics method that uses statistics and is a staple of accounting data analytics
-ex. Common in sales forecasting, where historical sales data is used to create a forecast of future sales
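A minimal sketch of a sales forecast, assuming a simple moving average (the average of the most recent periods) as the statistical method. The chapter does not prescribe a specific method; the sales history and the name `moving_average_forecast` are illustrative.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("history must contain at least `window` observations")
    recent = history[-window:]
    return sum(recent) / window


# Hypothetical monthly sales history.
monthly_sales = [100, 110, 120, 130, 125, 135]
next_month = moving_average_forecast(monthly_sales)
print(next_month)
```

A longer window smooths out noise but reacts more slowly to a real change in the trend.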
Noise
Additional movement that cannot be explained as a trend or seasonality
Natural language processing (NLP)
An advanced type of textual analysis that uses artificial intelligence to read, understand, and derive meaning from human language
-ex. A chatbot that helps with virtual communication with customers
-Sentiment analysis (or opinion mining) is textual analysis that uses NLP to interpret and classify the emotions that lie behind text and speech
-ex. Deloitte consultants analyze social media to improve the branding of products by identifying key motivators for customers
Exploratory analytics techniques include:
-Data summarization
-Clustering
-Classification analysis
Data analytics can be grouped into four categories:
-Descriptive Analytics
-Diagnostic Analytics
-Predictive Analytics
-Prescriptive Analytics
Geospatial analysis
Gathers, transforms, and visualizes geographic data and imagery, including satellite photographs and GPS coordinates
-ex. Banks use geospatial data to track credit card transactions and flag suspicious transactions
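One way a bank might flag a transaction by location, sketched under assumptions: distance is computed with the haversine formula from GPS coordinates, and a transaction is treated as suspicious when it occurs more than 500 km from the cardholder's home. The threshold, coordinates, and function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_suspicious(home, transaction, max_km=500):
    """Flag a transaction that occurs farther than `max_km` from the home location."""
    return haversine_km(*home, *transaction) > max_km


home = (30.27, -97.74)                       # approximate coordinates for Austin
print(is_suspicious(home, (40.71, -74.01)))  # transaction near New York
print(is_suspicious(home, (29.42, -98.49)))  # transaction near San Antonio
```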
Explanatory data analytics techniques include:
-Linear Regression
-Forecasting
-Monte Carlo Simulation
Advanced data analytics techniques include:
-Process mining
-Network analysis
-Geospatial analysis
-Natural language processing (NLP)
Linear Regression
A statistical technique that estimates the relationship between one dependent variable and one or more independent variables
-Does not establish a cause-and-effect relationship between the variables; it only estimates whether a relationship exists between them
-ex. Used in cost accounting to look for relationships between fixed costs, variable costs, and total costs
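A minimal sketch of the cost accounting example, assuming a single independent variable (units produced) and ordinary least squares. The slope is read as the variable cost per unit and the intercept as the fixed cost. The data points and the name `fit_line` are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical production data: units produced and total cost.
units = [100, 200, 300, 400]
total_cost = [1500, 2000, 2500, 3000]

variable_cost_per_unit, fixed_cost = fit_line(units, total_cost)
print(variable_cost_per_unit, fixed_cost)
```

The fitted line describes an association in the observed data; by itself it does not show that producing more units causes the cost change.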
Time series data occurs in chronological order across a period of time. Important considerations include:
-Time trend
-Seasonality
-Noise
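A rough sketch of how the three components can be separated, assuming a simple additive view: the average level of each cycle shows the trend, the average deviation at each position in the cycle shows seasonality, and whatever is left over is noise. This is one simplified approach among several; the quarterly data and the name `decompose` are hypothetical, and the series length is assumed to be a multiple of the period.

```python
def decompose(series, period):
    """Split a series into per-cycle levels, seasonal effects, and leftover noise."""
    cycles = [series[i:i + period] for i in range(0, len(series), period)]
    # Average level of each cycle (e.g., each year): changes here reflect the trend.
    levels = [sum(cycle) / period for cycle in cycles]
    # Average deviation from the cycle level at each position: the repeating pattern.
    seasonal = [
        sum(cycle[k] - level for cycle, level in zip(cycles, levels)) / len(cycles)
        for k in range(period)
    ]
    # Movement explained by neither the level nor the seasonal effect.
    noise = [
        cycle[k] - level - seasonal[k]
        for cycle, level in zip(cycles, levels)
        for k in range(period)
    ]
    return levels, seasonal, noise


# Hypothetical quarterly sales for two years.
quarterly_sales = [100, 120, 90, 130, 112, 128, 100, 140]
levels, seasonal, noise = decompose(quarterly_sales, period=4)
print(levels)
print(seasonal)
print(noise)
```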
data warehouse breakdown
-Type of data: Historical data in a structured format designed for a relational database (processed data)
-Purpose: Aggregated big data for analytics and decisions
-Users: Data analysts
-Activities: Supporting business analysis; read-only queries for aggregating or extracting data
-Scope of data: Only data relevant to analysis
data lake breakdown
-Type of data: Unstructured and structured data from across the company (raw data)
-Purpose: Cost-effective storage of big data
-Users: Data scientists
-Activities: Storing big data; big data analytics (data science)
-Scope of data: All data in a company
Classification analysis
Uses supervised machine learning to categorize data into groups based on predefined labels learned from labeled examples
-ex. Identifies cookies and non-cookies in the customer order data set
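A toy sketch of classification, assuming a nearest-neighbor rule: a new order receives the label of the most similar labeled example. The features (unit price, quantity), the training records, and the name `nearest_neighbor_label` are invented for illustration; the chapter does not say which algorithm is used.

```python
# Hypothetical labeled orders: ((unit price, quantity), label).
training = [
    ((2.5, 12), "cookie"),
    ((3.0, 24), "cookie"),
    ((25.0, 1), "non-cookie"),
    ((30.0, 2), "non-cookie"),
]


def nearest_neighbor_label(examples, point):
    """Return the label of the labeled example closest to `point`."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    _, label = min(examples, key=lambda example: squared_distance(example[0], point))
    return label


print(nearest_neighbor_label(training, (2.75, 18)))
print(nearest_neighbor_label(training, (28.0, 1)))
```

The labels come from the training examples, which is what makes the approach supervised.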
Clustering
Uses unsupervised machine learning to categorize unlabeled data into groups based on similarities.
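A toy sketch of clustering, assuming one-dimensional k-means: values are repeatedly assigned to the nearest group center, and each center is then moved to the mean of its group. No labels are supplied. The order amounts, starting centers, and the name `k_means_1d` are hypothetical.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid, updating centroids each pass."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical order amounts with no labels attached.
order_amounts = [10, 12, 11, 95, 100, 105]
centers, groups = k_means_1d(order_amounts, centroids=[0, 50])
print(centers)
print(groups)
```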
Network analysis
Visualizes relationships among participants in a data set to learn about social structure based on those relationships
-Mostly used to study relationships among people, who are displayed as nodes in a network analysis
-Links between nodes are relationships, or interactions, that connect participants
-ex. Used by banks to identify fraud risks by investigating transactions related to reported fraudulent transactions
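A minimal sketch of the bank example, assuming accounts are nodes and transactions are links. Starting from an account tied to reported fraud, a breadth-first search collects every account connected to it, directly or indirectly. The account names and the function `connected_accounts` are hypothetical.

```python
from collections import deque

# Hypothetical transactions, each linking two accounts.
transactions = [
    ("acct_1", "acct_2"),
    ("acct_2", "acct_3"),
    ("acct_4", "acct_5"),
]


def connected_accounts(edges, start):
    """Return all accounts reachable from `start` through transaction links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}


related = connected_accounts(transactions, "acct_1")
print(sorted(related))
```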
Descriptive Analytics
What has happened?
-Information that results from the examination of data to understand the past
-ex. Earnings per share, inventory turnover ratios, profitability ratios, etc.
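Two of the ratios named above, sketched with the standard textbook formulas: basic earnings per share is net income less preferred dividends, divided by weighted average common shares, and inventory turnover is cost of goods sold divided by average inventory. The input figures are made up.

```python
def earnings_per_share(net_income, preferred_dividends, weighted_avg_shares):
    """Basic EPS: income available to common shareholders per common share."""
    return (net_income - preferred_dividends) / weighted_avg_shares


def inventory_turnover(cost_of_goods_sold, beginning_inventory, ending_inventory):
    """Number of times average inventory was sold during the period."""
    average_inventory = (beginning_inventory + ending_inventory) / 2
    return cost_of_goods_sold / average_inventory


# Hypothetical figures for one year.
eps = earnings_per_share(1_200_000, 200_000, 500_000)
turnover = inventory_turnover(900_000, 140_000, 160_000)
print(eps, turnover)
```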
Predictive Analytics
What is likely to happen in the future?
-Information that results from analyses that focus on predicting the future
-All forecasting analytics are examples of predictive analytics (e.g., sales forecasts, EPS forecasts, stock price forecasts, etc.)
Prescriptive Analytics
What should be done?
-Information that results from analyses to provide a recommendation of what should happen
-ex. Machines are increasingly monitored with sensors and metrics that predict when a machine might fail and recommend performing maintenance or replacing the machine
-ex. When making credit approval decisions, a company gathers various data to predict how likely a customer is to be a "good" customer that pays, then recommends whether to accept the customer
Diagnostic Analytics
Why did it happen?
-Information that results from the examination of data to determine causal relationships
-Often relies on statistical analyses, such as regression, to see whether one thing causes (or, more often, is associated with) another
Value
arguably the most important of the 5 V's because data isn't useful to a business unless it can be converted into valuable information
storyboard
collection of dashboards, stand-alone visualizations, infographics, and other presentation materials that turn the data analytics and visualizations into a business presentation
Time trend
consistent movement that does not repeat
Seasonality
consistent movement that repeats on a regular basis
data warehouse
designed specifically for reporting and data analysis and contains relevant data that has already been transformed for reporting use
data dashboard
display of important data points, metrics, and key performance indicators in easily understood data visualizations
unstructured data
doesn't fit into a traditional table
-Images, audio, video, and more
-Requires more storage space
-Cannot easily be displayed as a table and is harder to manage than structured data
relational databases
organize structured data in interrelated tables, which are connected through fields the tables have in common (keys)
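A small sketch of interrelated tables using Python's built-in `sqlite3` module. The `customers` and `sales` tables, their columns, and the sample rows are invented for illustration; the two tables are connected through the shared `customer_id` field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT
    );
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO sales VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);
    """
)

# Join the two tables on the field they share, then total sales per customer.
rows = conn.execute(
    """
    SELECT c.name, SUM(s.amount)
    FROM customers AS c
    JOIN sales AS s ON s.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)
conn.close()
```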
structured data
organized and fits nicely into tables
-Made up of specific data types like date, numeric, and text
-Stored using less storage space, which allows for easy scalability
-Easier to manage than unstructured data
-ex. Debit and credit amounts, sales records, customer information (e.g., name, address, phone number, etc.)
several popular composition and comparison visualizations:
-pie chart
-tree map
-bar chart
-stacked bar chart
-heat map
-etc.
Data visualization
presentation of data in a graphical format, such as charts and graphs, that is used for analysis and communication
-Turns complete, reliable, and accurate data into a story
-Uses a graphical representation of data to convey meaning
Exploratory Data Analytics
reveals key characteristics of a data set
-Data composition refers to the various characteristics of a data set
Database
set of logically related files (tables) that contains an organized collection of data that is accessible for fast searching and retrieval
5 V's of big data
Big data is the term companies use to describe the massive amounts of data they now capture, store, and analyze. Its five V's are:
-Volume
-Velocity
-Variety
-Veracity
-Value
Veracity
the accuracy and truthfulness of the data
Variety
the diversity of data created or collected
-ex. structured vs. unstructured, qualitative vs. quantitative
Volume
the quantity and scale of data generated every second
Velocity
the speed at which the data is generated
data lake
vast pool of data designed to contain all of a company's data; acts as a central repository for that data
