Chapter 17- Business Intelligence
Concepts of Data Analytics
1. Data Mining 2. Topic Analytics 3. Text Analytics 4. Business Analytics
Forms of Business Analytics
1. Descriptive Analytics 2. Predictive Analytics 3. Decision Analytics
Data can be categorized in three groups:
1. Structured Data 2. Unstructured Data 3. Semi- Structured Data
Four V's of Big Data
1. Volume 2. Velocity 3. Variety 4. Veracity
What decade did businesses start DSS?
1950s
How much data is unstructured?
80%
Querying
A tool that lets you ask your data questions that in turn lead to answers, and eventually decisions
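As a rough illustration of "asking your data a question", the sketch below uses Python's built-in sqlite3 module. The sales table, its columns, and the rows are made up for the example; they do not come from the chapter.

```python
import sqlite3

# Hypothetical in-memory table; names and figures are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 120.0), ("West", 80.0), ("East", 50.0)],
)

# The question: which region sold the most in total?
rows = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY total DESC"
).fetchall()

# The answer that could feed a decision: the first row is the top region.
top_region = rows[0][0]
print(rows, top_region)
```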
Volume
AMOUNT of data
Hadoop
An infrastructure for storing and processing large sets of data across multiple servers
Business Analytics
Attempts to make connections between data so organizations can try to predict future trends that may give them a competitive advantage. Can also uncover computer system inadequacies within an organization.
Predictive Analytics
Attempts to reveal future patterns in a marketplace. Essentially tries to predict the future by looking for data correlations between one thing and any other things that pertain to it.
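One simple way to look for a correlation between two things is the Pearson correlation coefficient. The sketch below computes it by hand; the ad-spend and units-sold figures are invented for illustration, and real predictive analytics goes well beyond a single coefficient.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: +1 means the two series rise together perfectly."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up figures: weekly ad spend and units sold in the same weeks.
ad_spend = [10, 20, 30, 40, 50]
units_sold = [12, 25, 31, 45, 52]

r = pearson(ad_spend, units_sold)
print(round(r, 3))
```

A value close to 1 suggests the two series move together, which is a hint worth investigating and not proof that one causes the other.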
Large collected datasets are called what?
Big Data
Decision Analytics
Builds on Predictive Analytics to make decisions about future industries and marketplaces. Looks at an organization's internal data, analyzes external conditions like supply abundance, and then endorses a best course of action.
What refers to an assortment of software applications to analyze an organization's raw data?
Business Intelligence (BI)
Structured Data
Business transactions (you order something, you sell something; all transactions). A non-transaction example: a 1-to-5 rating. Data is typically well-labeled, often with the traditional fields and records of common data tables. The data has recognizable patterns that allow it to be more easily queried.
Topic Analytics
Catalogs phrases of an organization's customer feedback. If a customer said "the barista was friendly", it would be categorized under the topic "Employee Friendliness".
Topic Analytics
Catalogs phrases of an organization's customer feedback into relevant topics. Ex: If a customer said "the barista was friendly", that would be categorized under the topic "Employee Friendliness".
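A minimal sketch of the barista example, assuming a simple keyword lookup. The topic names other than "Employee Friendliness" and all of the keyword lists are hypothetical; real topic analytics tools use much richer language models.

```python
# Hypothetical topic -> keyword mapping (illustrative only).
TOPIC_KEYWORDS = {
    "Employee Friendliness": {"friendly", "rude", "polite"},
    "Wait Time": {"slow", "fast", "wait", "line"},
}

def categorize(feedback):
    """Return every topic whose keywords appear in the feedback phrase."""
    words = {w.strip(".,!?\"'").lower() for w in feedback.split()}
    return [topic for topic, kws in TOPIC_KEYWORDS.items() if words & kws]

topics = categorize("The barista was friendly")
print(topics)
```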
Business Intelligence (BI)
Changes data into meaningful information that helps organizations make better decisions
Extract
Collect the data, whether it be tweets or recorded phone conversations. Once you determine where your data resides, you can start extracting it. You often extract from Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) software. Extraction sometimes converts unstructured data, like text notes, into semi-structured or structured data by tagging it with metadata.
Big Data
Collected datasets such as smartphone metadata, Internet usage records, social media activity, computer usage records, and countless other data sources
Decision Support System (DSS)
Computer-based systems that support an organization's decision-making activities. Ex: Loan officers at a bank use a DSS to verify the credit of a loan applicant. Ex: Excel (goal seek, what-ifs, pivot tables).
Data Warehouse
Consolidates disparate (dissimilar) data in a central location. Not unusual for it to hold yottabytes of data.
Customer Relationship Management (CRM) System
Contains info about an organization's sales, marketing, customer service records, and much more. Used to track and organize communication with customers. CRM systems are a component of an ERP system. Knowing how customers feel is important. Ex: Siebel Systems.
In BI, what does the acronym CRM mean?
Customer Relationship Management
What is a graphical interface that characterizes specific data analysis through visualization?
Dashboard
What software helps BI become more usable through visualization?
Dashboards
What is another term for Data Mining?
Data Discovery
What is Data Mining also called?
Data Discovery. The examination of huge sets of data to find patterns and connections. Also identifies outliers.
What consolidates disparate data?
Data Warehouse
Load
Data is finally transferred into the data warehouse or datamart. Sometimes happens weekly, daily, or even hourly.
What is a smaller Data Warehouse?
Datamart
In relation to BI, what does the acronym DSS mean?
Decision Support System
DSS
Decision Support Systems. Came about in the 1950s: computer-based systems used to support an organization's decision-making activities.
Descriptive Analytics
Defines an organization's past data that can be grouped into significant pieces, like a department's sales results. The baseline that other types of analytics are built on. Starts to reveal trends.
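Grouping past data into significant pieces can be as simple as totaling sales per department. The sketch below assumes a small list of made-up (department, amount) records.

```python
from collections import defaultdict

# Made-up past sales records: (department, sale amount).
sales = [("Toys", 200), ("Garden", 150), ("Toys", 100)]

# Group the raw records into one total per department.
totals = defaultdict(int)
for department, amount in sales:
    totals[department] += amount

summary = dict(totals)
print(summary)
```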
Unstructured Data
Disorganized data that cannot be easily read or processed by a computer because it is not stored in rows and columns like traditional data tables. 80% of all data is unstructured. Ex: Call center conversations, tweets, customer complaints, posts (such as Facebook posts).
Unstructured Data
Disorganized; cannot be easily read or processed because it is not stored in rows and columns. 80% of all data is unstructured.
Dashboards
Easy-to-use graphical interfaces that characterize specific data analysis through visualization. Make it a lot easier to make sense of the data and see the resulting information.
ETL
Extract, Transform, Load. Tools that are used to standardize data across systems, allowing it to be queried.
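The three ETL steps can be sketched end to end in a few lines. This is only an illustration: the customer records are invented, the function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

def extract():
    # Stand-in for pulling records from a CRM or ERP system (made-up rows).
    return [
        {"name": " Ada Lovelace ", "email": "ADA@EXAMPLE.COM"},
        {"name": "Grace Hopper", "email": "grace@example.com "},
    ]

def transform(records):
    # Normalize into consistent fields: trimmed names, lowercase emails.
    return [(r["name"].strip(), r["email"].strip().lower()) for r in records]

def load(rows, conn):
    # Transfer the normalized rows into the (stand-in) warehouse table.
    conn.execute("CREATE TABLE IF NOT EXISTS customers (name TEXT, email TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(extract()), warehouse)

# Once loaded, the standardized data can be queried.
loaded = warehouse.execute(
    "SELECT name, email FROM customers ORDER BY name"
).fetchall()
print(loaded)
```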
Metaphor Example
Extract: holding tanks. Transform: purifying the water. Load: pouring the purified water into the ocean. Data Warehouse: the many oceans. Analyze: is it salt water, fresh, base, acid?
Structured Data
Fixed formats, well-labeled, often with traditional fields and records of common data tables
Velocity
HOW FAST can you collect data? HOW QUICKLY can you analyze it?
What uses a cluster system that allows data to be stored on multiple servers?
Hadoop
What is Doug Cutting's son's toy elephant's name?
Hadoop
What held up the emergence of the Cloud?
Internet speeds
Variety
Is the data Structured, Semi-Structured, or Unstructured?
Veracity
Is the data you collected any good? Is it clean? Has it been scrubbed?
Veracity
Is the data your organization collected any good? Is it "clean"?
Disadvantage to keeping enormous amounts of inventory
It's too expensive
Why is keeping enormous amounts of inventory a bad thing?
It's too expensive
Business Analytics attempts to _______________
Make connections between data so organizations can try to predict future trends that may give them a competitive advantage
Data Analysis
Makes sense of an organization's collected data, turns it into useful information, and validates the organization's future decisions. Basically applies statistics and logic techniques to define, illustrate, and evaluate data.
Data resides in the bank's _______________________ system that has endless customer facts and figures
Marketing Automation Services
Normalizing Data
Means your data is typically organized into the fields and records of a relational database. Provides the standard data format required to analyze data.
Mining
Mountain example. Netflix mines for data to give you recommendations on what to watch.
Hadoop facts
Named after Doug Cutting's young son's yellow toy elephant. Cutting and Mike Cafarella originally created it to support distribution for a search engine; it started off as "Nutch". Uses a cluster system that allows files to be stored on multiple servers. Attempts to identify files on the other servers. Typically needs a highly qualified data scientist to run it. Best for large companies like Facebook, eBay, and American Express that create terabytes and petabytes of data every day.
What provides a standard format to organize data?
Normalization
Semi-Structured Data
Not structured, but there are some tags, as in XML and HTML. Ex: Emails with subject lines. In between structured and unstructured data, and can possibly be converted into structured data.
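Because semi-structured data carries tags, it can often be converted into structured records. The sketch below parses a small, made-up XML snippet of emails into a list of field/value records using Python's standard xml.etree.ElementTree module; the element names are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Made-up semi-structured input: tagged emails with subject lines.
xml_text = """
<emails>
  <email><subject>Late order</subject><body>Where is my package?</body></email>
  <email><subject>Thanks</subject><body>Great service!</body></email>
</emails>
""".strip()

root = ET.fromstring(xml_text)

# Each tagged email becomes one structured record with fixed fields.
records = [
    {"subject": e.findtext("subject"), "body": e.findtext("body")}
    for e in root.findall("email")
]
print(records)
```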
Extract
Often from Customer Relationship Management (CRMs) or Enterprise Resource Planning (ERPs)
Transform
Once extracted, data needs to be normalized. Data is no good unless it's organized. Normalizing = organizing data into the fields and records of a relational database.
Jeopardy
One way to decide how much data, or what kind of data, you require: like Jeopardy, it asks for questions. What questions do you need to answer?
Apache Hadoop
Open source, written in Java, for distributed storage and processing of LARGE data sets on computer clusters built from commodity hardware
What attempts to reveal future patterns in the marketplace?
Predictive Analytics
Information
Processed data that means something
Data
Raw, unorganized facts
Business Intelligence (BI)
Refers to an assortment of software applications used to analyze an organization's raw data. Can be described as computer applications that change data into significant, meaningful information that helps organizations make better decisions. Uses historical, current, and predictive data to help decision makers.
Volume
Refers to the amount of data collected by an organization. The sheer quantity of Big Data. Ex: All the sales at all the Walmarts on Black Friday. Ex: All the tweets around the world in one day.
ROI
Return on Investment
Slice and Dice
Rubik's Cube example. Digging through data.
Datamart
Smaller, more focused data warehouse. Can't "answer" as much with these, but they're cheaper.
Data includes:
Smartphone metadata, Internet usage records, social media activity, computer usage records, and countless other data sources
Hadoop
Stores and processes lots of data across multiple servers. Uses a cluster system, as opposed to a centralized system, that allows files to be stored on multiple servers.
What kind of data resides in fixed formats?
Structured
User friendly
Tableau, because it's very graphical
When collecting internal data, first things first:
Take an inventory
What is another term for Text Analytics?
Text-mining
Load
The data is finally ready to be transferred into the data warehouse or datamart. Sometimes occurs weekly, daily, or even hourly. The more often this is done, the more up-to-date and timely the analytic reports can be.
Data Visualization
The graphic display of the results of data mining, analytics, and Business Intelligence (BI) in general, typically in real time. PowerPoint has found a place in BI, specifically in Data Visualization. Helps BI become more understandable and, therefore, more meaningful in decision making.
MapReduce
The processing arm, or engine, of Hadoop. Allows data to be queried and processed directly on the server where it lives. Only the query is transported through the network.
MapReduce
The processing arm, or engine, of Hadoop. Allows data to be queried and processed directly on the server where it lives, instead of moving the data across the network to be analyzed on another computer. Only the query is transported through the network. Like little computer minions that search out and query data where it resides and process the query there, instead of dragging the data back to a large centralized server. Saves immense amounts of network bandwidth and resources. Functions: Map, Shuffle, Reduce.
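The Map, Shuffle, and Reduce functions can be imitated on a single machine with a word count, the classic teaching example. This is a plain-Python sketch of the idea only; it does not use Hadoop's actual API, and the function names and sample documents are made up. On a real cluster, each document would be mapped on the server where it is stored.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one document.
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by their key (the word).
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: collapse each key's list of values into a single count.
    return {key: sum(values) for key, values in grouped.items()}

# Made-up documents standing in for files spread across servers.
documents = ["big data big insight", "data drives decisions"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```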
Velocity
The speed at which data is gathered and processed. How fast can you collect data and, more importantly, how quickly can you analyze it? Ex: Digital ads at a London airport, where cameras check who you are and gear the next ad around the corner toward you.
Transform
Transform into data that can go into the warehouses. Once extracted, the data needs to be normalized; data is no good unless it's organized.
In BI, which of the following is not part of ETL?
Transmission
Social Media Platforms can
Uncover what your customers are thinking
What data is disorganized and not easily read?
Unstructured
Enterprise Resource Planning (ERP)
Used for running of the business
Datamart
Used often by a single department or function within an organization. Limited in database complexity and smaller than a Data Warehouse, but cheaper to implement than a full warehouse. You can't "answer" as much as you can with a Data Warehouse. Uses data from smaller parts of an organization, like the marketing or purchasing departments. Essentially a smaller, more focused data warehouse.
What refers to different kinds of data?
Variety
What refers to how fast data is collected?
Velocity
What refers to quality of data?
Veracity
What refers to the amount of data?
Volume
Data Warehouse
Where an organization stores and consolidates disparate data in a central location. Ex: Oracle, IBM, SAS, Teradata. A collection of data from a variety of sources used to support decision making and generate BI.
Variety
You may have identified what data you wish to collect, but is it Structured, Semi-Structured, or Unstructured? It's most likely a combination of all three, which could potentially throw a wrench in your data collection gears.
Data Mining
aka Data Discovery, is the examination of huge sets of data to find patterns and connections, and identify outliers
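One common way to identify outliers is to flag values that sit far from the mean, measured in standard deviations (a z-score). The sketch below is an assumption about how that might look; the order amounts are invented, and the threshold of 2 is an illustrative choice, not a rule from the chapter.

```python
from statistics import mean, pstdev

# Made-up daily order amounts; one value is clearly unusual.
amounts = [100, 102, 98, 101, 99, 103, 97, 500]

avg = mean(amounts)
spread = pstdev(amounts)  # population standard deviation

# Flag values more than 2 standard deviations from the mean
# (the threshold of 2 is an illustrative assumption).
outliers = [x for x in amounts if abs(x - avg) / spread > 2]
print(outliers)
```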
Text Analytics
aka Text Mining. Finds patterns in text, such as whether customers on Facebook are satisfied with an organization's products.
Text Analytics
aka Text Mining. Searches through unstructured text data to look for useful patterns. Ex: Looking at whether an organization's customers on Facebook or Instagram are unsatisfied with its products or service.