BI & Analytics Quiz
Company data isn't always in one location. It's usually found across:
CRM programs, marketing automation systems, social media platforms
In-memory systems utilize RAM - instead of hard drives - to
execute queries, increasing application performance
Hierarchies let you drill-down into data to
explore interesting patterns and anomalies
A multidimensional data model is organized around a central theme, for example, sales. This theme is represented by a
fact table.
Structured data resides in a
fixed form and labeled
With ad hoc analysis, virtually any report can be
formatted multidimensionally (by pivoting and nesting dimensions); anyone can be taught
Core operational database functionality:
gather data, update data, store data, retrieve data, archive data
Data visualization is a
graphic display of the results of data mining or analytics, often in real-time
With descriptive analytics, raw data can be
grouped into easily digestible pieces, such as the number of unique page-views, or the sales numbers for a specific department
MapReduce can save
huge amounts of network bandwidth and resources
Transform: in order to properly analyze data, it must be
in the same format
Hadoop
infrastructure for storing and processing large sets of data across multiple servers
The general idea of this approach (data cube) is to
materialize certain expensive computations that are frequently queried.
An OLAP pivot table creates an
MDX query that is sent to the OLAP cube; the cube returns the data requested by the MDX query to the OLAP pivot table
Facts are numerical measures. Thus, the fact table contains
measures (such as units_sold) and keys to each of the related dimensional tables.
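A minimal sketch of the fact table idea, assuming a tiny in-memory star schema. The table names, keys, and values (item_dim, location_dim, fact_sales) are made up for illustration; only units_sold comes from the card above.

```python
# Hypothetical dimension tables: each key maps to descriptive attributes.
item_dim = {
    1: {"item_name": "cola", "brand": "Acme", "type": "soda"},
    2: {"item_name": "chips", "brand": "Crisp", "type": "snack"},
}
location_dim = {10: {"city": "Austin"}, 20: {"city": "Boston"}}

# Fact table: one numeric measure (units_sold) plus a key into each dimension.
fact_sales = [
    {"item_key": 1, "location_key": 10, "units_sold": 5},
    {"item_key": 2, "location_key": 10, "units_sold": 3},
    {"item_key": 1, "location_key": 20, "units_sold": 7},
]


def units_by(attribute, dim, key_name):
    """Sum units_sold grouped by one attribute of a dimension table."""
    totals = {}
    for row in fact_sales:
        label = dim[row[key_name]][attribute]
        totals[label] = totals.get(label, 0) + row["units_sold"]
    return totals


brand_totals = units_by("brand", item_dim, "item_key")
city_totals = units_by("city", location_dim, "location_key")
```

The fact rows stay narrow (keys and numbers), while the descriptive text lives once in each dimension table.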
Data marts are essentially smaller, more focused warehouses. Instead of aggregating data across a company, a data mart
might store the information of just a single department
OLAP Cube
multidimensional structure that stores and maintains discrete intersection values; some OLAP systems let cubes intersect with each other
OLAP systems organize data into
multidimensional structures
OLTP systems are everywhere:
order tracking, invoicing, credit card processing, retail POS, banking, airline reservations
Hierarchy
organizes data by levels
Descriptive analytics is the base upon which
other types of analytics are built
Descriptive programs analyze
past data and identify trends and relationships
The term business intelligence grew out of technology called
decision support
Attribute
descriptive, non-hierarchical information. Examples: model number, size, list price, color, flavor, street address
The measures to be analyzed depend on the
purpose of the OLAP system
MapReduce allows data to be
queried and processed on the server where it resides, instead of transporting the data across the network to be analyzed on the computer
A multidimensional data model is organized around a central theme, like sales transactions.
A fact table represents this theme.
When data is grouped or combined in multidimensional matrices called
Data Cubes.
OLAP was named by
E. F. Codd (the IBM researcher who invented the relational database model)
Load
data is transferred into the central warehouse or data mart
Argued that all the "intelligence" in business intelligence results from
data mining
A multidimensional model views data in the form of a
data cube.
Raw transactional data
not really useful for business intelligence
Roll-up or consolidation refers to data aggregation and computation in
one or more dimensions.
Ad hoc analysis
point-and-click drill-down is made usable by OLAP's rapid response model; it lets managers and analysts perform ad hoc analysis
Key OLTP characteristics
processes a transaction according to rules; performs all elements of a transaction in real time; continually processes multiple transactions
OLTP systems gather
raw data used for multidimensional analysis; raw data has to be converted into something suitable for analysis; converting raw data to something useful isn't easy
Structured data is easy for computers to
read and query such information, because the data is already standardized
The fact table contains the names of the facts or measures, as well as keys to each of the
related dimensional tables.
Predictive analytics
searches for a correlation between a single unit or factor, and the features that pertain to it
Slicing refers to
selecting a subset of the cube by choosing a single value for one of its dimensions and creating a smaller cube with one less dimension.
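A rough sketch of slicing, assuming the cube is stored as a dictionary keyed by (pid, timeid, locid) with sales as the measure. The cell values are invented, and slice_cube is a hypothetical helper, not part of any OLAP product.

```python
# Cube cells keyed by (pid, timeid, locid) -> sales. Values are made up.
cube = {
    (11, 1, 1): 25,
    (11, 2, 1): 8,
    (12, 1, 1): 30,
    (11, 1, 2): 35,
    (12, 2, 2): 10,
}


def slice_cube(cells, axis, value):
    """Fix one dimension to a single value and drop it from the key."""
    return {
        key[:axis] + key[axis + 1:]: sales
        for key, sales in cells.items()
        if key[axis] == value
    }


# Slice locid = 1: the result has 2 dimensions (pid, timeid) and 1 measure.
loc1 = slice_cube(cube, axis=2, value=1)
```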
Text analytics is useful for analyzing the
sentiment of social media posts, or online customer feedback
Companies now have access to smartphone metadata, internet usage records, social media activity. Business intelligence platforms
sift through this data to find patterns and trends.
Using a process known as extract, transform, and load (ETL), warehouses
standardize data across systems, which allows it to be queried
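The three ETL steps can be sketched as below. This is an illustration only: the two source layouts (crm_rows, erp_rows), their field names, and the list standing in for a warehouse are all assumptions.

```python
from datetime import datetime

# Hypothetical raw exports from two source systems with different formats.
crm_rows = [{"Customer": " Ada ", "Amount": "12.50", "Date": "03/01/2024"}]
erp_rows = [{"cust": "bob", "amt_cents": 1000, "date": "2024-03-02"}]


def extract():
    """Pull raw rows out of each source, remembering where they came from."""
    return [("crm", r) for r in crm_rows] + [("erp", r) for r in erp_rows]


def transform(source, row):
    """Convert every row to the same format: lowercase name, float amount, ISO date."""
    if source == "crm":
        parsed = datetime.strptime(row["Date"], "%m/%d/%Y").date()
        return {
            "customer": row["Customer"].strip().lower(),
            "amount": float(row["Amount"]),
            "date": parsed.isoformat(),
        }
    return {
        "customer": row["cust"].strip().lower(),
        "amount": row["amt_cents"] / 100,
        "date": row["date"],
    }


def load(rows, warehouse):
    """Append the standardized rows to the central store."""
    warehouse.extend(rows)


warehouse = []
load([transform(source, row) for source, row in extract()], warehouse)
```

Once both sources share one format, a single query can cover all rows.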
Extract is the step where unstructured data (such as notes, or author information) is
tagged with metadata to make it easier to find
Roll-up or consolidation for instance,
the cube with cities is rolled up to regions to depict the data with respect to time (in periods) and item (material descriptions).
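A small sketch of this roll-up, assuming a cube keyed by (city, period, item). The city-to-region mapping and the sales figures are invented for the example.

```python
from collections import defaultdict

# Hypothetical hierarchy: each city belongs to one region.
city_to_region = {"Austin": "South", "Dallas": "South", "Boston": "East"}

# Cube cells keyed by (city, period, item) -> sales. Values are made up.
cube = {
    ("Austin", "P1", "widget"): 10,
    ("Dallas", "P1", "widget"): 4,
    ("Boston", "P1", "widget"): 6,
    ("Austin", "P2", "widget"): 8,
}


def roll_up(cells, mapping):
    """Aggregate the location dimension from city level up to region level."""
    rolled = defaultdict(int)
    for (city, period, item), sales in cells.items():
        rolled[(mapping[city], period, item)] += sales
    return dict(rolled)


by_region = roll_up(cube, city_to_region)
```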
Transform
the data is normalized
The analysis gap
the large gap between data businesses collect and the information that decision makers require
With Hadoop, only the question (the query) is
transferred across the network. The analysis is done on the server. The answer is brought back to the computer.
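A toy, single-process sketch of the MapReduce idea, not Hadoop's actual API. Each list in shards stands in for the records held on one server; the names map_shard and reduce_partials are hypothetical.

```python
from collections import Counter
from functools import reduce

# Each inner list stands in for the records stored on one server.
shards = [
    [("soda", 2), ("chips", 1), ("soda", 3)],
    [("chips", 4), ("soda", 1)],
]


def map_shard(records):
    """Runs where the data lives: summarize one shard into per-item totals."""
    partial = Counter()
    for item, units in records:
        partial[item] += units
    return partial


def reduce_partials(left, right):
    """Merge two small partial results into one."""
    return left + right


# Only the small partial summaries would cross the network, not the raw rows.
partials = [map_shard(shard) for shard in shards]
totals = dict(reduce(reduce_partials, partials, Counter()))
```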
A good rule of thumb is that 80% of all data produced is
unstructured (messages, comments)
Text analytics software combs through
unstructured textual data to find patterns
The more often load is done, the more
up-to-date analytic reports will be
Data cubes can have
very large numbers of members
OLAP-based ad hoc analysis lets
virtually any question be answered quickly
The data cube method has a few alternative names or a few variants, such as
"Multidimensional databases," "materialized views," and "OLAP (On-Line Analytical Processing)."
Dimensions let you
"slice and dice" multidimensional data
Slice locid =
1 is shown
2 dimensions and
1 measure
Business intelligence systems have grown more powerful and comprehensive, mainly due to:
1) Increased data collection 2) Greater storage capacity
MapReduce is the arm of
Hadoop
OLTP systems can be used to
answer transactional questions
Drill-down operation helps users navigate through the
data details.
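Drill-down can be sketched as the reverse of aggregation: start from a summary member and list the detail rows beneath it. The month-to-quarter mapping and the sales numbers are assumptions for illustration.

```python
# Hypothetical time hierarchy and monthly sales figures.
month_to_quarter = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}
monthly_sales = {"Jan": 100, "Feb": 120, "Mar": 90, "Apr": 110}


def drill_down(parent, parent_of, detail):
    """Return the lower-level members (and values) that sit under one parent."""
    return {
        member: value
        for member, value in detail.items()
        if parent_of[member] == parent
    }


q1_detail = drill_down("Q1", month_to_quarter, monthly_sales)
```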
Dicing generates a
subcube by picking two or more values from multiple dimensions of the cube. The cube is rotated independent of its dimension, therefore users can analyze data from different viewpoints.
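A sketch of dicing under the same assumption used elsewhere in these cards: a cube stored as a dictionary keyed by (pid, timeid, locid). The values are invented and dice is a hypothetical helper.

```python
# Cube cells keyed by (pid, timeid, locid) -> sales. Values are made up.
cube = {
    (11, 1, 1): 25,
    (12, 2, 1): 30,
    (13, 3, 1): 5,
    (11, 3, 2): 35,
    (12, 1, 2): 10,
}


def dice(cells, allowed):
    """Keep cells whose coordinate on each listed axis is in the allowed set."""
    return {
        key: value
        for key, value in cells.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }


# Pick two values on the pid axis (0) and two on the timeid axis (1).
subcube = dice(cube, {0: {11, 12}, 1: {1, 2}})
```

Unlike a slice, the subcube keeps all three dimensions; it just covers fewer members on each.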
Operational database
supports the day-to-day operations of a company. Ex: lots of individual shoppers buying soda, with each transaction stored in a database designed to store checkout transactions
Four important properties of a measure:
1. Always a quantity or expression that yields a quantity 2. Can take any quantitative format 3. Can be derived from any original data source or calculation 4. At least one measure required to perform OLAP analysis
Two tests for dimensionality:
1. Can data about members be compared? (e.g., sales numbers of one product compared to sales numbers of another product) 2. Can data from members be aggregated into summaries? (e.g., Jan, Feb, Mar aggregate together as Q1)
Packaged systems have 2 big limitations:
1. Can only report on their own data: "silos" of data (e.g., sales, marketing, accounting, finance) 2. Don't really support multidimensional analysis
There are three main forms of business analytics:
1. Descriptive 2. Predictive 3. Decision
Data mining can be used to:
1. Group sets of data 2. Find outliers 3. Draw connections
All OLAP systems have to meet three key criteria:
1. Must support multidimensional analysis ("by" dimensions) 2. Fast retrieval times ("infinite question syndrome") 3. Calculation engine that can handle specialized multidimensional math (simple formulas)
With business analytics, by analyzing and drawing connections between data, companies can:
1. Predict future trends 2. Gain competitive advantages 3. Reveal unknown inefficiencies
The three basic operations in OLAP are:
1. Roll-up (Consolidation) 2. Drill-down 3. Slicing and dicing.
Data comes in three main forms:
1. Structured 2. Semi-structured 3. Unstructured
The OLAP approach is used to analyze multidimensional data from
multiple sources and perspectives.
OLTP (online transaction processing)
Capturing and storing data from ERP, CRM, POS; day-to-day business transactions; the main focus is on efficiency of routine tasks
When the OLAP pivot table wants to get information from the OLAP Cube, it uses a language called
MDX
Relational database ->
OLAP Cube -> OLAP Pivot Table
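The pipeline can be approximated with the standard library. This is a loose sketch: SQL GROUP BY stands in for the cube, and a dictionary stands in for the pivot table. A real OLAP pivot table would query the cube with MDX rather than SQL, and the sales table and its columns are made up.

```python
import sqlite3

# Relational database: individual transaction rows (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, quarter TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("soda", "Q1", 5), ("soda", "Q2", 7), ("chips", "Q1", 3), ("soda", "Q1", 2)],
)

# "Cube" step: aggregate the measure at each (item, quarter) intersection.
cells = {
    (item, quarter): total
    for item, quarter, total in conn.execute(
        "SELECT item, quarter, SUM(units) FROM sales GROUP BY item, quarter"
    )
}
conn.close()

# "Pivot table" step: items as rows, quarters as columns, 0 for empty cells.
quarters = sorted({quarter for _, quarter in cells})
items = sorted({item for item, _ in cells})
pivot = {item: [cells.get((item, q), 0) for q in quarters] for item in items}
```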
Analysis gap between raw data and BI can be bridged by combining
OLTP systems with BI systems
Tabular representation (Think Hershey's Chocolate Bar)
On the top of the bar: Pid, Timeid, Locid, Sales
Modern BI systems are designed to follow
the Online Analytical Processing (OLAP) model
Multidimensional representation (Think Rubik's Cube)
Pid, Timeid, Locid
Extract
Raw data is extracted from a source program (such as CRM or ERP software)
Dimensions are the perspectives or entities concerning which an organization keeps records. For example,
a shop may create a sales data warehouse to keep records of the store's sales for the dimensions time, item, and location. These dimensions allow the store to keep track of things, for example, monthly sales of items and the locations at which the items were sold.
OLAP systems provide
ad hoc analysis, slicing and dicing, pivoting dimensions, and drilling down through hierarchies
Each level in the hierarchy is the
aggregate of the levels beneath it
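A short sketch of how one level aggregates into the next, using months, quarters, and a year. The sales figures and the year label are assumptions; aggregate is a hypothetical helper.

```python
# Hypothetical hierarchy mappings and monthly sales figures.
month_to_quarter = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}
quarter_to_year = {"Q1": "2024", "Q2": "2024"}
monthly_sales = {"Jan": 100, "Feb": 120, "Mar": 90, "Apr": 110}


def aggregate(level_values, parent_of):
    """Sum the values of one level into their parents on the next level up."""
    totals = {}
    for member, value in level_values.items():
        parent = parent_of[member]
        totals[parent] = totals.get(parent, 0) + value
    return totals


quarterly = aggregate(monthly_sales, month_to_quarter)
yearly = aggregate(quarterly, quarter_to_year)
```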
Data mining is the
analysis of large sets of data in order to find patterns and correlations
For example, drilling down enables users to
analyze data in the 5 steps (virtual day) of the first period separately. The data can be divided with respect to DC, months (time) and item (material descriptions).
Decision analytics is the software that helps companies
analyze future industries and market spaces
A measure is the data that's being
analyzed across multiple dimensions
Decision analytics looks at a company's internal data, then
analyzes external conditions (such as manufacturing trends, or predicted supply shortages) to recommend the best course of action for a company
Measure
any quantitative expression contained in an OLAP system
A data cube is created from a subset of
attributes in the database.
Specific attributes are chosen to be measure attributes, i.e., the
attributes whose values are of interest.
Unstructured data is information that
can't be easily read by computers
Data in operational databases
can't easily be analyzed
OLTP systems can't be used to answer most analytics questions:
can't search, sort, and summarize large numbers of records; can't handle required calculations; negative impact on OLTP system performance
Data marts limit the complexity of databases, and are
cheaper to implement than full warehouses
Instead of centralizing files, Hadoop uses a
cluster system that allows files to be stored on multiple servers
Data Warehouses are used to
consolidate disparate data in a central location
Highest level of OLAP structure is a
dimension: categorically consistent view of data
Each dimension has a table related to it, called a
dimensional table, which describes the dimension further.
View "by" qualifiers are usually
dimensions
A data cube enables data to be modeled and viewed in multiple dimensions. It is defined by
dimensions and facts (measures).
OLAP consists of
dimensions and measures
Other attributes are selected as
dimensions or functional attributes.
The measure attributes are aggregated according to the
dimensions.
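One way to picture this is to precompute the measure for every combination of dimensions. The sketch below is a naive version under that assumption; the rows and the build_cube helper are invented, and real systems choose which aggregates to materialize more selectively.

```python
from itertools import combinations

# Hypothetical source rows: two dimension attributes and one measure attribute.
rows = [
    {"item": "soda", "city": "Austin", "units": 5},
    {"item": "soda", "city": "Boston", "units": 7},
    {"item": "chips", "city": "Austin", "units": 3},
]
dimensions = ("item", "city")


def build_cube(rows, dimensions, measure):
    """Aggregate the measure for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            table = {}
            for row in rows:
                key = tuple(row[d] for d in dims)
                table[key] = table.get(key, 0) + row[measure]
            cube[dims] = table
    return cube


cube = build_cube(rows, dimensions, "units")
```

Here cube[("item",)] holds totals per item, and cube[()] holds the grand total.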
The insights from analytics reports influence company
direction, product lineups, and even hiring decisions
Load process can occur
every week, day, hour, or even minute
OLAP provides tools for users to
examine/filter dimensional data
The goal of predictive analytics is to
find the same correlation across different data sets, which would allow companies to infer future patterns from past trends
Dashboards are the
interfaces that represent specific analyses; no command-line interface
The first step in BI is taking
inventory of the data your company produces
Data analysis is the reason companies
invest in BI
Roll-up or consolidation
is actually performed on an OLAP cube
Hadoop can be complex to implement and run, and
is not well suited for ad hoc queries
Unstructured data is difficult to organize in traditional databases, because
it can't be stored in rows or columns
For example, a dimensional table for an item may contain the attributes:
item_name, brand, and type.
OLTP is optimized for managing
low-level business data
Hadoop is best suited for companies that produce
massive volumes of data
Stories are at the center of SAP Analytics Cloud (SAC), and the underlying data lies in the
measures and dimensions defined in the multi-dimensional data model of your data
In BI, measures are known by different names depending on the application:
metric/key performance indicator (KPI), benchmark, ratio
A data cube enables data to be
modeled and viewed in multiple dimensions.
Hierarchy example
months, quarters, years