INFS 247 Quiz 3 Lectures 6-10 Prof. Eddington
Data warehouses are used by different (____ ______) set of users than operational databases, such as data analysts and decision makers
much smaller
Data mining
"the process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques"
Why have a separate analytics DB?
1. Time Horizon Difference 2. Performance Reasons 3. Difference in Data Structuring
big data encompasses _____ _____ of stored data
80-90%
recency-frequency-monetary value (RFM)
A CRM metric and method for analyzing customer value. Recency: When was the most recent purchase by the customer? Frequency: How often does the customer purchase over a period of time? Monetary value: What is the monetary value of the customer's purchase over a period of time?
Business Intelligence
A broad term encompassing analytical use of data. Incorporates various methods (including database management, data warehousing, and data mining) that give companies ability to discover and utilize information they already own and turn it into the knowledge that directly impacts corporate performance.
Data Mart
A data store based on the same principle as a data warehouse, but with a more limited space.
Data Warehouse (EDW)
A system used for reporting and data analysis and is considered a core component of business intelligence
Slice and Dice
Adds, replaces, or eliminates specified dimension attributes (or particular values) from the already displayed result
Data Warehouses:
Are modeled and structured differently than operational databases. Use different types of technologies for storage and retrieval than operational databases.
SCM Automation
Enabled by the use of digital information and other technological advances
ETL includes the following tasks:
Extracting analytically useful data from the operational data sources Transforming such data so that it conforms to the structure of the target data warehouse model, while ensuring the quality of the transformed data Loading the transformed and quality assured data into the target data warehouse
Extraction-Transformation-Load (ETL)
Facilitates the retrieval of data from operational databases into the data warehouses
Example OLAP Query
For each individual store, show separately for male and female shoppers the number of product units sold for each product category
Analytical Information Examples
Information reflecting trends, sales, product statistics, and future growth productions
Operational (transactional) Information Examples
Information reflecting withdrawing cash from an ATM, making an airline reservation, and purchasing stocks
Executive dashboards
Intended for use by higher level decision makers within an organization contains an organized, easy-to-read display of a number of critically important queries describing the performance of the organization in general, the usage of executive dashboards should require little or no effort in training
Difference in data structuring
It is often impossible to structure a database which can be used in an efficient manner for both operational and analytical purposes
examples of sales support metrics and KPIs
Leads-to-conversions rate, average profit per customer in a time period, upsell revenues in a time period
Source systems can include external data sources, such as...
Market research data, census data, stock market data, weather data, geospatial data
Data warehouses and data mining
Mining transactional (operational) databases containing data related to current day-today organizational activities can be of use in certain situations, but generally isn't ideal. The enterprise-wide corporate data warehouse contains all the information from the operational data sources that has analytical value.
Front-end/BI applications
Most users who need to access analytical data in data warehouses and data marts can not engage in direct access (as they do not know how to use OLAP/BI tools). Instead, they use front-end/BI applications
examples of customer service metrics and KPIs
Number of complaints in a time period, customer ratings of service, average time to complete problem resolution
DW front-end (BI) applications
Often referred to as BI (Business Intelligence) applications. Provide access to the data warehouse for users who are engaging in indirect use.
Operational Databases collect...
Operational (transactional) Information
Source systems
Operational databases that provide analytically useful information for the data warehouse's subjects of analysis
MapReduce
Parallel computing divides complex tasks into a sequence of smaller tasks that are performed in parallel on multiple computers. Using multiple computers at the same time (parallel computing) vastly reduces the time needed for processing. Traditional parallel approaches are specialized (expensive) computers. Map reduce technique utilizes regular commodity (cheap) computers.
Association rules are expressed in the form:
People who buy X tend to buy Y
Mythical association rule
People who buy diapers tend to buy beer
OLAP/BI tool features
Slice and Dice Pivot (rotate) Drill Up/Drill Down
EDW components
Source systems (operational databases) Extraction-transformation-load (ETL) infrastructure Data warehouse (analytical database) Front-end (BI) applications
Time horizon difference
The difference between how long the data is useful for transactional vs. analytical needs.
Every operational database that is used as a source system for the data warehouse has two purposes:
The original operational purpose. As a source system for the data warehouse.
Performance Reasons
The performance of operational day-to-day tasks involving data use can be severely diminished if such tasks must compete for computing resources with analytical tasks
Online Analytical Processing (OLAP)/BI
The process of retrieving data from data warehouses and/or data marts for analytical purposes.
Online Transaction Processing (OLTP)
The process of updating (i.e. inserting, modifying, and deleting) and retrieving data from databases for operational purposes
Support measures...
The significance of the rule. We are interested in rules with relatively high support.
True or False: The same fact can have both an operational and analytical purpose
True
True or False: big data is not modeled up front for a pre-determined operational and/or analytical queries (retrievals)
True
supply chain
a network between a company, its suppliers, and its customers whose purpose is to produce and distribute the company's products and/or services
Data mining patterns should be:
accurate, meaningful, understandable, actionable
enable
actively and jointly manage all SCM-related processes (plan, make, source, deliver, return) and elements (such as business rules, contracts, regulations, risks, human resources), while measuring and monitoring SCM performance through usage of data and information systems
a supply chain consists of...
all of the steps it takes to get products and/or services to the customer
OLAP/BI tools provide _____ access to data in data warehouses and data marts
direct
Big data methods ___ ___ replace database and data warehousing approaches
do not
Business intelligence software
enables business users to see and use (analyze) large amounts of complex data: - database reporting tools (for retrieving data from databases) - OLAP/BI tools (for retrieving data from data warehouses and or/data marts) - data mining tools (for data mining)
Market basket analysis
finds groups of items that appear together in transactions
the bullwhip effect
occurs when product demand information passes throughout the supply chain in a distorted (delayed) way. can result in excessive or insufficient inventory.
supply chain logistics
often used as a synonym for "transportation" or "shipping" or "delivering." the part of the supply chain process that plans, implements, and controls the efficient, effective forward and reverse flow and storage of goods, services, and related information between the point of origin and the point of consumption
Analytical Information is based on...
operational (transactional) information
transactional structured data
operational databases - data modeled/structured and stored for anticipated pre-determined operational use
SCM levels of decision making
operational, tactical, strategic
return rate
percent of shipped items that are returned by customers. calculated by dividing the number of returned items by the number of all items sold. the goal is to decrease the rate. investigating this metric includes itemizing the reasons for returned items (unwanted, defective, damaged, late arriving) this serves to identify trends and address the underlying issues in order to reduce the rate of return.
back order rate
percentage of orders that cannot be filled at the time a customer places them. calculated by dividing the number of back orders by the number of all orders
examples of marketing support metrics and KPIs
promotional mail/e-mail response rate, increase in sales during the marketing campaign, increase in number of customers during the marketing campaign
Executive dashboards provide a _____ _____ to data in data warehouses and data marts
quick glance
Pivot (rotate)
reorganizes the values displayed in the original query result by moving values of a dimension column from one axis to another
Test promotions to subsets of different cells to determine...
response rates
source
schedule and execute necessary steps for obtaining materials, goods, and services from suppliers that are needed to produce a company's products and services
make
schedule and execute the necessary steps for creating a company's products and services
deliver
schedule and execute the necessary steps for delivering a company's products and services to their customers
return
schedule and execute the necessary steps for receiving returned products from their customers and for cancelled services by their customers
A data warehouse is created within an organization as a...
separate data store whose primary purpose is data analysis
Companies/organizations tend to store a lot of big data knowing that...
some of it may be of use later
Based on the point-and-click actions by the user of the OLAP/BI tool, the tool...
writes and executes the code in the language of the DBMS that hoses the data warehouse or data mart that is being queried
association rule
x → y
Data Warehouses keep...
years' worth of data in order to enable long-term analysis
We can't keep data in transaction-oriented databases for longer periods of time because...
it would cause a majority of database operations to deal with unnecessarily large amounts of old, no longer used data
Data lake
large data pool in which the schema and data requirements are not defined until the data is queried.
Operational databases can be used by a ____ _______ of employees, partners, and customers
large number
Processes run ____ _______ when performance deteriorates
less efficiently
customer relationship management (SCM)
leveraging existing customer data to - drive loyalty and engagement - maximize profitability CRM entails integration of - sales management - marketing - customer service
Regular transaction-oriented databases store data for a...
limited period of time, then the data loses its immediate usefulness and is archived
strategic
longer-term decisions. example: should we consider using overseas suppliers in the next 3 to 5 years
CRM objectives
maintain, enhance, and improve relationships with customers and potential customers. analyze customer behavior with the goal of: - increasing customer retention - driving sales growth and revenue - segmenting customer base - understanding seasonality - identifying potential new customers
Drill Up
makes the granularity of the data in the query result coarser
Drill Down
makes the granularity of the data in the query result finer
OLAP/BI tools allow users to retrieve needed data from data warehouses and data marts by using...
simple point-and-click mechanisms
while a data warehouse stores data in a predefined structure designed for analysis, a data lake is a...
storage repository that holds a vast amount of raw data in its native (original) format until it is needed.
each three-digit code is a...
cell
Data warehouses are...
central repositories of integrated data from one or more disparate sources
Data lake often serves as a...
corporate big data repository
OLAP/BI tools are designed for...
analysis of data in data warehouses and data marts
Analytical databases (data warehouses) collect...
analytical information
Big data methods allow organizations to...
analyze and get insight from the big data
unstructured/semi-structured, un-modeled data
big data
Standard database and data warehousing techniques ______ adequately deal with the diversity and volume of big data.
cannot
OLAP/BI tools analytical functionalities
creating and examining calculated data determining comparative or relative differences performing exception analysis, trend analysis, forecasting, and regression analysis data visualization functionalities
CRM software examples
customer service support, marketing and sales support (keeping track of birthdays and anniversaries), automation of customer service, sales, and marketing (automated email on your birthday)
Big data analysis can be performed on data in...
data lakes or directly on data in original sources
analytical structured data
data warehouses and data marts - data modeled/structured and stored for anticipated pre-determined analytical use
Data mining is an intersection of...
database management, artificial intelligence (machine learning), and statistics.
operational
day-to-day decsions. example: how much inventory to order today based on today's events
plan
develop and communicate plans for sourcing, making, delivering, and returning/cancelling a company's products/services
supply chain metrics and KPIs
examples: back order rate, turn over, return rate. calculated based on data in an organization's information systems. provided in reports, dashboards, etc.
OLAP drill up/drill down example
for each individual store, show separately for male and female shoppers the number of product units sold for each individual product in each category
OLAP slice and dice example
for stores 1 and 2, show separately for male and female shoppers the number of product units sold for each product category
supply chain operations reference (SCOR) model
framework for supply chain management. includes the following processes: - plan - source - make - deliver - return - enable
OLAP/BI tool data visualization
graphically visualizing the answers/analytics results
marketing collateral
hard dopy or electronic document informing potential customers about their products or services
Front-end/BI applications provide _____ access to data in data warehouses and data marts
indirect
Analytical Information
information collected and used in support of analytical tasks
Operational Information (Transactional Information)
information collected and used in support of day-to-day operational needs in businesses and other organizations
big data
massive volumes of diverse and rapidly growing data that are not formally modeled for efficient retrieval. big data is heterogeneous originating from various sources such as smart devices, social media, sensors. big data analytics is a part of overall data strategy, not a separate isolated initiative.
inventory turnover rate
measures how many times a year a company sells its entire inventory. calculated by dividing the cost of goods sold in a period by value of average inventory. higher rates indicates fewer days to sell the inventory. generally, the goal is to increase the rate.
tactical
medium-term decisions. example: what terms to include in the annual contract with our supplier for the upcoming year
cost of back orders
much higher rate of cancellations for back ordered products backorder notification cost loss of goodwill and future sales to current customers lost cost of initial customer acquisition
A typical data warehouse ________ retrieves selected analytically useful data from the operational data sources
periodically
Data-driven customer interaction
personalized marketing collateral: - form fields pull relevant information from database and populate marketing collateral. - standard personalization: name, location, age - advanced personalization: customized promotion based on customer behavior - used in personal selling, direct mail, online marketing - dependent on timeliness and availability of data
The data warehouse is sometimes referred to as the _____ _______ to indicate the fact that it is a destination for the data from the source systems
target system
Data mining leads to...
the discovery of novel and interesting patterns in large amounts of data and is concerned with predictive analytics (using past data to predict future events)
Big data methods allow organizations to analyze and get insight from...
the kinds of data that are not suited for regular database and data warehouse technologies
Upstream Supply Chain
the portion of the supply chain from raw materials to the production facility
downstream supply chain
the portion of the supply chain from the production facility to the end-customer
Confidence measures...
the strength of the correlation. Rules with low confidence are not meaningful, even if their support is high.
Each customer has a...
three-digit RFM code.
Assigning RFM score to customers
top 20% assigned a value of "5" next 20% assigned a value of "4" each customer gets an R, F, and M score between 1 (lowest) and 5 (highest)
Types of data stored in corporations:
transactional structured data, analytical structured data, unstructured/semi-structured, un-modeled data
Correlations are expressed in the form:
transactions that contain X are likely to contain Y