ISDS 2001 Exam 2

information inconsistency

occurs when the same data element has different values
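
A minimal sketch of how such an inconsistency might be detected, assuming two hypothetical source systems (a CRM and a billing system) that each store data elements as key/value pairs. The names `find_inconsistencies`, `crm`, and `billing` are illustrative only.

```python
def find_inconsistencies(*sources):
    """Return {element: set of values} for elements whose values differ across sources."""
    seen = {}
    for source in sources:
        for key, value in source.items():
            seen.setdefault(key, set()).add(value)
    # An element is inconsistent when more than one distinct value was observed.
    return {key: values for key, values in seen.items() if len(values) > 1}


# Hypothetical records for the same customer held in two systems.
crm = {"cust42_phone": "555-0100", "cust42_city": "Austin"}
billing = {"cust42_phone": "555-0199", "cust42_city": "Austin"}

conflicts = find_inconsistencies(crm, billing)
print(conflicts)
```

Only the phone number is reported, because the city agrees in both systems.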

Metadata

"data about data" describes the structure and meaning of the data, contributing to their effective use

(Fact)

(Fact) A report can fulfill many functions: 1.) To ensure proper departmental functioning 2.) To provide information 3.) To provide the results of an analysis 4.) To persuade others to act 5.) To create an organizational memory, Knowledge Management System (KMS)

In-Database Processing

(also called in-database analytics) refers to the integration of the algorithmic extent of data analytics into the data warehouse.

Outcome KPIs

(lagging indicators; example is revenue)

Driver KPI

(leading indicators; example is sales lead)

trade shows

Guidelines that need to be considered when developing a vendor list include all of the following except: A) financial strength B) trade shows C) ERP linkages D) market share

Subject-oriented

Data organized by topics (sales, products, customers). - Contains only info relevant to decision making - Provides comprehensive view of organization, how and why a business is operating.

Star Schema

Fact table on the inside and dimension tables on the outside
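
A small sketch of a star schema, using Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_region`) and the sample rows are assumptions made up for illustration; they do not come from the course material.

```python
import sqlite3

# Hypothetical retail star schema: one central fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_region  VALUES (1, 'East'), (2, 'West');
INSERT INTO fact_sales  VALUES (1, 1, 100.0), (1, 2, 50.0), (2, 1, 75.0);
""")


def sales_by_region(connection):
    """Join the fact table to one dimension via its foreign key and total the measure."""
    rows = connection.execute("""
        SELECT r.name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_region AS r ON r.region_id = f.region_id
        GROUP BY r.name
        ORDER BY r.name
    """).fetchall()
    return dict(rows)


totals = sales_by_region(conn)
print(totals)
```

The fact table holds the measures (here, `amount`) plus foreign keys; each dimension table supplies the descriptive labels used for grouping.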

Two-Tier Architecture

First two tiers in three-tier architecture are combined into one

Two-Tier Architecture

First two tiers in three-tier architecture are combined into one. The software and DW are on one server (hardware platform).

Executive Summary

For those who do not have the time to go through lengthy reports, the best alternative is the

Data Warehouse

A subject oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision making process

Dependent Data Mart

A subset that is created directly from a data warehouse

Performance Measurement System

A system that assists managers in tracking the implementations of business strategy by comparing actual results against strategic goals and objectives Comprises systematic comparative methods that indicate progress (or lack thereof) against goals

Enterprise Application Integration (EAI)

A technology that provides a vehicle for pushing data from source systems into a data warehouse

Operational Data Stores (ODS)

A type of database from which a business operates on an ongoing basis; often used as an interim staging area for a data warehouse. It is also used as a type of customer-information-file (CIF) database.

OLAP Cube

A(n) __________ is a multidimensional database that is optimized for data warehouse and online analytical processing (OLAP) applications.

Operational Planning can be?

Tactic-centric (operationally focused) Budget-centric (financially focused)

OLAP Reporting

ad hoc, multidimensional, broadly focused reports and queries

Publication Medium

builds the reports and hosts them for users or disseminates them to users

Data Mart

can be a replication of a subset of data in the data warehouse.

nominal data (categorical)

code values for the variable labels - male/female, hair color, etc.

Performance Dashboards

commonly used in BPM software suites & BI platforms

Hosted DW

a model where another firm develops and maintains the DW and stores it in the cloud

Independent Data Mart

The high cost of data warehouses limits their use to large companies. As an alternative, many firms use a lower-cost, scaled-down version of a data warehouse

independent data mart

The high cost of data warehouses limits their use to large companies. As an alternative, many firms use a lower-cost, scaled-down version of a data warehouse referred to as (an) ________. A) data mart B) operational data store C) dependent data mart D) independent data mart

Drill Down

The investigation of information in detail (example is finding not only total sales but also sales by region, by product, or by salesperson). Finding the detailed sources.
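
As a rough illustration of drilling down, the sketch below starts from a single total and then breaks it out by region or by product. The `SALES` records and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, amount).
SALES = [
    ("East", "Widget", 100),
    ("East", "Gadget", 75),
    ("West", "Widget", 50),
]


def total_sales(records):
    """The summary figure a report would show first."""
    return sum(amount for _, _, amount in records)


def drill_down(records, level):
    """Break the total down by one field: 0 = region, 1 = product."""
    totals = defaultdict(int)
    for record in records:
        totals[record[level]] += record[2]
    return dict(totals)


print(total_sales(SALES))
print(drill_down(SALES, 0))
print(drill_down(SALES, 1))
```

Each drill-down result sums back to the same overall total, which is what lets a user trace a summary number to its detailed sources.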

Business Report

The key to any successful _________________ is clarity, brevity, completeness, and correctness.

Report Considerations

The key to any successful business report is clarity, brevity, completeness, and correctness. Traditional reporting process is a manual process of collecting and aggregating financial and other information. Traditional reporting may be flat, slow to develop, and difficult to apply to specific situations. Traditional reporting is still used in corporations. The "last mile" is the most challenging stage of the reporting process in which consolidated figures are cited, formatted, and described to form the final text of the report.

False

True or False: A data warehouse differs from an operational database in that most data warehouses have a product orientation and are designed to handle transactions that update the database.

True

True or False: A data warehouse maintains historical data that do not necessarily provide current status, except in real-time systems.

True

True or False: A data warehouse needs to support scalability, which pertains to the amount of data in the warehouse, how quickly the warehouse is expected to grow, the number of concurrent users, and the complexity of user queries.

False

True or False: BI represents a bold new paradigm in which the company's business strategy must be aligned to its business intelligence analysis initiatives.

False

True or False: Bill Inmon advocates the data mart bus architecture whereas Ralph Kimball promotes the hub-and-spoke architecture, a data mart bus architecture with conformed dimensions.

False

True or False: Data warehouses are subsets of data marts.

True

True or False: Decision support concepts have been implemented incrementally, under different names, by many vendors who have created tools and methodologies for decision support.

True

True or False: ETL tools transport data between sources and targets, document how data elements change as they move between source and target, exchange metadata with other applications, and administer all runtime processes and operations.

Data Warehouse

a physical repository where relational data are specially organized to provide enterprise-wide cleansed data in a standardized format

Data Warehouse

a pool of data produced to support decision making -a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process -relevant to some moment in time.

Data Warehouse

a pool of data produced to support decision making and also a repository of current and historical data

Learning

a process of self-improvement where the new knowledge is obtained through a process by using what is already known.

Informal Report

a single letter or a memo

Analytics Ready

a state of preparedness for analytics projects, especially as it relates to data acquisition and preparedness.

Slice

a subset of a multidimensional array

Metadata is also how data are

organized, the meaning of the data, and how to use them effectively

Assurance

right information to the right people in the right way/format

OLTP Reporting

routine periodic narrowly focused reports

What is the purpose of a business report?

to improve managerial decisions

Apriori Algorithm

uses a bottom-up approach and widely used for data mining

TDW

user community generally consists of power users, knowledge workers, managers, other internal users

Data marts are optional:

- if desired, data marts are created as subsets of the EDW - the data marts are consolidated into the EDW

Act and Adjust

- leaders interpret, collaborate, assess & track - what do we need to do differently? - virtually all strategies depend on new projects - the final part of this loop is taking action & adjusting current actions based on analysis of problems & opportunities

Lean Six Sigma

- lean manufacturing / lean production - lean production vs. six sigma - combined to improve performance management

Basic Charts & graphs

- line chart - bar chart - pie chart - scatter plot - bubble chart

Six sigma is ordinarily utilized in

- manufacturing - service delivery - management - other business activities that rely on eliminating defects, waste and quality control problems

Types of business reports

- metric management reports - dashboard type reports - balanced scorecard type report

Concerns about real-time BI

- not all data should be updated continuously - mismatch of reports generated minutes apart - may be cost prohibitive - may also be infeasible

Monitor

- performance dashboards, reports & analytical tools - when the operational & financial plans are underway, it is imperative that the performance of the organization be monitored

Predictive Analytics

- predictive, future focused - what will happen - why will it happen

Big concerns of hosted DW

- privacy and security issues - loss of control of your data

Geographic Maps

- show geographic information & location data - typically used together with other charts & graphs and show: - postal codes - country names - latitude/ longitude etc

Infrastructure

- Columnar - Real-time DW - Data warehouse appliances - Data management practices/technologies - In-database & in-memory processing - New DBMS - Advanced analytics

Balanced Scorecard Combined set of measures for

1. financial & non financial 2. leading & lagging 3. internal & external 4. quantitative & qualitative 5. short term & long term

Three Layers of Information

1. monitoring - KPIs 2. Analysis - root cause of problems 3. management - detailed operational data that identify what actions to take to resolve problem

Security and privacy of data and information are critical and pressing issues in DW

1. safeguarding the most valuable assets 2. government regulations (HIPAA) 3. Must be explicitly planned and executed

A closed- loop process to optimize business performance

1. strategize 2. plan 3. monitor/analyze 4. act/adjust

Dashboard design- Present information in 3 different levels

1. visual dashboard level 2. static report level 3. self service cube level

Alignment of rewards and incentives

70 percent of organizations fail to link middle management incentives to their strategy

Information

= aggregation, summarization, and contextualization of data

Logistic Regression

Employs supervised learning & developed in 1940s

growing rapidly

Enabling real-time data updates for real-time analysis and real-time decision making is

Application Case 2.8: Visual Analytics Helps Energy Supplier Make Better Connections Why do you think energy supply companies are among the prime users of information visualization tools?

Energy companies are typically dealing with a very large amount of information that comes from a wide variety of sources. Additionally, these companies tend to be working with very large budgets, and the ability to identify areas of possible savings can result in large increases to revenue.

Inmon Model

Enterprise Data Warehouse approach (top-down). Create the EDW first and then optionally create data marts

executive summary.

For those executives who do not have the time to go through lengthy reports, the best alternative is the A) last page of the report. B) raw data that informed the report. C) executive summary. D) charts in the report.

Business Report

Format: text + tables + graphs/charts

There are several basic information system architectures that can be used for data warehousing. What are they?

Generally speaking, these architectures are commonly called client/server or n-tier architectures, of which two-tier and three-tier architectures are the most common, but sometimes there is simply one tier.

Why would you use a geographic map? What other types of charts can be combined with a geographic map?

Geographic maps are useful when the data set includes any kind of location data, including addresses, postal codes, state names or abbreviations, country names, latitude/longitude, or some type of custom geographic encoding. Maps can be used in conjunction with other charts and graphs. For instance, one can use maps to show distribution of customer service requests by product type (depicted in pie charts) by geographic locations.

Apriori Algorithm

Given a set of itemsets, the algorithm attempts to find subsets that are common to at least a minimum number of itemsets
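
A compact sketch of the bottom-up idea behind Apriori, written in plain Python. It is a teaching illustration under simplifying assumptions (support is a raw transaction count, and the `baskets` data are invented), not a production implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    size = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Bottom-up step: join frequent itemsets into larger candidates, then
        # prune any candidate that has an infrequent subset.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


# Hypothetical point-of-sale baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
result = apriori(baskets, min_support=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a minimum support of 3, all four single items qualify, but the only frequent pair is bread and milk.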

Dashboard-Type Reports

Graphical presentation of several performance indicators in a single page using dials/gauges

Monitoring

Graphical, abstracted data to monitor key performance metrics.

Metric Management Reports

Help manage business performance through metrics (SLAs (service-level agreements) for externals; KPIs (key performance indicators) for internals) Can be used as part of Six Sigma and/or TQM (total quality management)

Application Case 3.1: A Better Data Plan: Well-Established TELCOs Leverage Data Warehousing and Analytics to Stay on Top in a Competitive Industry How can data warehousing and data analytics help TELCOs in overcoming their challenges?

Highly targeted data analytics play an ever more critical role in helping carriers secure or improve their standing in an increasingly competitive marketplace. Argentina's Telefónica de Argentina used analytics for its "traceability project," which tracked the factors involved in customer churn, a big problem among phone service carriers. France's Bouygues Telecom used BI technologies to facilitate cost reduction through automation via its Teradata-based marketing operations management system, which automates marketing/communications collateral production. Pakistan's Mobilink uses BI to help acquire customers and grow their subscriber network, largely aided by social networking.

Hardware resources are dynamically allocated as use increases.

How does the use of cloud computing affect the scalability of a data warehouse? A) Cloud computing vendors bring as much hardware as needed to users' offices. B) Hardware resources are dynamically allocated as use increases. C) Cloud vendors are mostly based overseas where the cost of labor is low. D) Cloud computing has little effect on a data warehouse's scalability.

Data warehouse administrator should be knowledgeable about

- IT - Business - Decision-making processes - Communication skills

Analytics Study

Identifying, accessing, obtaining and processing of relevant data are the most essential tasks in any

Analytics

Identifying, accessing, obtaining, and processing of relevant data are the most essential tasks in any ____________ study.

What are the most essential tasks in any analytics study?

Identifying, accessing, obtaining, and processing of relevant information

Linear Regression vs. Logistic Regression

In Logistic Regression Output/Target variable is a binomial (binary classification) variable (as opposed to numeric variable as in linear regression).
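
The contrast can be sketched as follows: both models compute the same linear score, but logistic regression passes it through the logistic (sigmoid) function to get a probability for a binary class. The coefficients and the 0.5 threshold used here are arbitrary assumptions for illustration.

```python
import math


def linear_predict(x, intercept, slope):
    """Linear regression: the output is a numeric value."""
    return intercept + slope * x


def logistic_predict(x, intercept, slope):
    """Logistic regression: the same linear score, squashed into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def classify(x, intercept, slope, threshold=0.5):
    """Turn the probability into a binary class label (1 or 0)."""
    return 1 if logistic_predict(x, intercept, slope) >= threshold else 0


print(linear_predict(2, 1, 3))
print(logistic_predict(0, 0, 1))
print(classify(5, -2, 1), classify(0, -2, 1))
```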

Dimension Table

In a data warehouse, surrounding the central fact tables (and linked via foreign keys)

Application Case 2.5: Flood of Paper Ends at FEMA What are the main challenges that FEMA faces?

When a disaster occurs, FEMA is inundated with a huge amount of paperwork to sift through in order to administer the National Flood Insurance Program (NFIP). Sifting through this paperwork is very cumbersome and labor-intensive. Two floods occur at once: flood over the land and a flood of paperwork.

False

True or False: When faced with a turbulent business environment, organizations are best able to survive or even excel by minimizing changes until the environment stabilizes.

True

True or False: With key performance indicators, driver KPIs have a significant effect on outcome KPIs, but the reverse is not necessarily true.

True

True or False: Without middleware, different BI programs cannot easily connect to the data warehouse.

A well-told story should have no need for subsequent discussion

When you tell a story in a presentation, all of the following are true EXCEPT A.) A story should make sense and filter out of a lot of background noise B.) A well-told story should have no need for subsequent discussion C.) Stories and their lessons should be easy to remember D.) The outcome and reasons for it should be clear at the end of your story

enterprise application integration

Which approach to data warehouse integration focuses more on sharing process functionality than data across systems? A) extraction, transformation, and load B) enterprise application integration C) enterprise information integration D) enterprise function integration

bubble chart

Which kind of chart is described as an enhanced variant of a scatter plot? A) heat map B) bullet C) pie chart D) bubble chart

Oper marts

Which of the following are created when operational data need to be analyzed multidimensionally? A) Oper marts B) Customer information file C) Dependent data marts D) Independent data marts

Line Chart

Which of the following chart types would be the choice where you want to determine if the trend for the data is increasing, decreasing, fluctuating, or remaining constant? A. Pie Chart B. Scatter Plot C. Column Chart D. Line Chart

graphic artwork

Which of the following is LEAST related to data/information visualization? A) information graphics B) scientific visualization C) statistical graphics D) graphic artwork

Enterprise information integration (EII)

Which of the following is an evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases? A) Information integration B) Data management integration C) SQL data integration D) Enterprise information integration (EII)

Access modeling

Which of the following is needed to determine how data are to be retrieved from a data warehouse, and will assist in the physical definition of the warehouse by helping to define which data require indexing? A) Indexing modeling B) Retrieval modeling C) Access modeling D) Tactic modeling

ETL

Which of the following is not a data source to a data warehouse? A) ERP B) Legacy C) POS D) ETL

Improved customer service and satisfaction

Which of the following is not a direct benefit of a data warehouse? A) End users can perform extensive analysis in numerous ways. B) A consolidated view of the data provides a single version of the truth. C) Simplified data access D) Improved customer service and satisfaction

high levels of data summarization

Which of the following is not one of the failure factors in data warehousing? A) Cultural issues are ignored. B) inappropriate architecture C) unrealistic expectations D) high levels of data summarization

ROLAP

Which of the following online analytical processing (OLAP) technologies does NOT require the precomputation and storage of information? A) MOLAP B) ROLAP C) HOLAP D) SQL

Why did it happen?

Which type of question does visual analytics seek to answer? A) Why did it happen? B) What happened yesterday? C) What is happening today? D) When did it happen?

Geographic map

Which type of visualization tool can be very helpful when a data set contains location data? A.) Bar chart B.) Geographic map C.) Highlight table D.) Tree map

True

True or False: fact 5: ETL technologies pull data from many sources, cleanse them, and load them into a data warehouse. ETL is an integral process in any data-centric project

True

True or False: fact 6: real time or active data warehousing supplements and expands traditional data warehousing, moving into the realm of operational and tactical decision making by loading data in real time and providing data to users for active decision making

Knowledge

Understanding, awareness, or familiarity acquired through education or experience; anything that has been learned, perceived, discovered, inferred, or understood; the ability to use information.

Data Lakes

Unstructured data storage technology for Big Data

Regression

Used to characterize relationship between explanatory (input) and response (output) variable

data integration

Users demanding access via PDAs and through speech recognition and synthesis is becoming more commonplace, further complicating ________ issues. A) data extraction B) data load C) data integration D) OLAP

What is the difference between information visualization and visual analytics?

Visual analytics is the combination of visualization and predictive analytics. While information visualization is aimed at answering "what happened" and "what is happening" and is closely associated with business intelligence (routine reports, scorecards, and dashboards), visual analytics is aimed at answering "why is it happening," "what is more likely to happen," and is usually associated with business analytics (forecasting, segmentation, and correlation analysis).

Michigan Department of Technology, Management, and Budget

What impacts every area of government?

a methodology aimed at reducing the number of defects in a business process

What is Six Sigma? A) a letter in the Greek alphabet that statisticians use to measure process variability B) a methodology aimed at reducing the number of defects in a business process C) a methodology aimed at reducing the amount of variability in a business process D) a methodology aimed at measuring the amount of variability in a business process

Kalido

What is at the heart of the BIGS Program?

ensuring that the required information is shown clearly on a single screen

What is the fundamental challenge of dashboard design? A) ensuring that users across the organization have access to it B) ensuring that the organization has the appropriate hardware onsite to support it C) ensuring that the organization has access to the latest web browsers D) ensuring that the required information is shown clearly on a single screen

Ensuring that the required information is shown clearly on a single screen

What is the fundamental challenge of dashboard design? A.) Ensuring that users across the organization have access to it B.) Ensuring that the organization has the appropriate hardware onsite to support it C.) Ensuring that the organization has access to the latest web browsers D.) Ensuring that the required information is shown clearly on a single screen

Closed-Loop

What is the key component of a BPM system?

operational data that identify what actions to take to resolve a problem

What is the management feature of a dashboard? A) operational data that identify what actions to take to resolve a problem B) summarized dimensional data to analyze the root cause of problems C) summarized dimensional data to monitor key performance metrics D) graphical, abstracted data to monitor key performance metrics

ETL (Extraction, Transformation, Load)

What takes about 70% of the time when building a data warehouse?

pie chart

Which type of visualization tool can be very helpful when the intention is to show relative proportions of dollars per department allocated by a university administration? A) heat map B) bullet C) pie chart D) bubble chart

Operational Data Store (ODS)

________ provides a fairly recent form of customer information files (CIF). It is a type of database often used as an interim staging area for a data warehouse.

Ordinary Least Squares (OLS)

a method that relies on the square of the distance measure to identify the best-fitting line/plane/hyperplane in regression modeling.
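
For the single-input case, the least-squares line has a well-known closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations of x. The sketch below applies it to made-up data; the function names are illustrative.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """The quantity OLS minimizes: squared vertical distances from points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
intercept, slope = ols_fit(xs, ys)
print(intercept, slope, sum_squared_errors(xs, ys, intercept, slope))
```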

Six Sigma

a performance management methodology aimed at reducing the number of defects in a business process to as close to zero defects per million opportunities (DPMO) as possible
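
DPMO is conventionally computed as defects divided by total opportunities, scaled to one million. A short sketch with invented numbers (the function name and sample figures are assumptions):

```python
def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities."""
    total_opportunities = units * opportunities_per_unit
    return defects * 1_000_000 / total_opportunities


# Hypothetical process: 34 defects across 1,000 units with 10 opportunities each.
print(dpmo(34, 1000, 10))
```

For reference, a process operating at the six sigma level is commonly quoted as producing about 3.4 DPMO.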

Business Performance Management (BPM)

a real time system that alerts managers to potential opportunities, impending problems & threats, and then empowers them to react through models and collaboration

Linear Regression

a relatively simple statistical technique to model the linear relationship between a response variable and one or more explanatory/input variables.

Six sigma rests on

a simple performance improvement model known as DMAIC

Dice

a slice on more than 2 dimensions
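
To make the distinction concrete, here is a sketch over a tiny hypothetical cube keyed by (region, product, year). A slice fixes one dimension to a single value; a dice restricts several dimensions at once. The data and function names are invented for illustration.

```python
# Hypothetical cube: (region, product, year) -> sales.
CUBE = {
    ("East", "Widget", 2023): 100,
    ("East", "Gadget", 2023): 75,
    ("West", "Widget", 2023): 50,
    ("East", "Widget", 2024): 120,
    ("West", "Gadget", 2024): 60,
}


def slice_cube(cube, axis, value):
    """Slice: fix one dimension (by position) to a single value."""
    return {key: v for key, v in cube.items() if key[axis] == value}


def dice_cube(cube, allowed):
    """Dice: restrict several dimensions at once; `allowed` maps axis -> set of values."""
    return {
        key: v
        for key, v in cube.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }


year_2023 = slice_cube(CUBE, 2, 2023)
east_widgets = dice_cube(CUBE, {0: {"East"}, 1: {"Widget"}, 2: {2023, 2024}})
print(year_2023)
print(east_widgets)
```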

Correlation

a statistical measure that indicates the extent to which two or more variables change/fluctuate together.
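
One common way to quantify this for a pair of variables is the Pearson correlation coefficient, which ranges from -1 to +1. The sketch below is a plain-Python version applied to made-up data; Pearson's r is chosen here as an assumption, since the card does not name a specific measure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: one pair moves together, the other moves in opposite directions.
r_up = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_down = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])
print(r_up, r_down)
```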

Sigma

a symbol that statisticians use to measure the variability in a process - the number of defects

Performance Measurement System

a system that assists managers in tracking the implementations of business strategy by comparing actual results against strategic goals & objectives

Data Preprocessing

a tedious process of converting raw data into an analytic ready state.

Nominal Data

a type of data that contains simple codes assigned to objects as labels, which are not measurements. For example, the variable marital status can be generally categorized as (1) single, (2) married, and (3) divorced
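
A brief sketch of why the codes are labels rather than measurements: counting categories is meaningful, while averaging the codes is not. The code-to-label mapping follows the example above; the sample responses are invented.

```python
from collections import Counter

# Codes are arbitrary labels; their numeric order carries no meaning.
STATUS_LABELS = {1: "single", 2: "married", 3: "divorced"}


def label_counts(codes):
    """Frequency of each category: a summary that is valid for nominal data."""
    return Counter(STATUS_LABELS[code] for code in codes)


# Hypothetical survey responses.
responses = [1, 2, 2, 3, 2]
counts = label_counts(responses)
print(counts.most_common(1))      # the mode is meaningful
print(sum(responses) / len(responses))  # the "average code" is not
```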

data extraction transformation and loading.

a) Data are extracted and properly transformed using custom-written or commercial software b) Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse.
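
The steps above can be sketched as a tiny pipeline. Everything here is hypothetical: the source rows, the cleansing rules, and the use of an in-memory SQLite database to stand in for the warehouse.

```python
import sqlite3

# Hypothetical raw rows from source systems with inconsistent formatting.
SOURCE_ROWS = [
    {"customer": " Alice ", "amount": "100.50", "region": "east"},
    {"customer": "BOB", "amount": "75", "region": "West"},
    {"customer": "", "amount": "20", "region": "west"},  # missing customer name
]


def extract(rows):
    """Extract: pull the raw records from the source."""
    return list(rows)


def transform(rows):
    """Transform: cleanse and standardize in a staging list; drop rows with no customer."""
    staged = []
    for row in rows:
        name = row["customer"].strip().title()
        if not name:
            continue
        staged.append((name, float(row["amount"]), row["region"].strip().title()))
    return staged


def load(staged, connection):
    """Load: write the cleansed rows into the warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, region TEXT)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", staged)
    connection.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_ROWS)), warehouse)
loaded = warehouse.execute(
    "SELECT customer, amount, region FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```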

In the Expedia case study, how did the company convert drivers of departmental performance into a scorecard?

a) Deciding how to measure satisfaction. b) Setting the right performance targets. c) Putting data into context.

Examples of infrastructure

a) columnar b) real-time DW c) data management technologies and practices d) data warehouse appliances (all-in-one solutions to DW) e) in-memory storage technology (moving the data in the memory for faster processing) f) new database management systems g) advanced analytics

Sourcing

a) web, social media and Big Data b) open source software c) SaaS (software as a service) d) cloud computing

Bubble Chart

not a new visualization type; instead, they should be viewed as a technique to enrich data illustrated in scatter plots (or even geographic maps).

Multidimensionality

the ability to organize, present, and analyze data by several dimensions, such as sales by region, by product, by salesperson, and by time (four dimensions)

Pie Chart

graphical illustration of proportions

Dimensional Modeling

A retrieval-based system that supports high-volume query access

Line Graphs

- time series data

A comprehensive framework for monitoring performance should address two key issues

1. what to monitor - kpis - critical success factors - strategic goals & targets 2. how to monitor

Communication (enterprise-wide)

10% of employees are aware of the organization's strategy

Focus

85 percent of managers spend less than an hour per month discussing strategy

Dimensional Modeling

A retrieval-based data querying system that supports high-volume, high-speed access to subsets of data

fact table

A star schema contains a central ________ surrounded by several dimension tables. A) database B) fact table C) data tree D) data table

Data Taxonomy

A structured representation of the subgroups/subtypes of data

greater control of data.

All of the following are benefits of hosted data warehouses EXCEPT A) smaller upfront investment. B) better quality hardware. C) greater control of data. D) frees up in-house systems.

Differentiate among a data mart, an ODS, and an EDW.

An ODS (Operational Data Store) is the database from which a business operates on an ongoing basis. Both an EDW and a data mart are data warehouses. An EDW (Enterprise Data Warehouse) is an all-encompassing DW that covers all subject areas of interest to the entire organization. A data mart is a smaller DW designed around one problem, organizational function, topic, or other suitable focus area.

Enterprise Data Warehouse (EDW)

An ________ is a large-scale data warehouse that is utilized across the enterprise for decision support.

Business Performance Management

An advanced performance measurement and analysis approach that embraces planning and strategy

business activity management (BAM)

An approach to real-time, on-demand BI, commonly called ________, uses Web services to discover key business events.

Enterprise Information Integration

An evolving tool space that promises real-time data integration from a variety of sources such as relational databases, Web services, and multidimensional databases

Oper Mart

An operational data mart

Data

Analytics starts with

What were the challenges, the proposed solution, and the obtained results with BIGS?

As a result of merger activity, BP needed to integrate data held in disparate source systems without the delay of introducing a standardized ERP system. --NOT: DECENTRALIZE To be more agile and prepare for growth, it wanted a unified system. --The system integrates, unifies, and stores information from multiple source systems to provide consolidated views for marketing, sales, and finance. The obtained results are that BIGS helped BP identify a multitude of business opportunities to maximize margins and/or manage associated costs. --Typical benefits include improved visibility of consistent, timely data used to assist business decisions.

Structured or Unstructured

At the highest level of abstraction, data can be classified as

Application Case 2.3 Town of Cary Uses Analytics to Analyze Data from Sensors, Assess Demand, and Detect Problems What were the results?

Based on this project, the city has a much better understanding of how water is used within its borders. Additionally, it is much easier to bill for water use and plan for future demands.

Information Visualization

Basic charts & graphs are commonly used for

Subject-Oriented

Because a data warehouse gives a more comprehensive view of an organization when organized by topics, DW should be

BIGS

Business Intelligence and Global Standards

ok

Business functions generate data. Data to information (reporting) to managerial, fact-based decisions to action to optimize business functions. (Answer=ok)

Bubble Chart

By varying the size and/or color of the circles, one can add additional data dimensions, offering more enriched meaning about the data.

Integrated

DW must place data from different sources into a consistent format

Briefly describe four major components of the data warehousing process.

Data sources. Data are sourced from multiple independent operational "legacy" systems and possibly from external data providers (such as the U.S. Census). Data may also come from an OLTP or ERP system. ∙ Data extraction and transformation. Data are extracted and properly transformed using custom-written or commercial ETL software. ∙ Data loading. Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse and/or data marts. ∙ Comprehensive database. Essentially, this is the EDW to support all decision analysis by providing relevant summarized and detailed information originating from many different sources. ∙ Metadata. Metadata include software programs about data and rules for organizing data summaries that are easy to index and search, especially with Web tools. ∙ Middleware tools. Middleware tools enable access to the data warehouse. There are many front-end applications that business users can use to interact with data stored in the data repositories, including data mining, OLAP, reporting tools, and data visualization tools.

Structured Data

Data that is formatted (often into tables with rows and columns) for computers to easily understand and process

Business Report

Distribution: in-print, email, portal/intranet

Inferential Statistics

Drawing inferences about the population based on sample data

Application Case 3.6: Expedia.com's Customer Satisfaction Scorecard Who are the customers for Expedia.com? Why is customer satisfaction a very important part of their business?

Expedia, Inc., is the parent company to some of the world's leading travel companies. Therefore, it acts as an online travel agent for consumers who want to travel. Because Expedia.com is an online business, the customer's shopping experience is critical to Expedia's revenues. The online shopping experience can make or break an online business. Obviously, the travel experience is also critical to Expedia's success.

Categorical Data

Nominal Data Ordinal Data

Mention briefly some of the recently popularized concepts and technologies that will play a significant role in defining the future of data warehousing.

Sourcing (mechanisms for acquisition of data from diverse and dispersed sources): o Web, social media, and Big Data o Open source software o SaaS (software as a service) o Cloud computing ∙ Infrastructure (architectural-hardware and software-enhancements): o Columnar (a new way to store and access data in the database) o Real-time data warehousing o Data warehouse appliances (all-in-one solutions to DW) o Data management technologies and practices o In-database processing technology (putting the algorithms where the data is) o In-memory storage technology (moving the data in the memory for faster processing) o New database management systems o Advanced analytics

Descriptive and Inferential

Statistical methods are used to prepare data as input to produce both ___________ and ___________ measures.

descriptive; inferential

Statistical methods are used to prepare data as input to produce both ______ and _______ measures.

Descriptive or Inferential

Statistical methods can be classified as either

ok

Statistics in general, and descriptive statistics in particular, is a critical part of BI and business analytics. (Answer=ok)

Analysis

Summarized dimensional data to analyze the root cause of problems.

BI vs. BPM

The main difference is that BPM is always strategy driven

formal

There are only a few categories of business report: informal, ________, and short.

Data Integration

comprises the major processes of data access, data federation, and change capture.

Bubble Chart

often enhanced versions of scatter plots

Business

possess solid business knowledge and insight

market basket analysis Input

the simple point-of-sale transaction data

Line Chart

to show how donations to United Way Giving Fund have increased over the past five years.

Components of Business Reporting systems:

- OLTP - data supply - ETL - data storage - business logic - publication medium - assurance

opening Vignette: Sirius XM What were the challenges that Sirius XM faced? both technology and data-related challenges

1st issue: changing demographics of car owners. Cars were sold on the secondary market, and it was more difficult to pinpoint new potential customers. 2nd issue: a technical challenge because of an acquisition; uncertainty about the ability to use all of the available technology.

Application Case 3.6: Expedia.com's Customer Satisfaction Scorecard What were the challenges, the proposed solution, and the obtained results?

Because the customer experience is critical, all customer issues need to be tracked, monitored, and resolved as quickly as possible. But Expedia had no uniform way of measuring satisfaction, of analyzing the drivers of satisfaction, or of determining the impact of satisfaction on the company's profitability or overall business objectives. However, there was plenty of data to work with. It took a business analyst two to three weeks every month to pull and aggregate the data, leaving virtually no time for analysis. Expedia's solution was to start by identifying key drivers of customer satisfaction. From these drivers, the customer satisfaction group constructed a scorecard application. They came up with 10 to 12 objectives that linked directly to Expedia's corporate initiatives, which translated into more than 200 KPIs. This involved three steps: deciding how to measure satisfaction, setting performance targets, and putting the data into context. All of this was applied to their data warehouse (which they called DSS Factory). As a result, managers and executives have a quick and transparent view of how well actions are aligning with the strategy, with the ability to drill down into the data underlying any of the trends or patterns observed. This would have taken weeks to do previously. Other business units are also benefitting.

Enterprise Data Warehouse

EDW stands for

Extract, transform, and load technologies

ETL technologies pull data from many sources, cleanse them, and load them into a data warehouse. ETL is an integral process in any data-centric project.

ETL Process

Extraction: selecting data from one or more sources and reading the selected data Transformation: converting data from their original form to whatever form the DW needs. cleansing the data to remove as many errors as possible Load: putting the converted (transformed) data into the DW
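The three ETL steps above can be sketched in a few lines. This is a minimal illustration, not a real ETL tool: the RAW_SALES rows, the sales table, and the function names are all made up for the example, and an in-memory SQLite database stands in for the DW.

```python
import sqlite3

# Hypothetical raw rows standing in for an operational source system.
RAW_SALES = [
    {"id": "1", "region": " east ", "amount": "100.50"},
    {"id": "2", "region": "WEST", "amount": "not available"},
    {"id": "3", "region": "West", "amount": "75"},
]


def extract(source):
    """Extraction: select and read the data from the source."""
    return list(source)


def transform(rows):
    """Transformation: convert to the form the DW needs and cleanse bad rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleansing: drop rows whose amount cannot be parsed
        clean.append((int(row["id"]), row["region"].strip().title(), amount))
    return clean


def load(rows, conn):
    """Load: put the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run extract -> transform -> load and return what landed in the DW."""
    conn = sqlite3.connect(":memory:")  # assumption: in-memory stand-in for the DW
    load(transform(extract(RAW_SALES)), conn)
    return conn.execute("SELECT id, region, amount FROM sales ORDER BY id").fetchall()
```

Calling run_etl() returns only the rows that survived cleansing, with region names standardized.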

Product

In ________ oriented data warehousing, operational databases are tuned to handle transactions that update the database.

marts

The advantage of three-tier architecture for data warehousing is its separation of the functions of the data warehouse, which eliminates resource constraints and makes it possible to easily create data ________. A) banks B) cubes C) bases D) marts

True

True or False: One way an operational data store differs from a data warehouse is the recency of their data.

False

True or False: Organizations seldom devote a lot of effort to creating metadata because it is not important for the effective use of data warehouses.

drill down.

When querying a dimensional database, a user went from summarized data to its underlying details. The function that served this purpose is A) dice. B) slice. C) roll-up. D) drill down.
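Drill down and its opposite, roll-up, can be illustrated on a tiny made-up data set. The SALES rows and function names below are hypothetical; real OLAP tools do this over data cubes rather than Python lists.

```python
from collections import defaultdict

# Hypothetical (year, quarter, amount) rows; not from any real system.
SALES = [
    ("2023", "Q1", 120),
    ("2023", "Q2", 80),
    ("2024", "Q1", 150),
]


def roll_up(rows):
    """Roll-up: summarize quarterly detail into yearly totals."""
    totals = defaultdict(int)
    for year, _quarter, amount in rows:
        totals[year] += amount
    return dict(totals)


def drill_down(rows, year):
    """Drill down: go from one year's summary to its underlying quarters."""
    return {quarter: amount for y, quarter, amount in rows if y == year}
```

Here roll_up(SALES) gives the summarized view, and drill_down(SALES, "2023") exposes the detail behind the 2023 total.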

star schema

When representing data in a data warehouse, using several dimension tables that are each connected only to a fact table means you are using which warehouse structure? A) star schema B) snowflake schema C) relational schema D) dimensional schema
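A star schema can be sketched with one fact table and two dimension tables, each dimension connected only to the fact table. The table names, column names, and rows are invented for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3


def build_star_schema():
    """Create a tiny star schema: fact_sales in the middle, two dimensions around it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            store_id INTEGER REFERENCES dim_store(store_id),
            amount REAL
        );
        INSERT INTO dim_product VALUES (1, 'Oil'), (2, 'Filter');
        INSERT INTO dim_store VALUES (10, 'Baton Rouge'), (20, 'Dallas');
        INSERT INTO fact_sales VALUES (1, 10, 30.0), (2, 10, 12.0), (1, 20, 45.0);
        """
    )
    return conn


def sales_by_city(conn):
    """Join the fact table to one dimension and aggregate the measure."""
    return conn.execute(
        """
        SELECT s.city, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_store AS s ON s.store_id = f.store_id
        GROUP BY s.city
        ORDER BY s.city
        """
    ).fetchall()
```

In a snowflake schema, by contrast, a dimension such as dim_store would itself link out to further tables (for example a separate city table).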

independent data mart

Which kind of data warehouse is created separately from the enterprise data warehouse by a department and not reliant on it for updates? A) sectional data mart B) public data mart C) independent data mart D) volatile data mart

parallel processing

Which of the following BEST enables a data warehouse to handle complex queries and scale up to handle many more requests? A) use of the web by users as a front-end B) parallel processing C) Microsoft Windows D) a larger IT staff

Middleware tools

Which of the following is one of the components of data warehousing process that enables users to access the data warehouse? A) Middleware tools B) Users interface C) Query tools D) OLAP

large numbers of users, including operational staffs

Which of the following statements is more descriptive of active data warehouses in contrast with traditional data warehouses? A) strategic decisions whose impacts are hard to measure B) detailed data available for strategic use only C) large numbers of users, including operational staffs D) restrictive reporting with daily and weekly data currency

Data Warehouse

a specially constructed data repository where data are organized so that they can be easily accessed by end users for several applications.

AARP

challenges: AARP unable to generate the analytics needed to support its activities solution: single data warehouse Results: positive results, faster systems, more reliance on BI in the future

Data

in its original/raw form is not usually ready to be useful in analytics tasks.

Data

is the main ingredient for any BI, data science, and business analytics initiative.

Data Visualization

is the use of visual representations to explore, make sense of, and communicate data -- It is closely related to the fields of information graphics, scientific visualization, and statistical graphics.

more data, coming in faster and requiring immediate conversion into decisions, means that organizations are confronting the need for real-time data warehousing (RDW). how would you define real-time data warehousing?

it is the process of loading and providing data via the data warehouse as they become available.

Why would a state invest in a large and expensive IT infrastructure (such as an EDW)?

like a business, a state government can operate more efficiently and make better decisions if it has access to current data in a centralized EDW - state officials can use the EDW to improve efficiency and better serve the state's residents

market basket analysis Output

most frequent affinities among items
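As a rough sketch of how point-of-sale transactions turn into item affinities, the code below simply counts how often each pair of items is bought together. The TRANSACTIONS data is made up, and pair counting is a simplification: real market basket analysis usually relies on association rule measures such as support and confidence.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]


def pair_counts(transactions):
    """Count how many baskets contain each pair of items."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each pair one canonical order, e.g. ("bread", "milk")
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def top_affinity(transactions):
    """Return the most frequently co-purchased pair and its count."""
    return pair_counts(transactions).most_common(1)[0]
```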

Nonvolatile

once data are entered into DW, users cannot change or update the data - obsolete data are discarded, and changes are recorded as new data
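The "changes are recorded as new data" idea can be sketched as an append-only store. The class and method names are hypothetical, and the sketch leaves out the discarding of obsolete data mentioned above.

```python
from datetime import date


class NonvolatileStore:
    """Append-only store: rows are never updated in place."""

    def __init__(self):
        self._rows = []  # list of (key, value, as_of_date) tuples

    def record(self, key, value, as_of):
        """A change is recorded as a new row; existing rows stay untouched."""
        self._rows.append((key, value, as_of))

    def current(self, key):
        """Return the most recent value for a key, or None if unknown."""
        matches = [row for row in self._rows if row[0] == key]
        if not matches:
            return None
        return max(matches, key=lambda row: row[2])[1]

    def history(self, key):
        """Return every recorded (value, date) for a key, oldest first."""
        return sorted(
            ((value, as_of) for k, value, as_of in self._rows if k == key),
            key=lambda item: item[1],
        )
```

Because old rows survive, the store also keeps the history that makes a data warehouse time-variant.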

OLAP

online analytical processing - converting data into information for decision support - data cubes, drill-down/rollup, slice & dice - requesting ad hoc reports - conducting statistical and other analyses - developing multimedia-based applications

OLTP

online transaction processing - capturing and storing data from ERP, CRM, POS - the main focus is on efficiency of routine tasks

SaaS (software as a service)

"The Extended ASP Model," is a creative way of deploying information system applications where the provider licenses its applications to customers for use as a service on demand (usually over the Internet)

Ralph Kimball

"plan big, build small"

Business Performance Management (BPM)

*The processes, methodologies, metrics, and technologies used by enterprises to measure, monitor, and manage business performance. *It is an outgrowth of BI and incorporates many of its technologies, applications, and techniques. * BPM encompasses three key components 1. A set of integrated, closed-loop management and analytic processes, supported by technology ... 2. Tools for businesses to define strategic goals and then measure/manage performance against them 3. Methods and tools for monitoring key performance indicators (KPIs), linked to organizational strategy

What happens if the assumptions do NOT hold?

- Compromises the validity of the model. - What do we do then? Identify the violations of the assumptions and use techniques to mitigate them.

Readying the data for analytics is tedious, time-demanding, yet crucial task

- Data preprocessing -- Data consolidation -- Data cleaning -- Data transformation -- Data reduction

What steps can an organization take to ensure the security and confidentiality of customer data in its data warehouse?

- Establish effective corporate and security policies and procedure - Implementing logical security procedures and techniques to restrict access - Limiting physical access to the data center environment - Establishing an effective internal control review process with an emphasis on security and privacy

Benefits of Hosted Data Warehouses

- Requires minimal investment in infrastructure - Frees up capacity on in-house systems - Frees up cash flow - Makes powerful solutions affordable - Enables solutions that provide for growth - Offers better quality equipment and software - Provides faster connections

Issues to consider when deciding which architecture to use

- Which database management system (DBMS) should be used? - Will parallel processing and/or partitioning be used? - Will data migration tools be used to load the data warehouse? - What tools will be used to support data retrieval and analysis?

PERT Charts

- also called network diagrams - show precedence relationships among the project activities/tasks

Popular business performance measurement systems

- balanced scorecards - dashboards & scorecards - six sigma - DMAIC

What are the main challenges that FEMA faces?

- before FEMA implemented BureauNet, when the president declared a natural disaster, FEMA got 2 floods at once - first water covered the land - next a flood of paper covered their desks

Plan: - operational planning

- budgets, plans, forecasts, models, initiatives & targets - plan that translates an organization's strategic objectives & goals into a set of well- defined tactics and initiatives, resource requirements & expected results for some future time period (usually one year)

List some of the drivers for RDW.

- businesses cannot wait a whole day for data - data warehouses have captured snapshots of an organization's fixed states instead of incremental real-time data showing every state change and almost analogous patterns over time - keeping metadata in sync across many systems is difficult, and it is costly to develop, maintain, and secure many systems as opposed to one huge data warehouse - processing power for large nightly data warehouse loading might be very high, and the processes take too long

Key to any successful report:

- clarity - brevity - completeness - correctness

Visualization differs from traditional charts & graphs in

- complexity of data sets - use of multiple dimensions and measures

Independent Data Mart

- created separately from EDW by a department and not reliant on it for updates - operate independently so difficult if not impossible to get single version of the truth - simplest and least costly architecture alternative

Operational areas covered by driver KPIs

- customer performance - service performance - sales operations - sales plan/forecast

Histograms

- depict frequency distributions
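A frequency distribution, the thing a histogram draws, can be computed by counting values per bin. The function name and the equal-width binning rule are assumptions made for this sketch.

```python
from collections import Counter


def frequency_distribution(values, bin_width):
    """Count how many values fall into each equal-width bin.

    Each bin is labeled by its lower edge, e.g. 10 means [10, 20) for width 10.
    """
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))
```

For example, frequency_distribution([3, 7, 12, 15, 18, 25], 10) groups the values into the bins starting at 0, 10, and 20; a histogram would draw one bar per bin with height equal to the count.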

Information Visualization

- descriptive, backward focused - what happened - what is happening

Bar Charts

- display nominal data or numeric data that splits nicely into different categories so you can quickly see comparative results and trends

Scatter plots & bubble charts:

- good for illustrating relationships between two or three variables

Metric Management Reports

- help manage business performance through outcome-oriented metrics - service level agreement - key performance indicators - can be used as part of Six Sigma and/or total quality management

Specialized charts & graphs

- histogram - gantt chart - pert chart - geographic map - bullet graph - heat map/tree map - highlight table

Strategize

- identifying & stating the organization's mission, vision & objectives - developing plans (at different levels of granularity- strategic, tactical & operational) to achieve these objectives

Centralized DW Architecture Alternative

- similar to hub-and-spoke; NO DEPENDENT DATA MARTS -Single version of the truth in a timely & holistic view of the enterprise. -Provides all users with access to all data in the DW instead of limiting them to data marts. -Reduces the amount of data the technical team has to transfer or change. -Simplifies data management and administration.

Benefits of hosted DW

- smaller upfront investment - better quality hardware - frees up in house systems

Distinguishing features of KPIs

- strategy - targets - ranges - encodings - time frames - bench marks

List the benefits of an RDW.

- supplement and expand traditional data warehouse functions into the realm of tactical decision making - people are empowered with information-based decision making at their fingertips - provides information directly to the customers and suppliers - the reach and impact of information access for decision making can positively affect almost all aspects of customer service, SCM, logistics, and beyond - capable of making events happen - offers an integrated information repository to drive strategic and tactical decision support within an organization - data are assembled from OLTP systems as and when events happen and are moved at once into the data warehouse - instant updating of the data warehouse and elimination of the ODS

Active (real time) data warehousing

- supplements and expands traditional data warehousing, moving into the realm of operational and tactical decision making by loading data in real time and providing data to high numbers of users for active decision making - for active data warehouses, only comprehensive detailed data available within minutes is acceptable, in contrast with traditional data warehouses, where daily, weekly, or monthly data currency is acceptable and summaries are often appropriate - active data warehouses are built to handle large numbers of users, including operational staff, in contrast with traditional data warehouses, which have moderate user concurrency

Operational planning can be

- tactic centric (operationally focused) - budget centric plan (financially focused)

How did FEMA improve its inefficient reporting practices?

- the BureauNet software allowed the FEMA staff to select just the information they want to see and get an on screen report or download the data as a spreadsheet - the BureauNet software was the primary reason behind the increased speed & relevance of the reports FEMA employees received and could self generate

The main issues pertaining to scalability

- the amount of data in the warehouse - how quickly the warehouse is expected to grow - the number of concurrent users - the complexity of user queries

Examples of metadata in Microsoft Access database

- the data type for a variable contained in an Access data table - the number of characters (field length) used for a Last_Name field in a database - the variable Alcohol is a YES/NO variable - the variable CR_Date represents the date of a crash
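The same kind of metadata can be read programmatically. The sketch below uses SQLite (bundled with Python) as a stand-in for Access, which is an assumption of this example; the crashes table is hypothetical and borrows the field names from the list above.

```python
import sqlite3


def make_crash_table():
    """Create a hypothetical crash table whose columns mirror the card's examples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE crashes (Last_Name VARCHAR(30), Alcohol BOOLEAN, CR_Date DATE)"
    )
    return conn


def describe_table(conn, table):
    """Return {column name: declared type}, i.e. data about the data."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {name: col_type for _cid, name, col_type, *_rest in rows}
```

The result describes the structure of the table (names, types, field length) without touching any crash records, which is what makes it metadata.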

What were the challenges, the proposed solution and the obtained results of the EDW in Michigan?

- the Michigan state government struggles to balance budgets and satisfy a wide variety of needs - the Michigan state government operates many kinds of departments, so it benefited from a unified source of information - the proposed solution was an enterprise (statewide) electronic data warehouse linking employees across departments instead of developing separate BI/DW platforms for each business area or state agency - the obtained results include financial benefits, savings from operational efficiencies, and improvements in the accurate delivery and accounting of benefits

Comprehensive database and metadata

- this is the EDW that supports decision analysis by providing relevant summarized and detailed information - metadata are maintained for access by IT personnel and users - metadata include rules for organizing data summaries that are easy to index and search

A report can fulfill many functions

- to ensure proper departmental functioning - to provide information - to provide the results of an analysis - to persuade others to act - to create an organizational memory, knowledge management system (KMS)

Typical charts, graphs & other visual elements used in visualization- based applications usually involve:

- two dimensions - sometimes three dimensions - fairly small subsets of data sets

What to look for in a dashboard

- use of visual components to highlight data - transparent to the user - combined data from a variety of systems - enable drill down or drill through to underlying data - present a dynamic, real world view with timely data - require little coding

Pie Charts

- used for depicting proportions -ex: to show relative proportions of majors declared by college students in their sophomore year

Differentiate among a data mart, an ODS, and an EDW.

-An EDW (Enterprise Data Warehouse) is an all-encompassing DW that covers all subject areas of interest to the entire organization. -A Data Mart is a smaller DW designed around one problem, organizational function, topic, or other suitable focus area. -An Operational Data Store (ODS) is a type of database from which a business operates on an ongoing basis; it is often used as an interim area for a data warehouse.

How do we develop linear regression models?

-Scatter plots (visualization, for simple regression) -Ordinary least squares method (a line that minimizes the sum of the squared errors)
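For simple regression (one input variable), the ordinary least squares line has a closed-form solution: slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2) and intercept = mean_y - slope * mean_x. A minimal sketch, with a hypothetical function name and made-up data in the usage note:

```python
def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

With xs = [1, 2, 3, 4] and ys = [3, 5, 7, 9] the points lie exactly on y = 1 + 2x, so the fit recovers a slope of 2 and an intercept of 1.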

Regression

-most widely known and used -inferential statistics -explanatory ( input) and response (output) - used for hypothesis testing and forecasting

What are the major differences between a traditional data warehouse and an RDW?

1. Acceptable TDW refresh rates range from daily to monthly; RDW data must be up to the minute. 2. TDW summaries are often appropriate; RDWs must supply detailed data. 3. Small user community at upper organizational levels means a TDW supports few concurrent users; an RDW must support many, perhaps over a thousand.

BPM encompasses 3 key components

1. a set of integrated CLOSED LOOP management & analytic processes, supported by technology 2. tools for businesses to define strategic goals & then measure/manage performance against them 3. a core set of processes - methods & tools- for monitoring key performance indicators, linked to organizational strategy

Four sources for the gap between strategy and execution

1. Communication 2. Alignment of rewards and incentives 3. Focus 4. Resources

Analysis Type of Visualization

1. Compare or predict behavioral patterns from time to time 2. Explore and analyze large volumes of historical data across many dimensions and organizational hierarchies (online analytical processing (OLAP), slice and dice drill-down reports, data mining, and predictive modeling) 3. Tools are geared toward sophisticated business analysts or statisticians who need to create analytical models or dig deeply into the patterns of underlying data.

Three-tier architecture

1. Data acquisition software (back-end) 2. The data warehouse that contains the data and software 3. Client (front-end) software that allows users to access and analyze data from the warehouse

The data warehousing process consists of the following 5 steps

1. Data are imported from various internal and external sources 2. Data are cleansed and organized consistently with the organization's needs 3. a. Data are loaded into the enterprise data warehouse, or b. Data are loaded into data marts. 4. a. If desired, data marts are created as subsets of the EDW, or b. The data marts are consolidated into the EDW 5. Analyses are performed as needed

What are the major components of a data warehouse.

1. Data sources. 2. Data extraction transformation and loading. 3. Comprehensive database and metadata. 4. Data Marts. 5. Middleware tools.

What are the three steps of the ETL process?

1. Extraction 2. Transformation 3. Load

Some of the many benefits of a data warehouse are

1. consolidated view of corporate data 2. simplification of data access for analysis 3. better and timelier information for making actionable decisions

4 Perspectives of Balanced Scorecards

1. customer 2. financial 3. internal business processes 4. learning and growth

Steps in DW Process

1. data are imported from various internal and external sources 2. data are cleansed and organized consistently with the organization's needs, ETL 3. a: data are loaded into the EDW b: data are loaded into data marts 4. Data marts are optional: a: if desired, data marts are created as subsets of the EDW b: the data marts are consolidated into the EDW 5. Analyses are performed as needed, Visualization is created and viewed using software apps

What are the 3 main types of specialized data storage for big data analysis?

1. data marts 2. operational data stores (ODS) 3. enterprise data warehouse

Data warehouse administrator (DWA) should be knowledgeable about

1. IT 2. Business knowledge and insight 3. Decision-making processes 4. Communication skills

Monitoring Type of Visualization

1. Monitoring and evaluation tool that enables managers to identify potential problems and areas for further evaluation 2. Allows users to monitor critical business processes, activities, and applications based on their roles with single or multiple key performance indicators that use various graphic displays, charts, alerts, and visual cues. 3. Tools are geared toward super-users and BI specialists who want to build interactive dashboards for users.

Visualization Tools

1. Self-service tools that enable business users to quickly access and analyze data visually with minimal IT assistance and then share the results with others in the form of interactive dashboards. 2. Visual discovery tools are designed for visual data exploration, analysis, and lightweight data mining. 3. Support connectors to almost any data source, including files (Excel, comma-separated), relational databases, and applications like SAP Business Suite and Salesforce.com.

Process Steps

1. Strategize 2. Plan 3. monitor 4. act/adjust

Management Type of Visualization

1. To improve alignment and collaboration, to facilitate decision making processes, and to steer the organization in the right direction 2. Tools used to support management functions may include a strategy map and cause-effect chain. 3. Detailed analysis and assessment of a key performance indicator

What skills should a DWA possess? Why?

A data warehouse administrator should be familiar with high-performance software, hardware, and networking technologies. - He or she should also possess solid business insight - The DWA should be familiar with the decision-making processes so as to suitably design and maintain the data warehouse structure - keep the existing requirements and capabilities of the data warehouse stable while simultaneously providing flexibility for rapid improvements - The DWA must possess excellent communication skills, because data warehouses feed BI systems and DSS that help managers with their decision-making activities

flexible, ad hoc

A RDW needs _____ reporting

Three-Tier Architecture

A Web client that connects to a Web server, which is in turn connected to a BI application server

three tier architecture.

A Web client that connects to a Web server, which is in turn connected to a BI application server, is reflective of a A) one tier architecture. B) two tier architecture. C) three tier architecture. D) four tier architecture.

Strategic Objective

A broad statement or general course of action prescribing targeted directions for an organization

What is a business report? What are the main characteristics of a good business report?

A business report is a written document that contains information regarding business matters. Primary characteristics of a good business report include clarity, brevity, completeness, and correctness.

Step 3: Monitor/Analyze How Are We Doing?

A comprehensive framework for monitoring performance should address two key issues: 1. What to monitor? (critical success factors; strategic goals and targets) 2. How to monitor?

single subject area or department; combines databases

A data mart is a subset of a data warehouse, typically consisting of a ___________ whereas a data warehouse _____________ across an entire enterprise.

Regression

A data mining method for real-world prediction problems where the predicted values (that is, the output or dependent variable) are numeric (for example, predicting the temperature tomorrow as 68°F)

metadata

A data warehouse contains ________ about how data are organized and how to use them effectively. A) a data directory B) a data index C) data fields D) metadata

business rules

A data warehouse contains numerous ________ that define such things as how the data will be used, summarization rules, standardization of encoded attributes, and calculation rules.

Data Mart

A departmental data warehouse that stores only relevant data.

Enterprise Data Warehouse (EDW)

A large-scale data warehouse for the enterprise.

Application Case 2.4 Predicting NCAA Bowl Game Outcomes How successful were the prediction results? What else can they do to improve the accuracy?

A number of potential models were created, but the most accurate model indicates an accuracy of 86.48%. It is possible that accuracy could be improved in the future with the addition of new data points, both in the form of variables and completed games.

Regression

A part of inferential statistics

Six Sigma

A performance management methodology aimed at reducing the number of defects in a business process to as close to zero defects per million opportunities (DPMO) as possible; a measure of quality that strives for near perfection; originated in manufacturing and now encompasses business operations
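DPMO is usually computed as defects divided by total opportunities, scaled to one million. This formula is the commonly used definition rather than something stated on the card; the function name and the numbers in the usage note are made up.

```python
def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities.

    Total opportunities = units inspected * defect opportunities per unit.
    """
    total_opportunities = units * opportunities_per_unit
    if total_opportunities <= 0:
        raise ValueError("units and opportunities_per_unit must be positive")
    return defects * 1_000_000 / total_opportunities
```

For example, 34 defects found in 1,000 units with 10 opportunities each gives dpmo(34, 1000, 10), which is 3,400 defects per million opportunities.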

Balanced Scorecard

A performance measurement and management methodology that helps translate an organization's financial, customer, internal process, and learning and growth objectives and targets into a set of actionable initiatives

Strategic Vision

A picture or mental image of what the organization should look like in the future

Data Warehouse

A pool of data produced to support decision making

Time Series Forecasting

A prediction model that relies solely on the past occurrences/values of the variable of interest to estimate/calculate the expected future values.
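One of the simplest models of this kind is a moving average, which forecasts the next value as the mean of the most recent observations. It is chosen here only as an illustration; the card does not name a specific technique, and the function name is hypothetical.

```python
def moving_average_forecast(history, window):
    """Forecast the next value as the mean of the last `window` observations.

    Uses only past values of the variable itself, with no other inputs.
    """
    if window <= 0 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    recent = history[-window:]
    return sum(recent) / window
```

For instance, moving_average_forecast([10, 12, 14, 16], 2) averages the last two values, 14 and 16.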

Strategic Goal

A quantified objective with a designated time period

Business Performance Management

A real-time system that alerts managers to potential opportunities, impending problems, and threats, and then empowers them to react through models and collaboration

What is a report? What are reports used for?

A report is any communication artifact prepared with the specific intention of conveying information in a presentable form to whoever needs it, whenever and wherever they may need it. It is usually a document that contains information (usually driven from data and personal experiences) organized in a narrative, graphic, and/or tabular form, prepared periodically (recurring) or on an as-required (ad hoc) basis, referring to specific time periods, events, occurrences, or subjects.

Independent Data Mart

A small data warehouse designed for a strategic business unit or a department

Logistic Regression

A very popular statistics-based classification algorithm

Logistic Regression

A very popular, statistically sound, probability-based classification algorithm that employs supervised learning
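The "probability-based" part can be sketched with the logistic (sigmoid) function, which maps a linear score to a probability between 0 and 1. The intercept and coefficient below are hypothetical; in practice they are learned from labeled training data (the supervised learning step), which this sketch does not show.

```python
import math


def predict_probability(x, intercept, coefficient):
    """Probability of the positive class for a single input variable x."""
    z = intercept + coefficient * x  # linear score
    return 1.0 / (1.0 + math.exp(-z))  # logistic function squeezes z into (0, 1)


def classify(x, intercept, coefficient, threshold=0.5):
    """Turn the probability into a class label using a cutoff (0.5 by default)."""
    return 1 if predict_probability(x, intercept, coefficient) >= threshold else 0
```

With an intercept of -2 and a coefficient of 1, an input of 3 gives a score of 1 and is classified as 1, while an input of 1 gives a score of -1 and is classified as 0.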

Business Report

A written document that contains information regarding business matters.

hosted

A(n) ________ data warehouse has nearly the same, if not more, functionality as an on-site data warehouse, but it does not consume computer resources on client premises.

comprehensive database and metadata

A) This is the EDW that supports decision analysis by providing relevant summarized and detailed information. B) are maintained for access by IT personnel and users. They include rules for organizing data summaries that are easy to index and search

Important criteria in selecting an ETL tool

Ability to read from and write to an unlimited number of data sources/architectures Automatic capturing and delivery of metadata A history of conforming to open standards An easy-to-use interface for the developer and the functional user

speed of data transfer.

Active data warehousing can be used to support the highest level of decision making sophistication and power. The major feature that enables this in relation to handling the data is A) country of (data) origin. B) nature of the data. C) speed of data transfer. D) source of the data.

their primary focus is government.

All of the following are true about external reports between businesses and the government EXCEPT A) they can include tax and compliance reporting. B) they can be filed nationally or internationally. C) they are standardized for the most part to reduce the regulatory burden. D) their primary focus is government.

it is the same as in-memory storage technology.

All of the following are true about in-database processing technology EXCEPT A) it pushes the algorithms to where the data is. B) it makes the response to queries much faster than conventional databases. C) it is often used for apps like credit card fraud detection and investment risk management. D) it is the same as in-memory storage technology.

scorecards are best for real-time tracking of a marketing campaign.

All of the following statements about balanced scorecards and dashboards are true EXCEPT A) scorecards are less preferred at operational and tactical levels. B) dashboards would be the preferred choice to monitor production quality. C) scorecards are best for real-time tracking of a marketing campaign. D) scorecards are preferred for tracking the achievement of strategic goals.

for most organizations, data warehouse metadata are an unnecessary expense.

All of the following statements about metadata are true EXCEPT A) metadata gives context to reported data. B) there may be ethical issues involved in the creation of metadata. C) metadata helps to describe the meaning and structure of data. D) for most organizations, data warehouse metadata are an unnecessary expense.

Business Performance Management

Also called corporate performance management (CPM by Gartner Group), enterprise performance management (EPM by Oracle), strategic enterprise management (SEM by SAP)

ok

Although its value proposition is undeniable, to live up to its promise, the data has to comply with some basic usability and quality metrics. (answer=ok)

OPENING VIGNETTE: Targeting Tax Fraud with Business Intelligence and Data Warehousing What other problems and challenges do you think federal and state governments are having that can benefit from BI and data warehousing?

Applications are being developed relating to voter fraud, medical use, and other tax issues.

Business Intelligence and Data Warehousing

BI used to be everything related to the use of data for managerial decision support. Now it is a part of Business Analytics: BI = Descriptive Analytics.

Application Case 3.2: BP Lubricants Achieve BIGS Success What is BIGS?

BIGS is the Business Intelligence and Global Standards program, a strategic initiative for management information and business intelligence. Its purpose is to deliver globally consistent and transparent management information across functions.

BIGS Program

BP Lubricants established __________, a strategic initiative for management information and business intelligence

Simple Regression versus Multiple Regression

Based on number of input variables

Application Case 2.7: Dallas Cowboys Score Big with Tableau and Teknion What were the challenges, the proposed solution, and the obtained results?

Challenge: The Dallas Cowboys Merchandising Division needed more visibility into their data so they could run more profitably. They selected Microsoft as the baseline platform, but it was not sufficient for the task. Solution: Tableau and Teknion together provided real-time reporting and dashboard capabilities that provided the necessary visualization functionality to meet and exceed the Cowboys' requirements. Result: Now, for the first time, the Dallas Cowboys are able to monitor their complete merchandising activities from manufacture to end customer and see not only what is happening across the life cycle, but drill down even further into why it is happening.

Application Case 3.2: BP Lubricants Achieve BIGS Success What were the challenges, the proposed solution, and the obtained results with BIGS?

Challenge: BP Lubricants had been involved in merger activity, which meant it had data held in disparate source systems. To be more agile and prepare for growth, it wanted a unified system, so it needed to integrate the data held in those systems without the delay of introducing a standardized ERP (Enterprise Resource Planning) system. Solution: The proposed solution was Kalido, an adaptive enterprise data warehousing (EDW) solution. The system integrates and stores information from multiple source systems to provide consolidated views for marketing, sales, and finance. Results: BIGS helps the business identify a multitude of business opportunities to improve profits and lower costs. Typical benefits include improved consistency and transparency of business data, faster and more flexible reporting, and greater ability to respond intelligently to new business opportunities.

Application Case 2.8: Visual Analytics Helps Energy Supplier Make Better Connections What were their challenges, the proposed solution, and the obtained results?

Challenge: The company's primary challenge was the variety and diversity in the data that it needed to aggregate. Solution: The company selected a SAS Visual Analytics system to manage and report the data. Result: After implementing the system, the company has been able to deliver quality information in the form of reports much faster and at a lower cost.

Service Oriented Architecture (SOA)

Coarse-grained services that are well defined and documented

Step 1: Strategize Where Do We Want to Go?

Common tasks for the strategic planning process: 1. Conduct a current situation analysis 2. Determine the planning horizon 3. Conduct an environment scan 4. Identify critical success factors 5. Complete a gap analysis 6. Create a strategic vision 7. Develop a business strategy 8. Identify strategic objectives and goals

Strategic Planning

Common tasks for the___________ process: 1. Conduct a current situation analysis 2. Determine the planning horizon 3. Conduct an environment scan 4. Identify critical success factors (CSF) 5. Complete a gap analysis 6. Create a strategic vision 7. Develop a business strategy 8. Identify strategic objectives and goals

Explain the importance of metadata

Metadata comprise information that increases our understanding of traditional data. They provide context to the reported data and enriching information that leads to the creation of knowledge.

which operating system is running the dashboard server software.

Contextual metadata for a dashboard includes all the following EXCEPT A) whether any high-value transactions that would skew the overall trends were rejected as a part of the loading process. B) which operating system is running the dashboard server software. C) whether the dashboard is presenting "fresh" or "stale" information. D) when the data warehouse was last refreshed.

Time-variant (time series)

DW maintains historical data and in the case of active DW, current status, too. - time is the one important dimension that all DW must support (real-time, daily, weekly, monthly & yearly views)

How does a data warehouse differ from a database

DW: A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized form. DB: A collection of files that are viewed as a single storage concept. Available to a wide range of users

the visual cube level.

Dashboards can be presented at all the following levels EXCEPT A) the visual dashboard level. B) the static report level. C) the visual cube level. D) the self-service cube level.

What are the graphical widgets commonly used in dashboards? Why?

Dashboards can include many kinds of visual widgets, including charts, performance bars, sparklines, gauges, meters, stoplights, geographic maps, etc. These help to highlight, at a glance, the data and exceptions that require action. A picture tells a thousand words, and through the use of many graphical widgets, a dashboard can convey a wealth of information to decision makers in a short time.

Time-variant (time series)

Data Warehouse maintains historical data and, in the case of active DW, current status, too. Time is the one important dimension that all DW must support (real-time, daily, weekly, monthly, & yearly views).

Integration

Data ________ comprises data access, data federation, and change capture.

Three-Tier Architecture

Data acquisition software (back-end) is stored on a software server. The data warehouse, which contains the data and software, is stored on the data server. Client (front-end) software allows users to access and analyze data from the DW.

Business Report

Data acquisition → Information generation → Decision making → Process management

Data Extraction

Data are extracted using custom-written or commercial software called ETL.

Data Loading

Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse.
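
The staging, cleansing, and loading steps on this card can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a real ETL tool: the record fields, the `UNIT_LOOKUP` table, and the function names are all hypothetical.

```python
# Hypothetical rows extracted from an operational system.
raw_rows = [
    {"customer": " Alice ", "amount": "100.50", "unit": "usd"},
    {"customer": "Bob", "amount": "n/a", "unit": "USD"},  # anomaly: bad amount
    {"customer": "Carol", "amount": "75", "unit": "Dollars"},
]

# Lookup table used during transformation (assumed for illustration).
UNIT_LOOKUP = {"usd": "USD", "dollars": "USD"}


def cleanse_and_transform(rows):
    """Return (clean_rows, rejected_rows) for rows sitting in the staging area."""
    clean, rejected = [], []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append(row)  # anomaly detected; set aside for review
            continue
        clean.append({
            "customer": row["customer"].strip(),
            "amount": amount,
            "unit": UNIT_LOOKUP.get(row["unit"].lower(), row["unit"]),
        })
    return clean, rejected


def load(warehouse, rows):
    """Append cleansed rows to the warehouse (a plain list in this sketch)."""
    warehouse.extend(rows)


staging = list(raw_rows)  # data land in the staging area first
clean_rows, rejected_rows = cleanse_and_transform(staging)
warehouse = []
load(warehouse, clean_rows)
```

Here the lookup table stands in for the "rules or lookup tables" used in transformation, and rejected rows are kept separately rather than silently dropped.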

Data Sources

Data are sourced from operational systems and possibly from external data sources.

Integrated

Data from different sources are stored in a consistent format. Clarity is also obtained in units of measure, naming/labeling of attributes, etc. (The assumption is that the data warehouse is totally integrated.)

Describe the data warehousing process

Data imported from various external and internal sources are cleansed and organized in a manner consistent with the organization's needs. After the data are populated in the DW, data marts can be loaded for specific areas.

Kimball Model

Data mart approach (bottom-up)

Kimball Model

Data mart approach (bottom-up), meaning build the DW one data mart at a time. Ralph Kimball: "plan big, build small"

Data Warehousing

Data often are fragmented in distinct operational systems, so managers often make decisions with partial information at best. ________ cuts through this obstacle by accessing, integrating, and organizing key operational data in a form that is consistent, reliable, timely, and readily available where needed.

Categorical Data

Data that represent the labels of multiple classes used to divide a variable into specific groups

Issues affecting the purchase of an ETL tool

Data transformation tools are expensive. Data transformation tools may have a long learning curve.

What is data visualization? Why is it needed?

Data visualization (or "information visualization") is the use of visual representations to explore, make sense of, and communicate data. It is closely related to the fields of information graphics, scientific visualization, and statistical graphics.

ok

Data visualization techniques and tools make the users of business analytics and BI systems better information consumers. (Answer=ok)

PERT Chart

network diagrams; show precedence relationships among the project activities/tasks.

What is data visualization?

Data visualization, perhaps more appropriately called "information visualization," is the use of visual representations to explore, make sense of, and communicate data. It is closely related to the fields of information graphics, scientific visualization, and statistical graphics.

Star Schema

Data warehouse design is based upon the concept of dimensional modeling. The dimensional model is implemented with a star schema. The star schema is the means by which dimensional modeling is implemented. A star schema contains a central fact table. A fact table contains the attributes needed to perform decision analysis, descriptive attributes used for query reporting, and foreign keys to link to dimension tables. The fact tables describe what data can be analyzed; dimension tables describe how data can be analyzed.
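
As a rough illustration of the star schema described on this card, the sketch below builds one fact table and two dimension tables in an in-memory SQLite database using Python's standard `sqlite3` module. The table names, column names, and sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables describe how the data can be analyzed.
cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER)")

# The central fact table holds the measures plus foreign keys to each dimension.
cur.execute("""
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        revenue    REAL
    )
""")

cur.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2023, 1), (2, 2023, 2)])
cur.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 1, 100.0), (1, 2, 150.0), (2, 1, 80.0)],
)

# Join the fact table to a dimension and aggregate the measure.
revenue_by_product = cur.execute("""
    SELECT p.name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
conn.close()
```

Swapping `dim_product` for `dim_date` in the join would slice the same facts by month instead, which is the sense in which dimension tables describe how data can be analyzed.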

subject-oriented

Data warehouse is a(n) ________, integrated, time-variant, nonvolatile collection of data in support of management's decision making process. A) analysis-oriented B) object-oriented C) subject-oriented D) model-oriented

Integrated

Data warehouses must place data from different sources into a consistent format. This means the data warehouse must be

improved customer service

Data warehouses provide direct and indirect benefits to using organizations. Which of the following is an indirect benefit of data warehouses? A) better and more timely information B) extensive new analyses performed by users C) simplified access to data D) improved customer service

ok

Data → Information (report) → Decision → Action (Answer=ok)

DSS

Decision Support Systems for managerial decision making

Report = Information -->

Decision then action

Descriptive Statistics

Describing the data (as it is)

Descriptive Statistics FOR

Descriptive Analytics

ok

Descriptive statistics for descriptive analytics (Answer=ok)

Why do you think there are many different types of charts and graphs?

Different types of charts are appropriate for conveying different types of information

Why do you think there are many different types of charts and graphs?

Different types of charts are appropriate for conveying different types of information. Even though these charts and graphs cover a major part of what is commonly used in information visualization, they by no means cover it all. Nowadays, one can find many other specialized graphs and charts that serve a specific purpose.
a) Line graphs are good for time-series data.
b) Bar charts are useful in displaying nominal data or numerical data that splits nicely into different categories, so you can quickly see comparative results and trends within your data.
c) Pie charts should be used for depicting proportions, for example, to show relative proportions of majors declared by college students in their sophomore year.
d) Scatter plots and bubble charts are good for illustrating relationships between two or three variables (bubble charts are an enhanced variant of a scatter plot because they add a dimension via the size of the dot).
e) Histograms are like bar charts, except they depict frequency distributions.
f) Gantt charts and PERT charts are good at illustrating project timelines and task dependencies. PERT charts (also called network diagrams) show precedence relationships among the project activities/tasks.
g) Geographic maps show geographic information and location data. They are typically used together with other charts and graphs, as opposed to by themselves, and show postal codes, country names, latitude/longitude, etc.
h) Bullet graphs show progress toward a goal.
i) Heat maps and highlight tables illustrate the comparison of continuous values across two categories using color.
j) Tree maps are good for showing hierarchical information, such as when illustrating the hierarchy chart of employees in a company.

List the benefits of data warehouses.

Direct benefits include: 1. Allowing end users to perform extensive analysis in numerous ways. 2. A consolidated view of corporate data (i.e., a single version of the truth). 3. Better and more timely information. 4. Enhanced system performance. 5. Simplification of data access. ---Indirect benefits arise when end users take advantage of these direct benefits.

Outcome KPIs

Driver KPIs have a significant effect on

Ideal Model

EDW

Inmon Model

EDW approach (top down) - create EDW first and then optionally create datamarts

Inmon Model

EDW approach (top-down)

Extensible markup language (XML)

EII (enterprise information integration) tools use predefined metadata to populate views that make integrated data appear relational to end-users. ________ may be the most important aspect of EII, because it allows data to be tagged either at the time of creation or later.

Integral

ETL is an ________ process in any data centric project

rules

ETL process consists of extract, transform, and load. Transformation occurs by using ________ or lookup tables or by combining the data with other data. A) rules B) policies C) strategies D) procedures

Cleanse (Transfer); Load

ETL technologies pull data from many sources, _______ them, and ______ them into a data warehouse

Which model is best?

Either start with an EDW or end with an EDW.

Application Case 2.8: Visual Analytics Helps Energy Supplier Make Better Connections

Electrabel is the largest supplier of electricity in Belgium and the largest producer of electricity in Belgium and the Netherlands.

Enterprise

In the Michigan State Agencies case, the approach used was a(n) ________ one, instead of developing separate BI/DW platforms for each business area or state agency.

operational systems

In three-tier architecture for data warehouse, ________ contain the data and the software for data acquisition in one tier, the data warehouse is another tier, and the third tier includes the decision support and the client.

cleanse

In which stage of extraction, transformation, and load (ETL) into a data warehouse are anomalies detected and corrected? A) transformation B) extraction C) load D) cleanse

transformation

In which stage of extraction, transformation, and load (ETL) into a data warehouse are data aggregated? A) transformation B) extraction C) load D) cleanse

Balanced Scorecard-Type Reports

Include financial indicators and non-financial indicators (customer, business process, and learning & growth)

Visual Analytics

Increasing demand for __________________ coupled with fast-growing data volumes led to exponential growth in highly efficient visualization systems investment.

Extraction, Transformation, and Loading (ETL)

Instrumental in the process and use of data warehouses

Data Integration

Integration that comprises three major processes: data access, data federation, and change capture.

Data Integration

Integration that comprises three major processes: data access, data federation, and change capture. When these three processes are correctly implemented, data can be accessed and made accessible to an array of ETL, analysis tools, and data warehousing environments

Yes

Is Time Series Forecasting different than Simple Linear Regression?

Regression

It can be used for - Hypothesis testing (explanation) - Forecasting (prediction)

Enterprise Application Integration (EAI)

It provides a vehicle for pushing data from source systems into the data warehouse. It involves integrating application functionality and is focused on sharing functionality across systems, thereby enabling flexibility and reuse.

Marketing, Sales, Finance

Kalido integrates and stores information from multiple source systems to provide consolidated views for what 3 things?

balanced scorecard-type reports.

Kaplan and Norton developed a report that presents an integrated view of success in the organization called A) metric management reports. B) balanced scorecard-type reports. C) dashboard-type reports. D) visual reports.

redundant

Karacsony indicates that there is a direct correlation between the extent of ________ data and the amount of ETL processes. When data are managed correctly as an enterprise asset, ETL efforts are significantly reduced. A) enormous B) bad C) redundant D) wrong

Critical success factors (CSF)

Key factors that delineate the things that an organization must excel at to be successful

internal results.

Key performance indicators (KPIs) are metrics typically used to measure A) database responsiveness. B) qualitative feedback. C) external results. D) internal results.

ok

Know that increasing demand for visual analytics coupled with fast-growing data volumes led to exponential growth in highly efficient visualization systems investment. (Answer=ok)

Resources

Less than 40% of organizations tie their budget to a strategic plan

Application Case 3.4: EDW Helps Connect State Agencies in Michigan Why would a state invest in a large and expensive IT infrastructure (such as an EDW)?

Like a business, a state government can operate more efficiently and make better decisions if it has access to current data. State officials can use the EDW to improve efficiency and better serve the state's residents.

What are the main differences among line, bar, and pie charts? When should you use one over the others?

Line graphs are good for time-series data. Bar charts are good for depicting nominal or numerical data that can be easily categorized. Pie charts should be used for depicting proportions. You shouldn't use pie charts if the number of categories is very large.

Application Case 3.5: AARP Transforms Its BI Infrastructure and Achieves a 347% ROI in Three Years What was the approach for a potential solution?

Management opted for a solution that included the move to a single data warehouse appliance. The team selected the IBM Netezza data appliance warehouse because of its flexibility in data modeling. Additionally, the group adopted a Scrum development model that allowed for rapid development.

obtained results for Expedia.com

Managers and executives have a quick and transparent view of how well actions are aligning with the strategy, with the ability to drill down into the data underlying any of the trends or patterns observed.

Effective Performance Measurement Should

Measures should focus on key factors. Measures should be a mix of past, present, and future. Measures should balance the needs of shareholders, employees, partners, suppliers, and other stakeholders. Measures should start at the top and flow down to the bottom. Measures need to have targets that are based on research and reality rather than arbitrary.

Structure of

Metadata are data about data. Metadata describe the ________ and some meaning about data, thereby contributing to their effective or ineffective use.

Explain the importance of metadata.

Metadata, "data about data," are the means through which applications and users access the content of a data warehouse, through which its security is managed, and through which organizational management manages, in the true sense of the word, its information assets. Most database management systems would be unable to function without at least some metadata. Indeed, the use of metadata, which enable data access through names and logical relationships rather than physical locations, is fundamental to the very concept of a DBMS. Metadata are essential to any database, not just a data warehouse.

Application Case 3.4: EDW Helps Connect State Agencies in Michigan What are the size and complexity of EDW used by state agencies in Michigan?

Michigan's EDW has almost 10,000 users in five major departments, 20 agencies, and more than 100 bureaus. Just in the Department of Human Services, the EDW contributes to nearly every function, including accurate delivery of and accounting for benefits to almost 2.5 million clients receiving public assistance. It is safe to say that the EDW was large and complex.

What are the size and complexity of EDW used by state agencies in Michigan?

Michigan's EDW is massive, unified and complex

Enterprise (statewide)

Michigan's approach to BI/DW has always been...

Real-Time Data Warehousing

More data, coming in faster and requiring immediate conversion into decisions, means that organizations are confronting the need for

Star Schema

Most commonly used and simplest style of dimensional modeling

Relational

Most data warehouses are built using ________ database management systems to control and manage the data.

Performance measurement system comprises

systematic comparative methods that indicate progress against goals

What is OLAP and how does it differ from OLTP?

OLTP is concerned with the capture and storage of data and is designed to best carry out day-to-day business functions. OLAP is concerned with the analysis of that data and provides answers to business and management queries.

Satellite Radio

OPENING VIGNETTE ch 2: Sirius XM is a provider of ______

Nonvolatile

Once data are entered into the warehouse, users cannot change or update the data. Obsolete data are discarded, and changes are recorded as new data. This ________ characteristic is one of the characteristics of data warehousing.

nonvolatile

Once data are entered into the warehouse, users cannot change or update the data. Obsolete data are discarded, and changes are recorded as new data. This ________ characteristic is one of the characteristics of data warehousing. A) changeable B) nonvolatile C) nonperishable D) static
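
One way to picture the nonvolatile characteristic is an append-only store in which a change is recorded as a new row instead of an update in place. The Python sketch below is a simplified illustration; the field names and the `record` and `current` helpers are hypothetical.

```python
# The "warehouse" is a plain list that is only ever appended to.
warehouse = []


def record(customer_id, city, as_of):
    """Record a fact as a new row; existing rows are never modified."""
    warehouse.append({"customer_id": customer_id, "city": city, "as_of": as_of})


def current(customer_id):
    """Return the most recent row for a customer (ISO dates sort as text)."""
    rows = [r for r in warehouse if r["customer_id"] == customer_id]
    return max(rows, key=lambda r: r["as_of"])


record(1, "Baton Rouge", "2022-01-01")
record(1, "New Orleans", "2023-06-01")  # a change arrives as new data

latest = current(1)
history = [r["city"] for r in warehouse if r["customer_id"] == 1]
```

Because the earlier row is still present, the same store also supports the time-variant view of the data.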

What are the main components of a business reporting system?

The first is the online transaction processing system (ERP, POS, etc.) that records transactions. The second is a data supply that takes recorded events and transactions and delivers them to the reporting system. Next comes an ETL component that ensures quality and performs necessary transformations prior to loading the data into a data store. Then there is the data storage itself (such as a data warehouse).

Metadata

One of the benefits of a well-designed data warehouse is that business rules can be stored in a ________ repository and applied to the data warehouse centrally.

Transaction Processing

Online ________ is a term used for a transaction system that is primarily responsible for capturing and storing data related to day-to-day business functions such as ERP, CRM, SCM, and point of sale.

Analytical Processing

Online ________ is arguably the most commonly used data analysis technique in data warehouses.

subject-oriented and nonvolatile.

Operational or transaction databases are product oriented, handling transactions that update the database. In contrast, data warehouses are A) subject-oriented and nonvolatile. B) product-oriented and nonvolatile. C) product-oriented and volatile. D) subject-oriented and volatile.

Step 2: Plan How Do We Get There?

Operational planning. Operational plan: a plan that translates an organization's strategic objectives and goals into a set of well-defined tactics and initiatives, resource requirements, and expected results for some future time period (usually a year). Operational planning can be tactic-centric (operationally focused) or budget-centric (financially focused).

Business Operations

Organizations are being compelled to capture, understand, and harness their data to support decision making in order to improve:

Expedia

Parent company to the world's leading travel companies. Customers: Expedia is an online travel agent for customers who want to travel; customer and travel experience are critical to Expedia's success. How they improved customer satisfaction with scorecards: much better insight. Challenges: no uniform way of measuring satisfaction, but a lot of data to work with. Solution: identify key drivers of customer satisfaction, from which they built scorecard applications, in three steps: decide how to measure satisfaction, set performance targets, and put data in context. Results: managers have a QUICK and TRANSPARENT view of how well actions are aligning with strategy.

readying the data for analytics is...

tedious, time-demanding, yet crucial task

What is a performance dashboard? Why are they so popular for BI software tools?

Performance dashboards provide visual displays of important information that is consolidated and arranged on a single screen so that information can be digested at a single glance and easily drilled in and further explored. They are common components of most, if not all, performance management systems, performance measurement systems, BPM software suites, and BI platforms. Dashboards pack a lot of information into a single screen, which is one reason for their popularity.

extraction, transformation, and load (ETL)

Performing extensive ________ to move data to the data warehouse may be a sign of poorly managed data and a fundamental lack of a coherent data management strategy.

Data Warehouse

Physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format. Secondary storage of raw data; represents business logic.

Business Report

Purpose is to improve managerial decisions

How do we know if the model is good enough?

R² (R-squared), p-values, and error measures (for prediction problems)
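
For reference, R-squared and one common error measure (root mean squared error, chosen here only as an example) can be computed by hand as in the Python sketch below. The sample values are made up, and the function names are illustrative.

```python
import math


def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean squared error of the predictions."""
    n = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)


# Made-up observations and model predictions.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

r2 = r_squared(actual, predicted)
error = rmse(actual, predicted)
```

An R-squared close to 1 means the model explains most of the variation in the observed values; p-values require a statistical test and are not covered by this sketch.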

Numerical Data

Ratio data and interval data

loading and providing data

Real-time data warehousing, also known as active data warehousing (ADW), is the process of _______________ via the data warehouse as they become available.

Data Reduction

Reduce dimension, reduce volume, and balance data 1. Variables - Dimensional reduction - Variable selection 2. Cases/samples - Sampling - Balancing / stratification
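
The balancing step for cases/samples can be sketched as simple undersampling: every class is cut down to the size of the smallest class. This Python example uses only the standard library; the row layout, the `churned` label, and the function name are assumptions made for illustration, and undersampling is just one of several possible balancing techniques.

```python
import random


def balance_by_undersampling(rows, label_key, seed=0):
    """Sample each class down to the size of the smallest class."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    groups = {}
    for row in rows:
        groups.setdefault(row[label_key], []).append(row)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced


# Made-up, imbalanced data set: 8 "no" cases and 2 "yes" cases.
rows = [{"id": i, "churned": "no"} for i in range(8)]
rows += [{"id": i, "churned": "yes"} for i in range(8, 10)]

balanced = balance_by_undersampling(rows, "churned")
counts = {}
for row in balanced:
    counts[row["churned"]] = counts.get(row["churned"], 0) + 1
```

After balancing, both classes contribute the same number of cases, at the cost of discarding some majority-class rows.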

Information Visualization

Related to information graphics, scientific visualization, and statistical graphics

What is the format of a business report?

text + tables + graphs/charts

OPENING VIGNETTE: Targeting Tax Fraud with Business Intelligence and Data Warehousing Why is it important for IRS and for U.S. state governments to use data warehousing and business intelligence (BI) tools in managing state revenues?

Revenues are complex and have many sources. This variety and detail make understanding the data difficult, hampering efficiency. The use of BI tools allows for better analysis, understanding, and governance.

Application Case 3.6: Expedia.com's Customer Satisfaction Scorecard How did Expedia.com improve customer satisfaction with scorecards?

Scorecards give Expedia much better insight into customer satisfaction by providing staff, managers, and executives instant visualization and reporting of patterns and trends. Through KPIs, the scorecard also allows for comparison of the customer patterns and trends to Expedia's corporate goals and objectives.

Data Warehouse

Since ETL is the process through which data are loaded into a data warehouse, a (BLANK) could not exist without it. The ETL process also contributes to the quality of the data in a DW

OPENING VIGNETTE - Sirius XM Attracts and Engages a New Generation of Radio Consumers with Data-Driven Marketing What does SiriusXM do? In what type of market does it conduct its business?

SiriusXM is a provider of satellite radio. They primarily provide services in automobiles.

TDW; RDW

Small user community at upper organizational levels means a ____ supports few concurrent users; a ___ must support many, perhaps over a thousand

Business Report

Source is data from inside and outside the organization (via the use of ETL: extraction, transformation, and loading of the data)

Application Case 2.2 Improving Student Retention with Data-Driven Analytics What is student attrition, and why is it an important problem in higher education?

Student attrition represents students who drop out or fail to complete a course of study. This is very important in higher education as it is a leading metric of the success of individual institutions.

Characteristics of Data Warehouses

Subject-oriented; integrated; time-variant (time series); nonvolatile; summarized; not normalized; metadata; Web based, relational/multi-dimensional; client-server, real-time/right-time/active

Step 4: Act and Adjust What Do We Need to Do Differently?

Success (or mere survival) depends on new projects: creating new products, entering new markets, acquiring new customers (or businesses), or streamlining some process. Many new projects and ventures fail! What is the chance of failure? 60% of Hollywood movies fail. 70% of large IT projects fail. The common business myth is that restaurants fail at an alarmingly high rate of 90 to 95 percent in the first year. The No. 1 reason is commonly the location. Obstacles: low start-up capital, inconsistent food, and poor staffing. Celebrity chef Robert Irvine, the host of business make-over show "Restaurant: Impossible," says the top five reasons are inexperience, poor management of the staff, lack of accounting skills, spotty customer service, and sub-par food quality and execution.

Application Case 3.1: A Better Data Plan: Well-Established TELCOs Leverage Data Warehousing and Analytics to Stay on Top in a Competitive Industry Why do you think TELCOs are well suited to take full advantage of data analytics?

TELCOs control the telecommunications infrastructure, and acquire much usage data as a result. They have the technical expertise to create, deploy, and refine plans to address their business challenges. The industry and mobile technology have expanded and improved over the years, which provides a strong foundation on which to build intelligent solutions. The data analytics solutions that have been created to meet these challenges have also improved drastically over the past few years, placing TELCOs in a good position to capitalize on their technological advantages.

How is a data warehouse different from a database?

Technically a data warehouse is a database, albeit with certain characteristics to facilitate its role in decision support. Specifically, however, it is an "integrated, time-variant, nonvolatile, subject-oriented repository of detail and summary data used for decision support and business analytics within an organization." These characteristics are not necessarily true of databases in general—though each could apply individually to a given one. As a practical matter most databases are highly normalized, in part to avoid update anomalies. Data warehouses are highly denormalized for performance reasons. This is acceptable because their content is never updated, just added to. Historical data are static.

Application Case 2.5: Flood of Paper Ends at FEMA What is FEMA and what does it do?

The Federal Emergency Management Agency (FEMA) is a U.S. federal agency that coordinates disaster response when the President declares a national disaster. This case illustrates the power and the utility of automated report generation for a large (and, at the time of natural disaster, somewhat chaotic) organization such as FEMA.

new forms of computation of business logic.

The Internet emerged as a new medium for visualization and brought all the following EXCEPT A) worldwide digital distribution of visualization. B) immersive environments for consuming data. C) new forms of computation of business logic. D) new graphics displays through PC displays.

Application Case 2.6: Macfarlan Smith Improves Operational Performance Insight with Tableau Online What were the data- and reporting-related challenges Macfarlan Smith was facing?

The Scottish pharmaceutical company in the UK had a number of challenges to overcome. The first was data located in many systems, some of which were difficult to access. Another issue was that the quality of the data was in doubt, bringing concerns that results may not be valid. Finally, even with data aggregated and scrubbed, the process of using that data for analysis was very time consuming.

Inmon

The ________ Model, also known as the EDW approach (top-down) to data warehouse development

Kimball

The ________ Model, also known as the data mart approach, is a "plan big, build small" approach. A data mart is a subject-oriented or department-oriented data warehouse. It is a scaled-down version of a data warehouse that focuses on the requests of a specific department, such as marketing or sales.

independent data marts

The ________ have inconsistent data definitions and different dimensions and measures, making it difficult to analyze data across those marts. A) enterprise data marts B) operational data marts C) dependent data marts D) independent data marts

data mart

The ________ is a scaled-down version of the data warehouse that centers on the requests of a specific department, such as marketing or sales.

Multidimensionality; dimensions; measures; time

The ability to organize, present, and analyze data by several dimensions, such as sales by region, by product, by salesperson, and by time (four dimensions) ex: __________: products, salespeople, market segments, business units, geographical locations, distribution channels, country, or industry __________: money, sales volume, head count, inventory profit, actual versus forecast __________: daily, weekly, monthly, quarterly, or yearly

Application Case 3.3: Use of Teradata Analytics for SAP Solutions Accelerates Big Data Delivery What were the challenges faced by the large Dutch retailer?

The business was facing challenges, and needed to ensure that they had good visibility of the competitive landscape. The company's 15 distinct divisions, each with a different data and IT infrastructure, aggravated this goal.

Application Case 3.4: EDW Helps Connect State Agencies in Michigan What were the challenges, the proposed solution, and the obtained results of the EDW?

The challenge is not directly stated, but students should be aware that state governments struggle to balance budgets and satisfy a wide variety of needs. They also should recognize that a state operates many kinds of departments, so it likely could benefit from a unified source of information. The proposed solution was an enterprise (statewide) electronic data warehouse linking employees across departments. The obtained results include financial benefits worth $1 million per business day. Savings come from operational efficiencies, avoidance of sanctions, improved client outcomes, integrity benefits awarded to well-performing programs, and recovery of inappropriate benefits payments. The data warehouse has yielded a 15:1 cost-effectiveness ratio and improvements in the accurate delivery and accounting of benefits.

Application Case 2.6: Macfarlan Smith Improves Operational Performance Insight with Tableau Online What were the solution and the obtained results/benefits?

The company adopted a Tableau system that could be used to store, analyze, and present data for decision making. Because the system was software as a service, SaaS, it required very little in up-front investment. After using the system, the company is able to access and utilize data in ways that it was never able to in the past. They are now able to generate reports as well as answer customer questions with ease.

OPENING VIGNETTE - Sirius XM Attracts and Engages a New Generation of Radio Consumers with Data-Driven Marketing How did they implement the proposed solutions? Did they face any implementation challenges?

The company decided to bring all marketing work in-house. It was determined that it was important for them to clean the data and manage it in a central repository. To do this they partnered with Teradata. There were challenges with the implementation due to the variability in the data itself and the complexity of the task.

Application Case 2.1 Medical Device Company Ensures Product Quality While Saving Money What were the main challenges for the medical device company, Instrumentation Laboratory? Were they market or technology driven? Explain.

The company faced both market-driven and technology-driven challenges. The competitive nature of the marketplace made it important for them to always be improving on their products. Technically, the company needed a better method to analyze the large volumes of data that it collected.

True

True or False: Traditional BI systems use a large volume of static data that has been extracted, cleansed, and loaded into a data warehouse to produce reports and analyses.

OPENING VIGNETTE - Sirius XM Attracts and Engages a New Generation of Radio Consumers with Data-Driven Marketing What were the proposed solutions?

The company felt that it would be able to maintain a strategic advantage if it began working towards being a data-driven marketing company. This would allow them to more precisely target current and potential customers.

OPENING VIGNETTE - Sirius XM Attracts and Engages a New Generation of Radio Consumers with Data-Driven Marketing What were the challenges? Comment on both technology and data-related challenges.

The company had several challenges. The first was the changing demographics of car owners. As cars were sold on the secondary market, it was more difficult for them to identify new potential customers. Additionally, the company had a technical challenge because of an acquisition. There was uncertainty about their ability to use all of the technology available through the acquisition.

What were the results and benefits? Were they worth the effort/investment? OPENING VIGNETTE - Sirius XM Attracts and Engages a New Generation of Radio Consumers with Data-Driven Marketing

The company has been able to progress significantly in its goal of becoming a data-driven marketing organization. With the new systems in place, it is possible to move campaigns faster with better visibility.

Application Case 3.3: Use of Teradata Analytics for SAP Solutions Accelerates Big Data Delivery What was the proposed multivendor solution? What were the implementation challenges?

The company selected a Teradata database, but initially had significant data import issues. They attempted a homegrown SAP integration that was time consuming. After reevaluation, they pivoted to a Teradata Analytics for SAP solution that could automate data import. This solution significantly increased the pace of data loading. The variation in ERP systems also created issues, with important nuances in each system that needed to be preserved. The company selected an Informatica ELT tool that allowed for complete retention of data. Combined with the Teradata solution, this allowed the company to effectively transport, store, and maintain high-quality data. To utilize the data to create reports and analysis, the company selected a MicroStrategy solution.

Application Case 2.8: Visual Analytics Helps Energy Supplier Make Better Connections How did Electrabel use information visualization for the single version of the truth?

The company was concerned about the diversity and disparity of the information that it was using to make decisions. By centralizing information, it was possible for them to create "one version of the truth" and create dashboards and reports that showed reliable information that could be used for decision making.

Describe the cyclic process of management and comment on the role of business reports

The cyclic process of management involves these steps: data acquisition leads to information generation which leads to decision making which leads to business process management. Perhaps the most critical task in this cyclic process is the reporting (i.e., information generation)—converting data from different sources into actionable information.

Complex Data Warehouse

The data in visual analytics reside in

dimensional

The data warehouse design is based upon the concept of ________ modeling, which is a retrieval-based model that supports high-volume query access.

1.) Integrated, subject-oriented 2.) Non-volatile, relevant

The data warehouse is a collection of (1.)_______, _________ databases designed to support DSS functions, where each unit of data is (2.)________ and ________ to some moment in time.

Describe the data warehousing process.

The data warehousing process consists of the following steps: 1. Data are imported from various internal and external sources. 2. Data are cleansed and organized consistently with the organization's needs. 3. a. Data are loaded into the enterprise data warehouse, or b. Data are loaded into data marts. 4. a. If desired, data marts are created as subsets of the EDW, or b. The data marts are consolidated into the EDW. 5. Analyses are performed as needed
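
The steps above can be sketched in a few lines of Python. This is only an illustration of the 1 → 2 → 3a → 4a → 5 path: the source records, field names, and the `cleanse` helper are hypothetical, and plain lists stand in for the EDW and the data mart.

```python
# Hypothetical source records; names and fields are illustrative only.
crm_rows = [{"customer": " Ann ", "region": "east", "sales": "100"}]
web_rows = [
    {"customer": "Bob", "region": "WEST", "sales": "250"},
    {"customer": "Cy", "region": "East", "sales": "50"},
]


def cleanse(row):
    """Step 2: organize a raw record consistently (trim, recase, convert)."""
    return {
        "customer": row["customer"].strip(),
        "region": row["region"].strip().title(),
        "sales": float(row["sales"]),
    }


# Step 1: import data from several sources.
imported = crm_rows + web_rows
# Step 2: cleanse and organize the data.
cleansed = [cleanse(r) for r in imported]
# Step 3a: load into the enterprise data warehouse (a list stands in for it).
edw = list(cleansed)
# Step 4a: create a data mart as a subset of the EDW.
east_mart = [r for r in edw if r["region"] == "East"]
# Step 5: perform an analysis as needed.
east_total = sum(r["sales"] for r in east_mart)
```

The alternative path (3b, then 4b) would load the marts first and consolidate them into the EDW afterward.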

Application Case 3.5: AARP Transforms Its BI Infrastructure and Achieves a 347% ROI in Three Years What were the challenges AARP was facing?

The firm had begun to work with a BI system from Oracle, but it could not meet the demands being placed on it in terms of stability, speed, and features. AARP was unable to generate the analytics needed to support its activities.

Dashboard Design

The fundamental challenge of dashboard design is to display all the required information on a single screen, clearly and without distraction, in a manner that can be assimilated quickly.

What are the reasons for the recent emergence of visual analytics?

The growth of visual analytics correlates with the growth of analytics in general. More BI and analytics vendors are becoming aware that their customers require quick and preferably interactive visualizations, not just for their normal reporting systems, but also to illustrate predictive and prescriptive decision-making information. Many of the information visualization vendors are adding the capabilities to call themselves visual analytics solution providers. Conversely, analytics solution providers such as SAS are embedding their analytics capabilities into a high-performance data visualization environment that they call visual analytics.

Application Case 2.2 Improving Student Retention with Data-Driven Analytics List and discuss the data-related challenges within context of this case study.

The largest data-related challenge is the high volume of data available. This data is normally from multiple sources, and used primarily for multiple, different purposes. It is important to be able to aggregate all data, but at the same time identify data that truly affects student retention.

two-tier; three-tier architectures; one tier

The most common information system architectures that can be used for data warehousing are _______ and _________ but sometimes there is simply ________.

Star Schema

The most commonly used and the simplest style of dimensional modeling Contain a fact table surrounded by and connected to several dimension tables
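
A minimal sketch of a star schema, using Python's built-in `sqlite3` module with an in-memory database. The table and column names (`fact_sales`, `dim_product`, `dim_region`) are assumptions made for illustration, not names from the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables sit on the outside of the star.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);
    -- The fact table sits in the middle and points at each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        region_id  INTEGER REFERENCES dim_region(region_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_region  VALUES (1, 'East'), (2, 'West');
    INSERT INTO fact_sales  VALUES (1, 1, 100.0), (1, 2, 40.0), (2, 1, 60.0);
    """
)

# A typical query joins the fact table to one dimension and aggregates.
rows = conn.execute(
    """
    SELECT r.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_region AS r ON f.region_id = r.region_id
    GROUP BY r.name
    ORDER BY r.name
    """
).fetchall()
```

Each dimension is one join away from the fact table, which is what keeps this style simple to query.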

Regression

The most widely known and used analytics technique in statistics

Creation of Knowledge

The primary purpose of metadata should be to provide context to the reported data; that is, it provides enriching information that leads to

Real-Time Data Warehousing

The process of loading and providing data via a data warehouse as they become available

Application Case 3.5: AARP Transforms Its BI Infrastructure and Achieves a 347% ROI in Three Years What were the results obtained in the short term, and what were the future plans?

The project has produced positive results. The system is significantly faster, and allows functional employees to easily use and create reports. In the future the system will support more reliance on BI within the enterprise.

No, not ready for analytics

The real-world data is dirty, misaligned, overly complex, and inaccurate

Application Case 2.4 Predicting NCAA Bowl Game Outcomes How did the researchers formulate/design the prediction problem (i.e., what were the inputs and output, and what was the representation of a single sample—row of data)?

The researchers generated a very detailed model using a wide array of variables available to them. Researchers used data analysis techniques to identify the important variables and understand their weight.

Application Case 2.1 Medical Device Company Ensures Product Quality While Saving Money What were the results? What do you think was the real return on investment (ROI)?

The results were very positive and, based on the information presented, probably provided a positive return on investment (ROI). Specifically, the analytics tools allowed them to maintain regulatory compliance, ensure product consistency, evaluate supply chain issues, and save time overall.

data warehouse administrator (DWA)

The role responsible for successful administration and management of a data warehouse is the ________, who should be familiar with high-performance software, hardware, and networking technologies, and also possesses solid business insight.

Data Warehouse Professional

The security and privacy of data and information are critical issues for a

Application Case 2.2 Improving Student Retention with Data-Driven Analytics What was the proposed solution? And, what were the results?

The solution uses a variety of data and controls for important variables to create a system to predict freshman attrition. As a result, the system used was able to predict that attrition with good accuracy, approximately 80%.

Application Case 2.5: Flood of Paper Ends at FEMA How did FEMA improve its inefficient reporting practices?

The solution was to implement a system based on WebFOCUS software from Information Builders. As a result, FEMA staff can now browse insurance data posted on NFIP's BureauNet intranet site, select just the information they want to see, and get an onscreen report or download the data as a spreadsheet. This also allows them to create custom reports without having to go through their IT provider, CSC. The first major test of this system was Tropical Storm Allison, and BureauNet worked very well. It also has been able to scale up to handle increased demand.

OPENING VIGNETTE: Targeting Tax Fraud with Business Intelligence and Data Warehousing What was the solution they adopted?

The state implemented a data warehouse from Teradata that allowed them to examine data and identify/flag traits that were consistent with fraudulent return. The state prioritized their efforts to go after refund fraud.

OPENING VIGNETTE: Targeting Tax Fraud with Business Intelligence and Data Warehousing What were the challenges the state of Maryland was facing with regard to tax fraud?

The state was facing tax fraud from fraudulent returns as other states were, and the process of detecting and investigating potential fraud was time consuming.

OPENING VIGNETTE: Targeting Tax Fraud with Business Intelligence and Data Warehousing What were the results that they obtained? Did the investment in BI and data warehousing pay off?

The team was able to flag a smaller number of potentially fraudulent returns, but those that they did identify were significantly more likely to be false. This allowed the state to recover $7 million more, making the investment pay off.

Data Stores

The three main types of data warehouses are data marts, operational ________, and enterprise data warehouses.

Application Case 2.3 Town of Cary Uses Analytics to Analyze Data from Sensors, Assess Demand, and Detect Problems What was the proposed solution?

The town installed 60,000 wireless meters in customers' homes and monitored the data through an online portal. A SAS solution was used to manage and analyze the data. The Town of Cary uses SAS analytics to analyze data from wireless water meters, assess demand, detect problems, and engage customers.

Application Case 2.3 Town of Cary Uses Analytics to Analyze Data from Sensors, Assess Demand, and Detect Problems What were the challenges the Town of Cary, North Carolina was facing?

The town was seeking an accurate way to track the use of water across multiple locations to both identify potential leaks as well as simplify meter readings.

BI Governance

The two critical partnerships required for ________ are a partnership between functional area heads and a partnership between potential customers and providers.

Strong

There is a _____ move toward visual analytics.

Visual Analytics

There is a strong move toward

What are the common characteristics of dashboards and other information visuals?

They enable drill-down or drill-through to underlying data sources or reports

Application Case 2.7: Dallas Cowboys Score Big with Tableau and Teknion How did the Dallas Cowboys use information visualization for its business operations?

They incorporated Tableau and Teknion to assist with visualizing and understanding their merchandising activities, involving the complete supply chain from manufacture to end customer.

Kalido

This is an adaptive enterprise data warehousing solution for preparing, implementing, operating, and managing data warehouses

Comprehensive Database

This is the EDW that supports decision analysis by providing relevant summarized and detailed information.

Application Case 3.3: Use of Teradata Analytics for SAP Solutions Accelerates Big Data Delivery What were the lessons learned?

This was a complex deployment and integration, and the firm learned many important lessons: • Take the time for due diligence and learn what technologies/solutions exist to support implementations. • Develop a framework to enable repeatable processes. • Keep the system design as simple as possible to ensure technology and business adoption. • Make sure to align technical decisions with the overall vision of enabling business agility. • Develop a standard data governance approach to ensure data integrity that extends beyond the implementation process so that business and technical users understand how they can apply data for reports and analytics. • Identify latency requirements to ensure that solutions— both data warehouse and integration approach—support needs.

How is Time Series Forecasting different than Simple Linear Regression?

Time series models are focused on extrapolating on their time-varying behavior to estimate the future values.
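
A small contrast in Python, as a sketch only. Simple linear regression explains `y` with a separate explanatory variable `x`, while a time series forecast uses nothing but the series' own past values. The moving-average forecaster here is one simple time series technique chosen for illustration; the function names and data are hypothetical.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)


# Regression: an explanatory variable (e.g., ad spend) explains the response.
slope, intercept = simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])

# Time series: only the variable's own history is used to extrapolate.
next_value = moving_average_forecast([10, 12, 14, 16], window=2)
```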

What are the main challenges for TELCOs (telecommunications companies)?

Staying competitive. The major challenges faced by both entrenched and new companies in this industry include: retaining customers, decreasing costs, fine-tuning pricing models, improving customer satisfaction, acquiring new customers, and understanding the role of social media in customer loyalty.

Application Case 3.1: A Better Data Plan: Well-Established TELCOs Leverage Data Warehousing and Analytics to Stay on Top in a Competitive Industry What are the main challenges for TELCOs (telecommunications companies)?

To stay competitive, TELCOs must continuously refine everything from customer service to plan pricing and use the power of highly targeted data analytics in helping the company secure or improve their standing in the highly competitive marketplace. In greater detail, the major challenges faced by both entrenched and new companies in this industry include: retaining customers, decreasing costs, fine-tuning pricing models, improving customer satisfaction, acquiring new customers, and understanding the role of social media in customer loyalty.

Application Case 2.2 Improving Student Retention with Data-Driven Analytics What were the traditional methods to deal with the attrition problem?

Traditional solutions are quite varied. Most are centered on obvious problems, but may not take into account problems that are difficult to evaluate or quantify. Additionally, they may not account for the confluence of multiple problems.

Online Transaction Processing (OLTP)

Transaction system that is primarily responsible for capturing and storing data related to day-to-day business functions

False

True or False: A hosted data warehouse has less functionality than an onsite data warehouse, but it does not consume computer resources on client premises for computer upgrades, software licenses, in-house development, and in-house support and maintenance.

True

True or False: A real-time data warehouse together with a decision support system that leverages integrated data can provide significant financial benefits for an organization.

True

True or False: According to conventional wisdom, independent data marts are a poor architectural solution.

False

True or False: An independent data mart is a small warehouse designed for a strategic business unit (SBU) or a department whose source is an EDW.

True

True or False: Automated decision systems (ADS) are rule-based systems that provide solutions to repetitive managerial problems, usually in one functional area (e.g., finance, manufacturing).

False

True or False: Because of performance and data quality issues, most experts agree that federated approaches work well to replace data warehouses.

True

True or False: Before implementing an active data warehouse solution, DirecTV pulled data from the server every night in batch mode, a process that was taking too long and straining the system.

True

True or False: Effectiveness, extensibility, reusability, interoperability, efficiency and performance, evolution, entitlement, flexibility, segregation, user interface, versioning, versatility, and low maintenance cost are some of the key requirements for building a successful metadata-driven enterprise.

False

True or False: For best results when deploying visual analytics environments, focus only on power users and management to get the best return on your investment.

True

True or False: In a three-tier architecture, operational systems contain the data and the software for data acquisition in the first tier, the data warehouse is a second tier, and the third tier includes the DSS/BI/BA engine.

False

True or False: Moving the data into a data warehouse is usually the easiest part of its creation.

True

True or False: Once the data are entered into the data warehouse, users cannot change or update the data.

False

True or False: One of the most important aspects of a successful BI is that it must be of benefit to the enterprise as a whole rather than a single manager or task.

False

True or False: Operational data store is used for the medium- and long-term decisions associated with the enterprise data warehouse (EDW).

False

True or False: Subject oriented databases for data warehousing are organized by detailed subjects such as disk drives, computers, and networks.

False

True or False: The BPM development cycle is essentially a one-shot process where the requirement is to get it right the first time.

False

True or False: The ETL process in data warehousing usually takes up a small portion of the time in a data-centric project.

False

True or False: The advantage of BPM is that it extends the monitoring, measuring, and comparing of sales, profit, cost, profitability, and other performance indicators by introducing prediction and forecasting capabilities.

True

True or False: The benefits of dashboards are that they provide a comprehensive visual view of key performance indicators, trends, and exceptions.

False

True or False: The centralized data warehouse helps to simplify data management and administration and reduce data redundancy.

True

True or False: The data for an oper mart come from an ODS.

False

True or False: There are basic chart types and specialized chart types. A bar chart is a specialized chart type.

True

True or False: There are ethical considerations involved in the collection and ownership of the information contained in metadata, including privacy and intellectual property issues.

False

True or False: There are many metaware tools that business users can use to access data stored in the data repositories, including data mining, reporting tools, and data visualization.

True

True or False: There are several levels of metadata management maturity that describe where an organization is in terms of how and how well it uses its metadata.

True

True or False: There are three main types of data warehouses, which are data marts, operational data stores, and enterprise data warehouses.

False

True or False: Two-tier data warehouse/BI infrastructures offer organizations more flexibility but cost more than three-tier ones.

Application Case 2.4 Predicting NCAA Bowl Game Outcomes What are the foreseeable challenges in predicting sporting event outcomes (e.g., college bowl games)?

While a large amount of data exists that can be used to possibly help predict the outcome of sporting events, understanding how all of that information works together and how important individual factors will be is quite challenging. Additionally there are individual actions that can occur on the day that may affect the outcome as well.

BP Lubricants

Who established the BIGS Program?

because measurement alone has little use without action

Why is a performance management system superior to a performance measurement system? A) because performance measurement systems are only in their infancy B) because measurement automatically leads to problem solution C) because performance management systems cost more D) because measurement alone has little use without action

because dissatisfied customers will eventually hurt the bottom line

Why is the customer perspective important in the balanced scorecard methodology? A) because dissatisfied customers will eventually hurt the bottom line B) because customers should always be included in any design methodology C) because customers understand best how the firm's internal processes should work D) because companies need customer input into the design of the balanced scorecard

real-time

With ________ data flows, managers can view the current state of their businesses and quickly identify problems.

Dashboards; single screen; single glance; easily drilled into and further explored

_ provide visual displays of important information that is consolidated and arranged on a __ so that information can be digested at a __ and ______

Obsolete (nonvolatile characteristic)

____ data are discarded, and changes are recorded as new data.
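
One way to picture the "changes are recorded as new data" half of this card is an append-only store: a change never overwrites an existing row, it adds a new dated row. The sketch below assumes hypothetical `record` and `current` helpers and a plain list in place of a warehouse table.

```python
warehouse = []  # append-only: rows are added, never updated in place


def record(customer, city, as_of):
    """Record a change as a new row stamped with its effective date."""
    warehouse.append({"customer": customer, "city": city, "as_of": as_of})


def current(customer):
    """Return the most recent row for a customer (ISO dates sort as text)."""
    rows = [r for r in warehouse if r["customer"] == customer]
    return max(rows, key=lambda r: r["as_of"])


record("Ann", "Austin", "2023-01-01")
record("Ann", "Boston", "2024-06-01")  # a move is a new row, not an update

latest = current("Ann")
```

The earlier row stays in place, so the history remains available for time-variant analysis.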

Many

____ new projects & ventures fail!

90

____ percent of organizations fail to execute their strategies

TDW; RDW

____ summaries are often appropriate, but ____ must supply detailed data.

Data Integration

________ comprises three major processes that, when correctly implemented, permit data to be accessed and made accessible to an array of ETL and analysis tools and the data warehousing environment.

Data cleansing

________ is a critical aspect of data warehousing that includes reconciling conflicting data definitions and formats organization-wide. A) Data modification B) Fact refinement C) Data purification D) Data cleansing
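
As a small illustration of reconciling conflicting formats, the sketch below normalizes dates that arrive in different layouts from different systems. The list of formats and the `normalize_date` helper are assumptions for the example; real cleansing rules depend on the organization's sources.

```python
from datetime import datetime

# Hypothetical formats used by different source systems.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def normalize_date(text):
    """Return the date as ISO 8601 text, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None


raw_dates = ["2024-03-05", "03/05/2024", "20240305", "next Tuesday"]
clean_dates = [normalize_date(d) for d in raw_dates]
```

Values that match no known format come back as `None` so they can be reviewed rather than silently guessed.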

Independent Data Mart

________ is a small data warehouse designed for a strategic business unit (SBU) or a department.

Dependent Data Mart

________ is a subset that is created directly from the data warehouse. It has the advantages of using a consistent data model and providing quality data.

Metric

________ management reports are used to manage business performance through outcome-oriented metrics in many organizations.

Statistical

_____________ methods are used to prepare data as input to produce both descriptive and inferential measures.

Descriptive Statistics

a branch of statistical modeling that aims to describe a given sample of data

Inferential Statistics

a branch of statistical modeling that aims to draw inferences or conclusions about the characteristics of the population based on a given sample of data
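
The two branches can be contrasted with Python's standard `statistics` module. The sample values are made up, and the interval uses a normal approximation for simplicity (an assumption; a t-distribution would be more appropriate for a sample this small).

```python
import math
import statistics

sample = [52, 48, 50, 47, 53]  # hypothetical sample of measurements

# Descriptive statistics: summarize the sample itself.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Inferential statistics: draw a conclusion about the population mean.
# Approximate 95% confidence interval using the normal distribution.
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * sample_sd / math.sqrt(len(sample))
interval = (sample_mean - margin, sample_mean + margin)
```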

The DMAIC performance model

a closed loop business improvement model that encompasses the steps of defining, measuring, analyzing, improving & controlling a process

Data Warehouse

a collection of integrated subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time

Statistics

a collection of mathematical techniques to characterize and interpret data

Data Mart

a departmental small-scale "DW" that stores only limited/relevant data

Scatter Plot

a graph in which the values of two variables are plotted along two axes to illustrate the relationship between them.

Data Visualization

a graphical, animation, or video presentation of data and the results of data analysis

critical to analytics

data quality and data integrity

Direct Benefits of Data Warehouse

allows end users to perform extensive analysis, allows a consolidated view of corporate data, better and more timely info, enhanced system performance, simplification of data access

Roll Up

an OLAP operation: computing all of the data relationships for one or more dimensions

Dice

an OLAP operation: a slice on more than two dimensions of a data cube

Slice

an OLAP operation: a subset of a multidimensional array (usually a two-dimensional representation) corresponding to a single value set for one (or more) of the dimensions not in the subset.

Pivot

an OLAP operation: used to change the dimensional orientation of a report or an ad hoc query-page display

Drill Down/Up

an OLAP operation: navigating among levels of data ranging from the most summarized (up) to the most detailed (down)
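
The OLAP operations on the cards above can be sketched over a tiny in-memory "cube". This is a simplified illustration, not how an OLAP engine works internally; the data and the function names (`slice_cube`, `dice_cube`, `roll_up`, `pivot`) are hypothetical.

```python
from collections import defaultdict

# Each cell has three dimensions (region, product, quarter) and one measure.
cube = [
    {"region": "East", "product": "Widget", "quarter": "Q1", "sales": 100},
    {"region": "East", "product": "Gadget", "quarter": "Q1", "sales": 60},
    {"region": "West", "product": "Widget", "quarter": "Q1", "sales": 40},
    {"region": "East", "product": "Widget", "quarter": "Q2", "sales": 120},
]


def slice_cube(cells, dim, value):
    """Slice: fix a single value on one dimension."""
    return [c for c in cells if c[dim] == value]


def dice_cube(cells, **allowed):
    """Dice: restrict several dimensions at once to sets of allowed values."""
    return [c for c in cells if all(c[d] in vals for d, vals in allowed.items())]


def roll_up(cells, keep):
    """Roll up: total the measure over every dimension not listed in `keep`."""
    totals = defaultdict(int)
    for c in cells:
        totals[tuple(c[d] for d in keep)] += c["sales"]
    return dict(totals)


def pivot(cells, row_dim, col_dim):
    """Pivot: lay the totals out with the chosen row and column dimensions."""
    table = defaultdict(lambda: defaultdict(int))
    for c in cells:
        table[c[row_dim]][c[col_dim]] += c["sales"]
    return {r: dict(cols) for r, cols in table.items()}


q1 = slice_cube(cube, "quarter", "Q1")
east_widgets = dice_cube(
    cube, region={"East"}, product={"Widget"}, quarter={"Q1", "Q2"}
)
by_region = roll_up(cube, ["region"])                        # summarized (up)
by_region_quarter = roll_up(cube, ["region", "quarter"])     # detailed (down)
quarter_by_region = pivot(cube, "quarter", "region")
```

Drilling down and up here is just a matter of calling `roll_up` with more or fewer dimensions kept.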

Visual Analytics

an extension of data/information visualization that includes not only descriptive but also predictive analytics

Snowflake Schema

an extension of star schema where diagram resembles a snowflake in shape

Online Analytical Processing (OLAP)

an information system that enables the user, while at a PC, to query the system, conduct an analysis, and so on. The result is generated in seconds.

Data is the main ingredient for__?

any BI, data science, and business analytics initiative

Data Mart

are a less-expensive solution that can be replaced by or can supplement a data warehouse.

Ratio Data

are commonly found in physical science, such as mass, length, time, but in business, the values for the variable "salary" are ratio data.

Tree Maps

are good for showing hierarchical information, such as when illustrating the hierarchy chart of employees in a company.

Specialized Charts

are often derived from the basic charts as exceptional cases

Data marts in data warehouses

are optional: a. If desired, data marts are created as subsets of the EDW, or b. The data marts are consolidated into the EDW

Linear Regression and Logistic Regression

are the two major regression types in statistics

Geographic Map

are typically used together with other charts and graphs, as opposed to by themselves, and show postal codes, country names, latitude/longitude, etc.

RDW

are used by operational staff, call centers, perhaps external users.

Data Mart

can be either dependent or independent

Bubble Chart

can be used to show a competitive view of college-level class attendance by major and by time of the day, or it can be used to show profit margin by product type and by geographic region.

ratio data (numerical)

commonly found in physical science, such as mass, length, time, but in business, the values for the variable "salary" are ratio data. scale data ex: person's weight

Data Integration

comprises three major processes: DATA ACCESS, DATA FEDERATION, & CHANGE CAPTURE - it is an umbrella term that covers the three major processes which combine to move data from multiple sources into a DW

Roll Up

computing all of the data relationships for one or more dimensions

Data Marts

contain data on one topic (e.g., marketing). It can be a replication of a subset of data in the data warehouse. They are a less-expensive solution that can be replaced by or can supplement a data warehouse. They can be independent of or dependent on a data warehouse.

Ratio Data

continuous data where both differences and ratios are interpretable. The distinguishing feature is the possession of a nonarbitrary zero value.

BPM is also called

corporate performance management, enterprise performance management, strategic enterprise management

Data Warehousing

creates a well-planned information management solution to enable analytical and informational processing despite platform, application, organizational, and other barriers.

Time:

daily, weekly, monthly, quarterly, yearly

Data storage=

data + metadata

data integration comprises 3 major processes

data access, data federation, and change capture

Where are data sourced from?

data are sourced from operational systems and possibly from external data sources

Sirius XM

data driven marketing

What is the source of a business report?

data from inside and outside the organization (via the use of ETL)

Kimball Model

data mart approach (bottom up) - build the DW one data mart at a time

Ordinal Data

data that contains codes assigned to objects or events as labels that also represent the rank order among them. For example, the variable credit score can be generally categorized as (1) low, (2) medium, and (3) high.

OLAP Data Source

data warehouse or data mart (a non normalized data repository primarily focused on accuracy and completeness)

ordinal data (categorical)

data field for the variable credit score can be generally categorized as 1) low 2) medium 3) high (good, great, bad), etc.

Success Depends on Mere Survival

depends on new projects - creating new products - entering new markets - acquiring new customers - streamlining some processes

Metadata

describe the structure and meaning of the data, contributing to their effective use.

metadata

describes the contents of a DW including its structure, meaning, syntax, and manner of use. -is also how data are organized, the meaning of the data, and how to use them effectively. -are essential to any database, not just a data warehouse

Visualization

differs from traditional charts and graphs in complexity of data sets and use of multiple dimensions and measures.

Multidimensional Presentation

dimensions: products, salespeople, market segments, business units, geographical locations, distribution channels, country, or industry; measures: money, sales volume, head count, inventory profit, actual versus forecast; time: daily, weekly, monthly, quarterly, or yearly

Management

displaying operational data that identify what actions to take to resolve a problem.

Load

dumping data into DW

Middleware Tools

enable access to the data warehouse from a variety of front-end applications. These are applications (Apps) for data analyses and visualization of the data for decision making.

Parallel Processing

enables a DW to handle complex queries and scale up to handle many more requests.

Will parallel processing and/or partitioning be used?

enables a data warehouse to handle complex queries and scale up to handle many more requests

Business Performance Management

encompasses three key components 1. A set of integrated, closed-loop management and analytic processes, supported by technology. 2. Tools for businesses to define strategic goals and then measure/manage performance against them 3. Methods and tools for monitoring key performance indicators (KPIs), linked to organizational strategy

Bubble Chart

enhanced variant of a scatter plot because it adds a dimension via the size of the dot

Fundamental Challenge of Dashboard Design

ensuring that the required information is shown clearly on a single screen

Regression

especially linear regression, is perhaps the most widely known and used analytics technique in statistics.
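
As a minimal sketch of what simple linear regression computes, the snippet below fits a line by ordinary least squares. The function name `fit_line` and the data points are made up for illustration; real analyses would normally use a statistics package and check the model assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predicted y for a new x of 5
```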

OLTP execution speed

fast (recording of business transactions and routine reports)

Bill Inmon

father of DW

Tree Maps

good for showing hierarchical information - ex: illustrating the hierarchy chart of employees in a company

Data

has become one of the most valuable assets of today's organizations

IT

have the knowledge of high performance software, hardware and networking technologies

How can data warehousing and data analytics help TELCOs in overcoming their challenges?

help carriers secure or improve their standing in an increasingly competitive marketplace.

Geographic Map

helpful when a data set contains location data; can be used with GIS, geographic info systems.

Hierarchy Chart

helpful when illustrating the hierarchy chart of employees in a company

Cube

in OLAP is a multidimensional data structure (actual or virtual) that allows fast analysis of data.

What is the distribution of a business report?

in print, email, portal/intranet

Visual Analytics

information visualization + predictive analytics

Data Integration

integration that comprises three major processes - data access - data federation - change capture

BIGS

is a Business Intelligence program --its purpose is to deliver globally consistent and transparent management information across functions

Enterprise information integration (EII)

is a mechanism for pulling data from source systems to satisfy a request for information. It is an evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases.

Enterprise application integration (EAI)

is a mechanism that integrates application functionality and shares functionality (rather than data) across systems, thereby enabling flexibility and reuse.

Logistic Regression

is a probability-based classification algorithm.
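
To illustrate "probability-based classification", here is a small sketch of how a logistic regression model that has already been fitted turns inputs into a probability and then a class label. The weights, bias, and 0.5 threshold are hypothetical values chosen for the example, not fitted from data.

```python
import math

def predict_proba(x, weights, bias):
    """Probability of the positive class: the logistic (sigmoid) of a linear score."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def classify(x, weights, bias, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold."""
    return 1 if predict_proba(x, weights, bias) >= threshold else 0

# Hypothetical one-feature model.
weights, bias = [1.0], -1.0
p_low = predict_proba([0.0], weights, bias)   # score -1, probability below 0.5
p_high = predict_proba([3.0], weights, bias)  # score +2, probability above 0.5
```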

Operational Data Store

is a type of customer-information-file database that is often used as a staging area for a data warehouse.

Report

is any communication artifact prepared with the specific intention of conveying information in a presentable form.

Online Analytical Processing OLAP

is concerned with the analysis of that data and provides answers to business and management queries. ---Converting data into information for decision support -Data cubes, drill-down / rollup, slice & dice, -Requesting ad hoc reports ---Conducting statistical and other analyses ---Developing multimedia-based applications

Online Transaction Processing OLTP

is concerned with the capture and storage of data and is designed to best carry out day-to-day business functions -The main focus is on efficiency of routine tasks

What are the key similarities and differences between a two-tiered architecture and a three-tiered architecture?

is invisible to the user: in a two-tiered architecture, the application and data warehouse reside on the same machine (a server or other dedicated hardware platform); in a three-tiered architecture, they are on separate machines (servers).

Centralized Data Warehouse

is similar to hub-and-spoke except there are no dependent data marts; instead, there is a gigantic EDW that serves the needs of all organizational units.

Visual Analytics

is the combination of information visualization + predictive analytics

Visual Analytics

is the combination of information visualization and predictive analytics.

Operational Data Store (ODS)

is the database from which a business operates on an on-going basis.

Data

is the lowest level of abstraction (from which information and knowledge are derived)

Data Visualization

is the use of visual representations to explore, make sense of, and communicate data.

Art

it develops and improves with experience

Data Warehouse

large-scale repository of data, organizes all the data related to an organization, may include: audio, video, photographs; a physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format; a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time

highly restrictive assumptions

linear regression models suffer from

Good Scalability

means that queries and other data access functions will grow linearly with the size of the warehouse

Key Performance Indicator (KPI)

measure of performance against a strategic objective and goal

Descriptive Statistics

methods can be used to measure central tendency, dispersion, or the shape of a given data set.
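
The three kinds of measures named above can be sketched with Python's standard `statistics` module. The data values are invented, and `skewness` is a hypothetical helper that computes the population skewness (third central moment divided by the cubed standard deviation) as one possible shape measure.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)

# Dispersion (population standard deviation)
spread = statistics.pstdev(data)

def skewness(values):
    """Shape: population skewness; positive means a longer right tail."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    third_moment = sum((v - m) ** 3 for v in values) / len(values)
    return third_moment / sd ** 3

skew = skewness(data)
```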

List and describe the three major categories of business reports.

metric management reports: involve outcome-oriented metrics based on service level agreements and/or key performance indicators; dashboard-type reports: present a range of performance indicators on one page, with both static/predefined elements and customizable widgets and views; balanced scorecard-type reports: present an integrated view of a company's health and include financial, customer, business process, and learning/growth perspectives.

Linear Regression

models suffer from highly restrictive assumptions.

Measures:

money, sales volume, head count, inventory profit, actual versus forecast

Star Schema

most commonly used and the simplest style of dimensional modeling -contains fact table surrounded by several dimension tables -fact table contains descriptive attributes (numerical values) -dimension tables contain classification and aggregation information about the values in fact table
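
A star schema can be sketched with Python's built-in `sqlite3` module: one fact table in the middle, joined to the dimension tables around it. The table names, column names, and rows below are hypothetical and only meant to show the shape of the design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: classification and aggregation information.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);

-- Fact table: numeric values plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units INTEGER,
    revenue REAL
);

INSERT INTO dim_product VALUES (1, 'Laptop', 'Hardware'), (2, 'Antivirus', 'Software');
INSERT INTO dim_date VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
INSERT INTO fact_sales VALUES (1, 1, 3, 3000.0), (1, 2, 2, 2000.0), (2, 1, 10, 500.0);
""")

# A typical star-schema query: join the fact table to its dimensions, then aggregate.
rows = conn.execute("""
SELECT p.category, d.quarter, SUM(f.revenue)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_date AS d ON d.date_id = f.date_id
GROUP BY p.category, d.quarter
ORDER BY p.category, d.quarter
""").fetchall()
```

A snowflake schema would go one step further and split a dimension such as `dim_product` into additional normalized tables (for example, a separate category table).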

Drill Down/Up

navigating among levels of data ranging from the most summarized (up) to the most detailed (down)

Operational Plan

plan that translates an organization's strategic objectives and goals into a set of well-defined tactics and initiatives, resource requirements, and expected results for some future time period (usually a year).

Metadata

primary purpose should be to provide context to the reported data; that is, it provides enriching information that leads to the creation of knowledge.

Dimensions:

products, salespeople, market segments, business units, geographical locations, distribution channels, country, or industry

Dashboards

provide visual displays of important information that is consolidated and arranged on a single screen so that the information can be digested at a single glance and easily drilled in and further explored.

Load

putting the converted (transformed) data into the DW.

Data Reduction

reduce dimension, reduce volume, and balance data

Data

refers to a collection of facts usually obtained as the result of experiments, observations, transactions, or experiences.

Infrastructure

refers to architectural-hardware and software-enhancements.

Sourcing

refers to mechanisms for acquisition of data from diverse and dispersed sources.

Business Performance Management

refers to the business processes, methodologies, metrics, and technologies used by enterprises to measure, monitor, and manage business performance.

Scalability

refers to the degree to which a system can adjust to changes in demand without major additional changes or investments. Issues are: 1. the amount of data in the warehouse, 2. how quickly the warehouse is expected to grow, 3. the number of concurrent users, 4. and the complexity of user queries

Data & Information Visualization

related to information graphics, scientific visualization & statistical graphics - often includes charts, graphs illustrations...

Key Performance Indicator (KPI)

represents a strategic objective & metrics that measure performance against a goal

Real-Time Data Warehouse

same as an active data warehouse (ADW) but decision-making data are updated on an ongoing basis as business transactions occur

interval data (numerical)

scale measurement is temperature

Interval

scale measurement is temperature on the Celsius scale where the unit of measurement is 1/100 of the difference between the melting temperature and the boiling temperature of water in atmospheric pressure; that is, there is not an absolute zero value.
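
The difference between interval and ratio data can be checked with a little arithmetic. The sketch below uses made-up temperatures and salaries: differences on the Celsius scale are meaningful, but ratios are not, because the zero point is arbitrary. Converting to Kelvin (which has an absolute zero) changes the ratio but not the difference.

```python
def celsius_to_kelvin(c):
    """Shift the Celsius scale so that zero is absolute zero."""
    return c + 273.15

a_c, b_c = 10.0, 20.0  # hypothetical temperatures in degrees Celsius

# Differences survive the change of scale, so they are interpretable.
diff_c = b_c - a_c
diff_k = celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c)

# Ratios do not: 20 C is not "twice as hot" as 10 C.
ratio_c = b_c / a_c
ratio_k = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

# Salary is ratio data: zero means "no salary", so this ratio is meaningful.
salary_ratio = 60000 / 30000
```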

Downside of host

security and privacy

What are some critical issues for a data warehouse professional?

security and privacy of data and information

Data Warehouse Administrator

should be technical and ... -have the knowledge of high-performance software, hardware, and networking technologies; -possess solid business knowledge and insight; -be familiar with the decision-making processes so as to suitably design/maintain the data warehouse structure; -possess excellent communication skills

Data-Driven Marketing

sirius xm attracts and engages a new generation of radio consumers with

Automobiles

sirius xm is a provider of satellite radio, primarily provides services to

OLAP execution speed

slow (resource intensive, complex, large scale queries)

Decision-Making Processes

so as to suitably design/ maintain the data warehouse structure

proposed solution for Expedia.com

start by identifying key drivers of customer satisfaction. From these drivers, the customer satisfaction group constructed a scorecard application. This involved three steps: deciding how to measure satisfaction, setting performance targets, and putting the data into context. All of this was applied to their data warehouse (which they called DSS Factory).

Transformation

step includes: a) converting data from their original form to a uniform format b) editing and cleaning data c) creating metadata for documenting the transformation rules and procedures

Extraction

step refers to pulling data out of the various data sources
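
The extraction, transformation, and load steps can be sketched end to end in a few lines. Everything here is a hypothetical illustration: the two source record layouts, the assumption that the second source writes dates as day/month/year, and the `sales` target table are invented for the example.

```python
import sqlite3

# Two hypothetical source systems that record the same kind of fact differently.
source_a = [{"cust": "Ann", "amount_usd": "120.50", "date": "2024-01-05"}]
source_b = [{"customer_name": " bob ", "amount_cents": 9900, "day": "05/01/2024"}]

def extract():
    """Extraction: pull raw records out of the various data sources."""
    return [("a", r) for r in source_a] + [("b", r) for r in source_b]

def transform(tagged_records):
    """Transformation: convert to a uniform format and clean the values."""
    rows = []
    for origin, r in tagged_records:
        if origin == "a":
            rows.append((r["cust"].strip().title(), float(r["amount_usd"]), r["date"]))
        else:
            # Assumed day/month/year input, rewritten as ISO year-month-day.
            day, month, year = r["day"].split("/")
            rows.append((r["customer_name"].strip().title(),
                         r["amount_cents"] / 100,
                         f"{year}-{month}-{day}"))
    return rows

def load(rows, conn):
    """Load: put the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(extract()), warehouse)
result = warehouse.execute("SELECT * FROM sales ORDER BY customer").fetchall()
```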

Data Mart

subset of a data warehouse typically consisting of a single subject area.

Real-time or active data warehousing

supplements and expands traditional data warehousing, moving into the realm of operational and tactical decision making by loading data in real time and providing data to users for active decision making.

Balanced Scorecards

the best known and most widely used performance measurement system - helps translate an organization's financial, customer, internal process, and learning and growth objectives and targets into a set of actionable initiatives

Nominal Data

the code values for the variable "marital status", S-single, M-married, D-divorced

Ordinal Data

the data field for the variable credit score can be generally categorized as (1) low, (2) medium, or (3) high

Performance Dashboard Design

the fundamental challenge of dashboard design is to display all the required information on a single screen, clearly & without distraction, in a manner that can be assimilated quickly

Data Quality

the holistic quality of data, including their accuracy, precision, completeness, and relevance.

Time Series Forecasting

the use of mathematical modeling to predict future values of the variable of interest based on previously observed values.
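
Two very simple forecasting models show what predicting from previously observed values means in practice. This is a sketch, not a recommendation of a method: the function names, the window size, the smoothing factor `alpha`, and the sales numbers are all assumptions made for illustration.

```python
def moving_average_forecast(values, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window < 1 or len(values) < window:
        raise ValueError("need at least `window` observations")
    return sum(values[-window:]) / window

def exponential_smoothing_forecast(values, alpha=0.5):
    """Forecast the next value with simple exponential smoothing.

    Each new observation pulls the running level toward itself by a
    fraction alpha, so recent values count more than older ones.
    """
    level = values[0]
    for v in values[1:]:
        level = alpha * v + (1 - alpha) * level
    return level

monthly_sales = [100, 110, 120, 130]  # made-up history
ma_next = moving_average_forecast(monthly_sales, window=3)
es_next = exponential_smoothing_forecast(monthly_sales, alpha=0.5)
```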

Pie Chart

to show last month's sales by product line for the four products you sell and show which product line sold the greatest proportion of total sales.

Pie Chart

to show relative proportions of majors declared by college students in their sophomore year

OLAP Purpose

to support decision making and provide answers to business and management queries

OLTP Data Source

transaction database (a normalized data repository primarily focused on efficiency and consistency)

Dallas Cowboys

use of information visualization: used it for visualizing and understanding their merchandising activities, involving the complete supply chain from manufacture to customer. Challenges: needed more visibility into data. Solution: real-time reporting and dashboard capabilities provided necessary visualization. Result: can monitor merchandise now, see what is happening and why.

TDWs

use restrictive reporting to confirm or check patterns, often predefined summary tables

Pivot

used to change the dimensional orientation of a report or an ad hoc query-page display

Bar Charts

useful in displaying nominal data or numerical data that splits nicely into different categories so you can quickly see comparative results and trends within your data

Data Mart

usually smaller and focuses on a particular subject or department

Dashboards Provide

visual displays of important information that is consolidated and arranged on a single screen so that information can be digested at a single glance - information is easily drilled in & further explored

Performance Scorecards

visual displays used to chart progress against strategic and tactical goals & targets

Dashboards

visual presentation of critical data for executives to view. It allows executives to see hot spots in seconds and explore the situation

Pie Chart

visualization tool that can be very helpful when the intention is to show relative proportions of dollars per department allocated by a university administration

compromises the validity of the model

what happens if the assumptions do not hold?

SiriusXM

who is a provider of satellite radio?

Energy Supply

why: large amount of information from a variety of sources. These companies work with large budgets and possible savings can increase revenue. SINGLE VERSION OF TRUTH: by centralizing information, it was possible for them to do it. Challenges: variety and diversity of data. Solution: SAS Visual Analytics. Result: reports much faster, lower cost.

benefits of host

you don't have to pay for your own infrastructure. You pay for it but you pay as you use like a utility.

Describe various issues that affect whether an organization will purchase data transformation tools or build the transformation process itself.

• Data transformation tools are expensive. • Data transformation tools may have a long learning curve. • It is difficult to measure how the IT organization is doing until it has learned to use the tools.

Benefits of Hosted Data Warehouses

• Minimal investment in infrastructure • Frees up capacity on in-house systems • Frees up cash flow • Powerful solutions are affordable • Powerful solutions provide for growth • Better quality equipment and software • Faster connections • Ability to access data from remote locations • Allows a company to focus on core business • Meets storage needs for large volumes of data

4 characteristics of data warehousing

∙ Subject oriented ∙ Integrated ∙ Time variant (time series) ∙ Nonvolatile

A common way of introducing data warehousing is to refer to its fundamental characteristics. Describe three characteristics of data warehousing.

∙ Subject oriented. Data are organized by detailed subject, such as sales, products, or customers, containing only information relevant for decision support. ∙ Integrated. Integration is closely related to subject orientation. Data warehouses must place data from different sources into a consistent format. To do so, they must deal with naming conflicts and discrepancies among units of measure. A data warehouse is presumed to be totally integrated. ∙ Time variant (time series). A warehouse maintains historical data. The data do not necessarily provide current status (except in real-time systems). They detect trends, deviations, and long-term relationships for forecasting and comparisons, leading to decision making. Every data warehouse has a temporal quality. Time is the one important dimension that all data warehouses must support. Data for analysis from multiple sources contains multiple time points (e.g., daily, weekly, monthly views). ∙ Nonvolatile. After data are entered into a data warehouse, users cannot change or update the data. Obsolete data are discarded, and changes are recorded as new data. ∙ Web based. Data warehouses are typically designed to provide an efficient computing environment for Web-based applications. ∙ Relational/multidimensional. A data warehouse uses either a relational structure or a multidimensional structure. A recent survey on multidimensional structures can be found in Romero and Abelló (2009). ∙ Client/server. A data warehouse uses the client/server architecture to provide easy access for end users. ∙ Real time. Newer data warehouses provide real-time, or active, data-access and analysis capabilities (see Basu, 2003; and Bonde and Kuckuk, 2004). ∙ Include metadata. A data warehouse contains metadata (data about data) about how the data are organized and how to effectively use them.

