Business Intelligence Exam 1
What is the evolution of decision support? (Ppt 1, Slide 9)
(1970s) Decision support systems --> (1980s) Enterprise/Executive IS --> (1990s) Business Intelligence --> (2000s) Analytics --> (2010s) Big Data
What does the term "Business Analytics" mean? (Ppt 1, Slide 19)
-The process of developing actionable decisions or recommendations for actions based on insights generated from historical data -According to the Institute for Operations Research and the Management Sciences (INFORMS): analytics represents the combination of computer technology, management science techniques, and statistics to solve real problems
What is big data? (Ppt 1, Slide 31)
-Big Data is data that cannot be stored or processed easily using traditional tools/means -Big Data typically refers to data that comes in many different forms: large, structured, unstructured, continuous - 3Vs: Volume, Variety, Velocity -Data (Big Data or otherwise) is worthless if it does not provide business value (and for it to provide business value, it has to be analyzed)
What are the 3 layers of information in a dashboard?
-Monitoring -Analysis -Management
Why data mining?
-More intense competition at the global scale -Recognition of the value in data sources -Availability of quality data on customers, vendors, transactions, Web, etc. -Consolidation and integration of data repositories into data warehouses -The exponential increase in data processing and storage capabilities; and decrease in cost -Movement toward conversion of information resources into nonphysical form
What are the basic steps in the SEMMA process? (Visual on Ppt 4, Slide 24)
-Sample -Explore -Modify -Model -Assess
What to look for in a dashboard?
-Use of visual components to highlight data and exceptions that require action -Transparent to the user, meaning that they require minimal training and are extremely easy to use -Combine data from a variety of systems into a single, summarized, unified view of the business -Enable drill-down or drill-through to underlying data sources or reports -Present a dynamic, real-world view with timely data -Require little coding to implement, deploy, and maintain
What is Business Intelligence (Book version)
-[Broad Definition] An umbrella term that combines architectures, tools, databases, analytical tools, applications, and methodologies -[Narrow Definition] Descriptive analytics tools and techniques (i.e., reporting tools)
Which DM process is the best? (Visual on Ppt 4, Slide 26)
1. CRISP-DM
What are the 5 alternative data warehouse architectures? (Ppt 3, Slides 14+15)
1. Independent data marts 2. Data mart bus architecture with linked dimensional data marts 3. Hub and spoke (Corporate Information Factory) 4. Centralized data warehouse 5. Federated data warehouse
Which factors affect the data warehouse architecture?
1.Information interdependence between organizational units 2.Upper management's information needs 3.Urgency of need for a data warehouse 4.Nature of end-user tasks 5.Constraints on resources 6.Strategic view of the data warehouse prior to implementation 7.Compatibility with existing systems 8.Perceived ability of the in-house IT staff 9.Technical issues 10.Social/political factors
Data mining mistakes
1.Selecting the wrong problem for data mining 2.Ignoring what your sponsor thinks data mining is and what it really can/cannot do 3.Beginning without the end in mind 4.Not leaving sufficient time for data acquisition, selection, and preparation 5.Looking only at aggregated results and not at individual records/predictions ... 10 more mistakes... in your book
KPI- Key Performance Indicator (Ppt 3, Slide 52)
A KPI represents a strategic objective and metrics that measure performance against a goal -Strategy -Targets -Ranges -Encodings -Time frames -Benchmarks
What are data marts (independent and dependent)
A departmental small-scale "DW" that stores only limited/relevant data -Dependent data mart A subset that is created directly from a data warehouse -Independent data mart A small data warehouse designed for a strategic business unit or a department
Balanced Scorecard (BSC)
A performance measurement and management methodology that helps translate an organization's financial, customer, internal process, and learning and growth objectives and targets into a set of actionable initiatives
Human Intelligence
A set of cognitive abilities like the ability to learn, reason, act fast, remember, think critically and more
What is performance measurement system? (Ppt 3, Slides 50-52)
A system that assists managers in tracking the implementations of business strategy by comparing actual results against strategic goals and objectives -Comprises systematic comparative methods that indicate progress (or lack thereof) against goals
In data mining, finding an affinity of two products to be commonly together in a shopping cart is known as: - Cluster analysis - Artificial neural networks - Association rule mining - Decision trees
Association rule mining
What is Business Intelligence (Paul Mireault version)
BI is a set of processes, technologies, and presentation tools to make better decisions
What is the role of BI in Business Strategy? (Ppt 1, Slide 16)
BI must be aligned with business strategy.
How can you strike it rich?
Be creative.
What are the Managerial Issues with BI Implementations? (Ppt 1, Slide 18)
Buy vs Build, Cost benefit Analysis, Security and privacy, Integration with existing systems (ERP, SCM, CRM)
Why do organizations need to make better decisions?
Competitive Advantage (Porter's 5 Forces Model)
Data mining confusion matrix on Ppt 4, Slide 30
Confusion matrix illustration
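A minimal sketch of how a two-class confusion matrix is tallied and read. The labels and predictions are made-up illustration data, not values from the slide.

```python
from collections import Counter

# Hypothetical actual vs. predicted labels (1 = positive, 0 = negative).
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

# Count each (actual, predicted) pair.
counts = Counter(zip(actual, predicted))
tp = counts[(1, 1)]  # true positives
fn = counts[(1, 0)]  # false negatives
fp = counts[(0, 1)]  # false positives
tn = counts[(0, 0)]  # true negatives

accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)  # also called sensitivity or true positive rate

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```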
The user-interface of a BI system is often referred to as a -Data mining -Data warehouse -Dashboard
Dashboard
What are the 4 layers of a High-level BI Architecture?
Data Warehouse, Analytics, BPM Strategies, User Interface
Which characteristic of data requires that the variables and data values to be defined at the lowest (or as low as required) level of detail for the intended use of the data? - Data granularity - Data accessibility - Data richness - Data source reliability
Data granularity
Data mining is the intersection of the following multiple disciplines.
Data mining= knowledge discovery Statistics, math, information visualization, machine learning and pattern recognition, artificial intelligence, management science and information systems
What is data preprocessing? (Ppt 2, Slide 11+12+13)
Data preprocessing makes data ready for analytics -Data consolidation -Data cleaning -Data transformation -Data reduction
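The four preprocessing steps can be sketched on a toy data set. The records, field names, and the choice of mean imputation and min-max scaling are assumptions for illustration; real projects pick methods per variable.

```python
# Hypothetical records from two sources, keyed by customer id.
crm = {1: {"age": 34}, 2: {"age": None}, 3: {"age": 52}}
sales = {1: {"spend": 120.0}, 2: {"spend": 80.0}, 3: {"spend": 200.0}}

# 1. Consolidation: combine the sources into one record per customer.
rows = [{"id": k, **crm[k], **sales[k]} for k in sorted(crm)]

# 2. Cleaning: fill a missing age with the mean of the known ages.
known = [r["age"] for r in rows if r["age"] is not None]
mean_age = sum(known) / len(known)
for r in rows:
    if r["age"] is None:
        r["age"] = mean_age

# 3. Transformation: min-max normalize spend to the range [0, 1].
lo = min(r["spend"] for r in rows)
hi = max(r["spend"] for r in rows)
for r in rows:
    r["spend_norm"] = (r["spend"] - lo) / (hi - lo)

# 4. Reduction: keep only the variables needed for the analysis.
reduced = [{"age": r["age"], "spend_norm": r["spend_norm"]} for r in rows]
print(reduced)
```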
Who is the DW Administrator?
Data warehouse administrator (DWA) -DWA should... - have the knowledge of high-performance software, hardware, and networking technologies - possess solid business knowledge and insight - be familiar with the decision-making processes so as to suitably design/maintain the data warehouse structure - possess excellent communications skills Security and privacy is a pressing issue in DW -Safeguarding the most valuable assets -Government regulations (HIPAA, etc.) -Must be explicitly planned and executed
3 categories/classifications of business analytics (Ppt 1, Slide 20)
Descriptive, prescriptive, predictive
Descriptive statistics and probabilities (Ppt 2, Slides 20-24)
Central tendency: arithmetic mean, median, mode. Dispersion: range, variance, standard deviation, mean absolute deviation. Shape of the distribution: skewness, kurtosis.
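A short sketch of several of these measures using Python's standard `statistics` module. The sample values are made up, and population (not sample) variance is assumed; skewness and kurtosis are left out because the standard library does not provide them.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
value_range = max(data) - min(data)
variance = statistics.pvariance(data)  # population variance
std_dev = statistics.pstdev(data)      # square root of the variance
mad = sum(abs(x - mean) for x in data) / len(data)  # mean absolute deviation

print(mean, median, mode, value_range, variance, std_dev, mad)
```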
What is ETL? Visual on (Ppt 3, slide 23)
Extract, Transform, Load •Issues affecting the purchase of an ETL tool -Data transformation tools are expensive -Data transformation tools may have a long learning curve •Important criteria in selecting an ETL tool -Ability to read from and write to an unlimited number of data sources/architectures -Automatic capturing and delivery of metadata -A history of conforming to open standards -An easy-to-use interface for the developer and the functional user
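A minimal ETL sketch, assuming an in-memory CSV as the source system and an in-memory SQLite table as the target. The column names and cleaning rules are hypothetical.

```python
import csv
import io
import sqlite3

# Extract: read rows from a source (an in-memory CSV stands in for a source system).
source = io.StringIO("order_id,amount,country\n1, 10.50 ,us\n2,7.25,ca\n3,,us\n")
extracted = list(csv.DictReader(source))

# Transform: convert types, standardize country codes, drop rows with no amount.
transformed = [
    (int(r["order_id"]), float(r["amount"]), r["country"].strip().upper())
    for r in extracted
    if r["amount"].strip()
]

# Load: write the cleaned rows into the target (an in-memory SQLite table).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, country TEXT)")
warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", transformed)
loaded = warehouse.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
print(loaded)
```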
XML data is structured data. - True - False
False. XML is semi-structured, not fully structured.
Slide 49 What is lift in Market Basket Analysis?
Lift measures how strongly two items go together. In data mining and association rule learning, lift is a measure of the performance of a targeting model (association rule) at predicting or classifying cases as having an enhanced response (with respect to the population as a whole), measured against a random-choice targeting model. A lift above 1 means the items occur together more often than chance alone would predict.
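A worked sketch of the lift calculation for a rule such as beer => frozen meal. The basket counts are invented for illustration.

```python
# Hypothetical basket counts.
total_baskets = 1000
with_beer = 200
with_frozen_meal = 250
with_both = 90

confidence = with_both / with_beer           # P(frozen meal | beer)
baseline = with_frozen_meal / total_baskets  # P(frozen meal) across all baskets
lift = confidence / baseline                 # how much beer raises that likelihood

print(round(lift, 2))
```

With these counts, a frozen meal is bought in 45% of beer baskets against 25% of all baskets, so the lift works out to 1.8.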
Which data warehouse architecture is the best?
Hub and spoke
A lower-cost, scaled down architecture of a data warehouse is referred to as (an) -Independent data mart -Dependent data mart -Operational data store -Data mart -None
Independent data mart
Is Real-Time BI Attainable? (Ppt 1, Slide 17)
It is becoming attainable but is the cost/expense really worth the result? Real-Time BI enablers: RFID, web services, intelligent agents
When beer is purchased, a frozen meal purchase is 1.8 times more likely. This relationship/dependency describes the: - Interval - Support - Lift - Confidence
Lift
The data field "ethnic group" can be best described as: - Ratio data - Interval data - Ordinal data - Nominal data
Nominal data
The definition of data mining includes the following terms EXCEPT: - Process - Pattern - Nontrivial - Novel - None
None, these are all included in the definition of data mining.
Which of the following is NOT a step in data pre-processing? (What are the steps in data pre-processing?) - Data transformation - Data reduction - Data consolidation - Data cleaning - None
None, these are all steps in data pre-processing.
Most operational data in Enterprise Resource Planning (ERP) systems are stored in _______ systems. - OLAP - Mining systems - OLTP - Decision support system
OLTP
How does OLTP differ from OLAP? (Ppt 1, Slide 15)
OLTP (Online Transaction Processing) - Operational databases - ERP, SCM, CRM -Goal of OLTP: data capture OLAP (Online Analytical Processing) - Data warehouses -Goal of OLAP: decision support
What is knowledge?
Plato: knowledge = justified true belief. To me, knowledge = understanding.
Which of the following disciplines is NOT employed in data mining? - Stats -Psychology -Mathematics -Artificial Intelligence -None
Psychology
What is support in data mining?
Relative frequency of an item in a data set
Which is NOT an example of transaction processing? - A student enrollment in a class - ATM withdrawal - Bank deposit - Sales report
Sales report
Data mining myths
See slide 58 for chart
Major components of a data warehouse (generic framework) (Ppt 3, Slide 13)
See slide for full image Data sources--> ETL Process --> Metadata/Enterprise Warehouse--> DataMart--> Applications (visualizations)
What are the benefits of BI?
Speed, better decisions, more revenue, better customer service etc.
This measure of dispersion is calculated by simply taking the square root of the variance. -Variance -Range -Standard deviation -Arithmetic mean
Standard deviation
Data warehouse is a(n), _____________, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process. - Model oriented - Subject-oriented - Object-oriented -Analysis-oriented
Subject-oriented
In data mining, the miner is often the: - The end user - The data scientist - The database administrator - None
The end user
A web client that connects to a web server, which is in turn connected to a BI application server, is reflective of: - Three-tier architecture -Two-tier architecture -Four-tier architecture -One-tier architecture
Three-tier architecture
According to our class discussion, knowledge is: - Understanding - Elusive - Business Intelligence - Obscure
Understanding
Information visualization + predictive analytics= -KPIs -Visual analytics -Data mining -Executive dashboard -None
Visual analytics
At the top of the hierarchy of knowledge you will find: - Data - Information - Knowledge - Intelligence - Wisdom
Wisdom
Hierarchy of knowledge
Wisdom Intelligence Knowledge Information Data
Should you validate a dashboard design with usability experts? - Yes - No
Yes
Can an organization have characteristics of human intelligence?
Yes, learn, remember, and act fast.
Why is meta data important?
assists in the conversion of data and info into knowledge
What is logistic regression? (Ppt 2, Slide 35)
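f(x) = 1 / (1 + e^(-(B0 + B1*x))), where B0 is the intercept and B1 is the coefficient of the input x. The output is always between 0 and 1, so it can be read as a probability.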
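A small sketch of the logistic function, assuming the standard one-variable form 1 / (1 + e^(-(B0 + B1*x))). The coefficient values are arbitrary examples, not fitted to any data.

```python
import math

def logistic(x, b0, b1):
    """Probability of the positive class for input x under a one-variable logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Arbitrary example coefficients (assumption, not estimated from data).
b0, b1 = -3.0, 1.5
for x in (0, 2, 4):
    print(x, round(logistic(x, b0, b1), 3))
```

The output crosses 0.5 exactly where B0 + B1*x = 0, which is x = 2 for these coefficients.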
Which are the most common DM processes? Data mining process
•A manifestation of the best practices •A systematic way to conduct DM projects •Moving from Art to Science for DM project •Everybody has a different version •Most common standard processes: -CRISP-DM (Cross-Industry Standard Process for Data Mining) -SEMMA (Sample, Explore, Modify, Model, and Assess) -KDD (Knowledge Discovery in Databases)
What is a data warehouse?
•A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format •A relational database? (so what is the difference?) "The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time"
What is visual analytics?
•A recently coined term -Information visualization + predictive analytics •Information visualization -Descriptive, backward focused -"what happened" "what is happening" •Predictive analytics -Predictive, future focused -"what will happen" "why will it happen" •There is a strong move toward visual analytics
What is Association Rule Mining (market basket analysis)? (Ppt 4, Slides 46-48)
•A very popular DM method in business •Finds interesting relationships (affinities) between variables (items or events) •Part of machine learning family •Employs unsupervised learning •There is no output variable •Also known as market basket analysis •Often used as an example to describe DM to ordinary people, such as the famous "relationship between diapers and beers!"
What is a business report?
•A written document that contains information regarding business matters. •Purpose: to improve managerial decisions •Source: data from inside and outside the organization (via the use of ETL) •Format: text + tables + graphs/charts •Distribution: in-print, email, portal/intranet Data acquisition --> information generation--> Decision making -->Process management
What is Support and Confidence in Association Rule Mining?
•Are all association rules interesting and useful? A Generic Rule: X => Y [S%, C%] X, Y: products and/or services X: Left-hand-side (LHS) Y: Right-hand-side (RHS) S: Support: how often X and Y go together C: Confidence: how often Y goes together with X
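Support and confidence for a rule X => Y can be computed directly from a list of transactions. The baskets below are made up, and the helper names `support` and `confidence` are just illustrative.

```python
# Hypothetical shopping baskets.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(items):
    """Fraction of baskets that contain every item in `items`."""
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(lhs, rhs):
    """Of the baskets containing lhs, the fraction that also contain rhs."""
    return support(lhs | rhs) / support(lhs)

s = support({"diapers", "beer"})       # S for diapers => beer
c = confidence({"diapers"}, {"beer"})  # C for diapers => beer
print(f"support={s:.2f} confidence={c:.2f}")
```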
What is BPM? (Ppt 3, Slides 42-44)
•BPM refers to the business processes, methodologies, metrics, and technologies used by enterprises to measure, monitor, and manage business performance. •BPM encompasses three key components -A set of integrated, closed-loop management and analytic processes, supported by technology ... -Tools for businesses to define strategic goals and then measure/manage performance against them -Methods and tools for monitoring key performance indicators (KPIs), linked to organizational strategy
What are the best practices in dashboard design?
•Benchmark KPIs with Industry Standards •Wrap the Metrics with Contextual Metadata •Validate the Design by a Usability Specialist •Prioritize and Rank Alerts and Exceptions •Enrich Dashboard with Business-User Comments •Present Information in Three Different Levels •Pick the Right Visual Constructs •Provide for Guided Analytics
Data mining vendors
•Commercial -IBM SPSS Modeler (formerly Clementine) -SAS Enterprise Miner -Statistica - Dell/Statsoft -... many more •Free and/or Open Source -KNIME -RapidMiner -Weka -R, ...
What are the basic steps in the CRISP-DM process (Ppt 4, Slides 22+23)
•Cross-Industry Standard Process for Data Mining •Proposed in the 1990s by a European consortium •Composed of six consecutive phases -Step 1: Business Understanding -Step 2: Data Understanding -Step 3: Data Preparation -Step 4: Model Building -Step 5: Testing and Evaluation -Step 6: Deployment
Examples of data mining applications?
•Customer Relationship Management -Maximize return on marketing campaigns -Improve customer retention (churn analysis) -Maximize customer value (cross-, up-selling) -Identify and treat most valued customers •Banking & Other Financial -Automate the loan application process -Detecting fraudulent transactions -Maximize customer value (cross-, up-selling) -Optimizing cash reserves with forecasting
What are some types of data mining patterns? (Ppt 4, Slide 12+14)
•DM extracts patterns from data -Pattern? A mathematical (numeric and/or symbolic) relationship among data items •Types of patterns -Association -Prediction -Cluster (segmentation) -Sequential (or time series) relationships
When is data valuable? Metrics for analytics ready data
•Data source reliability •Data content accuracy •Data accessibility •Data security and data privacy •Data richness •Data consistency •Data currency/data timeliness •Data granularity •Data validity and data relevancy
What is data? The nature of data?
•Data: a collection of facts -usually obtained as the result of experiences, observations, or experiments •Data may consist of numbers, words, images, ... •Data is the lowest level of abstraction (from which information and knowledge are derived) •Data is the source for information and knowledge •Data quality and data integrity --> critical to analytics
Which are the most common classification techniques
•Decision tree analysis •Statistical analysis •Neural networks •Support vector machines •Case-based reasoning •Bayesian classifiers •Genetic algorithms •Rough sets
What is dimensional modeling?
•Dimensional Modeling -A retrieval-based system that supports high-volume query access
How does Enterprise Application Integration (EAI) differ from Enterprise Information Integration (EII)?
•ETL = Extract Transform Load •Data integration -Integration that comprises three major processes: data access, data federation, and change capture. •Enterprise application integration (EAI) -A technology that provides a vehicle for pushing data from source systems into a data warehouse •Enterprise information integration (EII) -An evolving tool space that promises real-time data integration from a variety of sources, such as relational or multidimensional databases, Web services, etc.
What are decision trees?
•Employs a divide-and-conquer method •Recursively divides a training set until each division consists of examples from one class: 1.Create a root node and assign all of the training data to it. 2.Select the best splitting attribute. 3.Add a branch to the root node for each value of the split. Split the data into mutually exclusive subsets along the lines of the specific split. 4.Repeat steps 2 and 3 for each and every leaf node until the stopping criteria are reached. •DT algorithms mainly differ on 1.Splitting criteria -Which variable, what value, etc. 2.Stopping criteria -When to stop building the tree 3.Pruning (generalization method) -Pre-pruning versus post-pruning •Most popular DT algorithms include -ID3, C4.5, C5; CART; CHAID; M5
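Step 2 (selecting the best splitting attribute) can be sketched with Gini impurity as the splitting criterion. Gini is only one option (it is the measure CART uses; ID3 uses information gain instead), and the tiny data set and function names are assumptions for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 means every example belongs to one class."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(rows, attr):
    """Weighted Gini impurity after splitting rows on each value of attr."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row["label"])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())

# Hypothetical training set.
rows = [
    {"outlook": "sunny", "windy": "no", "label": "play"},
    {"outlook": "sunny", "windy": "yes", "label": "play"},
    {"outlook": "rainy", "windy": "no", "label": "stay"},
    {"outlook": "rainy", "windy": "yes", "label": "stay"},
]

# The best attribute is the one whose split leaves the purest subsets.
best = min(["outlook", "windy"], key=lambda attr: split_impurity(rows, attr))
print(best)
```

Here splitting on outlook yields two pure subsets, so a tree would stop after one split; splitting on windy leaves both subsets evenly mixed.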
What is six sigma? Types of business reports?
•Metric Management Reports -Help manage business performance through metrics (SLAs for externals; KPIs for internals) -Can be used as part of Six Sigma and/or TQM •Dashboard-Type Reports -Graphical presentation of several performance indicators in a single page using dials/gauges •Balanced Scorecard-Type Reports -Include financial, customer, business process, and learning & growth indicators •Six Sigma: a methodology for reducing variability in a process, targeting no more than 3.4 defects per million opportunities (DPMO)
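The Six Sigma threshold of 3.4 defects per million can be checked with simple arithmetic. The inspection numbers below are hypothetical, and counting defects per opportunity (DPMO) is assumed as the unit.

```python
# Hypothetical inspection results (illustration only).
units_inspected = 1_000
opportunities_per_unit = 5  # distinct ways each unit could be defective
defects_found = 17

# Defects per million opportunities.
dpmo = defects_found / (units_inspected * opportunities_per_unit) * 1_000_000
meets_six_sigma = dpmo <= 3.4

print(round(dpmo), meets_six_sigma)
```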
Data mining classification
•Most frequently used DM method •Part of the machine-learning family •Employ supervised learning •Learn from past data, classify new data •The output variable is categorical (nominal or ordinal) in nature •Classification versus regression? •Classification versus clustering?
What are measures and dimensions?
•Multidimensional presentation -Dimensions: products, salespeople, market segments, business units, geographical locations, distribution channels, country, or industry -Measures: money, sales volume, head count, inventory profit, actual versus forecast -Time: daily, weekly, monthly, quarterly, or yearly
Data warehouse OLTP vs. OLAP
•OLTP (Online Transaction Processing) -Capturing and storing data from ERP, CRM, POS, ... -The main focus is on efficiency of routine tasks •OLAP (Online Analytical Processing) -Converting data into information for decision support -Data cubes, drill-down / rollup, slice & dice, ... -Requesting ad hoc reports -Conducting statistical and other analyses -Developing multimedia-based applications -...more in the book
Data mart components + ODS (Ppt 3, Slide 10)
•Operational data stores (ODS) -A type of database often used as an interim area for a data warehouse •Oper marts -An operational data mart •Enterprise data warehouse (EDW) -A data warehouse for the enterprise •Metadata - "data about data" -In DW metadata describe the contents of a data warehouse and its acquisition and use
What is a (performance) dashboard? (Ppt 2, Slides 44+59+60)
•Performance dashboards are commonly used in BPM software suites and BI platforms •Dashboards provide visual displays of important information that is consolidated and arranged on a single screen so that information can be digested at a single glance and easily drilled in and further explored
Inferential statistics and regression (Ppt 2, Slide 28+29)
•Regression -A part of inferential statistics -The most widely known and used analytics technique in statistics -Used to characterize relationship between explanatory (input) and response (output) variable •It can be used for -Hypothesis testing (explanation) -Forecasting (prediction)
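A minimal sketch of simple linear regression by ordinary least squares, used here for forecasting. The data points are made up and lie exactly on a line so the result is easy to check by hand.

```python
# Hypothetical observations.
xs = [1, 2, 3, 4, 5]   # explanatory (input) variable
ys = [3, 5, 7, 9, 11]  # response (output) variable

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares estimates of the slope and intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# Forecasting: predict the response for a new input value.
forecast = intercept + slope * 6
print(slope, intercept, forecast)
```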
What is a report? What is a report's function and purpose? (Ppt 2, Slide 41+42)
•Report = Information --> Decision •Report? -Any communication artifact prepared to convey specific information •A report can fulfill many functions -To ensure proper departmental functioning -To provide information -To provide the results of an analysis -To persuade others to act -To create an organizational memory
Which are the most common security and privacy issues with BI?
•Security and privacy is a pressing issue in DW -Safeguarding the most valuable assets -Government regulations (HIPAA, etc.) -Must be explicitly planned and executed
OLAP operations (Visual on Ppt 3, Slide 35)
•Slice - a subset of a multidimensional array •Dice - a slice on more than two dimensions •Drill Down/Up - navigating among levels of data ranging from the most summarized (up) to the most detailed (down) •Roll Up - computing all of the data relationships for one or more dimensions •Pivot - used to change the dimensional orientation of a report or an ad hoc query-page display
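The operations above can be sketched on a tiny cube held in a plain dictionary. The dimensions (product, region, quarter) and sales figures are invented for illustration; real OLAP servers do this over far larger, pre-aggregated structures.

```python
from collections import defaultdict

# Hypothetical cube: (product, region, quarter) -> sales.
cube = {
    ("laptop", "east", "Q1"): 100,
    ("laptop", "west", "Q1"): 80,
    ("laptop", "east", "Q2"): 120,
    ("phone", "east", "Q1"): 60,
    ("phone", "west", "Q2"): 90,
}

# Slice: fix one dimension (quarter = Q1).
slice_q1 = {k: v for k, v in cube.items() if k[2] == "Q1"}

# Dice: restrict several dimensions at once (here product and region).
dice = {k: v for k, v in cube.items() if k[0] == "laptop" and k[1] == "east"}

# Roll up: aggregate away region and quarter, leaving totals per product.
rollup = defaultdict(int)
for (product, _region, _quarter), sales in cube.items():
    rollup[product] += sales

# Pivot: reorient the view so rows are regions and columns are quarters.
pivot = defaultdict(lambda: defaultdict(int))
for (_product, region, quarter), sales in cube.items():
    pivot[region][quarter] += sales

print(dict(rollup))
print({region: dict(cols) for region, cols in pivot.items()})
```

Drilling down is the reverse of rolling up: going from the per-product totals back to the per-region, per-quarter cells.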
Data Mining Characteristics? What is the source for data mining?
•Source of data for DM is often a consolidated data warehouse (not always!). •DM environment is usually a client-server or a Web-based information systems architecture. •Data is the most critical ingredient for DM which may include soft/unstructured data. •The miner is often an end user. •Striking it rich requires creative thinking. •Data mining tools' capabilities and ease of use are essential (Web, Parallel processing, etc.).
What is the future of BI systems?
•Sourcing... -Web, social media, and Big Data -Open source software -SaaS (software as a service) -Cloud computing -Data lakes •Infrastructure... -Columnar -Real-time DW -Data warehouse appliances -Data management practices/technologies -In-database & in-memory processing -New DBMS, advanced analytics, ...
What is the Star schema? How does it differ from Snowflake schema?
•Star schema -The most commonly used and the simplest style of dimensional modeling -Contain a fact table surrounded by and connected to several dimension tables •Snowflakes schema -An extension of star schema where the diagram resembles a snowflake in shape
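A small star schema sketch using Python's built-in `sqlite3` module: one fact table connected to two dimension tables. Table and column names are hypothetical. In a snowflake schema, a dimension such as the store table would be further normalized (for example, city split into its own table).

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
-- The fact table holds the measures plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'laptop'), (2, 'phone');
INSERT INTO dim_store   VALUES (1, 'Austin'), (2, 'Boston');
INSERT INTO fact_sales  VALUES (1, 1, 900.0), (1, 2, 950.0), (2, 1, 400.0);
""")

# A typical query joins the fact table to a dimension and aggregates a measure.
totals = db.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(totals)
```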
How do statistics relate to Business Analytics (Ppt 2, Slide 18+19)
•Statistics -A collection of mathematical techniques to characterize and interpret data •Descriptive Statistics -Describing the data (as it is) •Inferential statistics -Drawing inferences about the population based on sample data •Descriptive statistics for descriptive analytics
How is structured data different than unstructured data? Taxonomy of data (Ppt 2, Slide 8 & 9)
•Structured data -Targeted for computers to process -Numeric versus nominal •Unstructured/textual data -Targeted for humans to process/digest •Semi-structured data? -XML, HTML, Log files, etc.
Characteristics of a data warehouse
•Subject oriented •Integrated •Time-variant (time series) •Nonvolatile •Summarized •Not normalized •Metadata •Web based, relational/multi-dimensional •Client/server, real-time/right-time/active
What is data mining?
•The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases. -- Fayyad et al., (1996) •Keywords in this definition: Process, nontrivial, valid, novel, potentially useful, understandable. •Data mining: a misnomer? •Other names: knowledge extraction, pattern analysis, knowledge discovery, information harvesting, pattern searching, data dredging,...
3 tier architecture and 2 tier architecture (Ppt 3, Slides 14+15)
•Three-tier architecture 1.Data acquisition software (back-end) 2.The data warehouse that contains the data & software 3.Client (front-end) software that allows users to access and analyze data from the warehouse •Two-tier architecture -First two tiers in three-tier architecture are combined into one
What is cluster analysis? (Ppt 4, Slides 41-45)
•Used for automatic identification of natural groupings of things •Part of the machine-learning family •Employ unsupervised learning •Learns the clusters of things from past data, then assigns new instances •There is not an output/target variable •In marketing, it is also known as segmentation
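One common clustering technique is k-means. The sketch below is a bare-bones one-dimensional version, not taken from the slides; the data points, starting centers, and fixed iteration count are assumptions for illustration.

```python
def kmeans_1d(points, centers, iterations=10):
    """Very small k-means: assign points to the nearest center, then recompute centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # New center = mean of its cluster (an empty cluster keeps its old center).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical past data with two natural groupings; no output/target variable is given.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])

# Assign a new instance to the nearest learned cluster.
new_point = 4.0
assigned = min(range(len(centers)), key=lambda i: abs(new_point - centers[i]))
print(centers, clusters, assigned)
```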