MSBA 203 Exam 1
Is Real-Time BI Attainable? (Ch 1)
- Analyzing real time data is expensive and time-consuming
3 types of analytics (Ch 1)
- Descriptive: dashboards (BI)
- Predictive: use past data to model the future
- Prescriptive: optimization; advises on how best to do your job
How does OLTP differ from OLAP? (Ch 1) (Ch 3)
- Online Transaction Processing (OLTP)
  - Operational databases (ERP, SCM, CRM, ...)
  - Goal: data capture
  - The main focus is on the efficiency of routine tasks
- Online Analytical Processing (OLAP)
  - Data warehouses
  - Converting data into information for decision support
  - Data cubes, drill-down / roll-up, slice & dice, ...
  - Requesting ad hoc reports
  - Conducting statistical and other analyses
  - Developing multimedia-based applications
  - Goal: decision support
- In short: OLAP is associated with data warehouses and analysis; OLTP is associated with operational databases and speed of data entry
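To make the contrast concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `sales` table and its rows are made up for illustration, and a single in-memory table stands in for what would normally be two separate systems (an operational database and a data warehouse).

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: many small writes, each capturing one routine transaction.
for region, amount in [("East", 20.0), ("West", 35.0), ("East", 15.0)]:
    with db:  # the connection context manager commits each sale on success
        db.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", (region, amount))

# OLAP-style work: an ad hoc, read-only aggregate over the accumulated history.
report = db.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(report)
```

The writes are optimized for capturing each event quickly; the final query is the kind of summary a decision maker would ask for.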
Predictive Analytics (Ch 1)
-Aims to determine what is likely to happen in the future (foreseeing future events)
-Looks at past data to predict the future
-Enablers: 1. Data mining 2. Text mining / Web mining 3. Forecasting (i.e., time series)
Business Intelligence (Ch 1)
-An umbrella term that combines architectures, tools, databases, analytical tools, applications, and methodologies -BI is a set of processes, technologies, and presentation tools to make better decisions
Descriptive Analytics (Ch 1)
-Answering the question of what happened -Retrospective analysis of historic data
Which are some types of DM patterns? (Ch 4)
-Association -Prediction -Cluster (segmentation) -Sequential (or time series) relationships
Big Data (Ch 1)
-Big Data is data that cannot be stored or processed easily using traditional tools/means
-Big Data typically refers to data that comes in many different forms: large, structured, unstructured, continuous
-The 3Vs: Volume, Variety, Velocity
-Data (Big Data or otherwise) is worthless if it does not provide business value (and to provide business value, it has to be analyzed)
Clustering results may be used to (Ch 4)
-Identify natural groupings of customers -Identify rules for assigning new cases to classes for targeting/diagnostic purposes -Provide characterization, definition, labeling of populations -Decrease the size and complexity of problems for other data mining methods -Identify outliers in a specific domain (e.g., rare-event detection)
What is Visual Analytics? (Ch 2)
-Information visualization + predictive analytics
•Information visualization
-Descriptive, backward focused
-"what happened", "what is happening"
•Predictive analytics
-Predictive, future focused
-"what will happen", "why will it happen"
What are measures and dimensions? (Ch 3)
-Measures: money, sales volume, head count, inventory profit, actual versus forecast -Dimensions: products, salespeople, market segments, business units, geographical locations, distribution channels, country, or industry
Which are the 3 layers of information in a Dashboard? (Ch 2)
-Monitoring -Analysis -Management
Which are the most common security and privacy issues with BI? (Ch 3)
-Safeguarding the most valuable assets -Government regulations (HIPAA, etc.) -Must be explicitly planned and executed
Business Analytics (Ch 1)
-The process of developing actionable decisions or recommendations for actions based on insights generated from historical data -Represents the combination of computer technology, management science techniques, and statistics to solve real problems
What to look for in a dashboard? (Ch 2)
-Use of visual components to highlight data and exceptions that require action -Transparent to the user, meaning that they require minimal training and are extremely easy to use -Combine data from a variety of systems into a single, summarized, unified view of the business -Enable drill-down or drill-through to underlying data sources or reports -Present a dynamic, real-world view with timely data -Require little coding to implement, deploy, and maintain
Issues affecting the purchase of an ETL tool (Ch 3)
-data transformation tools are expensive and may have a long learning curve
Which are the major components of a DW (generic framework)? (Ch 3)
1. Data sources 2. Data extraction and transformation 3. Data loading 4. Comprehensive database 5. Metadata 6. Middleware tools
What is a 3-tier DW (BI) architecture? (Ch 3)
1.Data acquisition software (back-end) 2.The data warehouse that contains the data & software 3.Client (front-end) software that allows users to access and analyze data from the warehouse
Which factors affect the DW architecture? (Ch 3)
1.Information interdependence between organizational units 2.Upper management's information needs 3.Urgency of need for a data warehouse 4.Nature of end-user tasks 5.Constraints on resources 6.Strategic view of the data warehouse prior to implementation 7.Compatibility with existing systems 8.Perceived ability of the in-house IT staff 9.Technical issues 10.Social/political factors
DM Mistakes (Ch 4)
1.Selecting the wrong problem for data mining 2.Ignoring what your sponsor thinks data mining is and what it really can/cannot do 3.Beginning without the end in mind 4.Not leaving sufficient time for data acquisition, selection, and preparation 5.Looking only at aggregated results and not at individual records/predictions ... 10 more mistakes... in your book
DT algorithms mainly differ on (Ch 4)
1.Splitting criteria -Which variable, what value, etc. 2.Stopping criteria -When to stop building the tree 3.Pruning (generalization method) -Pre-pruning versus post-pruning
What is Support and Confidence in Association Rule Mining? (Ch 4)
A generic rule: X -> Y [S%, C%]
S (Support): how often X and Y occur together, as a share of all transactions
C (Confidence): how often Y occurs in the transactions that contain X
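As a minimal sketch of how the two measures could be computed for a rule X -> Y: the baskets below are made up, and the helper names `support` and `confidence` are illustrative rather than taken from the course material.

```python
# Hypothetical transactions (market baskets); the items are made up for illustration.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
    {"diapers", "beer", "bread"},
]

def support(x, y, baskets):
    """Fraction of all baskets that contain every item in X and in Y."""
    both = x | y
    return sum(1 for b in baskets if both <= b) / len(baskets)

def confidence(x, y, baskets):
    """Among the baskets that contain X, the fraction that also contain Y."""
    with_x = [b for b in baskets if x <= b]
    if not with_x:
        return 0.0
    return sum(1 for b in with_x if y <= b) / len(with_x)

x, y = {"diapers"}, {"beer"}
s = support(x, y, transactions)
c = confidence(x, y, transactions)
print(f"diapers -> beer [S={s:.0%}, C={c:.0%}]")
```

Here three of the five baskets contain both items (support 60%), and three of the four baskets with diapers also contain beer (confidence 75%).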
How do DWs differ from a database? (Ch 3)
A database is an application-oriented collection of data, whereas a data warehouse is a subject-oriented collection of data. Databases use Online Transaction Processing (OLTP), whereas data warehouses use Online Analytical Processing (OLAP). Data warehouses are typically much larger than operational databases and are mainly used for data mining and data analysis to assist leaders in making decisions.
What is a balanced scorecard? (Ch 3)
A performance measurement and management methodology that helps translate an organization's financial, customer, internal process, and learning and growth objectives and targets into a set of actionable initiatives
What is Business Performance Management (BPM)? (Ch 3)
A real-time system that alerts managers to potential opportunities, impending problems, and threats, and then empowers them to react through models and collaboration
•Encompasses three key components
-A set of integrated, closed-loop management and analytic processes, supported by technology ...
-Tools for businesses to define strategic goals and then measure/manage performance against them
-Methods and tools for monitoring key performance indicators (KPIs), linked to organizational strategy
What is dimensional modeling? (Ch 3)
A retrieval-based system that supports high-volume query access
Star Schema (Ch 3)
A simple database design containing a fact table surrounded by and connected to several dimension tables
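A small sketch of what such a design might look like, using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_region`) and all rows are hypothetical; only the shape (one fact table keyed to several dimension tables) comes from the definition above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes used to slice the facts.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, region_name  TEXT);

-- Fact table: numeric measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    units      INTEGER,
    revenue    REAL
);

INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_region  VALUES (1, 'East'), (2, 'West');
INSERT INTO fact_sales  VALUES (1, 1, 10, 100.0), (1, 2, 5, 50.0), (2, 1, 3, 90.0);
""")

# A typical star-schema query: join the fact table to a dimension and aggregate.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(rows)
```

In a snowflake variant, a dimension such as `dim_product` would itself be split into further related tables (for example, a separate category table).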
Independent data mart (Ch 3)
A small data warehouse designed for a strategic business unit or a department
Dependent data mart (Ch 3)
A subset that is created directly from a data warehouse
What is performance measurement? (Ch 3)
A system that assists managers in tracking the implementations of business strategy by comparing actual results against strategic goals and objectives
Important criteria in selecting an ETL tool (Ch 3)
-Ability to read from and write to an unlimited number of data sources/architectures
-Automatic capturing and delivery of metadata
-A history of conforming to open standards
-An easy-to-use interface for the developer and the functional user
Snowflake Schema (Ch 3)
An expanded version of a star schema in which dimension tables are normalized into several related tables.
What is the role of BI in Business Strategy? (Ch 1)
BI must be aligned with Business Strategy
Bar Charts (Ch 2)
Bar charts are easy for our eyes to read. Our eyes compare the end points of the bars, so it is easy to see quickly which category is the biggest, which is the smallest, and also the incremental difference between categories. The rule we've illustrated here is that bar charts must have a zero baseline.
What are the Managerial Issues with BI Implementations? (Ch 1)
Buy vs Build, Cost benefit Analysis, Security and privacy, Integration with existing systems (ERP, SCM, CRM)
Which DM process is the best? (Ch 4)
CRISP-DM
Which are the most common DM processes? (Ch 4)
CRISP-DM, SEMMA, KDD
What are the 4 layers of a High-level BI Architecture? (Ch 1)
Data Warehouse, Analytics, BPM Strategies, User Interface
What is the hierarchy of knowledge? (Ch 1)
Data to Information to Knowledge to Intelligence to Wisdom
Who is the DW Administrator? Slide 39 (Ch 3)
Data warehouse administrator (DWA) should... -have the knowledge of high-performance software, hardware, and networking technologies -possess solid business knowledge and insight -be familiar with the decision-making processes so as to suitably design/maintain the data warehouse structure -possess excellent communications skills
What is the evolution of Decision Support? (Ch 1)
Decision support systems -> Enterprise/Executive IS -> Business Intelligence -> Analytics -> Big Data
What is ETL? (Ch 3)
ETL is the abbreviation of extract, transform, and load. ETL is software that enables businesses to consolidate their disparate data while moving it from place to place. The data can come from any source. First, the extract function reads data from a specified source database and extracts a desired subset of data. Next, the transform function works with the acquired data (using rules or lookup tables, or creating combinations with other data) to convert it to the desired state. Finally, the load function writes the resulting data to a target database.
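The three functions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `orders` source table, the `COUNTRY_LOOKUP` table, and the `fact_orders` target are all invented, and two in-memory SQLite databases stand in for the real source and target systems.

```python
import sqlite3

# A hypothetical source database with some raw operational rows.
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE orders (id INTEGER, country_code TEXT, amount_cents INTEGER, status TEXT);
INSERT INTO orders VALUES
    (1, 'us', 1250, 'paid'), (2, 'de', 990, 'cancelled'), (3, 'US', 400, 'paid');
""")

def extract(conn):
    """Extract: read only the desired subset (paid orders) from the source."""
    return conn.execute(
        "SELECT id, country_code, amount_cents FROM orders WHERE status = 'paid'"
    ).fetchall()

# Lookup table used by the transform step (assumed values).
COUNTRY_LOOKUP = {"US": "United States", "DE": "Germany"}

def transform(rows):
    """Transform: apply rules and a lookup table to reach the desired state."""
    cleaned = []
    for order_id, code, cents in rows:
        country = COUNTRY_LOOKUP.get(code.upper(), "Unknown")
        cleaned.append((order_id, country, cents / 100))  # cents -> currency units
    return cleaned

def load(conn, rows):
    """Load: write the resulting rows to the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders (order_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()

target = sqlite3.connect(":memory:")
load(target, transform(extract(source)))
loaded = target.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(loaded)
```

The cancelled order never reaches the target, and the inconsistent country codes ('us', 'US') are standardized on the way in.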
What is a Dashboard? (Ch 2)
Graphical presentation of several performance indicators in a single page using dials/gauges; commonly used in BPM software suites and BI platforms; provide visual displays of important information that is consolidated and arranged on a single screen so that information can be digested at a single glance and easily drilled in and further explored
Which DW architecture is the best? (Ch 3)
Hub and Spoke and Bus
What are the 5 alternative DW architectures? (Ch 3)
1. Independent data marts architecture
-Simplest and cheapest
-Data marts operate independently of one another
2. Data mart bus architecture with linked dimensional data marts
-Individual data marts are linked to each other via middleware
3. *Hub-and-spoke architecture (corporate information factory)
-The best-known data warehousing architecture
-Easy customization
4. Centralized data warehouse architecture
5. Federated architecture
-A concession to the forces that undermine the best plans to develop a perfect system
Data Integration (Ch 3)
Integration that comprises three major processes: data access, data federation, and change capture.
Line Graphs (Ch 2)
Line graphs are most commonly used to plot continuous data. Because the points are physically connected via the line, it implies a connection between the points that may not make sense for categorical data (a set of data that is sorted or divided into different categories).
What is ODS? (Ch 3)
ODS is the abbreviation of Operational Data Store: a database structure that is a repository for near real-time operational data rather than long-term trend data. The ODS may further become the enterprise-shared operational database, allowing operational systems that are being re-engineered to use the ODS as their operational databases.
What is knowledge? (Ch 1)
Plato: knowledge = justified true belief. To me, knowledge = understanding.
What is Data Mining? (Ch 4)
Process of identifying patterns in data; to build models that we can apply to make decisions
What are the basic steps in the SEMMA process? (Ch 4)
-Sample
-Explore: visualization and basic description of the data
-Modify: select variables, transform variable representations
-Model: use a variety of statistical and machine learning models
-Assess: evaluate the accuracy and usefulness of the models
Scatterplots (Ch 2)
Scatterplots can be useful for showing the relationship between two things, because they allow you to encode data simultaneously on a horizontal x-axis and vertical y-axis to see whether and what relationship exists.
Stacked Horizontal Bar Chart (Ch 2)
Similar to the stacked vertical bar chart, stacked horizontal bar charts can be used to show the totals across different categories but also give a sense of the subcomponent pieces. They can be structured to show either absolute values or sum to 100%.
Examples of DM Applications (Ch 4)
Slide 16
DM Myths (Ch 4)
Slide 58
Slopegraphs (Ch 2)
Slopegraphs can be useful when you have two time periods or points of comparison and want to quickly show relative increases and decreases or differences across various categories between the two data points.
What are the benefits of BI? (Ch 1)
Speed, better decisions, more revenue, better customer service etc.
Multidimensionality (Ch 3)
The ability to organize, present, and analyze data by several dimensions, such as sales by region, by product, by salesperson, and by time (four dimensions)
Waterfall Chart (Ch 2)
The waterfall chart can be used to pull apart the pieces of a stacked bar chart to focus on one at a time, or to show a starting point, increases and decreases, and the resulting ending point.
What are Data Marts? (Ch 3)
a lower-cost, scaled down version of a data warehouse •Specialized for a single department
What is Six Sigma? (Ch 2)
a methodology aimed at reducing the number of defects in a business process; The goal of Six Sigma is to achieve a level of quality that is nearly perfect, with only 3.4 defects per million opportunities
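The 3.4 figure is a defects-per-million-opportunities (DPMO) rate. A rough sketch of the arithmetic, with a made-up process (the counts below are illustrative, not from the course material):

```python
def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000

SIX_SIGMA_TARGET = 3.4  # the goal quoted above

# Hypothetical process: 17 defects found in 5,000 units,
# each unit having 10 opportunities for a defect.
observed = dpmo(defects=17, units=5_000, opportunities_per_unit=10)
meets_target = observed <= SIX_SIGMA_TARGET
print(round(observed, 1), meets_target)
```

This example process runs at roughly 340 DPMO, about a hundred times the Six Sigma goal.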
3 types of memory (Ch 4)
a. Iconic memory: super fast. It happens without you consciously realizing it and is piqued when we look at the world around us. Information stays in your iconic memory for a fraction of a second before it gets forwarded on to your short-term memory.
b. Short-term memory: has limitations. Specifically, people can keep about four chunks of visual information in their short-term memory at a given time.
c. Long-term memory: an aggregate of visual and verbal memory, which act differently. Verbal memory is accessed by a neural net, where the path becomes important for being able to recognize or recall. Visual memory, on the other hand, functions with specialized structures.
Key performance indicators (KPIs) (Ch 3)
measurements that define and measure the progress of an organization toward achieving its objectives
Why is metadata important? (Ch 3)
provides context to understand data; assists in the conversion of data and information into knowledge
Preattentive Attributes (Ch 4)
visual properties that we notice without using conscious effort to do so and determine what information catches our attention; can be leveraged to help direct your audience's attention to where you want them to focus it
What is a Data Warehouse? (Ch 3)
•A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format •"The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time"
What is Association Rule Mining (market basket analysis)? (Ch 4)
•A very popular DM method in business •Finds interesting relationships (affinities) between variables (items or events) •Part of machine learning family •Employs unsupervised learning •There is no output variable •Also known as market basket analysis •Often used as an example to describe DM to ordinary people, such as the famous "relationship between diapers and beers!"
What is a business report? (Ch 2)
•A written document that contains information regarding business matters. •Purpose: to improve managerial decisions •Source: data from inside and outside the organization (via the use of ETL) •Format: text + tables + graphs/charts •Distribution: in-print, email, portal/intranet •Data acquisition -> Information generation -> Decision making -> Process management
Prescriptive Analytics (Ch 1)
•Aims to determine the best possible decision •Uses both descriptive and predictive to create the alternatives, and then determines the best one •Enablers -Optimization -Simulation -Multi-Criteria Decision Modeling -Heuristic Programming •Analytics Applied to Many Domains
Which are the best practices in Dashboard design? (Ch 2)
•Benchmark KPIs with Industry Standards
•Wrap the Metrics with Contextual Metadata
•Validate the Design by a Usability Specialist
•Prioritize and Rank Alerts and Exceptions
•Enrich Dashboard with Business-User Comments
•Present Information in Three Different Levels
•Pick the Right Visual Constructs
•Provide for Guided Analytics
What are the basic steps in the CRISP-DM process? (Ch 4)
•Cross-Industry Standard Process for Data Mining
-Step 1: Business Understanding
-Step 2: Data Understanding
-Step 3: Data Preparation
-Step 4: Model Building
-Step 5: Testing and Evaluation
-Step 6: Deployment
When is data valuable? (Ch 2)
•Data source reliability •Data content accuracy •Data accessibility •Data security and data privacy •Data richness •Data consistency •Data currency/data timeliness •Data granularity •Data validity and data relevancy
Which are the most common classification techniques? (Ch 4)
•Decision tree analysis •Statistical analysis •Neural networks •Support vector machines •Case-based reasoning •Bayesian classifiers •Genetic algorithms •Rough sets
What are decision trees? (Ch 4)
•Employs a divide-and-conquer method
•Recursively divides a training set until each division consists of examples from one class:
1. Create a root node and assign all of the training data to it.
2. Select the best splitting attribute.
3. Add a branch to the root node for each value of the split. Split the data into mutually exclusive subsets along the lines of the specific split.
4. Repeat steps 2 and 3 for each and every leaf node until the stopping criteria are reached.
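The four steps above can be sketched as a short recursive function. The notes do not say which splitting criterion to use, so the Gini index here is an assumption, as are the tiny `training` set and the function names; real algorithms also add pruning, which this sketch omits.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_attribute(rows, attributes, target):
    """Step 2: pick the attribute whose split gives the lowest weighted Gini."""
    def weighted_gini(attr):
        groups = {}
        for r in rows:
            groups.setdefault(r[attr], []).append(r[target])
        return sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return min(attributes, key=weighted_gini)

def build_tree(rows, attributes, target):
    labels = [r[target] for r in rows]
    # Stopping criteria: the node is pure, or no attributes are left to split on.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, attributes, target)
    remaining = [a for a in attributes if a != attr]
    # Step 3: one branch per value, each holding a mutually exclusive subset.
    branches = {}
    for value in sorted({r[attr] for r in rows}):
        subset = [r for r in rows if r[attr] == value]
        branches[value] = build_tree(subset, remaining, target)  # Step 4: recurse
    return {attr: branches}

def classify(tree, row):
    """Walk from the root to a leaf; assumes every value was seen in training."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree

# Step 1: all (made-up) training data starts at the root.
training = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
tree = build_tree(training, ["outlook", "windy"], "play")
print(tree)
print(classify(tree, {"outlook": "sunny", "windy": "no"}))
```

On this data the root splits on `outlook`, and only the `sunny` branch needs a second split on `windy`.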
How does Enterprise Application Integration (EAI) differ from Enterprise Information Integration (EII)? (Ch 3)
•Enterprise application integration (EAI) -A technology that provides a vehicle for pushing data from source systems into a data warehouse •Enterprise information integration (EII) -An evolving tool space that promises real-time data integration from a variety of sources, such as relational or multidimensional databases, Web services, etc.
Why Data Mining? (Ch 4)
•More intense competition at the global scale. •Recognition of the value in data sources. •Availability of quality data on customers, vendors, transactions, Web, etc. •Consolidation and integration of data repositories into data warehouses. •The exponential increase in data processing and storage capabilities; and decrease in cost. •Movement toward conversion of information resources into nonphysical form.
What is DM Classification? (Ch 4)
•Most frequently used DM method
•Part of the machine-learning family
•Employs supervised learning
•Learns from past data, classifies new data
•The output variable is categorical (nominal or ordinal) in nature
What is data preprocessing? (Ch 2)
•Readying the data for analytics; the main steps are:
-Data consolidation: relevant data is collected from the identified sources, the necessary records and variables are selected, and the records coming from multiple data sources are integrated/merged
-Data cleaning
-Data transformation
-Data reduction
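A toy walk-through of the four steps in plain Python. The notes only spell out consolidation, so the specific choices for the other three steps (mean imputation for cleaning, min-max scaling for transformation, dropping unused variables for reduction) are assumptions, and the `crm` and `billing` records are invented.

```python
# Two hypothetical sources describing the same customers.
crm = [
    {"id": 1, "age": 34, "city": "Davis"},
    {"id": 2, "age": None, "city": "Reno"},
]
billing = [
    {"id": 1, "spend": 120.0},
    {"id": 2, "spend": 80.0},
    {"id": 3, "spend": 40.0},
]

# Consolidation: merge records from both sources on the shared key.
spend_by_id = {r["id"]: r["spend"] for r in billing}
merged = [{**r, "spend": spend_by_id[r["id"]]} for r in crm if r["id"] in spend_by_id]

# Cleaning (assumed method): impute missing ages with the mean of the known ages.
known_ages = [r["age"] for r in merged if r["age"] is not None]
mean_age = sum(known_ages) / len(known_ages)
for r in merged:
    if r["age"] is None:
        r["age"] = mean_age

# Transformation (assumed method): min-max scale spend to the range [0, 1].
lo = min(r["spend"] for r in merged)
hi = max(r["spend"] for r in merged)
for r in merged:
    r["spend_scaled"] = (r["spend"] - lo) / (hi - lo)

# Reduction (assumed method): keep only the variables needed for modeling.
prepared = [{"age": r["age"], "spend_scaled": r["spend_scaled"]} for r in merged]
print(prepared)
```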
Inferential Statistics and regression (Ch 2)
•Regression -A part of inferential statistics -The most widely known and used analytics technique in statistics -Used to characterize relationship between explanatory (input) and response (output) variable •It can be used for -Hypothesis testing (explanation) -Forecasting (prediction)
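For a single explanatory variable, the relationship can be estimated with ordinary least squares. The sketch below is a bare-bones illustration: `fit_line` is a hypothetical helper, and the advertising and sales numbers are made up so that the true line is exactly y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

ad_spend = [1, 2, 3, 4, 5]   # explanatory (input) variable
sales = [3, 5, 7, 9, 11]     # response (output) variable

slope, intercept = fit_line(ad_spend, sales)

# Forecasting (prediction): apply the fitted line to a new input.
forecast = slope * 6 + intercept
print(slope, intercept, forecast)
```

Hypothesis testing (explanation) would go one step further and ask whether the estimated slope differs significantly from zero, which this sketch does not cover.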
What is a report? (Ch 2)
•Report = Information -> Decision -Any communication artifact prepared to convey specific information •A report can fulfill many functions -To ensure proper departmental functioning -To provide information -To provide the results of an analysis -To persuade others to act -To create an organizational memory...
What are the basic OLAP operations? (Ch 3)
•Slice - a subset of a multidimensional array
•Dice - a slice on more than two dimensions
•Drill Down/Up - navigating among levels of data ranging from the most summarized (up) to the most detailed (down)
•Roll Up - computing all of the data relationships for one or more dimensions
•Pivot - used to change the dimensional orientation of a report or an ad hoc query-page display
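These operations can be imitated on a tiny cube held in a Python dictionary. This is only a sketch: the cube contents and the function names (`slice_cube`, `dice_cube`, `roll_up`, `pivot`) are invented for illustration, and real OLAP servers implement these operations very differently.

```python
from collections import defaultdict

# A tiny hypothetical cube: (region, product, quarter) -> sales.
DIMS = ("region", "product", "quarter")
cube = {
    ("East", "Widget", "Q1"): 100,
    ("East", "Widget", "Q2"): 120,
    ("East", "Gadget", "Q1"): 80,
    ("West", "Widget", "Q1"): 90,
    ("West", "Gadget", "Q2"): 70,
}

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def dice_cube(cube, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return {
        k: v for k, v in cube.items()
        if all(k[DIMS.index(d)] in values for d, values in allowed.items())
    }

def roll_up(cube, keep):
    """Roll up: aggregate away every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for k, v in cube.items():
        totals[tuple(k[i] for i in idx)] += v
    return dict(totals)

def pivot(table):
    """Pivot: swap the row and column dimensions of a two-dimensional result."""
    return {(b, a): v for (a, b), v in table.items()}

q1 = slice_cube(cube, "quarter", "Q1")
east_widgets = dice_cube(cube, region={"East"}, product={"Widget"}, quarter={"Q1", "Q2"})
by_region = roll_up(cube, ["region"])                      # most summarized level
by_region_product = roll_up(cube, ["region", "product"])   # drilling down one level
print(by_region)
print(pivot(by_region_product))
```

Moving from `by_region` to `by_region_product` is a drill-down; going the other way is a drill-up.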
What is the source for DM? (Ch 4)
•Source of data for DM is often a consolidated data warehouse (not always!).
What are the DM characteristics? (Ch 4)
•Source of data for DM is often a consolidated data warehouse (not always!). •DM environment is usually a client-server or a Web-based information systems architecture. •Data is the most critical ingredient for DM which may include soft/unstructured data. •The miner is often an end user. •Striking it rich requires creative thinking. •Data mining tools' capabilities and ease of use are essential (Web, Parallel processing, etc.).
What is the future of BI systems? (Ch 3)
•Sourcing...
-Web, social media, and Big Data
-Open source software
-SaaS (software as a service)
-Cloud computing
-Data lakes
•Infrastructure...
-Columnar
-Real-time DW
-Data warehouse appliances
-Data management practices/technologies
-In-database & in-memory processing
-New DBMS
-Advanced analytics
How do statistics relate to Business Analytics? (Ch 2)
•Statistics -A collection of mathematical techniques to characterize and interpret data •Descriptive Statistics -Describing the data (as it is) •Inferential statistics -Drawing inferences about the population based on sample data
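A brief sketch of the difference, using Python's built-in `statistics` module on a made-up sample. The confidence interval uses the normal-approximation multiplier 1.96, which is an assumption made for simplicity; with a sample this small, a t-based interval would be the more careful choice.

```python
import statistics

# Hypothetical sample of eight measurements drawn from a larger population.
sample = [52, 48, 50, 55, 45, 51, 49, 50]

# Descriptive statistics: describe the sample as it is.
mean = statistics.mean(sample)
median = statistics.median(sample)
stdev = statistics.stdev(sample)  # sample standard deviation

# Inferential statistics: use the sample to say something about the population.
# Rough 95% confidence interval for the population mean (normal approximation).
std_error = stdev / len(sample) ** 0.5
ci = (mean - 1.96 * std_error, mean + 1.96 * std_error)
print(mean, median, round(stdev, 2), ci)
```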
How is structured data different from unstructured data? (Ch 2)
•Structured data -Targeted for computers to process -Numeric versus nominal •Unstructured/textual data -Targeted for humans to process/digest
What are the characteristics of a DW? (Ch 3)
•Subject oriented •Integrated •Time-variant (time series) •Nonvolatile •Summarized •Not normalized •Metadata •Web based, relational/multi-dimensional •Client/server, real-time/right-time/active
What is Cluster Analysis? (Ch 4)
•Used for automatic identification of natural groupings of things
•Part of the machine-learning family
•Employs unsupervised learning
•Learns the clusters of things from past data, then assigns new instances
•There is no output/target variable
•In marketing, it is also known as segmentation
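As an illustration of "learn the clusters, then assign new instances", here is a minimal k-means sketch on a single numeric feature. The notes do not name a specific algorithm, so k-means is an assumed choice; the spending figures, starting centroids, and function names are also made up.

```python
def k_means_1d(values, centroids, iterations=10):
    """Plain k-means on one numeric feature, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids

def assign(value, centroids):
    """Assign a new instance to the index of the nearest learned centroid."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))

# Hypothetical monthly spend per customer; note there is no target variable.
spend = [12, 15, 14, 90, 95, 88]
centroids = k_means_1d(spend, centroids=[0, 100])
print(centroids, assign(20, centroids))
```

The two learned centroids correspond to a low-spend and a high-spend segment, and a new customer spending 20 lands in the low-spend one.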
What is data? (Ch 2)
•A collection of facts
-Usually obtained as the result of experiences, observations, or experiments
•Data may consist of numbers, words, images, ...
•Data is the lowest level of abstraction (from which information and knowledge are derived)
•Data is the source for information and knowledge
•Data quality and data integrity -> critical to analytics