Chapter 9: Business Intelligence Systems
difference between BI users and knowledge workers
-BI users: specialists in data analysis -knowledge workers: nonspecialist users of BI results
problems with operational data
-dirty data -missing values -inconsistent data -nonintegrated data -wrong granularity (level of detail) -too much data (too many attributes/data points)
3 primary activities in the BI process
1. acquire data 2. perform analysis 3. publish results
3 disadvantages of expert systems
1. difficult and expensive to develop 2. difficult to maintain 3. unable to live up to the high expectations set by their name *note: few expert systems have been successful
2 reasons for resistance to hyper-social KS
1. employees can be reluctant to exhibit their ignorance (e.g. may not submit blog entries out of fear of appearing incompetent) 2. employee competition *endorsement can be effective in curbing these types of resistance, especially if followed up by strong positive feedback
3 alternatives for content management applications
1. in-house custom (expensive) 2. off-the-shelf (more functionality, less expensive) 3. public search engine (e.g. Google, Bing; not everything is publicly accessible)
challenges of CMS
1. most databases are huge 2. CMS content is dynamic 3. documents do not exist in isolation from each other (they refer to each other; when one changes, the others may need to change as well) 4. document contents are perishable (become obsolete, need changing) 5. content is provided in many languages
functions of a data warehouse
1. obtain data 2. cleanse data 3. organize and relate data 4. catalog data
organizational use of BI (hierarchy, top of pyramid is #1)
1. project management 2. problem solving 3. deciding 4. informing; note: informing is needed to decide; deciding is needed prior to problem solving; problem solving is necessary for project management
static reports
BI documents that are fixed at the time of creation and do not change; publishing static content requires little skill; e.g. PDF documents
dynamic reports
BI documents that are updated at the time they are requested; publishing them requires the BI application to access a database or other data source at the time the report is delivered to the user; dynamic content requires more skill; e.g. a sales report that is current at the time it is accessed
content management systems (CMS)
IS that support the management and delivery of documents including reports, Web pages, and other expressions of employee knowledge; users are companies that sell complicated products and want to share their knowledge of those products with employees and customers
Hadoop is written in ______ and originally ran on ______
Java; Linux
synonym for OLAP report
OLAP cube
Hadoop includes a query language titled ______
Pig
T or F: business intelligence is used to predict purchasing patterns & changes in purchasing patterns
T
T or F: placing BI applications on operational servers can dramatically reduce systems performance
T
T or F: publishing media include print as well as online content delivered via Web servers, specialized Web servers known as "report servers," and BI results that are sent via automation to other programs
T
T or F: in hyper-organization theory, the focus moves from the knowledge and content to the fostering of authentic relationships among the creators and the users of that knowledge
T; in other words, they move from controlled processes to messy ones
reminder: the most important part of an IS is...
YOU! the "people" portion
BI server
a Web server application that is purpose-built for the publishing of business intelligence; e.g. Microsoft SQL Server Report Manager
data mart
a data collection, smaller than a data warehouse, that addresses the needs of a particular department or functional area of the business
data warehouse
a facility for managing an organization's BI data; includes data purchased from outside sources
MapReduce
a technique for harnessing the power of thousands of computers working in parallel; BigData is broken into pieces and hundreds or thousands of independent processors search these pieces for something of interest
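A minimal sketch of the MapReduce idea in Python, assuming a word-count task: each piece of the data is mapped independently, then the partial results are combined in a reduce step. The names `map_phase`, `reduce_phase`, and `map_reduce` are made up for illustration, and a thread pool stands in for the many computers; this is not how Hadoop itself is invoked.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one piece of data."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def map_reduce(chunks, workers=4):
    # Each chunk is mapped on its own, which is what allows the work to be
    # spread out; threads are only a stand-in for separate machines here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    # Flatten the per-chunk results and combine them in the reduce step.
    all_pairs = [pair for part in mapped for pair in part]
    return reduce_phase(all_pairs)


# Hypothetical data already broken into pieces.
chunks = ["swim mask fins", "fins snorkel", "mask fins"]
counts = map_reduce(chunks)
print(counts)
```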
rich directory
an employee directory that includes not only the standard name, email, phone, and address but also organizational structure and expertise
unsupervised data mining
analysts do not create a model or hypothesis before running the analysis; apply a data mining application to the data and observe the results; hypothesis created AFTER the analysis
data mining
application of statistical techniques to find patterns and relationships among data for classification and prediction; also called knowledge discovery in databases (KDD)
dimension
characteristic of a measure
neural networks
common supervised application used to predict values and make classifications such as "good prospect" or "poor prospect" customers; complicated set of possibly nonlinear equations
regression analysis
common supervised technique which measures the effect of a set of variables on another variable
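A small sketch of the simplest case, assuming a single predictor variable fitted by ordinary least squares (the card describes a set of variables; one is used here to keep the arithmetic visible). The function name `fit_line` and the ad-spend and sales figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend versus sales.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(ad_spend, sales)
print(slope, intercept)
```

The slope is the estimated effect of the predictor on the outcome; this is the "model defined before the analysis" that makes regression a supervised technique.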
cluster analysis
common unsupervised technique where statistical techniques identify groups of entities that have similar characteristics
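One way to picture this is k-means, used here as an assumed example of a clustering technique (the card does not name a specific algorithm). The customer data, the `k_means` helper, and the starting centers are all hypothetical, and the features are left unscaled for brevity.

```python
def k_means(points, centers, iterations=10):
    """Group 2-D points around the nearest center, then re-center; repeat."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Squared distance from this point to every current center.
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return clusters, centers


# Hypothetical customers as (age, annual spend).
customers = [(22, 150), (25, 180), (24, 160), (55, 900), (60, 950), (58, 870)]
clusters, centers = k_means(customers, centers=[customers[0], customers[3]])
print(clusters)
```

No hypothesis about the groups is supplied up front; the analyst interprets the resulting groups afterward.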
confidence
conditional probability estimate, such as the proportion of customers who bought a swim mask who also bought fins; expressed as a decimal number
supervised data mining
data miners develop a model PRIOR to the analysis and apply statistical techniques to data to estimate parameters of the model
organizations use ______________________ to select variables that are then used by other types of data mining tools
decision trees
push publishing
delivers BI to users without any request from the users; BI results are delivered according to a schedule or as a result of an event or particular data condition
It is better to have too __________ a granularity than too ____________
fine; coarse
market-basket analysis
first typical data mining tool; an unsupervised data mining technique for determining sales patterns; shows products that customers tend to buy together
drill down
further divide the data into more detail
BI system's 5 components
hardware, software, data, procedures, people
cross-selling opportunity
idea that customers who buy product X also tend to buy product Y; related to market-basket analysis
business intelligence (BI) systems
information systems that process operational, social, and other data to identify patterns, relationships, and trends for use by business professionals
2 major functions of BI servers
management and delivery; management: maintains metadata about the authorized allocation of BI results to users
all management data needed by any of the BI servers is stored in ___________
metadata
Hadoop
open source program supported by the Apache Foundation that implements MapReduce on potentially thousands of computers; began as part of Cassandra; deep technical skills/experts needed to use this
business intelligence
patterns, relationships, trends, and predictions in a BI system
predictive policing
police departments analyze data on past crimes, including location, date, time, day of week, type of crime, and related data, to predict where crimes are likely to occur; they then station police personnel in the best locations for preventing those crimes
publish results
process of delivering business intelligence to the knowledge workers who need it; last activity in the BI process
expert system shells
programs that process a set of rules; typically, a shell processes the rules until no value changes
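A minimal sketch of "process rules until no value changes", assuming rules are stored as (condition, key, value) triples. The `run_rules` function, the rule format, and the customer facts are hypothetical and not taken from any particular shell product.

```python
def run_rules(facts, rules):
    """Apply If/Then rules repeatedly until a full pass changes no value."""
    facts = dict(facts)  # work on a copy so the caller's facts are untouched
    changed = True
    while changed:
        changed = False
        for condition, key, value in rules:
            # IF the condition holds and the value would change, THEN set it.
            if condition(facts) and facts.get(key) != value:
                facts[key] = value
                changed = True
    return facts


# Hypothetical rules, deliberately listed out of order so a second pass is needed.
rules = [
    (lambda f: f.get("tier") == "gold" and f.get("recent_purchase"),
     "offer", "loyalty discount"),
    (lambda f: f.get("total_spent", 0) > 1000, "tier", "gold"),
]
result = run_rules({"total_spent": 2500, "recent_purchase": True}, rules)
print(result)
```

Note that this sketch would loop forever if two rules kept overwriting each other's value, which hints at why large rule sets are difficult to maintain.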
online analytical processing (OLAP)
provides the ability to sum, count, average, and perform other simple arithmetic operations on groups of data; has measures and dimensions
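A rough sketch of the idea, assuming sales rows held as plain dictionaries: the measure (`net_sales`) is summed, counted, and averaged for each combination of dimension values. The `olap_summary` helper and the data are hypothetical; real OLAP servers do this at far larger scale.

```python
from collections import defaultdict


def olap_summary(rows, dimensions, measure):
    """Group rows by the chosen dimensions and summarize the measure."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        groups[key].append(row[measure])
    return {
        key: {"sum": sum(vals), "count": len(vals), "average": sum(vals) / len(vals)}
        for key, vals in groups.items()
    }


# Hypothetical operational data.
sales = [
    {"region": "West", "store_type": "Mall", "net_sales": 100.0},
    {"region": "West", "store_type": "Mall", "net_sales": 300.0},
    {"region": "West", "store_type": "Outlet", "net_sales": 50.0},
    {"region": "East", "store_type": "Mall", "net_sales": 200.0},
]

by_region = olap_summary(sales, ["region"], "net_sales")
# Adding a second dimension divides the same data into more detail (a drill down).
by_region_and_type = olap_summary(sales, ["region", "store_type"], "net_sales")
print(by_region)
print(by_region_and_type)
```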
Push and pull options for static/dynamic reports
pull: same for each of the servers; push: varies by server type, e.g. email/collaboration is manual, while Web servers/SharePoint may create alerts and RSS feeds to have a server push content when content is created/changed (see "subscriptions")
4 fundamental categories of BI analysis
reporting, data mining, BigData, knowledge management
pull publishing
requires the user to request BI results
expert systems
rule-based systems that encode human knowledge in the form of If/Then rules
decision tree
second typical data mining tool; hierarchical arrangement of criteria that predict a classification or a value; easy to understand and implement using decision rules; work with many types of variables as well as partial data
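A small sketch of how a finished tree can be applied as decision rules, assuming the tree has already been built and is stored as nested dictionaries. The attributes, thresholds, and the `classify` helper are invented for illustration and do not come from real loan data.

```python
def classify(node, record):
    """Walk the tree from the root until a leaf (a plain label) is reached."""
    while isinstance(node, dict):
        # Each internal node tests one criterion against a threshold.
        branch = "yes" if record[node["attribute"]] >= node["threshold"] else "no"
        node = node[branch]
    return node


# Hypothetical tree: IF percent_past_due >= 0.5 THEN default, ELSE check credit_score.
tree = {
    "attribute": "percent_past_due",
    "threshold": 0.5,
    "yes": "default",
    "no": {
        "attribute": "credit_score",
        "threshold": 600,
        "yes": "no default",
        "no": "default",
    },
}

prediction = classify(tree, {"percent_past_due": 0.1, "credit_score": 710})
print(prediction)
```

Each path from root to leaf reads as one If/Then decision rule, which is why trees are considered easy to understand.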
If/Then rules
statements that specify that if a particular condition exists, then some action should be taken
OLAP reports often require....
substantial computing power
decision support systems
synonym for "decision-making BI systems"; not used in the rest of the chapter
RFM analysis
technique readily implemented with basic reporting operations, used to analyze and rank customers according to their purchasing patterns; order of importance: 1. recent 2. frequent 3. money (amount) spent
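A sketch of one common way to produce RFM scores, assuming customers are sorted on each factor and split into five equal groups, with 1 as the best 20% and 5 as the worst (this scoring convention is an assumption, not stated on the card). The helper names and customer figures are hypothetical.

```python
def quintile_scores(values, best_is_high):
    """Return a score from 1 (best 20%) to 5 (worst 20%) for each customer."""
    ranked = sorted(values, key=values.get, reverse=best_is_high)
    n = len(ranked)
    return {name: (position * 5) // n + 1 for position, name in enumerate(ranked)}


def rfm(customers):
    """Score each customer on recency, frequency, and money, in that order."""
    recency = quintile_scores(
        {c: d["days_since_last_order"] for c, d in customers.items()},
        best_is_high=False,  # fewer days since the last order is better
    )
    frequency = quintile_scores(
        {c: d["orders"] for c, d in customers.items()}, best_is_high=True
    )
    money = quintile_scores(
        {c: d["total_spent"] for c, d in customers.items()}, best_is_high=True
    )
    return {c: (recency[c], frequency[c], money[c]) for c in customers}


# Hypothetical customers.
customers = {
    "Ajax": {"days_since_last_order": 3, "orders": 40, "total_spent": 9000},
    "Bloom": {"days_since_last_order": 10, "orders": 5, "total_spent": 8000},
    "Caruthers": {"days_since_last_order": 300, "orders": 30, "total_spent": 500},
    "Davidson": {"days_since_last_order": 150, "orders": 2, "total_spent": 100},
    "Ellis": {"days_since_last_order": 60, "orders": 12, "total_spent": 2500},
}
scores = rfm(customers)
print(scores)
```

Only sorting and grouping are needed, which is why RFM counts as a reporting technique rather than data mining.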
hyper-social knowledge management
the application of social media and related applications for the management and delivery of organizational knowledge resources; provides a framework for understanding KM
measures
the data item of interest
curse of dimensionality
the more attributes there are, the easier it is to build a model that fits the sample data but that is worthless as a predictor
the Singularity
the point at which computer systems become sophisticated enough that they can adapt and create their own software and hence adapt their behavior without human assistance
support
the probability that two items will be purchased together
BI analysis
the process of creating business intelligence
knowledge management (KM)
the process of creating value from intellectual capital and sharing that knowledge with employees, managers, suppliers, customers, and others who need that capital; goal is to prevent problems
data acquisition
the process of obtaining, cleansing, organizing, relating, and cataloging source data
lift
the ratio of confidence to the base probability of buying an item
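A worked sketch tying the three market-basket terms together, assuming transactions are stored as sets of item names. The `basket_metrics` function and the ten dive-shop transactions are made up for illustration.

```python
def basket_metrics(transactions, item_a, item_b):
    """Return (support, confidence, lift) for 'buyers of item_a also buy item_b'."""
    n = len(transactions)
    count_a = sum(1 for t in transactions if item_a in t)
    count_b = sum(1 for t in transactions if item_b in t)
    count_both = sum(1 for t in transactions if item_a in t and item_b in t)
    support = count_both / n            # P(a and b): bought together
    confidence = count_both / count_a   # P(b | a): bought b given bought a
    lift = confidence / (count_b / n)   # confidence relative to base P(b)
    return support, confidence, lift


# Hypothetical transactions.
transactions = [
    {"mask", "fins"},
    {"mask", "fins"},
    {"mask", "fins", "snorkel"},
    {"mask"},
    {"fins"},
    {"fins", "tank"},
    {"tank"},
    {"snorkel"},
    {"tank", "snorkel"},
    {"weights"},
]
support, confidence, lift = basket_metrics(transactions, "mask", "fins")
print(support, confidence, lift)
```

A lift above 1 suggests that buying the first item raises the likelihood of buying the second, which is the signal behind a cross-selling opportunity.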
BigData
used to describe data collections that are characterized by huge volume, velocity and variety -at least a petabyte in size -generated rapidly -has structured data, free-form text, log files, graphics, audio, video
subscriptions
user requests for particular BI results on a particular schedule or in response to particular events; e.g. a daily sales report
could a value of zero in the analysis stage be problematic?
yes; such problematic data is common in data extracts