Chapter 9 MIS
cluster analysis
Unsupervised data mining using statistical techniques to identify groups of entities that have similar characteristics. A common use for cluster analysis is to find groups of similar customers in data about customer orders and customer demographics.
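As a rough illustration of grouping similar customers, the sketch below uses one-dimensional k-means, which is only one of several clustering techniques and is not named in these notes. The `annual_spend` figures and the starting centers are made-up assumptions.

```python
def k_means_1d(values, centers, iterations=10):
    """Assign each value to its nearest center, then move each center to its group's mean."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ended up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical annual spend per customer (illustrative values only).
annual_spend = [120, 150, 130, 900, 950, 1000]
centers, clusters = k_means_1d(annual_spend, centers=[100.0, 1000.0])
```

Here the customers fall into a low-spend group and a high-spend group; real cluster analysis would normally use several attributes (orders plus demographics) rather than a single number.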
MapReduce
A two-step technique for harnessing the power of thousands of computers working in parallel to analyze massive data sets: the data set is broken into smaller pieces, the pieces are processed independently, and the results are then combined. Map phase: for example, a Google search log is broken into thousands of pieces, and hundreds or thousands of independent processors search these pieces for something of interest. Reduce phase: the results are combined; here, the result is a list of all the terms searched for on a given day and the count of each. Hadoop's implementation of MapReduce is written in Java.
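The two phases can be imitated on a single machine. This is only a sketch of the idea: the `search_log` contents and the helper names are assumptions, and a real MapReduce job would run the map step on many separate computers.

```python
from collections import Counter
from functools import reduce


def split_into_pieces(log, n_pieces):
    """Break the log into roughly equal pieces, one per (simulated) processor."""
    return [log[i::n_pieces] for i in range(n_pieces)]


def map_phase(piece):
    """Map: each processor independently counts the search terms in its own piece."""
    return Counter(term.lower() for term in piece)


def reduce_phase(partial_counts):
    """Reduce: combine the per-piece counts into one overall count per term."""
    return reduce(lambda a, b: a + b, partial_counts, Counter())


# Hypothetical search log for one day.
search_log = ["hadoop", "drones", "Hadoop", "bi server", "drones", "hadoop"]
pieces = split_into_pieces(search_log, n_pieces=3)
term_counts = reduce_phase(map_phase(p) for p in pieces)
```

`term_counts` ends up holding every term searched for and the count of each, which mirrors the Reduce-phase result described above.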
"Any" device
Computers, mobile devices, office and other applications, and cloud services to anything.
project management
Examples: create a partnership program between PRIDE competitors and local health clubs; expand geographically.
Acquire Data
First step in the BI process: obtain, cleanse, organize and relate, and catalog the data.
KM benefits
improve process quality and increase team strength
BigData Analysis
involve both reporting and data mining techniques
KM goals
Enable employees to use organization's collective knowledge. BUT: Not many companies can afford
RFM Example
How recently (R) a customer has ordered; how frequently (F) a customer has ordered; how much money (M) the customer has spent.
confidence
In market-basket terminology, a conditional probability estimate: the probability that a customer will purchase one item given that the customer has purchased another.
informing
In what ways are clients using the new system? How do sales compare to our sales forecast?
Hadoop
Open-source program supported by the Apache Foundation. Manages thousands of computers and implements MapReduce. Written in Java; can be run on server farms or in the cloud, and is supported by Amazon.com as part of its EC2 cloud offering. Its query language is Pig. Technical skills are needed to run and use it.
Components of BI
Operational DBMS, social data, purchased data, and employee knowledge feed the business intelligence application, which delivers business intelligence to knowledge workers.
Email or collaboration tool
Report type: static; Push options: manual; Skill level needed: low
BI Software
A crowded space: Epicor, Sisense, SalesForce Analytics Cloud, Birst, QlikView, Looker, Datameer, Board All-in-One, Infor, IQMS, DOMO, Pentaho, Yellowfin, MicroStrategy, TIBCO; plus IBM, Oracle, Microsoft, and SAP.
clickstream data
Data collected about user behavior and browsing patterns by monitoring users' activities when they visit a Web site.
BI application
RFM application, OLAP, other reports, market basket, decision tree, other data mining, context indexing, RSS feed, expert system
variety
different forms of data
Content Management System (CMS)
information systems that support the management and delivery of documents including reports, web pages, and other expressions of employee knowledge typical users are companies that sell complicated products and want to share their knowledge of those products with employees and customers
Publish results
Last activity in the BI process: delivering business intelligence to the knowledge workers who need it. Media include print, web servers, report servers, and automation; publishing is divided into push publishing and pull publishing. It can mean placing BI results on servers for publication to knowledge workers over the Internet or other networks, making results available via a web service for use by other applications, creating PDFs or PowerPoint presentations for communicating to colleagues or management, or reporting results to management in a team meeting.
5 basic reporting operations
1. Sorting 2. Filtering 3. Grouping 4. Calculating 5. Formatting. None of these operations is particularly sophisticated; all can be accomplished using SQL and basic HTML or a simple report-writing tool. Even so, they can be used to produce complex and highly useful reports, e.g., RFM analysis and online analytical processing (OLAP).
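The five operations can be sketched in a few lines of plain Python. The `orders` records, the `min_amount` threshold, and the column layout are illustrative assumptions; in practice the same steps are often written as SQL.

```python
# Hypothetical order records.
orders = [
    {"customer": "Ajax", "region": "West", "amount": 250.0},
    {"customer": "Bloom", "region": "East", "amount": 75.0},
    {"customer": "Carter", "region": "West", "amount": 125.0},
    {"customer": "Dunn", "region": "East", "amount": 400.0},
    {"customer": "Ellis", "region": "North", "amount": 20.0},
]


def sales_report(orders, min_amount):
    # Filtering: keep only orders at or above the threshold.
    kept = [o for o in orders if o["amount"] >= min_amount]
    # Grouping and calculating: total the amount for each region.
    totals = {}
    for o in kept:
        totals[o["region"]] = totals.get(o["region"], 0.0) + o["amount"]
    # Sorting: largest regional total first.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    # Formatting: fixed-width text lines.
    return [f"{region:<6}${total:>10,.2f}" for region, total in ranked]


report = sales_report(orders, min_amount=50.0)
```

Each comment marks which of the five operations the step corresponds to.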
disadvantages of expert systems
1. They are difficult and expensive to develop, requiring many labor hours from experts in domain under study and designers of expert systems 2. Difficult to maintain. 3. Unable to live up to their high expectations.
unsupervised data mining
Analysts apply a data mining application to the data, observe the results, and create hypotheses AFTER the analysis to explain the patterns found; findings are obtained solely by data analysis. Examples: cluster analysis, market-basket analysis, decision trees.
static reports
BI documents that are fixed at the time of creation and do not change; mostly published as PDF documents. Little skill is needed: the publisher just creates the content and attaches it to an email or puts it on the web or a SharePoint site.
dynamic reports
BI documents that are updated at the time they are requested; publishing requires the BI application to access a database or other data source at the time the report is delivered to the user, requiring high skills
Entertainment
BI produced from data on certain habits determines what people actually want. Video: Legendary Pictures, "Persuade-ables", i.e., Godzilla & Analytics.
Knowledge Management (KM)
Creating value from intellectual capital and sharing that knowledge with those who need it, such as employees, customers, and other partners. Includes preserving organizational memory. The scope of KM is the same as that of SM in hyper-social organizations. KM enables employees to better achieve the organization's strategy, solve problems more quickly, and accomplish work with less time and fewer other resources.
Defining elements
Data sets are at least a petabyte in size, and usually larger. Data is generated rapidly (constantly and from many sources). Data consists of structured data, free-form text, log files, graphics, audio, and video.
lift
In market-basket terminology, the ratio of confidence to the base probability of buying an item. Lift shows how much the base probability changes when other products are purchased. If the lift is greater than 1, the change is positive; if it is less than 1, the change is negative
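A small worked sketch of the three market-basket measures follows. It assumes the usual definitions: support as P(A and B), confidence as P(B given A), and lift as confidence divided by the base probability P(B). The dive-shop style `transactions` are invented for illustration.

```python
def market_basket_stats(transactions, item_a, item_b):
    """Return (support, confidence, lift) for buying item_b given item_a."""
    n = len(transactions)
    count_a = sum(1 for t in transactions if item_a in t)
    count_b = sum(1 for t in transactions if item_b in t)
    count_both = sum(1 for t in transactions if item_a in t and item_b in t)
    support = count_both / n            # P(A and B)
    confidence = count_both / count_a   # P(B | A)
    lift = confidence / (count_b / n)   # P(B | A) / P(B)
    return support, confidence, lift


# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"mask", "fins"},
    {"mask", "fins", "tank"},
    {"mask"},
    {"tank"},
]
support, confidence, lift = market_basket_stats(transactions, "mask", "fins")
```

With this data the lift is greater than 1, so buying a mask raises the probability of buying fins above its base rate.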
Acquire, Analyze, Publish
Process of obtaining, cleaning, organizing, relating and cataloging source data
BI as a publishing challenge
Publish: the process of delivering BI to those who need it. Data visualization.
Web Server
Report type: static/dynamic; Push options: alert/RSS; Skill level needed: low for static, high for dynamic
SharePoint
Report type: static/dynamic; Push options: alert/RSS, workflow; Skill level needed: low for static, high for dynamic
If/Then Rules
Statements that specify that if a particular condition exists, then a particular action should be taken. Used in different ways, by both expert systems and decision tree data mining.
BI publishing alternatives
Static reports and dynamic reports. Pull options are the same for every alternative; push options vary. Alternatives: email or collaboration tool, web server, SharePoint, BI server.
BI analysis
The process of creating business intelligence. The four fundamental categories of BI analysis are reporting, data mining, BigData, and knowledge management.
deciding
Which competitions generate the most ad revenue? (Develop more of the best competitions.) Which drones and related equipment are in need of maintenance?
reporting application
a BI application that inputs data from one or more sources and applies reporting operations to that data to produce business intelligence
BI server
A Web server application that is purpose-built for the publishing of business intelligence. Report type: dynamic; Push options: alert/RSS, subscription; Skill level needed: high
Data Marts
A data collection, smaller than the data warehouse, that addresses the needs of a particular department or functional area of the business. A subset of the data warehouse: a summarized or highly focused portion of the firm's data for use by a specific population of users, typically focused on a single subject or line of business. Divided into data and BI tools for analysis and management. Like a retail store: users obtain data pertaining to a particular business function from the data warehouse; they do not have the data management expertise that data warehouse employees have, but they are knowledgeable analysts for a given business function.
Preserving organizational memory
capturing and storing lessons learned and best practices of key employees
Using Business Intelligence to find candidate parts
Create a team to examine past sales data to determine which part designs can be sold, by identifying qualifying parts and computing how much revenue potential those parts represent. Obtain an extract of sales data from the IS department, then create five criteria for parts that might qualify for this new program.
data acquisition
the process of obtaining, cleaning, organizing, relating, and cataloging source data
cross-selling
the sale of related products to customers based on salesperson knowledge, market-basket analysis, or both
business intelligence application
The software component of a BI system; analyzes data through reporting, data mining, BigData, and knowledge management. Divided into BI data source, BI application, and BI application result.
too many data points
too many rows of data = NOT HELPFUL! In order to meaningfully analyze such data, we need to reduce the amount of data! One good solution to this problem is statistical sampling.
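Statistical sampling can be sketched with the standard library. The 1 percent `fraction`, the fixed `seed`, and the stand-in `rows` list are assumptions chosen only to make the example repeatable.

```python
import random


def sample_rows(rows, fraction, seed=0):
    """Return a simple random sample (without replacement) of the given fraction of rows."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)


# Stand-in for a table with too many rows: 100,000 row identifiers.
rows = list(range(100_000))
sample = sample_rows(rows, fraction=0.01)
```

The analysis is then run on `sample` (1,000 rows here) instead of the full table.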
veracity
uncertainty of data
subscriptions
user requests for particular BI results on a particular schedule or in response to particular events
The "V's" of Big Data
volume, variety, velocity, veracity
drill down
with an OLAP report, to further divide the data into more detail
reporting applications
•Create meaningful information from disparate data sources. •Deliver information to user on time.
loan portfolio
a group of loans
velocity
analysis of streaming data
use of BI
project management, problem solving, deciding, informing
Too many attributes
too many columns; can be problematic
Perform Analysis
reporting, data mining, Big Data, and knowledge management
Resistance to Knowledge Sharing
Employees can be reluctant to exhibit their ignorance: out of fear of appearing incompetent, they may not submit entries to blogs or discussion groups. Such reluctance can sometimes be reduced by the attitude and posture of managers. One strategy is to provide private media that can be accessed only by a smaller group of people who have an interest in a specific problem and who can discuss the issue in a less-inhibiting forum. A second source of resistance is employee competition.
Hyper-social organization theory
-framework for understanding KM -focus shifts from knowledge and content to fostering authentic relationships among knowledge creators and users
Components of a Data Warehouse
A physical storage location for the data (the warehouse); software to copy original databases and transfer them to the warehouse; interactive software to allow processing of inquiries; and a directory of the categories of information kept in the warehouse. Operational databases, other internal data, and external data feed data extraction/cleaning/preparation programs, which feed the data warehouse DBMS. The DBMS stores the prepared data in the data warehouse database, extracts and provides data to BI tools, and works with the data warehouse metadata (which records each item's source, format, assumptions and constraints, etc.). BI users interact with the BI tools.
Drawbacks
1. Difficult and expensive to develop: labor intensive; ties up domain experts. 2. Difficult to maintain: changes cause unpredictable outcomes; constantly need expensive changes. 3. Don't live up to expectations: can't duplicate the diagnostic abilities of humans.
dimension
A characteristic of an OLAP measure. Purchase date, customer type, customer location, and sales region are examples of dimensions.
market basket analysis
A data mining technique for determining sales patterns. A market-basket analysis shows the products that customers tend to buy together. Can estimate the probability that a customer will purchase an item
Online Analytical Processing (OLAP)
A dynamic type of reporting system that provides the ability to sum, count, average, and perform other simple arithmetic operations on groups of data. Such reports are dynamic because users can change the format of the reports while viewing them. An OLAP report has measures and dimensions.
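A minimal sketch of an OLAP-style summary, assuming a measure named `sales` and two dimensions, `region` and `customer_type` (all invented for illustration). Adding a dimension to the grouping is one simple way to picture drilling down.

```python
from collections import defaultdict


def olap_sum(rows, measure, dimensions):
    """Sum a measure for each combination of the chosen dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Hypothetical sales rows.
sales = [
    {"region": "West", "customer_type": "Retail", "sales": 100.0},
    {"region": "West", "customer_type": "Wholesale", "sales": 300.0},
    {"region": "East", "customer_type": "Retail", "sales": 200.0},
    {"region": "East", "customer_type": "Retail", "sales": 50.0},
]

by_region = olap_sum(sales, "sales", ["region"])
# Drill down: divide each region's total further by customer type.
drilled = olap_sum(sales, "sales", ["region", "customer_type"])
```

Commercial OLAP tools do far more (counts, averages, reordering dimensions interactively); this shows only the grouping idea.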
OLAP cube
A presentation of an OLAP measure with associated dimensions. The reason for this term is that some products show these displays using three axes, like a cube in geometry. Same as OLAP report. Example: data taken from a sample database provided with SQL Server can be displayed in many ways with Excel; users can alter the format, change the order of dimensions, drill down into the data, and view data from different perspectives. This flexibility comes at a cost: substantial computing power is needed for the calculating, grouping, and sorting behind dynamic displays. Standard commercial DBMS products do have the functions and features required to create OLAP reports, but they are not designed for such work; they are instead designed to provide rapid response to transaction-processing applications.
expert system shell
A program in an expert system that processes a set of rules, typically many times, until the values of the variables no longer change, at which point the system reports the results. The shell processes the IF side of the rules and reports the values of all variables; the knowledge in the rules is gathered from human experts.
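The loop a shell runs can be sketched as follows. The drone-maintenance rules, thresholds, and variable names are hypothetical; a real shell would load rules captured from human experts.

```python
def run_rules(rules, facts):
    """Apply if/then rules repeatedly until no variable changes, then return all variables."""
    facts = dict(facts)
    changed = True
    while changed:
        changed = False
        for condition, variable, value in rules:
            # Evaluate the IF side; fire the THEN side only if it changes a value.
            if condition(facts) and facts.get(variable) != value:
                facts[variable] = value
                changed = True
    return facts


# Each rule: (IF condition, variable to set, value to assign). All hypothetical.
rules = [
    (lambda f: f.get("ground_drone"), "action", "schedule maintenance"),
    (lambda f: f.get("flight_hours", 0) >= 100, "inspection_due", True),
    (lambda f: f.get("inspection_due") and f.get("vibration_high"), "ground_drone", True),
]

result = run_rules(rules, {"flight_hours": 140, "vibration_high": True})
```

Because the first rule depends on a value set by the last one, the rule set has to be processed more than once before the values settle, which is why the shell loops.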
RFM analysis
A technique, readily implemented with basic reporting operations, to analyze and rank customers according to their purchasing patterns. To produce an RFM score: sort customer purchase records by date of most recent (R) purchase; divide the sorted records into quintiles; give customers a score of 1 to 5 according to their quintile; repeat the process for frequency (F) and money (M, the amount spent on orders). RFM stands for recency, frequency, monetary.
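The scoring steps can be sketched as below. Two assumptions are not stated in these notes: a score of 1 is taken to mean the top 20 percent and 5 the bottom 20 percent, and the five customers and their figures are invented.

```python
def quintile_scores(values, best_is_high=True):
    """Return a 1-5 score per value: 1 = best 20 percent, 5 = worst 20 percent (assumed convention)."""
    # Sort positions from best to worst, then score each by its quintile.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=best_is_high)
    scores = [0] * len(values)
    for rank, idx in enumerate(order):
        scores[idx] = rank * 5 // len(values) + 1
    return scores


# Hypothetical customer data.
customers = ["A", "B", "C", "D", "E"]
days_since_last_order = [3, 40, 10, 200, 75]   # R: fewer days is better
orders_per_year = [24, 2, 12, 1, 6]            # F: more orders is better
total_spent = [5000, 300, 900, 100, 2500]      # M: more money is better

r = quintile_scores(days_since_last_order, best_is_high=False)
f = quintile_scores(orders_per_year)
m = quintile_scores(total_spent)
rfm = {c: (r[i], f[i], m[i]) for i, c in enumerate(customers)}
```

Under this convention customer A scores (1, 1, 1) and customer D scores (5, 5, 5).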
regression analysis
A type of supervised data mining that estimates the values of parameters in a linear equation. Used to determine the relative influence of variables on an outcome and also to predict future values of that outcome.
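For a single input variable, estimating the parameters of a linear equation can be sketched with ordinary least squares. The `ad_spend` and `sales` numbers are made up so that the fitted line is easy to check by hand.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that lies exactly on y = 2x + 1.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 5.0 + intercept  # predict the outcome for a new input value
```

The slope indicates the influence of the input on the outcome, and the fitted equation is then used for prediction. Judging how good such a model is requires statistical measures that this sketch omits.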
the Singularity
According to Ray Kurzweil, the point at which computer systems become sophisticated enough that they can create and adapt their own software and hence adapt their behavior without human assistance.
Data Brokers
Acxiom "Database contains information on about 500M consumers...with about 1500 data points per person"
Challenges of Content Management
Databases are huge. Content is dynamic. Documents do not exist in isolation from each other (documents refer to one another, and when one changes, others must change as well; a CMS must maintain linkages among documents so that content dependencies are known and used to maintain document consistency). Contents are perishable (documents become obsolete and need to be altered, removed, or replaced). Content exists in many languages.
Functions of a data warehouse
Extract data from operational, internal, and external databases. Cleanse data. Organize and relate data in the warehouse. Catalog data using metadata.
data warehouse
A facility for managing an organization's BI data. It often includes data purchased from outside sources, which is not unusual or concerning from a privacy standpoint. Acts like a distributor: it takes data from data manufacturers (operational systems and other sources), cleans and processes the data, and locates the data. The data analysts who work there are experts at data management, data cleaning, data transformation, data relationships, etc., but are not usually experts in a given business function.
supervised data mining
A form of data mining in which data miners develop a model prior to the analysis and apply statistical techniques to data to estimate values of the parameters of the model. Examples: regression analysis and neural networks. The equation is created by the regression tool, but considerable skill is required to interpret the model's quality, which depends on statistical factors.
decision tree
A hierarchical arrangement of criteria that predict a classification or a value; an unsupervised data mining technique: the analyst sets up the computer program and provides the data to analyze, and the decision tree program produces the tree. Example: classifying loans by likelihood of default. Organizations analyze data from past loans to produce a decision tree that can be converted to loan-decision rules. A financial institution could use such a tree to assess the default risk on a new loan; institutions that sell groups of loans to one another could use the results of a decision tree program to evaluate the risk of a loan portfolio they are considering purchasing. Easy to understand and implement using decision rules. Can work with many types of variables and deals well with partial data. Organizations can use decision trees by themselves, combine them with other techniques, or, in some cases, use them to select variables that are then used by other types of data mining tools.
neural networks
a popular supervised data mining technique used to predict values and make classifications, such as "good prospect" or "poor prospect"
BigData
A term used to describe data collections that are characterized by huge volume, rapid velocity, and great variety that far exceed those of traditional reporting and data mining. "A massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques." Defining elements: data sets are at least a petabyte in size, and usually larger; data is generated rapidly; data includes structured data, free-form text, log files, and possibly graphics, audio, and video.
rich directory
An employee directory that includes not only the standard name, email, phone, and address, but also organizational structure and expertise. Makes it possible to determine where in the organization a person works, who is the first common manager between two people, and what past projects and expertise an individual has; for international organizations it can also list languages spoken. Particularly useful in large organizations where people with particular expertise are unknown.
Data Visualization
Any effort to help people understand the significance of data by placing it in a visual context. Patterns, trends, and correlations that might go undetected in text-based data can be exposed and recognized more easily.
Hyper-social KM alternative media
Blogs (public or private; best for defender of belief); discussion groups, including FAQs (public or private; best for problem solving); wikis (public or private; best for either); surveys (public or private; best for problem solving); rich directories, e.g., Active Directory (private; best for problem solving); standard SM such as Facebook and Twitter (public; best for defender of belief); YouTube (public; best for either).
Data Warehouses Versus Data Marts
Data producers deliver data to the data warehouse DBMS, which interacts with the data warehouse metadata and the data warehouse database and feeds different types of data marts; each data mart supports its own BI results, such as design features, analysis, or layout.
BI Primary Activities
data sources, acquire data, perform analysis, publish results, use feedback results to get back to data sources or push to knowledge workers, pull from knowledge workers to publish results
push publishing
delivers business intelligence to users without any request from the users; BI results delivered according to schedule or as a result of an event or particular data condition
Possible Problems with Source Data
Dirty data (or problematic data), missing values, inconsistent data (i.e., values difficult to obtain; occur from the nature of business activity), data not integrated, wrong granularity (too fine, not fine enough), too much data (too many attributes, too many data points). Although data that is critical for successful operations must be complete and accurate, marginally necessary data need not be.
Data Mining Techniques
Emerged from a combination of the disciplines of statistics, mathematics, artificial intelligence, and machine learning. Can be sophisticated and difficult to use well, but are valuable to organizations, and some business professionals have become expert in their use. Divided into unsupervised and supervised techniques.
how BI is Used
Identifying changes in purchasing patterns, entertainment, just-in-time medical reporting.
Purchasing patterns
Important life events cause customers to change what they buy and, for a short interval, to form new loyalties to new brands. Amazon & Predictive Behavior (many YouTube videos); Target video.
support
in market-basket terminology, the probability that two items will be purchased together
content management alternatives
In-house custom, off-the-shelf, public search engine.
knowledge workers
Individuals valued for their ability to interpret and analyze information. They include analysts in the home office as well as operations and field personnel who use BI to approve loans, order goods, and decide when to prescribe.
Business Intelligence Systems
Information systems that process operational, social, and other data to identify patterns, relationships, and trends for use by business professionals and other knowledge workers. They have the five standard components: hardware, software, data, procedures, and people. The boundaries of BI systems are blurry.
Two Functions of a BI Server
Management and delivery. The BI application provides data to the BI server, which interacts with metadata and delivers results by push or pull to "any" device used by BI users. The BI server maintains metadata about the authorized allocation of BI results to users: it tracks what results are available, which users are authorized to view those results, and the schedule upon which the results are provided to the authorized users, and it adjusts allocations as available results change and users come and go. All management data needed by any BI server is stored in metadata; the amount and complexity of such data depends on the functionality of the BI server. BI servers use metadata to determine what results to send to which users and, possibly, on which schedule; users expect BI results to be delivered to "any" device.
Consumer data that can be purchased
Name, address, phone; age; gender; ethnicity; religion; income; education; voter registration; home ownership; vehicles; magazine subscriptions; hobbies; catalog orders; marital status, life stage; height, weight, hair and eye color; spouse name, birth date; children's names and birth dates.
BI data source
operational data, data warehouse, data mart, content material, human interviews
data sources
operational databases, social data, purchased data, employee knowledge
problem solving
A problem is a perceived difference between what is and what ought to be; BI can be used to determine what the problem is as well as what should be. Examples: How can we save money by rerouting drone flights? How can we increase ad revenue from competitions?
Five Criteria
1. Provided by certain vendors (starting with just a few vendors that had already agreed to make part design files available for sale). 2. Purchased by larger customers (individuals and small companies would be unlikely to have 3D printers or the needed expertise to use them). 3. Frequently ordered (popular products). 4. Ordered in small quantities (3D printing is not suited for mass production). 5. Simple in design (easier to 3D print; difficult to evaluate since the company doesn't store data on part complexity per se).
Just-in-Time Medical Reporting
Provides injection notification services to doctors during exams: staff enter data, software analyzes patient records, and the system recommends injection prescriptions when needed.
Pig
Query language platform for large dataset analysis. Easy to master. Extensible. Automatically optimizes queries at the MapReduce level.
Granularity
refers to the level of detail in the model or the decision-making process; can be too fine or too coarse; too fine data can be made coarser by summing and combining
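Making too-fine data coarser by summing and combining can be sketched like this. The daily `sales` records and the month-level grouping are illustrative assumptions. The reverse is not possible: coarse data cannot be split back into finer detail.

```python
from collections import defaultdict


def coarsen(records, key_fn):
    """Sum amounts after mapping each fine-grained key to a coarser one."""
    totals = defaultdict(float)
    for key, amount in records:
        totals[key_fn(key)] += amount
    return dict(totals)


# Hypothetical daily sales as (ISO date, amount) pairs.
sales = [("2024-01-05", 10.0), ("2024-01-20", 5.0), ("2024-02-02", 7.5)]

# Coarsen from daily to monthly by keeping only the "YYYY-MM" part of each date.
monthly = coarsen(sales, lambda day: day[:7])
```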
pull publishing
requires the user to request BI results
Expert Systems
rule-based systems that encode human knowledge in the form of if/then rules created by interviewing human experts in the domain of interest
volume
scale of data
analyze data
Second step of the BI process; here, combines the data into a single table and filters it by the chosen criteria to help answer the business questions posed.
decision support system
some authors define business intelligence (BI) systems as supporting decision making only, in which case they use this older term as a synonym for decision-making BI systems
business intelligence users
specialists in data analysis
report servers
specialized web servers
Hyper-social knowledge management
The application of social media and related applications for the management and delivery of organizational knowledge resources. Open airing of product use issues may make traditional marketing personnel uncomfortable, but this KM technique does insert the company in the middle of customer conversations about possible product problems; while the organization does lose control, it is at least a party to those conversations.
data mining
The application of statistical techniques to find patterns and relationships among data for classification and prediction; also known as knowledge discovery in databases (KDD).
measures
the data item of interest on an OLAP report. It is the item that is to be summed, averaged, or otherwise processed in the OLAP cube. Ex. Total sales, average sales, and average cost
curse of dimensionality
the more attributes there are, the easier it is to build a model that fits the sample data but that is worthless as a predictor
Business Intelligence
The patterns, relationships, and trends identified by BI systems; also called analytics. "Business Intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making." Used everywhere, especially in the realm of digital marketing, where it is estimated to grow from $12B in 2014 to $120B by 2026 and to exceed IT budgets.