BCIS 3610 CHAPTER 9
Two Sigma
- Analyzes financial statements, developing news, Twitter activity, weather reports, other sources. - Develops and tests investment strategies
Problems with operational data
-Dirty data (problematic) -Wrong granularity (too fine, not fine enough) -Too much data (too many attributes, too many data points) -Missing values -Inconsistent data -Data not integrated
Three primary activities in the BI process
-Acquire data: obtain, cleanse, organize and relate, catalog -Perform analysis: reporting, data mining, Big Data, knowledge management -Publish results: print, Web servers, report servers, automation
Expert systems suffer three major disadvantages
-Difficult and expensive to develop -Difficult to maintain -Unable to live up to the high expectations set by their name
Two human factors that inhibit knowledge sharing in organizations
-employees can be reluctant -employee competition
What are typical BI applications?
-Identifying changes in purchasing patterns -BI for entertainment -Just-in-time medical reporting
Three Common Alternatives for Content Management Applications.
-in-house custom -off-the-shelf -public search engine
what are the challenges of content management?
-Most content databases are huge -CMS content is dynamic -Documents do not exist in isolation from each other -Document contents are perishable -Content is provided in many languages
five criteria for parts that might qualify for the new program
-provided by certain vendor -purchased by larger customers -frequently ordered -ordered in small quantities -simple in design
reporting applications produce BI using five basic operations
-sorting -filtering -grouping -calculating -formatting
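The five operations can be sketched in Python over a small list of order records. The field names (`customer`, `region`, `amount`) and the values are hypothetical, chosen only to show each operation once.

```python
from collections import defaultdict

# Hypothetical order records; field names and values are made up for illustration.
orders = [
    {"customer": "Ames", "region": "East", "amount": 120.0},
    {"customer": "Baker", "region": "West", "amount": 75.5},
    {"customer": "Chen", "region": "East", "amount": 310.0},
    {"customer": "Diaz", "region": "West", "amount": 42.0},
]

# Filtering: keep only orders of at least $50.
large = [o for o in orders if o["amount"] >= 50]

# Sorting: order the remaining records by amount, largest first.
large.sort(key=lambda o: o["amount"], reverse=True)

# Grouping + calculating: total amount per region.
totals = defaultdict(float)
for o in large:
    totals[o["region"]] += o["amount"]

# Formatting: render the grouped totals as a simple aligned text report.
report = "\n".join(
    f"{region:<6}{total:>10,.2f}" for region, total in sorted(totals.items())
)
print(report)
```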
third-party cookies
A cookie created by a third party that is different from the primary Web site (e.g., DoubleClick).
OLAP Cube (OLAP Report)
A presentation of an OLAP measure with associated dimensions. The reason for this term is that some products show these displays using three axes, like a cube in geometry. Same as OLAP report.
RFM analysis
A technique readily implemented with basic reporting operations to analyze and rank customers according to their purchasing patterns. It considers how recently (R) a customer has ordered, how frequently (F) a customer has ordered, and how much money (M) the customer has spent.
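A minimal sketch of RFM scoring using only sorting and simple arithmetic. It assumes a scoring convention not stated on the card: each dimension is ranked and split into quintiles, with 1 marking the best 20 percent of customers and 5 the worst. The customer data are hypothetical.

```python
# Hypothetical customer summaries: (days since last order, number of orders, total spent).
customers = {
    "Ames": (5, 20, 4000.0),
    "Baker": (40, 3, 250.0),
    "Chen": (12, 9, 1200.0),
    "Diaz": (90, 1, 60.0),
    "Evans": (20, 14, 2500.0),
}

def quintile_scores(values, best_is_low):
    """Rank customers on one dimension and map rank to a score from 1 to 5.
    Assumed convention: 1 = best 20% of customers, 5 = worst 20%."""
    ordered = sorted(values, key=values.get, reverse=not best_is_low)
    n = len(ordered)
    return {name: rank * 5 // n + 1 for rank, name in enumerate(ordered)}

# Recency: fewer days is better. Frequency and money: larger is better.
recency = quintile_scores({c: v[0] for c, v in customers.items()}, best_is_low=True)
frequency = quintile_scores({c: v[1] for c, v in customers.items()}, best_is_low=False)
monetary = quintile_scores({c: v[2] for c, v in customers.items()}, best_is_low=False)

rfm = {c: (recency[c], frequency[c], monetary[c]) for c in customers}
for name, score in sorted(rfm.items(), key=lambda kv: kv[1]):
    print(name, score)
```

Under this convention a score of (1, 1, 1) flags a recent, frequent, big-spending customer, while (5, 5, 5) flags one who has nearly lapsed.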
unsupervised data mining
Analysts do not create a model or hypothesis before running the analysis. Instead, they apply a data mining application to the data and observe the results.
static reports
BI documents that are fixed at the time of creation and do not change
dynamic reports
BI documents that are updated at the time they are requested
the term BI users is different from knowledge workers
BI users are generally specialists in data analysis, whereas knowledge workers are often nonspecialist users
Semantic security
Concerns the unintended release of protected data through the release of a combination of reports or documents that are not protected independently.
clickstream data
Data collected about user behavior and browsing patterns by monitoring users' activities when they visit a Web site.
Pig
Hadoop's query language
confidence
In market-basket terminology, the conditional probability estimate that a customer will buy one item given that the customer has already bought another.
Hyper-social knowledge management
The application of social media and related applications to the management and delivery of organizational knowledge resources.
Knowledge Management (KM)
The process of creating value from intellectual capital and sharing that knowledge with employees, managers, suppliers, customers, and others who need it.
subscription
a BI server extends alert/RSS functionality to support user _________, which are user requests for particular BI results on a particular schedule or in response to particular events.
BI server
a Web server application that is purpose-built for the publishing of business intelligence
data mart
a data collection, smaller than the data warehouse, that addresses the needs of a particular department or functional area of the business
data warehouse
a facility for managing an organization's BI data; typically used by large organizations
MapReduce
a technique for harnessing the power of thousands of computers working in parallel
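The classic word-count example sketches the map, shuffle, and reduce steps on a single machine. The function names and input documents are hypothetical; in Hadoop the map and reduce work would be spread across many computers rather than run in one Python process.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one piece of input."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key (here, the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

# Hypothetical input split into pieces; each piece could go to a different computer.
documents = ["big data needs big clusters", "data moves to the clusters", "Big ideas"]

# Each map_phase call depends only on its own piece, so the calls could run in parallel.
mapped = chain.from_iterable(map(map_phase, documents))
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because no map call needs another call's output, adding computers lets more pieces be processed at the same time, which is the source of MapReduce's parallelism.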
Big Data
a term used to describe data collections that are characterized by huge volume, rapid velocity, and great variety
rich directory
an employee directory that includes not only the standard name, email, phone, and address, but also organizational structure and expertise
Hadoop
an open source program supported by the Apache Foundation that manages thousands of computers and that implements MapReduce
market basket analysis
an unsupervised data mining technique for determining sales patterns; it shows the products that customers tend to buy together
Content Management System (CMS)
are information systems that support the management and delivery of documents, including reports, Web pages, and other expressions of employee knowledge
dimension
a characteristic of a measure (e.g., purchase date, type, ...)
supervised data mining
data miners develop a model prior to the analysis and apply statistical techniques to data to estimate parameters of the model
push publishing
delivers business intelligence to users without any request from the users
KM benefits organizations in two fundamental ways
-improve process quality -increase team strength
Statistical Sampling
to meaningfully analyze a data set with too many data points, we need to reduce the amount of data. One good solution to this problem is ____________.
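A simple random sample is one common form of statistical sampling. In this sketch the data are simulated (uniformly distributed order amounts) purely for illustration; the sample mean should land close to the population mean while using only 0.1 percent of the records.

```python
import random
import statistics

# Hypothetical "too much data": one million simulated order amounts.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.uniform(10, 500) for _ in range(1_000_000)]

# Simple random sample: analyze 1,000 records instead of all 1,000,000.
sample = rng.sample(population, k=1_000)

print(f"population mean: {statistics.mean(population):.2f}")
print(f"sample mean:     {statistics.mean(sample):.2f}")
```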
cross-selling
in marketing transactions, the practice of selling a customer additional, related products
Business Intelligence (BI) system
information systems that process operational, social, and other data to identify patterns, relationships, and trends for use by business professionals and other knowledge workers.
reporting application
is a BI application that inputs data from one or more sources and applies reporting operations to that data to produce business intelligence. Two important reporting applications: -RFM analysis -OLAP
Decision Tree
is a hierarchical arrangement of criteria that predict a classification or value. Unsupervised data mining technique: the analyst sets up the computer program and provides the data to analyze, and the decision tree program produces the tree
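Once produced, a decision tree can be read as nested criteria, each test being one node and each outcome one leaf. The attributes and thresholds below are hypothetical, hand-written to show the structure of a loan-style classification tree; a real tree would be produced by the decision tree program from data.

```python
from dataclasses import dataclass

@dataclass
class Applicant:
    # Hypothetical attributes used only for this illustration.
    percent_past_due: float  # share of past payments that were late, 0-100
    years_employed: int

def classify(a: Applicant) -> str:
    """Walk a hand-written tree: each if-test is a node, each return is a leaf."""
    if a.percent_past_due < 20:  # root node
        return "accept"
    if a.years_employed >= 5:  # second-level node
        return "review"
    return "reject"

for applicant in [Applicant(5, 1), Applicant(35, 8), Applicant(35, 2)]:
    print(applicant, "->", classify(applicant))
```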
Data triangulation
is also used for semantic phenomenon
cookie
is data that a Web site stores on your computer to record something about its interaction with you
measure
is the data item of interest (e.g., total sales, ...)
support
is the probability that two items will be purchased together
publish results
is the process of delivering business intelligence to the knowledge workers who need it
data acquisition
is the process of obtaining, cleaning, organizing, relating, and cataloging source data
BI servers provide two major functions
-management -delivery
two typical data mining tools
-market-basket analysis -decision tree
regression analysis
measures the impact of a set of variables on another variable
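A minimal least-squares sketch with a single predictor; multiple regression generalizes the same idea to a set of variables. The advertising and sales figures are hypothetical and chosen to lie on a straight line so the fitted result is easy to check.

```python
# Hypothetical data: advertising spend and resulting sales, both in $1,000s.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(ad_spend, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * ad_spend")
```

The slope is the measured impact: here each additional $1,000 of advertising is associated with $2,000 more in sales.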
Functions of a Data Warehouse
obtain data, cleanse data, organize and relate data, catalog data
cluster analysis
one common unsupervised technique is _________: statistical techniques identify groups of entities that have similar characteristics
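k-means is one widely used clustering technique (the card does not name a specific one). The bare-bones sketch below uses hypothetical customer data; the analyst supplies only the data and the number of groups, not a hypothesis about what the groups mean. It starts from the first k points as centroids, a simplification that real tools usually improve on.

```python
import math

def k_means(points, k, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    centroids = list(points[:k])  # simplified choice of starting centroids
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster ends up empty.
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (age, annual spend in $100s).
customers = [(22, 5), (60, 40), (25, 6), (58, 42), (23, 4), (62, 38)]
centroids, clusters = k_means(customers, k=2)
for center, members in zip(centroids, clusters):
    print(center, members)
```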
data sources
operational databases, social data, purchased data, employee knowledge
data aggregators
or data broker, is a company that acquires and purchases consumer and other data from public records, retailers, Internet cookie vendors, ... and uses it to create business intelligence that it sells to companies and the government.
Online Analytical Processing (OLAP)
provides the ability to sum, count, average, and perform other simple arithmetic operations on groups of data. It is dynamic: the viewer of the report can change the report's format, hence the term online. An OLAP report has measures and dimensions.
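A minimal sketch of the sum operation behind an OLAP report, using hypothetical sales facts: `amount` plays the role of the measure, and `region`, `year`, and `product` are dimensions. Changing which dimensions are passed in mimics how a viewer rearranges the report, and adding a dimension is one way to picture drilling down into more detail.

```python
from collections import defaultdict

# Hypothetical sales facts; "amount" is the measure, the other fields are dimensions.
facts = [
    {"region": "East", "year": 2023, "product": "Fins", "amount": 400},
    {"region": "East", "year": 2024, "product": "Masks", "amount": 250},
    {"region": "West", "year": 2023, "product": "Fins", "amount": 300},
    {"region": "West", "year": 2024, "product": "Fins", "amount": 150},
    {"region": "East", "year": 2024, "product": "Fins", "amount": 100},
]

def olap_sum(rows, dimensions, measure="amount"):
    """Sum the measure for every combination of values of the chosen dimensions."""
    cube = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        cube[key] += row[measure]
    return dict(cube)

# Summary view by one dimension...
print(olap_sum(facts, ["region"]))
# ...then a more detailed view by adding a second dimension.
print(olap_sum(facts, ["region", "year"]))
```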
Granularity
refers to the level of detail in the model or the decision-making process
pull publishing
requires the user to request BI results
Expert Systems
rule-based systems that encode human knowledge in the form of if/then rules
Decision Support System (DSS)
some authors define BI systems as supporting decision making only; they use the older term __________ as a synonym for decision-making BI systems
Data Mining (knowledge discovery in databases-KDD)
the application of statistical techniques to find patterns and relationships among data for classification and prediction
Expert Systems
the earliest KM system, called _________, attempted to directly capture employee expertise
prevent the kinds of problems
the goal of KM
curse of dimensionality
the more attributes there are, the easier it is to build a model that fits the sample data but that is worthless as a predictor
Business Intelligence
the patterns, relationships, and trends identified by BI systems are BI
BI analysis
the process of creating business intelligence
expert system shell
the programs that process a set of rules are called _____.
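The idea of a program that processes if/then rules can be sketched as simple forward chaining. This is an illustrative stand-in, not how any particular commercial shell works; the rules and facts are hypothetical.

```python
# Each rule: (set of conditions that must all be known, conclusion to add).
# These rules and facts are hypothetical examples for illustration.
rules = [
    ({"fever", "cough"}, "possible flu"),
    ({"possible flu", "high risk patient"}, "refer to physician"),
]

def forward_chain(facts, rules):
    """Tiny rule processor: keep firing if/then rules whose conditions are
    satisfied until no rule adds a new conclusion."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

print(forward_chain({"fever", "cough", "high risk patient"}, rules))
```

The rules are data kept separate from the processing loop, which is the point of a shell: the same loop can run a different rule set.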
lift
the ratio of confidence to the base probability of buying an item is called ____. It shows how much the base probability increases or decreases when other products are purchased
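The three market-basket measures can be computed directly from transaction data. This sketch uses the standard association-rule formulas (support as the share of baskets containing the items, confidence as a conditional probability, lift as confidence divided by the base probability) on hypothetical baskets.

```python
# Hypothetical transactions; each set is one customer's basket.
baskets = [
    {"fins", "mask"},
    {"fins", "mask", "snorkel"},
    {"fins"},
    {"mask", "snorkel"},
    {"fins", "mask"},
]

def support(items):
    """Fraction of baskets that contain every item in `items`."""
    return sum(items <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Estimated probability of buying `consequent` given `antecedent` was bought."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence divided by the base probability of buying `consequent`."""
    return confidence(antecedent, consequent) / support(consequent)

print(support({"fins", "mask"}))
print(confidence({"fins"}, {"mask"}))
print(lift({"fins"}, {"mask"}), lift({"mask"}, {"snorkel"}))
```

A lift above 1 means buying the first item raises the probability of buying the second; below 1 means it lowers it. In this made-up data, buying a mask raises the chance of buying a snorkel, while buying fins slightly lowers the chance of buying a mask.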
BI application
the software component of a BI system that analyzes data: -reporting -data mining -Big Data -knowledge management
neural networks
type of supervised data mining, predicts values and makes classifications such as "good prospect" and "poor prospect"
data mining falls into two broad categories:
-unsupervised -supervised
the singularity
the point at which computer systems become sophisticated enough that they can adapt and create their own software and hence adapt their behavior without human assistance
drill down
with an OLAP report, it is possible to _____ into the data. This term means to further divide the data into more detail.