MIS 101 chapter 9
Curse of dimensionality
a phenomenon in which the more attributes there are, the easier it is to build a model that fits the sample data but is worthless as a predictor.
RFM analysis
is a technique, readily implemented with basic reporting operations, that is used to analyze and rank customers according to their purchasing patterns.
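As a rough illustration of how such a ranking might be produced with simple sort-and-group operations, the Python sketch below scores five made-up customers on recency (R), frequency (F), and money (M). The 1-to-5 scale with 1 as the best 20 percent, the use of average order amount for M, and all names and data are assumptions for illustration only.

```python
from datetime import date

def quintile_scores(values, best_is_high):
    """Return a 1-5 score per value; 1 marks the best 20 percent.

    Ties are broken by input order (an arbitrary choice for this sketch).
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=best_is_high)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * 5 // len(values) + 1
    return scores

def rfm(customers, today):
    """customers maps a name to a list of (order_date, amount) pairs."""
    names = list(customers)
    # Recency: days since the most recent order (fewer is better).
    recency = [(today - max(d for d, _ in customers[n])).days for n in names]
    # Frequency: number of orders (more is better).
    frequency = [len(customers[n]) for n in names]
    # Money: average order amount (higher is better); an assumed definition.
    money = [sum(a for _, a in customers[n]) / len(customers[n]) for n in names]
    r = quintile_scores(recency, best_is_high=False)
    f = quintile_scores(frequency, best_is_high=True)
    m = quintile_scores(money, best_is_high=True)
    return {n: (r[i], f[i], m[i]) for i, n in enumerate(names)}

# Hypothetical order history, invented for this example.
orders = {
    "Ajax": [(date(2024, 6, 25), 1000.0)] * 5,
    "Bloom": [(date(2024, 6, 1), 50.0)] * 4,
    "Caruthers": [(date(2024, 3, 1), 300.0)] * 3,
    "Dart": [(date(2023, 12, 1), 200.0)] * 2,
    "Eastlake": [(date(2023, 1, 15), 20.0)],
}
scores = rfm(orders, today=date(2024, 6, 30))
for name, score in scores.items():
    print(name, score)
```

A customer scored (1, 1, 1) ordered recently, orders often, and spends a lot per order; a customer scored (2, 2, 4) is a fairly regular buyer of low-value orders.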
The three primary activities in the BI process
acquire data, perform analysis, and publish results.
cross-selling opportunity
In marketing transactions, the fact that customers who buy product X also buy product Y creates a
Granularity:
a term that refers to the level of detail represented by the data.
Neural networks
are another popular supervised data mining application, used to predict values and make classifications such as "good prospect" or "poor prospect" customers.
Content management systems (CMS)
are information systems that support the management and delivery of documents, including reports, Web pages, and other expressions of employee knowledge.
The four fundamental categories of BI
are reporting, data mining, Big Data, and knowledge management.
Expert systems
are rule-based systems that encode human knowledge in the form of IF/THEN rules.
Push publishing
delivers business intelligence to users without any request from the users; the BI results are delivered according to a schedule or as a result of an event or particular data condition.
Reporting application
is a BI application that inputs data from one or more sources and applies reporting operations to that data to produce business intelligence.
Dimension
is a characteristic of a measure
data warehouse
is a facility for managing an organization's BI data.
Rich directory:
is an employee directory that includes not only the standard name, email, phone and address, but also organizational structure and expertise.
Confidence
is a conditional probability estimate.
Cluster analysis
is a common unsupervised technique in which statistical techniques identify groups of entities that have similar characteristics.
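To make the idea of grouping similar entities concrete, here is a small sketch using k-means, which is just one possible clustering technique and is not named in these notes. The customer data (age, yearly purchases), the choice of two groups, and the starting centroids are all made up for illustration.

```python
def kmeans(points, centroids, rounds=10):
    """Assign each 2-D point to its nearest centroid, recenter, and repeat."""
    for _ in range(rounds):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return clusters, centroids

# Hypothetical customers: (age, yearly purchases in dollars).
customers = [(22, 300), (25, 350), (24, 280), (51, 2100), (48, 1900), (55, 2300)]
clusters, centroids = kmeans(customers, centroids=[customers[0], customers[3]])
for group in clusters:
    print(group)
```

No one tells the procedure what the groups mean; it only finds that the younger, low-spending customers sit close together and the older, high-spending customers sit close together. In practice the attributes would usually be rescaled first so that one large-valued attribute does not dominate the distance.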
Data mining
is the application of statistical techniques to find patterns and relationships among data for classification and prediction
Measure
is the data item of interest. It is the item that is to be summed or averaged or otherwise processed in the OLAP report
Support
is the probability that two items will be purchased together
BI analysis:
is the process of creating business intelligence.
Publish results
is the process of delivering business intelligence to the knowledge workers who need it.
data acquisition
is the process of obtaining, cleaning, organizing, relating and cataloging source data
Metadata
describes the data: its source, its format, its assumptions and constraints, and other facts about the data.
Drill down
means to further divide the data into more detail.
reduce phase
As the processors finish, their results are combined in what is referred to as the
MapReduce
is a technique for harnessing the power of thousands of computers working in parallel.
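The split-then-combine idea can be sketched on a single machine. In the hypothetical example below, each text segment stands in for the piece of work one computer would receive; the function names and the sample data are made up, and nothing here uses a real cluster.

```python
from collections import Counter

def map_phase(segment):
    """Count the keywords in one segment; each segment could go to its own computer."""
    return Counter(segment.lower().split())

def reduce_phase(partial_counts):
    """Combine the per-segment counts into one overall result."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total

# Hypothetical search-log segments, invented for this example.
segments = ["hadoop pig hadoop", "pig hive", "hadoop hive hive"]
partials = [map_phase(s) for s in segments]  # run one after another here, in parallel on a cluster
totals = reduce_phase(partials)
print(totals)
```

The map step never needs to see another segment's data, which is what allows the work to be spread across many computers.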
Hadoop
is an open-source program supported by the Apache Foundation that implements MapReduce on potentially thousands of computers.
data warehouse
obtain data, cleanse data, organize and relate data, and catalog data.
Hyper-organization theory
provided a framework for understanding this new direction in KM
OLAP
provides the ability to sum, count, average, and perform other simple arithmetic operations on groups of data.
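A small sketch may help tie the OLAP terms together: the measure is the number being summed, the dimensions are the characteristics used to group it, and drilling down means adding a dimension to split the totals further. The rows, column layout, and function name below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (store_type, state, net_sales). net_sales is the measure.
sales = [
    ("Superstore", "CA", 100.0),
    ("Superstore", "WA", 50.0),
    ("Mini Mart", "CA", 30.0),
    ("Mini Mart", "CA", 20.0),
    ("Superstore", "CA", 25.0),
]

def olap_sum(rows, dims):
    """Sum the measure (last field) for each combination of the chosen dimension columns."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[-1]
    return dict(totals)

by_type = olap_sum(sales, dims=[0])              # one dimension: store type
by_type_state = olap_sum(sales, dims=[0, 1])     # drill down: store type, then state
print(by_type)
print(by_type_state)
```

Swapping `sum` for a count or an average would give the other simple operations an OLAP report offers.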
One such analysis, which measures the effect of a set of variables on another variable, is called a
regression analysis.
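For a feel of what a regression produces, the sketch below fits a straight line with ordinary least squares. It uses only one input variable to stay short, whereas the definition above allows a set of variables; the ad-spend and sales figures are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: ad spend and resulting sales, both in thousands of dollars.
ad_spend = [1, 2, 3, 4]
sales = [12, 14, 16, 18]
slope, intercept = fit_line(ad_spend, sales)
print(slope, intercept)
```

The slope estimates how much sales change for each extra unit of ad spend, which is the "effect" the analysis measures.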
Pull publishing:
requires the user to request BI results.
Expert system shells
are the programs that process a set of rules.
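A toy version of such a rule processor might look like the sketch below. The rule format (a set of conditions plus one conclusion), the function name, and the sample rules are all hypothetical; real shells are far more elaborate.

```python
def run_rules(rules, facts):
    """Fire IF/THEN rules repeatedly until no rule adds a new fact."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # IF every condition is a known fact THEN add the conclusion.
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Hypothetical rules, invented for this example.
rules = [
    ({"income verified", "no late payments"}, "good credit"),
    ({"good credit", "long-term customer"}, "good prospect"),
]
known = {"income verified", "no late payments", "long-term customer"}
result = run_rules(rules, known)
print(sorted(result))
```

The rules are plain data, separate from the program that processes them, so the same processor can be reused with a different rule set.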
Lift
is the ratio of confidence to the base probability of buying an item; it shows how much the base probability increases or decreases when other products are purchased.
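Support, confidence, and lift can all be computed by counting transactions. The sketch below does so for ten made-up shopping baskets; the items, the data, and the function names are assumptions for illustration.

```python
def support(transactions, *items):
    """Fraction of transactions that contain every one of the given items."""
    hits = sum(1 for t in transactions if set(items) <= t)
    return hits / len(transactions)

def confidence(transactions, given, target):
    """Estimate the probability of buying target, given that given was bought."""
    return support(transactions, given, target) / support(transactions, given)

def lift(transactions, given, target):
    """Confidence divided by the base probability of buying target."""
    return confidence(transactions, given, target) / support(transactions, target)

# Hypothetical baskets, invented for this example.
baskets = [
    {"mask", "fins"}, {"mask", "fins"}, {"mask", "fins", "tank"}, {"mask"},
    {"fins"}, {"tank"}, {"tank", "mask"}, {"weights"},
    {"weights", "tank"}, {"fins", "weights"},
]
s = support(baskets, "mask", "fins")     # 3 of 10 baskets hold both
c = confidence(baskets, "mask", "fins")  # 3 of the 5 mask baskets also hold fins
l = lift(baskets, "mask", "fins")        # compared with fins' base rate of 5 in 10
print(s, c, l)
```

A lift above 1 means buying the first item raises the chance of buying the second; a lift below 1 means it lowers it.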
data warehouse
Large organizations typically create and staff a group of people who manage and run a