Chapter Twelve: Lecture Notes
directed data mining
(also called supervised, predictive or targeted data mining) has the goal of predicting some future event or value. The analyst uses input data to predict a specified output. Directed data mining stresses classification, prediction and estimation.
undirected data mining
(or unsupervised data mining) is simply exploration of a dataset to see what can be learned. It is about discovering new patterns in the data. The analyst isn't trying to predict or estimate some output. Undirected data mining uses clustering and affinity-grouping techniques.
SERVQUAL's latent variables
(revealed by factor analysis) - reliability - responsiveness - competence - access - courtesy - communication - credibility - security - understanding/knowing the customer - tangibles
analytics for CRM strategy and tactics
*build revenues:* - cross-sell campaigns - up-sell campaigns - protect valued relationships - generate sales leads - acquire new customers - close more opportunities *reduce costs:* - automate selling processes - service customers online - improve customer self-service - sack unprofitable customers - improve sales rep productivity - improve data quality *enhance loyalty/satisfaction:* - enhance complaints resolution - improve customer service - improve fulfilment process - improve online experiences - improve value proposition - introduce CSat measures
data mining procedures
*directed data mining techniques:* - decision trees - logistic regression - multiple regression - discriminant analysis - neural networks *undirected data mining techniques:* - hierarchical clustering - k-means clustering - two-step clustering - factor analysis
criteria used in prospect scoring
*market:* - size - growth - segmentation - new entrants - number of competitors *organizational:* - revenues - profits - spending on category - certifications - social network participation *personal:* - seniority - decision role - budget owner - influence - years' experience *relational:* - ex-customer - lost opportunity - lead source - website or ad - referral? - new to database? *behavioral:* - website visitor? - registrations? - contracted to current supplier? - video viewed? - research participant?
star schema example
*time dimension:* - order date - year - quarter - month *customer dimension:* - name - address - city - zip *product dimension:* - name - category - price *employee dimension:* - name - supervisor - department - region - territory *fact table (in the middle of the four dimensions):* - total - quantity - freight - discount
3Vs of big data
*volume:* - petabytes - records - transactions - tables, files *variety:* - structured - unstructured - semi-structured *velocity:* - batch - real time - streaming
analytics for structured data: introduction
- CRM analytics for structured data are well developed - As questions become more complex and shift from description to explanation or prediction, the analytical procedures required to generate answers also become more complex - OLAP queries allow CRM users to drill down into the reasons why a particular piece of data is as it is - Data mining tools draw on a well-established array of statistical procedures, such as correlation, regression, decision-tree and clustering routines
standard reports
- Can be either pre-defined, or query-based (ad hoc) - Standardized reports are typically integrated into CRM software applications, but often need customization -- Some customization of the report can be done when it is run, for example in selecting options or filtering criteria, but the end result is limited to what the report designers envisaged - Visualization tools include tables, charts, graphs, plots, maps, dashboards, hierarchies and networks
what data mining analytics do
- Classification - Estimation - Prediction - Affinity grouping - Clustering - Description and visualization
how text analytics supports CRM
- Improving the accuracy of the predictive models - Automatic routing - Root cause analysis - Trend analysis - Sentiment analysis
how analytics support strategic and operational CRM
- Strategic CRM focuses on the development of a customer-centric business dedicated to winning and keeping (potentially) profitable customers by creating and delivering value better than competitors, cost-effectively - Analysis of customer-related data can help answer crucial strategic CRM questions such as: -- Which customers should we serve? -- What is our share of customer spending on our category? -- What do our customers think and feel about their experience of doing business with us? - Operational CRM involves deployment of automated solutions in the sales, marketing and service areas - Analysis of customer-related data can help answer crucial operational CRM questions such as: -- Which channels should we use to communicate with our customers? -- What offers should we make, and when should we make them? -- How does our sales performance differ across territories and product ranges, and how can we fix any problems? -- How well do we manage our opportunity pipeline? -- How satisfied are customers with the service we provide and what can be done to improve it?
decision trees
- The graphical model output of decision tree analysis has the appearance of an inverted root and branch structure - Decision trees work through a process called recursive partitioning - The decision tree algorithm progressively partitions the dataset into groups according to a decision rule that aims to maximize homogeneity or purity of the response variable in each of the obtained groups
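Recursive partitioning can be sketched in a few lines. The example below is a minimal illustration, not the algorithm of any particular CRM or statistics package: it assumes Gini impurity as the purity measure (the notes do not name one), binary splits of the form "value <= threshold", and a small invented churn dataset.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 means the group is perfectly pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_impurity) of the purest binary split."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must produce two non-empty groups
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Partition recursively until a group is pure, unsplittable, or max_depth is reached."""
    split = best_split(rows, labels) if gini(labels) > 0 and depth < max_depth else None
    if split is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    f, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf, following the decision rule at each node."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

# Hypothetical data: (tenure in months, complaints last year) -> outcome
rows = [(2, 3), (4, 2), (30, 0), (36, 1), (5, 4), (40, 0)]
labels = ["churn", "churn", "stay", "stay", "churn", "stay"]
tree = build_tree(rows, labels)
```

On this toy data a single rule (tenure <= 5 months) already yields two pure groups, so the tree stops after one split. Real datasets usually need several levels and a stopping rule such as `max_depth`.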
analytics for unstructured data
- Unstructured data do not fit a pre-defined data model - Includes textual and non-textual files such as: -- spreadsheets -- documents -- PDFs -- handwritten notes -- images -- audio -- video -- multimedia data - Unstructured data often reside outside the business in social media data repositories, which can be huge, hence the term 'Big Data' - Analytics for these types of data are still evolving
next best offer (NBO)
- a subset of NBA - Early groundwork for NBO was laid by Amazon.com - Today's modelling is based on more complex, context-sensitive, predictive analytics that enable the right offer to be made at the right time and in the right channel - The tools that support NBO are known as recommendation engines - Dynamic NBOs are made to customers in real time as they interact at a business's touchpoints
OECD privacy principles (8)
1. Collection Limitation Principle 2. Data Quality Principle 3. Purpose Specification Principle 4. Use Limitation Principle 5. Security Safeguards Principle 6. Openness Principle 7. Individual Participation Principle 8. Accountability Principle
technology essentials for analyzing big data
1. Hadoop, an open-source framework or computing environment, distributes data across a large number of computers, each of which processes a portion of the data 2. Open-source analytics applications 3. Commercial software-solution vendors add further management and decision support tools, frameworks and solutions
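The idea behind point 1 (split the data, let each machine process its portion, then combine the partial results) can be illustrated without Hadoop itself. The sketch below is plain Python, not the Hadoop API: the "workers" are simulated by a loop, and the record layout and the `channel` field are invented for the example.

```python
from collections import Counter

def split_into_chunks(records, n_workers):
    """Deal the records out so that each worker receives a portion of the data."""
    return [records[i::n_workers] for i in range(n_workers)]

def map_chunk(chunk):
    """Work done independently on one portion: count contacts per channel."""
    return Counter(rec["channel"] for rec in chunk)

def reduce_counts(partials):
    """Combine the partial results from all workers into one answer."""
    total = Counter()
    for partial in partials:
        total += partial
    return total

# Hypothetical customer-contact records
records = [
    {"customer": 1, "channel": "email"},
    {"customer": 2, "channel": "phone"},
    {"customer": 3, "channel": "email"},
    {"customer": 4, "channel": "chat"},
    {"customer": 5, "channel": "email"},
]
chunks = split_into_chunks(records, n_workers=2)
totals = reduce_counts(map_chunk(chunk) for chunk in chunks)
```

In a real cluster each `map_chunk` call would run on a different computer. Here they run one after another, which is enough to show that the combined answer does not depend on how the data were split.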
safe harbor principles (7)
1. Notice 2. Choice 3. Onward Transfer (Transfers to Third Parties) 4. Access 5. Security 6. Data integrity 7. Enforcement
3 ways to generate analytical insight
1. Standard reports 2. Online analytical processing (OLAP) 3. Data mining
why is the type of data important?
Analytical procedures differ according to the type of data: - Categorical data use nonparametric procedures such as logistic regression - Continuous data use parametric procedures such as linear regression - Methods that are used to correlate sets of ordinal data differ from those used to correlate interval data
types of structured data kept in relational databases
Categorical data, also known as discrete data, are data about entities that can be sorted into groups or categories: - Unordered categorical data are nominal data - Ordered categorical data are ordinal data. Continuous data are data that can take on any value within a finite or infinite range: - Interval data are measured along a continuum that has no fixed and non-arbitrary zero point - Ratio data are also interval data but with the added attribute of a fixed zero point (0)
how analytics are used during the customer lifecycle
*customer acquisition:* - Lead scoring might take account of a wide range of market, organizational, personal, relational and behavioural attributes *customer retention:* - Identify which customers have highest future potential CLV *customer development:* - Identify the next best offer to make the customer
online analytical processing (OLAP)
OLAP technologies allow data stored in a data mart to be subjected to analysis using processes such as: - slice-and-dice - drill-down and roll-up. OLAP data are stored in one or more star schemas: - A star schema separates data into facts and dimensions -- Facts are quantitative data such as sales revenues and sales volumes -- These facts have related dimensions. Dimensions are the ways in which facts can be disaggregated and analysed
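A tiny star schema makes the OLAP operations concrete. The sketch below uses plain dictionaries rather than a real OLAP engine. The tables, keys and figures are invented, and the helper names `aggregate` and `slice_by` are hypothetical.

```python
from collections import defaultdict

# Dimension tables (hypothetical rows), keyed by an id.
time_dim = {
    1: {"year": 2023, "quarter": "Q1", "month": "Jan"},
    2: {"year": 2023, "quarter": "Q1", "month": "Feb"},
    3: {"year": 2023, "quarter": "Q2", "month": "Apr"},
}
product_dim = {
    10: {"name": "Widget", "category": "Hardware"},
    11: {"name": "Support plan", "category": "Services"},
}

# Fact table: each row holds keys into the dimensions plus numeric facts.
facts = [
    {"time_id": 1, "product_id": 10, "total": 100.0, "quantity": 2},
    {"time_id": 2, "product_id": 10, "total": 150.0, "quantity": 3},
    {"time_id": 2, "product_id": 11, "total": 80.0, "quantity": 1},
    {"time_id": 3, "product_id": 11, "total": 120.0, "quantity": 2},
]

def aggregate(rows, time_level=None, product_level=None, measure="total"):
    """Sum a fact, grouped by the chosen dimension attributes."""
    out = defaultdict(float)
    for row in rows:
        key = []
        if time_level:
            key.append(time_dim[row["time_id"]][time_level])
        if product_level:
            key.append(product_dim[row["product_id"]][product_level])
        out[tuple(key)] += row[measure]
    return dict(out)

def slice_by(rows, dim, id_field, attribute, value):
    """Slice: keep only the facts whose dimension attribute equals one value."""
    return [r for r in rows if dim[r[id_field]][attribute] == value]

by_quarter = aggregate(facts, time_level="quarter")   # roll-up to quarters
by_month = aggregate(facts, time_level="month")       # drill-down to months
hardware = slice_by(facts, product_dim, "product_id", "category", "Hardware")
hardware_by_month = aggregate(hardware, time_level="month")
```

Rolling up and drilling down only change the level of the time dimension used for grouping; the facts themselves are never altered.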
World Wide Web Consortium (W3C)
The Privacy Interest Group's charter is to 'improve the support of privacy in Web standards by monitoring ongoing privacy issues that affect the Web, investigating potential areas for new privacy work, and providing guidelines and advice for addressing privacy in standards development'. The group notes that the evolution of Web technologies has increased collection, processing and publication of personal data.
factor analysis
a data reduction procedure - It does this by identifying underlying unobservable (latent) variables that are reflected in the observed variables (manifest variables)
discriminant analysis (DA)
classifies observations into two or more classes - can be used to find out which variables contribute most to explaining the difference between groups
two-step clustering
combines predetermined and hierarchical clustering processes - At step one, records are assigned to a predetermined number of clusters (alternatively you can allow the algorithm to determine the number of clusters) - At step two, each of these clusters is treated as a single case and the records within each cluster subjected to hierarchical clustering - Works well with large datasets
basic data configuration for CRM analytics
*data sources:* - CRM data - ERP data - telephony data - social media data *data storage:* - data warehouse *data analytics:* - CRM analytics *outputs:* - reporting - OLAP analysis - data mining *data users:* - person. Flow: data sources > data storage > data analytics > outputs > data users (pg 2 of slideshow)
next best action (NBA)
merges customer insight (predictive analytics particularly) and context to deliver recommendations for action -- Company desired actions include up-sell or cross-sell - Context determines whether or not an offer should be made, what that offer should be and even when it should be made -- A customer with an unresolved complaint should receive an outbound customer service call to establish what the customer expects by way of complaint resolution, not an offer
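The interplay of insight and context can be sketched as a simple rule. This is only an illustration: the `CustomerContext` fields, the propensity scores and the 0.6 threshold are all invented, and a production NBA system would draw its scores from predictive models rather than hard-coded values.

```python
from dataclasses import dataclass

@dataclass
class CustomerContext:
    has_unresolved_complaint: bool
    upsell_propensity: float      # hypothetical model score in [0, 1]
    cross_sell_propensity: float  # hypothetical model score in [0, 1]

def next_best_action(ctx, threshold=0.6):
    """Context first: a complaint overrides any offer, however likely a sale looks."""
    if ctx.has_unresolved_complaint:
        return "service_call"
    candidates = [
        ("up_sell", ctx.upsell_propensity),
        ("cross_sell", ctx.cross_sell_propensity),
    ]
    action, score = max(candidates, key=lambda c: c[1])
    # Only make an offer if the predicted propensity is high enough.
    return action if score >= threshold else "no_offer"

unhappy = CustomerContext(True, 0.9, 0.8)
keen = CustomerContext(False, 0.4, 0.7)
lukewarm = CustomerContext(False, 0.3, 0.2)
```

Here `next_best_action(unhappy)` returns `"service_call"` even though both propensity scores are high, which mirrors the complaint example in the notes.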
text analytics
extracts relevant information from unstructured text files, and transforms it into structured information that can then be leveraged in various ways. Unstructured textual data are found in: - call centre agent notes - emails - documents on the Web - instant messages - blogs - tweets - customer comments - customer reviews - questionnaire free-response boxes - social media posts - transcripts of telephone calls and interviews and so on
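As a rough illustration of turning free text into structured fields, the sketch below counts hits against two tiny word lists and derives a sentiment label. The word lists and the output fields are invented for the example, and real text analytics tools use far richer linguistic models than keyword matching.

```python
import re

# Hypothetical mini-lexicons; a real lexicon would be far larger.
POSITIVE = {"great", "helpful", "fast", "love"}
NEGATIVE = {"slow", "broken", "rude", "cancel"}

def to_structured(text):
    """Turn one free-text comment into a structured record."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        sentiment = "positive"
    elif neg > pos:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    return {
        "n_tokens": len(tokens),
        "positive_hits": pos,
        "negative_hits": neg,
        "sentiment": sentiment,
    }

complaint = to_structured("The agent was rude and the app is broken.")
praise = to_structured("Great service, fast delivery!")
```

Once each comment is a record like this, it can be stored alongside other customer data and used in reports or as an input to predictive models.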
neural networks
fit a model to existing data for classification, estimation and prediction purposes - Neural networks' foundations are machine learning and artificial intelligence - Neural networks can produce excellent predictions from large, complex and imperfect datasets containing hundreds of potentially interactive predictor variables
logistic regression
measures the influence of one or more independent variables that are usually continuous (interval or ratio data) on a categorical dependent variable (nominal or ordinal data)
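A minimal sketch of the idea, assuming one continuous predictor and a binary (two-category) outcome: the model estimates P(y = 1 | x) = 1 / (1 + e^-(wx + b)) and is fitted here by plain gradient descent. The data, learning rate and number of iterations are invented, and statistical packages normally use more refined fitting routines.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit P(y=1|x) = sigmoid(w*x + b) by batch gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

def predict_proba(x, w, b):
    """Estimated probability that the outcome is category 1."""
    return sigmoid(w * x + b)

# Hypothetical data: years as a customer -> bought the upgrade (1) or not (0)
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
```

A positive `w` means the probability of the "1" category rises with the predictor. `predict_proba(x, w, b)` gives the estimated probability for any value of `x`.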
social media sentiment analysis
picture on pg 3 of slideshow
standard report: active accounts of a sales rep
picture on pg 5 of slideshow
analytical CRM (review)
process through which organizations transform customer-related data into actionable insight for either strategic or tactical purposes
dendrogram output from hierarchical clustering routine
see pg 7 of slideshow for diagram
hierarchical clustering
the 'mother of all clustering models' - It works by assuming each record is a cluster of one and gradually groups records together until there is one super-cluster comprising all records
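The bottom-up grouping can be sketched as follows. The notes do not say how the distance between two clusters is measured, so this example assumes single linkage (distance between the closest pair of members) and Euclidean distance; the four points are invented.

```python
from math import dist

def agglomerative(points):
    """Single-linkage agglomerative clustering; returns the merges in order."""
    clusters = [[i] for i in range(len(points))]  # every record starts as a cluster of one
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Hypothetical records described by two numeric fields
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
merges = agglomerative(points)
```

Each entry in `merges` records which two clusters were joined and at what distance. Those distances are the heights at which branches join in a dendrogram, and the final merge produces the single super-cluster containing all records.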
data mining
the application of descriptive and predictive analytics to large datasets to support the marketing, sales and service functions
K-means clustering
the most widely used form of clustering routine - It works by clustering the records into a predetermined number of clusters. The predetermined number is 'k' - The reference to 'means' refers to the use of averages in the computation -- In this case it refers to the average location of the members of a particular cluster in n-dimensional space, where n is the number of fields that are considered in the clustering routine (see pg 7 for k-means clustering output picture)
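A minimal sketch of the routine, assuming the standard two-step loop (assign each record to its nearest cluster centre, then move each centre to the mean of its members). Starting centres are passed in explicitly so the example is repeatable; real implementations usually choose them randomly. The data are invented.

```python
from math import dist

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and mean-update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster with the nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the average location of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it is
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assignment, centroids

# Hypothetical records with n = 2 fields, and k = 2 starting centroids
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (9.0, 11.0)]
assignment, centroids = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Here k is the number of starting centroids supplied, and each final centroid is the 'mean' of its cluster in the n-dimensional space of the fields used.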
multiple regression
uses two or more predictor variables to predict a dependent variable. The dependent variable must be a continuous (interval or ratio) variable
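To make this concrete, the sketch below fits y = b0 + b1*x1 + b2*x2 by ordinary least squares, solving the normal equations with Gauss-Jordan elimination. It is a teaching sketch under invented data (the figures were constructed so that y = 10 + 2*x1 + 3*x2 exactly); statistical packages use more numerically robust methods and also report fit statistics.

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y, with an intercept."""
    rows = [[1.0] + list(r) for r in X]  # prepend a column of ones for the intercept
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]  # [intercept, b1, b2, ...]

# Hypothetical data: (store visits, years as customer) -> annual spend
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [18, 17, 28, 27, 35]
coef = fit_ols(X, y)

def predict(row, coef):
    """Predicted value of the continuous dependent variable for one record."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))
```

Because the invented data follow the linear rule exactly, the fitted coefficients recover it up to rounding error; with real data the fit would be approximate.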