IDSC 4444


Association Rules mining usually makes use of...

"transactions"datasets • Example 1: All the combination of products bought by different customers • Example 2: All the combinations of symptoms in different patients • Each row of the dataset is a "transaction"

DESCRIPTIVE analytics

Exploratory methods: you don't know exactly what you are looking for
- Association Rules (traditional, old; a good way to ease into data analysis)
- Cluster Analysis (find groups that form naturally in the data)

PREDICTIVE analytics

The largest class of methods in machine learning: you know exactly what you are looking for and have a clear objective
- Classification
- Numeric Prediction

Distance Measures

• To measure similarity between individual data points we use Distance Measures
• Different Distance Measures exist for different data types
• Don't compare distances from different measures to each other: pick one measure and decide whether points are similar or not based on that ONE TYPE

Matching Distance

0 = no, 1 = yes
N00 = mutual absences: A and B both have a 0
N01 = mismatch: A = 0, B = 1
N10 = mismatch: A = 1, B = 0
N11 = mutual presences
Intuition: the number of mismatches divided by the total number of attributes (k): D(A,B) = (N01 + N10) / k
Used for Symmetric Binary Data, where N00 and N11 are equally important (knowing that an individual has a driver's license is as informative as knowing that they don't)
Example: D(A,B) = (2 + 0) / 3 ≈ 0.67
The range is always [0, 1]: the higher the distance, the more distant ("different") the data points
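A minimal Python sketch of the matching distance, assuming two hypothetical binary points with two mismatches out of k = 3 attributes (chosen to reproduce the 2/3 example above):

```python
def matching_distance(a, b):
    """Number of mismatching attributes divided by the total number of attributes k."""
    assert len(a) == len(b)
    mismatches = sum(1 for x, y in zip(a, b) if x != y)  # N01 + N10
    return mismatches / len(a)

A = [0, 0, 1]   # hypothetical point A
B = [1, 1, 1]   # hypothetical point B: N01 = 2, N10 = 0, N11 = 1, so D = (2 + 0) / 3
print(matching_distance(A, B))  # 0.666... -> the two points are fairly "different"
```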

Data Pre-Processing

1. Data Cleaning 2. Data Integration

Applications of Clustering

1. Discover natural groups and patterns in the data; helps in gaining insights
• Marketing Analytics: create groups of similar customers (segments)
• Finance Analytics: discover which stocks have similar price fluctuations
• Health Analytics: group patients based on how they respond to treatments
2. Facilitate the analysis of very large datasets
• Instead of looking at each individual data point, we can look at each cluster and study its features

k-means clustering procedure

1. Example: say we set k = 2. The algorithm picks 2 data points at random; they will represent the initial cluster centroids
2. Next, it assigns the other data points to the cluster centroid to which they are closest, based on a specified distance measure (usually squared Euclidean)
3. Then, it updates (re-calculates) the cluster centroids based on the new clusters
4. Repeat step 2: re-assign points to the cluster centroid to which they are closest
5. Recalculate the centroids for the updated clusters
6. Keep repeating until convergence: that is, until the point re-assignments no longer change
• In particular: further rearranging the points does not improve the within-cluster variance
• The algorithm stops when the within-cluster variance is minimized (see the sketch below)
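A minimal NumPy sketch of the loop described above; the toy data, k = 2, and the lack of empty-cluster handling are all simplifying assumptions:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k data points at random as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to the closest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: update (re-calculate) each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Convergence: stop when the centroids (and hence the assignments) no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])  # toy 2D data
print(kmeans(X, k=2))
```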

Desired properties of a cluster

1. High intra-similarity 2. Low inter-similarity

Evaluate the quality and meaningfulness of the clusters obtained

1. Need a measure to decide whether a point is similar enough to, or different enough from, the others
2. Are the groups close to one another or not?
3. How am I going to do the clustering? There are lots of ways to cluster (Hierarchical, K-Means)
4. Stopping criteria (K-Means): e.g., stop at 4 clusters
5. The algorithm gives a solution: do the clusters make sense? Do they meet the objectives / are they good enough?

Dendrogram

A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering; one of the main outputs of Hierarchical Clustering
• A diagram that shows the cluster hierarchy: which points/clusters are merged together at the different iterations
• The length of the "lines" is proportional to the distance among the clusters involved
• The points merged lowest in the tree are more similar to each other than to any other point, since they were clustered together first; move up to the next-lowest merge to see the next most similar points
• We can use the dendrogram to identify clustering solutions that include a chosen number of clusters
• Example: we want the best 2-cluster solution. Start from the top, identify the last "bar" (clade), then identify the two branches of that clade and the clusters they point to
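A minimal sketch of producing and cutting a dendrogram with SciPy; the toy data and the choice of average linkage with Euclidean distance are assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.array([[1, 1], [1.5, 1], [5, 5], [5.5, 4.5], [9, 9]])  # toy data
Z = linkage(X, method="average", metric="euclidean")          # hierarchical clustering

dendrogram(Z)            # clade heights are proportional to the merge distances
plt.show()

# Cut the tree to obtain, e.g., the best 2-cluster solution
print(fcluster(Z, t=2, criterion="maxclust"))
```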

Max-Coordinate Distance

Take the absolute differences between the dimensions and keep the largest one (INSTEAD OF ADDING them): considers the max among the absolute differences

Hierarchical clustering

Agglomerative or Bottom-Up Clustering (HAC)
• Forming larger clusters from smaller ones
• Start from the individual data points (or smaller clusters) and form larger clusters in a hierarchical manner
• We do not need to specify the number of clusters; the algorithm produces a hierarchy of clusters from which we can choose
• We need to specify which distance measure (Euclidean, Manhattan, etc.) to use and which linkage method (single, average, centroid, etc.) to use

Non-comparable data

Attributes that take values in very different ranges (example, age and income have very different ranges)

Scatterplots:

Best for: displaying the relationship between two continuous variables.

What is Business Analytics or Data Science for Business?

Business analytics (BA) refers to the skills, technologies, practices for continuous exploration and investigation of data to extract knowledge, gain insights and drive business decisions and strategies.

Lift = 1

Completely independent, pure chance: customers who buy Bread are AS LIKELY to buy Milk as other customers

Lift > 1

Customers who buy X are ___% more likely to buy Y compared to other customers in the data. Example: a lift of 1.25 means customers who buy Diapers are 25% more likely to buy {Milk and Beer} than other customers. Lift > 1 means that customers who buy X are more likely to buy Y

Unsupervised Learning Methods

Data is mined to uncover previously unknown useful patterns, without having a clear outcome in mind
• Make use of unlabeled data: there is no outcome variable to predict or classify. As such, it may be harder to establish whether the method performed well or not
• Examples: What items do customers frequently buy together? Do customers naturally fall into different groups based on their features (age, location, etc.)?
• Methods we will cover: Association Rules, Cluster Analysis

Mixed-Data

Different types of attributes (numerical, binary, etc.)

Manhattan Distance

The distance between two points is the sum of the absolute differences of their coordinates
• The absolute value (or modulus) operator | | turns any number inside it into a positive number
• Used where rectilinear distance is relevant: go around the blocks instead of taking a straight line
• ABSOLUTE DIFFERENCE: take the difference between the features, apply the absolute value (modulus), and add the results together

clustering method

Hierarchical Clustering K-Means Clustering

Descriptive and Predictive Business Analytics

It requires a mixture of skills at the intersection of math, stats, computer science, communication and business

•An Association Rule describes the relationship between

Item-sets
• X → Y, read: "If X then Y"
• Example: X = {Coffee}, Y = {Bagel}. Association Rule X → Y: "If Coffee Then Bagel"
• BUT they must be non-overlapping item-sets (they do not share any item in common; X intersect Y is empty)
• Example: {Coffee, Milk} → {Bagel}
• Any combination of items, as long as the item-sets do not overlap

Partitioning-based clustering

K-Means Clustering (most common approach) • Directly partition the data into k groups, where k is the number of clusters (pre-specified)

lift

LIFT: a measure of how much more likely two item-sets are to co-occur than by pure chance
FORMULA: Lift(X → Y) = S(X and Y) / (S(X) * S(Y)) = Confidence(X → Y) / S(Y)
• Here, we must use the support percentage S() in the calculation
• S(X) * S(Y) is the probability of seeing X co-occur with Y by pure chance
• So, if the numerator > denominator, the association is more likely than pure chance
• Note: Lift has no direction
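A minimal sketch of the lift calculation using assumed support percentages; the numbers are chosen so that the lift comes out at 1.25 (the "25% more likely" reading above):

```python
# Assumed support percentages (not taken from a real dataset)
s_x, s_y, s_xy = 0.6, 0.4, 0.3

lift = s_xy / (s_x * s_y)      # S(X and Y) / (S(X) * S(Y)) = 0.3 / 0.24 = 1.25
print(lift)

confidence_xy = s_xy / s_x     # Confidence(X -> Y)
print(confidence_xy / s_y)     # same 1.25 via the Confidence(X -> Y) / S(Y) form
```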

Min-max normalization

MIN-MAX (0-1): rescale the attributes to have values between 0 and 1 using the min and max. A point with value X will be normalized to: NewValue = (X - min) / (max - min)
• "Has driving license" is already between 0 and 1, so we do not need to apply Min-Max to it
• Note: when the data is not normal (and we worry about standardization being skewed by outliers), min-max can be used instead; the values end up between 0 and 1
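A minimal sketch of min-max normalization on a single attribute; the age values are assumptions:

```python
import numpy as np

age = np.array([22.0, 35.0, 47.0, 61.0])                       # assumed ages
age_normalized = (age - age.min()) / (age.max() - age.min())   # rescaled to [0, 1]
print(age_normalized)                                          # [0. 0.333 0.641 1.]
```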

Binary Data (0-1 or data with only 2 categories)

Matching Distance, Jaccard Distance
• Still between 0 and 1: close to 0 = similar, close to 1 = different, same interpretation as above

Jaccard Distance

A measure of dissimilarity between observations based on Jaccard's coefficient
• Used for Asymmetric Binary Data, where N00 is not as important as N11: cases in which knowing the mutual presences is more important than knowing the mutual absences
• Not symmetric binary data (cases where mutual absence is not as important, so we don't take it into account)
• Example: grocery store transactions. Knowing that a customer didn't buy a product isn't very informative because there are SO MANY products and most people don't buy most of them. What matters is what they do buy, and the mismatches; the mutual zeros do not matter
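A minimal sketch of the Jaccard distance on two hypothetical binary vectors; note that the mutual absences (N00) never enter the formula:

```python
def jaccard_distance(a, b):
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # mutual presences
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)  # mismatches
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    return (n01 + n10) / (n01 + n10 + n11)                   # N00 is ignored

A = [1, 0, 1, 0, 0]  # hypothetical "transactions" over 5 products
B = [1, 1, 0, 0, 0]
print(jaccard_distance(A, B))  # (1 + 1) / (1 + 1 + 1) = 0.666...
```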

Apriori algorithm process

Once we have found all the item-sets (of any size) that are frequent, that is, the item-sets that satisfy support >= minsupp, generate all possible association rules between those frequent item-sets
• Compute the Confidence measure for all the generated association rules and assess whether confidence >= minconfidence
• minconfidence is the minimum threshold of confidence acceptable to the data scientist
• Similarly to minsupp, minconfidence is set based on domain knowledge and goals
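A minimal sketch of the two-stage process above using the mlxtend library (the library choice, the toy transactions, and the minsupp/minconfidence thresholds are all assumptions):

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["Milk", "Bread"], ["Milk", "Diapers", "Beer"],
                ["Bread", "Diapers"], ["Milk", "Bread", "Diapers", "Beer"]]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

# Stage 1: frequent item-sets with support >= minsupp (here 0.5)
frequent = apriori(onehot, min_support=0.5, use_colnames=True)

# Stage 2: rules from the frequent item-sets with confidence >= minconfidence (here 0.6)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```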

k-means pros and cons

Pros:
• Computationally less demanding: time complexity is linear, O(n)
• Very scalable
Cons:
• Poor initialization (initial random centroids) can lead to bad results
• Not ideal for clusters that have irregular (non-convex) shapes, noisy data (outliers), or clusters with different densities

Hierarchical Clustering pros and cons

Pros:
• Flexible, data-driven, no need to pre-specify the number of clusters
• Produces a solution for different numbers of clusters
• It is good at identifying "small" clusters
• It works better at identifying "weirder" clusters
Cons:
• Computationally demanding: the time complexity of most hierarchical clustering algorithms is quadratic, i.e. O(n^2), where n is the number of data points
• As such, it is not very scalable

Knowledge Discovery from Data

Selection: what data should I collect?
Preprocessing: raw data is messy; we need to structure the data before applying the methods
Transformation: reshape the data for a specific method
Data mining: implement the methods
Interpretation: does it make sense? Use the raw data for new insights

Histograms

Show the distribution of a variable/attribute
Best for: the distribution of a single continuous variable
May be used for:
• Distribution assumption checking
• Anomaly detection

To measure "similarity" For clusters (groups of points) we use the linkage methods

Single, Complete, Average, Centroid and Ward

Numerical Data • Euclidean Distance

The length of a line segment between the two points: "the straight-line distance"
• Take the squared difference between the coordinates in each dimension, sum them, and take the square root; keep going the same way with a third dimension, a fourth, etc.
• Larger number = more different; smaller = more similar; 0 = identical to the other customer

Ward's Method

The objective of Ward's linkage is to minimize the within-cluster sum of squares (within-cluster variance)
- WARD'S is the basis of K-Means
- Aims for the minimum variance possible
- Relates to the two desired cluster properties: points stay close to their centroid
- It is robust to outliers; it tends to create denser clusters, with spherical shapes

Box-Plot

Used for:
• Comparing groups
• Outlier detection
Best for side-by-side comparisons of subgroups on a single continuous variable, e.g., displaying the distribution of a variable (Revenue) for different quarters (4 sub-groups)
Q1: the value below which 25% of the values lie
Q3: the value below which 75% of the values lie
Max = Q3 + 1.5(Q3 - Q1); Min = Q1 - 1.5(Q3 - Q1)
Outliers: > Max or < Min
Details may differ across software
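A minimal sketch of the Q1/Q3 fences described above; the revenue values are assumptions, and the exact quartile rule may differ across software:

```python
import numpy as np

revenue = np.array([10, 12, 13, 14, 15, 16, 40])       # assumed revenue values
q1, q3 = np.percentile(revenue, [25, 75])
iqr = q3 - q1
upper = q3 + 1.5 * iqr                                  # "Max" fence
lower = q1 - 1.5 * iqr                                  # "Min" fence
print([x for x in revenue if x > upper or x < lower])   # flagged outliers, e.g. 40
```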

Types of "similarities" measures

Ways to think about similarity: based on distance and based on context (THE ANSWER CAN CHANGE WITH CONTEXT)
• Distance: based on direct distance, one might assume points B and C are more likely to be in the same cluster
• Contextual: taking the context into consideration, points A and B belong to the same cluster
Most clustering techniques deal with these two types of similarity

How to Cluster

When the data is high-dimensional (we have a lot of attributes), we need to: Understand how to measure similarity between individual data-points and clusters

Categorical Data (More than 2 categories)

While there are specific measures that can be used if your data is (all) categorical, a practical approach consists of assigning a number to each category and treating it as a numerical variable. Example: Brown = 1, Blue = 2, Green = 3
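A minimal sketch of the practical approach above; the eye-color data and the mapping are assumptions:

```python
eye_color = ["Brown", "Blue", "Green", "Brown"]   # assumed categorical attribute
mapping = {"Brown": 1, "Blue": 2, "Green": 3}
encoded = [mapping[c] for c in eye_color]
print(encoded)  # [1, 2, 3, 1] -> can now be fed to a numerical distance measure
```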

Association Rule: If X -> Y: "If Coffee Then Bagel" ANTECEDENT (BODY)

X

Association rules are directional.

X → Y can be different from Y → X
Example: X = {Coffee, Muffin}, Y = {Bagel}
X → Y: {Coffee, Muffin} → {Bagel} IS NOT THE SAME AS Y → X: {Bagel} → {Coffee, Muffin}
They are different association rules

Association Rule: If X -> Y: "If Coffee Then Bagel" CONSEQUENT (HEAD)

Y

Linkage Methods

agglomerative methods of hierarchical clustering that cluster objects based on a computation of the distance between them

• To measure "similarity": When data is mixed and in different ranges

apply normalization

A high confidence is a good start when comparing association rules, but confidence alone is not enough. We need to rely on lift as well, to make sure the item-sets

are not associated by pure chance (example: high-volume/frequently bought products). Even if the confidence (frequency) is slightly lower, a lift > 1 tells us that when the association happens, it is less likely to be a coincidence.

An observation or data-point in our data can be represented

as a point on a plane, as a function of the respective dimensions/attributes.

Average Linkage

The average pairwise distance between points from two different clusters: compute ALL the distances between points from the two clusters and take the average; that average IS the distance between the two clusters
Centroid Distance:
- The mid-point of a cluster is its centroid: the mean of the cluster, computed attribute by attribute over the observations in the cluster (it doesn't have to exist in the dataset). To compare two clusters, the distance between their centroids = the distance between the clusters
- Average Linkage and Centroid Linkage tend to produce similar results, with clusters that tend to have spherical shapes

Text Mining

Can be both (descriptive and predictive): extract information from text

Dataset

collection of data

item set

A collection of items selected from the set of items in the store. CAN CONTAIN ANY COMBINATION OF THE EXISTING ITEMS IN THE DOMAIN

Data Integration

combine different data sources

Lift < 1

customers who buy X are less likely to buy Y than other customers

Low inter-similarity

data points in different clusters should be different "enough" from each other

High intra-similarity

data points in the same cluster should be similar to each other

Centroid Linkage

The distance between the two clusters' centroids, i.e. the cluster means: compute the clusters' centroids, then compute the distance between the centroids
- Average Linkage and Centroid Linkage tend to produce similar results, with clusters that tend to have spherical shapes

summary of measures slides

Distances: EUCLIDEAN, MANHATTAN, MAX-COORDINATE
How to interpret them: the higher the distance, the more different the two points; the lower the distance, the more similar the two points
k = number of dimensions
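A minimal sketch computing all three numerical distances between two assumed points:

```python
import numpy as np

a = np.array([2.0, 7.0])   # assumed points
b = np.array([5.0, 3.0])

euclidean      = np.sqrt(((a - b) ** 2).sum())   # straight-line distance: 5.0
manhattan      = np.abs(a - b).sum()             # sum of absolute differences: 7.0
max_coordinate = np.abs(a - b).max()             # largest absolute difference: 4.0

print(euclidean, manhattan, max_coordinate)
```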

Variable/Attribute/Feature

each column. Each column captures a feature for each observation.

Observation/Data Entry/Record

each row. Different datasets may have different units of observation. For example, each observation (or record) corresponds to a customer.

Support Percentage

The fraction of transactions containing both X and Y: SUPPORT COUNT / TOTAL NUMBER OF TRANSACTIONS
• The chance that, if an order is chosen randomly, it contains Milk, Diapers, and Coke
• Empirical = working with the data that you have: the frequency within the given transactions
• Support count and support percentage are both not directional
• (Number of transactions with X and Y) / (total number of transactions): the empirical probability that a randomly selected transaction contains the item-sets (which can be individual items)
• Example: S(X) = S({Milk, Diapers}) = 3/5 = 0.6; S(Y) = S({Coke}) = 2/5 = 0.4
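A minimal sketch of support count and support percentage; the five transactions are assumptions chosen to reproduce the S(X) = 0.6 and S(Y) = 0.4 example:

```python
transactions = [
    {"Milk", "Diapers", "Coke"},
    {"Milk", "Diapers"},
    {"Milk", "Diapers", "Beer"},
    {"Bread", "Coke"},
    {"Bread", "Eggs"},
]

def support_count(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)   # item-set contained in transaction

def support(itemset, transactions):
    return support_count(itemset, transactions) / len(transactions)

print(support({"Milk", "Diapers"}, transactions))  # 3/5 = 0.6
print(support({"Coke"}, transactions))             # 2/5 = 0.4
```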

stopping criteria

how many clusters we should have

Association Rules • Algorithms

(how the measures are used in searching for association rules): Apriori Algorithm

Data Cleaning

Identify and correct errors and/or data inconsistencies
• Address missing values
• Identify and correct data entry errors
• Identify and deal with outliers
• Correct inconsistencies: unify units of measure, international standards

Since clustering is based on the features available in the data (so on the dimensions)

If the data is 3D or less, clustering would be very straightforward: in low-dimensional spaces, clusters can even emerge from simple plots

clusters

Investigate whether the data points naturally group together in an interesting way
• #states: don't include (no pattern, just names; an ID VARIABLE)
• #regions: a grouping we already know about, no value added; we are looking for NEW interesting patterns
Main idea: organizing data into its most natural groups, called clusters

Complete Linkage

The max pairwise distance between points from two different clusters: compute ALL the distances between points and pick the maximum
- The distance between two clusters is the maximum difference we can find (the edges of the clusters)
- It tends to break large clusters to create clusters with similar diameters

standardization

Mean = 0, std dev = 1: transforms the data to have a mean of 0 and a standard deviation of 1. A point with value X will be standardized to: NewValue = (X - sample mean) / (sample standard deviation)
• Important: if using Standardization, we need to transform the binary data as well (all the attributes)
• Can be skewed by outliers; works best when the data has more of a normal distribution
• Sample standard deviation: (Age1 - sample mean)^2 + (Age2 - sample mean)^2 + ..., divide by (number of ages - 1), then take the square root
• Standardization: values with mean 0 and SD 1
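A minimal sketch of standardization using the sample standard deviation (divide by n - 1, as described above); the age values are assumptions:

```python
import numpy as np

age = np.array([22.0, 35.0, 47.0, 61.0])     # assumed ages
z = (age - age.mean()) / age.std(ddof=1)     # ddof=1 -> divide by (n - 1)
print(z)
print(round(z.mean(), 10), z.std(ddof=1))    # ~0.0 and 1.0 after standardization
```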

centroid

The mean point of a cluster; its coordinates are the mean values of the dimensions; it doesn't have to exist in the dataset

Confidence

Measures how often items in Y appear within transactions that contain X: an estimate of, given that you have X, the probability that you'll see Y. CONDITIONAL ON KNOWING THAT YOU HAVE ALL THE ITEMS IN X
• "Given we have one thing, what is the probability that we will see the other?"
• The estimated conditional probability that a randomly selected transaction includes all the items in Y, given that the transaction includes all the items in X
• Confidence of X → Y does not necessarily equal confidence of Y → X
• Given X, how often Y appears with it (a RATIO); if every time X → Y, then we get 1
• Support % of X: if you choose a transaction at random, you have a ___% chance of Milk and Beer
• Support % of X and Y: if you choose a transaction at random, you have a ___% chance of Milk, Beer, Bagels
• (X and Y together) / (just X) = Confidence
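A minimal sketch of Confidence(X → Y) = S(X and Y) / S(X) on assumed transactions; the two print statements show that the direction matters:

```python
transactions = [
    {"Milk", "Diapers", "Coke"}, {"Milk", "Diapers"},
    {"Milk", "Diapers", "Beer"}, {"Bread", "Coke"}, {"Bread", "Eggs"},
]

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(X, Y):
    return support(X | Y) / support(X)   # S(X and Y) / S(X)

print(confidence({"Milk", "Diapers"}, {"Coke"}))  # 0.2 / 0.6 = 0.333...
print(confidence({"Coke"}, {"Milk", "Diapers"}))  # 0.2 / 0.4 = 0.5 -> not the same
```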

Single Linkage

The min pairwise distance between points from two different clusters: compute ALL the distances between points and pick the minimum one
- Example: we have two clusters; do we combine them or keep them separate?
- Use the individual data-point distance measures from before on all the pairs of points across the two clusters, and pick the minimum one
- Single Linkage is sensitive to noise and outliers, but good for "weird" shapes
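A minimal sketch contrasting single, complete, average, and centroid linkage between two assumed clusters, using plain NumPy pairwise Euclidean distances:

```python
import numpy as np

cluster_a = np.array([[1.0, 1.0], [1.5, 1.2]])                 # assumed clusters
cluster_b = np.array([[4.0, 4.0], [4.5, 3.8], [5.0, 4.2]])

# all pairwise distances between points of the two clusters
d = np.linalg.norm(cluster_a[:, None, :] - cluster_b[None, :, :], axis=2)

single   = d.min()    # nearest pair
complete = d.max()    # farthest pair
average  = d.mean()   # mean of all pairs
centroid = np.linalg.norm(cluster_a.mean(axis=0) - cluster_b.mean(axis=0))  # centroid distance

print(single, complete, average, centroid)
```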

Transaction

An instance of an item-set (a given combination of items): what you buy when you go to the grocery store is a TRANSACTION; in other contexts, like SYMPTOMS, each transaction is a collection of symptoms at a given time
• Multiple transactions comprise a dataset

support count

The raw count of transactions containing the item-sets of interest, denoted supp(X → Y): how often the items appear in the data, e.g., how many transactions have MILK AND DIAPERS
Not directional

APRIORI ALGORITHM key idea: if an item-set X is NOT frequent

then any larger item-sets containing X cannot be frequent

To measure "similarity": For individual data points,

we use distance measures: Euclidean, Manhattan, Max-Coordinate for numerical data; Matching and Jaccard for binary data

Data Normalization

We will need to transform our attributes so they take values from a common range (rather than each attribute spanning its own wide range)
• The objective is to eliminate specific units of measurement and transform the attributes to a common scale
• The term normalization is somewhat loosely used to refer to any method that can be used to scale attributes; technically, normalization means rescaling the attribute to have values between 0 and 1
• Once your data is normalized, you can apply one of the numerical distance measures

Within-Cluster Variance Or Within Sum of Squared Errors (WSS)

• A measure of how cohesive the clusters are within
• More specifically, the WSS measures how "close" the points within a cluster are to their centroid
• The lower the WSS, the closer the points within a cluster are to its centroid, and the more the cluster is cohesive within: High Intra-Similarity
• Within-cluster variance is the sum of squared distances between the cluster centroid and the cluster points
• When comparing two clusters, the method compares the within-cluster variance obtained if the two clusters were merged into one with the sum of the within-cluster variances of each, if the two were kept separate
Assessing a cluster's "quality":
• For each data point x in the cluster, the error is defined as the (Euclidean) distance to its own cluster center: d(x, m)
• Sum the squared errors for all the data points in a cluster to get the WSS for that cluster
• Repeat for every cluster, then sum the results for each cluster to get the TWSS
• The lower the (T)WSS, the more the clusters are cohesive within == High Intra-Similarity
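A minimal sketch of the WSS / TWSS computation on two assumed clusters:

```python
import numpy as np

clusters = {                                             # assumed clustering result
    0: np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1]]),
    1: np.array([[5.0, 5.0], [5.3, 4.7]]),
}

total_wss = 0.0
for label, points in clusters.items():
    centroid = points.mean(axis=0)                       # cluster centroid
    wss = ((points - centroid) ** 2).sum()               # sum of squared errors to the centroid
    total_wss += wss
    print(label, wss)

print("TWSS:", total_wss)   # lower TWSS -> more cohesive clusters (high intra-similarity)
```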

How to Find Association Rules

• All the measures introduced before serve to assess how meaningful an association rule is
• But how do we generate all the rules that would be "good" candidates?
• In an ideal world: look at all possible combinations of items in your dataset
• Problem: with a large number of items, there could be a huge (exponential) number of item-sets to consider
• In fact, with N items, there will be 2^(N-1) potential item-sets
• Example: consider a dataset of 20 items. There would be 2^19 different item-sets to consider!
• Solution: only consider combinations of items that occur with higher frequency in the dataset: frequent item-sets

Choosing the Number of Clusters

• Clustering is exploratory in nature; there is no "right" number of clusters, BUT there are bad solutions and good solutions
• It depends on needs and it is subjective
• May be informed by the business goal and domain knowledge
• Nevertheless, there are some tools we can use to make a "reasonable" decision
• With hierarchical clustering, the Dendrogram can help: sometimes clusters may emerge "naturally" by looking at the dendrogram
• Analyze the candidate clustering solution by looking at the features that characterize each cluster

Real Association or Coincidence?

• Consider the following situation: in a supermarket, 90% of all customers buy bread, and 95% of all customers buy milk
• By pure chance, about 85% (0.9 * 0.95) of customers buy bread and milk, simply because those are basic grocery items used frequently by households: high-volume products that people buy frequently regardless, not associated
• As such, the association rule Bread → Milk may have strong confidence even if there is no real association between them
• We need a metric to assess that: Lift

Between Sum of Squared Errors (BSS): Assessing Cluster's "Quality"

• Define m* as the centroid of the whole dataset
• For each cluster, compute the (Euclidean) distance of the cluster's centroid from m*, square it, and weight it by n_i, the number of points in that cluster
• Sum the results for all the clusters to obtain the BSS
• The higher the BSS, the larger the distance of each cluster centroid from the data centroid, and the more the clusters are separated from each other == Low Inter-Similarity
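A minimal sketch of the BSS computation on the same style of assumed clusters:

```python
import numpy as np

clusters = {                                             # assumed clustering result
    0: np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1]]),
    1: np.array([[5.0, 5.0], [5.3, 4.7]]),
}

all_points = np.vstack(list(clusters.values()))
m_star = all_points.mean(axis=0)                         # centroid of the whole dataset

bss = 0.0
for points in clusters.values():
    n_i = len(points)
    centroid = points.mean(axis=0)
    bss += n_i * ((centroid - m_star) ** 2).sum()        # weighted squared distance to m*

print("BSS:", bss)   # higher BSS -> clusters more separated (low inter-similarity)
```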

Association Rule Mining

• Discovering interesting relationships among items/events/variables
• Find out which items predict the occurrence of other items
• Also known as Affinity Analysis or Market Basket Analysis because it was born in Marketing Analytics, where it is used to find out which products tend to be purchased together (the classic BEER AND DIAPERS example)
• Other example: Healthcare, analyze which symptoms and illnesses manifest together

Elbow Plot

• We can implement the clustering algorithm (e.g., k-means) for different values of k, where k is how many clusters we want
• Example: vary k from 1 to 10. For each k, compute the Total Within Sum of Squared Errors (WSS)
• Plot the WSS as a function of the number of clusters k
• Pick the number of clusters that corresponds to the "elbow point": the point where the curve bends
Choosing the Number of Clusters:
• While the Elbow plot mostly relies on the WSS, we can also look at the BSS: the higher the BSS, the lower the inter-similarity
• For each k, compute both WSS and BSS and plot them as a function of the number of clusters k
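A minimal sketch of an elbow plot with scikit-learn's KMeans; the synthetic data is an assumption, and inertia_ is the fitted solution's Total WSS:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2))   # three synthetic blobs
               for c in ([0, 0], [5, 5], [0, 5])])

ks = range(1, 11)
wss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(ks, wss, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("Total WSS")
plt.show()   # pick k at the "elbow", where the curve bends (here around k = 3)
```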

A lift < 1 does give an indication of a "negative" association, but is it as informative as a lift > 1?

• Example. Think of the healthcare context. We find that two symptoms are LESS likely to occur together (lift < 1). Is this more important than knowing which symptoms DO occur together?

Hierarchical clustering procedure

• First, we need to compute the distance between points based on the distance measure of choice, and create a Distance Matrix
• Distance Matrix: a square matrix containing the distances, taken pairwise, between the data points included in your data
• The distance matrix is fed to the algorithm and used to decide which points to cluster together
1. The algorithm considers each data point individually, as its own 1-point cluster
2. It merges the two 1-point clusters that are nearest to each other (based on the distance matrix) and forms a new cluster
3. It continues by merging the next two points or clusters (of any size) closest to each other (based on the distance measure and linkage method selected)
4. It repeats the process until there is only 1 cluster left (all data points assigned to one big cluster)
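A minimal SciPy sketch of the procedure above: build the pairwise distance matrix, feed it to the agglomerative algorithm, then read off a clustering; the toy data and the single-linkage choice are assumptions:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 1], [1.5, 1.2], [5, 5], [5.5, 4.8], [9, 9]])   # toy data

condensed = pdist(X, metric="euclidean")        # pairwise distances between all points
print(squareform(condensed))                    # the distance matrix, for inspection

Z = linkage(condensed, method="single")         # keep merging the nearest clusters
print(fcluster(Z, t=2, criterion="maxclust"))   # e.g., read off the 2-cluster solution
```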

Data Visualization

• Graphical representation of information and data • Used before, while and after implementing Data-Mining methods • Descriptive graphs/plots, used to display general data patterns, summary statistics, usually before applying data-mining methods • Common plots include: histograms, box-plots, scatterplots, etc.. • Specific graphs produced to visualize intermediate and final results generated by data-mining methods

"desirable" Clusters should have

• High intra-similarity (the lower the WSS, the more the clusters are cohesive within)
• Low inter-similarity (the higher the BSS, the larger the distance of each cluster centroid from the data centroid, and the more the clusters are separated from each other)

steps to find association rules

• How frequently should an item-set appear, to be considered a frequent item-set? • We (the data scientists) need to specify a minimum threshold. • Specify the minimum support (minsupp) • frequent item-sets are the item-sets for which support >= minsupp • How to decide on the minsupp? Based on domain knowledge or business goals

Why do we need algorithms?

• If the data has 3 dimensions or less, clustering would be very straightforward
• Observation/Data-point: each row (observation)
• Variable/Feature/Dimension: each column that captures a feature (or a dimension)
• A 2D dataset is a dataset with 2 dimensions, so 2 features; a 3D dataset has 3 dimensions, and so on
• With 3 dimensions clustering stays straightforward; with any more, it becomes complex (hence the need for algorithms)

How to Deal with Some of the Issues of K-Means

• Pre-process the data:
• Deal with missing values
• Remove outliers: reduce the dispersion of the data points by removing atypical data
• Normalize the data: preserve differences in the data while dampening the effect of variables with wider ranges
How to choose the initial random centroids: while there is no one best way to choose the initial points, usually data scientists perform multiple runs: repeat the clustering using different random points and see how the result changes (see the sketch below)
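A minimal sketch of the "multiple runs" idea using scikit-learn, where n_init repeats k-means with different random initial centroids and keeps the run with the lowest Total WSS; the placeholder data and k = 3 are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(100, 2))            # placeholder data
km = KMeans(n_clusters=3, n_init=25, random_state=0).fit(X)   # 25 runs, best one kept
print(km.inertia_)                                            # TWSS of the best run
```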

Take Action (in regard to the Apriori algorithm), with a Caveat

• Remember: Association Rules are exploratory in nature. • They provide some initial directions to work on. • Setting specific business strategies requires domain expertise and more careful analysis and testing

Apriori Algorithm definition

• Still, if we have a large dataset with many items, checking whether each item-set is frequent, that is checking that support >= minsupp, can take forever • Many techniques have been proposed to reduce the computational burden • The classic algorithm is the Apriori Algorithm

Association Rules • Measures (commonly computed metrics)

• Support Count • Support Percentage • Confidence • Lift

Data Transformation

• Transform, reduce or discretize your data; this strongly depends on the context and type of analysis
• Examples:
• Normalization or Standardization: scale values to a common range
• Attribute/Variable Construction: calculate a new attribute or variable based on other observed attributes

Cluster analysis is an exploratory tool

• Useful when it produces meaningful clusters
• Be aware of chance results: the data may not have definite "real" clusters
• An "optimal" or "best" clustering solution is not guaranteed, BUT there are good solutions and bad solutions

N-Dimensional Spaces

• We usually have to deal with N-Dimensional Spaces!
• In other words, our dataset will likely have more than 3 dimensions (attributes)
• Generally, a dataset of N columns (features) and M rows (records) can be considered as having M observations of N dimensions (an M x N space)

assess k-means clusters

• Within Sum of Squared Errors (WSS) • Between Sum of Squared Errors (BSS)

Once Confident in strong association rule, Take Action

• You find an association rule {Beer} → {Diapers} and conclude it is strong enough. Now what? Possible Marketing Actions:
• Put diapers next to beer in your store
• Or, put diapers away from beer in your store
• Bundle beer and diapers in a "new parent coping kit"
• Lower the price of diapers, raise it on beer

