Rec Systems
NMF is a 'best in class' option for many recommendation problems:
- Includes overall, user, & item bias as well as latent factor interactions
- Can fit via SGD or ALS
- No need to impute missing ratings
- Use regularization to avoid over-fitting
- Can handle time dynamics, e.g., changes in user preferences
- Used by the winning entry in the Netflix Prize challenge
Collaborative filtering challenges
- Cold start
- Echo chambers
- Shilling attacks
Content-based pros & cons
- Pro: needs no other users' data
- Con: requires descriptive features (context) for each item
Types of data
Data can be:
Explicit:
- User-provided ratings (1 to 5 stars)
- User like/dislike
Implicit:
- Infer user-item relationships from behavior
- More common
- Example: buy/not-buy; view/not-view
To convert implicit to explicit, create a matrix of 1s (yes) and 0s (no)
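A minimal sketch of that implicit-to-explicit conversion, using a hypothetical list of (user, item) purchase events:

```python
import numpy as np

# Hypothetical implicit feedback: (user, item) purchase events.
events = [(0, 1), (0, 2), (1, 0), (2, 2)]
n_users, n_items = 3, 3

# Convert implicit events to an explicit 0/1 utility matrix:
# 1 = interaction observed ("yes"), 0 = no interaction ("no").
R = np.zeros((n_users, n_items), dtype=int)
for u, i in events:
    R[u, i] = 1
```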
Ranking
Candidate generator retrieves top-k candidates -> embed users & items in a shared embedding space (e.g., via matrix factorization) -> rank candidates with k-nearest neighbors -> evaluate ranking quality with normalized discounted cumulative gain (NDCG)
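The NDCG metric mentioned above can be sketched as follows; the function name and toy relevance lists are illustrative, not from the original notes:

```python
import numpy as np

def ndcg_at_k(relevance, k):
    """Normalized discounted cumulative gain for one ranked list.

    relevance[j] is the true relevance of the item ranked at position j.
    """
    rel = np.asarray(relevance, dtype=float)[:k]
    # Positions are discounted by 1 / log2(rank + 1), ranks starting at 1.
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float((rel * discounts).sum())
    # Ideal DCG: the same items sorted by true relevance.
    ideal = np.sort(np.asarray(relevance, dtype=float))[::-1][:k]
    idcg = float((ideal * discounts[:ideal.size]).sum())
    return dcg / idcg if idcg > 0 else 0.0
```

A perfectly ordered list scores 1.0; misordered lists score less.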
The cold-start problem
Cold-start problem:
- Need a utility matrix to recommend
- Can ask users to rate items
- Can infer ratings from behavior, e.g., viewing an item
- Must also handle new users and new items
Approaches:
- Use an ensemble of (bad) recommenders until you have enough ratings
- Use a content-based recommender
- Exploit implicit, tag, and other side data
- Use an ItemSimilarityModel until you have enough rating data
- Recommend popular items
- Present random items to sub-groups that might have interest
Content-based recommendations
Compute (cosine) distance between the user profile and item profiles
May want to bucket items first using random hyperplanes and locality-sensitive hashing (LSH)
ML approach:
- Use a random forest or equivalent to predict on a per-user basis
- Computationally intensive -- usually only feasible for small problems
Collaborative filtering recipe
Compute predictions by similarity:
- Normalize (demean) the utility matrix
- Compute similarity of users or items
- Predict ratings for unrated items
- Add the prediction to the average rating of the user/item
Note:
- Precompute the similarity matrix -- it is relatively stable
- Only compute predictions at runtime
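The four steps above can be sketched end-to-end on a toy utility matrix (values hypothetical; 0 marks a missing rating), here with user-based similarity:

```python
import numpy as np

# Toy utility matrix: rows are users, columns are items, 0 = unrated.
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 2.0],
              [1.0, 1.0, 5.0]])
mask = R > 0

# 1. Normalize: subtract each user's mean over rated items.
user_mean = R.sum(axis=1) / mask.sum(axis=1)
Rc = np.where(mask, R - user_mean[:, None], 0.0)

# 2. Cosine similarity between users on the demeaned matrix.
norms = np.linalg.norm(Rc, axis=1)
sim = Rc @ Rc.T / np.outer(norms, norms)

# 3. Predict user u's rating for unrated item i as a similarity-weighted
#    average of other users' demeaned ratings, then
# 4. add back user u's average rating.
u, i = 0, 2
others = [v for v in range(R.shape[0]) if v != u and mask[v, i]]
num = sum(sim[u, v] * Rc[v, i] for v in others)
den = sum(abs(sim[u, v]) for v in others)
pred = user_mean[u] + (num / den if den else 0.0)
```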
Item Profile
Consists of (feature, value) pairs Consider setting feature to 0 or 1 Consider how to scale non-Boolean features
Choosing a similarity measure
Cosine:
- Use for ratings (non-Boolean) data
- Treat missing ratings as 0
- Cosine on de-meaned data is the same as Pearson
Jaccard:
- Use only for Boolean (e.g., buy/not-buy) data
- Loses information with ratings data
Then compute a similarity matrix of pair-wise similarities between items (users)
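Both measures in one toy sketch (vectors are hypothetical): Jaccard on Boolean buy/not-buy vectors, cosine on ratings vectors with missing ratings coded as 0:

```python
import numpy as np

# Boolean buy/not-buy vectors for two items across five users.
a = np.array([1, 1, 0, 1, 0], dtype=bool)
b = np.array([1, 0, 0, 1, 1], dtype=bool)

# Jaccard similarity = |intersection| / |union|.
jaccard = (a & b).sum() / (a | b).sum()

# Ratings vectors for two items; missing ratings treated as 0.
x = np.array([5.0, 3.0, 0.0, 4.0, 0.0])
y = np.array([4.0, 0.0, 0.0, 5.0, 1.0])
cosine = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
```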
Problem of retraining for each new user
Deep Learning Extension
user profile
Describes user preferences (from the utility matrix)
Consider how to aggregate item features per user:
- Compute the "weight" a user puts on each feature
- E.g., "Julia Roberts" feature = average rating for films with Julia Roberts
Normalize: subtract the user's average rating
- E.g., "Julia Roberts" feature = average rating for films with Julia Roberts - user's average rating
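The "Julia Roberts" example above as a sketch, with hypothetical ratings and a Boolean item feature:

```python
import numpy as np

# One user's ratings and whether each rated film has the feature
# (e.g., stars Julia Roberts). Values are illustrative.
ratings = np.array([5.0, 3.0, 4.0, 2.0])
has_feature = np.array([True, False, True, False])

# Feature weight = average rating on items with the feature,
# normalized by subtracting the user's overall average rating.
weight = ratings[has_feature].mean() - ratings.mean()
```

Here the user averages 4.5 on films with the feature versus 3.5 overall, so the feature gets a positive weight.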
Evaluation issues
Historically, used RMSE or MAE
But we only care about predicting the top n items:
- Should you compute the metric over all missing ratings in the test set?
- No need to predict undesirable items well
Precision at n: percentage of the top n predicted ratings that are 'relevant'
Recall at n: percentage of relevant items that appear in the top n predictions
Lift or hit rate are more relevant to business
Performance of a recommender should be viewed in the context of user experience (UX) ⇒ run an A/B test on the entire system
Cross-validation is hard:
- What do you use for labels, given the missing data?
- Users choose to rate only some items ⇒ selection bias
- Not clear how to fix this bias, which is always present
Beware of local optima ⇒ use multiple starts
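Precision at n and recall at n can be computed with a small helper (function name and toy data are illustrative):

```python
def precision_recall_at_n(ranked_items, relevant, n):
    """Precision@n and recall@n for one user's ranked recommendations.

    ranked_items: items sorted by predicted rating, best first.
    relevant: set of items the user actually found relevant.
    """
    top_n = ranked_items[:n]
    hits = sum(1 for item in top_n if item in relevant)
    precision = hits / n
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```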
Cross-validation
Randomly sample ratings to use in the training set
Split on users
Be careful if you split temporally
Do not split on items
To get best performance with NMF:
- Model bias (overall, user, and item)
- Model time dynamics, such as changes in user preferences
- Add side or implicit information to handle cold-start
NMF
Non-negative matrix factorization
Building a production recommender is also challenging:
Part of the entire UX
Should consider:
- Diversity of recommendations
- Privacy of personal information
- Security against attacks on the recommender
- Social effects
- Providing explanations
Collaborative filtering using matrix factorization
Predict ratings from latent factors:
- Compute latent factors $q_i$ and $p_u$ via matrix factorization
- Latent factors are unobserved user or item attributes:
  - Describe some user or item concept
  - Affect behavior
  - Example: escapist vs. serious, male vs. female films
- Predict rating: $\hat{r}_{ui} = q_i^T p_u$
- Assumes:
  - Utility matrix is the product of two simpler (long, thin) matrices
  - ∃ a small set of users & items which characterize behavior
  - A small set of features determines the behavior of most users
- Can use NMF, UV decomposition, or SVD
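The prediction step in NumPy, assuming the factor matrices have already been fit (here they are random placeholders just to show the shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 5, 2   # toy sizes; k = number of latent factors

# Latent factor matrices: P is users x k (rows p_u), Q is items x k (rows q_i).
# In practice these come from NMF/ALS/SGD; random values stand in here.
P = rng.random((n_users, k))
Q = rng.random((n_items, k))

# Predicted rating for user u, item i: r_hat_ui = q_i^T p_u.
u, i = 1, 3
r_hat = Q[i] @ P[u]

# Equivalently, the full predicted utility matrix at once.
R_hat = P @ Q.T
```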
Predict ratings from similarity
Predict using a similarity-weighted average of ratings: $\hat{r}_{ui} = \frac{\sum_{j \in N(i;u)} s_{ij} \, r_{uj}}{\sum_{j \in N(i;u)} s_{ij}}$, where $N(i;u)$ is the set of items similar to $i$ that user $u$ has rated and $s_{ij}$ is the item-item similarity
Content-based Recommender
Recommend based on item properties/characteristics
Construct an item profile of characteristics using various features:
- Text: use TF-IDF and keep the top N features or features over a cutoff
- Images: use tags -- only works if tags are frequent & accurate
Construct a user profile
Compute similarity: Jaccard, cosine, max dot product
Recommend best items
Recommend items with the highest predicted rating:
- Sort predicted ratings $\hat{r}_{ui}$
- Optimize by only searching a neighborhood containing the n items most similar to i
- Beware:
  - Consumers like variety
  - Don't recommend every Star Trek film to someone who liked the first
  - Best to offer several different types of item
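The sort-and-take-top-n step, on hypothetical predicted ratings for one user, skipping items the user has already rated:

```python
import numpy as np

# Predicted ratings for one user (hypothetical values).
r_hat = np.array([3.1, 4.8, 2.0, 4.5, 3.9])
already_rated = {1}   # items to exclude from recommendations

# Sort item indices by descending predicted rating,
# drop already-rated items, keep the top n.
n = 2
order = np.argsort(-r_hat)
top_n = [int(i) for i in order if i not in already_rated][:n]
```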
SVD vs. NMF
SVD:
- Must know all ratings -- i.e., no unrated items
- Assumes we can minimize the squared Frobenius norm
- Very slow if the matrix is large & dense
NMF:
- Can estimate via alternating least squares (ALS) or stochastic gradient descent (SGD)
- Must regularize
- Can handle big data, biases, interactions, and time dynamics
SVD
Singular Value Decomposition
long tail
With a long-tail item distribution, learning to rank the items becomes critical. A secondary problem is diversity: you want to show the user items of different types or categories, so not every recommendation should be similar.
- Look into diversity-increasing algorithms
Two methods to estimate NMF factors:
Stochastic gradient descent (SGD):
- Easier and faster than ALS
- Must tune the learning rate
- Sometimes called 'Funk SGD' after its originator
Alternating least squares (ALS):
- Use least squares, alternating between fixing $q_i$ and $p_u$
- Available in Spark MLlib
- Fast if you can parallelize
- Better for implicit (non-sparse) data
Beware of local optima!
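A minimal Funk-style SGD sketch on a toy utility matrix (all values and hyperparameters are illustrative; this omits the bias terms a full model would include):

```python
import numpy as np

# Toy utility matrix; 0 marks a missing rating.
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0]])
rng = np.random.default_rng(0)
k, lr, reg, epochs = 2, 0.01, 0.02, 1000
P = 0.1 * rng.random((R.shape[0], k))   # user factors p_u
Q = 0.1 * rng.random((R.shape[1], k))   # item factors q_i

# Only iterate over observed ratings -- no imputation needed.
obs = [(u, i) for u in range(R.shape[0])
       for i in range(R.shape[1]) if R[u, i] > 0]
for _ in range(epochs):
    for u, i in obs:
        err = R[u, i] - P[u] @ Q[i]
        pu = P[u].copy()
        # Gradient step with L2 regularization on both factor vectors.
        P[u] += lr * (err * Q[i] - reg * pu)
        Q[i] += lr * (err * pu - reg * Q[i])

rmse = float(np.sqrt(np.mean([(R[u, i] - P[u] @ Q[i]) ** 2
                              for u, i in obs])))
```

The regularization term is what lets this avoid over-fitting despite the sparse data; the learning rate `lr` is the parameter that must be tuned.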
Cosine distance
Use for ratings (non-Boolean) data
Treat missing ratings as 0
Cosine on de-meaned data is the same as Pearson
Jaccard Distance
Use only Boolean (e.g., buy/not buy) data Loses information with ratings data
CF using similarity
Use similarity to recommend items:
- Make recommendations based on similarity:
  - Between users
  - Between items
- Similarity measures:
  - Pearson
  - Cosine
  - Jaccard
Matrix sparsity = # ratings / total # elements
- Low sparsity -> don't do collaborative filtering
Utility Matrix
User ratings of items
User purchase decisions for items
Most items are unrated ⇒ matrix is sparse
Unrated entries are coded as 0 or missing
Use a recommender to:
- Determine which attributes users think are important
- Predict ratings for unrated items
- Better than trusting 'expert' opinion
Two types of similarity-based CF
User-based: predict based on similarities between users
- Performs well, but slow if there are many users
- Use item-based CF if |Users| ≫ |Items|
Item-based: predict based on similarities between items
- Faster if you precompute the item-item similarity matrix
- Usually |Users| ≫ |Items| ⇒ item-based CF is most popular
- Items tend to be more stable:
  - Items are often in only one category (e.g., action films)
  - Stable over time
  - Users may like variety or change preferences over time
  - Items usually have more ratings than users ⇒ items have more stable average ratings than users
collaborative filtering
a process that automatically groups people with similar buying intentions, preferences, and behaviors and predicts future purchases
presentation bias
Analyze bias in the data; stress diversity
Results appearing lower in a ranked feed are affected:
- Whether a post is even seen
- Even if seen, users may not interact because it is further down
Correction: divide by the bias (the probability the item is clicked over some other relevant, lower-ranked item) times the irrelevant-item bias
Personalized feeds add further signals, e.g., scrolling speed
Normalization
Baseline user-item rating: $b_{ui}$ = global average + item's average deviation from the global average + user's average deviation from the global average
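That baseline on a toy utility matrix (values hypothetical; 0 marks a missing rating), computing each bias as an average deviation from the global mean:

```python
import numpy as np

# Toy utility matrix; 0 = unrated.
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0]])
mask = R > 0

mu = R[mask].mean()   # global average over observed ratings
# Item bias b_i and user bias b_u: average deviation from the global mean.
b_i = np.array([(R[:, i][mask[:, i]] - mu).mean() for i in range(R.shape[1])])
b_u = np.array([(R[u][mask[u]] - mu).mean() for u in range(R.shape[0])])

# Baseline prediction for user u, item i: b_ui = mu + b_u + b_i.
u, i = 0, 2
b_ui = mu + b_u[u] + b_i[i]
```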