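Recommender System Practice: On the Latent Factor Model

The Latent Factor Model (LFM), which many people call SVD although it is really a pseudo-SVD, has been a hot topic in recommender system research in recent years. However, LFM research has focused almost entirely on rating prediction. Few people use it to generate top-N recommendation lists, and few have studied how to apply the model to non-rating data.

Originally I did not plan to cover this algorithm in the practical part of the book; I only meant to cover it later, when introducing hot topics in academic research. But I later realized that without it, the practical part would look like nothing but little tricks with addition, subtraction, multiplication and division, with no real technical content. So I moved the material on running LFM over non-rating data into the practical part, even though there are still very few papers on the subject. I do think LFM has a future in practice.

As for exactly how to do it, let me keep you in suspense for now and share an experimental result first. As we know, a by-product of LFM is an automatic clustering of items. While writing the book today I tried this on the MovieLens dataset and found that it works well, so I am publishing the result first.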

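Part of the Sample Chapter of Recommender System Practice Released

http://www.ituring.com.cn/article/725

The sample chapter is an odd choice: it comes from the middle of the book and covers tag-based recommendation. I picked it because its content is relatively self-contained and fairly short, which made it easier to write. The sample chapter is being released in installments. This installment is the chapter's introduction, the part that fellow grunt workers like me usually call filler. The rest of the chapter will be released over the coming weeks. Some experimental results are not included in the sample chapter, which only discusses the methods; the results will be published when the book officially comes out, to leave a little suspense.

Also, when I started writing this book, many people worried it would be obscure and hard to follow. This time I seem to have overcorrected and written something too simple. Criticism is welcome.

At the moment I am concentrating on the chapter before this one, on how to use implicit feedback data, which mainly covers neighborhood-based methods, latent factor models and graph-based methods.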

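Choosing RSS Sources and Generating a Personalized Feed

Since Google Reader's redesign, it has become hard to get a reading list in which most articles are ones I actually like. Every day I have to click through my favorite subscribed feeds and read them one by one, which is quite a hassle. And without friends' shares, it is hard to discover new feeds; I can only dig around in the feeds I already subscribe to.

So I analyzed the Google Reader data I had crawled earlier and built a tool: http://www.reculike.com/reader.php

When you open the tool, you first see the most popular feeds in Google Reader, and you can subscribe to the ones you like. After you finish choosing from a page, click the refresh button and the tool will generate a new screen of personalized feed recommendations based on your earlier choices. You can keep choosing, and refresh whenever you are not satisfied.

All user behavior is recorded in a cookie. When you want to switch interests and look for feeds afresh, click the reset button to clear the recorded history. After choosing feeds, click the "generate RSS" button to produce an RSS feed containing the latest 100 entries from all the feeds you selected; you can then subscribe to this feed in Google Reader.

However, because the server is rather underpowered, I cannot guarantee that the articles from your feeds will be updated in real time, so you may also add the feeds you find to your reader one by one.

For example, below is a merged feed of the technical feeds I chose: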
http://www.reculike.com/site/reader/myfeed.php?uid=21

Talk at MLA11 : Our solution of KDDCup 2011

Hulu’s Recommendation System

This article comes from Hulu tech blog http://tech.hulu.com/blog/2011/09/19/recommendation-system/

This article is written by zhenghua, lihang and me. Haha

As the Internet gets more and more popular, information overload poses an important challenge for a lot of online services. With all of the information pouring out from the web, users can be overwhelmed and confused as to what, exactly, they should be paying attention to.

A recommendation system provides a solution when a lot of useful content becomes too much of a good thing. A recommendation engine can help users discover information of interest by analyzing historical behaviors. More and more online companies — including Netflix, Google, Facebook, and many others — are integrating a recommendation system into their services to help users discover and select information that may be of particular interest to them.

With literally tens of thousands of hours of premium video content, Hulu users are also prone to content overload. Given the wide variety of content available on the service at any one time, it may be difficult for Hulu users to discover new video that best matches their historic interests. So the first goal of Hulu’s recommendation system is to help users find content which will be of interest to them.

In addition to users, Hulu’s recommendation system should also help content owners promote their video. Part of our mission is to deliver a service that users, advertisers, and content owners all unabashedly love. We have many different content partners, and we understand that these content partners want more Hulu users to watch their videos, especially when new videos are released. By using personalized recommendation instead of more traditional recommendation systems, we can promote video content more effectively, since we promote directly to users who are likely to enjoy the content we are recommending.

Data Characteristics

Before explaining the design of our recommendation system, we wanted to explain some parameters within our data.

Since a lot of our content is comprised of episodes or clips within a show, we have decided to recommend shows to users instead of individual videos. Shows are a good method of organization, and videos in the same show are usually very closely related.

Our content can be mainly divided into two parts: on-air shows and library shows. On-air shows are highly important since more than half of our streaming comes from them.

Although on-air shows occupy a large part of our content, they are subject to a seasonal effect. During summer months, most on-air shows do not air, causing on-air show streaming to decrease. Furthermore, fewer shows air during weekends, so the streaming of library shows increases. Keeping this information in mind, we can design the recommendation system to recommend more library shows to users during weekends or summer months, as an example.

The key data that drives most recommendation systems is user behavior data. There are two main types of user behavior data: implicit user feedback data and explicit user feedback data. Explicit user feedback data primarily includes user voting data. Implicit feedback data includes information on users watching, browsing, searching, etc. Explicit feedback data can show a user’s preference on a show explicitly, but implicit feedback data cannot. For example, if a user gives a 5-star rating to a show, we know that this user likes the show very much. But if a user only watches a video from a show page or searches for a show, we don’t know whether this user likes the show.

As the quantity of implicit data at Hulu far outweighs the amount of explicit feedback, our system should be designed primarily to work with implicit feedback data.

Architecture

There are many different types of recommendation algorithms, and perhaps the most famous is collaborative filtering (CF). CF relies on user behavior data, and its main idea is to predict user preferences by analyzing their behaviors. There are two types of CF methods: user-based CF (UserCF) and item-based CF (ItemCF). UserCF assumes that a user will prefer items which are liked by other users who have similar preferences to that user. ItemCF assumes that a user will prefer items similar to the items he or she preferred previously. ItemCF is widely used by many others (for example, Amazon and Netflix), as it has two main advantages. Firstly, it is suitable for sites where there are a lot more users than items. Secondly, ItemCF can easily explain recommendations using users’ historical behaviors. For example, if you have watched “Family Guy” on Hulu, we will recommend “American Dad” to you and tell you that we recommend this because you have watched “Family Guy”. So we use ItemCF as our basic recommendation algorithm in Hulu.

On-line Architecture

Figure 1 shows our on-line architecture of the recommendation system. This system contains 5 main modules:

  1. User profile builder: When a user first comes into the recommendation system, we first build a profile for them. The profile includes the user’s historical behaviors and topics, which are generated from their past behaviors. Users can have many different types of behaviors. For example, they can watch videos, add shows to favorites, search for videos and vote on videos and shows. All these behaviors are considered by our system and, after extracting them, we use a topic model which is trained offline to generate users’ preferences on topics.
  2. Recommendation Core: After generating the list of the user’s historical preferences on shows and topics, we put the shows similar to those shows and topics into the raw recommendations.
  3. Filtering: For some pretty obvious reasons, raw recommendation results cannot be presented to users directly. We need to filter out shows the user has already seen or engaged with, so that the recommendations are a little more precise.
  4. Ranking: The ranking module re-ranks raw recommendations to make them better fit users’ preferences. First, we make recommendations more diverse. Then we increase the novelty of recommendations so that users will find shows they like but have never seen before.
  5. Explanation: Explanation is one of the most important components of every recommendation system. The explanation module generates some reasoning for every recommendation result using the user’s historical behaviors. For example, we will recommend “American Dad” to a user who had previously watched “Family Guy.” The explanation will say, “We recommend ‘American Dad’ to you because you have watched ‘Family Guy’”.
    Figure 1 : Architecture for Hulu 

    Off-line Architecture

    In the above on-line architecture, some components rely on offline resources, such as the topic model, related model, feedback model, etc. The off-line system is also an important part of our recommendation system. Our off-line system has these main components:

    1. Data Center: The data center contains all user behavior data in Hulu. Some of it is stored in Hadoop clusters and some of it is stored in a relational database.
    2. Related Table Generator: The related table is an important resource for on-line recommendation. We use two main types of related table: one that’s based on collaborative filtering (which we’ll call CF), and another based on content. In CF, show A and show B will have high similarity if users who like show A also like show B. With content filtering, we use content information including title, description, channel, company, actor/actress, and tags.
    3. Topic Model: A topic is represented by a group of shows that have similar content. Topics are thus larger in scope than shows, but they’re still smaller than channels. Our topics are learned by LDA, which is a popular topic model in machine learning.
    4. Feedback Analyzer: Feedback specifically means users’ reactions to recommendation results. Using user feedback can improve recommendation quality. For example, say a show is recommended to many users, but most of them do not click this show. In that case, we’ll decrease the rank of this show. Users will also have different types of behavior, so we’ll use all these behaviors in developing the recommendations. However, some users may prefer recommendations to come from their prior watch history, and some users may prefer their recommendations to come from their voting behavior. All these effects can be modeled offline by analyzing users’ feedback on their recommendations.
    5. Report Generator: Evaluation is the most important part of the recommendation system. The report generator generates a report every day, including multiple metrics, to show the quality of recommendations. At Hulu we monitor metrics including CTR, conversion ratio, etc.

     

    Figure 2 : Architecture for Hulu 

     

    Algorithms

    So far, we’ve given a brief overview of our recommendation architecture. From the previous discussion, we can see that Hulu’s recommendation system is primarily based on ItemCF. We’ve added many improvements on top of the ItemCF algorithm, too, in order to make it generate better recommendations. To test these improvements, we’ve performed many A/B tests on different algorithms. In the following sections, we’ll introduce some of these algorithms and the experiment results.

    Item-based Collaborative Filtering

    Item-based Collaborative Filtering (ItemCF) is the basis of all our algorithms. In ItemCF, let N(u) be the set of items user u has preferred previously. User u’s preference on item i (i is not in N(u)) can then be measured by:

    p(u,i) = \sum_{j \in N(u)} r(u,j) s(i,j)

    Here, r(u,j) is the preference weight of user u on show j, and s(i,j) is the similarity between show i and show j. In CF, the similarity between two shows is calculated from user behavior data on these two shows. Let N(i) be the set of users who watched show i and N(j) be the set of users who watched show j. Then, the similarity s(i,j) between show i and show j is calculated by the following formula:

    s(i,j) = \frac{\left| N(i) \cap N(j) \right|}{\sqrt{\left| N(i) \right| \left| N(j) \right|}}

    In this definition, show i will be highly relevant to show j if most users who watch show i also watch show j. However, this definition suffers from the “Harry Potter problem,” which means that every show will have high relevance with popular shows.
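To make the two formulas concrete, here is a minimal Python sketch. It is an illustration under assumptions, not Hulu's production code: the function names `item_similarity` and `predict`, the toy viewing data, and the choice r(u,j) = 1 for a watched show are all invented for the example.

```python
import math
from collections import defaultdict


def item_similarity(watched):
    """watched: dict user -> set of shows the user watched.

    Returns sim[i][j] = |N(i) & N(j)| / sqrt(|N(i)| * |N(j)|),
    where N(i) is the set of users who watched show i.
    """
    users_of = defaultdict(set)
    for user, shows in watched.items():
        for show in shows:
            users_of[show].add(user)

    sim = defaultdict(dict)
    for i in users_of:
        for j in users_of:
            if i == j:
                continue
            common = len(users_of[i] & users_of[j])
            if common:
                sim[i][j] = common / math.sqrt(len(users_of[i]) * len(users_of[j]))
    return sim


def predict(user_weights, sim):
    """user_weights: dict show j -> r(u, j) for every j in N(u).

    Returns p(u, i) = sum over j in N(u) of r(u, j) * s(i, j),
    only for shows i that are not already in N(u).
    """
    scores = defaultdict(float)
    for j, r_uj in user_weights.items():
        for i, s_ij in sim.get(j, {}).items():  # sim is symmetric
            if i not in user_weights:
                scores[i] += r_uj * s_ij
    return dict(scores)


# Toy data, invented for illustration only.
watched = {
    "alice": {"Family Guy", "American Dad"},
    "bob": {"Family Guy", "American Dad", "House"},
    "carol": {"Family Guy", "House"},
    "dave": {"Family Guy"},
}
sim = item_similarity(watched)
scores = predict({"Family Guy": 1.0}, sim)
```

In this toy data “Family Guy” is watched by everyone, so its similarity to each other show (about 0.71) is higher than the similarity between the two less popular shows (0.5), which is the “Harry Potter problem” in miniature.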

    Recent Behavior

    The first lesson we learned from A/B testing is that recommendations should fit users’ recent preference and that users’ recent behavior is more important than their older, historical behaviors. So, in our engine, we will put more weight on users’ recent behaviors. In our system, CTR of recommendations that originate from users’ recent watch behavior is 1.8 times higher than CTR of recommendations originating from users’ old watch behavior.
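The article does not say how recent behaviors are given more weight. One common option, shown here purely as an assumption, is to let the preference weight r(u,j) decay exponentially with the age of the behavior. The names `recency_weight` and `weighted_history` and the 30-day half-life are hypothetical.

```python
def recency_weight(age_days, half_life_days=30.0):
    """Hypothetical decay: a behavior loses half its weight every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def weighted_history(history, now_day, half_life_days=30.0):
    """history: list of (show, day_watched) pairs.

    Returns dict show -> r(u, show), keeping the weight of the most
    recent behavior when a show appears more than once.
    """
    weights = {}
    for show, day in history:
        w = recency_weight(now_day - day, half_life_days)
        weights[show] = max(weights.get(show, 0.0), w)
    return weights


# Toy history, invented for illustration only.
history = [("House", 10), ("Family Guy", 100), ("House", 70)]
weights = weighted_history(history, now_day=100)
```

These weights can then be used as r(u,j) in the ItemCF score, so that shows similar to what the user watched recently rank higher.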

     

     

    Novelty

    Just because a recommendation system can accurately predict user behavior does not mean it produces a show that you want to recommend to an active user. For example, “Family Guy” is a very popular show on Hulu, and thus most users have watched at least some episodes from this show. These users do not need us to recommend this show to them; the show is popular enough that users will decide whether or not to watch it by themselves.

    Thus, novelty is also an important metric for evaluating recommendations. The first way we think we can increase novelty is by revising the ItemCF algorithm:

    1. First, we will decrease the weight of popular shows that users have watched before.
    2. Then, we’ll put more weight on shows that are not only similar to shows the active user watched before, but also less popular than those shows.
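The two revisions above can be sketched as follows. The article gives no formulas, so everything specific here is an assumption made for illustration: the log damping of popular watched shows, the popularity-ratio boost, the parameter `alpha`, and the function name `novelty_scores`.

```python
import math


def novelty_scores(user_weights, sim, popularity, alpha=1.0):
    """Novelty-aware ItemCF score (illustrative assumptions only).

    user_weights: dict show j -> r(u, j) for shows the user watched.
    sim:          dict show j -> dict show i -> s(i, j).
    popularity:   dict show -> number of viewers (assumed >= 1).
    """
    scores = {}
    for j, r_uj in user_weights.items():
        # Step 1: decrease the weight of popular shows in the history.
        damped = r_uj / math.log(1.0 + popularity[j])
        for i, s_ij in sim.get(j, {}).items():
            if i in user_weights:
                continue
            # Step 2: boost candidates that are less popular than show j.
            if popularity[i] < popularity[j]:
                boost = (popularity[j] / popularity[i]) ** alpha
            else:
                boost = 1.0
            scores[i] = scores.get(i, 0.0) + damped * s_ij * boost
    return scores


# Toy numbers, invented for illustration only.
popularity = {"Family Guy": 1000, "American Dad": 500, "The Simpsons": 2000}
sim = {"Family Guy": {"American Dad": 0.6, "The Simpsons": 0.6}}
scores = novelty_scores({"Family Guy": 1.0}, sim, popularity)
```

With equal similarity, the less popular candidate ends up with twice the score of the more popular one, which is the intended direction of the change.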

    Explanation-based Diversity

    Most users have diverse preferences, so the recommendations should also meet their diverse interests. In our system, we use explanations to diversify our recommendations. We think a diverse recommendation list means that most of the recommended shows have different explanations.

    We have performed an A/B test to show the usefulness of diversification (shown in the above figure). The results of the experiment show that, for active users who had previously watched 10 or more shows, diversification can increase recommendation CTR significantly.
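One way to read "most recommended shows have different explanations" is a greedy re-rank that caps how many shows may share one explanation. This is only a sketch of that reading, not the method used at Hulu; `diversify`, `max_per_explanation`, and the candidate scores below are hypothetical.

```python
def diversify(candidates, k, max_per_explanation=1):
    """candidates: list of (show, score, explanation) tuples, where the
    explanation is the watched show that triggered the recommendation.

    Returns up to k shows, preferring high scores but allowing at most
    max_per_explanation shows per explanation; leftover slots are
    backfilled with the best skipped candidates.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    used = {}
    picked, skipped = [], []
    for show, _score, why in ranked:
        if used.get(why, 0) < max_per_explanation:
            picked.append(show)
            used[why] = used.get(why, 0) + 1
        else:
            skipped.append(show)
    result = picked[:k]
    result += skipped[: k - len(result)]
    return result


# Toy candidates, invented for illustration only.
candidates = [
    ("American Dad", 0.9, "Family Guy"),
    ("The Cleveland Show", 0.8, "Family Guy"),
    ("Bones", 0.7, "House"),
    ("The Simpsons", 0.6, "Family Guy"),
]
top3 = diversify(candidates, k=3)
```

Here “Bones” moves ahead of “The Cleveland Show” because it brings a new explanation, even though its raw score is lower.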

    Temporal Diversity

    A good recommendation system should not generate static recommendations. Users want to see new suggestions every time they visit the recommendation system. If a user has new behaviors, she will find her recommendations have changed because we have put more weight on the user’s recent behaviors. But if a user has no new behaviors, we also need to change our recommendations. We use three methods to keep temporal diversity of our system:

    1. First, we’ll recommend recently-added shows to users. Many new shows are added to Hulu every day, and we will suggest these shows to users who will like them. Thus, users will see fresh ideas for shows to watch when new ones are added.
    2. Second, we will randomize our recommendations. Randomization is the simplest way to keep recommendations fresh.
    3. Finally, we’ll decrease the rank of recommendations which users have seen many times. This is called implicit feedback, and data show that CTR increased by 10% after using this method.
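The second and third methods above (randomization and demoting recommendations that were shown many times) could look like the sketch below. The discount formula, the `penalty` and `jitter` parameters, and the function name `rerank` are assumptions for illustration; the article does not describe the actual formulas.

```python
import random


def rerank(scores, impressions, penalty=0.2, jitter=0.0, rng=None):
    """scores:      dict show -> recommendation score.
    impressions: dict show -> how many times the user already saw it
                 in the recommendations without clicking.

    Each impression shrinks the score (hypothetical 1 / (1 + penalty * n)
    discount); jitter adds a small random factor to keep lists fresh.
    """
    rng = rng or random.Random()
    adjusted = {}
    for show, score in scores.items():
        seen = impressions.get(show, 0)
        discount = 1.0 / (1.0 + penalty * seen)
        noise = 1.0 + rng.uniform(-jitter, jitter)
        adjusted[show] = score * discount * noise
    return sorted(adjusted, key=adjusted.get, reverse=True)


# Toy numbers, invented for illustration only.
scores = {"American Dad": 0.9, "House": 0.8, "Bones": 0.5}
impressions = {"American Dad": 5}
order = rerank(scores, impressions)  # jitter=0.0 keeps this deterministic
```

“American Dad” has the best raw score, but after five unclicked impressions its adjusted score (0.45) falls below the other two shows.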

    Performance of Hulu’s Recommendation Hub

    The recommendation hub is a personal recommendation page for every user. On this page users will see 6 carousels. The top carousel is “top recommendations”, which includes shows that we think users will prefer very much. After top recommendations, there are three carousels for three genres. These three genres are selected by analyzing users’ historical preferences. The next carousel is bookmarks, which include shows that users have indicated they’d like to watch later. The last carousel is filled with shows that the user has already rated. This carousel is designed to collect more explicit feedback from users.

    We have performed an A/B test to compare our recommendation algorithms with two simple recommendation algorithms: Most Popular (which recommends the most popular shows to every user) and Highest Rated (which recommends highly-rated shows to every user). As shown in the above figure, experiment results show that the CTR of our algorithm is much higher than both simple methods.

    Lessons

    Every user behavior can reflect user preferences.

    In our system, we use a slew of user behaviors to come up with our recommendations. We’ve calculated the CTR of recommendations originating from different types of behaviors. As shown in Figure 3, every type of behavior can generate recommendations that will be clicked by users.

    Figure 3 : CTR of recommendations that come from different types of behaviors

    Explicit Feedback data is more important than implicit feedback data

    As shown in Figure 3, CTR of recommendations that originate from users’ historically loved (vote 5 stars on shows) and liked (vote 4 stars on shows) behaviors is higher than CTR of recommendations that come from users’ historical subscribe/watch/search behavior. So although the size of our explicit feedback data is much smaller than that of our implicit feedback data, it is much more important.

    Recent behaviors are much more important than old behaviors

    Novelty, Diversity, and offline Accuracy are all important factors

    Most researchers focus on improving offline accuracy, such as RMSE and precision/recall. However, a recommendation system that can only predict user behavior accurately may not be good enough for practical use. A good recommendation system should consider multiple factors together. In our system, after considering novelty and diversity, the CTR has improved by more than 10%.

    Based on the paper “Recommendation System at Hulu” by Liang Xiang, Hua Zheng and Hang Li.
    Hua Zheng is the senior lead developer in charge of the Hulu content recommendation and behavior targeting systems.
    Dr. Xiang and Dr. Li, associate researchers, are working together on the recommendation system, helping users discover and enjoy relevant premium videos.

 

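The Effectiveness of Recommender Systems: What Percentage Is It Really at Amazon?

Amazon is the king of recommender systems, and the question of how much recommendations really contribute to Amazon has long drawn attention from both academia and industry, with many numbers floating around. In the spirit of textual research, I tracked these numbers down, though of course I do not know which of them are true.

Anderson, the author of The Long Tail, did not estimate the effect of recommender systems, but he did estimate the share of Amazon's sales contributed by long-tail content. He made this estimate twice. The first time, working with a research team from MIT, he estimated that 57% of sales came from the long tail. This number was based on Amazon's earlier disclosure that, from 2001 to 2003, the top 100,000 products accounted for 39.2% of sales. In a later, more accurate estimate, he proposed that the share should be between 25% and 36%.

Greg Linden, a former Amazon scientist, also discussed the role of recommender systems at Amazon on his blog. He mentioned that when he left, recommendations contributed more than 20% of Amazon's sales: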

Personalization was responsible for well more than 20% of sales when I left Amazon in 2002.

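In addition, an Amazon scientist once taught a class on recommender systems at Stanford, and a student who attended wrote on his own blog that 20% to 30% of sales come from the recommender system.

From the evidence above, we can roughly conclude that the recommender system contributes between 20% and 30% of Amazon's sales.

Update:

On September 21, Greg Linden published another blog post that touched on this question, and the number given there is 35%. He cites an article that contains the following passage: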

Amazon says 35 percent of product sales result from recommendations.

However, the article does not give a source for this statement, and a commenter raised the same question:

Nice write-up, Matt. From where did you get “Amazon says 35 percent of product sales result from recommendations”?

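So 20% to 30% is probably still a fairly reliable figure. Even so, that is a very high number.

The Effectiveness of Recommender Systems: Digg's 40% Improvement

I am planning to write a series of reports on whether recommender systems actually brought benefits to the companies that adopted them. I am calling it the recommender system effectiveness series.

Today's example is Digg. Digg published an official blog post in 2008, available at http://about.digg.com/blog/digg-recommendation-engine-updates

The gist of the post is that Digg had released its new recommendation algorithm, tested it for one month, and collected a set of numbers. They felt the algorithm was awesome, so they posted the numbers to show off. Here are some of the metrics that improved:

1. Digging activity increased markedly. After the new algorithm was released, the total number of diggs per day rose by 40%.

2. The recommender system's influence kept growing. On average, each user with digg activity received 200 recommendations per day, coming from other diggers with similar interests. This suggests that Digg's recommendation algorithm resembles user-based collaborative filtering. The statistics show that each active user has, on average, 34 diggers with similar interests.

3. The number of friends users have increased by 24%.

4. The number of user comments increased by 11%.

After seeing these numbers you are surely excited, and surely wondering what algorithm Digg used to achieve this. Not to worry: there is another 2008 article about Digg's recommender system, http://www.technologyreview.com/Infotech/21045/page1/ , which discusses the algorithm behind it in detail. Those who read English can easily follow that article, but I still want to summarize it here.

1. Digg's algorithm differs from Amazon's. Instead of recommending items related to a user's historical behavior, it relies more on collective intelligence and recommends articles liked by users with similar interests. In academic terms, Digg uses UserCF rather than ItemCF.

2. Digg originally let users submit links to articles they liked. Other users could digg an article up if they liked it or bury it if they did not, and Digg's original front page showed the popular articles with the most diggs. Popularity is therefore a very important attribute of an article in Digg's system. UserCF can provide personalization while preserving popularity, which many item-based algorithms cannot guarantee. This is one reason Digg chose UserCF.

3. While using UserCF, the system also takes topics into account. Digg believes that two people digging many of the same sports articles does not mean their political views agree. So it computes user interest similarity separately within each topic. In other words, similarity in sports does not carry over to politics. Of course, Digg's topics are all very broad, so this does not hurt the diversity of the recommendations much.

4. As we know, when UserCF computes interest similarity, two users count as similar because they have both read the same article. But then a user who reads one popular article becomes similar to a great many users. Digg noticed this problem and removed this effect as far as possible.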
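The summary above does not say how Digg removed the popular-article effect. A standard textbook remedy, assumed here only for illustration, is to down-weight each shared article by the logarithm of its popularity when computing user similarity. The function name `user_similarity` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def user_similarity(dugg):
    """dugg: dict user -> set of articles the user dugg.

    Each shared article contributes 1 / log(1 + number of users who
    dugg it), so a hot article shared by everyone counts for less than
    a niche article shared by only a few users. The sum is normalized
    by sqrt(|N(u)| * |N(v)|).
    """
    fans = defaultdict(set)
    for user, articles in dugg.items():
        for article in articles:
            fans[article].add(user)

    sim = defaultdict(dict)
    for article, users in fans.items():
        weight = 1.0 / math.log(1.0 + len(users))
        for u in users:
            for v in users:
                if u != v:
                    sim[u][v] = sim[u].get(v, 0.0) + weight

    for u in sim:
        for v in sim[u]:
            sim[u][v] /= math.sqrt(len(dugg[u]) * len(dugg[v]))
    return sim


# Toy data, invented for illustration only.
dugg = {
    "ann": {"hot", "niche"},
    "ben": {"hot", "niche"},
    "cat": {"hot", "other"},
    "dan": {"hot", "misc"},
}
sim = user_similarity(dugg)
```

Here ann and cat share only the hot article, so their similarity stays small, while ann and ben also share a niche article and come out much closer.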

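A Few Changes to reculike

Thanks to everyone's support, reculike.com has had some traffic since launch. However, the current traffic is not yet enough to compute good recommendations, so I hope you will send plenty of feedback.

I recently made some changes to reculike, summarized below.

1. There are two main kinds of explicit user feedback. Under each paper, a user can bookmark it, meaning the user is interested in the paper and wants to save it to study carefully later. In addition, on a paper's page, a user can recommend the paper, meaning the user knows the paper well, thinks it is good, and wants to recommend it to others. For now, a user who recommends a paper must write a recommendation note.

These two behaviors represent an interaction between experts and ordinary users, and more features are planned in this area. For example, an ordinary user may bookmark a paper to show interest in it. If an expert then recommends that paper, the system will tell the user on the home page that an expert has recommended it, and the user can ask that expert about any questions on the paper. In this way, papers connect users and enable interaction among them.

2. The home page shows the papers the user has bookmarked, the papers the user has recommended, and the papers the system recommends to the user. The recommended papers are shown by default, but the user can click the links above to switch between the lists.

The system is still rough, and everyone is welcome to try it. If you have any problems, reach me on Sina Weibo: @xlvector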

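RecULike Paper Recommender System: Initial Launch

Our paper recommender system, RecULike (http://www.reculike.com), has had its initial launch. It still has many bugs, but it is basically usable and is being improved continuously.

The system is an open-source project, and its source code is available at:

http://code.google.com/p/paperlens/

The main contributors to the project are

WangXing and GuoJing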

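Machine Recommender Systems vs. Human Recommender Systems

Recommender systems do not exist only on computers. The real world is full of them too: salespeople and insurance agents, for example, are human recommender systems. Today I came across a remarkable paper from GroupLens, "Who predicts Better? – Results from an Online study comparing humans and online recommender system".

In short, the GroupLens researchers ran an experiment. They took a few dozen users and displayed their movie rating histories, with a few ratings hidden for each user. They then used the MovieLens recommendation algorithm to predict what ratings these users would give the hidden movies, and also asked some people to estimate those ratings by looking at the users' rating histories.

They then computed the MAE of the machine predictor's predictions and of the human predictors' predictions against the users' actual ratings. The paper has two conclusions:
1. The machine predictor is awesome.
2. The longer a person studies a user's rating history, the more accurate the prediction.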
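For readers unfamiliar with MAE (mean absolute error), here is a minimal sketch of the comparison described above. The ratings below are invented purely to show the arithmetic; they are not data or results from the paper, and the function name `mae` is just a label for the example.

```python
def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented ratings for illustration only (hidden ratings of one user).
actual = [4, 3, 5, 2]
machine = [3.5, 3.0, 4.5, 2.5]  # hypothetical algorithm predictions
human = [5, 3, 3, 2]            # hypothetical human guesses

machine_mae = mae(actual, machine)
human_mae = mae(actual, human)
```

A lower MAE means the predicted ratings are closer to the actual ones on average; it says nothing about whether the user would welcome the recommendation.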

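Personally, I find this experiment interesting but not fair. Recommendations made by people may be inaccurate in terms of MAE, but they are not necessarily worse in terms of whether users welcome them. People weigh many factors together when recommending, whereas a machine's strength is really in optimizing a single objective. If the machine had to take everything into account, it probably could not beat people.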