Apache mahout 源码阅读笔记-DataModel之UserBaseRecommender

先来看一下使用流程:
1)拿到DataModel
2)定义相似度计算模型 PearsonCorrelationSimilarity
3)定义用户邻域计算模型 NearestNUserNeighborhood
4)定义推荐模型 GenericUserBasedRecommender
5)进行推荐
  @Test
  public void testHowMany() throws Exception {
    DataModel dataModel = getDataModel(
            new long[] {1, 2, 3, 4, 5},
            new Double[][] {
                    {0.1, 0.2},
                    {0.2, 0.3, 0.3, 0.6},
                    {0.4, 0.4, 0.5, 0.9},
                    {0.1, 0.4, 0.5, 0.8, 0.9, 1.0},
                    {0.2, 0.3, 0.6, 0.7, 0.1, 0.2},
            });
    //用于计算最相似的用户,领域用户
    UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, dataModel);
    
    Recommender recommender = new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
    List<RecommendedItem> fewRecommended = recommender.recommend(1, 2);
    List<RecommendedItem> moreRecommended = recommender.recommend(1, 4);
    for (int i = 0; i < fewRecommended.size(); i++) {
      assertEquals(fewRecommended.get(i).getItemID(), moreRecommended.get(i).getItemID());
    }
    recommender.refresh(null);
    for (int i = 0; i < fewRecommended.size(); i++) {
      assertEquals(fewRecommended.get(i).getItemID(), moreRecommended.get(i).getItemID());
    }
  }

 

相似度计算,参考上篇的PearsonCorrelationSimilarity。

NearestNUserNeighborhood ,获取最近的N个用户,怎么实现的呢?
~/mahout-core/src/main/java/org/apache/mahout/cf/taste/impl/recommender/GenericUserBasedRecommender.java

  @Override
  public List<RecommendedItem> recommend(long userID, int howMany, IDRescorer rescorer) throws TasteException {
    Preconditions.checkArgument(howMany >= 1, "howMany must be at least 1");

    log.debug("Recommending items for user ID ‘{}‘", userID);
    
    //根据similarity模型进行计算,计算最相似的N个用户
    long[] theNeighborhood = neighborhood.getUserNeighborhood(userID);

    if (theNeighborhood.length == 0) {
      return Collections.emptyList();
    }
    //获取其他领域用户进行评分而且当前用户所没有进行评分的Item列表,作为推荐的基本池子 
    FastIDSet allItemIDs = getAllOtherItems(theNeighborhood, userID);

    //获取池子里面,当前用户偏好最高的TopN进行推荐
    TopItems.Estimator<Long> estimator = new Estimator(userID, theNeighborhood);

    List<RecommendedItem> topItems = TopItems
        .getTopItems(howMany, allItemIDs.iterator(), rescorer, estimator);

    log.debug("Recommendations are: {}", topItems);
    return topItems;
  }
Estimator的实现,是这样的:

  private final class Estimator implements TopItems.Estimator<Long> {
    
    private final long theUserID;
    private final long[] theNeighborhood;
    
    Estimator(long theUserID, long[] theNeighborhood) {
      this.theUserID = theUserID;
      this.theNeighborhood = theNeighborhood;
    }
    
    @Override
    public double estimate(Long itemID) throws TasteException {
      return doEstimatePreference(theUserID, theNeighborhood, itemID);
    }
  }
}
 
  protected float doEstimatePreference(long theUserID, long[] theNeighborhood, long itemID) throws TasteException {
    //把相似用户对该Item的偏好累加起来,再做平均值,当做当前用户对改Item的偏好
    if (theNeighborhood.length == 0) {
      return Float.NaN;
    }
    DataModel dataModel = getDataModel();
    double preference = 0.0;
    double totalSimilarity = 0.0;
    int count = 0;
    for (long userID : theNeighborhood) {
      if (userID != theUserID) {
        // See GenericItemBasedRecommender.doEstimatePreference() too
        Float pref = dataModel.getPreferenceValue(userID, itemID);
        if (pref != null) {
          double theSimilarity = similarity.userSimilarity(theUserID, userID);
          if (!Double.isNaN(theSimilarity)) {
            preference += theSimilarity * pref;
            totalSimilarity += theSimilarity;
            count++;
          }
        }
      }
    }
    // Throw out the estimate if it was based on no data points, of course, but also if based on
    // just one. This is a bit of a band-aid on the ‘stock‘ item-based algorithm for the moment.
    // The reason is that in this case the estimate is, simply, the user‘s rating for one item
    // that happened to have a defined similarity. The similarity score doesn‘t matter, and that
    // seems like a bad situation.
    if (count <= 1) {
      return Float.NaN;
    }
    float estimate = (float) (preference / totalSimilarity);
    if (capper != null) {
      estimate = capper.capEstimate(estimate);
    }
    return estimate;
  }

 

总结:
1)计算最相似的N个用户
2)从最相似的N个用户中,获取自己没有评分过的Item
3)预计自己对每个Item的偏好
4)取偏好最高的N个Item进行推荐



 

郑重声明:本站内容如果来自互联网及其他传播媒体,其版权均属原媒体及文章作者所有。转载目的在于传递更多信息及用于网络分享,并不代表本站赞同其观点和对其真实性负责,也不构成任何其他建议。