Recommendations

Introduction

Funnelback has the ability to recommend a list of items that are believed to be related to a given seed item. For example, it might recommend a list of URLs that were clicked on in the same search sessions as the seed URL, or recommend a list of products that were frequently purchased in the same session as a given product.

Recommendations are designed to be accessed via an API and used within individual content pages to provide recommendations such as "people who were interested in this item were also interested in...".

Recommendations are not enabled by default. To enable them, please see the recommender collection.cfg setting.

Data sources

The main source of information for the recommender system is Funnelback's query and click logs, which record which queries users submitted and which results they clicked on. These logs are processed automatically as part of the collection update process, and no special configuration is needed to set the recommender system up. By default, only query and click records from the last 60 days are taken into account when processing the logs.

The results of this processing are a number of data sources which the system takes into account, in decreasing order of preference:

  1. CO_CLICKS: items that co-occur as clicks in the same (time-limited) search session as the original seed URL.
  2. RELATED_CLICKS: items that were clicked on for the top N queries associated with the seed URL.
  3. RELATED_RESULTS: the most frequently occurring search results that are returned for the top N queries associated with the seed URL.
  4. EXPLORE_RESULTS: results from running an explore query. These suggestions will be based on how similar their textual content is to the seed URL.

There is another data source that does not have data of its own, and this is the default used if no explicit source setting is given in a request to the system:

  • DEFAULT: Query all the sources listed above and combine the results.

If recommendations have come from more than one data source we combine the lists, using the following comparison chain:

    compare(x.getSource(), y.getSource())
    .compare(y.getFrequency(), x.getFrequency())
    .compare(x.getRank(), y.getRank())

  • The data sources above are listed in order of preference, so items from the CO_CLICKS source will always win out over other sources and be listed first.
  • If two items have the same source then their frequency of occurrence (across all sources) will be compared next. This means that the more often an item appears the higher it will rise in the list.
  • Finally, if two items have the same source and frequency, we will compare their rank in the source they came from.
  • There should be no items with duplicate IDs (e.g. the same URL) in the final sorted list.
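
The ordering rules above can be sketched in Python as follows. This is only an illustration of the comparison chain, not the recommender's actual implementation. The `Recommendation` class and `combine` function are hypothetical names, and the sketch assumes that an item's frequency is the number of source lists it appears in and that a duplicate item keeps the entry from its most preferred source.

```python
from dataclasses import dataclass

# Order of preference, taken from the data source list above.
SOURCE_PREFERENCE = ["CO_CLICKS", "RELATED_CLICKS", "RELATED_RESULTS", "EXPLORE_RESULTS"]


@dataclass
class Recommendation:  # hypothetical record type for this sketch
    item_id: str
    source: str
    rank: int       # position within its own source list (1 = best)
    frequency: int  # assumption: number of source lists containing the item


def combine(per_source):
    """Merge per-source lists of item IDs into one sorted, de-duplicated list."""
    # Count how many source lists each item appears in.
    frequency = {}
    for items in per_source.values():
        for item_id in set(items):
            frequency[item_id] = frequency.get(item_id, 0) + 1

    # Walk the sources from most to least preferred, keeping only the
    # first occurrence of each item ID so the result has no duplicates.
    merged = {}
    for source in SOURCE_PREFERENCE:
        for rank, item_id in enumerate(per_source.get(source, []), start=1):
            if item_id not in merged:
                merged[item_id] = Recommendation(item_id, source, rank, frequency[item_id])

    # Same order as the comparison chain: source preference first,
    # then frequency (descending), then rank within the source (ascending).
    return sorted(
        merged.values(),
        key=lambda r: (SOURCE_PREFERENCE.index(r.source), -r.frequency, r.rank),
    )
```

Sorting on a tuple key gives the same effect as the chained comparisons: a later element is only consulted when all earlier elements are equal.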

Other sources of data outside those processed as standard might include:

  1. Social media "likes" (e.g. from the Facebook API, Twitter mentions etc.)
  2. Purchase data (e.g. from client's e-commerce database system)
  3. Data from web analytics software (e.g. Google Analytics)
  4. Web server access logs

To support these other sources their data would need to be exported and converted into a 'pseudo' click log format, for processing by Funnelback. For example, the fact that someone purchased a product would be recorded as a click on the product URL in the generated click log.

Architecture

The diagram below shows the architecture of the Recommender System:

Recommender-architecture.png

On the bottom left-hand side we see the standard Funnelback click logs being processed by the recommender, which then produces a suggestions database. The section above shows implementation-specific purchase data and social media data being exported and converted into pseudo click logs for use by the recommender system. As noted in the diagram, this requires integration work, i.e. some kind of conversion script or program will have to be written.

Once the suggestions database is available for a particular collection the recommender end-point can respond to RESTful HTTP requests from callers, returning a JSON response.

RESTful API

The following example URL is one that a caller might request to get recommendations for the given seed item and collection:

http://example.com/s/recommender/similarItems.json?seedItem=http://example.com/jobs/graduate/&collection=sample&maxRecommendations=10&scope=example.com/jobs/&source=default

The parameters for the request are:

Parameter Description
seedItem item (URL) to get suggestions for
collection Funnelback collection that the URL is expected to be in
maxRecommendations maximum number of recommendations to return (optional; fewer may be returned if fewer are available). If not specified, the system will attempt to return as many recommendations as it can.
scope comma separated list of scopes to match (optional)
source source of recommendations, one of default, co_clicks, result_clicks, related_results or explore_results (optional). Here default specifies that all sources will be queried and the results blended.

Note that if you are trying to support cross-domain requests from within browsers then you will need to make use of JSONP. For example, if you are using jQuery, setting dataType to jsonp causes an extra callback parameter to be added to the request URL, naming the callback function that will receive the response.
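
As an illustration of how the request parameters fit together, the following Python sketch builds a request URL from them. The function name `build_recommender_url` is hypothetical. The sketch percent-encodes the parameter values, which is the standard way to embed a URL inside a query string; it assumes the end-point accepts encoded values in the usual manner.

```python
from urllib.parse import urlencode


def build_recommender_url(base_url, seed_item, collection,
                          max_recommendations=None, scopes=None, source=None):
    """Build a similarItems.json request URL (hypothetical helper)."""
    # seedItem and collection are always sent; the rest are optional.
    params = {"seedItem": seed_item, "collection": collection}
    if max_recommendations is not None:
        params["maxRecommendations"] = max_recommendations
    if scopes:
        # scope is a comma-separated list of scopes.
        params["scope"] = ",".join(scopes)
    if source:
        params["source"] = source
    # urlencode percent-encodes each value, including the seed URL.
    return f"{base_url}/s/recommender/similarItems.json?{urlencode(params)}"


url = build_recommender_url(
    "http://example.com",
    "http://example.com/jobs/graduate/",
    "sample",
    max_recommendations=10,
    scopes=["example.com/jobs/"],
    source="default",
)
```

Omitting the optional arguments leaves the corresponding parameters out of the URL, so the documented defaults apply.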

JSON Response

A sample JSON response is shown below:

{
    "RecommendationResponse": {
        "status": "OK",
        "seedItem": "http://example.com/jobs/graduate/",
        "collection": "sample",
        "scope": "example.com/jobs/",
        "maxRecommendations": 10,
        "sourceCollection": "sample",
        "source": "DEFAULT",
        "timeTaken": 37,
        "recommendations": [
            {
                "itemID": "http://example.com/jobs/graduate/how-to-apply/",
                "source": "CO_CLICKS",
                "title": "Graduate Jobs Application Process",
                "date": 1379944800000,
                "qieScore": 4.679,
                "metaData": {
                    "f": "text/html",
                    "d": "2013-04-10",
                    "t": "Graduate Jobs Application Process",
                    "b": "http://example.com/legal/",
                    "s": "Careers, Jobs, Graduates",
                    "c": "This document provides information on how to apply for graduate-level jobs at example.com",
                    "a": "Information Management Services (Phone: 9265 2876)",
                    "l": "en"
                },
                "description": "This document provides information on how to apply for graduate-level jobs at example.com",
                "format": "text/html",
                "frequency": 1
            }
        ]
    }
}

The input parameters are reflected back at the start of the JSON response. The meaning of the other fields is as follows:

Main response fields:

Field name Description
status Status of response. This will be one of: OK, SEED_NOT_FOUND (seed item is not known about in this collection) or NO_SUGGESTIONS_FOUND.
numRecommendations The actual number of recommendations returned in this response.
sourceCollection The source collection that the recommendations come from. This will usually be the same as the requested collection, but may be different if the requested collection was a meta collection and the data source (click logs) was present in a component collection.
source The requested data source of the recommendations (same as the 'source' parameter in the request).
timeTaken The amount of time taken to generate this response, in milliseconds.

Details on the fields in the individual recommendations are as follows:

Field name Description
itemID The ID of this recommended item (usually a URL, but could be a unique product ID etc.).
source Source of this individual item - see section on data sources above.
title Title of the item (e.g. HTML title of a web page).
date The last modified date of the item, expressed as a Unix timestamp (milliseconds since the epoch).
qieScore The QIE score for this item.
metaData Values for individual metadata classes.
description Description as extracted from document metadata.
format The format (MIME type) of this item.
frequency The frequency of occurrence of this item across all queried data sources.

The example above shows details of only one recommendation - in practice there will usually be more than one item in the list.
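
A caller might process the response along the following lines. This is a minimal Python sketch based on the field descriptions above; the function name `summarise_response` is hypothetical, and it assumes the response body is valid JSON with every recommendation carrying the itemID, title, source and date fields.

```python
import json
from datetime import datetime, timezone


def summarise_response(body):
    """Return (status, [(itemID, title, source, last-modified datetime), ...])."""
    response = json.loads(body)["RecommendationResponse"]
    status = response["status"]
    if status != "OK":
        # SEED_NOT_FOUND or NO_SUGGESTIONS_FOUND: nothing to display.
        return status, []

    items = []
    for rec in response.get("recommendations", []):
        # 'date' is in milliseconds since the epoch, so convert to seconds.
        modified = datetime.fromtimestamp(rec["date"] / 1000, tz=timezone.utc)
        items.append((rec["itemID"], rec["title"], rec["source"], modified))
    return status, items
```

Checking the status first means a page can simply hide its recommendations panel when the seed item is unknown or no suggestions were found.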

Item IDs

  • If the Item ID returned in the JSON response is a URL then it will be the URL as indexed by Funnelback.
  • This may be different to the display URL, which may be a transformed version of the indexed URL so that a user can load it correctly in their web browser.
  • For example, a database collection may have record URLs which need to be transformed by a Groovy filter during the filtering phase.
  • This means that the caller that processes the JSON response from the recommender system may need to apply a similar transformation so that it can display working URLs to end users.
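
As a rough illustration of such a caller-side transformation, the Python sketch below rewrites an indexed URL prefix into a display URL prefix. The `db://products/` prefix, the mapping and the function name are all hypothetical; the real rule depends entirely on how the collection's URLs were transformed during filtering.

```python
# Hypothetical mapping from indexed URL prefixes to display URL prefixes.
# The real rule must mirror whatever the collection's filter does.
DISPLAY_PREFIXES = {
    "db://products/": "http://example.com/products/",
}


def to_display_url(item_id):
    """Convert an indexed item ID into a URL a browser can load."""
    for indexed_prefix, display_prefix in DISPLAY_PREFIXES.items():
        if item_id.startswith(indexed_prefix):
            return display_prefix + item_id[len(indexed_prefix):]
    # Item IDs that match no known prefix are returned unchanged.
    return item_id
```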

Logging

Recommender update log messages will be written to:

$SEARCH_HOME/data/<collection>/log/update_recommender.log

Caveats

  • The recommendation system does not currently operate on push collections.

v15.24.0