
Social media collections: Facebook


Facebook is a social media site focused on sharing content among groups of friends, though it has become widely used by organisations seeking to connect with their customers.

Funnelback supports crawling Facebook pages and gathering data such as posts, events and page information. The gatherer is closely tied to Facebook's Graph API; if you are not familiar with it, read Facebook's getting started guide for the Graph API first. You should be comfortable finding page IDs. You should also apply to Facebook for written approval of automated data collection.

A Facebook gathering template is included as part of Funnelback's social media collections support to allow content from Facebook to be gathered and then presented within Funnelback search results.

Please note that your usage of Funnelback to gather content from Facebook must comply with Facebook's terms of service.

Setting up collection

To create a Facebook collection, create a social media collection and select the Facebook template.

Once you have created the collection you will need to fill out the collection.cfg file. To do this go to the Administer tab of the admin home page and then click on the Browse Collection Configuration Files link.

Getting your API key and secret

Before you can crawl Facebook you need an app ID and secret. First register for a Facebook developer account, then go to the developer app page and create a new app. Once the app has been created, Facebook provides its app ID and secret.

Configuration options

The Facebook gathering template reads its configuration from collection.cfg. The following settings are supported:

  • Application ID
  • Application secret
  • Comma delimited list of IDs of the Facebook pages/accounts to crawl
  • facebook.debug: Boolean flag to enable debug mode. When debug mode is enabled the gathering script will print out the crawled records in XML form.
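
As a rough illustration of how a comma-delimited list of page IDs and a boolean debug flag might be interpreted, here is a minimal Groovy sketch. The variable names and sample values are hypothetical, and the values are hard-coded rather than read from collection.cfg because the setting names for the first three options are not shown on this page.

```groovy
// Hypothetical raw values, as they might be written in collection.cfg.
// The real setting names are not given here, so nothing is read from the file.
String rawPageIds = "1234567890, 9876543210 ,,1234567890"
String rawDebug = "true"

// Split on commas, trim whitespace, then drop empty entries and duplicates.
List<String> pageIds = rawPageIds.split(",")
        .collect { it.trim() }
        .findAll { it }
        .unique()

// Anything other than "true" (case-insensitive) is treated as false.
boolean debug = Boolean.parseBoolean(rawDebug)

println "pageIds=${pageIds} debug=${debug}"
```

Trimming and de-duplicating up front means a stray space or a repeated ID in the list does not turn into a failed or repeated query later on.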

# This is the ID of Funnelback's Facebook page

The gathering template can be further customised to crawl only specific entity types (e.g. Events, Posts) and to configure which fields should be returned for each entity.

Metadata mappings

The Facebook gathering template includes predefined Facebook specific metadata mappings:

Class ID | Type | Behaviour | Explanation | Metadata fields included
author | text | content | | /FacebookXmlRecord/eventOwner/name, /FacebookXmlRecord/page/nameWithLocationDescriptor, /FacebookXmlRecord/postFrom/name
authorId | text | display | | /FacebookXmlRecord/eventOwner/id, /FacebookXmlRecord/postFrom/id
c | text | content | Description | /FacebookXmlRecord/eventDescription, /FacebookXmlRecord/page/about, /FacebookXmlRecord/page/description, /FacebookXmlRecord/postMessage
city | text | content | | /FacebookXmlRecord/eventVenue/city, /FacebookXmlRecord/page/location/city, /FacebookXmlRecord/postLocation/city
country | text | content | | /FacebookXmlRecord/eventVenue/country, /FacebookXmlRecord/page/location/country, /FacebookXmlRecord/postLocation/country
d | date | date | Date | /FacebookXmlRecord/eventStartTime, /FacebookXmlRecord/postCreatedTime
identifier | text | display | | /FacebookXmlRecord/eventId, /FacebookXmlRecord/page/id, /FacebookXmlRecord/postId
image | text | display | | /FacebookXmlRecord/page/cover/source, /FacebookXmlRecord/postPictureURL
latLong | geospatial x/y co-ordinate | N/A | | /FacebookXmlRecord/eventVenue/latitudeLong, /FacebookXmlRecord/postLocation/latLong
location | text | display | Event location | /FacebookXmlRecord/eventLocation
postcode | text | display | Zip/post code | /FacebookXmlRecord/eventVenue/zip, /FacebookXmlRecord/page/location/zip, /FacebookXmlRecord/postLocation/zip
postLink | text | content | Post link | /FacebookXmlRecord/postLink
state | text | display | | /FacebookXmlRecord/eventVenue/State, /FacebookXmlRecord/page/location/state, /FacebookXmlRecord/postLocation/state
street | text | display | | /FacebookXmlRecord/eventVenue/street, /FacebookXmlRecord/page/location/street, /FacebookXmlRecord/postLocation/street
t | text | content | Event/page title | /FacebookXmlRecord/eventName, /FacebookXmlRecord/page/name


Please note that Facebook applies limits to the volume of content which can be retrieved from their APIs, and so in the case of large pages Funnelback may be unable to gather all historical content.

Customising gathering template

To apply further customisation please edit custom_gather.groovy.

Define queries

To crawl Facebook you need to tell Funnelback what to crawl. This is done by specifying queries to Facebook. The general format of a query is:

 new FacebookQuery***(facebookConnector, id + "URL", parameters)


  • FacebookQuery***: The type of query Funnelback should expect. Care should be taken to ensure the URL returns content that matches the FacebookQuery type used.
  • facebookConnector: This is the connector to Facebook which can use the app id and secret.
  • id: The id of the user or page to crawl.
  • URL: The URL suffix to crawl, for example "/feed".
  • parameters: A comma separated list of parameters.

Types of queries


FacebookQueryEvent

Gathers all events created by the specified page.

new FacebookQueryEvent(facebookClientApp, id + "/events")


FacebookQueryPage

Gathers page related information for the specified page.

new FacebookQueryPage(facebookClientDefault, id)


FacebookQueryPost

Gathers all posts from the specified page.

new FacebookQueryPost(facebookClientApp, id + "/feed")

You should create a list of all the queries to specify what you want to crawl.


List<FacebookQuery> queries = []
pageIds.each{ pageId ->
    queries.add(new FacebookQueryPost(facebookClientApp, pageId + "/feed"))
    queries.add(new FacebookQueryEvent(facebookClientApp, pageId + "/events"))
    queries.add(new FacebookQueryPage(facebookClientDefault, pageId))
}

In this example, the first query gets the posts from the page's feed, the second query gets the events and the third gets details about the page.
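
To crawl only some entity types, one option is to decide per type whether a query is added. The sketch below illustrates that selection logic on its own. QuerySpec, enabledTypes and the page IDs are hypothetical stand-ins so that the snippet runs outside Funnelback; in custom_gather.groovy you would construct the real FacebookQueryPost, FacebookQueryEvent and FacebookQueryPage objects at the marked line instead.

```groovy
// Stand-in for illustration only: records a query type and the path it would request.
class QuerySpec {
    String type
    String path
    String toString() { "${type}(${path})" }
}

List<String> pageIds = ["111", "222"]            // hypothetical page IDs
Set<String> enabledTypes = ["Post", "Event"] as Set   // hypothetical: page details are skipped

// URL suffix per entity type, taken from the query examples above.
Map<String, String> suffixes = [Post: "/feed", Event: "/events", Page: ""]

List<QuerySpec> specs = []
pageIds.each { pageId ->
    suffixes.each { type, suffix ->
        if (type in enabledTypes) {
            // In the real script, add the matching FacebookQuery* object here.
            specs << new QuerySpec(type: type, path: pageId + suffix)
        }
    }
}

specs.each { println it }
```

Keeping the type-to-suffix pairs in one map makes it harder to pair a query type with a URL that returns a different kind of entity.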

Working with the crawled data

Funnelback will crawl Facebook and convert responses into XML. You can use the metadata customisation tool to map elements to a metadata class.

FacebookQueryPost XML Example

  <postMessage>this is the post message</postMessage>
  <postCreatedTime>Tue Aug 27 13:42:17 EST 2013</postCreatedTime>
  <postFrom class="com.restfb.types.CategorizedFacebookType">
    <id>id of poster</id>
    <name>Name of poster</name>
  </postFrom>
      <commentId>comment id</commentId>
      <url> id</url>
      <commentMessage>comment one of three</commentMessage>
      <commentFrom class="com.restfb.types.CategorizedFacebookType">
        <id>Comment poster ID</id>
        <name>Name of poster</name>
      </commentFrom>
      <commentCreatedTime>2013-08-27 03:42:37.0 UTC</commentCreatedTime>

FacebookQueryEvent XML Example

  <eventStartTime>2024-02-13 20:00:00.0 UTC</eventStartTime>
  <eventEndTime>2024-02-13 23:00:00.0 UTC</eventEndTime>
  <eventLocation>Lima, Peru</eventLocation>
    <id>owner id</id>
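
If you post-process event records yourself, the start and end times shown above can be parsed with java.time. This is a minimal sketch that assumes every value follows the "2024-02-13 20:00:00.0 UTC" form; the post example shows a different date format, so treat the pattern as an assumption to check against your own crawled records.

```groovy
import java.time.Duration
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Pattern for values such as "2024-02-13 20:00:00.0 UTC".
// The trailing zone is matched as the literal text UTC (an assumption).
def fmt = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.S 'UTC'")

def start = LocalDateTime.parse("2024-02-13 20:00:00.0 UTC", fmt)
def end = LocalDateTime.parse("2024-02-13 23:00:00.0 UTC", fmt)

// Length of the event in whole hours.
long hours = Duration.between(start, end).toHours()
println "Event runs for ${hours} hours"
```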

FacebookQueryPage XML Example

    <name>page name</name>
    <description>the long description</description>
    <about>Another description</about>
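
The metadata mappings listed earlier are paths into records like these, rooted at FacebookXmlRecord. To show how such a path resolves, here is a minimal sketch that uses the standard Java XPath API from Groovy. The sample record is hand-written for illustration rather than real crawler output, and the mappings map holds only a subset of the documented paths. Funnelback applies its own mappings at index time; this only mimics the lookup.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathFactory

// Hand-written sample record modelled on the post example; not real crawler output.
String xml = '''<FacebookXmlRecord>
  <postMessage>this is the post message</postMessage>
  <postFrom>
    <id>id of poster</id>
    <name>Name of poster</name>
  </postFrom>
</FacebookXmlRecord>'''

def doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
def xpath = XPathFactory.newInstance().newXPath()

// Metadata class -> source paths (a subset of the mapping table).
Map<String, List<String>> mappings = [
    author: ["/FacebookXmlRecord/eventOwner/name", "/FacebookXmlRecord/postFrom/name"],
    c     : ["/FacebookXmlRecord/eventDescription", "/FacebookXmlRecord/postMessage"],
]

Map<String, String> metadata = [:]
mappings.each { cls, paths ->
    // Use the first path that yields a non-empty value for this record.
    String value = paths.collect { xpath.evaluate(it, doc) }.find { it }
    if (value) {
        metadata[cls] = value
    }
}

println metadata
```

Because each metadata class lists paths for events, pages and posts, a single record normally matches only one of them, which is why the sketch takes the first non-empty result.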
