Facebook collections
Introduction
Facebook is a social media site focused on sharing content among groups of friends, though it has become widely used by organisations seeking to connect with their customers.
Funnelback supports crawling Facebook pages and gathering data such as posts, events and page information. Funnelback's Facebook support is closely tied to Facebook's Graph API; if you are not familiar with it, you may want to read Facebook's getting started guide for the Graph API first. You should be comfortable with finding page IDs. You should also apply to Facebook for written approval of automated data collection.
Please note that your usage of Funnelback to gather content from Facebook must comply with Facebook's terms of service.
Getting your API key and secret
Before you can crawl Facebook you need an app ID and secret. To get them, create a Facebook developer account, then go to the developer app page and create a new app. Creating the app gives you its app ID and secret. Alternatively, you can provide an access token using the facebook.access-token collection configuration option.
Configuration options
Facebook collections support the following settings:
- facebook.app-id: Application ID
- facebook.app-secret: Application secret
- facebook.page-ids: Comma-delimited list of IDs of the Facebook pages/accounts to crawl
- crawler.request_delay: Number of milliseconds to wait between successive requests to the Facebook Graph API
- facebook.debug: Enable debug mode to preview fetched Facebook records
- facebook.access-token: Optional access token
- facebook.page-fields: Comma-delimited list of Facebook page fields
- facebook.post-fields: Comma-delimited list of Facebook post fields
- facebook.event-fields: Comma-delimited list of Facebook event fields
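For illustration, the settings above might appear in collection.cfg as simple key=value lines. The Python sketch below parses such a fragment and splits the comma-delimited page IDs into a list. The page IDs, request delay and post field list are placeholders, the helper names parse_cfg and split_list are hypothetical, and skipping lines that start with # is an assumption of this sketch rather than documented Funnelback behaviour.

```python
# Minimal sketch: read Facebook-related settings from collection.cfg-style text.
# All values are placeholders, not real credentials or page IDs.
SAMPLE_CFG = """\
facebook.app-id=YOUR_APP_ID
facebook.app-secret=YOUR_APP_SECRET
facebook.page-ids=1111111111,2222222222
crawler.request_delay=1000
facebook.debug=true
facebook.post-fields=id,message,created_time
"""


def parse_cfg(text: str) -> dict[str, str]:
    """Split each non-blank line on the first '='; '#' comments are assumed."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def split_list(value: str) -> list[str]:
    """Turn a comma-delimited setting into a list, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


cfg = parse_cfg(SAMPLE_CFG)
page_ids = split_list(cfg["facebook.page-ids"])
delay_ms = int(cfg.get("crawler.request_delay", "0"))  # milliseconds
debug = cfg.get("facebook.debug", "false").lower() == "true"
print(page_ids, delay_ms, debug)
```

Splitting on the first = only keeps any = characters inside a value (for example in a token) intact.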
Metadata mappings
The Facebook gathering template includes predefined Facebook specific metadata mappings:
Class ID | Type | Behaviour | Explanation | Metadata fields included |
---|---|---|---|---|
author | text | content | | /FacebookXmlRecord/eventOwner/name, /FacebookXmlRecord/page/nameWithLocationDescriptor, /FacebookXmlRecord/postFrom/name |
authorId | text | display | | /FacebookXmlRecord/eventOwner/id, /FacebookXmlRecord/postFrom/id |
c | text | content | Description | /FacebookXmlRecord/eventDescription, /FacebookXmlRecord/page/about, /FacebookXmlRecord/page/description, /FacebookXmlRecord/postMessage |
category | text | content | | /FacebookXmlRecord/page/category |
city | text | content | | /FacebookXmlRecord/eventVenue/city, /FacebookXmlRecord/page/location/city, /FacebookXmlRecord/postLocation/city |
country | text | content | | /FacebookXmlRecord/eventVenue/country, /FacebookXmlRecord/page/location/country, /FacebookXmlRecord/postLocation/country |
d | date | date | Date | /FacebookXmlRecord/eventStartTime, /FacebookXmlRecord/postCreatedTime |
eventEndTime | text | display | | /FacebookXmlRecord/eventEndTime |
eventPrivacy | text | display | | /FacebookXmlRecord/eventPrivacy |
identifier | text | display | | /FacebookXmlRecord/eventId, /FacebookXmlRecord/page/id, /FacebookXmlRecord/postId |
image | text | display | | /FacebookXmlRecord/page/cover/source, /FacebookXmlRecord/postPictureURL |
latLong | geospatial x/y co-ordinate | N/A | | /FacebookXmlRecord/eventVenue/latitudeLong, /FacebookXmlRecord/postLocation/latLong |
location | text | display | Event location | /FacebookXmlRecord/eventLocation |
pageFounded | text | display | | /FacebookXmlRecord/page/founded |
pageMission | text | display | | /FacebookXmlRecord/page/mission |
pageProduct | text | display | | /FacebookXmlRecord/page/products |
phone | text | display | | /FacebookXmlRecord/page/phone |
postcode | text | display | Zip/post code | /FacebookXmlRecord/eventVenue/zip, /FacebookXmlRecord/page/location/zip, /FacebookXmlRecord/postLocation/zip |
postIconUrl | text | display | | /FacebookXmlRecord/postIconURL |
postLink | text | content | Post link | /FacebookXmlRecord/postLink |
postLinkDescription | text | content | | /FacebookXmlRecord/postLinkDescription |
postLinkTitle | text | content | | /FacebookXmlRecord/postLinkCaption |
state | text | display | | /FacebookXmlRecord/eventVenue/State, /FacebookXmlRecord/page/location/state, /FacebookXmlRecord/postLocation/state |
street | text | display | | /FacebookXmlRecord/eventVenue/street, /FacebookXmlRecord/page/location/street, /FacebookXmlRecord/postLocation/street |
t | text | content | Event/page title | /FacebookXmlRecord/eventName, /FacebookXmlRecord/page/name |
type | text | display | | /FacebookXmlRecord/type |
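To see how these absolute paths pick values out of a single record, the Python sketch below applies a few rows of the mapping table to a trimmed post record. It approximates the mapping step with ElementTree's limited XPath support; it is not Funnelback's own implementation, and the helper name extract and the MAPPINGS subset are hypothetical.

```python
import xml.etree.ElementTree as ET

# Subset of the mapping table: metadata class -> absolute source paths.
MAPPINGS = {
    "author": ["/FacebookXmlRecord/eventOwner/name",
               "/FacebookXmlRecord/page/nameWithLocationDescriptor",
               "/FacebookXmlRecord/postFrom/name"],
    "c": ["/FacebookXmlRecord/eventDescription",
          "/FacebookXmlRecord/page/about",
          "/FacebookXmlRecord/page/description",
          "/FacebookXmlRecord/postMessage"],
    "d": ["/FacebookXmlRecord/eventStartTime",
          "/FacebookXmlRecord/postCreatedTime"],
    "type": ["/FacebookXmlRecord/type"],
}

# Trimmed post record with one nested comment.
SAMPLE_POST = """<FacebookXmlRecord>
  <postId>post_id</postId>
  <postMessage>this is the post message</postMessage>
  <postCreatedTime>Tue Aug 27 13:42:17 EST 2013</postCreatedTime>
  <type>POST</type>
  <postFrom><id>id of poster</id><name>Name of poster</name></postFrom>
  <postComments>
    <FacebookXmlRecord><type>COMMENT</type></FacebookXmlRecord>
  </postComments>
</FacebookXmlRecord>"""


def extract(record_xml: str) -> dict[str, list[str]]:
    """Collect the non-empty values each metadata class would receive."""
    root = ET.fromstring(record_xml)
    prefix = "/" + root.tag + "/"
    result: dict[str, list[str]] = {}
    for cls, paths in MAPPINGS.items():
        values = []
        for path in paths:
            # ElementTree paths are relative to the root element,
            # so drop the leading "/FacebookXmlRecord/" part.
            for elem in root.findall(path.removeprefix(prefix)):
                text = (elem.text or "").strip()
                if text:
                    values.append(text)
        if values:
            result[cls] = values
    return result


print(extract(SAMPLE_POST))
```

The nested comment's type element is not collected for the type class, because /FacebookXmlRecord/type only matches a direct child of the root record.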
Use the -SF query processor option to include these metadata fields in the search response and make them available to templates (e.g. -SF=[author,country]).
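The -SF value is a bracketed, comma-separated list of class IDs. The small Python sketch below builds that string and rejects names that do not appear in the mapping table above, which can catch typos before the option is added to the collection's query processor options. The helper name summary_fields_option is hypothetical.

```python
# Class IDs taken from the Facebook metadata mapping table.
KNOWN_CLASSES = {
    "author", "authorId", "c", "category", "city", "country", "d",
    "eventEndTime", "eventPrivacy", "identifier", "image", "latLong",
    "location", "pageFounded", "pageMission", "pageProduct", "phone",
    "postcode", "postIconUrl", "postLink", "postLinkDescription",
    "postLinkTitle", "state", "street", "t", "type",
}


def summary_fields_option(classes: list[str]) -> str:
    """Build an option string such as -SF=[author,country]."""
    unknown = [c for c in classes if c not in KNOWN_CLASSES]
    if unknown:
        raise ValueError(f"Not a Facebook metadata class: {unknown}")
    return "-SF=[" + ",".join(classes) + "]"


print(summary_fields_option(["author", "country"]))  # -SF=[author,country]
```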
Limits
Please note that Facebook applies limits to the volume of content which can be retrieved from their APIs, and so in the case of large pages Funnelback may be unable to gather all historical content.
Caveats
Facebook events can only be crawled if the facebook.access-token property is set to a never-expiring page access token.
Working with the fetched data
Funnelback will crawl Facebook and convert responses into XML. You can use the metadata customisation tool to map elements to a metadata class.
Note: To preview the crawled records, enable debug mode by setting facebook.debug=true in the collection.cfg file.
FacebookQueryPost XML Example
<FacebookXmlRecord>
  <postId>post_id</postId>
  <url>www.facebook.com/the_post_id</url>
  <postMessage>this is the post message</postMessage>
  <postCreatedTime>Tue Aug 27 13:42:17 EST 2013</postCreatedTime>
  <type>POST</type>
  <postFrom class="com.restfb.types.CategorizedFacebookType">
    <id>id of poster</id>
    <name>Name of poster</name>
    <category>Community</category>
  </postFrom>
  <postComments>
    <FacebookXmlRecord>
      <commentId>comment id</commentId>
      <url>www.facebook.com/comment id</url>
      <commentMessage>comment one of three</commentMessage>
      <commentFrom class="com.restfb.types.CategorizedFacebookType">
        <id>Comment poster ID</id>
        <name>Name of poster</name>
        <category>Community</category>
      </commentFrom>
      <commentCreatedTime>2013-08-27 03:42:37.0 UTC</commentCreatedTime>
      <type>COMMENT</type>
    </FacebookXmlRecord>
  </postComments>
</FacebookXmlRecord>
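Comments are nested inside the post as further FacebookXmlRecord elements under postComments. The Python sketch below walks a trimmed copy of the example and parses the comment timestamp. It only handles the "YYYY-MM-DD HH:MM:SS.f UTC" shape used for commentCreatedTime; the post's own postCreatedTime uses a zone abbreviation (EST) that is ambiguous and is deliberately not parsed here. The helper name parse_utc is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Trimmed copy of the post example (one comment only).
POST_XML = """<FacebookXmlRecord>
  <postId>post_id</postId>
  <postMessage>this is the post message</postMessage>
  <type>POST</type>
  <postComments>
    <FacebookXmlRecord>
      <commentId>comment id</commentId>
      <commentMessage>comment one of three</commentMessage>
      <commentCreatedTime>2013-08-27 03:42:37.0 UTC</commentCreatedTime>
      <type>COMMENT</type>
    </FacebookXmlRecord>
  </postComments>
</FacebookXmlRecord>"""


def parse_utc(text: str) -> datetime:
    """Parse timestamps shaped like '2013-08-27 03:42:37.0 UTC'."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f UTC")
    return naive.replace(tzinfo=timezone.utc)


root = ET.fromstring(POST_XML)
print(root.findtext("type"), root.findtext("postMessage"))

# Each comment is a nested record directly under postComments.
comments = root.findall("postComments/FacebookXmlRecord")
for comment in comments:
    created = parse_utc(comment.findtext("commentCreatedTime"))
    print(comment.findtext("type"), comment.findtext("commentMessage"),
          created.isoformat())
```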
FacebookQueryEvent XML Example
<FacebookXmlRecord>
  <eventId>id</eventId>
  <url>www.facebook.com/id</url>
  <eventName/>
  <eventDescription/>
  <eventStartTime>2024-02-13 20:00:00.0 UTC</eventStartTime>
  <eventEndTime>2024-02-13 23:00:00.0 UTC</eventEndTime>
  <eventLocation>Lima, Peru</eventLocation>
  <eventVenue>
    <city/>
    <country/>
    <latitude>-12.043333</latitude>
    <longitude>-77.028333</longitude>
    <state/>
    <street/>
    <zip/>
  </eventVenue>
  <eventPrivacy>OPEN</eventPrivacy>
  <eventOwner>
    <id>owner id</id>
    <name/>
  </eventOwner>
  <EventRsvpStatus/>
  <type>EVENT</type>
</FacebookXmlRecord>
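In the event example, the start and end times share the same UTC format, the venue coordinates are separate latitude and longitude elements, and several fields are empty elements. The Python sketch below reads a trimmed copy of the record, computes the event duration and converts the coordinates to numbers; treating an empty element such as city as missing is a choice made for this example only.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Trimmed copy of the event example.
EVENT_XML = """<FacebookXmlRecord>
  <eventId>id</eventId>
  <eventStartTime>2024-02-13 20:00:00.0 UTC</eventStartTime>
  <eventEndTime>2024-02-13 23:00:00.0 UTC</eventEndTime>
  <eventLocation>Lima, Peru</eventLocation>
  <eventVenue>
    <city/>
    <latitude>-12.043333</latitude>
    <longitude>-77.028333</longitude>
  </eventVenue>
  <type>EVENT</type>
</FacebookXmlRecord>"""

FMT = "%Y-%m-%d %H:%M:%S.%f UTC"

root = ET.fromstring(EVENT_XML)
start = datetime.strptime(root.findtext("eventStartTime"), FMT).replace(tzinfo=timezone.utc)
end = datetime.strptime(root.findtext("eventEndTime"), FMT).replace(tzinfo=timezone.utc)
lat = float(root.findtext("eventVenue/latitude"))
lon = float(root.findtext("eventVenue/longitude"))
# findtext returns "" for an empty element such as <city/>; map that to None.
city = root.findtext("eventVenue/city") or None

print(end - start)
print(root.findtext("eventLocation"), lat, lon, city)
```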
FacebookQueryPage XML Example
<FacebookXmlRecord>
  <pageId>id</pageId>
  <url>http://www.facebook.com/pages/page</url>
  <page>
    <id>id</id>
    <name>page name</name>
    <category/>
    <link>http://www.facebook.com/pages/page</link>
    <founded/>
    <mission/>
    <products/>
    <description>the long description</description>
    <phone/>
    <about>Another description</about>
    <talkingAboutCount>0</talkingAboutCount>
    <isPublished>true</isPublished>
    <location>
      <street/>
      <city/>
      <state/>
      <country/>
      <zip/>
    </location>
  </page>
  <type>PAGE</type>
</FacebookXmlRecord>
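Many elements in a fetched page record can be empty, so it can help to list which ones actually carry data before mapping them to metadata classes. The Python sketch below flattens a trimmed copy of the page example into path/value pairs and skips empty leaves. The helper name populated_fields is hypothetical, and repeated sibling tags would overwrite each other in the returned dictionary, which is acceptable for this single-page example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the page example.
PAGE_XML = """<FacebookXmlRecord>
  <pageId>id</pageId>
  <page>
    <id>id</id>
    <name>page name</name>
    <category/>
    <description>the long description</description>
    <about>Another description</about>
    <talkingAboutCount>0</talkingAboutCount>
    <isPublished>true</isPublished>
    <location><city/><country/></location>
  </page>
  <type>PAGE</type>
</FacebookXmlRecord>"""


def populated_fields(elem: ET.Element, prefix: str = "") -> dict[str, str]:
    """Map each leaf element's relative path to its text, skipping empty leaves."""
    fields: dict[str, str] = {}
    for child in elem:
        path = prefix + child.tag
        if len(child):  # element has children: recurse into them
            fields.update(populated_fields(child, path + "/"))
        elif child.text and child.text.strip():
            fields[path] = child.text.strip()
    return fields


root = ET.fromstring(PAGE_XML)
fields = populated_fields(root)
for path, value in fields.items():
    print(path, "=", value)
```

Prefixing each printed path with /FacebookXmlRecord/ gives the form used in the metadata mapping table, for example /FacebookXmlRecord/page/name.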