Social media collections: Flickr


Flickr is a social media site focused on sharing photos.

Funnelback can crawl users and groups on Flickr. It gathers the textual data available for each photo and provides information such as the photo's name and link.

A Flickr gathering template is included as part of Funnelback's social media collections support, allowing content from Flickr to be gathered and then presented within Funnelback search results.

Please note that your usage of Funnelback to gather content from Flickr must comply with Flickr's terms of service.

Setting up collection

To create a Flickr collection, you will need to create a social media collection and select the Flickr template.

Once you have created the collection, you will need to fill out the collection.cfg file. To do this, go to the Administer tab of the admin home page and click the Browse Collection Configuration Files link.

Getting API keys and tokens

To crawl Flickr you need an API key, an API secret and user authentication tokens.

Getting your API key and secret

To get your API key and secret you will first need a Flickr account. Once you have one, you need to apply for an app. To find the API keys and secrets you already have, visit API - registered keys.

Getting user authentication tokens

An API key and secret only let you talk to Flickr's API and fetch some public content, such as the public content on a user's page. To fetch content from a group you may need a user authentication token, even though the photos themselves are completely public. To join your Flickr account to the group, ensure you are logged in as your user and run:

# Unix
java -cp "$SEARCH_HOME/lib/java/all/*" apiKey apiSecret

# Windows
java -cp %SEARCH_HOME%\lib\java\all\^* apiKey apiSecret

This will return a URL. Ensure you visit that URL as the correct Flickr user, because the authentication tokens are issued for the user that is currently logged in.

Configuration options

Flickr's gathering template will read the configuration from collection.cfg. The following settings are supported:

  • flickr.api-key: API key.
  • flickr.api-secret: API secret.
  • flickr.auth-token: Authentication token.
  • flickr.auth-secret: Authentication secret.
  • flickr.user-ids: Comma-delimited list of user account IDs to crawl.
  • flickr.groups.public: List of group IDs to crawl with a "public" view. Only public photos that are part of the group will be retrieved; private photos will be skipped.
  • flickr.groups.private: List of group IDs to crawl with a "private" view. All photos that are part of the group (public and private) will be retrieved. Note that the user ID specified in flickr.user-ids must be a member of the group to access private photos.
  • flickr.debug: Boolean flag to enable debug mode. When debug mode is enabled the gathering script will print out the crawled records in XML form.


# This is the ID of the National Library of Australia
flickr.user-ids=<user ID>
# This is a public group the National Library of Australia is a member of
flickr.groups.public=<group ID>
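
The comma-delimited format of flickr.user-ids can be illustrated with a short Java sketch. It is only an illustration under stated assumptions: it treats collection.cfg as plain key=value text that java.util.Properties can read, the class and method names (FlickrSettingsSketch, parseIdList, load) are hypothetical, and every value is a placeholder. It is not how Funnelback itself reads the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of Funnelback.
final class FlickrSettingsSketch {

    // Splits a comma-delimited setting value into trimmed, non-empty IDs.
    static List<String> parseIdList(String value) {
        List<String> ids = new ArrayList<>();
        if (value == null) {
            return ids;
        }
        for (String part : value.split(",")) {
            String id = part.trim();
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        return ids;
    }

    // Loads key=value lines; lines starting with '#' are treated as comments.
    static Properties load(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // All values below are placeholders, not real Flickr keys or IDs.
        String cfg = String.join("\n",
                "flickr.api-key=EXAMPLE_KEY",
                "flickr.api-secret=EXAMPLE_SECRET",
                "# Two user accounts to crawl",
                "flickr.user-ids=user-one, user-two",
                "flickr.groups.public=group-one",
                "flickr.debug=true");
        Properties props = load(cfg);
        System.out.println(parseIdList(props.getProperty("flickr.user-ids")));
        System.out.println(parseIdList(props.getProperty("flickr.groups.public")));
        System.out.println(Boolean.parseBoolean(props.getProperty("flickr.debug")));
    }
}
```

Trimming each entry means that a value written with spaces after the commas yields the same IDs as one written without them.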

The gathering template can be further customised to crawl only specific entity types (e.g. photos for a group, photos for a user).

Metadata mappings

The Flickr gathering template includes a number of Flickr-specific metadata mappings:

| Class ID | Type | Behaviour | Explanation | Metadata fields included |
|---|---|---|---|---|
| author | text | content | | /, / |
| c | text | content | Description | /, / |
| imageSmall | text | display | Image URL - small | / |
| imageMedium | text | display | Image URL - medium | / |
| imageLarge | text | display | Image URL - large | / |
| imageSquare | text | display | Image URL - small (square) | / |
| image320Pixels | text | display | Image URL - small (320px) | / |
| image640Pixels | text | display | Image URL - medium (640px) | / |
| image800Pixels | text | display | Image URL - medium (800px) | / |
| imageThumbnail | text | display | Thumbnail URL | / |
| latLong | geospatial x/y co-ordinate | N/A | | / |
| t | text | content | Title | /, / |
| username | text | display | | /, / |


Please note that Flickr applies limits to the volume of content which can be retrieved from their APIs, and so in the case of large photo streams Funnelback may be unable to gather all historical content.

Customising gathering template

To apply further customisation please edit custom_gather.groovy.

Define queries

To crawl Flickr you need to tell Funnelback what to crawl. This is done by specifying queries to Flickr. The general format of a query is:

new FlickrQuery***(flickrConnector, idOfPageOrUser)


  • FlickrQuery***: The type of query Funnelback should expect. Care should be taken to ensure the URL will return a page that is related to the FlickrQuery type.
  • flickrConnector: The connector to Flickr. It can either access private data (using authentication tokens) or crawl only public pages (using the API key and secret).
  • idOfPageOrUser: The ID of the user or page to crawl.

Types of queries


FlickrQueryGroupPhotos

Gathers photos for a group.

new FlickrQueryGroupPhotos(flickrConnector, groupId)

If the Flickr connector uses user authentication tokens, the crawl can get all photos that the user can see in the group.

If the Flickr connector uses only the API key and secret, the crawl may not get all photos in the group, because the group's entire photostream may not be public, even though the missed photos may themselves be public. When crawling a group, it is recommended that you join a Flickr user to the group and use the Flickr connector that has that user's authentication tokens.


FlickrQueryPhotosets

Gathers all albums of a user.

new FlickrQueryPhotosets(flickrConnector, userId)


FlickrQueryUserStream

Gathers all of the public photos from the user's photostream.

new FlickrQueryUserStream(flickrConnector, userId)

You should create a list of all the queries to specify what you want to crawl.


// The following connector can access private data
def flickrPrivate = FlickrFactory.createFlickr(apiKey, apiSecret, userAuthToken, userAuthTokenSecret)

// The following connector can only crawl public pages
def flickrPublic = FlickrFactory.createFlickr(apiKey, apiSecret)

List<FlickrQuery> queries = []
groupIds.each { groupId ->
	queries.add(new FlickrQueryGroupPhotos(flickrPrivate, groupId))
	queries.add(new FlickrQueryGroupPhotos(flickrPublic, groupId))
}

In this example, the first query gets all of the photos in a group, both public and private; the second query gets only the photos that are publicly accessible.

// The following connector can only crawl public pages
def flickrPublic = FlickrFactory.createFlickr(apiKey, apiSecret)

List<FlickrQuery> queries = []
userIds.each { userId ->
	queries.add(new FlickrQueryUserStream(flickrPublic, userId))
	queries.add(new FlickrQueryPhotosets(flickrPublic, userId))
}

In this example, the first query gets all of the public photos from the user's photostream and the second query gets the user's albums.

Working with the crawled data

Funnelback will crawl Flickr and convert responses into XML. You can use the metadata customisation tool to map elements to metadata classes. The XML that Funnelback generates for a Flickr collection is as follows:

  <url>url to photo in flickr, do not use this url to get the actual image</url>

The <url> element does not give the image URL; it gives the URL of the photo's page on Flickr. If you want the actual picture URL, i.e. something ending in .jpg, you need to use an element whose name matches <*Url>. For example, <photoLargeImageUrl> gives you a URL to a large version of the picture, which could be used in an <img> HTML tag.
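
As a rough illustration of this naming rule, the following Java sketch picks out the elements whose names end in Url from a record and prints an <img> tag for each. The record is hypothetical: only <url> and <photoLargeImageUrl> come from the description above, while the <record> root element, the example.com addresses and the class and method names are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Funnelback.
final class FlickrImageUrlSketch {

    // Returns element name -> text for each direct child of the root whose name ends in "Url".
    static Map<String, String> imageUrls(String recordXml) {
        Map<String, String> urls = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(recordXml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // The match is case-sensitive, so the plain <url> element (the Flickr page link) is skipped.
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && child.getNodeName().endsWith("Url")) {
                    urls.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse record XML", e);
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical record: the root element name and the URLs are made up for illustration.
        String xml = "<record>"
                + "<url>https://example.com/photo-page</url>"
                + "<photoLargeImageUrl>https://example.com/photo_large.jpg</photoLargeImageUrl>"
                + "</record>";
        imageUrls(xml).forEach((name, url) ->
                System.out.println("<img src=\"" + url + "\" alt=\"" + name + "\">"));
    }
}
```

In a real collection the same selection would more likely be made through the metadata mappings, with the image URL classes then used in the result template.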
