Connecting enterprise repositories using ManifoldCF
Funnelback includes support for connecting to a number of enterprise repository systems through an open source connector framework called ManifoldCF. This framework, along with the associated Funnelback ManifoldCF connector, allows Funnelback to be populated with content from supported repositories. It also allows Funnelback to apply document-level security for repositories where ManifoldCF supports fetching security information.
At the time of writing, the current version of ManifoldCF, 2.3, supports the following repositories: Alfresco, any CMIS repository, Dropbox, Google Drive, HDFS, Jira, LiveLink (OpenText), Documentum, SharePoint, Meridio and FileNet (IBM).
See https://manifoldcf.apache.org/release/release-2.3/en_US/included-connectors for specific details about supported versions of these repositories.
Funnelback integrates with the connector framework in two ways. It provides an output connector that filters and formats content and adds it to a push collection. It also provides Funnelback components that allow Funnelback to apply document-level security.
The following diagram illustrates how Funnelback interacts with the repository and the authority systems via the connector framework.
The blue lines within the diagram illustrate how the connector framework gathers content from the repository system and adds that content to a Funnelback push collection. The red lines illustrate the interactions that occur during a search request. These interactions ensure that only content the current search user is allowed to see is returned.
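The search-time check shown by the red lines can be pictured with the simplified sketch below. For illustration only, it assumes that each indexed document carries a set of "allow" and "deny" access tokens taken from the repository. It also assumes that the authority returns a set of tokens for the current user. Under these assumptions, a document is shown when the user holds at least one allow token and no deny token. ManifoldCF's real security model has more levels, for example share-level as well as document-level tokens. IndexedDocument, visible_to and filter_results are hypothetical names and are not part of Funnelback or ManifoldCF.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDocument:
    """Hypothetical, simplified view of a document stored in the push collection."""
    url: str
    allow_tokens: set[str] = field(default_factory=set)
    deny_tokens: set[str] = field(default_factory=set)


def visible_to(doc: IndexedDocument, user_tokens: set[str]) -> bool:
    """Return True if a user holding user_tokens may see doc (simplified rule)."""
    # Any matching deny token hides the document outright.
    if doc.deny_tokens & user_tokens:
        return False
    # Otherwise the user needs at least one matching allow token.
    return bool(doc.allow_tokens & user_tokens)


def filter_results(results: list[IndexedDocument], user_tokens: set[str]) -> list[IndexedDocument]:
    """Drop every result the current search user should not see."""
    return [doc for doc in results if visible_to(doc, user_tokens)]


if __name__ == "__main__":
    docs = [
        IndexedDocument("https://repo.example.com/hr/salaries.xlsx", {"hr-staff"}),
        IndexedDocument("https://repo.example.com/public/handbook.pdf", {"everyone"}, {"contractors"}),
    ]
    # This user holds only the "everyone" token, so only the handbook is visible.
    for doc in filter_results(docs, {"everyone"}):
        print(doc.url)
```

Checking deny tokens first means that an explicit denial always wins over any allow token the user also holds.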
ManifoldCF should be downloaded from http://manifoldcf.apache.org/en_US/download and deployed according to its instructions at https://manifoldcf.apache.org/release/release-2.3/en_US/how-to-build-and-deploy#Running+ManifoldCF. Please note that the current version of the Funnelback connector has only been tested with ManifoldCF 2.3. Later 2.x versions are believed to work correctly, but any future ManifoldCF 3.x releases are unlikely to be compatible with this version of the connector.
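An installation script could check the deployed ManifoldCF release against this compatibility guidance. The Python function below is one way to encode it. classify_manifoldcf_version is a hypothetical helper. It restates the tested, believed-to-work and likely-incompatible categories from the paragraph above. It treats releases older than 2.3 as untested, because the text does not address them.

```python
def classify_manifoldcf_version(version: str) -> str:
    """Classify a ManifoldCF release against the connector's compatibility notes.

    Returns "tested" for 2.3.x, "expected to work" for later 2.x releases,
    "likely incompatible" for 3.x and later, and "untested" otherwise.
    """
    parts = version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if major == 2 and minor == 3:
        return "tested"
    if major == 2 and minor > 3:
        return "expected to work"
    if major >= 3:
        return "likely incompatible"
    return "untested"


if __name__ == "__main__":
    for v in ("2.3", "2.10", "3.0", "1.10"):
        print(v, "->", classify_manifoldcf_version(v))
```

The version parts are compared as integers rather than strings, so that 2.10 is correctly treated as later than 2.3.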
Once ManifoldCF has been installed, please follow the README.txt instructions provided within the manifoldcf-funnelback-connector.zip archive included in $SEARCH_HOME/tools.
Please note that, for performance reasons, Funnelback strongly recommends installing ManifoldCF on a dedicated server. It also recommends using a standalone database rather than the embedded one included with the default ManifoldCF installation.
Configuring the Funnelback collection
The first step in configuring Funnelback to interact with an enterprise repository is to create a target push collection within the Funnelback installation. See also additional collection creation details. After creating it, make a note of the push collection's name.
Collections intended for use with ManifoldCF require a number of additional collection.cfg settings to configure authentication and document level security.
The URL of the ManifoldCF authority service to use as the user authority (usually something like http://manifoldcf-server.example.com:8345/mcf-authority-service).
The name of the domain to which users are assumed to belong.
The user mapper (security.earlybinding.user-to-key-mapper) used to obtain user information from the authority. On Windows, ManifoldCF should be used together with ui.modern.authentication. ManifoldCFDebug can be used on Linux, or on other systems where the remote user name should be passed into Funnelback through a URL parameter (i.e. &user=THE_USERNAME).
ui.modern.authentication should be set to true on Windows. This causes Funnelback to authenticate search users against the Active Directory domain in which the Funnelback server resides.
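Two of these settings are named elsewhere on this page: security.earlybinding.user-to-key-mapper (with the values ManifoldCF or ManifoldCFDebug) and ui.modern.authentication. The sketch below renders them as collection.cfg key=value lines for the two deployment cases described above. The authority URL and domain settings are deliberately omitted, because their key names are not given here. security_settings and to_collection_cfg are hypothetical helpers.

```python
def security_settings(on_windows: bool) -> dict[str, str]:
    """Return the early-binding security settings for one deployment case.

    Only keys named in this documentation are produced; the authority URL
    and domain settings still have to be added separately.
    """
    if on_windows:
        # Authenticate search users against the server's Active Directory domain.
        return {
            "security.earlybinding.user-to-key-mapper": "ManifoldCF",
            "ui.modern.authentication": "true",
        }
    # Linux or testing: the user name is taken from the &user= URL parameter.
    return {"security.earlybinding.user-to-key-mapper": "ManifoldCFDebug"}


def to_collection_cfg(settings: dict[str, str]) -> str:
    """Render settings as collection.cfg lines, one key=value pair per line."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    print(to_collection_cfg(security_settings(on_windows=True)))
```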
Configuring the Funnelback Connector
New output connections can be created in ManifoldCF from the 'List Output Connections' link on the left. After you select Funnelback as the connection type, a number of Funnelback-specific settings are provided, as detailed below.
The server URL must point at a Funnelback server's push-api service, which runs by default on port 8443 alongside the administration interface. An example URL for the service is https://funnelback-server.example.com:8443/push-api
The username and password provided must be those of a valid administration user with permission to access the push collection.
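The helpers below sketch how the server URL and credentials fit together. push_api_url builds the push-api URL from a host name, defaulting to port 8443. basic_auth_header builds an HTTP Basic Authorization header from the administration user's credentials. That the push-api accepts HTTP Basic authentication is an assumption here. push_api_url and basic_auth_header are hypothetical names, and no request is actually sent.

```python
import base64
from urllib.parse import urlunsplit


def push_api_url(host: str, port: int = 8443) -> str:
    """Build the push-api URL; 8443 is the default administration port."""
    return urlunsplit(("https", f"{host}:{port}", "/push-api", "", ""))


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Assumption: the push-api accepts HTTP Basic credentials for an admin user."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


if __name__ == "__main__":
    print(push_api_url("funnelback-server.example.com"))
    print(basic_auth_header("admin", "secret"))
```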
The filter classes setting must be in the form specified by the filter.classes collection.cfg option.
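The sketch below splits a filter.classes value into its parts, which can help catch typos before the value is entered. It assumes the syntax reads as follows: ':' separates filter stages that run in order, and ',' separates alternative filters within a stage, where the first filter able to handle a document is used. This is an assumption for illustration, so check it against the filter.classes documentation. parse_filter_classes is a hypothetical helper.

```python
def parse_filter_classes(value: str) -> list[list[str]]:
    """Split a filter.classes value into ordered stages of alternative filters.

    Assumed syntax: ':' separates sequential stages and ',' separates
    alternatives within one stage.
    """
    stages = []
    for stage in value.split(":"):
        names = [name.strip() for name in stage.split(",") if name.strip()]
        if not names:
            raise ValueError(f"empty filter stage in {value!r}")
        stages.append(names)
    return stages


if __name__ == "__main__":
    print(parse_filter_classes("TikaFilterProvider,ExternalFilterProvider:DocumentFixerFilterProvider"))
```

Raising an error on an empty stage flags values such as a trailing ':', which would otherwise pass silently.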
Configuring the authority connector
Funnelback will access the configured authority connector to obtain information about the permissions assigned to a given user when they make a search request.
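For troubleshooting, it can help to see roughly what such a lookup involves. The sketch below makes several assumptions. It assumes the ManifoldCF authority service answers GET requests on a UserACLs endpoint that takes a username parameter. It assumes the user name is qualified with the configured domain as user@domain. It also assumes the response lists one item per line, with access tokens on lines beginning with TOKEN:. user_acls_url and parse_tokens are hypothetical helpers, and the sample response is made up rather than captured from a real server.

```python
from urllib.parse import urlencode


def user_acls_url(authority_base: str, username: str, domain: str) -> str:
    """Assumed lookup URL: <authority>/UserACLs?username=user@domain."""
    query = urlencode({"username": f"{username}@{domain}"})
    return f"{authority_base.rstrip('/')}/UserACLs?{query}"


def parse_tokens(response_text: str) -> list[str]:
    """Keep only TOKEN: lines; other lines (e.g. AUTHORIZED:) report status."""
    prefix = "TOKEN:"
    return [line[len(prefix):] for line in response_text.splitlines() if line.startswith(prefix)]


if __name__ == "__main__":
    print(user_acls_url("http://manifoldcf-server.example.com:8345/mcf-authority-service", "jsmith", "example.com"))
    sample = "AUTHORIZED:ad\nTOKEN:ad:everyone\nTOKEN:ad:hr-staff\n"
    print(parse_tokens(sample))
```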
Active Directory is the most commonly used authority, and documentation on configuring it is available in ManifoldCF's "Defining Authority Connections" documentation.
Configuring the repository connector
ManifoldCF will gather content from external repository systems based on the settings provided in a repository connector. ManifoldCF's "Defining Repository Connections" documentation details how to configure repository connections.
Configuring the gathering 'job'
ManifoldCF links repository connections to output connections (such as Funnelback) through a gathering job. ManifoldCF's documentation for creating new jobs and for executing jobs provides details of how to create and execute these jobs from within the ManifoldCF administration interface.
Searching the collection
Once the gathering job has been run, it should be possible to perform search requests against the push collection through the search box within the Funnelback interface. When security.earlybinding.user-to-key-mapper=ManifoldCF is used, search requests are automatically authenticated against Active Directory. For testing, or on Linux, security.earlybinding.user-to-key-mapper=ManifoldCFDebug can be used instead.
When ManifoldCFDebug is used, an additional &user=THE_USERNAME parameter must be included in the search request URL to tell Funnelback which user to secure the results for. Note that ManifoldCFDebug is not a secure approach unless other steps have been taken to prevent users from modifying this user parameter. One such step is to provide access to the search only through a wrapper that controls the user parameter.
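When testing with ManifoldCFDebug, a search URL that carries the user parameter can be built as follows. The path /s/search.html and the collection and query parameter names are assumptions about the Funnelback search endpoint. The collection name manifold-push is hypothetical, and debug_search_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def debug_search_url(server: str, collection: str, query: str, user: str) -> str:
    """Build a search URL that tells ManifoldCFDebug which user to secure results for.

    Assumption: results are served from /s/search.html on the Funnelback server.
    """
    params = urlencode({"collection": collection, "query": query, "user": user})
    return f"{server.rstrip('/')}/s/search.html?{params}"


if __name__ == "__main__":
    print(debug_search_url("https://funnelback-server.example.com", "manifold-push", "annual report", "jsmith"))
```

In a wrapper, the user value passed to a helper like this should come from the wrapper's own authentication, never from the incoming request. That is what prevents users from changing which results they see.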
Configuring additional ManifoldCF repositories
Due to licensing restrictions, ManifoldCF's standard package does not include all the libraries required for some repository connectors, so these libraries must be installed and the relevant connectors enabled before they can be used. Details on specific libraries required are available in README files at
Some repository connectors, such as Documentum and FileNet, also require supporting processes to be run alongside ManifoldCF. These processes are included within ManifoldCF at $MANIFOLD_HOME/*-process/. Further details are available in the ManifoldCF 'Building and Deploying' documentation.
Once the appropriate libraries are in place and any required processes are running, the relevant repository connector must be enabled within $MANIFOLD_HOME/connectors.xml. After changing this file, the Funnelback Jetty web server process must be restarted for the change to take effect.
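Before restarting, it can be useful to confirm which repository connectors connectors.xml actually enables. The sketch below assumes each enabled connector is declared as a repositoryconnector element with name and class attributes. It also assumes that connectors still waiting for their libraries are left commented out. Python's ElementTree parser discards comments by default, so only enabled entries are returned. The connector names and class names in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def enabled_repository_connectors(connectors_xml: str) -> list[str]:
    """Return the names of repositoryconnector entries that are not commented out."""
    # The default parser drops XML comments, so commented-out entries never appear.
    root = ET.fromstring(connectors_xml)
    return [el.get("name", "") for el in root.iter("repositoryconnector")]


SAMPLE = """<connectors>
  <repositoryconnector name="File system" class="org.example.connectors.FileSystemConnector"/>
  <!-- <repositoryconnector name="Documentum" class="org.example.connectors.DocumentumConnector"/> -->
</connectors>"""

if __name__ == "__main__":
    print(enabled_repository_connectors(SAMPLE))
```

To check a live installation, pass the file's contents instead, for example pathlib.Path("connectors.xml").read_text() run from $MANIFOLD_HOME.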
Assistance with the ManifoldCF system is available from the ManifoldCF mailing lists.