Database collections
Introduction
A database collection indexes the data from a relational database. Funnelback uses an SQL query to obtain data from the database. Each row returned by the query is saved as a separate XML file.
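To make the row-to-record step concrete, here is a minimal sketch that serialises one result row (column label to value) as a standalone XML document using the JDK's DOM and javax.xml.transform APIs. It is not Funnelback's gatherer code. The layout, with one child element per column under a root element standing in for db.xml_root_element, is an assumption, and the class and method names are hypothetical.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class RowToXml {
    private RowToXml() {}

    // Hypothetical helper: serialises one result row (column label -> value) as an XML record.
    // rootName stands in for db.xml_root_element; one child element per column is an assumed layout.
    // Column labels must be valid XML element names, otherwise createElement throws a DOMException.
    static String toXml(String rootName, Map<String, String> row) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement(rootName);
            doc.appendChild(root);
            for (Map.Entry<String, String> column : row.entrySet()) {
                Element field = doc.createElement(column.getKey());
                // SQL NULLs become empty elements.
                field.setTextContent(column.getValue() == null ? "" : column.getValue());
                root.appendChild(field);
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build XML record", e);
        }
    }
}
```

For example, a LinkedHashMap row with id 42 and title "Annual report" passed with the root name "record" yields a record containing <id>42</id><title>Annual report</title>. A LinkedHashMap keeps the elements in query column order.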
Before you start
Funnelback's database connector indexes the single table of results returned by an SQL query as separate XML records.
Because of this, the table returned by the query must be fully denormalised: Funnelback can't resolve ID references to other database records and has no knowledge of the relationships between tables.
The best way to achieve this is to set up a database view containing all the (denormalised) fields that are relevant to the search that is being set up.
For databases that have multiple types of content it is often necessary to set up several Funnelback database collections that index each of these types separately.
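The sketch below illustrates what "fully denormalised" means, using two hypothetical in-memory tables in which an article stores only its author's ID. Each output row carries the author's name directly, which is what a database view joining the two tables would return to Funnelback. In practice the join belongs in the database view itself; this Java version (Java 16 or later, for records) only shows the shape of the result. All table, field and class names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class Denormalise {
    private Denormalise() {}

    // Hypothetical normalised tables: an article stores only the ID of its author.
    record Article(int id, String title, int authorId) {}
    record Author(int id, String name) {}

    // Produces one self-contained row per article, replacing authorId with the author's name.
    // Articles whose author is missing keep an empty author value (LEFT JOIN semantics).
    static List<Map<String, String>> flatten(List<Article> articles, List<Author> authors) {
        Map<Integer, String> namesById = new HashMap<>();
        for (Author author : authors) {
            namesById.put(author.id(), author.name());
        }
        List<Map<String, String>> rows = new ArrayList<>();
        for (Article article : articles) {
            Map<String, String> row = new LinkedHashMap<>();
            row.put("id", String.valueOf(article.id()));
            row.put("title", article.title());
            row.put("author", namesById.getOrDefault(article.authorId(), ""));
            rows.add(row);
        }
        return rows;
    }
}
```

Each resulting row can be indexed on its own, because nothing in it refers back to another table.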
Supported databases
The database connector utilises a JDBC database driver to connect to the database system. Funnelback ships with a driver for PostgreSQL. Indexing other databases requires installation of an appropriate JDBC driver.
See: Installing JDBC drivers for further information on supported databases and driver installation.
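Before you configure a collection, you may want a quick way to confirm that a driver class is visible to a JVM. The sketch below uses only standard java.sql and reflection calls. It checks the classpath of whatever JVM runs it, which is not necessarily Funnelback's own driver directory. The class and method names are hypothetical.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class DriverCheck {
    private DriverCheck() {}

    // Returns true if the named JDBC driver class can be loaded from the current classpath.
    static boolean isDriverAvailable(String driverClassName) {
        try {
            Class.forName(driverClassName);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Lists the drivers currently registered with DriverManager.
    // JDBC 4+ drivers on the classpath register themselves automatically.
    static List<String> registeredDrivers() {
        List<String> names = new ArrayList<>();
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            names.add(driver.getClass().getName());
        }
        return names;
    }
}
```

For the bundled PostgreSQL driver you would call isDriverAvailable("org.postgresql.Driver"). For other databases, use the driver class name given in that vendor's documentation.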
Configuring a database collection
Create the collection
Database collections are created from the administration interface by selecting create collection and then choosing database as the collection type.
The database collection is defined by configuring the following properties:
- database driver: a valid JDBC database driver, installed on the Funnelback server (see above)
- URI: JDBC URI of the database
- SQL query to select the rows
- primary key column in the SQL query
- user name and password for the database
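The properties above correspond to keys listed under Configuration options below. The following sketch maps them onto those collection.cfg keys and renders them as key=value lines. The key=value line layout, the example driver URL, the view name search_documents and the credentials are all placeholders or assumptions. The class is hypothetical, and the password is shown in plain text only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class DbCollectionConfig {
    private DbCollectionConfig() {}

    // Maps the create-collection settings onto the collection.cfg keys from the options table.
    static Map<String, String> toCollectionCfg(String jdbcClass, String jdbcUrl, String sql,
                                               String primaryKey, String user, String password) {
        Map<String, String> cfg = new LinkedHashMap<>();
        cfg.put("db.jdbc_class", jdbcClass);
        cfg.put("db.jdbc_url", jdbcUrl);
        cfg.put("db.full_sql_query", sql);
        cfg.put("db.primary_id_column", primaryKey);
        cfg.put("db.username", user);
        cfg.put("db.password", password); // stored in plain text in this sketch
        return cfg;
    }

    // Renders the settings as key=value lines (assumed collection.cfg layout).
    static String render(Map<String, String> cfg) {
        StringJoiner lines = new StringJoiner("\n");
        cfg.forEach((key, value) -> lines.add(key + "=" + value));
        return lines.toString();
    }
}
```

For example, toCollectionCfg("org.postgresql.Driver", "jdbc:postgresql://db.example.com:5432/intranet", "SELECT id, title, body FROM search_documents", "id", ...) renders as db.jdbc_class=org.postgresql.Driver followed by one line per remaining key.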
Test the connection settings by clicking the check database connectivity button.
This button provides a quick check that Funnelback can connect to the database using the configured settings.
Clicking the button first causes a connection to be established based on the database connection configuration options. If the connection fails, an error indicating the type of failure and suggested next steps will be provided. If the connection succeeds, the specified SQL query will then be executed to ensure it is valid and that it contains the specified primary key column.
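The two-stage check described above can be approximated with plain JDBC: connect first, then run the query and inspect its column labels for the primary key. This is a sketch of the idea, not the implementation behind the button. The class name, the message wording and the case-insensitive comparison are assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

final class ConnectivityCheck {
    private ConnectivityCheck() {}

    // Stage 1: connect. Stage 2: run the query and confirm the primary key column is returned.
    static String check(String url, String user, String password, String sql, String primaryKey) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            try (Statement stmt = conn.createStatement()) {
                stmt.setMaxRows(1); // only the column metadata is needed, not every row
                try (ResultSet rs = stmt.executeQuery(sql)) {
                    List<String> labels = columnLabels(rs.getMetaData());
                    return hasColumn(labels, primaryKey)
                            ? "OK"
                            : "Query ran, but it does not return the primary key column '" + primaryKey + "'";
                }
            } catch (SQLException e) {
                return "Connected, but the query failed: " + e.getMessage();
            }
        } catch (SQLException e) {
            return "Connection failed (SQLState " + e.getSQLState() + "): " + e.getMessage();
        }
    }

    static List<String> columnLabels(ResultSetMetaData md) throws SQLException {
        List<String> labels = new ArrayList<>();
        for (int i = 1; i <= md.getColumnCount(); i++) { // JDBC columns are 1-based
            labels.add(md.getColumnLabel(i));
        }
        return labels;
    }

    // Case-insensitive, because many databases fold unquoted identifiers to upper or lower case.
    static boolean hasColumn(List<String> labels, String column) {
        return labels.stream().anyMatch(label -> label.equalsIgnoreCase(column));
    }
}
```

Calling check with a URL for which no driver is installed returns a "Connection failed" message, because DriverManager reports that no suitable driver was found.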
After creating the collection, perform an initial update of the collection.
Define database field mappings
After a successful update has run, configure metadata mappings for the database fields. During an update, the database collection downloads each row returned by the SQL query and converts it to an XML record.
The fields that were detected in the SQL results should be listed amongst the XML sources. Map the relevant fields to metadata classes. Relevant fields include:
- fields that you wish to display in the search result summaries or use for faceted navigation.
- fields that contribute something useful to the record's searchability, for example keywords or types.
- a field containing a URL that should be used as the target when the result is clicked. Map this field as the document's URL using the advanced XML configuration options available from the administer tab in the administration interface.
Rebuild the index by selecting reindex the live view from the advanced update option on the update tab in the administration interface.
After the reindex is complete, metadata should be available for display in the search results.
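To see which element paths a record exposes for mapping, the sketch below parses one XML record and lists the path of every leaf element. It assumes that each column becomes a child element of the record root, so that a column appears as a path such as /record/title. The exact form in which the XML sources are listed in the administration interface may differ. The class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class XmlSources {
    private XmlSources() {}

    // Returns the path of every element that has no child elements, in document order.
    static List<String> leafPaths(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            List<String> paths = new ArrayList<>();
            collect(root, "/" + root.getTagName(), paths);
            return paths;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a well-formed XML record", e);
        }
    }

    private static void collect(Element element, String path, List<String> out) {
        boolean hasChildElements = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                Element child = (Element) node;
                collect(child, path + "/" + child.getTagName(), out);
            }
        }
        if (!hasChildElements) {
            out.add(path); // a leaf element: one database column in the assumed layout
        }
    }
}
```

Each returned path corresponds to one database column and is a candidate for mapping to a metadata class.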
Configure search results
Search results can then be configured. This involves:
- Defining the metadata classes to return in the search results by setting the summary fields (-SF) query processor option.
- If using a Freemarker search template, customising the template to display the metadata fields in the search results template within the <@s.Results> code block.
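The hypothetical helper below shows how the summary-fields setting is typically written by formatting a list of metadata class names as an -SF option. The bracketed, comma-separated form (for example -SF=[author,date]) is an assumption to confirm against the query processor options reference for your Funnelback version. The class names author and date are placeholders.

```java
import java.util.List;

final class SummaryFields {
    private SummaryFields() {}

    // Formats metadata class names as a summary-fields option, e.g. -SF=[author,date].
    // The bracketed, comma-separated syntax is assumed, not confirmed by this page.
    static String option(List<String> metadataClasses) {
        if (metadataClasses.isEmpty()) {
            throw new IllegalArgumentException("at least one metadata class is required");
        }
        return "-SF=[" + String.join(",", metadataClasses) + "]";
    }
}
```

Only the classes listed in the option are returned with each result, so any class that the template displays must be included.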
Live links
By default, database collections use a system-assigned URL for each record. If this URL is not modified (for example, by a filter), the modern UI uses the cache link as the live URL, so clicking a result shows the user the record's XML. You can use XSLT on the cache controller to style the response.
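As an illustration of styling a record with XSLT, the sketch below applies a small stylesheet to an XML record using the standard javax.xml.transform API (Java 15 or later, for the text block). It demonstrates the general technique only. It does not show how the cache controller is configured. The stylesheet, the record root name and the title and body elements are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class RecordStyler {
    private RecordStyler() {}

    // Hypothetical stylesheet: renders the record's title as a heading and its body as a paragraph.
    static final String XSLT = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="html" omit-xml-declaration="yes"/>
          <xsl:template match="/record">
            <html><body>
              <h1><xsl:value-of select="title"/></h1>
              <p><xsl:value-of select="body"/></p>
            </body></html>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Applies the given stylesheet to one XML record and returns the resulting HTML.
    static String toHtml(String recordXml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(recordXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }
}
```

Applied to <record><title>Annual report</title><body>Text</body></record>, the output contains <h1>Annual report</h1> instead of raw XML.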
Modifying database records
Database records can be modified (locally) prior to indexing using the filter framework.
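The sketch below illustrates the kind of change a filter might make. It reads a record's id element and appends a url element that could then be mapped as the document's URL. It is a plain XML transformation written for illustration. It is not the Funnelback filter framework API, and the class name, element names and URL prefix are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class AddUrlField {
    private AddUrlField() {}

    // Hypothetical modification: appends <url>urlPrefix + id</url> to the record.
    // Records without an <id> element are returned unchanged.
    static String addUrl(String recordXml, String urlPrefix) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(recordXml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            NodeList ids = root.getElementsByTagName("id");
            if (ids.getLength() == 0) {
                return recordXml;
            }
            String id = ids.item(0).getTextContent().trim();
            Element url = doc.createElement("url");
            // Percent-encode the ID so that it is safe inside a query string.
            url.setTextContent(urlPrefix + URLEncoder.encode(id, StandardCharsets.UTF_8));
            root.appendChild(url);
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalArgumentException("Could not modify record", e);
        }
    }
}
```

With the prefix https://intranet.example.com/docs?id= a record whose id is 42 gains <url>https://intranet.example.com/docs?id=42</url>. That field would still need to be mapped as the document's URL through the advanced XML configuration options.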
Configuration options
The following options are available for database collections and can be set in collection.cfg.
Option | Description |
---|---|
db.bundle_storage_enabled | Allows storage of data extracted from a database in a compressed form. |
db.custom_action_java_class | ⚠ Deprecated. Use the filter framework instead. Allows a custom Java class to modify data extracted from a database before indexing. |
db.full_sql_query | The SQL query to perform on a database to fetch all records for searching. |
db.incremental_sql_query | The SQL query to perform to fetch new or changed records from a database. |
db.incremental_update_type | Allows the selection of different modes for keeping database collections up to date. |
db.jdbc_class | The name of the Java JDBC driver to connect to a database. |
db.jdbc_url | The URL specifying database connection parameters such as the server and database name. |
db.password | The password for connecting to the database. |
db.primary_id_column | The primary id (unique identifier) column for each database record. |
db.single_item_sql | An SQL command for extracting an individual record from the database. |
db.update_table_name | The name of a table in the database which provides a record of all additions, updates and deletes. |
db.username | The username for connecting to the database. |
db.use_column_labels | Flag to control whether column labels are used in JDBC calls in the database gatherer. |
db.xml_root_element | The top level element for records extracted from the database. |
filter.classes | Specifies which Java classes should be used for filtering documents. |