Command line administration
The recommended way to administer a Funnelback installation is through the web-based administration interface, which provides an easy-to-use frontend for administrators. However, it is also possible to administer Funnelback from the command line. This is useful if other systems need to be integrated with Funnelback.
The following instructions assume that the $SEARCH_HOME environment variable is defined and points to your installation directory. By default, this is /opt/funnelback/ on Linux and C:\funnelback\ on Windows.
Contains all administration scripts.
Contains various global configuration files, as well as collection-specific configuration files, under
Contains global log files, such as the create.log file, which records creation of collections, and the delete.log file, which records deletion of collections.
Contains files relating to the admin console and public search interface (such as the cgi files). Web server configuration files are stored in
Contains collection-specific data, such as gathered documents, indexes and log files. The data area has the following structure:
- Each collection will have a subdirectory under data, containing live, offline, log and archive directories.
- The archive directory contains compressed query and click log files for the collection.
- The log directory contains logs that don't fit into any other category, such as update logs and reporting logs.
- The live and offline directories contain gathered and filtered documents (in "data"), indexes (in "idx") and collection-specific logs (in "log").
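As a rough illustration of this layout, the following shell sketch prints the directories that would be expected for one collection. The function name collection_dirs and the collection name used in the usage line are illustrative only, and the fallback to /opt/funnelback assumes the default Linux location mentioned above.

```sh
# Print the per-collection directories described in the list above.
# $1: collection name. SEARCH_HOME falls back to the default Linux
# installation directory if it is not set.
collection_dirs() {
    collection="$1"
    base="${SEARCH_HOME:-/opt/funnelback}/data/$collection"
    for view in live offline; do
        for sub in data idx log; do
            printf '%s\n' "$base/$view/$sub"
        done
    done
    printf '%s\n' "$base/log" "$base/archive"
}

# Usage: collection_dirs example
```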
Creating a collection
All collection configuration files are created from a collection template at
All configuration information for a collection is stored in a directory at $SEARCH_HOME/conf/<collection name>/. This includes the main collection.cfg file.
To create a collection from the command line, administrators can create the collection configuration directory, copy the collection template to collection.cfg in this directory, edit the collection configuration, and then run create-collection.pl on the resulting collection.cfg file.
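The manual steps above might be scripted along the following lines. This is a sketch only: prepare_collection_conf is a hypothetical helper, the template path is passed in as an argument rather than assumed, and create-collection.pl is assumed to be on the PATH.

```sh
# Create the configuration directory for a new collection and copy the
# collection template into it as collection.cfg.
# $1: collection name
# $2: path to the collection template (supplied by the caller)
# Prints the path of the new collection.cfg on success.
prepare_collection_conf() {
    name="$1"
    template="$2"
    conf_dir="$SEARCH_HOME/conf/$name"
    mkdir -p "$conf_dir" || return 1
    cp "$template" "$conf_dir/collection.cfg" || return 1
    printf '%s\n' "$conf_dir/collection.cfg"
}

# Usage (the template path is a placeholder):
#   cfg=$(prepare_collection_conf example /path/to/collection/template)
#   "${EDITOR:-vi}" "$cfg"          # edit the configuration by hand
#   create-collection.pl "$cfg"     # then create the collection
```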
A separate convenience script, new-collection.pl, is available and will create the configuration directory and collection configuration file automatically. An optional start URL or location can be passed to this script, as well as a type, allowing the creation of web, local, filecopy, database collections, etc.
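For example, a web collection could be created with a call of the following shape, which follows the argument order given in the scripts reference (collection name, collection type, optional start URL). The wrapper function, the NEW_COLLECTION override variable, the collection name and the URL are all illustrative assumptions; set NEW_COLLECTION to the full path of the script if it is not on the PATH.

```sh
# Create a web collection with new-collection.pl.
# $1: collection name; $2 (optional): start URL.
# NEW_COLLECTION is a hypothetical override for the script location.
create_web_collection() {
    name="$1"
    shift
    "${NEW_COLLECTION:-new-collection.pl}" "$name" web "$@"
}

# Usage: create_web_collection example http://www.example.com/
```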
The created collection configuration should still be manually checked and edited to change default configuration options. The following options are especially important to check:
Creating a meta collection
A meta collection is one which has no data or indexes of its own but instead points to a set of underlying collections. To create a meta collection, administrators can use the new-collection.pl script, specifying a "meta" collection type.
The administrator must then create a meta.cfg file in the appropriate location: $SEARCH_HOME/conf/<collection name>/meta.cfg. This file is used to list the sub-collections which make up the meta collection.
The format is to list the internal names of the sub-collections, one per line. For example, the file might look like:
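As a hedged illustration of this format, the sketch below writes a meta.cfg containing one sub-collection name per line. The helper name write_meta_cfg and the collection names in the usage line are hypothetical and only stand in for real internal names.

```sh
# Write meta.cfg for a meta collection, one sub-collection per line.
# $1: meta collection name
# remaining arguments: internal names of the sub-collections
write_meta_cfg() {
    meta="$1"
    shift
    conf_dir="$SEARCH_HOME/conf/$meta"
    mkdir -p "$conf_dir" || return 1
    printf '%s\n' "$@" > "$conf_dir/meta.cfg"
}

# Usage (hypothetical names): write_meta_cfg all-sites intranet public-web
```

With the hypothetical names from the usage line, the resulting file would hold the two lines intranet and public-web.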
You also need to create an index.sdinfo file which lists the full path to the index stems for the sub-collections. This file should be placed in both $SEARCH_HOME/data/<collection name>/live/idx/ and $SEARCH_HOME/data/<collection name>/offline/idx/, and will look something like:
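A sketch of how such a file might be generated is shown below. It assumes that each sub-collection's index stem is named "index" and lives in that sub-collection's live/idx directory; this stem name is an assumption and should be checked against your installation. The helper name write_sdinfo and the collection names in the usage line are hypothetical.

```sh
# Write index.sdinfo into both views of a meta collection.
# $1: meta collection name
# remaining arguments: internal names of the sub-collections
# Each output line is the full path to one sub-collection's index stem
# (stem name "index" is assumed, not confirmed by the text).
write_sdinfo() {
    meta="$1"
    shift
    for view in live offline; do
        idx_dir="$SEARCH_HOME/data/$meta/$view/idx"
        mkdir -p "$idx_dir" || return 1
        for sub in "$@"; do
            printf '%s\n' "$SEARCH_HOME/data/$sub/live/idx/index"
        done > "$idx_dir/index.sdinfo"
    done
}

# Usage (hypothetical names): write_sdinfo all-sites intranet public-web
```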
Once this is done, the meta collection will be as up to date as its component sub-collections, so you do not need to call the update script for a meta collection.
Updating a collection
To update a collection, use the update.pl script, redirecting its status messages to an appropriately named update log, for example:
update.pl $SEARCH_HOME/conf/example/collection.cfg > $SEARCH_HOME/log/update-example.log 2>&1
Note that an update may take a significant amount of time, depending upon the update timeout, number of documents found and other factors.
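Because an update can run for a long time, it may be convenient to wrap the command in a small function that records the log location and reports the outcome. The sketch below assumes that update.pl exits with a non-zero status on failure, which is not confirmed here; run_update and the UPDATE_CMD override variable are hypothetical names.

```sh
# Run an update for one collection, sending status messages to an
# update log under $SEARCH_HOME/log as in the command shown above.
# $1: collection name
# UPDATE_CMD is a hypothetical override for the location of update.pl.
run_update() {
    name="$1"
    cfg="$SEARCH_HOME/conf/$name/collection.cfg"
    log="$SEARCH_HOME/log/update-$name.log"
    if "${UPDATE_CMD:-update.pl}" "$cfg" > "$log" 2>&1; then
        echo "update of $name finished; see $log"
    else
        status=$?
        echo "update of $name failed with status $status; see $log" >&2
        return "$status"
    fi
}

# Usage: run_update example
```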
During the update, messages will be logged to the appropriate logs in $SEARCH_HOME/data/<collection name>/offline/log/ and $SEARCH_HOME/data/<collection name>/log/.
To prevent multiple simultaneous updates of the same collection, update.pl will create a lock file at the start of an update. This lock file will be placed at $SEARCH_HOME/data/<collection name>/log/<collection name>.lock. A collection update will not occur unless update.pl can create and gain exclusive access to this lock file. The lock file is removed at the end of a successful update or if an error occurs during the update.
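An integrating system could use the lock file as a hint that an update is under way before starting its own work. The sketch below only tests for the file's presence; it does not check whether the lock is actually held, so treat the result as advisory. The function name update_in_progress is hypothetical.

```sh
# Succeed (exit status 0) if the lock file for a collection exists.
# $1: collection name
# Presence of the file is only a hint; update.pl itself decides whether
# it can gain exclusive access to the lock.
update_in_progress() {
    name="$1"
    [ -e "$SEARCH_HOME/data/$name/log/$name.lock" ]
}

# Usage:
#   if update_in_progress example; then
#       echo "an update of example appears to be running"
#   fi
```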
Deleting a collection
Administrators may fully delete a collection using the delete-collection.pl script. This script will delete all data and configuration associated with the collection:
- gathered documents
- configuration files
- scheduled updates
User configuration files are also edited to remove references to the deleted collection.
Command line scripts reference
Detailed internal documentation is available for many scripts through the standard Perl "perldoc" command.
Creates a collection, including its collection.cfg file.
new-collection.pl <collection name> <collection type> [start url]
Creates a collection from an already existing collection.cfg file.
create-collection.pl <collection config>
Deletes a collection, including its gathered documents, indexes, configuration, scheduled updates and logs. It also removes references to the now non-existent collection from user configuration files.
delete-collection.pl <collection config>
Wraps the entire update process, calling the appropriate update subscripts.
update.pl <collection config> [update type: -incremental, -reindex, ...]
Gathers documents from web collections.
crawl.pl <collection config> [update type: -check, -incremental, -instant-update]
Gathers documents from filecopy collections.
filecopy.pl <collection config> [other options]
Gathers documents from database collections.
dbgather.pl <collection config> [--full] [other options]
Calls Padre to index a collection's documents.
index.pl <collection config> [-reindex] [-instant-update]
Processes a collection's data files, producing reports on their contents.
make_report.pl [--collection "collection config"] [--log] ...
Updates the trend alerts reports for a collection (or all collections if none is specified).
outliers-log-processing.pl [--collection "collection name"]
Swaps the live and offline views of a collection after a successful update, placing the newly gathered and indexed data in live for querying, and safely storing the older gathered and indexed data in offline.
swap-views.pl <collection config> [-force]
Archives a collection's queries.log and clicks.log log files to the collection's archive directory.
archive-log.pl <collection config> [view]
Reads a collection's log files and stores a binary database for reporting purposes. The admin UI report frontend reads this database to display reports.
reports-load-queries-log.pl [--collection "collection-id"]
Sends email query reports to users who have requested them for the specified collection (or for all collections if none is specified).
reports-send-email.pl [--collection "collection-id"]
Updates the location of the perl interpreter for all .cgi and .pl scripts.
Trigger local or remote administrative tasks.
Changes a user's password.
$SEARCH_HOME/web/bin/change_password.sh <user> <password>