Command Line Administration

The recommended method of administering a Funnelback installation is through the web-based administration interface, which provides an easy-to-use frontend for administrators. However, it is also possible to administer Funnelback from the command line, which is useful when other systems need to be integrated with Funnelback.

Locations

We assume in the following instructions that the $SEARCH_HOME environment variable is defined. This should point to your installation directory. By default, this is /opt/funnelback/ on Linux and C:\funnelback\ on Windows.
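
For example, on a Linux installation in the default location, the variable can be set in the shell before running any of the commands below (adjust the path if Funnelback was installed elsewhere):

   export SEARCH_HOME=/opt/funnelback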

$SEARCH_HOME/bin/

bin contains all administration scripts.

$SEARCH_HOME/conf/

conf contains various global configuration files, as well as collection specific configuration files, under $SEARCH_HOME/conf/<collection name>/.

$SEARCH_HOME/log/

log contains global log files, such as the create.log file, which records creation of collections and the delete.log file, which records deletion of collections.

$SEARCH_HOME/web/

web contains files relating to the admin console and public search interface (such as the cgi files). Web server configuration files are stored in $SEARCH_HOME/web/conf/.

$SEARCH_HOME/data/

data contains collection specific data, such as gathered documents, indexes and log files. The data area has the following structure:

  • Each collection will have a subdirectory under data, containing live, offline, log and archive directories.
  • The archive directory contains compressed query and click log files for the collection.
  • The log directory contains logs that don't fit into any other category, such as update logs and reporting logs.
  • The live and offline directories contain gathered and filtered documents (in "data"), indexes (in "idx") and collection specific logs (in "log").
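
Put together, the layout for a hypothetical collection named example looks like this:

   $SEARCH_HOME/data/example/
      archive/           compressed query and click logs
      log/               update and reporting logs
      live/
         data/           gathered and filtered documents
         idx/            indexes
         log/            collection specific logs
      offline/
         data/
         idx/
         log/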

Creating a collection

All collection configuration files are created from a collection template at $SEARCH_HOME/conf/collection.cfg.default.

All configuration information for a collection is stored in a directory at $SEARCH_HOME/conf/<collection name>/. This includes the main collection.cfg file.

To create a collection from the command line, administrators can create the collection configuration directory, copy the collection template to collection.cfg in this directory, edit the collection configuration and run create-collection.pl over the collection configuration.
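
A minimal sketch of these manual steps, assuming a hypothetical collection named example:

   mkdir $SEARCH_HOME/conf/example
   cp $SEARCH_HOME/conf/collection.cfg.default $SEARCH_HOME/conf/example/collection.cfg
   # edit $SEARCH_HOME/conf/example/collection.cfg as required, then:
   $SEARCH_HOME/bin/create-collection.pl $SEARCH_HOME/conf/example/collection.cfg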

A separate convenience script, new-collection.pl, is available and will create the configuration directory and collection configuration file automatically. An optional start URL or location can be passed to this script, as well as a type, allowing the creation of web, local, filecopy, database collections, etc.
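
For example, a web collection could be created in one step (the collection name and start URL here are illustrative):

   $SEARCH_HOME/bin/new-collection.pl example web http://www.example.com/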

The created collection configuration should still be manually checked and edited to change default configuration options. The following options are especially important to check:

  • collection
  • collection_root
  • exclude_patterns
  • include_patterns
  • service_name
  • start_url

Creating a meta collection

A meta collection is one which has no data or indexes of its own but instead points to a set of underlying collections. To create a meta collection, administrators can use the new-collection.pl script, specifying a "meta" collection type.
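
For example, to create a hypothetical meta collection named everything:

   $SEARCH_HOME/bin/new-collection.pl everything meta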

The administrator must then create a meta.cfg file in the appropriate location: $SEARCH_HOME/conf/<collection name>/meta.cfg. This file is used to list the sub-collections which make up the meta collection.

The format is to list the internal names of the sub-collections, one per line. For example, the file might look like:

   funnelback_website
   shakespeare

You also need to create an index.sdinfo file which lists the full path to the index stems for the subsidiary collections. This file should be placed in $SEARCH_HOME/data/<collection name>/live/idx/ and $SEARCH_HOME/data/<collection name>/offline/idx/, and will look something like:

   $SEARCH_HOME/data/funnelback_website/live/idx/index
   $SEARCH_HOME/data/shakespeare/live/idx/index
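
The same stem list is used by both views, so one approach is to write the file once and copy it into place (again assuming the hypothetical meta collection everything):

   cp index.sdinfo $SEARCH_HOME/data/everything/live/idx/
   cp index.sdinfo $SEARCH_HOME/data/everything/offline/idx/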

Once this is done, the meta collection will be as up to date as its component sub-collections. This means that you do not need to run the update script for a meta collection.

Updating a collection

To update a collection, use the update.pl script, redirecting the output status messages to an appropriately named update log, e.g. update-<collection name>.log:

   update.pl $SEARCH_HOME/conf/example/collection.cfg > $SEARCH_HOME/log/update-example.log 2>&1

Note that an update may take a significant amount of time, depending upon the update timeout, number of documents found and other factors.

During the update, messages will be logged to the appropriate logs in $SEARCH_HOME/data/<collection name>/offline/log/ and $SEARCH_HOME/data/<collection name>/log/.

Lock files

To prevent multiple simultaneous updates of the same collection, update.pl will create a lock file at the start of an update. This lock file will be placed at $SEARCH_HOME/data/<collection name>/log/<collection name>.lock. A collection update will not occur unless update.pl can create and gain exclusive access to this lock file. The lock file is removed at the end of a successful update or if an error occurs during the update.
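
If an update process is killed outright (for example, by a reboot mid-update), the lock file may be left behind and block subsequent updates. Once you have confirmed that no update is actually running, it can be inspected and removed by hand; the paths below assume the hypothetical collection example:

   ls -l $SEARCH_HOME/data/example/log/example.lock
   rm $SEARCH_HOME/data/example/log/example.lock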

State files

The various update scripts will also write to a state file at $SEARCH_HOME/data/<collection name>/log/<collection name>.state. This state file will contain text indicating the state of the relevant collection:

  • normal
  • deleting
  • updating
  • gathering
  • crawling
  • stopping_crawl
  • halting_crawl

An additional collection.state file is written to the $SEARCH_HOME/conf/<collection name>/ directory for web collections. This file contains the following parameter:

  • incremental_gathers_remaining

which stores the number of incremental gathers that will be done before a full gather is triggered. The value is decremented each time an incremental crawl is done and will be reset to the value of schedule.incremental_crawl_ratio when it reaches zero.
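
Both files are plain text, so the current state of a collection can be checked directly from the shell (hypothetical collection example):

   cat $SEARCH_HOME/data/example/log/example.state
   cat $SEARCH_HOME/conf/example/collection.state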

Deleting a collection

Administrators may fully delete a collection using the delete-collection.pl script. This script will delete all data and configuration associated with the collection:

  • gathered documents
  • indexes
  • configuration files
  • scheduled updates
  • logs

User configuration files are also edited to remove references to the deleted collection.
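
For example, to delete the hypothetical collection example:

   $SEARCH_HOME/bin/delete-collection.pl $SEARCH_HOME/conf/example/collection.cfg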

Command line scripts reference

Detailed internal documentation for many of these scripts can be viewed with the standard Perl perldoc command.
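
For example:

   perldoc $SEARCH_HOME/bin/update.pl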

  • **new-collection.pl** creates a collection, including its collection.cfg file.
   new-collection.pl <collection name> <collection type> [start url]
  • **create-collection.pl** creates a collection from an already existing collection.cfg file.
   create-collection.pl <collection config>
  • **delete-collection.pl** deletes a collection, including its gathered documents, indexes, configuration, scheduled updates and logs. It also removes references to the now non-existent collection from user configuration files.
   delete-collection.pl <collection config>
  • **update.pl** is a wrapper around the entire update process, and calls the appropriate update sub-scripts.
   update.pl <collection config> [update type: -incremental, -reindex, …]
  • **crawl.pl** gathers documents for web collections.
   crawl.pl <collection config> [update type: -check, -incremental, -instant-update]
  • **filecopy.pl** gathers documents for filecopy collections.
   filecopy.pl <collection config> [other options]
  • **dbgather.pl** gathers documents for database collections.
   dbgather.pl <collection config> [--full] [other options]
  • **index.pl** calls Padre to index a collection's documents.
   index.pl <collection config> [-reindex] [-instant-update]
  • **make_report.pl** processes a collection's data files, producing reports on their contents.
   make_report.pl <--collection "collection config"> [--log] …
   outliers-log-processing.pl [--collection "collection name"]
  • **swap-views.pl** swaps the live and offline views of a collection after a successful update, placing the newly gathered and indexed data in live for querying, and safely storing the older gathered and indexed data in offline.
   swap-views.pl <collection config> [-force]
  • **archive-log.pl** archives a collection's queries.log and clicks.log files to the collection's archive directory.
   archive-log.pl <collection config> [view]
  • **reports-load-queries-log.pl** reads a collection's log files and stores a binary database for reporting purposes. The admin UI report frontend reads this database when displaying reports.
   reports-load-queries-log.pl <--collection "collection internal name"> [-v] [-v] [-v] [-v]
  • **reports-send-email.pl** sends email query reports to users who have requested them for the specified collection (or for all collections if none is specified).
   reports-send-email.pl [--collection "collection name"]
  • **modify_perl_hashbang_line.pl** updates the interpreter (#!) line of the installed Perl scripts.
   modify_perl_hashbang_line.pl
  • **mediator.pl** triggers local or remote administrative tasks.
   mediator.pl --help
  • **change_password.sh** changes an administration user's password.
   $SEARCH_HOME/web/bin/change_password.sh <user> <password>
