Configuring search analytics
Excluding items from analytics reports
Excluding queries containing or matching specific terms
- Queries that exactly match specific terms can be removed from the analytics reports by adding the term to the reporting blacklist.
- Queries containing specific terms can be removed from the analytics reports by adding the term to the reporting stop words list.
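The difference between the two lists can be illustrated with a short Python sketch. This is not Funnelback's implementation: the function name `is_excluded`, the case-insensitive comparison, and the whitespace tokenisation are all assumptions made for illustration.

```python
def is_excluded(query, blacklist, stop_words):
    """Return True if a query would be dropped from reports.

    Assumptions (not taken from Funnelback itself): both lists hold
    lowercase terms, comparison is case-insensitive, and a query
    "contains" a term when the term is one of its whitespace-separated words.
    """
    normalised = query.strip().lower()
    # Blacklist: the whole query must match a listed term exactly.
    if normalised in blacklist:
        return True
    # Stop words: any single word of the query matching is enough.
    return any(word in stop_words for word in normalised.split())
```

Under these assumptions, a blacklist entry of `test` removes the query `test` but keeps `test page`, whereas a stop word entry of `test` removes both.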
Excluding queries from specific IP addresses
- Queries from specific IP addresses can be removed from the analytics reports by adding the IP address to the reporting blacklist.
Configuring search analytics alerts
The analytics system can be configured, per collection, to send notification emails that provide either a regular analytics summary or trend alerts.
Email notifications can be configured for the following:
- Query report summary: The notification frequency can be chosen.
- Trend alert notifications: Trend alerts can be sent out when a query spike is detected to allow real-time action to be taken as required.
The alerts are sent to one or more email addresses.
To configure email alerts for analytics, click the edit analytics email settings link in the collection's analyse tab.
The email settings provide the following options:
- Email sender: Configures the email address used as the from address for analytics emails. Must contain a single valid email address.
- Email addresses: Configures the list of recipients for analytics alerts. Must contain a comma separated list of email addresses.
- Query report summary: Configures the frequency of query report alerts.
- Trend alert email: Enables email alerting for trend alerts.
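The constraints on the two address fields can be checked before the values are entered. The Python sketch below is a hypothetical pre-check, not part of Funnelback: the helper names and the deliberately loose address pattern are assumptions, and Funnelback's own validation may differ.

```python
import re

# Loose, illustrative pattern: one "@", no whitespace or commas,
# and at least one dot in the domain part. Not a full RFC 5322 check.
_EMAIL = re.compile(r"[^@\s,]+@[^@\s,]+\.[^@\s,]+")


def is_valid_sender(value):
    """Email sender: must be exactly one address."""
    return _EMAIL.fullmatch(value.strip()) is not None


def parse_recipients(value):
    """Email addresses: a comma separated list of addresses."""
    addresses = [part.strip() for part in value.split(",") if part.strip()]
    invalid = [a for a in addresses if _EMAIL.fullmatch(a) is None]
    if invalid:
        raise ValueError("invalid address(es): " + ", ".join(invalid))
    return addresses
```

For example, `parse_recipients("a@example.com, b@example.org")` yields two addresses, while passing the same string to `is_valid_sender` fails because the sender field accepts only one address.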
Please note that Funnelback must be configured with a valid SMTP server during installation for email to be sent successfully. If required, SMTP settings can be adjusted in $SEARCH_HOME/conf/global.cfg.
Search analytics update options
Funnelback provides a number of configuration options that affect how analytics is built.
Report update options
|analytics.max_heap_size||Configure the maximum memory (Java heap size) allocated for analytics updates.|
|analytics.scheduled_database_update||Control whether reports for the collection are updated on a scheduled basis.|
|userid_to_log||Controls how logging of IP addresses is performed.|
Query report options
|analytics.data_miner.range_in_days||Length of time range (in days) the analytics data miner will go back from the current date when mining query and click log records.|
|analytics.reports.max_facts_per_dimension_combination||Advanced setting: controls the amount of data that is stored by query reports.|
|analytics.reports.checkpoint_rate||Advanced setting: controls the rate at which the query reports system checkpoints data to disk.|
|analytics.reports.disable_incremental_reporting||Disable incremental reports database updates. If set, all existing query and click logs will be processed for each reports update.|
Trend alerts options
|analytics.outlier.day.minimum_average_count||Trend alerts: Control the minimum number of occurrences of a query required before a day pattern can be detected.|
|analytics.outlier.day.threshold||Trend alerts: Control the day pattern detection threshold.|
|analytics.outlier.exclude_collection||Trend alerts: Disable query spike detection for a collection.|
|analytics.outlier.exclude_profiles||Trend alerts: Disable query spike detection for a profile.|
|analytics.outlier.hour.minimum_average_count||Trend alerts: Control the minimum number of occurrences of a query required before an hour pattern can be detected.|
|analytics.outlier.hour.threshold||Trend alerts: Control the hour pattern detection threshold.|
Query log naming and rotation
When operating Funnelback in a multi-server configuration, some care must be taken to ensure query logs are available to the trend alerts system. For performance reasons, the trend alerts system requires that archived log files be identified with a date stamp in the file name (for example queries.log.20090902.gz), as these date stamps are used to restrict the logs required for pattern analysis.
Standard practice within a multiple server set-up would be to transfer all query log files to a server responsible for analytics, retaining the date stamp in the file name and adding a hostname to ensure log names are unique.
Please also note that trend alerts may fail to detect some queries when processing historical data for collections which have been updated less frequently than once per month. This issue can be rectified by manually splitting any query log files spanning more than a day into individual logs with date stamps.
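The naming practice described in this section can be sketched in Python. The placement of the hostname before the date stamp, the helper names, and the date-range filter are assumptions for illustration only. The source specifies only that the date stamp must be retained and that a hostname is added for uniqueness, so confirm the exact file name pattern your Funnelback version expects before renaming real logs.

```python
import re
from datetime import datetime

# Matches the trailing date stamp in names such as queries.log.20090902.gz
_STAMP = re.compile(r"\.(\d{8})\.gz$")


def unique_log_name(original_name, hostname):
    """Insert a hostname while keeping the date stamp.

    Hypothetical convention: queries.log.<hostname>.<YYYYMMDD>.gz
    """
    match = _STAMP.search(original_name)
    if match is None:
        raise ValueError("no date stamp in " + original_name)
    prefix = original_name[:match.start()]
    return f"{prefix}.{hostname}.{match.group(1)}.gz"


def log_date(name):
    """Return the date encoded in the file name, or None if absent or invalid."""
    match = _STAMP.search(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        return None


def logs_in_range(names, start, end):
    """Keep only logs whose date stamp falls within [start, end]."""
    selected = []
    for name in names:
        stamp = log_date(name)
        if stamp is not None and start <= stamp <= end:
            selected.append(name)
    return selected
```

Filtering on the file name alone, as `logs_in_range` does, shows why the date stamp matters: logs outside the analysis window can be skipped without being opened.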
Analytics reports hardware requirements
The table below gives minimum hardware requirements for processing various query log volumes.
|Number of queries||Minimum memory||Minimum hard disk space|
|>= 20 million over 3 years||2.5GB||10GB|
|<= 1 million||500MB||4GB|
When updating query reports for a collection with a large number of queries, the analytics.max_heap_size collection setting should be increased.