Configuring search analytics

Excluding items from analytics reports

Excluding queries containing or matching specific terms

  • Queries that exactly match specific terms can be removed from the analytics reports by adding the term to the reporting blacklist.
  • Queries containing specific terms can be removed from the analytics reports by adding the term to the reporting stop words list.
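
The difference between the two rules can be illustrated with a minimal Python sketch. This is not Funnelback's implementation: the function name `is_excluded`, the case-insensitive comparison, and the word-level reading of "containing" are all assumptions made for illustration.

```python
def is_excluded(query, exact_terms, contained_terms):
    """Return True if a query would be dropped under either rule.

    Hypothetical helper: matching is assumed to be case-insensitive,
    and "containing" is assumed to mean "has the term as a whole word".
    """
    normalised = query.strip().lower()

    # Exact-match rule: the whole query equals a listed term.
    if normalised in {term.lower() for term in exact_terms}:
        return True

    # Containment rule: any word of the query equals a listed term.
    words = normalised.split()
    return any(term.lower() in words for term in contained_terms)
```

Under these assumptions, listing `test` as an exact-match term removes the query `test` but keeps `test page`, whereas listing it as a contained term removes both.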

Excluding queries from specific IP addresses

  • Queries originating from specific IP addresses can be removed from the analytics reports by adding the IP address to the reporting blacklist.

Configuring search analytics alerts

The analytics system can be configured, per collection, to send email notifications containing either a regular analytics summary or trend alerts.

Email notifications can be configured for the following:

  • Query report summary: Sent on a regular basis, at a frequency that can be chosen.
  • Trend alert notifications: Sent when a query spike is detected, so that action can be taken promptly.

The alerts are sent to one or more email addresses.

To configure email alerts for analytics, click the edit analytics email settings link in the collection's analyse tab.

The email settings provide the following options:

  • Email sender: Configures the email address used as the from address for analytics emails. Must contain a single valid email address.
  • Email addresses: Configures the list of recipients for analytics alerts. Must contain a comma separated list of email addresses.
  • Query report summary: Configures the frequency of query report alerts.
  • Trend alert email: Enables email alerting for trend alerts.
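
The format constraints on the two address fields can be checked before the values are saved. The sketch below is illustrative only: the function names `parse_recipients` and `check_sender` are hypothetical, and the regular expression is a deliberately loose approximation of a valid email address, not the validation Funnelback itself applies.

```python
import re

# Loose pattern (an assumption): something@something.something, no spaces or commas.
_ADDRESS = re.compile(r"^[^@\s,]+@[^@\s,]+\.[^@\s,]+$")


def parse_recipients(value):
    """Split a comma separated list and reject entries that do not look like addresses."""
    addresses = [part.strip() for part in value.split(",") if part.strip()]
    invalid = [address for address in addresses if not _ADDRESS.match(address)]
    if invalid:
        raise ValueError(f"invalid address(es): {invalid}")
    return addresses


def check_sender(value):
    """The sender field must hold exactly one address."""
    addresses = parse_recipients(value)
    if len(addresses) != 1:
        raise ValueError("sender must be a single email address")
    return addresses[0]
```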

Note that Funnelback must be configured with a valid SMTP server for email to be sent successfully. The SMTP server is set during installation; if required, the settings can be adjusted afterwards in $SEARCH_HOME/conf/global.cfg.

Search analytics update options

Funnelback provides a number of configuration options that affect how analytics is built.

Report update options

| Option | Description |
| --- | --- |
| analytics.max_heap_size | Configure the maximum memory (Java heap size) allocated for analytics updates. |
| analytics.scheduled_database_update | Control whether reports for the collection are updated on a scheduled basis. |
| userid_to_log | Controls how logging of IP addresses is performed. |

Query report options

| Option | Description |
| --- | --- |
| analytics.data_miner.range_in_days | Length of time range (in days) the analytics data miner will go back from the current date when mining query and click log records. |
| analytics.reports.max_facts_per_dimension_combination | Advanced setting: controls the amount of data that is stored by query reports. |
| analytics.reports.checkpoint_rate | Advanced setting: controls the rate at which the query reports system checkpoints data to disk. |
| analytics.reports.disable_incremental_reporting | Disable incremental reports database updates. If set, all existing query and click logs will be processed for each reports update. |

Trend alerts options

| Option | Description |
| --- | --- |
| analytics.outlier.day.minimum_average_count | Trend alerts: Control the minimum number of occurrences of a query required before a day pattern can be detected. |
| analytics.outlier.day.threshold | Trend alerts: Control the day pattern detection threshold. |
| analytics.outlier.exclude_collection | Trend alerts: Disable query spike detection for a collection. |
| analytics.outlier.exclude_profiles | Trend alerts: Disable query spike detection for a profile. |
| analytics.outlier.hour.minimum_average_count | Trend alerts: Control the minimum number of occurrences of a query required before an hour pattern can be detected. |
| analytics.outlier.hour.threshold | Trend alerts: Control the hour pattern detection threshold. |

Query log naming and rotation

When operating Funnelback in a multi-server configuration, some care must be taken to ensure query logs are available to the trend alerts system. For performance reasons trend alerts requires that archived log files be identified with a date stamp in the file name (for example queries.log.20090902.gz), as these date stamps are used to restrict the logs required for pattern analysis.
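
To illustrate why the date stamp matters, here is a minimal Python sketch of selecting archived logs by the date in their file name, without opening any file. It is not Funnelback's code: the helper names `log_date` and `logs_in_range` are hypothetical, and the pattern assumes the stamp is an eight-digit YYYYMMDD value immediately before the `.gz` suffix, as in the example above.

```python
import re
from datetime import date, datetime

# Assumed pattern: an eight-digit date stamp directly before ".gz".
_STAMP = re.compile(r"\.(\d{8})\.gz$")


def log_date(filename):
    """Return the date encoded in an archived log name, or None if absent or invalid."""
    match = _STAMP.search(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None


def logs_in_range(filenames, start, end):
    """Keep only the logs whose date stamp falls within [start, end]."""
    selected = []
    for name in filenames:
        stamp = log_date(name)
        if stamp is not None and start <= stamp <= end:
            selected.append(name)
    return selected
```

A log without a recognisable stamp is skipped by `logs_in_range`, which mirrors the point made above: files that lack a date stamp cannot be selected by date.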

Standard practice within a multiple server set-up would be to transfer all query log files to a server responsible for analytics, retaining the date stamp in the file name and adding a hostname to ensure log names are unique.

Please also note that trend alerts may fail to detect some queries when processing historical data for collections which have been updated less frequently than once per month. This issue can be rectified by manually splitting any query log files spanning more than a day into individual logs with date stamps.

Analytics reports hardware requirements

The table below gives minimum hardware requirements for processing various query log volumes.

Number of queriesMinimum memoryMinimum hard disk space
>= 20 million over 3 years2.5GB10GB
10 million1.5GB8GB
5 million1GB6GB
<= 1 million500MB4GB

When updating query reports for a collection with a large number of queries, the analytics.max_heap_size collection setting should be increased.
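
As a rough aid for reading the table above, the following Python sketch looks up the minimum memory and disk space for a given query volume. The function name `minimum_hardware` is hypothetical, and two choices are assumptions rather than anything the table states: volumes that fall between two rows are rounded up to the next row, and anything above 10 million queries uses the largest row.

```python
# (upper bound on query count, minimum memory, minimum disk), taken from the table.
TIERS = [
    (1_000_000, "500MB", "4GB"),
    (5_000_000, "1GB", "6GB"),
    (10_000_000, "1.5GB", "8GB"),
]

# Largest row of the table (>= 20 million queries over 3 years).
LARGEST = ("2.5GB", "10GB")


def minimum_hardware(query_count):
    """Return (memory, disk) for the smallest tier that covers query_count.

    Assumption: in-between volumes are rounded up to the next tier.
    """
    for upper_bound, memory, disk in TIERS:
        if query_count <= upper_bound:
            return memory, disk
    return LARGEST
```

Rounding up is the conservative choice here, since the table gives minimum requirements and says nothing about volumes between its rows.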

v15.16.0