Funnelback 11.0.0
Release notes for Funnelback 11.0.0
Released: 26 August 2011
New Features
- Redeveloped query processing layer for more efficient query processing and improved search presentation customisation.
- New Push collection type for feeding non-web content into a Funnelback index from a remote system over time, without the scalability limitations of instant updates.
- New Directory collection type for searching Active Directory and LDAP repositories.
- Administrator search tuning system allowing search ranking factors to be optimised for specific collections.
- Content optimisation system which provides detailed guidelines for content authors on how to improve a specific result's ranking.
- Preview and publish system for developing search form files without affecting production search presentation.
- Ability to blend result sets for multiple queries from spelling suggestions, synonyms and other sources into a single result list.
- Assorted web crawling improvements including support for revisiting infrequently changing content less often.
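Blending can be pictured as merging several ranked lists into one. The Python sketch below is a minimal illustration under stated assumptions, not Funnelback's implementation. It assumes each source (for example the original query, a spelling suggestion or a synonym expansion) returns a list of (url, score) pairs, that scores are comparable across sources, and that a hypothetical per-source weight scales each source's scores. A URL returned by several sources keeps its best weighted score.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: a ranked list of (url, score) pairs per source.
ResultList = List[Tuple[str, float]]


def blend_results(sources: Dict[str, ResultList],
                  weights: Dict[str, float]) -> ResultList:
    """Merge several ranked result lists into a single list.

    Assumptions (illustrative only): scores are comparable across sources,
    each source's scores are multiplied by its weight (default 1.0), and a
    URL returned by more than one source keeps its highest weighted score.
    """
    best: Dict[str, float] = {}
    for source, results in sources.items():
        weight = weights.get(source, 1.0)
        for url, score in results:
            weighted = score * weight
            if url not in best or weighted > best[url]:
                best[url] = weighted
    # Highest blended score first; ties are broken by URL for a stable order.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    blended = blend_results(
        {
            "original": [("http://example.com/a", 0.9), ("http://example.com/b", 0.5)],
            "spelling": [("http://example.com/c", 0.8), ("http://example.com/a", 0.7)],
        },
        weights={"original": 1.0, "spelling": 0.5},
    )
    for url, score in blended:
        print(f"{score:.2f} {url}")
```

Down-weighting the suggestion-derived sources keeps results for the query the user actually typed near the top, while still surfacing results the original query missed.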
Upgrade Issues
- Result summaries are no longer highlighted by default, so that form authors have complete control over query highlighting. Use the <s:boldicize /> tag in your existing forms to restore summary highlighting.
- When upgrading TRIM collections from version 10, a full update of the collection is required to update record URLs to support the new instant update functionality.
- The <s:boldicize /> and <s:italicize /> tags now use the <strong> and <em> HTML tags instead of <b> and <i>. If your CSS stylesheet targets <b> or <i>, you'll need to update it.
- Using the crawler form interaction system no longer disables cookie support by default. If a collection uses the form interaction system and can't crawl password-protected sites successfully after the upgrade, explicitly disable cookie support by setting crawler.accept_cookies=false.
- The default treatment of nepotistic links has been changed to limit their effect. This reduces indexing time and should improve ranking in most web collections, particularly large ones covering multiple domains. This change can be reverted by setting the -nep_action indexer option value to zero.
- The isolated mode filter has been renamed IsolatedFilterProvider (previously IsolatedPublishorFilterProvider) and can now use any filter class. It uses the Tika filter provider by default, so you'll need to update your collection configurations if you want to continue using the Davisor filters in isolated mode.
- The <s:Truncate> tag no longer supports the stripMiddle attribute.
- The web crawler now skips revisiting a proportion of infrequently changing pages during each crawl by default. This behaviour can be configured through the crawler revisit policy.
- Data reports are now specific to web collections and are no longer available for other collection types.
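A nepotistic link is commonly understood as a link that exists for reasons other than the merit of its target, in the simplest case a link between pages on the same site. The Python sketch below illustrates only that simplified idea. It is an assumption, not Funnelback's indexer logic, and the actual -nep_action behaviour may differ. It treats a link as nepotistic when source and target share a hostname, and gives such links a hypothetical reduced weight (NEPOTISTIC_WEIGHT) when totalling inbound link weight per page.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple
from urllib.parse import urlsplit

# Hypothetical weight for same-host links; 1.0 would mean "no discount".
NEPOTISTIC_WEIGHT = 0.1


def is_nepotistic(source_url: str, target_url: str) -> bool:
    """Simplified rule (assumption): a link is nepotistic if both ends share a host.

    urlsplit(...).hostname is already lower-cased; URLs without a host
    (e.g. relative paths) are never treated as nepotistic here.
    """
    source_host = urlsplit(source_url).hostname or ""
    target_host = urlsplit(target_url).hostname or ""
    return source_host != "" and source_host == target_host


def inbound_link_weight(links: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Sum inbound link weight per target URL, discounting nepotistic links."""
    totals: Dict[str, float] = defaultdict(float)
    for source, target in links:
        totals[target] += NEPOTISTIC_WEIGHT if is_nepotistic(source, target) else 1.0
    return dict(totals)
```

Under this rule, a page linked once from another site and once from its own site receives a total weight of 1.1 rather than 2.0, so internal navigation links contribute little to its apparent popularity.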
Selected improvements and bugfixes
- Increased permitted number of meta collection components.
- Added ability to analyse URLs remaining in a web crawl frontier.
- Support for gathering multiple Exchange mailboxes through the EntropySoft connector in a single collection.
- Added ability for web crawler to read cookies from a file on startup.
- Improved crawler form interaction cookie handling.
- Improved handling of non-UTF-8 web content.
- Improved query highlighting in results, especially with UTF-8 characters.
- Corrected handling of UTF-8 form files.
- Support for collection profiles when tuning search quality.
- Added ability to index HTTP header and Facebook Open Graph protocol metadata.
- Fixed incorrect addition of collection name to C metadata by default.
- Reworked query completion JavaScript to avoid conflicts with other JavaScript libraries.
- Support for multiple facets per tag in FreeMarker templates.
- Added distance from origin to XML output when searching geospatial data.
- Reduced warning messages from result transforms on missing metadata.
- Added support for resolving relative links within the IncludeURL form tag.
- Better handling of special characters in indexer options.
- Added spelling whitelist file for words which should be provided as spelling suggestions.
- Changed boldicize tag to use HTML strong tags rather than bold tags.
- Changed query processing ordering to apply spelling suggestions after synonym expansion.
- Introduced ability to execute custom code during query processing.
- Eliminated log files produced by inactive crawler threads.
- Fixed incorrect permission settings on init.d scripts.
- Improved layout and display of the Funnelback administration interface.
- Fixed handling of column names with special characters during database gathering.
- Added setup documentation for IIS 7.5.
- Automated installation of 64-bit versions of search indexing and query processing components.
- Improved crawler tolerance for timeouts on seed pages.
- Improved index 'warm up' scripts.
- Fixed sorting of results when early binding security is used.
- Added headers to CSV exports from the analytics dashboard.
- Added support for instant updates on TRIM collections.
- Improved JavaScript link extraction logic to avoid some invalid link cases.
- Improved ordering of collections in Funnelback's administration interface.
- Added tools for managing WARC archive files.
- Fixed collection configuration cache clearing under mod_perl.
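These notes do not say how the distance from origin reported in the XML output for geospatial searches is calculated. A common choice for latitude/longitude data is the haversine great-circle distance. The Python sketch below assumes that formula, a spherical Earth with a mean radius of 6371 km, and coordinates in decimal degrees. It illustrates the general technique only and may not match the figure Funnelback reports.

```python
import math

# Assumption: spherical Earth model using the mean radius in kilometres.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

The haversine form stays numerically stable for the short distances typical of "near me" searches, which is why it is often preferred over the spherical law of cosines.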