Funnelback 11.0.0

Release notes for Funnelback 11.0.0

Released: 26 August 2011

New Features

Redeveloped query processing layer for more efficient query processing and improved search presentation customisation.
New Push collection type for feeding non-web content into a Funnelback index from a remote system over time, without the scalability limitations of instant updates.
New Directory collection type for searching Active Directory and LDAP repositories.
Administrator search tuning system allowing search ranking factors to be optimised for specific collections.
Content optimisation system which provides detailed guidelines for content authors on how to improve a specific result's ranking.
Preview and publish system for developing search form files without affecting production search presentation.
Ability to blend result sets for multiple queries from spelling suggestions, synonyms and other sources into a single result list.
Assorted web crawling improvements including support for revisiting infrequently changing content less often.

Upgrade Issues

Result summaries are no longer highlighted by default, so that form authors have complete control over query highlighting. To restore highlighting in summaries, use the <s:boldicize /> tag in your existing forms.
When upgrading TRIM collections from version 10, a full update of the collection is required to update the URLs of records to support the new instant update functionality.
The <s:boldicize /> and <s:italicize /> tags now emit <strong> and <em> HTML tags instead of <b> and <i>. If your CSS stylesheet targets the <b> or <i> elements, you will need to update it.
Using the crawler form interaction system no longer disables cookie support by default. If a collection uses the form interaction system and can no longer crawl password-protected sites after the upgrade, explicitly disable cookie support by setting crawler.accept_cookies=false.
The default treatment of nepotistic links has been changed to limit their effect. This will reduce indexing time, and should have a positive effect on the ranking in most web collections, particularly large ones covering multiple domains. This change can be reverted by setting the -nep_action indexer option value to zero.
The isolated mode filter has been renamed IsolatedFilterProvider (previously IsolatedPublishorFilterProvider) and can now use any filter classes. It uses the Tika filter provider by default, so you will need to update your collection configurations if you want to continue using the Davisor filters in isolated mode.
The <s:Truncate> tag no longer supports the stripMiddle attribute.
The default behaviour for the web crawler is now to skip revisiting a proportion of infrequently changing pages during each crawl. This behaviour can be configured through the crawler revisit policy.
Data reports are now specific to web collections and are no longer available for other collection types.
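
The stylesheet update needed for the <strong> and <em> change above can be partly automated. The following is a minimal Python sketch, not a Funnelback tool: the function name migrate_selectors and the regular expressions are illustrative assumptions. It rewrites bare b and i type selectors to strong and em, and leaves class names, ids and declarations untouched. It does not handle CSS comments or attribute selectors, so review the output before deploying it.

```python
import re

# Old element name -> element name now emitted by the Funnelback 11 tags.
REPLACEMENTS = {"b": "strong", "i": "em"}

# A bare b or i type selector: not part of a class, id, or longer name.
TYPE_SELECTOR = re.compile(r"(?<![\w.#-])(b|i)(?![\w-])")

# Selector text is taken to be whatever precedes an opening brace.
SELECTOR_BLOCK = re.compile(r"([^{}]+)\{")


def migrate_selectors(css: str) -> str:
    """Rewrite b/i type selectors to strong/em, leaving declarations alone."""

    def fix_selector(match: re.Match) -> str:
        selector = TYPE_SELECTOR.sub(
            lambda m: REPLACEMENTS[m.group(1)], match.group(1)
        )
        return selector + "{"

    return SELECTOR_BLOCK.sub(fix_selector, css)


if __name__ == "__main__":
    # Hypothetical stylesheet fragment used only for illustration.
    print(migrate_selectors(".summary b { font-weight: bold; }"))
```

Running the sketch over a copy of the stylesheet and comparing the result with the original is a safer workflow than editing the production file in place.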

Selected improvements and bugfixes

Increased permitted number of meta collection components.
Added ability to analyse URLs remaining in a web crawl frontier.
Support for gathering multiple Exchange mailboxes through the EntropySoft connector in a single collection.
Added ability for web crawler to read cookies from a file on startup.
Improved crawler form interaction cookie handling.
Improved handling of non-UTF-8 web content.
Improved query highlighting in results, especially with UTF-8 characters.
Corrected handling of UTF-8 form files.
Support for collection profiles when tuning search quality.
Added ability to index HTTP header and Facebook Open Graph protocol metadata.
Fixed incorrect addition of collection name to C metadata by default.
Reworked query completion JavaScript to avoid conflicts with other JavaScript libraries.
Support for multiple facets per tag in FreeMarker templates.
Added distance from origin to XML output when searching geospatial data.
Reduced warning messages from result transforms on missing metadata.
Added support for resolving relative links within the IncludeURL form tag.
Better handling of special characters in indexer options.
Added spelling whitelist file for words which should be provided as spelling suggestions.
Changed boldicize tag to use HTML strong tags rather than bold tags.
Changed query processing ordering to apply spelling suggestions after synonym expansion.
Introduced ability to execute custom code during query processing.
Eliminated log files produced by inactive crawler threads.
Fixed incorrect permission settings on init.d scripts.
Improved layout and display of the Funnelback administration interface.
Fixed handling of column names with special characters during database gathering.
Added setup documentation for IIS 7.5.
Automated installation of 64-bit versions of search indexing and query processing components.
Improved crawler tolerance for timeouts on seed pages.
Improved index 'warm up' scripts.
Fixed sorting of results when early binding security is used.
Added headers to CSV exports from the analytics dashboard.
Added support for instant updates on TRIM collections.
Improved JavaScript link extraction logic to avoid some invalid link cases.
Improved ordering of collections in Funnelback's administration interface.
Added tools for managing WARC archive files.
Fixed collection configuration cache clearing under mod_perl.
