Historical release notes
This page provides release note information for older Funnelback releases which are no longer supported. The release notes for recent versions of Funnelback are available on the Release notes page.
Released : 30th May 2015
14.2.3 - Selected improvements and bug fixes
- Addresses an issue when using @groovy collection directories where the collection's root data directory would be deleted during updates.
- Fixed an issue where fully matching result counts could be calculated incorrectly when using result collapsing and scoping operators.
- Fixed an issue where Push might create an errant Vacuum task while it is stopping, which could run while the collection claimed to be stopped.
- The service startup workaround described in the 14.2.2 errata has now become the default. Some more detail is available at https://sourceforge.net/p/yajsw/discussion/810310/thread/e254f319/
Released : 29th April 2015
14.2.2 - Selected improvements and bug fixes
- Added support for configuring the Jetty webserver keystore via
- Provided additional documentation and examples for Configuring Funnelback's embedded web server.
- Improved performance for Push collection operations.
- Improved error case handling for missing faceted navigation files in configuration upgrades.
- Added additional troubleshooting options for Push collections.
- Fixed redirect bug in content auditor which could create a loop if deployed by a load balancer with a different port configuration.
- Fixed redirect bug in SEO Auditor which was preventing use of queries with some operators.
- Reduced unnecessary/repetitive warnings in indexer logs.
- Fixed problem which disallowed access to Content Auditor when the jetty port and actual port differed.
- Fixed handling of external metadata files with values containing a colon character.
- Fixed filtering of binary document types such as PDFs and Word documents in Push collections.
- Fixed handling of regular expression characters in reports for facet categories.
- Fixed a memory leak in Push when using custom filters.
14.2.2 - Upgrade Issues
- Although the index format is unchanged in 14.2.2, earlier versions of Funnelback will not operate correctly with indexes generated by 14.2.2. This would not affect a normal upgrade process, but may be relevant in a half-upgraded multi-server setup or an upgrade rollback scenario.
14.2.2 - Errata
- Some reports of Funnelback services failing to start correctly in some environments have been received. Applying the workaround described at http://sourceforge.net/p/yajsw/discussion/810311/thread/9f7800e0/ to service files in INSTALL_DIRECTORY/services/ appears to resolve the issue in some cases, and this change is being evaluated as a future product default. Please also note that this issue appears to occur more commonly where the Scientific Linux operating system is used. Switching to one of Funnelback's officially supported operating systems may also resolve the issue.
Released : 23rd March 2015
14.2.1 - Selected improvements and bug fixes
- Fixed a problem in the collection update script when processing faceted_navigation.cfg.
- Fixed a problem which caused some collections on Windows to remain in an updating state after the completion of an update.
- Restored Jetty webserver rewrite rule support.
Released : 13th March 2015
14.2.0 - New features
- New Content Auditor feature producing reports on a collection's content.
- Additional metadata classes with multi-character names are now supported.
- Support for domain-level collection redirection.
- Support for the tilde operator to up-weight specific query terms in daat mode.
14.2.0 - Selected improvements and bug fixes
- Push collections are now able to provide parent meta collections with spelling and query suggestions.
- Introduced a setting to control storage of URLs which have no content after filtering.
- Version upgrades of a number of components, including the Java runtime (now version 8), the Groovy runtime (now 2.3), the Jetty web server (now 9.1).
- Groovy filters can now be placed under a @groovy/ folder under the collection configuration folder.
- Access to the modern UI search interface via the admin web server port now requires authentication as it did prior to Funnelback version 12.
- IP address restrictions are now consistently applied to all modern UI endpoints except log/redirect and search-session related endpoints.
- Search session and history are now supported on non-web collections.
14.2.0 - Notable Internal Changes
- The groovy binary has changed location from tools/groovy-1.8.6/bin/groovy[.bat] to tools/groovy/bin/groovy[.bat] along with being upgraded to 2.3.7.
- The blat binary has changed location from wbin/blat-2.6.2/full/blat.exe to wbin/blat/full/blat.exe along with being upgraded to 3.2.3.
- When in count_urls mode, padre will no longer count URL protocols as a separate component. This does not affect any existing facet types, however the change would be visible in the raw padre data model.
- Instant update on Filecopy collections will require valid URIs when deleting.
- The mechanism for starting the Funnelback Jetty web server has changed from XML configuration files to Java code, with a Groovy script customization mechanism available for custom deployments. See Configuring Funnelback's embedded web server for details.
14.2.0 - Upgrade Issues
- If upgrading from version 14 you must run a bash script, supplied by the installer when run, before the installation can be upgraded. See Installing_and_Upgrading.
- The syntax for listing metadata classes has changed from a simple list of characters (e.g. xyz) to a comma separated list surrounded by square brackets (e.g. [x,y,z]) to support multi-character class names. Collection and profile configuration files are automatically updated during upgrade, however any external or custom configuration of such lists must be updated.
- Funnelback now indexes metadata field boundaries by default (previously enabled only through the -ifb indexer option). A new -noifb option is available to disable this behavior if needed, however the change is not known or expected to affect any implementations.
- crawler.store_empty_content_urls must now be set to true for any collection where URLs with empty content should be stored. This was never done prior to 13.2, and was always done in 13.2 and 14.0, but was not desirable in most cases, so a setting with a default of false has been introduced.
- collection.cfg will now return live for instant update workflow and steps after swapping views, whereas previously it returned offline. Any existing workflow steps may need updating to account for this.
- Image scaling will now return .png images by default, instead of returning the type of the input image. When a different format is desired, this must be specified using the format cgi parameter. See Presenting Images for further details.
- Connector collections, which relied on a third party framework which is no longer available, are no longer supported. Such collections should be removed before upgrading to 14.2.0 or later versions.
- Push collections created prior to version 14.0.0 (also known as Continuous Updating collections) are no longer supported. Such collections should be upgraded to the new Push system or removed before upgrading to 14.2.0 or later versions. Contact Funnelback for assistance in migrating any existing collections of this type to the new push collection system.
- Filecopy now stores properly encoded URLs (e.g. spaces encoded to %20, etc.). This will result in a re-crawl of all documents whose filenames (or containing folders) need encoding, causing the first crawl after the upgrade to take longer than usual, as those documents will be re-filtered rather than being copied from the previous crawl.
- Funnelback now internally uses a view URL parameter within the modern and classic UI systems. Implementations should avoid setting any value for this parameter.
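The metadata class list syntax change described above (for example xyz becoming [x,y,z]) can be illustrated with a small Python sketch for external or custom configuration that is not updated automatically. The function names are hypothetical helpers, not part of Funnelback, and the sketch assumes a legacy list is simply a run of single-character class names.

```python
def to_bracketed_class_list(legacy: str) -> str:
    """Convert a legacy single-character class list (e.g. 'xyz') to '[x,y,z]'."""
    legacy = legacy.strip()
    # Values already in the new bracketed syntax are left untouched.
    if legacy.startswith("[") and legacy.endswith("]"):
        return legacy
    # Each character of the legacy value is one metadata class name.
    return "[" + ",".join(legacy) + "]"


def parse_class_list(value: str) -> list[str]:
    """Parse the bracketed syntax into a list of (possibly multi-character) names."""
    value = value.strip()
    if not (value.startswith("[") and value.endswith("]")):
        raise ValueError(f"not a bracketed class list: {value!r}")
    inner = value[1:-1]
    return [name.strip() for name in inner.split(",") if name.strip()]


if __name__ == "__main__":
    print(to_bracketed_class_list("xyz"))
    print(parse_class_list("[x,y,author]"))
```

The bracketed form is what makes multi-character names such as the illustrative "author" unambiguous, which a plain run of characters could not express.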
Released : 7th October 2014
14.0.1 - Selected improvements and bug fixes
- Introduced better pagination for the WCAG Compliance Auditor URL listing pages.
- Addressed a security vulnerability in a defunct Classic UI CGI script (remotepadre.cgi).
- Addressed security vulnerabilities in Classic UI scripts cache.cgi, feedback.tcgi and showThumbnails.cgi
- Removed tagging system from Classic UI due to several security vulnerabilities within it.
- Restored IE8 support for time based charts within the WCAG Compliance Auditor.
- Upgraded the bundled version of Apache ManifoldCF to the current version, 1.7.1.
14.0.1 - Upgrade Issues
- Classic UI no longer supports the tagging feature which existed in earlier versions.
Released : 10th September 2014
14.0.0 - New Features
- Data API providing access to reporting and wcag information in a structured (JSON) form.
- New curator ruleset editing interface
- New curator trigger based on predictive segmentation features and query term addition action
- Push collections have been reimplemented to be more stable and scalable.
- Graphical redesigns of WCAG reporting and Content Optimiser.
- When files kill_exact.cfg, kill_partial.cfg and gscopes.cfg exist, they will be automatically applied to the collection when the collection is indexed. Push collections will only apply gscopes.cfg.
- Predictive Segmentation feature for providing market segmentation information for website visitors.
14.0.0 - Selected improvements and bug fixes
- Added support for custom query completions (query_completion.csv) in profiles.
- Added support for Windows Server 2012 R2.
- Redeveloped warc file library providing a simpler interface.
- Padre indexing now automatically ignores MapDB database files.
- Contextual navigation is now enabled for new collections by default.
- Indexes may now be upgraded on installations which are in read only mode.
- Various reliability improvements to the tuning system, including support for processing redirects.
- Fixed sending of analytics emails on Linux installations where the schedule has been changed from the default.
- New Java based indexer workflow.
- Logs for each indexing stage are now separated, and named in the form Step-(stage name).log
- Halting an update on Unix platforms will now terminate indexing processes rather than waiting for them to complete.
- A copy of Funnelback's query completion script jquery.funnelback-completion.js is now created with Funnelback's version number (e.g. jquery.funnelback-completion-14.0.0.js) to allow result pages to continue using the old version after upgrades if desired.
- New padre result modes for producing config files for setting 'killed' document flags and query independent evidence.
- API compatibility and rate limiting improvements to a number of the social media gathering libraries.
- Improved scripting support for WCA database upgrade tool.
- Fixed handling of query processor option upgrades when the options contained pipe '|' symbols.
- Fixed padre indexing collfield option when reindexing a collection's live view.
- Fixed a problem in analytics PDF exporting when a query contained an ampersand.
- Default configuration now suppresses any Java exception traces which would otherwise be displayed by the Jetty web server.
- Improved image scaling quality of low colour bit depth images.
- Added support for quick links in search result for pages with protocols other than http.
- Query completion data for profiles is now compiled in parallel, significantly reducing update time for collections with many profiles
- Incompatible component indexes in a meta collection will now be skipped by the query processor. Setting -exit_on_bad_component=1 will cause the query processor to revert to the previous behaviour.
14.0.0 - Upgrade Issues
- Some previous versions of Funnelback on Windows fail to uninstall ActivePerl correctly during upgrades. Version 14's installer will provide a warning if this causes a problem for the upgraded installation. If this warning occurs you must manually uninstall ActivePerl from the system before reinstalling Funnelback.
- The Curator Groovy Trigger and Action interfaces have been moved from the package com.funnelback.publicui.curator.action to com.funnelback.publicui.search.model.curator.trigger. Any custom implementations of these interfaces must be updated to reflect this change.
- Curator's DisplayProperties action now takes an 'additionalProperties' parameter rather than 'properties' (for consistency with other actions). Existing curator.yaml files using DisplayProperties must be updated.
- Curator's CountryNameTrigger previously took an argument of 'targetCounties'. The spelling has been corrected to 'targetCountries'. Existing curator.yaml files using CountryNameTrigger must be updated.
- Support for Novell fileshare document level security has been removed.
- The default cache URL has been changed from /s/cache (classic UI to modern). For collections requiring the old cache URL, this may be set in the collection's
- On TRIMPush collections the record types and classification security files are now stored inside live/idx. The query processor security plugin has been updated to account for that change.
- Funnelback's Jetty web server now logs to log/jetty.log.# instead of including its log in the wrapper process's
- An unset value for query_processor_options in conf/<collection>/collection.cfg (i.e. "query_processor_options=") will no longer be overridden by the default value; instead it will be treated as supplying no options to the query processors.
- All standard Funnelback security plugins now prefix keys with [collection_name]; to allow keys to be sent only to the relevant sub-collections when a meta collection is used. Any custom security plugins must be updated to perform the same prefixing.
- ManifoldCF may experience an SQLException when loading databases from earlier versions in some cases. Removing the $SEARCH_HOME/databases/manifoldcf/dbname will cause a new database to be created to resolve this issue, however ManifoldCF must then be reconfigured. Manually upgrading the version of ManifoldCF should solve the problem without data loss once a fixed version is available.
- A number of classes in the Funnelback common package have been organized into new package structures. Where these have been used in external scripts, these scripts must be updated.
- Funnelback now uses the modern UI's /s/redirect click tracking system by default instead of the classic UI's /search/click.cgi.
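The change to unset query_processor_options values noted above comes down to the difference between a key that is absent and a key that is present but empty. The following Python sketch illustrates that distinction only. The parser, the comment handling and the default value shown are assumptions made for illustration and do not reflect Funnelback's actual configuration loader or its real default options.

```python
# Hypothetical placeholder; NOT the real Funnelback default.
DEFAULT_QP_OPTIONS = "-example_default_option=1"


def read_cfg(text: str) -> dict[str, str]:
    """Parse simple key=value lines (assumed format), skipping blanks and # comments."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so option values may contain '='.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def effective_qp_options(settings: dict[str, str]) -> str:
    """Key absent: fall back to the default. Key present but empty: no options."""
    return settings.get("query_processor_options", DEFAULT_QP_OPTIONS)


if __name__ == "__main__":
    print(repr(effective_qp_options(read_cfg("collection=demo\n"))))
    print(repr(effective_qp_options(read_cfg("query_processor_options=\n"))))
```

Under this reading, a collection that previously relied on an empty line picking up the defaults would need the line removed, or the desired options written out explicitly.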
14.0.0 - Errata
- Funnelback 14.0.0 does not fully support Internet Explorer version 8 within the WCA reporting interface. This presentation issue will be addressed in a future release.
- Funnelback 14.0.0 does not fully support producing content optimiser information for push collections. This issue will be addressed in a future release.
- Push 2 in Funnelback 14.0.0 has a bug where anchor text is not correctly applied.
Released : 11 April 2014
13.2.0 - New Features
- Funnelback is now able to perform website accessibility checking against version 2 of the W3C's web content accessibility guidelines (WCAG 2.0). See Accessibility_Auditor for further details.
13.2.0 - Selected improvements and bugfixes
- Additional sources of information (e.g. "related clicks" and "related results") are now used in the Recommendations system, improving the quality of suggestions.
- Improved crawl consistency when restarting from crawler checkpoints.
- Ability to clear caches for scaled images (See Presenting Images).
- Ability to calculate upper and lower bounds for numeric metadata.
- Improvements to result summaries (removing duplicate sentences and noscript content).
- Separate configuration of Funnelback webserver ports and link-generation ports to better support load balancer configurations.
- Inclusion of a high-performance XML splitting tool.
- Improved performance of user account editing.
- Fixed memory and file handle leaks in libi4u that could cause issues when using recommendations or search session and history.
13.2.0 - Upgrade Issues
- Users will need to have the 'wcag' role to access the WCAG Compliance Auditor (previously the report_viewer role was required). Grant access in the Admin UI under Manage Users.
- WCAG databases from earlier un-bundled accessibility checker versions must be upgraded by running the following:

    # Linux
    $SEARCH_HOME/linbin/java/bin/java -cp "$SEARCH_HOME/lib/java/all/*" com.funnelback.wcag.database.AccessibilityDatabaseUpgrade collection_name

    # Windows
    %SEARCH_HOME%\wbin\java\bin\java -cp "%SEARCH_HOME%\lib\java\all\*" com.funnelback.wcag.database.AccessibilityDatabaseUpgrade collection_name
- Custom collections no longer preserve data between crawls by default. Collections which need to preserve data will need to implement that behavior.
- The convertRelative parameter on the <@fb.IncludeUrl> tag now supports both syntax:
Released: 4 November 2013
13.0.1 - Selected improvements and bugfixes
- Fixed a problem with the uninstaller generating error messages about non-existent services on Windows installations
13.0.1 Upgrade Issues
- If upgrading from version 13.0.0, the uninstaller for version 13.0.0 should be run manually, and any errors related to service stop/delete manually dismissed, rather than allowing the uninstaller to be run automatically as part of the new installation. Upgrades from versions earlier than 13.0.0 do not require this step.
- Due to a known issue, the convertRelative parameter on the <@fb.IncludeUrl> tag only supports the all-lowercase syntax:
Released: 31 October 2013
13.0.0 - New Features
- Social media gathering support
- Curator search result page customisation
- Recommendations system
- Improved, responsive default form
- Modern UI search session and history support
- Various Modern UI improvements
- Result collapsing
- New Enterprise Connector framework (with many included repository types)
- Support for Windows Server 2012
13.0.0 - Selected improvements and bugfixes
- New custom collection type supporting custom-developed gathering components
- -filepath_exclusion_pattern indexer option to exclude files from the index before they're scanned
- A new RSS form has been created for the Modern UI
- Mediator.pl is now capable of pushing report folders
- Better support for workflow scripts written in Groovy by expanding the $GROOVY_COMMAND variable
- Gscopes can be used with Push collections
- Query Independent Evidence (QIE) can be used with Push Collections
- Ability for autocompletion to extend the existing query (e.g. to select facets)
- The bundled versions of Java, ActivePerl and Groovy have been upgraded
- A range of improvements/refinements to the Trimpush gathering process
- Data API allowing access to Funnelback-stored data for a given URL
- Fixed bugs relating to tuning when the collection and profile both specified query processor options
- Ability to enable geolocation of the user's IP address (using MaxMind GeoLite) to allow result customisation based on location
- Improvements to in-crawl form interaction
- Performance improvement when using the fb.IncludeUrl tag with start and end restrictions
- Support for grouping collections within Funnelback's administration interface
- New filter step (InjectNoIndexFilterProvider) for marking content as no-index during filtering
- Compression of archived search log files is now performed automatically
- Improved calculation of facet count estimates in document-at-a-time query processing
- Validation and concatenation of external metadata files
- Support for conversion of WARC files containing non-web URLs to file and directory layouts
- Modern UI controllers for serving repository documents
- Access restrictions based on X-Forwarded-For headers and CIDR notation
- Correct logging of request source when multiple X-Forwarded-For entries are provided
- Replaced file editor component avoiding cut/paste issues previously experienced
- Ability to clear update locks from within Funnelback's administration interface
- Relative date formatting in search form files
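As an illustration of the access restrictions based on X-Forwarded-For headers and CIDR notation listed above, the following Python sketch checks a client address against a list of CIDR ranges using the standard ipaddress module. It is not Funnelback's implementation. The function names are hypothetical, and treating the left-most X-Forwarded-For entry as the request source is an assumption, since these notes do not state which entry Funnelback uses.

```python
import ipaddress


def client_ip_from_xff(header_value: str) -> str:
    """Return the first (left-most) entry of an X-Forwarded-For header.

    Assumption: the left-most entry is the originating client.
    """
    return header_value.split(",")[0].strip()


def is_allowed(ip: str, allowed_cidrs: list[str]) -> bool:
    """True if ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(
        # strict=False tolerates ranges written with host bits set, e.g. 10.1.2.3/8
        addr in ipaddress.ip_network(cidr, strict=False)
        for cidr in allowed_cidrs
    )


if __name__ == "__main__":
    source = client_ip_from_xff("203.0.113.7, 10.0.0.1, 10.0.0.2")
    print(source, is_allowed(source, ["10.0.0.0/8", "192.168.0.0/16"]))
```

Because X-Forwarded-For is supplied by the client or intermediate proxies, a check like this is only meaningful when the header is set by a trusted proxy or load balancer.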
13.0.0 - Upgrade Issues
- As of version 13 Funnelback no longer supports Windows Server 2003 or any 32 bit operating system.
- If you need to continue using Classic UI under IIS for document level security, you need to install a separate 32bit copy of ActivePerl and configure IIS accordingly.
- Funnelback no longer supports the creation of new classic UI collections (though existing ones are still supported on upgrade)
- The field userIdToLog in the Modern UI data model has been renamed to requestId. Hook scripts must be updated accordingly.
- The data model node extra containing extra searches has been renamed to extraSearches in the JSON and FreeMarker model to be consistent with the name used in hook scripts and the XML output. Existing forms must be updated accordingly. Backwards compatibility can be achieved by setting <#assign extra=extraSearches? /> at the top of the form file.
- post_update_command is now run in the background on Windows (as on Linux). This might affect scripts that were relying on the exit status of this command to fail the update.
- Handling of URLs logged with unencoded spaces within analytics has been improved. Collections with affected analytics databases should remove the old database after upgrading to have the database fully rebuilt.
- mediator.pl has been deprecated; PushView should be used instead.
- The ActivePerl installation folder no longer contains a version (e.g. $SEARCH_HOME/linbin/ActivePerl). Custom scripts that accessed ActivePerl at the old location must be updated accordingly.
- The RedisDB Perl module is no longer included with Funnelback, but is now installed in ActivePerl. Scripts using a different Perl interpreter (such as the OS perl) will need to install this module.
- The HTML markup for query completion categories has changed from <li class="ui-autocomplete-category"> to comply with jQuery UI best practices. CSS must be updated accordingly.
- A new 'databases' directory has been added to a collection's 'live' and 'offline' directories. This is used for storing database files required by some features e.g. Recommendations and Text Mining. The 'update-configs.pl' script run during upgrades will create these directories for existing collections and move TextMiner database files if required.
- When using multiple query processors you must change when publish_index.pl is called. It will now push from live to offline, so must be called during the post_archive_command rather than the post_index_command.
- post_update_command to support record types and classification security on TRIMPush collections has changed. It previously took the server details and path to the index folder, but now takes the name of the collection and will automatically look up the relevant settings from
Released: 4 April 2013
12.4.0 - Upgrade Issues
- The field userId in the Modern UI data model has been renamed to userIdToLog. Hook scripts must be updated accordingly.
12.4.0 - New Features
- Introduced a 'read only mode' for Funnelback administration to assist in managing redundant configurations.
- Collection update statistics are now captured and reports on the composition of update time are available.
12.4.0 - Selected improvements and bugfixes
- Improved Admin UI editing component for form files.
- Analytics will now ignore invalid gzipped log files rather than aborting processing.
- Failed analytics updates will now be reported via email to the collection administrator.
- Instant updates now work correctly when WARCStore is used.
- Index upgrading can now be performed in multiple threads to reduce the overall upgrade time on multi-core systems.
- Resolved a reflected cross-site scripting vulnerability when using corrupt facetScope values.
- Fixed a null character handling issue in the collection parameter.
Released: 7 February 2013
12.2.1 - Upgrade Issues
- When upgrading from version 12.2.0 on Windows, all funnelback services (named funnelback-*) must be stopped in the Windows task scheduler before starting the new installer.
12.2.1 - Selected improvements and bugfixes
- Avoid mishandling of the uninstaller file when upgrading to 12.2.0, which resulted in a missing uninstaller on Windows systems.
- Implement a monitoring system to ensure search query processes are cleaned up if the modern UI runs out of memory during a query.
- Generate query completion files only for live profiles to improve indexing performance on collections with large numbers of profiles.
- Avoid a query processor crash when a short query is expanded to a long one with a synonym.
- Fixed an erroneous log message regarding redis password generation.
Released: 20 December 2012
12.2.0 - New Features
- Support for multiple query processors.
- Support for max_files_stored setting in web crawler Site Profiles.
- A distinction is now made between user-entered query parameters (query) and system-generated ones (s). Spelling suggestions, blending and synonyms now apply only to the user-entered parameters.
12.2.0 - Upgrade Issues
- Groovy filters have been moved from SEARCH_HOME/lib/java/common to a specific folder
- The question.queryExpressions node of the Modern UI data model has been removed. Query constraints generated from the meta_* date-related parameters are now stored in question.metaParameters. Form files or hook scripts that were using this node must be updated accordingly.
- All the query processor options have now been converted to a standard syntax. All options are now specified in the form option=value and are consistent between command-line, configuration files and CGI parameters. External systems setting CGI values should be updated accordingly. See Query processor options (collection.cfg) for full details.
- WARC Storage is now the default for Filecopy collections. This will cause the Filecopy collections to be re-crawled entirely. store.raw-bytes.class (collection.cfg) can be set to com.funnelback.common.io.store.bytes.FlatFileStore to retain the previous behavior.
- Query blending is incompatible with TAAT mode and is now ignored in that mode.
12.2.0 - Selected improvements and bugfixes
- Text Mining performance improvements.
- Text Mining is now compatible with meta collections.
- Web Crawler:
- Support for
- Support for renewing form-based authentication cookies during the crawl.
- Fix throughput graph.
- Support for
- Modern UI:
- Support for Web resources folder on the Modern UI.
- FreeMarker loop variables ..._index are now exposed by Funnelback custom tags.
- Use server hostname in query logs filenames.
- Fix contextual navigation links containing
- Fix click tracking links when there's no user-entered query.
- Fix handling of URLs containing multiple dots (e.g. http://server.com/...folder/file), URL encoding / decoding improvements.
- Indexer: Fix host feature calculations on very large collections.
- Query processor:
- Improvements on partial matches in DAAT mode.
- Overhauled the implementation of query blending so that the top result for the original query remains at the top of the blended list, unless query variants are given weights greater than or equal to 1.0.
- Fix faceted navigation counts when using query blending.
- Fix per-profile blending.cfg when run on the command line.
- Fix handling of mandatory operators in DAAT mode.
- SSL support for Directory collections.
- WARC files are now tied to their table-of-contents and cache companion files. An error will be raised if a WARC file is used with the wrong TOC or cache companion file.
- Reduced number of background requests on the Admin home page to track collection updates progress.
- Fix directory permissions when running Funnelback under Apache.
- Fix link in Trend_Alerts_reports emails.
- Fix NTFS early binding DLS causing some results to never be returned to users.
- ClearLocks mediator task now clears the stop flag as well as the update status.
- Index merger: Fix merging indexes with same documents but different document flags.
- Index orderer: Implement the same ranking defaults as in the query processor.
- build_autoc: Provide a -partials option. This allows multi-word organic query completions to be triggered by sub-strings starting at each word. E.g. the suggestion australian workers union can be triggered when a person starts typing that string, or when they start typing workers union or just union.
- build_autoc: Provide a -label_organics option for organic completions to be shown under the category General suggestions.
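The behaviour described for the build_autoc -partials option above, where a multi-word completion can be triggered by sub-strings starting at each word, can be sketched in Python as follows. This only illustrates the matching idea using the example from these notes. The function names are hypothetical and this is not how build_autoc itself is implemented.

```python
def partial_triggers(suggestion: str) -> list[str]:
    """Return the sub-strings of a suggestion that start at each word boundary."""
    words = suggestion.split()
    return [" ".join(words[i:]) for i in range(len(words))]


def matches(typed: str, suggestion: str) -> bool:
    """True if the typed text is a prefix of any word-aligned sub-string."""
    typed = typed.strip().lower()
    if not typed:
        return False
    return any(t.lower().startswith(typed) for t in partial_triggers(suggestion))


if __name__ == "__main__":
    suggestion = "australian workers union"
    print(partial_triggers(suggestion))
    for typed in ("aust", "workers u", "uni", "kers"):
        print(typed, matches(typed, suggestion))
```

Note that text starting in the middle of a word (such as "kers") does not trigger the suggestion, since only sub-strings beginning at a word are considered.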
Released: 30 August 2012
12.0.0 - New Features
- Major improvements to Filecopy and TRIM collection types - Performance, reliability and the ability to search collections during gathering.
- Text Mining: Extraction of named entities and definitions from text.
- Simplified support for managing Funnelback query processor systems attached to an administration server.
- Graphical redesign of the Funnelback administration interface.
- Improved query processing performance with skip-block index structure.
- Improved default ranking parameter settings.
- Automated document classification.
- Document level security and metadata integration with Squiz Matrix.
- Date based facet generation.
- Image presentation tools including scaling and web page preview rendering.
- Support for NOARCHIVE and NOSNIPPET directives in robots metatags.
- Significant improvements to support for CJKT languages (including support for mapping of traditional to simplified Chinese and vice-versa).
- Support for localisation of search template files.
12.0.0 - Upgrade Issues
- Due to the standardisation of many query processor options, the format of some CGI parameters has been changed, and must be updated in any external systems setting CGI values. See Query processor options (collection.cfg) for full details.
- The Redis service included with Funnelback 11.x may not shut down correctly during upgrades. Please ensure you shut down Funnelback manually before upgrading and check that the redis-server process is stopped.
- Funnelback 11.x and greater embed a 64-bit Java virtual machine (JVM). If you are upgrading from an older version of Funnelback you may need to increase the max_heap_size by 50% to account for the increased memory requirements of the 64-bit architecture.
- Some components of Funnelback now depend on the installation of freetype and fontconfig on the local system. On RedHat/CentOS systems, these may be installed with yum install freetype fontconfig if they are not present.
- <@IncludeUrl /> Modern UI tag is now part of the <@fb.IncludeUrl />. Form files must be updated accordingly.
- The default filter for PDF to HTML conversion is now Tika (previously pdf2html). This might slightly affect the rendering of converted PDF files.
- The -cjkt query processor option is no longer required and no longer supported.
- The truncation operator (*) is no longer supported in queries unless the -service_volume=low query processor option is used.
12.0.0 - Selected improvements and bugfixes
- Standardised the syntax of many query processor option settings.
- Improved query completion performance.
- Suppress duplicate descriptions in query completions.
- Ability to bias result ranking based on gscope settings.
- Funnelback shell tool for manual manipulation of stored content.
- Services for remotely performing storage and mediator operations on Funnelback servers.
- The Modern UI Data Model inputParameterMap are now backed by the same Map: any change made in one will be visible in the other. Existing hook scripts and FreeMarker templates shouldn't be affected, but some code simplification might be possible where parameters were injected into those Maps. Additionally, the additionalParameters map now takes arrays of Strings as values, as opposed to just Strings previously.
- padre-sw may now be accessed directly at /s/padre-sw.cgi for debugging purposes.
- Support for selectable metadata values.
- Ability to turn off stemming of particular words when stemming is on by default.
- More accurate reporting of filetypes.
- Facet counts are now supported (through estimation) in DAAT mode.
- Support many more complex query types in DAAT mode.
- Support for anchor-text processing and redirect/duplicate information in push collections.
- Ability to restart gathering from a checkpoint in Filecopy and TRIM collections.
- Support for individual server authentication details via site profiles.
- Fixed redis server security so manual security configuration is no longer required.
- Improved URL status collection tool for web collections.
- Added ability to snapshot push collection indexes so they can be distributed to remote servers safely.
- Improved Funnelback uninstaller to remove environment settings.
- Funnelback now includes a full Java development environment (rather than just the runtime environment) to provide better performance monitoring tools.
- Fixed incorrect spelling in continuous updating configuration parameter names.
- Fixed restarting of push collections after a service restart.
- Improved logging for jetty web applications.
Released: 6 February 2012
11.4.0 - New Features
- Support for refresh updates when storing content in WARC files.
- Locale support for result sorting.
- Added support for cross-domain query completion.
- Introduced padre-mi tool for merging search indexes.
11.4.0 - Upgrade Issues
- Please note that the version numbering scheme for Funnelback has changed with this release. In future, the second digit in the version number (e.g. 4 in 11.4.0) will always be even in officially released, stable versions, and odd in internal release versions.
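The numbering rule above can be checked mechanically. This is a minimal sketch, assuming version strings of the form major.minor.patch; the function name is hypothetical, and the rule only applies to releases from 11.4.0 onwards.

```python
def is_stable_release(version: str) -> bool:
    """Return True if the second version component is even (stable release).

    Under the scheme introduced with 11.4.0, an odd second component
    denotes an internal release.
    """
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least major.minor, got {version!r}")
    return int(parts[1]) % 2 == 0


print(is_stable_release("11.4.0"))  # True
print(is_stable_release("11.5.0"))  # False
```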
11.4.0 - Selected improvements and bugfixes
- Added ability to send collection update emails on failure only.
- Fixed a regression in 11.1 where geographic sort ordering was incorrect if other numeric metadata fields were used.
- Improved crawler performance when using WARC files and on multi-domain collections.
- Introduced the ability to provide results inline for facet categories to avoid multiple search requests.
- Fixed incremental crawling when using WARC files and a previous crawl which had been restarted.
- Allowed custom code to control crawler file storage.
- Provide clearer errors when an invalid workflow.cfg file is in use.
- Fixed handling of empty padre_opts.cfg files for Windows installations.
- Configurable error handling on the Modern UI.
Released: 9 December 2011
- Support for different site web crawling profiles, allowing high performance sites within a larger collection to be crawled more quickly.
- Support for larger numeric metadata values and more strict equality checks.
- If Funnelback is upgraded to 11.1 from a version earlier than 10.1, Funnelback's administration interface will initially show all previous collection updates as failed. Updating the collections after the upgrade will resolve this display issue.
- Collections using WARC storage must be updated after upgrading to allow cached documents to be returned.
Selected improvements and bugfixes
- Improved modelling of TRIM early binding security
- Documentation refinements
- Support for X-Forwarded-For headers in the modern search interface
- Improved key encoding handling for WARC files
- Fixed handling of LDAP directory repository authentication without domains
- Ability to configure the web crawler's DNS cache size
- Reduced load from editing large tuning data sets and improved CSV tuning data import
- Fixed errors when exporting empty frontier entries at the end of a web crawl
- Fixed incorrect facet display caused by interactions with synonyms
- Improved detection of successful collection updates
- Improved file-copy and security support for Novell Netware
- Fixed temporary file opening issue in analytics reporting on Windows
Released: 13 September 2011
Selected improvements and bugfixes
- Refinements to content optimiser presentation.
- Improved tuning data entry interface.
- Added proxy server support when using crawler form interaction.
- Reduced index file storage size in push collections.
- Fixed failure of initial tuning runs on Windows.
- Fixed problem with index upgrading process when upgrading from version 10 systems.
- Fixed incorrect shutdown of previous Funnelback systems when starting installation.
Released: 26 August 2011
- Redeveloped query processing layer for more efficient query processing and improved search presentation customisation.
- New Push collection type for feeding non-web content into a Funnelback index from a remote system over time, without the scalability limitations of instant updates.
- New Directory collection type for searching Active Directory and LDAP repositories.
- Administrator search tuning system allowing search ranking factors to be optimised for specific collections.
- Content optimisation system which provides detailed guidelines for content authors on how to improve a specific result's ranking.
- Preview and publish system for developing search form files without affecting production search presentation.
- Ability to blend result sets for multiple queries from spelling suggestions, synonyms and other sources into a single result list.
- Assorted web crawling improvements including support for revisiting infrequently changing content less often.
- Result summaries are no longer highlighted by default, so that form authors have complete control over query highlighting. You'll need to use the <s:boldicize /> tag on your existing forms to restore summary highlighting.
- When upgrading TRIM collections from version 10, a full update of the collection is required to update the URLs of records to support the new instant update functionality.
- <s:italicize /> tags now use <em> HTML tags instead of <i>. If you were targeting these tags in your CSS stylesheet you'll need to update it.
- Using the Crawler form interaction system no longer disables cookie support by default. If a collection is using the form interaction system and can't crawl password protected sites successfully after the upgrade, please explicitly disable cookie support by setting
- The default treatment of nepotistic links has been changed to limit their effect. This will reduce indexing time, and should have a positive effect on the ranking in most web collections, particularly large ones covering multiple domains. This change can be reverted by setting the -nep_action indexer option value to zero.
- The isolated mode filter has been renamed (previously IsolatedPublishorFilterProvider) and is now able to use any filter classes.
- It will use the Tika filter provider by default, so you'll need to update your collection configurations if you want to continue using the Davisor filters in isolated mode.
- The <s:Truncate> tag no longer supports the
- The default behaviour for the web crawler is now to skip revisiting a proportion of infrequently changing pages during each crawl. This behaviour can be configured through the crawler revisit policy.
- Data reports are now specific to web collections and are no longer available for other collection types.
Selected improvements and bugfixes
- Increased permitted number of meta collection components.
- Added ability to analyse URLs remaining in a web crawl frontier.
- Support for gathering multiple Exchange mail boxes through the EntropySoft connector in a single collection.
- Added ability for web crawler to read cookies from a file on startup.
- Improved crawler form interaction cookie handling.
- Improved handling of non UTF-8 web content.
- Improved query highlighting in results, especially with UTF-8 characters.
- Corrected handling of UTF-8 form files.
- Support for collection profiles when tuning search quality.
- Added ability to index HTTP header and Facebook Open Graph protocol metadata.
- Fixed incorrect addition of collection name to C metadata by default.
- Support for multiple facets per tag in FreeMarker templates.
- Added distance from origin to XML output when searching geospatial data.
- Reduced warning messages from result transforms on missing metadata.
- Added support for resolving relative links within the IncludeURL form tag.
- Better handling of special characters in indexer options.
- Added spelling whitelist file for words which should be provided as spelling suggestions.
- Changed boldicize tag to use HTML strong tags rather than bold tags.
- Changed query processing ordering to apply spelling suggestions after synonym expansion.
- Introduced ability to execute custom code during query processing.
- Eliminated log files produced by inactive crawler threads.
- Fixed incorrect permission settings on init.d scripts.
- Improved layout and display of the Funnelback administration interface.
- Fixed handling of column names with special characters during database gathering.
- Added setup documentation for IIS 7.5.
- Automated installation of 64-bit versions of search indexing and query processing components.
- Improved crawler tolerance for timeouts on seed pages.
- Improved index 'warm up' scripts.
- Fixed sorting of results when early binding security is used.
- Added headers to CSV exports from the analytics dashboard.
- Added support for instant updates on TRIM collections.
- Improved ordering of collections in Funnelback's administration interface.
- Added tools for managing WARC archive files.
- Fixed collection configuration cache clearing under mod_perl.
Released: 25 May 2011
Selected improvements and bugfixes
- Improved crawler form interaction when handling cookie deletion.
- Rectified issue when installing Funnelback to a location other than C:\funnelback on Windows.
- Reduced connector dependencies for connector gathering
- Support for early binding security with non-ascii file names.
- Faster handling of large binary files which cannot be filtered.
- Assorted improvements to documentation.
Released: 8th March 2011
Selected improvements and bugfixes
- Fixed web crawler's title extraction when an HTML document's head tag contains attributes.
- Fixed incorrect web crawler accept header which prevented binary documents being returned by IIS.
- Fixed indexing of XML documents preceded by DOCHDR information.
Released: 7th December 2010
- Early binding secured search for Windows NTFS and Novell Netware fileshares.
- More flexible custom data sources and presentation for query completion.
- Scripted workflow allowing simple, efficient content manipulation during filtering.
- Crawler monitor graphs and URL information reporting.
- Display plugin for presenting additional search result sets with differing search parameters.
- click.cgi URLs can no longer be generated by external systems. click.cgi URLs now require an authorisation hash parameter, which Funnelback produces.
- Better validation of query and click logs may result in a small decrease in reported counts, by eliminating corrupt log entries.
- Web collections must be updated after upgrading for data displayed in the URL status and test data editor's info column to be correct.
Selected improvements and bugfixes
- Substantial improvements to TRIM data gathering and document filtering.
- Improved Microsoft Office document formatting, and support for older office file formats.
- PDF exporting of analytics dashboard.
- New search result quality test data creation interface.
- Query completion is now enabled for administration home page search boxes.
- Previously selected collections are now automatically retained when reopening the administration interface.
- Fixed view as HTML / Cached links when using crawler inline filtering
- The webcrawler's standard include/exclude system can now be extended after installation via Groovy StandardPolicy subclasses.
- Fixed possible infinite loop where ResIf form tag did not specify a name attribute.
- Corrected XML encoding of filetype attributes.
- Fixed possible HTML corruption when using Boldicize form tag.
- Improved handling of invalid/corrupt click and query log entries.
- Click tracking links are now produced in xml.cgi, within the click_tracking_link element.
Released: 14th February 2011
Selected improvements and bugfixes
- Several improvements to instant update processing to handle larger update volumes.
Released: 13th October 2010
- Query completion suggestions available on search forms.
- Quicklinks to sub pages available for top search results.
- Explore option for discovering related results from a single starting result.
- New .warc storage option for very large web collections.
- Simplified collection configuration files.
- Event search custom display mode.
- ActivePerl is no longer required to be installed separately to Funnelback, as it is now bundled with the Funnelback installation. You may wish to remove any prior version of ActivePerl as part of your upgrade process.
- Analytics, since version 9.0.0, have included requests where start_rank was greater than one (i.e. where something other than the first page of results was viewed). This change has been reversed in version 10, which may decrease the reported volume of queries.
- Please be aware that stemming is now enabled for new collections by default. Stemming can be disabled by removing the -stem2 query_processor_option.
- To simplify administration, a collection's collection.cfg file will now contain only settings which differ from the default values defined in $SEARCH_HOME/conf/collection.cfg.default or $SEARCH_HOME/conf/collection.cfg. Any setting may be copied into the collection's collection.cfg file to override the value set in one of the shared configuration files, and the administration user interface now provides a simple drop-down menu listing all of the inherited options and their default values.
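The inheritance described above can be pictured as a layered lookup. The sketch below assumes simple key=value lines with # comments, and assumes that the server-wide collection.cfg takes precedence over collection.cfg.default; the notes do not state the precedence between those two shared files. The function names and sample values are illustrative only.

```python
from collections import ChainMap


def parse_cfg(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and # comments (assumed format)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def effective_settings(collection_cfg: str, server_cfg: str, default_cfg: str) -> ChainMap:
    """Layer the three files; earlier layers override later ones.

    Assumed order: the collection's collection.cfg, then the server-wide
    collection.cfg, then collection.cfg.default.
    """
    return ChainMap(parse_cfg(collection_cfg), parse_cfg(server_cfg), parse_cfg(default_cfg))


# Illustrative contents only; real files will differ.
defaults = "query_processor_options=-stem2\nfilecopy.max_files_stored=1000\n"
server = "# server-wide overrides\nfilecopy.max_files_stored=5000\n"
collection = "query_processor_options=\n"

settings = effective_settings(collection, server, defaults)
print(settings["query_processor_options"])    # empty: overridden by the collection
print(settings["filecopy.max_files_stored"])  # 5000: inherited from the server-wide file
```

Only the settings that differ need to appear in the collection's own file; everything else falls through to the shared layers.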
Selected improvements and bugfixes
- Improved logging for analytics updates.
- Improved handling of sites with base hrefs.
- Better handling of passwords containing special characters during installation.
- Fixed halting of web collection instant updates.
- Handling of binary document URLs without file name extensions using in-line filtering and mime types.
- Display of 'Top Query' for contextual and faceted navigation reports.
- Improved handling of existing in-place data during web crawling.
- Improved quality of title fixing.
- New interface and syntax highlighting for collection.cfg editing.
- Improved TRIM metadata mapping.
- Click tracking for RSS feed results.
- Improved exclude pattern handling of special characters in URLs during web crawling.
- Better handling of phrase queries with s:Select tags.
- Logging for padre error messages and query logging permission errors.
- Inclusion of header row in CSV analytics exports.
- Sorting of administration interface user list.
- Fixed handling of query changes within a faceted navigation category.
- Improved interface for editing web collection include and exclude patterns.
- Activated basic search query stemming for new collections by default.
- Added support for a server-wide reporting blacklist in addition to collection specific ones.
- Improved date sorting for collections including dates in the future.
- Improved scrolling behaviour within form and long config file editing.
- Improved charting boundary conditions within analytics reports.
- Changed analytics to exclude queries with a start_rank greater than one (i.e. subsequent result pages).
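The start_rank exclusion above amounts to a simple filter over logged requests. This is a minimal sketch, assuming each request is available as a dictionary of CGI parameters and that a missing start_rank means the first page of results; it does not reflect the actual log format or analytics implementation.

```python
def counts_towards_analytics(params: dict) -> bool:
    """Return True for first-page requests (start_rank absent or not greater than one)."""
    return int(params.get("start_rank", 1)) <= 1


# Hypothetical request parameters for illustration.
requests = [
    {"query": "annual report"},                     # first page, no start_rank
    {"query": "annual report", "start_rank": "1"},  # first page, explicit
    {"query": "annual report", "start_rank": "11"}, # second page, excluded
]

counted = [r for r in requests if counts_towards_analytics(r)]
print(len(counted))  # 2
```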
Released: 27th August 2010
Selected improvements and bugfixes
- Improved date presentation in Funnelback analytics.
- Fixed perl path issue which caused errors when processing search queries via Apache.
- Fixed handling of multi-process query processing for hosted services.
Released: 17th August 2010
Selected improvements and bugfixes
- Internal changes to support hosted services (faceted navigation and access_alternate)
Released: 5th August 2010
- Integration with an enterprise repository connector system, supporting SharePoint and Lotus Domino gathering with high performance document level security.
- Improved query report presentation, including a dashboard and interactive charts.
- Incremental updating of query reports and ability to view reports while updating.
- Higher performance document filtering, multi-threaded filtering and in-crawl filtering.
- The "SEARCH_SERVICE" environment variable is no longer supported, having been replaced by the bureau_mode_enabled setting in global.cfg. In environments where SEARCH_SERVICE was set to BUREAU, bureau_mode_enabled should be set to true.
- The change in default filtering frameworks noted above results in much improved filtering performance, however please be aware that:
- Cached copies of Microsoft office documents will no longer reflect the original document formatting.
- Microsoft Word 95 and Microsoft Word 6 documents are no longer supported.
Selected improvements and bugfixes
- Feature renaming as follows :
- Fluster becomes Contextual Navigation.
- Query Spike Detection becomes Pattern Analyser.
- Featured Pages becomes Best Bets.
- Query Expansion becomes Synonyms.
- Ability to specify both inclusive and exclusive date constraints.
- Activation of faceted and contextual navigation by default on filecopy collections.
- Relocation of Jetty working directory to avoid conflict with tmpwatcher and similar.
- Assorted improvements to document title fixing to reduce undesirable title changes.
- Improved query reports performance on 64-bit Windows platforms.
- Form tag reordering to allow s:cgi access from within form plugins.
- Security plugin mechanism for padre, allowing custom early binding document level security.
- Case insensitivity option for best bets and synonyms.
- Better query handling of German umlauts and Maori macrons.
- Improved result summaries for Chinese, Japanese, Korean and Thai.
- Fixed leaking of file-handles when updating large pattern analyser data sets.
- Improved support for HTML base href elements within forms.
- Support for HP TRIM license numbers.
- Improved reporting PDF layout.
- Fixed handling of facets containing parentheses.
- Fixed handling of BASE HREFs in indexed documents.
- Fixed issue with stemming when processing document at a time queries.
- Improved customisation options for contextual navigation.
- Corrected handling of collections with missing metamap.cfg and xml.cfg.
- Better error reporting where primary and secondary indexes differ in format.
- Indexing performance and scalability improvements.
- Dynamic resizing and increased scalability of indexer memory structures.
- Support for complex queries in spelling suggestions and contextual navigation.
- Improved stemming algorithm catching more cases where stemming should apply.
- Corrected reporting counts for geolocated data and queries using include or exclude scopes.
Released: 10th May 2010
- Support for instant updates when using meta collections.
Selected improvements and bugfixes
- Fixed handling of comment and blank lines in htpasswd files.
- Fixed incorrect handling of faceted navigation with the '0' metadata class.
- Fixed removal of CGI parameters from spelling suggestion links.
- Improved handling of search log naming variants.
Released: 25th March 2010
- Handling of complex queries for contextual navigation (fluster).
Selected improvements and bugfixes
- Query and click log files now always include hostnames, to simplify multi-server setups.
- Improved query log parsing to support scopes with ampersands.
- Fixed hourly query spike detection task.
- Improved reliability for download support package feature with incorrect permissions.
- Fixed error in scheduling on Windows due to ActivePerl changes.
- Removed some debugging output under IIS when using contextual navigation.
- Improved backwards compatibility for s:Compare tags.
- Improved logging of analytics update processes.
- Display collection size and updated date in XML results.
- Fixed rendering of query trend graphs under Internet Explorer 8.
- Removal of temporary files created during query spike detection on Windows.
- Allow multiple collection admin email addresses to be specified from collection editing.
- More reliable perl workflow script launching on Windows.
- Support for query report emailing via SMTP servers not requiring authentication.
- Scheduling improvements for Windows Server 2008.
- Fixed indexing of documents using noindex tags around the first outgoing link.
- Fixed title and url sorting of results.
- Handling of profile-based collections with new contextual navigation system.
- Handling of profile-based users in analytics.
- Assorted documentation corrections and improvements.
Released: 3rd March 2010
Selected improvements and bugfixes
- Fixed issue causing long-running processes when performing query spike detection on February 29th (i.e. leap years).
Released: 11th February 2010
Selected improvements and bugfixes
- New s:CategoryQuery tag for faceted navigation
- Corrected handling of XML request logging
- Increased tolerance of busy systems with isolated filtering mode
Released: 5th February 2010
- Improved query volume graphing, including display of rolling averages.
Selected improvements and bugfixes
- Allow collection and report update scheduling in Windows 2008.
- Corrected date synchronisation issues with query spike reporting
- Fixed UTF-8 presentation with related queries plugin
- Allowed insertion of related.cfg entries for queries which have never been seen for a collection
- Improved handling of websites using base href elements
- Corrected handling of stemming when processing document at a time (DAAT) mode queries
- Assorted documentation improvements and clarifications
Released: 11th January 2010
- New analytics (reports) system: improved UI, geolocation, related queries, emailed reports and improved performance.
- Query spike detection (detect "spikes" in activity for particular queries).
- Improved spelling suggestions.
- Support for 64-bit Windows and Linux platforms.
- As of version 9.0.0 the Funnelback administration system cannot be run through Apache or IIS and must be run within the embedded Jetty web server. Apache and IIS continue to be supported for the public search interface.
- Funnelback now requires ActivePerl 5.8.8 or 5.8.9 on all platforms (ActivePerl was previously required only on Windows).
- Funnelback is no longer supported on Solaris based platforms.
- Funnelback administrator users (other than the one created during installation) previously created under an IIS based administration interface must be manually recreated, as their passwords are now defined by the jetty .htpasswd file.
- JDBC drivers must now be installed in
<install_root>/lib/java/ as was previously supported. See db.jdbc_class for details.
- spell.cgi is no longer supported. Spelling suggestions are now returned as part of the results from xml.cgi, and these should be used instead.
Selected improvements and bugfixes
- Click feedback information now incorporated into ranking by default for new collections
- Improvements to the display of document titles when no specific title was present in the original document
- Near-duplicate detection to down-weight repeated occurrences of highly similar results
- Improved customization options for faceted navigation
- Support for crawling sitemap.xml files
- Improved performance for contextual navigation (Fluster)
- Numerous contextual navigation bugfixes and improvements
- Administration interface automatically refreshes when collection updates complete
- Query independent evidence support for meta collections and instant updates
- Improved support for numerical metadata values
- HostnameFill option for faceted navigation
- General availability of document at a time (DAAT) processing for large collection handling
- Redesigned default search forms, with integrated advanced search functionality
- Fix issue where adding a new file manager rule could delete existing rules.
- Fix filtering bug that created invalid XML files.
- Fix NullPointerException in funnelback-dbgather.jar.
- Fix funnelback-dbgather.jar to work when large amounts of memory are required.
- funnelback-dbgather.jar now only outputs valid XML.
- Robustness fixes for form parsing.
- QIE fixed to work with meta collections and instant updates.
- CPU usage when filtering is now throttled, to reduce the load on the server.
- Filecopy collections ignore temporary files.
Released: 2nd June 2009
Some Linux distributions (e.g. RedHat Enterprise 5.0) may not have a required Perl library needed for correct operation (Entities.pm). To ensure this library is installed, please run the following in a command shell:
sudo perl -MCPAN -e 'install HTML::Entities'
If you have not used CPAN before you may need to answer a set of questions before it will start downloading and installing the required package. You will also need to have gcc (or equivalent) present on the machine to compile the library after download. If the package installs correctly you should see a message like "HTML::Entities is up to date.".
Alternatively, if your environment offers the yum package manager, the following command will install the required Perl library:
yum install perl-HTML-Parser
- Support for crawling sites which use Windows Integrated Authentication (e.g. NTLM)
- New Lotus collection type with improved performance when gathering content from Lotus Notes
- Numeric metadata support in PADRE
- <s:decode> tag for HTML decoding text (e.g. from other tags)
- URL prefix functionality has been removed - use of result_transforms is recommended instead
- Use of summary.xsl has been removed
Selected improvements and bugfixes
- Numerous fixes for UTF-8 characters in search queries
- Support for crawling SharePoint sites using a standard web collection
- Support for long file paths in filecopy collections (stored using MD5 sum of original path)
- filecopy.max_files_stored parameter (specifies the maximum number of files to store in a filecopy collection)
- Facets now available via
- Database gatherer now supports specification of no primary key or a compound primary key
- Ability to specify a limit on number of results to check for Document Level Security (DLS)
- Fixed spelling suggestions for meta collections on Windows
- Fix the slow changing of permissions on upgrades from very large installations
- Click and tag ranking now fully supported for database collections
- Improved support for UTF-8 characters in database collections
- Fixed problems with 'URL Fill' based facets for filecopy collections
- Allow the thesaurus file to be unsorted
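The long file path support above stores files under the MD5 sum of the original path. The following sketch shows the general idea of deriving a fixed-length name from an arbitrarily long path; the function name, the UTF-8 encoding and the use of the bare hex digest as the stored name are assumptions, not the documented storage layout.

```python
import hashlib


def storage_name(original_path: str) -> str:
    """Return a fixed-length (32 hex character) name derived from the original path."""
    return hashlib.md5(original_path.encode("utf-8")).hexdigest()


# A hypothetical path well beyond typical file system limits.
long_path = "\\\\fileserver\\share\\" + "very-long-directory-name\\" * 20 + "report.doc"
name = storage_name(long_path)
print(len(long_path), "->", len(name))
```

Because the digest length is constant, the stored name stays within file system limits however long the source path is, and the same path always maps to the same name.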
Released : 2nd April 2009
- Any existing tagging databases will need to be exported, and then reimported into the new database format.
- New CSS files will be copied to search.css.dist, instead of to search.css. Changes can be merged manually if desired.
- Allow throttling of filecopy requests (to reduce load on the target server).
- Allow ignoring of text copy protection on gathered PDF files (off by default).
Selected bugfixes and improvements
- Include updated Davisor text filtering framework (Publishor 5.2)
- Thesaurus tag expansion feature fixed.
- Use lower priority for text extraction on Windows to reduce load on search server.
- Filecopy user authentication was moved to the core filecopy settings.
- Documentation improvements.
- Fix filecopy gathering of documents with uppercase extensions.
- Fix fluster "by site" listing for filecopy collections.
- Fix XML parsing error with cached pages for docx files.
- Ignore Microsoft Office temporary files when filecopy gathering.
- Report accurate document sizes for gathered binary documents in filecopy collections.
- Fix click feedback for filecopy collections.
- More useful result summaries for filecopy collections.
- Neaten the display of filecopy URLs.
- Neaten the display of filecopy titles.
- Search and tagging look and feel updates.
- Improve summaries for docx documents.
- Do not display spelling suggestions for expanded queries.
- Fix warning output by updating shipped Regexp::Common.
- Fix reporting bug where data displayed was for a particular month only.
- Remove blank lines from CSV exported query reports.
- Prevent htpasswd.pl creating multiple accounts with the same name.
- Fix tag feedback for filecopy collections.
Released : 16th September 2008
- Upgraded the TRIM adapter's integration with TRIM, allowing search results to link directly to the TRIM client software, and allowing integration with TRIM versions 6r2 and later.
- Improved faceted navigation support for categories automatically included from document metadata, including support for phrases, dates, and existing definitions of metadata fields.
- Improved metadata extraction from Office documents
- Fixed problem with tagging page not always returning back to search results page
- Fixed unclear error message in database collections that do not have the correct primary key set
- Fixed problems when doing inline filtering on Persistent SharePoint collections
- Fixed faceted navigation configuration problem that didn't set the correct PADRE options
- Fixed facet category navigation not resetting the start_rank parameter
- Fixed occasional inaccurate PADRE RMC counts
- Fixed the hard-coded Windows paths in is_windows() checks
- Fixed incorrect SYSTEM_ROOT under Jetty
- Fixed problem with the 'clive' option with meta-collections
- Fixed problem with the ExternalFilterProvider closing filtered filehandles
- Prevented the upgrade process from removing some pre- and post- commands
- Removed file-lock caused by update.bat that could prevent multiple scheduled updates occurring at once
- Fixed problem with the TRIM gatherer ignoring the 'extracted filetypes' parameter
- Fixed problem with the upgrade process leaving old plugins on the file system
- Fixed timeout issues with unusual noindex_expressions
- Fixed user/group problem with Funnelback/Apache integration
- Fixed problem with complex regular expression featured pages
- Fixed problem with temp files not being deleted from Windows temp directory
- Fixed issue with Win32 Event logging not being properly initialised
- Fixed numerous minor query parsing issues in PADRE
- Fixed case of a userfile not being created on install
- Fixed numerous faceted navigation display problems
- Improved the detection of pre Funnelback 7 installations at install time, and included safety measures to prevent old installations from operating after upgrades.
- Minor documentation corrections
- Fixed featured pages upgrade issue from pre Funnelback 7 installs
- Fixed numerous input validation issues in the administration UI
- Fixed character encoding problem on the tagging login page that caused some users to be unable to log in
- Removed fluster scores from the default UI as they were being confused with counts
- Fixed problem with negation in 'Compare' search UI tags
- Changed default faceted navigation UI to search within the selected categories by default when entering new searches.
- Fixed display bug that caused duplicated tree elements in the fluster search UI.
- Added faceted navigation tags to the default advanced form.
Released : 30th May 2008
- Database collections have changed in layout, and now require an additional 'primary key' parameter. Please see the version 8 database collection upgrade guide for details.
- Perl 5.8.8 is strongly recommended for all platforms:
- Some features do not work out of the box under Perl 5.10 and Solaris.
- Perl 5.8.5 and earlier have a bug in HTML::Entities, which may lead to incorrect encoding of apostrophes in the Funnelback system.
- Queries are now logged in their expanded form, not their pre-expansion form.
- Document gathering from Microsoft SharePoint and Lotus Domino
- Faceted navigation
- User tagging of results
- User feedback on results
- Basic Chinese / Japanese / Korean / Thai (CJKT) support
- Feeds API
- Crawling of content behind web forms
- Automatically generated "support package"
- Allow pre/post commands to use collection.cfg parameters
- Broken link detection script for featured pages
- Capability for fetching resources at query time for multiple collection types (databases, filecopy, TRIM)
- Context sensitive help links open in new pages, not the current page
- Display real-time collection update status on the admin UI home page
- Import and export of featured pages and query expansions
- Instant updates support filecopy collections
- Instant update support for more collection types
- Java is bundled with Funnelback
- Logs for a collection go in a collection specific log dir, not the "system logs" dir
- Log text on the "view file" page is more readable
- Numerous improvements to form parsing (fixes for nested tags, res* tags that contain curly braces, etc)
- Option to remove all data during uninstall
- Reporting uses much less memory
- Reports are viewable while they are generating, and a reporting error will no longer leave the reports unusable
- Significantly improved database search, with "workflow" interface, incremental gathering and compressed storage
- Updates for all collection types may now be halted (the halt may not occur until the end of the current update phase for some collection types)
- When upgrading an installation, the license key is preserved
Selected bug fixes
- Add support for filtering .dot (MS Word Template) files
- Admin UI should include crawler.reject_files in its processing of the "file types to crawl" checkboxes
- Allow collection parameter editing security model (parameter whitelists) to be applied on a per collection basis
- Allow / ignore whitespace in various collection parameters
- Ampersands in query* parameters are not parsed correctly
- cache.cgi displays "XML parsing error" for pages in funnelback_documentation
- cache.cgi does not perform security checks
- cache.cgi links do not get properly URL encoded parameters
- cache.cgi should strip meta refresh from its displayed contents to avoid sending users to incorrect locations
- Cached XLS files don't display correctly in IE6
- Can enter empty featured page and query expansion
- Can't map the same xpath to multiple metadata classes
- Change crawler to use MIME type rather than URL suffix when storing binary files
- Check Windows password is valid in installer
- .ckpt index files should be removed by default
- click.cgi links do not properly URL encode arguments
- Clicking on filecopy results displays text in the error log
- Click tracking not working by default
- collection.cfg settings not being updated to point at new locations on an upgrade
- Collection parameter whitelist not greying out fields
- Collection summary rows should show successful update (green tick) after a successful index upgrade
- Command line administration / Unix scheduling / Apache integration will not work if the Perl binary is not at /usr/bin/perl
- Command line updates fail if not started from the bin directory
- crawler_binaries parameter not being updated properly on an upgrade
- Creating local collections with an unfindable source directory displays a confusing error message
- _disabled__see_start_urls_file parameter being displayed in update log
- Documentation CSS is indexed in the funnelback_documentation collection
- Enable data reports for web collections on an upgrade
- Filters not picking up title metadata from some Word docs
- Fluster crashes when a query contains "(" or ")"
- Fluster links have redundant CGI parameters
- funnelback_documentation collection shouldn't be deletable from admin interface
- Funnelback installer should complain if empty input is given for some fields
- htpasswd_modify is not fixed in an upgrade
- Improved handling of URL case sensitivity in the crawler
- Incorrect handling of numeric entities in crawled URLs
- Investigate fallback for external filters
- Investigate how to make query expansion work with Fluster
- java_libraries contain duplicated path after upgrade
- Local collection url prefixes don't work as expected
- Long logs are difficult to scroll
- new-collection.pl does not create start.urls file
- Old Jetty HTTPS server not shut down during upgrade
- Padre displays result counts in minresults mode
- PADRE failing to parse XML with empty elements
- Padre date sorts don't work for documents in the 16th / 17th century
- Padre produces invalid XML for some documents that contain ampersands in their title
- Padre segfault under rare combinations of gscopes and metadata searches
- Parsing of meta parameters is broken
- PDF not extracted correctly but output file with binary content was created
- PDF results include shell error output
- Permission errors under IIS
- Remove trailing space in spelling suggestions
- Reporting date routines do not handle leap years
- Report links do not work under IIS
- rss.cgi crashes when xsltproc is not found
- RTF files filtered in trim collections do not have meaningful titles
- Schedule updates page on Windows incorrectly handles invalid input
- Security violation displayed when empty filename is submitted for upload
- Start URL parameter in instant update add doesn't check for a protocol
- The "results can't be displayed because this collection has never updated" page looks awful
- Various .cgi files do not have execute permission
- Very rare hang caused by schtasks when upgrading from Funnelback 6.0.x to Funnelback 7.0.x
- Viewing data reports forces the user displayed on the header to "admin"
- Visual bugs when viewing administration under IIS
- When editing a collection, changes are lost when navigating between tabs
- Word expansion does not work with query_* parameters
Released : 22nd January, 2008
- Fixed duplication of search_home path in new collections after upgrading
Released : 20th December, 2007
- Fixed resetting of search_home path in collection.cfg on upgrades
- Fixed upgrading of scheduled tasks when changing search_home path
- Fixed URL encoding in the doc parameter of cache URLs
- Fixed problem where ownership of /data directory would be changed on clean installations
Released : 15th November, 2007
- Resolved problems with web server binding to ports below 1024 on Unix platforms
- Resolved problems with scheduling of collection updates under IIS
- Installation now performs a more extensive search for any existing Java installation
- Incorporated patch to improve PowerPoint document filtering
- Additional warnings regarding expired license keys
- Better error messages for users with limited permissions
- Removal of non-functional 'extract all tables' option for database collections
- Improved user deletion process to remove from the web server authentication configuration in addition to Funnelback's
- Improved report charting with removal of lifetime values
- Added support for metadata queries to rss.cgi
- Simplified upgrade process, ensuring the step of upgrading indexes can not be skipped
- Improvements to the upgrade process to support changing the Funnelback user
- Better handling of invalid dates in reporting
- Removal of the funnelback_documentation collection from the public search interface
- Changes to ensure default (required) file rules can not be accidentally deleted
- Fixed problem with 'Show last 100' links which could discard the first line in a file
Released : 26th September, 2007
- Previous versions of Funnelback enforced the creation of a search user. This user may now be selected during installation.
- Funnelback's installation no longer attempts to configure local web servers (Apache and IIS) to serve Funnelback. An embedded web server is installed, and instructions for configuring Apache and IIS are included in the Funnelback Installing and Upgrading guide.
Key new features and improvements
- Intuitive graphical installation
- A bundled web server for easy setup
- Fluster result clustering
- Update scheduling interface for Windows
- Date based reporting
- Instant updates
- Word 2007 text extraction
- Form security options for advanced configurations
- Improved TRIM support
- Improved filecopy support
- Improved document text extraction
- Improved administration user interface
- Improved default search interface
- Anchors.cgi analysis tool
- Options to control server duplicate detection
- New files created from the admin UI can now be based on a template
- Webcrawler support for specifying preferred site names
- Ability to crawl links within HTML comments
- TRIM live link serving script
- Use regular expressions to exclude parts of page from indexing
- Multiple start URLs
- Support for specifying a cookie to use during a web crawl
- Integrated crawl data statistics reports
- Meta-collection dictionary checking
- Integration of search term highlighting with padre thesaurus expansion
- Quiet mode for xml-splitter.pl
- Selection of IIS web site to configure for Funnelback
- The 'clive' parameter used for dynamic meta collections uses collection names
- Links in file-copy collections can be prefixed to make them valid
- Better report formatting
- Protection from padre-fl killing all documents
- Cache.cgi checks the offline view if the required file is not in the live view
- Stopped swizzling .vec files
- Display padre help on STDOUT rather than STDERR
- Remove window_size and table_size collection.cfg options
- Add timestamps to url_errors.log file in the Webcrawler
- Click tracking editable from the admin UI
- Support a search over the local installed Funnelback documentation
- Change default setting for crawler.check_case_sensitivity to false
- Update collection automatically after collection creation
- Check all/uncheck all option for manually building report database
- Add robots.txt file to prevent crawling of admin UI
- Crawl XML pages by default
- Avoid overloading server with a lot of virtual hosts
Selected bug fixes
- Installation fixed on Windows machine with no hostname
- Null query didn't respect kill bits
- Drop in crawler throughput on multi-site collection
- Max files stored was confusing when restarting from a checkpoint
- Spell.cgi only checked query
- Scheduler validation case problem resolved
- Padre no longer assumes UTF-8 input
- Spell.cgi's pos values were incorrect when multiple terms had the same spelling suggestion
- UTF-8 characters could be mangled in queries
- Featured pages with '#' weren't encoded properly
- Filtering in file-copy collections failed when a file contained leading spaces.
- Featured pages should not be considered as rank #1 for click tracking
- The HTTP password field in the admin UI was stored incorrectly
- Admin UI didn't allow a max_link_distance value of 0
- Illegal divide by zero in reports-load-queries-log.pl caused report load to fail
- Apostrophe in collection (internal) name did not work
- Some TRIM document extraction errors were not being reported
- Non admin user actions failed under IIS
- TRIM password field was a plain text field
- File permissions on Windows installs were not set correctly
- TRIM logs weren't flushed regularly
- Windows detection code in Perl caught other OSes (Cygwin, Darwin)
- Xml.cgi didn't output spelling suggestions on Windows because the call to spell.cgi didn't work
- Deleting a collection did not delete its report data from the reports data directory
- Query_phrase parameter was broken
- Upper case metadata classes incorrectly displayed in forms
- .vmbx extraction in TRIM skipped document contents
- Admin UI showed the copy form section even if the user was not able to use it
- Admin UI stylesheets weren't displayed correctly under IE 7
- Padre output invalid XML when the summary buffer overflowed
- Empty description metadata showed empty summary field
- Cache.cgi displayed poorly on newer browsers
- Search.cgi fixed with POST requests
- File manager confirmation links used GET requests rather than POST
- Title didn't appear for MS word docs with titles
- BASE HREF as returned by cache.cgi was invalid in XHTML documents
- Indexing failed if external-metadata.cfg did not end in a newline
- File copy collections now have their own type
- Collection names returned by padre may have been incorrect
- Removed the basic collection view from the admin UI
- Query Expansion fixed on Windows
- Use of crawler.remove_parameters no longer harms ranking quality
- IUSR_$computername is given read access to C:\WINDOWS\Temp for rss.cgi
- Padre failed to chsize() on Windows with index size > 2GB
- Xml.cgi didn't operate correctly with meta_ parameters
- Improved include text for editing config files
- Spaces in the URL prefix broke the generated live URL.
- Left truncation operator (e.g. *ate) didn't produce any search results
- Padre produced invalid XML when a custom stop words list was used
- Using a custom stop words list with no new line character at end caused segfault
- Blocked users from creating a collection with no include pattern via admin UI
- Scope parameters are now passed with all report URLs
- HTML encode display of query logs to avoid malicious queries
- Create-collection.cgi allows protocol-less start URLs for a web collection
- The Funnelback public UI produced invalid HTML in some cases
- Update logs were not always written on Windows
- Query_expansion.cfg was required to end in a newline
- Word_expansion.cfg did not show up in the files section
- Padre failed with out of memory errors when told to index an empty or missing directory
- Collection display name was not HTML safe
- Cache.cgi was displaying strangely with query terms that use the '#' operator
- File manager rules for "Create Folders" did not check for invalid internal folder name
- Query/word expansion did not work with certain query_processor_option ordering
- A phrase search for "gibbet maker" did not find the text "gibbet-maker"
- Show-file.cgi now HTML escapes the files it shows
- Featured pages containing HTML now show up correctly in the reporting UI
- Deleting a collection on Windows didn't delete its scheduled updates
- Delete-rules.cgi and delete-folder.cgi now require confirmation through a form post
- Search.cgi didn't HTML escape ampersands in featured page URLs
- Formatting error in query processor options documentation
- Padre-iw did not recognise the combined form of ISO 8601 date/times
- Detect if IIS is not installed in configure_iis_for_funnelback.pl
- Invalid URLs were displayed for local collections in "Documents per Site" report
- Padre QIE didn't work on Solaris (did not upweight matching documents)
- Padre geospatial queries didn't work on Solaris