Generalised scopes (gscopes)
In some applications, it is useful to narrow a search to particular sub-parts of a collection rather than searching the entire collection. Funnelback supports this by marking documents as matching or not matching a gscope (generalised scope). A search can then be restricted to documents that match a gscope or a gscope expression.
For example, imagine a company website with two major sections: a company news section and a careers section. By setting gscope 'news' on all documents in the news section and gscope 'careers' on all documents in the careers section, you could enable (along with suitable UI customisation) search over only the news section, or only the careers section. This simple use case can be handled directly with faceted navigation. Gscopes should only be used directly if faceted navigation is not suitable.
The gscopes system is designed to be flexible in order to support a variety of use cases. Documents can be given multiple gscopes. For example, one document could be given the gscopes 'people', 'staff', 'professor' and 'ultimateFrisbeeMember'. Additionally, a search can be restricted to arbitrary boolean combinations of gscopes. For example, you can instruct the search engine to restrict results to those documents that have gscope 'people', OR have both gscopes 'people' AND 'ultimateFrisbeeMember', as long as they do NOT have gscope 'staff'.
To use the gscopes system, you must set up a gscopes definition file. Each definition in the file follows this format:
<gscope name> <pattern or query that must match for the gscope to be set>
The gscope name is an alphanumeric ASCII string no longer than 64 characters. White space and punctuation are not permitted. Additionally, gscope names prefixed with Fun, in any combination of upper or lower case, are reserved for internal use only.
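The naming rules above can be checked before a definition file is uploaded. The following is a minimal sketch, not part of Funnelback: the function name `is_valid_gscope_name` is hypothetical, and it assumes that "alphanumeric ASCII" means the characters A-Z, a-z and 0-9 only and that an empty name is invalid.

```python
import re

# Assumption: "alphanumeric ASCII" means A-Z, a-z and 0-9, with 1 to 64 characters.
_NAME_RE = re.compile(r"[A-Za-z0-9]{1,64}")


def is_valid_gscope_name(name):
    """Return True if name follows the gscope naming rules described above."""
    if _NAME_RE.fullmatch(name) is None:
        # Empty, too long, or contains white space, punctuation or non-ASCII characters.
        return False
    # Names starting with "Fun" in any letter case are reserved for internal use.
    return not name.lower().startswith("fun")
```

For example, 'news' and 'ultimateFrisbeeMember' pass this check, while 'company news' (white space) and 'funInternal' (reserved prefix) do not.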
The regex gscope definition file can be created from the administration interface by selecting the collection you wish to use gscopes on, selecting the 'Administer' tab, clicking on 'Browse Collection Configuration Files' and then using the drop down box on the configuration files screen to create a file called gscopes.cfg or query-gscopes.cfg.
Gscopes are automatically applied during the indexing process. You may also specify Gscopes options by setting gscopes.options in collection.cfg.
Gscopes are implemented by allocating a certain additional amount of space within the index for each document. Each document is given one bit for each possible gscope. This means that you must decide what the maximum number of gscopes is going to be beforehand. The default number of gscopes available is 64. If this is not sufficient, then the -GSB indexer option must be set. The -GSB option sets the number of bytes (not bits) that will be allocated for gscope information. The default setting of 64 gscope numbers is therefore equivalent to setting the indexer option -GSB 8. In other words, for the indexer option -GSB n, there will be 8 * n gscope numbers available.
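The relationship between -GSB and the number of available gscopes is simple arithmetic. As a rough sketch (the helper names below are hypothetical, and it is assumed that any positive whole number of bytes is acceptable), the smallest -GSB value for a given number of gscopes can be computed like this:

```python
def gscope_numbers_available(gsb_bytes):
    """Number of gscope numbers available for the indexer option -GSB gsb_bytes."""
    return 8 * gsb_bytes


def gsb_bytes_needed(max_gscopes):
    """Smallest -GSB value that provides at least max_gscopes gscope numbers."""
    if max_gscopes < 1:
        raise ValueError("max_gscopes must be at least 1")
    # One bit per gscope, rounded up to a whole number of bytes.
    return (max_gscopes + 7) // 8
```

With these helpers, `gsb_bytes_needed(64)` gives 8 (the default), while 100 gscopes would need -GSB 13, which provides 104 gscope numbers.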
Command line usage
URL pattern gscopes can be applied manually by running one of the following commands (Linux and Windows examples respectively):
/opt/funnelback/bin/padre-gs /opt/funnelback/data/web/offline/idx/index /opt/funnelback/conf/web/gscopes.cfg
c:\funnelback\bin\padre-gs.exe c:\funnelback\data\db\offline\idx\index c:\funnelback\conf\db\gscopes.cfg
Changed gscopes are not automatically applied to all generations in a push collection. Gscopes are applied to newly committed generations as well as merged generations. To re-apply gscopes to all generations, you will need to trigger a Vacuum.
Searching with gscopes
To narrow down a search to a particular gscope, the appropriate query processor option must be set. This can either be done via the collection configuration (which will affect every search), or with a CGI parameter directly at search time (which will only affect one search).
To specify the query processor options in the
<gscope expression> is either:
- a single gscope e.g.
- a reverse Polish gscope expression (see below) e.g.
To use the CGI parameter add the following to your request URL:
<gscope expression> is defined in the same way as above.
The gscope expressions used are reverse Polish expressions. This means that all operands to a logical operation (such as AND, OR, NOT) precede the operator itself. This method helps avoid ambiguity and the need for brackets around complex logical expressions. However it can look quite odd to those unfamiliar with it. In Funnelback, '+' is used to represent the AND operation, '|' represents the OR operation and '!' represents the NOT operation. The best way to understand reverse Polish expressions is with some examples:
Expression      Meaning
staff           Matches documents which have gscope staff set.
staff,student+  Matches documents that have BOTH gscopes staff and student set.
56,4|           Matches documents that have gscope 56 OR 4 set.
3!              Matches documents that do not have gscope 3 set.
1,2,3,4|||      Matches documents that have ANY of the gscopes 1,2,3,4.
1,2,3,4+++      Matches documents that have ALL of the gscopes 1,2,3,4.
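The ANY and ALL forms follow a regular pattern: the gscopes are listed separated by commas, followed by one operator fewer than the number of gscopes. A minimal sketch of building such expressions (the function names `any_of` and `all_of` are illustrative and not part of Funnelback):

```python
def _combine(gscopes, operator):
    """Join gscopes with commas, then append one binary operator per extra operand."""
    if not gscopes:
        raise ValueError("at least one gscope is required")
    return ",".join(gscopes) + operator * (len(gscopes) - 1)


def any_of(gscopes):
    """Expression matching documents that have ANY of the given gscopes."""
    return _combine(gscopes, "|")


def all_of(gscopes):
    """Expression matching documents that have ALL of the given gscopes."""
    return _combine(gscopes, "+")
```

Here `any_of(["1", "2", "3", "4"])` produces 1,2,3,4||| and `all_of(["staff", "student"])` produces staff,student+. A single gscope is returned unchanged.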
For more complex expressions than this, it is important to understand that the expression works as a stack. Reading from left to right, operands (gscopes) are pushed onto the stack, while operators (!, + and |) take one or two values off the stack (one for !, two for + or |), operate on them, and push the result back. To help explain this, here are some further examples:
Expression      Meaning
3,4!+           Matches documents that have gscope 3, but not 4.
1,2,3,4|++      Matches documents that have gscopes 1 and 2, and one or both of 3 and 4.
12,23+4|7!+     Matches documents that have gscope '4', OR have both gscopes '12' AND '23', as long as they do NOT have gscope '7'.
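To make the stack behaviour concrete, the following sketch evaluates a gscope expression against the set of gscopes set on a single document. It is an illustration of the reverse Polish evaluation described above, not Funnelback's implementation. The function name `matches` is hypothetical, and it is assumed that commas only separate adjacent operands and that gscope names are alphanumeric.

```python
import re

# A token is either a gscope name (alphanumeric run) or a single operator.
_TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[+|!]")
# Assumption: an expression contains only names, commas and the three operators.
_VALID_RE = re.compile(r"[A-Za-z0-9,+|!]+")


def matches(expression, doc_gscopes):
    """Return True if a document with the gscopes in doc_gscopes matches expression."""
    if _VALID_RE.fullmatch(expression) is None:
        raise ValueError("unexpected character in expression")
    stack = []
    for token in _TOKEN_RE.findall(expression):
        if token == "!":
            # NOT takes one value off the stack.
            if len(stack) < 1:
                raise ValueError("'!' needs one operand")
            stack.append(not stack.pop())
        elif token in ("+", "|"):
            # AND and OR take two values off the stack.
            if len(stack) < 2:
                raise ValueError("'%s' needs two operands" % token)
            right = stack.pop()
            left = stack.pop()
            if token == "+":
                stack.append(left and right)
            else:
                stack.append(left or right)
        else:
            # Operand: push whether the document has this gscope set.
            stack.append(token in doc_gscopes)
    if len(stack) != 1:
        raise ValueError("expression does not reduce to a single result")
    return stack[0]
```

For the expression 12,23+4|7!+, a document with only gscope 4 set matches, a document with gscopes 4 and 7 set does not, and a document with gscopes 12 and 23 set matches.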