Query independent evidence (QIE)
Query independent evidence (QIE) is used to assign rank weightings to documents, without consideration of the user's query. For example, documents from a particular website may be up-weighted, while documents of a particular filetype may be down-weighted. QIE is configured through the qie.cfg file.
QIE can be used for a variety of purposes. For example:
- Externally computed page popularities, such as PageRanks, can be used to promote popular pages.
- Weights can be associated with particular document formats — e.g. 1.0 for HTML, 0.5 for PDF and 0 for PPT. This can be used to discourage (but not prevent) the return of certain types of document.
- Weights associated with particular websites or parts of sites can be used to bias results towards an important main site.
- Spam scores (0 for spam and 1 for non-spam) can be used to bias against (but not prevent) the display of spam results.
QIE configuration files
QIE configuration files are typically placed in:
The file-manager can be used to create and edit this file. Lines in the configuration file have the following format:
# comment line
qie_score url_pattern
qie_score is a floating point number specifying the QIE score to be applied (assumed to be normalised to the range 0-1)
url_pattern is a Perl 5 syntax regular expression to be matched against name strings in the .urls file (usually URLs)
comment line is an ignored line starting with a hash (#)
# down-weight pages from all states bar Western Australia
0.25 ^(https://)?[^/]*nsw\.gov\.au/
1.0 ^(https://)?[^/]*wa\.gov\.au/
0.25 ^(https://)?[^/]*sa\.gov\.au/
0.25 ^(https://)?[^/]*nt\.gov\.au/
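The line format described above can be read with a few lines of code, for example to sanity-check a file before it is processed. The following is only an illustrative sketch: parse_qie_config is a hypothetical helper, not part of Funnelback, and skipping blank lines is an assumption rather than documented padre-qi behaviour.

```python
from typing import List, Tuple


def parse_qie_config(text: str) -> List[Tuple[float, str]]:
    """Parse qie.cfg-style text into (qie_score, url_pattern) pairs, in file order."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        # Lines starting with a hash are comments and are ignored.
        # Skipping blank lines is an assumption made for convenience.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # score, then the rest of the line as the pattern
        if len(parts) != 2:
            raise ValueError(f"expected 'qie_score url_pattern', got: {line!r}")
        rules.append((float(parts[0]), parts[1]))
    return rules


EXAMPLE = r"""# down-weight pages from all states bar Western Australia
0.25 ^(https://)?[^/]*nsw\.gov\.au/
1.0 ^(https://)?[^/]*wa\.gov\.au/
"""

rules = parse_qie_config(EXAMPLE)
```

Keeping the rules as an ordered list preserves the order in which the lines appear in the file.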
Each indexed URL is tested against the URL patterns in order, stopping at the first match. If no pattern matches, the default score passed to padre-qi is used. Note that PADRE strips a leading "http://" from URLs, so do not include "http://" in your URL patterns.
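The matching rule can be sketched as follows. This is a simplified model for reasoning about a configuration, not the padre-qi implementation: qie_score_for is a hypothetical function, Python's re module stands in for Perl 5 regular expressions (the two agree for simple patterns like these), and the 0.5 default is the value used during automatic processing.

```python
import re
from typing import List, Tuple


def qie_score_for(url: str, rules: List[Tuple[float, str]], default_score: float = 0.5) -> float:
    """Return the score of the first pattern that matches, else the default score."""
    # A leading "http://" is stripped before matching, as PADRE does.
    name = url[len("http://"):] if url.startswith("http://") else url
    for score, pattern in rules:
        if re.search(pattern, name):
            return score  # first match wins; later patterns are not consulted
    return default_score


RULES = [
    (0.25, r"^(https://)?[^/]*nsw\.gov\.au/"),
    (1.0, r"^(https://)?[^/]*wa\.gov\.au/"),
]

wa = qie_score_for("http://www.wa.gov.au/about", RULES)
nsw = qie_score_for("https://www.nsw.gov.au/", RULES)
other = qie_score_for("http://example.com/", RULES)
```

Because the first match wins, more specific patterns should appear before more general ones that would also match the same URLs.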
QIE configuration file processing
The QIE configuration file is used by the utility program padre-qi to create a PADRE readable binary file, used at query time. The QIE configuration file cannot be used directly by padre-sw at query time, and must be pre-processed with padre-qi.
At indexing time, should a collection have a qie.cfg file, Funnelback will automatically run padre-qi and process the qie.cfg file into a PADRE readable binary file. During this process, documents which are not matched by patterns in the qie.cfg file will be assigned a default score of 0.5. Matched documents with a QIE score less than this will effectively be down-weighted, and documents with a QIE score greater than this will effectively be up-weighted.
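The effect of a score relative to the default can be expressed as a small comparison. This is a sketch only: weighting_effect is a hypothetical helper, and the label "neutral" for a score equal to the default is an assumption introduced here for illustration.

```python
DEFAULT_SCORE = 0.5  # assigned to unmatched documents during automatic processing


def weighting_effect(score: float, default: float = DEFAULT_SCORE) -> str:
    """Describe how a QIE score compares with the default score."""
    if score < default:
        return "down-weighted"
    if score > default:
        return "up-weighted"
    return "neutral"  # same score as an unmatched document (label is an assumption)
```

For instance, with the default of 0.5, a score of 0.25 counts as down-weighted and a score of 1.0 as up-weighted.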
QIE configurations are not automatically applied to all generations in a Push collection. They are applied to newly committed generations and to merged generations. To re-apply a QIE configuration to an entire Push collection, you will need to trigger a Vacuum.
Advanced QIE configuration file processing
Expert users may run padre-qi in a post-index action over any QIE configuration file that they choose. This can be used to change the default QIE score assigned to documents, or for more advanced functionality. In this case, users may disable "normal" QIE processing by removing the default qie.cfg file through the file-manager.
padre-qi index_stem file_of_url_patterns default_score
index_stem is the collection's index stem, typically "live/idx/index"
file_of_url_patterns is a QIE configuration file
default_score is the default QIE score to assign to documents which are not matched by the configuration file
e.g.:
padre-qi $SEARCH_HOME/data/collection/live/idx/index $SEARCH_HOME/conf/collection/qie.cfg 0.25
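If the post-index action is driven from a script, the argument list can be assembled programmatically. The sketch below only builds the arguments in the order shown above and does not run anything. build_padre_qi_command is a hypothetical helper, the "/opt/funnelback" install path is an assumed example value, and the range check reflects the assumption that scores are normalised to 0-1.

```python
import posixpath
from typing import List


def build_padre_qi_command(search_home: str, collection: str, default_score: float) -> List[str]:
    """Build the padre-qi argument list: index stem, QIE config file, default score."""
    # Scores are assumed to be normalised to the range 0-1.
    if not 0.0 <= default_score <= 1.0:
        raise ValueError("default_score is expected to be in the range 0-1")
    index_stem = posixpath.join(search_home, "data", collection, "live", "idx", "index")
    config_file = posixpath.join(search_home, "conf", collection, "qie.cfg")
    return ["padre-qi", index_stem, config_file, str(default_score)]


# "/opt/funnelback" is an assumed value for $SEARCH_HOME, used only for illustration.
command = build_padre_qi_command("/opt/funnelback", "collection", 0.25)
```

The resulting list could then be passed to subprocess.run(command, check=True), assuming padre-qi is on the PATH of the user running the action.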
The binary file created will be called:
How do I configure padre-sw to use the binary QIE file?
padre-sw must be configured to use the generated binary QIE file at query time. To do this, either set the cool.4 query processor option, as outlined in the cooler ranking options, or set the cool.4 CGI parameter.