Padre Cooler Options
Description
This page describes the options available for tuning ranking with the cool
query processor option. For more information about how ranking works, see Funnelback_Ranking_Algorithms.
These options can be set either in the query processor options (collection.cfg) or as CGI parameters (e.g. ...&cool.2=12&cool.3=34...).
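As a minimal sketch of the CGI form, the following Python snippet builds the `cool.N=value` part of a query string from a mapping of option numbers to weights. The helper name `cool_query_fragment` is hypothetical and is not part of Funnelback; the rest of the search URL is left out because it depends on the installation.

```python
from urllib.parse import urlencode


def cool_query_fragment(weights):
    """Build the cool.N=value part of a query string from {option number: weight}."""
    # Each cooler option N becomes a CGI parameter named "cool.N".
    return urlencode({f"cool.{number}": weight for number, weight in weights.items()})


# Offsite link weight (2) set to 12, URL length weight (3) set to 34.
fragment = cool_query_fragment({2: 12, 3: 34})
print(fragment)  # cool.2=12&cool.3=34
```

The resulting fragment would be appended to an existing search URL with `&`.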
List of cooler options
Number | Description |
---|---|
0 | content: content weight |
1 | onlink: onsite link weight |
2 | offlink: offsite link weight |
3 | urllen: URL length weight |
4 | qie: external evidence (qie) weight |
5 | date_proximity: proximity to current date weight |
6 | urltype: URL attractiveness (Homepages are favoured; copyright pages and URLs with lots of punctuation are penalised.) |
7 | annie: annotation weight (annie) |
8 | domain_weight: weight associated with this domain |
9 | geoprox: geographical proximity to origin |
10 | nonbin: non-binariness (1 for html, xml, txt, 0 otherwise) |
11 | no_ads: freedom from ads |
12 | imp_phrase: implicit phrase match score |
13 | consistency: consistency of evidence. (Extra reward for docs with non-zero scores on both content and annie.) |
14 | log_annie: logarithm of annotation weight (log(annie)) |
15 | anlog_annie: absolute-normalised logarithm of annotation weight. |
16 | annie_rank: annotation rank = (k - rank) / k, where k = 2 x highest rank requested; if rank > k, rank is set to k |
17 | BM25F: field-weighted Okapi score |
18 | an_okapi: absolute-normalised Okapi score. |
19 | BM25F_rank: field-weighted Okapi rank. |
20 | mainhosts: bias in favour of principal servers (web search only). |
21 | comp_wt: component collection weighting. (meta collections only). |
22 | document_number: document number in the crawl. An early position in the crawl may correlate with importance |
23 | host_incoming_link_score |
24 | host_click_score |
25 | host_linking_hosts_score |
26 | host_linked_hosts_score |
27 | host_rank_in_crawl_order_score |
28 | host_domain_shallowness_score |
29 | doc_matches_regex: document matches administrator supplied regex |
30 | doc_does_not_match_regex: document does not match administrator supplied regex |
31 | titleWords: number of words in title |
32 | contentWords: number of indexed words in document |
33 | compressionFactor: compressibility of document text |
34 | entropy: entropy of document |
35 | stopwordFraction: fraction of stopwords in the document |
36 | stopwordCover: fraction of stopword list present in the document |
37 | averageTermLen: average term length |
38 | distinctWords: number of distinct words in the document |
39 | maxFreq: frequency of most frequently occurring term |
40 | titleWords_neg: Neg number of words in title |
41 | contentWords_neg: Neg number of indexed words in document |
42 | compressionFactor_neg: Neg compressibility of document text |
43 | entropy_neg: Neg entropy of document |
44 | stopwordFraction_neg: Neg fraction of stopwords in the document |
45 | stopwordCover_neg: Neg fraction of stopword list present in the document |
46 | averageTermLen_neg: Neg average term length |
47 | distinctWords_neg: Neg number of distinct words in the document |
48 | maxFreq_neg: Neg frequency of most frequently occurring term |
49 | titleWords_abs: Abs number of words in title |
50 | contentWords_abs: Abs number of indexed words in document |
51 | compressionFactor_abs: Abs compressibility of document text |
52 | entropy_abs: Abs entropy of document |
53 | stopwordFraction_abs: Abs fraction of stopwords in the document |
54 | stopwordCover_abs: Abs fraction of stopword list present in the document |
55 | averageTermLen_abs: Abs average term length |
56 | distinctWords_abs: Abs number of distinct words in the document |
57 | maxFreq_abs: Abs frequency of most frequently occurring term |
58 | titleWords_abs_neg: Neg abs number of words in title |
59 | contentWords_abs_neg: Neg abs number of indexed words in document |
60 | compressionFactor_abs_neg: Neg abs compressibility of document text |
61 | entropy_abs_neg: Neg abs entropy of document |
62 | stopwordFraction_abs_neg: Neg abs fraction of stopwords in the document |
63 | stopwordCover_abs_neg: Neg abs fraction of stopword list present in the document |
64 | averageTermLen_abs_neg: Neg abs average term length |
65 | distinctWords_abs_neg: Neg abs number of distinct words in the document |
66 | maxFreq_abs_neg: Neg abs frequency of most frequently occurring term |
67 | lexical_span_score |
68 | doc_matches_cgscope1: Documents which match gscope defined by -cgscope1 (if defined) |
69 | doc_matches_cgscope2: Documents which match gscope defined by -cgscope2 (if defined) |
70 | doc_does_not_match_cgscope1: Documents which do not match gscope defined by -cgscope1 (if defined) |
71 | doc_does_not_match_cgscope2: Documents which do not match gscope defined by -cgscope2 (if defined) |
72 | raw_annie: Untransformed annie score linearly scaled to 0..1 |
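The formula given for option 16 (annie_rank) can be illustrated with a short Python sketch. This is only a reading of the table entry, not padre's actual implementation: the function name is hypothetical, and whether ranks start at 0 or 1 is not stated on this page.

```python
def annie_rank_score(rank, highest_rank_requested):
    """Compute (k - rank) / k with k = 2 x highest rank requested, as in the table."""
    k = 2 * highest_rank_requested
    if k <= 0:
        raise ValueError("highest_rank_requested must be positive")
    # Ranks beyond k are clamped to k, so the score never drops below 0.
    rank = min(rank, k)
    return (k - rank) / k


print(annie_rank_score(1, 10))   # 0.95
print(annie_rank_score(10, 10))  # 0.5
print(annie_rank_score(25, 10))  # 0.0 (rank clamped to k = 20)
```

Under this reading, the score falls linearly with rank and reaches zero at twice the highest rank requested.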
Values
Values are unbounded, but typical weights range from 0 to 100.
Example
To set the query processor to ignore URL length, but give a high weight to phrase matches implied by the query:
query_processor_options=-cool.3=0 -cool.12=100
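When several weights are tuned together, the configuration line can be generated rather than typed by hand. The sketch below is one way to do that in Python; `format_cool_options` is a hypothetical helper, and it only formats the line in the same shape as the example above without validating the option numbers.

```python
def format_cool_options(weights):
    """Format {option number: weight} as a query_processor_options line."""
    # One "-cool.N=value" flag per option, separated by spaces.
    flags = " ".join(f"-cool.{number}={weight}" for number, weight in weights.items())
    return f"query_processor_options={flags}"


# Ignore URL length (3) and weight implicit phrase matches (12) highly.
line = format_cool_options({3: 0, 12: 100})
print(line)  # query_processor_options=-cool.3=0 -cool.12=100
```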