Numerical metadata

Introduction

Funnelback supports search over numeric data stored in metadata fields. These fields are defined through metadata mappings; the examples below use metamap.cfg and xml.cfg.

Numeric fields can be queried using CGI parameters.

The CGI parameters are:

CGI Parameter   Value Type   Description
lt_class        float        Performs a "Less than" operation on metadata class
le_class        float        Performs a "Less than or equals" operation on metadata class
gt_class        float        Performs a "Greater than" operation on metadata class
ge_class        float        Performs a "Greater than or equals" operation on metadata class
eq_class        float        Performs an "Equals" operation on metadata class
ne_class        float        Performs a "Not Equals" operation on metadata class
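
To make the operator prefixes concrete, the following Python sketch emulates them locally. It only illustrates the comparison each prefix names and is not how the query processor evaluates them; OPERATORS and matches are hypothetical names introduced here, and any tolerance applied by Funnelback is ignored.

```python
import operator

# Prefix of each CGI parameter mapped to the comparison it names.
OPERATORS = {
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
    "eq": operator.eq,
    "ne": operator.ne,
}


def matches(param_name, param_value, record):
    """Return True if record satisfies a parameter such as le_price=100000.

    record maps metadata class names to numeric values (illustrative only).
    """
    # Split on the first underscore only, so classes such as
    # engine_capacity keep their own underscores.
    prefix, _, metadata_class = param_name.partition("_")
    compare = OPERATORS[prefix]
    return compare(float(record[metadata_class]), float(param_value))


car = {"price": 65800, "acceleration": 15.2}
print(matches("le_price", "100000", car))      # is price <= 100000?
print(matches("gt_acceleration", "20", car))   # is acceleration > 20?
```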

Assumptions

The following assumptions are made by the indexer and query processor:

  • a numeric field will not contain any characters other than whitespace before the numeric quantity.
  • all numeric quantities are stored as an 8-byte double. It is assumed that this is sufficiently accurate.
  • there is no understanding of the semantics of numeric quantities and no conversion of units. If the raw data mixes litres, cubic inches and cubic centimetres, the data will have to be converted prior to indexing.
  • the lt_x and gt_x operators (where x is the metadata class) compare against the exact value specified. The other operators allow a small tolerance, imposed by the limited precision of 8-byte doubles.
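
Because units are not converted and nothing but whitespace may precede the number, raw data usually needs cleaning before it is indexed. The Python sketch below shows one possible pre-indexing step that normalises engine capacities to cubic centimetres. The unit labels, the FIELD pattern and the to_cubic_cm helper are assumptions made for illustration and are not part of Funnelback. The last two lines show why exact equality on 8-byte doubles is fragile; the tolerance used there is Python's default, not the one Funnelback applies.

```python
import math
import re

# Conversion factors to cubic centimetres (standard unit definitions).
TO_CUBIC_CM = {
    "cc": 1.0,
    "l": 1000.0,
    "cu in": 16.387064,
}

# Optional whitespace, then a number, then an optional unit label.
FIELD = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*([A-Za-z ]*?)\s*$")


def to_cubic_cm(raw):
    """Convert a raw capacity string such as '5.5 l' to cubic centimetres."""
    match = FIELD.match(raw)
    if match is None:
        # Anything other than whitespace before the number is rejected.
        raise ValueError(f"not a leading numeric quantity: {raw!r}")
    value = float(match.group(1))
    unit = match.group(2).lower() or "cc"  # assume cc when no unit is given
    return value * TO_CUBIC_CM[unit]


print(to_cubic_cm("5.5 l"))
print(to_cubic_cm("140 cu in"))

# Doubles cannot represent every decimal exactly, so equality needs care.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True
```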

How to index numerical data

The numerical range metadata can be represented in three different ways:

  1. via meta elements in HTML (or XML),
  2. via XML elements,
  3. via attributes of XML elements.

Example

Example metamap.cfg

# Numerical metadata fields relating to cars
weight,3,weight
acceleration,3,acceleration
capacity,3,engine_capacity
price,3,price

Example xml.cfg

PADRE XML Mapping Version: 2
# Supports numerical metadata either through elements or attributes.
document,/car
docurl,/car/url
t,1,,//title
description,0,,//description
weight,3,,//weight
acceleration,3,,//acceleration
capacity,3,,//engine_capacity
price,3,,//price
weight,3,,/car@weight
acceleration,3,,/car@acceleration
capacity,3,,/car@engine_capacity
price,3,,/car@price

No special settings are needed for indexing, but the appropriate query processor options must be set in collection.cfg so that the numeric fields appear in the result packet: -SF=<numeric metadata classes>, together with an -SM mode that returns metadata, such as -SM=both (the example below uses -SM=meta). For the example above:

query_processor_options=-SM=meta -SF=[weight,capacity,acceleration,price] -SBL=2000
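
When the list of numeric classes changes often, the option line can be generated rather than typed by hand. The Python sketch below reproduces the line above from a list of class names; the query_processor_options function and its parameters are hypothetical helpers, not a Funnelback API.

```python
def query_processor_options(numeric_classes, summary_mode="meta",
                            extra=("-SBL=2000",)):
    """Build a collection.cfg line that exposes the given numeric classes."""
    # -SF takes a bracketed, comma-separated list of metadata classes.
    summary_fields = "-SF=[" + ",".join(numeric_classes) + "]"
    options = [f"-SM={summary_mode}", summary_fields, *extra]
    return "query_processor_options=" + " ".join(options)


print(query_processor_options(["weight", "capacity", "acceleration", "price"]))
```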

Example XML document which the above xml.cfg applies to:

<car>
    <url>http://www.bmw.com.au/scripts/main.asp?PageID=11768&amp;ModelID=1000079&amp;ModelCategoryID=10</url>
    <title>BMW model X95</title>
    <meta name='description' content='The only BMW sports car with the ability to plough a field!'/>
    <weight>1056.9</weight>
    <acceleration>30.9</acceleration>
    <engine_capacity>5500</engine_capacity>
    <price>165300</price>
</car>
<car weight='1312.8' acceleration='15.2' engine_capacity='2293' price='65800'>
    <url>http://www.bmw.com.au/scripts/main.asp?PageID=11768&amp;ModelID=1000116&amp;ModelCategoryID=10&amp;Screen=LaunchPage</url>
    <title>BMW model X100</title>
    <meta name='description' content='The only BMW sports car which does not seem out of place when shopping for groceries.'/>
</car>
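
The two records carry the same numeric fields in different places: the first as child elements, the second as attributes. The Python sketch below reads both forms with the standard library, roughly mirroring what the element and attribute mappings select. It is an illustration only, not the indexer's implementation; the enclosing cars element is an assumption added so that the sample parses as a single document, and the url and meta elements are omitted for brevity.

```python
import xml.etree.ElementTree as ET

NUMERIC_FIELDS = ["weight", "acceleration", "engine_capacity", "price"]

# Trimmed copy of the sample records, wrapped in a single root element.
SAMPLE = """<cars>
<car>
    <title>BMW model X95</title>
    <weight>1056.9</weight>
    <acceleration>30.9</acceleration>
    <engine_capacity>5500</engine_capacity>
    <price>165300</price>
</car>
<car weight='1312.8' acceleration='15.2' engine_capacity='2293' price='65800'>
    <title>BMW model X100</title>
</car>
</cars>"""


def numeric_values(car):
    """Read each numeric field from a child element, else from an attribute."""
    values = {}
    for name in NUMERIC_FIELDS:
        text = car.findtext(name)   # e.g. <weight>1056.9</weight>
        if text is None:
            text = car.get(name)    # e.g. <car weight='1312.8'>
        if text is not None:
            values[name] = float(text)
    return values


for car in ET.fromstring(SAMPLE).iter("car"):
    print(car.findtext("title"), numeric_values(car))
```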

To find all the BMW cars costing less than or equal to one hundred thousand dollars with acceleration between 10 and 20, you would require a CGI query string as follows:

query=BMW&le_price=100000&ge_acceleration=10&le_acceleration=20
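
A query string like this is easier to assemble programmatically than by hand. The Python sketch below builds it with the standard library and then reads the numeric constraints back out. Only the query string is produced, since the search endpoint URL depends on the installation; the variable names are illustrative.

```python
from urllib.parse import parse_qsl, urlencode

# Query terms first, then one parameter per numeric constraint.
params = [
    ("query", "BMW"),
    ("le_price", 100000),
    ("ge_acceleration", 10),
    ("le_acceleration", 20),
]
query_string = urlencode(params)
print(query_string)

# Reading it back: keep only the parameters carrying a numeric operator prefix.
PREFIXES = ("lt_", "le_", "gt_", "ge_", "eq_", "ne_")
constraints = [
    (name, float(value))
    for name, value in parse_qsl(query_string)
    if name.startswith(PREFIXES)
]
print(constraints)
```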

Caveats

  • This capability is not currently available via the >, < and = operators in the query language (e.g. query=price>20).
  • The CGI parameters currently work only as scoping operators. There must be a query to define a result set which is then scoped by lt_x etc. If there is no query there will be no results.
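
Since numeric parameters on their own return nothing, client code may want to refuse to send them without query terms. The Python sketch below is one way to enforce that before a request is made; scoped_query_string is a hypothetical helper, not part of Funnelback.

```python
from urllib.parse import urlencode


def scoped_query_string(query, **numeric_constraints):
    """Build a query string, insisting on query terms for the scope to act on."""
    if not query or not query.strip():
        raise ValueError(
            "numeric parameters only scope an existing query; supply query terms"
        )
    return urlencode([("query", query), *numeric_constraints.items()])


print(scoped_query_string("BMW", le_price=100000))
```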

v15.12.0