Metadata class types
Introduction
Funnelback supports five types of metadata classes:
- Text: The content of this class is a string of text.
- Number: The content of this field is a numeric value. Funnelback will interpret this as a number. This type should only be used if there is a need to use numeric operators when performing a search (e.g. X > 2050). If the field is only required for display within the search results a text type metadata class is sufficient.
- Date: Funnelback supports a single date class and will use the values mapped to this class to determine a date for the document for the purpose of ranking, sorting and also date range search. If additional dates are required they should be configured as either text (e.g. 2017-09-24) or number (e.g. 20170924) type metadata classes.
- Geospatial x/y coordinate: The content of this field is a decimal latlong value in the following format: geo-x;geo-y (e.g. 40.6976684;-74.260555). This type should only be used if there is a need to perform a geospatial search (e.g. this point is within X km of another point). If the geospatial coordinate is only required for plotting items on a map then a text type metadata class is sufficient.
- Document permissions: The content of this field is a security lock string defining the document permissions. This type should only be used when working with an enterprise collection that includes document level security and specifies the requirement of a document permissions metadata field.
Metadata class types: text
A text type metadata class has the values interpreted as a text string.
The text can include code such as HTML tags and these will be returned as is by Funnelback. It is the responsibility of the user interface layer to interpret or escape the field content.
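Because the field content is returned as is, a user interface layer that wants to display it as literal text would typically escape it before writing it into a page. The following is a minimal sketch using only the Python standard library; the function name and the sample value are hypothetical and are not part of Funnelback.

```python
import html


def render_metadata_value(raw_value: str) -> str:
    """Escape a raw metadata value so it displays as literal text in HTML."""
    # quote=True also escapes " and ' so the value is safe inside attributes.
    return html.escape(raw_value, quote=True)


# Hypothetical value returned for a text type metadata class.
raw = '<b>Annual report</b> & "summary"'
print(render_metadata_value(raw))
```

An interface that deliberately stores trusted HTML in the field would skip this step and interpret the markup instead.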
Searching textual metadata
Funnelback includes a number of query language operators and CGI parameters that can be used to search a text type metadata field:
CGI parameter | Query language operator | Description |
---|---|---|
meta_CLASSNAME=value | CLASSNAME:value | Matching results will contain the term value within the CLASSNAME class. |
meta_CLASSNAME_and=value1+value2 | +CLASSNAME:value1 +CLASSNAME:value2 | Matching results will contain the terms value1 AND value2 within the CLASSNAME class. |
meta_CLASSNAME_or=value1+value2 | [CLASSNAME:value1 CLASSNAME:value2] | Matching results will contain value1 OR value2 within the CLASSNAME class. |
meta_CLASSNAME_not=value1+value2 | -CLASSNAME:value1 -CLASSNAME:value2 | Matching results will not contain the terms value1 AND value2 within the CLASSNAME class. |
meta_CLASSNAME_sand=value1+value2 | \|CLASSNAME:value1 \|CLASSNAME:value2 | The result set will be scoped to items containing value1 AND value2 within the CLASSNAME class before other query constraints are applied. Partially matching results will always include both of these terms in the CLASSNAME class. |
meta_CLASSNAME_orsand=value1+value2 | \|[CLASSNAME:value] | The result set will be scoped to items containing value1 OR value2 within the CLASSNAME class before other query constraints are applied. Partially matching results will always include either or both of these terms in the CLASSNAME class. |
meta_CLASSNAME_phrase=value1+value2 | "CLASSNAME:value" | Matching results will contain the phrase "value1 value2". |
meta_CLASSNAME_prox=value1+value2 | `CLASSNAME:value` | Matching results will contain value1 and value2 within 15 words of each other. |
Text metadata can also be sorted alphabetically using the sort=metaCLASSNAME or sort=dmetaCLASSNAME parameters. See: sort options for more information on sorting search results.
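The CGI parameter names in the table above follow a regular pattern, so they can be generated rather than typed by hand. The sketch below builds a single encoded parameter with the Python standard library. The helper name and the `author` class are hypothetical, and the sketch only assembles the string; it does not contact a Funnelback server.

```python
from typing import Iterable
from urllib.parse import urlencode

# Suffixes taken from the table above; "" is the plain meta_CLASSNAME form.
MODES = {"", "and", "or", "not", "sand", "orsand", "phrase", "prox"}


def text_metadata_param(class_name: str, terms: Iterable[str], mode: str = "") -> str:
    """Return an encoded parameter such as meta_CLASSNAME_and=value1+value2."""
    if mode not in MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    name = f"meta_{class_name}_{mode}" if mode else f"meta_{class_name}"
    # urlencode uses quote_plus, so the space between terms becomes '+'.
    return urlencode({name: " ".join(terms)})


# "author" is a hypothetical metadata class name.
print(text_metadata_param("author", ["smith", "jones"], "and"))
```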
Metadata class types: date
Funnelback supports a single date-type metadata class using the reserved d metadata class. The value of this field is interpreted as a date and is assigned as the document's date for the purposes of recency in the ranking algorithm, and also for sort and presentation.
Only a single date value will be assigned to the document. If multiple date metadata fields exist in the document the assigned date is chosen based on the date precedence rules below.
Supported date formats
Name | Format | Example | Notes |
---|---|---|---|
RFC1123 | See RFC1123 and RFC2822 | Wed Mar 08 14:11:00 EST 2000 | |
ISO-8601 | YYYY-MM-DD | 2001-01-31 or 2001-01-31 12:53:01Z or 2001-01-31T12:53:01Z | January 31st 2001 |
14 digits | YYYYMMDDHHmmss | 20091110083016 | November 10th 2009, 8:30:16 am |
6 digits | YYMMDD | 010131 | January 31st 2001 |
Short ISO-8601 | YYYY-MM | 2001-01 | January 2001 |
Very short ISO-8601 | YYYY | 2001 | 2001 |
Non compliant ISO-8601 | YYYY-DD-MM | 2001-31-01 | Although this format is not standards compliant, dates with a middle component greater than 12 are treated this way. Take care though: ambiguous dates (e.g. January 1st) will be interpreted in YYYY-MM-DD format. |
Abbreviated date | YYMMMDD | 31jan01 | January 31st 2001 |
Long form date | DD MMMM YYYY | 31st january, 2001 or 31 Jan 2001 | Long or short form months accepted, punctuation and 'st' 'nd' optional - "31 January 2001" is also acceptable. |
Long form date, month first | MMMM DD YYYY | January 31st, 2001 | Long or short form months accepted, punctuation and 'st' 'nd' optional - "January 31 2001" is also acceptable. |
Pre-2000 dates | DD MM YY | 31/1/01 or 31-01-01 | Punctuation ignored. The indexer interprets years less than 80 as post 2000, and years greater or equal to 80 as 1980 onwards. This format is not recommended. |
A TRIM format | DD/MM/YYYY at h:mm a | 13/6/2007 at 6:51 AM, or 06/12/2007 at 4:51 PM | Used by TRIM record management system |
Non-standard | DD-MM-YYYY | 13-06-2007 or 13/06/2007 | Avoid if possible |
Non-standard | Day, DD Mon YYYY | Wed, 13 Jun 2007 17:26:08 +1000 | At least there is no ambiguity here. |
19 character UTC | yyyyMMddHHmmss.SSSZ | 19970705071122.123Z | The indexer will convert this date from UTC to the server's local time zone. |
Notes:
- All date formats are case insensitive.
- There is no locale support for dates. Month names and abbreviations must be in English.
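Two of the rules in the table are easier to follow when written out: the fallback from YYYY-MM-DD to YYYY-DD-MM when the middle component is greater than 12, and the two-digit year cut-off at 80. The sketch below restates those two rules in Python. It is an illustration of the documented behaviour under that reading, not the indexer's actual code, and both function names are hypothetical.

```python
from datetime import date


def parse_dashed_date(text: str) -> date:
    """Read YYYY-MM-DD, falling back to YYYY-DD-MM when the middle part exceeds 12."""
    year, middle, last = (int(part) for part in text.split("-"))
    if middle > 12:
        # The middle component cannot be a month, so treat the value as YYYY-DD-MM.
        return date(year, last, middle)
    # Ambiguous values such as 2001-02-01 are read as YYYY-MM-DD.
    return date(year, middle, last)


def expand_two_digit_year(yy: int) -> int:
    """Years below 80 map to 20YY; 80 and above map to 19YY."""
    return 2000 + yy if yy < 80 else 1900 + yy


print(parse_dashed_date("2001-31-01"))  # read as YYYY-DD-MM
print(parse_dashed_date("2001-02-01"))  # read as YYYY-MM-DD
print(expand_two_digit_year(1), expand_two_digit_year(80))
```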
Date precedence order
When multiple dates are encountered for a document the following precedence order applies:
- External metadata (highest priority)
- The first occurrence in the document of dc.date or any metadata source mapped to the d metadata class.
- dc.date.modified
- dc.date.created
- dc.date.issued
- HTTP last modified date (lowest priority)
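The precedence order amounts to "take the first source in the list that has a value". The sketch below expresses that as a simple lookup. The labels `external_metadata`, `d_class` and `http_last_modified` are hypothetical names chosen for this example, and the function is an illustration of the ordering rather than Funnelback's implementation.

```python
from typing import Mapping, Optional

# Sources in precedence order, highest priority first.
# "external_metadata", "d_class" and "http_last_modified" are hypothetical labels.
PRECEDENCE = [
    "external_metadata",
    "d_class",  # first dc.date, or any source mapped to the d metadata class
    "dc.date.modified",
    "dc.date.created",
    "dc.date.issued",
    "http_last_modified",
]


def choose_document_date(candidates: Mapping[str, str]) -> Optional[str]:
    """Return the date value from the highest-priority source that has one."""
    for source in PRECEDENCE:
        value = candidates.get(source)
        if value:
            return value
    return None


found = {"dc.date.created": "2001-01-31", "http_last_modified": "2009-11-10"}
print(choose_document_date(found))
```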
Searching date metadata
A number of special date parameters are supported via CGI parameters and the query language.
Dates must be specified in DMMMYYYY format, e.g. 1Jan2015, 5Sep2001.
CGI parameter | Query language operator | Description |
---|---|---|
meta_d=1Jan2015 | d=1Jan2015 | Exact match to the specified date. |
meta_d1=1Jan2015 | d>1Jan2015 | Matches all dates greater than the supplied date (after). |
meta_d2=1Jan2015 | d<1Jan2015 | Matches all dates less than the supplied date (before). |
meta_d3=1Jan2015 | d=1Jan2015 d>1Jan2015 | Matches all dates greater than or equal to the supplied date (from). |
meta_d4=1Jan2015 | d=1Jan2015 d<1Jan2015 | Matches all dates less than or equal to the supplied date (to). |
Parameters can be combined to create date range queries. e.g. the query below would match results with dates after 28th July, 1914 and before 11th November, 1918:
meta_d1=28Jul1914&meta_d2=11Nov1918
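When the dates come from application code, the DMMMYYYY values can be produced from date objects. The sketch below uses a fixed English month list rather than locale-dependent formatting, since month abbreviations must be in English. The helper names are hypothetical, and the code only builds the parameter string.

```python
from datetime import date
from urllib.parse import urlencode

# English abbreviations are spelled out so the result never depends on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def to_dmmmyyyy(d: date) -> str:
    """Format a date as DMMMYYYY, e.g. 1Jan2015."""
    return f"{d.day}{MONTHS[d.month - 1]}{d.year}"


def date_range_params(after: date, before: date) -> str:
    """Build meta_d1 (after) and meta_d2 (before) for a date range query."""
    return urlencode({"meta_d1": to_dmmmyyyy(after), "meta_d2": to_dmmmyyyy(before)})


print(date_range_params(date(1914, 7, 28), date(1918, 11, 11)))
# prints: meta_d1=28Jul1914&meta_d2=11Nov1918
```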
Additional day, month and year variants are available for each of the above CGI parameters to facilitate easy form integration. The parameters can be modified further by appending one of the following suffixes to the parameter name:
- day
- month
- year
The example below would match results with dates matching 25th April 1915:
meta_dday=25 meta_dmonth=Apr meta_dyear=1915
The example below would match results with dates from 1st September, 1939 to 2nd September, 1945:
meta_d3day=01 meta_d3month=Sep meta_d3year=1939 meta_d4day=02 meta_d4month=Sep meta_d4year=1945
Note:
- d3 and d4 require all three components (day, month and year) to be provided.
- d, d1 and d2 do not require all three components, e.g. just the year could be specified.
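The day, month and year variants can be generated from a date in the same way. The sketch below is a hypothetical helper that returns the three component parameters for one of the date parameters. Zero-padding the day follows the 01 shown in the example above; whether an unpadded day is also accepted is not stated here, so treat that choice as an assumption.

```python
from datetime import date
from typing import Dict

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

DATE_PARAMETERS = {"d", "d1", "d2", "d3", "d4"}


def date_component_params(parameter: str, d: date) -> Dict[str, str]:
    """Split a date into the day, month and year variants of a date parameter."""
    if parameter not in DATE_PARAMETERS:
        raise ValueError(f"unknown date parameter: {parameter!r}")
    base = f"meta_{parameter}"
    return {
        f"{base}day": f"{d.day:02d}",  # zero-padded, as in the example above
        f"{base}month": MONTHS[d.month - 1],
        f"{base}year": str(d.year),
    }


print(date_component_params("d", date(1915, 4, 25)))
```

Always supplying all three components, as this helper does, satisfies the stricter requirement of d3 and d4.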
Date metadata can also be sorted by date by using the sort=date or sort=adate parameters. See: sort options for more information on sorting search results.
Metadata class types: number
Defining a metadata class as a number tells Funnelback to interpret the contents of the field as a number. This allows numeric comparisons (==, !=, >=, >, <, <=) to be run against the field, and for numeric ranges to be defined as faceted navigation using the class.
Numeric metadata is only required if you wish to make use of these range comparisons or for numeric range facets. Numbers for the purpose of display in the search results should be defined as text metadata.
The value of a numeric field will contain an integer or float, and this number is interpreted by Funnelback as an 8-byte double. This affects the precision of large and small numerical values when applying range searches against a specific number. The lt_x and gt_x operators compare against the exact value specified. Other operators allow a small tolerance, enforced by the accuracy of 8-byte doubles.
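The effect of 8-byte double precision can be seen in any language that uses the same representation. Python floats are 8-byte doubles, so the sketch below uses them as a stand-in to show where precision runs out. It illustrates the number format only and says nothing about how Funnelback applies its tolerance; the function name is hypothetical.

```python
import struct

# A C double, which backs Python's float, occupies 8 bytes.
assert struct.calcsize("d") == 8


def same_double(a: int, b: int) -> bool:
    """Return True when two integers collapse to the same 8-byte double."""
    return float(a) == float(b)


# Above 2**53 consecutive integers can no longer be told apart.
print(same_double(2 ** 53, 2 ** 53 + 1))  # True
# Below that limit integers are represented exactly.
print(same_double(2 ** 52, 2 ** 52 + 1))  # False
# Decimal fractions are approximated, so exact equality can fail.
print(0.1 + 0.2 == 0.3)  # False
```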
Searching numeric metadata
Numeric fields can be queried using CGI parameters. There are no equivalent query language operators for numeric metadata search.
The CGI parameters are:
CGI parameter | Value type | Description |
---|---|---|
lt_CLASS | float | Performs a "Less than" operation on metadata class |
le_CLASS | float | Performs a "Less than or equals" operation on metadata class |
gt_CLASS | float | Performs a "Greater than" operation on metadata class |
ge_CLASS | float | Performs a "Greater than or equals" operation on metadata class |
eq_CLASS | float | Performs an "Equals" operation on metadata class |
ne_CLASS | float | Performs a "Not Equals" operation on metadata class |
Note: The CGI parameters currently work only as scoping operators. There must be a query to define a result set which is then scoped by lt_x etc. If there is no query there will be no results.
Numeric metadata can also be sorted using the sort=metaCLASSNAME or sort=dmetaCLASSNAME parameters. See: sort options for more information on sorting search results.
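Because the numeric parameters only scope an existing result set, a helper that builds them can refuse to run without a query. The sketch below does this with the Python standard library. The `price` class and the function name are hypothetical, and the use of `query` as the CGI parameter name is an assumption based on the note above.

```python
from typing import Iterable, Tuple, Union
from urllib.parse import urlencode

OPERATORS = {"lt", "le", "gt", "ge", "eq", "ne"}

Constraint = Tuple[str, str, Union[int, float]]


def numeric_scope_params(query: str, constraints: Iterable[Constraint]) -> str:
    """Build a query string that scopes a query with numeric comparisons."""
    if not query:
        # Without a query there is no result set for the operators to scope.
        raise ValueError("a query is required when using numeric parameters")
    params = [("query", query)]
    for operator, class_name, value in constraints:
        if operator not in OPERATORS:
            raise ValueError(f"unknown operator: {operator!r}")
        params.append((f"{operator}_{class_name}", str(value)))
    return urlencode(params)


# "price" is a hypothetical numeric metadata class.
print(numeric_scope_params("house", [("ge", "price", 100000), ("lt", "price", 500000)]))
```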
Metadata class types: geospatial x/y coordinate
Defining a field as geospatial type metadata tells Funnelback to interpret the contents of the field as a decimal lat/long coordinate (e.g. -31.95516;115.85766). This is used by Funnelback to assign a geospatial coordinate to an indexed item (effectively pinning it to a single point on a map). A geospatial metadata field is useful if you wish to add any location-based search constraints such as show me items within a specified distance to a specified origin point, or sort the results by proximity (closeness) to a specific point.
A geospatial metadata coordinate is not required if you just want to plot the item onto a map in the search results (a text type value will be fine as it's just a text value you are passing to the mapping API service that will generate the map).
Searching geospatial metadata
A number of geospatial CGI parameters are available when searching geospatial metadata. These parameters can be used to scope the search to items with a geospatial coordinate within a specific distance of an origin point.
This allows for a show results near me search when used in conjunction with a user's GPS or browser-derived location coordinates.
CGI parameter | Description |
---|---|
origin=X,Y | Specifies a coordinate (formatted as x,y e.g. origin=24.543,-2.331) that will be used as the reference point for geospatial calculations. |
maxdist=DISTANCE | Can be used to restrict a search to a DISTANCE (in km) from the origin (e.g. maxdist=20). |
Geospatial metadata can also be sorted by proximity to the origin point by using the sort=prox or sort=dprox parameters. See: sort options for more information on sorting search results.
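A "results near me" request combines the origin, maxdist and sort parameters. The sketch below assembles them from a pair of coordinates, keeping the comma in the origin value unencoded to match the format shown in the table. The function name is hypothetical, the coordinates would normally come from the user's device or browser, and the code only builds the parameter string.

```python
from typing import Union
from urllib.parse import urlencode


def nearby_params(x: float, y: float, maxdist_km: Union[int, float],
                  sort_by_proximity: bool = True) -> str:
    """Build origin, maxdist and optional sort parameters for a proximity search."""
    params = {"origin": f"{x},{y}", "maxdist": str(maxdist_km)}
    if sort_by_proximity:
        params["sort"] = "prox"
    # safe="," keeps the comma in the origin value instead of encoding it as %2C.
    return urlencode(params, safe=",")


print(nearby_params(-31.95516, 115.85766, 20))
# prints: origin=-31.95516,115.85766&maxdist=20&sort=prox
```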
Metadata class types: document permissions
Funnelback interprets the value contained in a document permissions type metadata class as a document lock string describing the access controls that apply to the document.
This is used for enterprise search collections that enforce document level security.
The format of the lockstring is determined by the connector that is used for the repository that is being indexed.
Defining a document permissions type metadata field will prevent all results from the index from being returned unless an appropriate security plugin has been defined. This is to enforce a minimum level of security over the collection when document level security is enabled. For this reason metadata fields of this type should only be defined when indexing a supported repository type that requires a document permissions metadata field to be defined.
See: document level security for further information.