Document Metadata

Metadata is information about a document, such as who wrote it, who published it, and so on.

HTML allows you to embed metadata about a document within the document itself. Browsers do not display this metadata, but you can see it by selecting View Page Source (or the equivalent command) in your web browser. Metadata may also be stored externally, separate from the document it describes.
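To show what "embedded" metadata looks like, here is a minimal sketch that pulls `<meta name="..." content="...">` pairs out of a page using Python's standard `html.parser` module. It assumes metadata is carried in `<meta>` elements with `name` and `content` attributes. The sample page, its field values, and the helper names `MetaTagCollector` and `extract_meta` are hypothetical and for illustration only. This is not how Funnelback itself parses pages.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects (name, content) pairs from <meta> elements in an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = []  # (name, content) pairs, in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of
        # (attribute, value) tuples. Self-closing <meta ... /> also ends up here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name is not None and content is not None:
            self.metadata.append((name, content))


def extract_meta(html_text):
    """Return the (name, content) pairs of all <meta> elements in html_text."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.metadata


# Hypothetical page: field values are made up for this example.
SAMPLE_PAGE = """<html>
<head>
  <title>Annual report</title>
  <meta name="author" content="Jane Citizen">
  <meta name="DC.Creator" content="Jane Citizen">
  <meta name="DC.Publisher" content="Example University">
</head>
<body><p>Only this text is displayed by the browser.</p></body>
</html>"""

for name, content in extract_meta(SAMPLE_PAGE):
    print(f"{name}: {content}")
```

The browser renders only the body paragraph. The three `<meta>` fields are visible only in the page source, or to a program such as an indexer that reads it.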

Funnelback allows pages to be indexed with metadata taken from a file of external metadata.

There are many schemes for representing metadata. Funnelback supports the following metadata formats:

Funnelback also treats the URL of a web page, its links, mailto links, the anchor text of links that point to it, and similar information as useful metadata.

A general-purpose web search engine like Funnelback should support metadata search over pages that use different metadata schemes. Funnelback does this by mapping the various ways of representing a particular kind of metadata onto a single metadata search class, identified by a lowercase letter. For example, the Netscape author field and the Dublin Core DC.Creator field are both mapped to class "a".
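The mapping idea can be sketched as a lookup table from field names to single-letter classes. Only the author / DC.Creator to "a" mapping comes from the text above. The title entries use class "t", which follows the query example in this document, but the specific field names mapped to it are assumptions. The case-insensitive lookup and the names `FIELD_TO_CLASS` and `classify` are also hypothetical. This is not Funnelback's actual configuration.

```python
# Hypothetical mapping from metadata field names (lowercased) to classes.
# Only "author" and "dc.creator" -> "a" are taken from the documentation;
# the title entries are illustrative assumptions.
FIELD_TO_CLASS = {
    "author": "a",      # Netscape-style author field
    "dc.creator": "a",  # Dublin Core creator
    "title": "t",       # assumed title field
    "dc.title": "t",    # assumed Dublin Core title
}


def classify(fields):
    """Group (field name, value) pairs into metadata classes.

    Field names are compared case-insensitively; fields with no mapping
    are ignored.
    """
    classes = {}
    for name, value in fields:
        metadata_class = FIELD_TO_CLASS.get(name.lower())
        if metadata_class is None:
            continue
        classes.setdefault(metadata_class, []).append(value)
    return classes


page_fields = [
    ("author", "Jane Citizen"),
    ("DC.Creator", "J. Citizen"),
    ("DC.Title", "Annual report"),
    ("keywords", "finance"),
]
print(classify(page_fields))
# → {'a': ['Jane Citizen', 'J. Citizen'], 't': ['Annual report']}
```

Both author representations end up in the same class, so a search on class "a" finds either one without the user needing to know which scheme the page used.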

The Funnelback query language supports search over metadata classes. For example, the query t:vice-chancellor means "look for documents with vice-chancellor in their title metadata, regardless of how the title metadata is actually represented".
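A query such as `t:vice-chancellor` can be sketched as "split off the class letter, then look for the word only in that class". This minimal sketch assumes documents have already been mapped into single-letter classes. It treats a term with no class prefix as matching any class, and it uses plain whitespace tokenisation with case-insensitive comparison. These matching rules, the sample documents, and the helper names `parse_query_term` and `matches` are hypothetical simplifications. They are not Funnelback's query parser.

```python
def parse_query_term(term):
    """Split 'class:word' into (class, word); a bare word has class None."""
    prefix, sep, word = term.partition(":")
    # Only a single lowercase letter before the colon is treated as a class.
    if sep and len(prefix) == 1 and prefix.islower():
        return prefix, word.lower()
    return None, term.lower()


def matches(document, term):
    """Return True if the word occurs in the requested class of the document.

    document maps single-letter classes to lists of metadata values.
    A term without a class is checked against every class.
    """
    metadata_class, word = parse_query_term(term)
    if metadata_class is None:
        values = [value for vals in document.values() for value in vals]
    else:
        values = document.get(metadata_class, [])
    # Whitespace tokenisation keeps hyphenated words like "vice-chancellor" whole.
    return any(word in value.lower().split() for value in values)


# Hypothetical documents, already mapped into metadata classes.
documents = {
    "report.html": {"t": ["Vice-Chancellor annual report"], "a": ["Jane Citizen"]},
    "news.html": {"t": ["Campus news"], "a": ["Office of the Vice-Chancellor"]},
}

hits = [url for url, doc in documents.items() if matches(doc, "t:vice-chancellor")]
print(hits)
# → ['report.html']
```

`news.html` mentions the Vice-Chancellor only in its author metadata, so the title-restricted query skips it. An unrestricted query for `vice-chancellor` would match both documents.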

