
Built-in filters: JSON to XML filter

Introduction

The JSONToXML filter converts JSON documents into XML documents.

Enabling

To enable the filter, add JSONToXML to the filter chain. Only documents with the MIME type application/json are filtered. If your documents arrive with a different type, you may need to write a custom filter to change it.

If all gathered documents are JSON, you can place the ForceJSONMime filter before this filter (e.g. ForceJSONMime:JSONToXML). ForceJSONMime sets the MIME type of every document to application/json, so all documents are converted.
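The MIME-type gate described above can be modelled in a few lines. This is a minimal sketch, not Funnelback code: the `Document` class, the two functions and the example URL are hypothetical, and comparing the MIME type by exact string equality is an assumption about how the real filter checks it.

```python
from dataclasses import dataclass, replace

JSON_MIME = "application/json"


@dataclass(frozen=True)
class Document:
    # Hypothetical stand-in for a gathered document (not a Funnelback class).
    url: str
    mime_type: str
    content: str


def force_json_mime(doc: Document) -> Document:
    """Model of ForceJSONMime: relabel the document as JSON, content unchanged."""
    return replace(doc, mime_type=JSON_MIME)


def json_to_xml_applies(doc: Document) -> bool:
    """Model of the JSONToXML gate: only application/json documents are converted.

    Exact string comparison is an assumption made for this sketch.
    """
    return doc.mime_type == JSON_MIME


# A JSON file served with the wrong type is skipped unless ForceJSONMime runs first.
served = Document("http://mysite/items.json", "text/plain", '{"items": []}')
print(json_to_xml_applies(served))                   # False
print(json_to_xml_applies(force_json_mime(served)))  # True
```

Placing ForceJSONMime earlier in the chain means every later filter sees the relabelled type, which is why it must come before JSONToXML.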

Because JSON is a data format and XML is a document format, the filter has to make minor modifications to produce valid XML. In general, JSON keys that become XML element names may be modified as follows:

  • characters that cannot appear in an XML element name are replaced with an underscore, e.g. "a b" becomes <a_b>.
  • if the element name does not start with a letter or underscore, an underscore is prepended, e.g. "123" becomes <_123>.

Element content is also modified: characters that are not allowed in XML are removed, so foo: "count\u0000down" becomes <foo>countdown</foo>. In general, the filter tries to produce valid XML 1.0.
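The renaming and stripping rules above can be sketched as two small functions. This is an illustration under stated assumptions, not the filter's actual implementation. The function names are hypothetical, and the name rule is deliberately simplified to ASCII letters, digits, `_`, `-` and `.`, whereas the real filter may accept the wider Unicode name characters that XML 1.0 allows. The content rule follows the XML 1.0 `Char` production.

```python
import re

# Simplified (ASCII-only) name rule: anything outside this set becomes "_".
_INVALID_NAME_CHARS = re.compile(r"[^A-Za-z0-9_.\-]")
_VALID_NAME_START = re.compile(r"[A-Za-z_]")


def sanitize_element_name(key: str) -> str:
    """Turn a JSON key into a usable XML element name (simplified sketch)."""
    name = _INVALID_NAME_CHARS.sub("_", key)
    # Names must start with a letter or underscore; otherwise prepend "_".
    if not _VALID_NAME_START.match(name):
        name = "_" + name
    return name


def _is_xml10_char(cp: int) -> bool:
    # XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF]
    # | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def strip_invalid_xml_chars(text: str) -> str:
    """Drop characters that cannot appear in XML 1.0 content."""
    return "".join(ch for ch in text if _is_xml10_char(ord(ch)))


print(sanitize_element_name("a b"))                   # a_b
print(sanitize_element_name("123"))                   # _123
print(strip_invalid_xml_chars("count\u0000down"))     # countdown
```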

Examples

To add JSON-to-XML conversion to the existing filter chain:

filter.classes=<default_filter_chain>:JSONToXML

where <default_filter_chain> is the default value for filter.classes.

Sometimes the remote server sends an incorrect MIME type, so JSON documents are not identified correctly. Because most JSON-based collections index only JSON, the ForceJSONMime filter is provided to force the JSON MIME type on all crawled files.

To force Funnelback to treat all documents as JSON and convert all entries to XML:

filter.classes=<default_filter_chain>:ForceJSONMime:JSONToXML
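Because ForceJSONMime only helps if it runs before JSONToXML, the ordering in a filter.classes value can be checked mechanically. This is a hypothetical helper, not a Funnelback tool. It assumes that `:` separates filters applied left to right, as the examples above suggest, and it does not try to interpret any other syntax that may appear in the default chain.

```python
from __future__ import annotations


def filter_names(filter_classes: str) -> list[str]:
    """Split a filter.classes value on ':' (assumed left-to-right order)."""
    return [name.strip() for name in filter_classes.split(":") if name.strip()]


def force_precedes_conversion(filter_classes: str) -> bool:
    """True if ForceJSONMime appears somewhere before JSONToXML."""
    names = filter_names(filter_classes)
    if "ForceJSONMime" not in names or "JSONToXML" not in names:
        return False
    return names.index("ForceJSONMime") < names.index("JSONToXML")


print(force_precedes_conversion("<default_filter_chain>:ForceJSONMime:JSONToXML"))  # True
print(force_precedes_conversion("<default_filter_chain>:JSONToXML:ForceJSONMime"))  # False
```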

Downloading JSON on web collections

To allow the web crawler to download JSON documents you may need to add json to the crawler.non_html option.

JSON to XML conversion example

The JSONToXML filter uses the field names from the JSON file when generating the XML for indexing.

For example the JSON file:

{
  "items": [
    {
      "title":"value",
      "subject":"value2",
      "url":"http://mysite/item45745.html"
    },
    {
      "title":"value3",
      "subject":"value4",
      "url":"http://mysite/item12544.html"
    }
  ]
}

is converted to:

<json>
  <items>
    <title>value</title>
    <subject>value2</subject>
    <url>http://mysite/item45745.html</url>
  </items>
  <items>
    <title>value3</title>
    <subject>value4</subject>
    <url>http://mysite/item12544.html</url>
  </items>
</json>

Example: in an unnamed (top-level) array of objects, each object is placed inside its own <array> element.

[
  {
    "title":"value",
    "subject":"value2",
    "url":"http://mysite/item45745.html"
  },
  {
    "title":"value3",
    "subject":"value4",
    "url":"http://mysite/item12544.html"
  }
]

is converted to:

<json>
  <array>
    <title>value</title>
    <subject>value2</subject>
    <url>http://mysite/item45745.html</url>
  </array>
  <array>
    <title>value3</title>
    <subject>value4</subject>
    <url>http://mysite/item12544.html</url>
  </array>
</json>
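The mapping shown in the two examples can be reproduced with a short recursive converter. This is a minimal sketch of the described behaviour, not the filter's source. It makes four assumptions. First, keys are already valid XML names (sanitisation is omitted). Second, each item of an array under key `k` becomes its own `<k>` element. Third, a top-level array uses `array` as the element name. Fourth, non-string scalars are written in their JSON spelling (e.g. `true`, `null`); how the real filter renders these is not documented here. The output is compact rather than indented.

```python
import json
import xml.etree.ElementTree as ET


def _append(parent: ET.Element, tag: str, value) -> None:
    """Add `value` under `parent` using `tag` as the element name."""
    if isinstance(value, list):
        # Each array item becomes its own element named after the array's key.
        for item in value:
            _append(parent, tag, item)
    elif isinstance(value, dict):
        elem = ET.SubElement(parent, tag)
        for key, child in value.items():
            _append(elem, key, child)
    else:
        elem = ET.SubElement(parent, tag)
        # Strings are used as-is; other scalars keep their JSON spelling.
        elem.text = value if isinstance(value, str) else json.dumps(value)


def json_to_xml(text: str) -> str:
    """Convert a JSON object or top-level array into <json>-rooted XML."""
    data = json.loads(text)
    root = ET.Element("json")
    if isinstance(data, dict):
        for key, value in data.items():
            _append(root, key, value)
    elif isinstance(data, list):
        # An unnamed array has no key, so "array" is used as the element name.
        _append(root, "array", data)
    else:
        raise ValueError("expected a JSON object or array")
    return ET.tostring(root, encoding="unicode")


print(json_to_xml('{"items": [{"title": "value"}, {"title": "value3"}]}'))
print(json_to_xml('[{"title": "value"}, {"title": "value3"}]'))
```

The first call yields `<json><items><title>value</title></items><items><title>value3</title></items></json>` and the second yields the same structure with `<array>` in place of `<items>`, matching the shape of the examples above.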

The fields can be mapped to metadata using the normal rules for XML field mapping.

Metadata class configuration:

Metadata class name | Metadata class type | Source fields | Source type
itemTitle           | text                | //title       | XML
itemDescriptors     | text                | //subject     | XML
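To see which values the source fields in the table select, the XPath expressions can be evaluated against the converted XML from the first example. This sketch uses Python's ElementTree, whose limited XPath needs a leading `.` (so `//title` is run as `.//title`). The `extract_metadata` helper is hypothetical. It only shows what each expression matches and does not model how Funnelback stores multiple matches for one metadata class.

```python
import xml.etree.ElementTree as ET

CONVERTED = """<json>
  <items>
    <title>value</title>
    <subject>value2</subject>
    <url>http://mysite/item45745.html</url>
  </items>
  <items>
    <title>value3</title>
    <subject>value4</subject>
    <url>http://mysite/item12544.html</url>
  </items>
</json>"""

# Metadata class name -> source field, taken from the table.
MAPPINGS = {"itemTitle": "//title", "itemDescriptors": "//subject"}


def extract_metadata(xml_text: str, mappings: dict) -> dict:
    """Return the text of every element each XPath source field selects."""
    root = ET.fromstring(xml_text)
    return {
        name: [elem.text for elem in root.findall("." + path)]
        for name, path in mappings.items()
    }


print(extract_metadata(CONVERTED, MAPPINGS))
# {'itemTitle': ['value', 'value3'], 'itemDescriptors': ['value2', 'value4']}
```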

The following additional XML special configuration can optionally be set if one of the fields contains a URL that should be used as the target URL when the result for that item is clicked.

See also

v15.24.0