Data reports
Introduction
The data reports present statistics on data gathered during the update process. They are only available for web collections.
Reports
Broken link reports
These reports show which URLs in the crawled dataset contain broken links. Click the "site" link to drill down into the report. The data can also be exported as CSV.
Note: broken links are only checked on pages that were downloaded by the web crawler, and only links to documents matching the include/exclude patterns are checked. As a result, external links are not checked, and neither are internal links to linked resources (such as images, JS or CSS files) or to file types not included in the crawl.
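The rule in the note above can be sketched in a few lines of Python. This is only an illustration of the logic, not the crawler's actual implementation: the function names are hypothetical, the patterns are made-up examples, and matching is assumed to be a plain substring test.

```python
def matches_any(url, patterns):
    """Return True if any pattern occurs as a substring of the URL (assumed matching rule)."""
    return any(pattern in url for pattern in patterns)


def is_link_checked(page_downloaded, link_url, include_patterns, exclude_patterns):
    """Decide whether a link would be covered by the broken link report.

    Hypothetical helper: a link is checked only if the page it appears on
    was downloaded, and the link target matches an include pattern and
    no exclude pattern.
    """
    if not page_downloaded:
        return False
    if not matches_any(link_url, include_patterns):
        return False
    if matches_any(link_url, exclude_patterns):
        return False
    return True


# Example patterns, invented for illustration only.
include = ["example.com"]
exclude = [".css", ".js", ".png"]

internal_page = is_link_checked(True, "https://example.com/about.html", include, exclude)
external_page = is_link_checked(True, "https://other.org/page.html", include, exclude)
stylesheet = is_link_checked(True, "https://example.com/site.css", include, exclude)
```

Under these assumptions only the internal page link is checked; the external link fails the include test and the stylesheet is excluded.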
Generating data reports
Data reports are enabled by default for web collections. They are based on statistics files (named *.stat.md) that the web crawler generates in the log directory. A script called make_report.pl processes these files and outputs a set of static HTML reports to the $SEARCH_HOME/admin/data_report/<collection>/ directory.
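When troubleshooting a missing report, it can help to confirm that the statistics files exist and to work out where the HTML output should appear. The following Python sketch does both. The helper names are hypothetical, and the location of the log directory is an assumption that must be supplied by the caller, since it depends on the installation.

```python
from pathlib import Path


def find_stat_files(log_dir):
    """Return the *.stat.md files found directly in log_dir, sorted by name."""
    return sorted(Path(log_dir).glob("*.stat.md"))


def report_dir(search_home, collection):
    """Build the report output path: <search_home>/admin/data_report/<collection>."""
    return Path(search_home) / "admin" / "data_report" / collection
```

For example, `report_dir(os.environ["SEARCH_HOME"], "my-collection")` gives the directory to inspect for the generated HTML, where `my-collection` is a placeholder collection name. An empty list from `find_stat_files` suggests the crawler did not write any statistics for that update.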