URLFilter which discards URLs based on robots.txt directives. It is
meant to be used on small, limited crawls where the number of hosts is
finite. Using it on a larger or open crawl would degrade performance, as
the filter would try to retrieve the robots.txt file of every host it
encounters. Setting fromCacheOnly to true preserves performance at the
cost of coverage: only robots.txt files already present in the cache are
consulted, and URLs from hosts not yet cached are not filtered.
The filter is configured like so:
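A minimal sketch of such a configuration, assuming the filter is declared in a JSON URL-filter definition file alongside other filters; the class path and the `fromCacheOnly` parameter name are taken from the description above, while the surrounding structure is an assumption:

```json
{
  "class": "com.digitalpebble.stormcrawler.filtering.robots.RobotsFilter",
  "name": "RobotsFilter",
  "params": {
    "fromCacheOnly": true
  }
}
```

With `fromCacheOnly` set to true, the filter never triggers a fetch of a robots.txt file itself; set it to false (or omit it) only on small crawls where retrieving robots.txt for each new host is acceptable.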