Extracts URLs from feeds.
A multithreaded, queue-based fetcher adapted from Apache Nutch.
Parser for HTML documents only; uses ICU4J to detect the charset encoding.
A simple fetcher with no internal queues.
Extracts URLs from a sitemap file.
Provides common functionality for Bolts that emit tuples to the status stream.
Generates a partition key for a given URL based on the hostname, domain or IP address.
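As an illustration of the partition-key idea described above, the following is a minimal sketch and not the project's actual implementation. The class name PartitionKeySketch, the Mode enum and the naiveDomain helper are hypothetical names introduced for this example. In particular, the domain step here simply keeps the last two labels of the hostname, which is an assumption made for brevity; a real crawler would normally consult a public suffix list so that hosts such as www.example.co.uk are handled correctly.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.Locale;

/** Hypothetical sketch: derives a partition key from a URL. */
final class PartitionKeySketch {

    /** Hypothetical modes mirroring "hostname, domain or IP address". */
    enum Mode { HOST, DOMAIN, IP }

    static String partitionKey(String url, Mode mode) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            host = null; // not a syntactically valid URI
        }
        if (host == null) {
            // Assumption: fall back to the raw URL when no host can be extracted.
            return url;
        }
        host = host.toLowerCase(Locale.ROOT);

        switch (mode) {
            case DOMAIN:
                return naiveDomain(host);
            case IP:
                try {
                    // Performs a DNS lookup unless the host is already an IP literal.
                    return InetAddress.getByName(host).getHostAddress();
                } catch (UnknownHostException e) {
                    return host; // assumption: degrade to the hostname
                }
            case HOST:
            default:
                return host;
        }
    }

    /** Naive domain guess: keeps the last two labels (no public suffix list). */
    static String naiveDomain(String host) {
        String[] labels = host.split("\\.");
        if (labels.length <= 2) {
            return host;
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }
}
```

URLs that share a key are meant to end up in the same partition, so a per-host or per-domain politeness delay can be enforced in one place. Which of the three modes is appropriate depends on how the target sites are hosted.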
Copyright © 2021 DigitalPebble Ltd. All rights reserved.