Nutch
Apache Nutch™ is a highly extensible, highly scalable, mature, production-ready web crawler. Nutch enables fine-grained configuration and relies on [Apache Hadoop™](https://hadoop.apache.org) data structures, which are well suited to batch processing.