Information retrieval

Web crawler

A web crawler is a program that automatically discovers and downloads web pages, typically by following the hyperlinks found in pages it has already fetched.
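The find-and-download loop described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular crawler is implemented: the names `crawl`, `fetch_http`, `LinkExtractor`, and the `max_pages` limit are assumptions made for the example, and the breadth-first visiting order is just one common choice.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_http(url):
    """Download a page over HTTP and return its body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed_url, fetch=fetch_http, max_pages=10):
    """Breadth-first crawl starting at seed_url; returns {url: html}.

    `fetch` is injectable (a hypothetical hook for this sketch) so the
    loop can be exercised without network access.
    """
    frontier = deque([seed_url])  # URLs waiting to be downloaded
    seen = {seed_url}             # URLs already queued, to avoid repeats
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return pages
```

Calling `crawl("https://example.com/", max_pages=5)` would download up to five pages reachable from that seed. The sketch leaves out things a production crawler needs, such as honoring robots.txt, rate limiting per host, and restricting the crawl to chosen domains.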

Open source web crawlers

Java crawlers

  • Heritrix - the Internet Archive’s web crawler