Hostharvest

Hostharvest is a simple-minded crawler that wanders the web, harvesting hostnames as it goes. This in itself is not very interesting, but the set of harvested hostnames can serve as a basis for other programs that generate statistics about hostnames.
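The harvesting step can be sketched roughly as follows. This is a minimal illustration, not Hostharvest's own code: it assumes a page has already been fetched as an HTML string, and the names `LinkHostParser` and `harvest_hostnames` are hypothetical. It collects every `<a href>` target, resolves relative links against the page URL, and keeps the hostnames of HTTP(S) links only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkHostParser(HTMLParser):
    """Hypothetical helper: collects absolute link targets from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # value may be None for a bare attribute such as <a href>
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def harvest_hostnames(html, base_url):
    """Return the set of hostnames linked from an HTML page (sketch, not the real crawler)."""
    parser = LinkHostParser(base_url)
    parser.feed(html)
    hosts = set()
    for link in parser.links:
        parts = urlsplit(link)
        # Skip mailto:, ftp:, javascript: and similar; .hostname is already lowercased
        if parts.scheme in ("http", "https") and parts.hostname:
            hosts.add(parts.hostname)
    return hosts
```

For example, a page at `http://www.example.com/` linking to `http://Example.DK/x`, `/rel` and `mailto:me@x.dk` yields the hostnames `example.dk` and `www.example.com`.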

If you have been visited by Hostharvest and have questions concerning the visit, please read the frequently asked questions.

Please note that, since the number of hostnames assigned globally is so huge, the crawler has been configured to prefer following URLs in the .dk top-level domain, so do not expect many non-.dk hostnames to be available. However, this is simply a configuration issue, so if you are interested in another top-level domain, you can always run your own instance of the crawler locally.
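One way such a preference could work is a crawl frontier that always hands out URLs in the preferred top-level domain before any others. The sketch below assumes a strict two-level priority; how Hostharvest actually expresses this configuration is not described here, and the class `PreferringFrontier` and its `preferred_tld` parameter are hypothetical.

```python
import heapq
import itertools
from urllib.parse import urlsplit


class PreferringFrontier:
    """Hypothetical crawl frontier that pops URLs in a preferred TLD before all others."""

    def __init__(self, preferred_tld=".dk"):
        self.preferred_tld = preferred_tld.lower()
        self._heap = []
        self._counter = itertools.count()  # FIFO tie-breaker within one priority level
        self._seen = set()                 # never enqueue the same URL twice

    def _priority(self, url):
        host = urlsplit(url).hostname or ""
        # 0 = preferred TLD, 1 = everything else; lower values are popped first
        return 0 if host.endswith(self.preferred_tld) else 1

    def push(self, url):
        if url not in self._seen:
            self._seen.add(url)
            heapq.heappush(self._heap, (self._priority(url), next(self._counter), url))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

Changing `preferred_tld` to, say, `".se"` is all it takes to steer the same frontier elsewhere. With a strict preference like this, non-preferred URLs are only visited once the preferred queue runs dry, which matches the expectation above that few non-.dk hostnames will show up; a weighted scheme would be a gentler alternative.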

I have not yet released an official version of the crawler, but I will gladly mail you a tarball containing it if you want a peek. The crawler is licensed under the GNU GPL.

Feel free to contact me at .




Updated: Mon, 6 Jun 2005
