An Introduction to Compassionate Screen Scraping

August 11, 2008

One of the most common quickie projects on the web is to screenscrape a website and play around with its data. These projects are a lot of fun and can allow for inventive mashups, but the screenscraping scripts often cause unnecessary load on the site's servers due to inconsiderate technique. This is an introduction to the art of compassionate screenscraping.

Filed under python, screen-scraping
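
As a rough illustration of what "compassionate" can mean in practice, here is a minimal sketch that never requests the same URL twice and waits between requests that do reach the server. It is not taken from the post itself: `PoliteFetcher`, `fetch_with_urllib`, the one-second default delay, and the User-Agent string are all hypothetical names and values chosen for this example, and only the standard library is used.

```python
import time
from typing import Callable, Dict, Optional
from urllib.request import Request, urlopen


def fetch_with_urllib(url: str) -> str:
    """Hypothetical default fetcher: one GET with an identifying User-Agent."""
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


class PoliteFetcher:
    """Wraps a fetch function with an in-memory cache and a minimum delay.

    The cache means a repeated URL costs the server nothing; the delay
    spaces out the requests that do go over the network.
    """

    def __init__(
        self,
        fetch: Callable[[str], str] = fetch_with_urllib,
        min_delay: float = 1.0,  # assumed value, tune per site
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, str] = {}
        self._last_request: Optional[float] = None

    def get(self, url: str) -> str:
        # Serve repeats from the cache without touching the network.
        if url in self._cache:
            return self._cache[url]
        # Wait out whatever remains of the minimum delay since the last request.
        if self._last_request is not None:
            remaining = self._min_delay - (self._clock() - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
        body = self._fetch(url)
        self._last_request = self._clock()
        self._cache[url] = body
        return body
```

Calling `PoliteFetcher().get(url)` in a loop over a list of pages would then hit each distinct page at most once, roughly one request per second. The fetch, clock, and sleep functions are injectable so the behaviour can be checked without making real requests.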

Scalable Scraping in Clojure

November 24, 2009

A fairly in-depth tutorial that looks at using Clojure to extract data from web pages, using agents to process data, and a few other knickknacks.

Filed under screen-scraping, clojure, agents, concurrency

Python Content Scraper for OneManga.com

August 8, 2008

I spent a while today writing a fairly kind content scraper for OneManga.com, which shows how to use Python's httplib2 and BeautifulSoup to scrape data with a flexible API and minimal HTTP connections.

Filed under python, screen-scraping