Scraping Techniques in Python @ PyCon3 Italy

On May 9th, 2009 at 12:20 I presented some techniques for scraping data from the web to a classroom full of people interested in the subject: it was a really great experience!

 

Slides - Scraping Techniques in Python (PDF)

 

Abstract

Scraping is a technique for extracting information from websites; search engines, for example, use it to index the contents of the Internet.

Python is well suited to this kind of work: the talk illustrated methods for parsing complex web pages and showed how to log in to websites automatically, even when the structure of their authentication forms changes.
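As a rough illustration of the automatic-login idea, here is a minimal sketch that uses only the standard library. It is not the code from the talk: the sample page, the field names and the helper `build_login_request` are all hypothetical. Instead of hard-coding field names, it finds the form that contains a password input and fills the fields by type, so it keeps working when names or hidden tokens change.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class LoginFormParser(HTMLParser):
    """Collect the action URL and <input> attributes of every form on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []       # every completed form, as a dict
        self._current = None  # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action") or "", "inputs": []}
        elif tag == "input" and self._current is not None:
            self._current["inputs"].append(attrs)

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def build_login_request(html, page_url, username, password):
    """Return (post_url, payload) for the first form with a password field.

    Hypothetical helper: fields are recognised by type, not by name.
    """
    parser = LoginFormParser()
    parser.feed(html)
    for form in parser.forms:
        kinds = [(f.get("type") or "text").lower() for f in form["inputs"]]
        if "password" not in kinds:
            continue  # e.g. a search box, not a login form
        payload = {}
        user_filled = False
        for field, kind in zip(form["inputs"], kinds):
            name = field.get("name")
            if not name:
                continue  # unnamed inputs are never submitted
            if kind == "password":
                payload[name] = password
            elif kind in ("text", "email") and not user_filled:
                payload[name] = username
                user_filled = True
            elif kind == "hidden":
                # Keep tokens such as CSRF values exactly as the server sent them.
                payload[name] = field.get("value") or ""
        return urljoin(page_url, form["action"]), payload
    raise ValueError("no login form found")


# Made-up page: field names contain suffixes that could change between visits.
SAMPLE_PAGE = """
<form action="/search"><input type="text" name="q"></form>
<form action="/session/new" method="post">
  <input type="hidden" name="csrf_7f3a" value="abc123">
  <input type="text" name="usr_field_42">
  <input type="password" name="pwd_field_42">
  <input type="submit" value="Log in">
</form>
"""

url, payload = build_login_request(
    SAMPLE_PAGE, "https://example.com/login", "alice", "s3cret"
)
body = urlencode(payload)  # form-encoded body, ready to be sent in a POST request
```

The resulting `url` and `body` could then be sent with `urllib.request` or any HTTP client that keeps cookies between requests. Real login pages may also rely on JavaScript or other mechanisms that this sketch does not cover.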

Problems arising from the use of the HTTPS protocol and the correct use of regular expressions were also presented.
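The abstract does not say which regular-expression pitfalls were covered. One common example when scraping, shown here purely as an assumption about the kind of issue meant, is greedy matching: `.*` swallows as much text as it can, so a pattern intended to capture one attribute can span several tags. The HTML snippet below is made up.

```python
import re

# Hypothetical fragment of a page with two links on the same line.
SNIPPET = '<a href="/talks/1">First</a> <a href="/talks/2">Second</a>'

# Greedy: ".*" runs to the LAST double quote on the line, giving one oversized match.
greedy = re.findall(r'href="(.*)"', SNIPPET)

# Non-greedy: ".*?" stops at the first closing quote, giving one match per link.
lazy = re.findall(r'href="(.*?)"', SNIPPET)

# Negated character class: states the intent directly (anything except a quote).
precise = re.findall(r'href="([^"]*)"', SNIPPET)

print(greedy)
print(lazy)
print(precise)
```

For anything beyond simple, well-delimited fragments, an HTML parser is usually a more robust choice than a regular expression.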

Using Python you can automate the browsing of the websites you visit every day!

More information is available on the talk page at PyCon3 Italy (webarchive).