Joel Reymont wrote:
> Are there any screen-scraping packages for OCaml?
> 
> I'm looking for something that would let me analyze the contents of a 
> web page and extract, for example, all the image tags.

I don't think of this as screen scraping.  Spidering might be a better word.

I've done a good bit of this in OCaml.  I use the curl package for 
downloading web pages and the netstring package for parsing them.

I'm attaching a couple of files that I use for this sort of thing.  The
file htmltreeutils.ml has a bunch of functions for working with the
parse tree that nethtml produces.
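
The attached file isn't reproduced inline. To give a rough idea of what a helper like list_tags might do, here is a minimal sketch of a recursive walk over a nethtml-style tree. So that it compiles on its own, it declares a local document type with the same shape as Nethtml.document; with netstring installed you would use that type directly. The name find_elements is hypothetical and is not taken from htmltreeutils.ml.

```ocaml
(* Same shape as Nethtml.document; declared locally so this sketch
   compiles without netstring installed. *)
type document =
  | Element of (string * (string * string) list * document list)
  | Data of string

(* Hypothetical helper: collect every element whose tag name matches
   [name] (compared case-insensitively), in document order. The
   children of matching elements are searched too. *)
let find_elements name (docs : document list) : document list =
  let name = String.lowercase_ascii name in
  let rec walk acc = function
    | Data _ -> acc
    | Element (tag, _, children) as el ->
        let acc =
          if String.lowercase_ascii tag = name then el :: acc else acc
        in
        List.fold_left walk acc children
  in
  (* Matches were prepended during the pre-order walk, so reverse
     the list to restore document order. *)
  List.rev (List.fold_left walk [] docs)
```

For example, find_elements "img" over a parsed page returns every image element, including ones nested deep inside tables or divs.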

So your program would look something like this (untested):

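open Htmltreeutils

let () =
     (* Collect the response body into a buffer. *)
     let result = Buffer.create 2000 in
     let connection = Curl.init () in
     Curl.set_httpget connection true;
     Curl.set_url connection "http://www.yahoo.com/randompage.html";
     (* Append each chunk of the body.  In current ocurl the callback
        must return the number of bytes it handled. *)
     Curl.set_writefunction connection
       (fun s -> Buffer.add_string result s; String.length s);
     (* Ignore the response headers. *)
     Curl.set_headerfunction connection (fun s -> String.length s);
     Curl.perform connection;
     Curl.cleanup connection;

     (* Parse the page with the attached helpers; they take the page
        text, so pass the buffer's contents, not the buffer itself. *)
     let dom = get_parsed_html_from_string (Buffer.contents result) in
     let img_tags = list_tags "img" dom in
     (* ... do something with img_tags here, like pulling out their
        src attributes *)
     ignore img_tags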
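
For that last step, here's a minimal sketch of pulling out the src attributes. It assumes list_tags hands back elements shaped like Nethtml.document, which is a guess since the helper file isn't shown here. The type is redeclared so the snippet stands alone, and img_sources is a hypothetical name.

```ocaml
(* Same shape as Nethtml.document; redeclared so this compiles alone. *)
type document =
  | Element of (string * (string * string) list * document list)
  | Data of string

(* Hypothetical helper: return the src attribute of each element that
   has one, skipping text nodes and elements without a src.
   Attribute names are compared case-insensitively. *)
let img_sources (tags : document list) : string list =
  let src_of = function
    | Data _ -> None
    | Element (_, attrs, _) ->
        (match
           List.find_opt
             (fun (k, _) -> String.lowercase_ascii k = "src") attrs
         with
         | Some (_, v) -> Some v
         | None -> None)
  in
  List.filter_map src_of tags
```

Keep in mind that src values are often relative, so you'd still need to resolve them against the page URL before fetching the images.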


Here are the two helper files: