From: Joel Reymont
To: caml-list
Subject: Web page scraping packages
Date: Tue, 1 Aug 2006 01:06:52 +0100

Folks,

Are there any screen-scraping packages for OCaml?

I'm looking for something that would let me analyze the contents of a web page and extract, for example, all the image tags.

I'm using Ruby for this at work, and something like hpricot [1] is very neat but also somewhat slow.

Thanks, Joel

[1] http://code.whytheluckystiff.net/hpricot/

--
http://wagerlabs.com/