From: Richard Jones
To: Joel Reymont
Cc: caml-list
Date: Tue, 1 Aug 2006 10:10:32 +0100
Subject: Re: [Caml-list] Web page scraping packages

On Tue, Aug 01, 2006 at 01:06:52AM +0100, Joel Reymont wrote:
> Are there any screen-scraping packages for OCaml?
>
> I'm looking for something that would let me analyze the contents of a
> web page and extract, for example, all the image tags.

We did some web scraping using WWW::Mechanize + perl4caml. As a result,
perl4caml contains pretty complete bindings for the WWW::Mechanize
library.
http://merjis.com/developers/perl4caml
http://resources.merjis.com/developers/perl4caml/Pl_WWW_Mechanize.www_mechanize.html

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com