From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id AAA05902; Wed, 23 Oct 2002 00:31:39 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id AAA06897 for ; Wed, 23 Oct 2002 00:31:38 +0200 (MET DST) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.171]) by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id g9MMVb504448; Wed, 23 Oct 2002 00:31:38 +0200 (MET DST) Received: from [212.227.126.160] (helo=mrelayng0.kundenserver.de) by moutng.kundenserver.de with esmtp (Exim 3.35 #1) id 1847Yn-0003mT-00; Wed, 23 Oct 2002 00:31:37 +0200 Received: from [80.129.104.67] (helo=gate.gerd-stolpmann.de) by mrelayng0.kundenserver.de with asmtp (Exim 3.35 #1) id 1847Yn-0005yz-00; Wed, 23 Oct 2002 00:31:37 +0200 Received: from ice.gerd-stolpmann.de (ice.gerd-stolpmann.de [192.168.0.13]) by gate.gerd-stolpmann.de (Postfix) with ESMTP id DE9BDCCDD; Wed, 23 Oct 2002 00:31:35 +0200 (CEST) Received: from ice (localhost [127.0.0.1]) by ice.gerd-stolpmann.de (Postfix) with ESMTP id E6C56B0B7; Wed, 23 Oct 2002 00:31:34 +0200 (CEST) Date: Wed, 23 Oct 2002 00:31:34 +0200 From: Gerd Stolpmann To: Pierre Weis Cc: vincent@leleu.info, caml-list@inria.fr Subject: Re: [Caml-list] Query: email parser in ocamllex/ocamlyacc Message-ID: <20021022223134.GA6028@ice.gerd-stolpmann.de> References: <200210222116.XAA05792@pauillac.inria.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <200210222116.XAA05792@pauillac.inria.fr>; from pierre.weis@inria.fr on Die, Okt 22, 2002 at 23:16:19 +0200 X-Mailer: Balsa 1.4.0 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk Am 2002.10.22 23:16 schrieb(en) Pierre Weis: > > Version Francaise a la fin > > ------------------------------ > > > > Hello, > > > > I'm writting an ocamllex/ocamlyacc based application that extracts a > list> of emails embedded in a text/html file. > > Would anyone of you know of any available implementation I could get > > inspiration from (and save some time!). > > Really precise parsing of email messages requires implementing the > RFC822 (more precisely RFC2822 nowadays), which is not a trivial > task. I started to do it but gave up due to the absence of a scanf > facility. I launched a thread to implement scanf, and 5 years after I > understood how to do it in the Caml system! > > Now that we have scanf, I could go on to implement RFC(2)822. > > But don't hold your breath: if you don't need a full parser for mail > messages the simpler way is to write a (false but trivial) > approximation with a lexer... > > There may be such a program into Xaviers's spamoracle ? Well, O'caml programming is so much fun that everybody wants to reinvent the wheel. I really understand that, I'm also tempted every day. My wheel came into the world in the spring of 2000, and has grown since that a lot. It is now called "ocamlnet" after the fusion with Patrick Doane's wheel, and includes not only a parser for RFC(2)822 messages, but supports also the MIME RFCs (2045-47), RFC 2231, parsing of dates, the ability to parse from pipelines chunk by chunk, and last but not least even printers for these (partly brain-dead) formats. You also find an HTML parser, and a lot of other useful stuff. It is now more a mobile construction set than a wheel. By the way: if anybody has something to contribute, any addition that is useful, works, and will be maintained is still accepted. You find it here: http://sourceforge.net/projects/ocamlnet Gerd ------------------------------------------------------------ Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de ------------------------------------------------------------ ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners