caml-list - the Caml user's mailing list
From: Gerd Stolpmann <info@gerd-stolpmann.de>
To: Pierre Weis <pierre.weis@inria.fr>
Cc: vincent@leleu.info, caml-list@inria.fr
Subject: Re: [Caml-list] Query: email parser in ocamllex/ocamlyacc
Date: Wed, 23 Oct 2002 00:31:34 +0200
Message-ID: <20021022223134.GA6028@ice.gerd-stolpmann.de>
In-Reply-To: <200210222116.XAA05792@pauillac.inria.fr>; from pierre.weis@inria.fr on Tue, Oct 22, 2002 at 23:16:19 +0200


On 2002.10.22 23:16, Pierre Weis wrote:
> > French version at the end
> > ------------------------------
> > 
> > Hello,
> > 
> > I'm writing an ocamllex/ocamlyacc-based application that extracts a
> > <string list> of emails embedded in a text/html file.
> > Does anyone know of an available implementation I could draw
> > inspiration from (and save some time)?
> 
> Really precise parsing of email messages requires implementing
> RFC 822 (more precisely, RFC 2822 nowadays), which is not a trivial
> task. I started to do it but gave up due to the absence of a scanf
> facility. I launched a thread to implement scanf, and five years later
> I understood how to do it in the Caml system!
> 
> Now that we have scanf, I could go on to implement RFC(2)822.
>
> But don't hold your breath: if you don't need a full parser for mail
> messages, the simplest way is to write an (incorrect but trivial)
> approximation with a lexer...
> 
> There may be such a program in Xavier's spamoracle?

Well, O'Caml programming is so much fun that everybody wants to
reinvent the wheel. I really understand that; I'm tempted every
day, too.

My wheel came into the world in the spring of 2000, and has grown
a lot since then. It is now called "ocamlnet" after the merger
with Patrick Doane's wheel, and includes not only a parser for RFC(2)822
messages, but also support for the MIME RFCs (2045-47), RFC 2231,
parsing of dates, the ability to parse from pipelines chunk by
chunk, and last but not least even printers for these (partly
brain-dead) formats. You will also find an HTML parser, and a lot of
other useful stuff. It is now more of a mobile construction set than
a wheel.

By the way: if anybody has something to contribute, any addition
that is useful, works, and will be maintained is still accepted.

You can find it here:

http://sourceforge.net/projects/ocamlnet
 
Gerd
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
------------------------------------------------------------