caml-list - the Caml user's mailing list
* [Caml-list] Query: email parser in ocamllex/ocamlyacc
@ 2002-10-22 14:27 Vincent Leleu
  2002-10-22 21:16 ` Pierre Weis
  0 siblings, 1 reply; 6+ messages in thread
From: Vincent Leleu @ 2002-10-22 14:27 UTC (permalink / raw)
  To: caml-list

French version at the end
------------------------------

Hello,

I'm writing an ocamllex/ocamlyacc-based application that extracts a <string
list> of emails embedded in a text/html file.
Does anyone know of an available implementation I could draw
inspiration from (and save some time)?

Thanks a lot,

Vincent Leleu

-------------------------------

Hello,

I am writing an application based on ocamllex/ocamlyacc.
The application is meant to extract the emails (into a <string
list> structure) contained in a text or HTML document.

Does anyone know whether an implementation of this already exists, so that I
can draw inspiration from it (and save some time)?

Thanks in advance,

Vincent Leleu
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners



* Re: [Caml-list] Query: email parser in ocamllex/ocamlyacc
  2002-10-22 14:27 [Caml-list] Query: email parser in ocamllex/ocamlyacc Vincent Leleu
@ 2002-10-22 21:16 ` Pierre Weis
  2002-10-22 22:31   ` Gerd Stolpmann
  0 siblings, 1 reply; 6+ messages in thread
From: Pierre Weis @ 2002-10-22 21:16 UTC (permalink / raw)
  To: vincent; +Cc: caml-list

> French version at the end
> ------------------------------
> 
> Hello,
> 
> I'm writing an ocamllex/ocamlyacc based application that extracts a <string
> list> of emails embedded in a text/html file.
> Would anyone of you know of any available implementation I could get
> inspiration from (and save some time!).

Really precise parsing of email messages requires implementing
RFC 822 (more precisely, RFC 2822 nowadays), which is not a trivial
task. I started to do it but gave up because there was no scanf
facility. I then spun off a side task to implement scanf, and five years
later I finally understood how to do it in the Caml system!

Now that we have scanf, I could go on to implement RFC(2)822.
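
To illustrate how scanf can help with this kind of input, here is a minimal sketch, not code from this thread, that splits a single header line of the form "Name: value" with Scanf.sscanf from the OCaml standard library. The function name parse_header and the format string are assumptions chosen for illustration; folded headers, comments, and quoting, all of which RFC 2822 permits, are ignored.

```ocaml
(* Split one header line of the form Name: value.
   Illustrative only: folded (multi-line) headers, comments and quoting
   allowed by RFC 2822 are deliberately not handled. *)
let parse_header (line : string) : (string * string) option =
  try
    (* %[^:]   reads everything up to the first colon (the field name);
       the space after the colon skips any whitespace;
       %[^\n]  reads the rest of the line (the field body). *)
    Scanf.sscanf line "%[^:]: %[^\n]" (fun name value -> Some (name, value))
  with Scanf.Scan_failure _ | End_of_file -> None
```

A line without a colon yields None instead of raising an exception, which keeps the caller's loop over header lines simple.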

But don't hold your breath: if you don't need a full parser for mail
messages, the simplest way is to write a (wrong but trivial)
approximation with a lexer...
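
A minimal sketch of such an approximation, assuming that "emails" means email addresses. It is written as a hand-rolled scanner in plain OCaml rather than as an ocamllex rule so that it runs without a generation step; the same character classes could be expressed as ocamllex regular expressions. The names is_local, is_domain, span, trim_end and extract_emails, and the character classes themselves, are assumptions chosen for illustration and are far looser than RFC 2822.

```ocaml
(* Crude approximation, NOT RFC 2822: an address is a run of local-part
   characters, one '@', then a run of domain characters containing a dot. *)

let is_local c =
  match c with
  | 'a'..'z' | 'A'..'Z' | '0'..'9' | '.' | '_' | '-' | '+' -> true
  | _ -> false

let is_domain c =
  match c with
  | 'a'..'z' | 'A'..'Z' | '0'..'9' | '.' | '-' -> true
  | _ -> false

(* [span p s i] is the first index j >= i where s.[j] fails p,
   or the length of s if every remaining character satisfies p. *)
let rec span p s i =
  if i < String.length s && p s.[i] then span p s (i + 1) else i

(* Drop trailing '.' and '-' picked up from sentence punctuation,
   never moving below [start]. *)
let rec trim_end s stop start =
  if stop > start && (s.[stop - 1] = '.' || s.[stop - 1] = '-')
  then trim_end s (stop - 1) start
  else stop

let extract_emails (s : string) : string list =
  let n = String.length s in
  let rec scan i acc =
    if i >= n then List.rev acc
    else
      let at = span is_local s i in
      if at = i then scan (i + 1) acc          (* no local part starts here *)
      else if at < n && s.[at] = '@' then begin
        let stop = trim_end s (span is_domain s (at + 1)) (at + 1) in
        let domain = String.sub s (at + 1) (stop - at - 1) in
        if String.contains domain '.'
        then scan stop (String.sub s i (stop - i) :: acc)
        else scan (at + 1) acc                 (* no dot: not an address *)
      end
      else scan at acc                         (* a word without '@' *)
  in
  scan 0 []
```

Because the scanner only looks at character classes, it works the same on plain text and on HTML source, for example on a mailto: link, but it will also accept strings that are not valid addresses and miss valid ones that use quoting.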

There may be such a program in Xavier's spamoracle.

Best regards,

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/

> -------------------------------
> 
> Hello,
> 
> I am writing an application based on ocamllex/ocamlyacc.
> The application is meant to extract the emails (into a <string
> list> structure) contained in a text or HTML document.
> 
> Does anyone know whether an implementation of this already exists, so that I
> can draw inspiration from it (and save some time)?

Precise parsing of email messages requires implementing
RFC 822 (more precisely, RFC 2822 now), which is not trivial.
I tried once but stopped because there was no scanf function.
I then started a subtask: implementing scanf, and 5 years later
I finally understood how to do it in Caml!

Now that we have scanf, I should return from the interrupt and
get back to implementing RFC (2)822.

But do not expect a quick release: if you do not need
a very precise parser, the simplest approach is to write an
approximation (wrong but trivial) with a lexer...

There is probably such a program in Xavier's spamoracle filter...

Regards,

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/



* Re: [Caml-list] Query: email parser in ocamllex/ocamlyacc
  2002-10-22 21:16 ` Pierre Weis
@ 2002-10-22 22:31   ` Gerd Stolpmann
  2002-10-22 22:43     ` Stefano Zacchiroli
  0 siblings, 1 reply; 6+ messages in thread
From: Gerd Stolpmann @ 2002-10-22 22:31 UTC (permalink / raw)
  To: Pierre Weis; +Cc: vincent, caml-list


On 2002.10.22 23:16, Pierre Weis wrote:
> > French version at the end
> > ------------------------------
> > 
> > Hello,
> > 
> > I'm writing an ocamllex/ocamlyacc based application that extracts a <string
> > list> of emails embedded in a text/html file.
> > Would anyone of you know of any available implementation I could get
> > inspiration from (and save some time!).
> 
> Really precise parsing of email messages requires implementing the
> RFC822 (more precisely RFC2822 nowadays), which is not a trivial
> task. I started to do it but gave up due to the absence of a scanf
> facility. I launched a thread to implement scanf, and 5 years after I
> understood how to do it in the Caml system!
> 
> Now that we have scanf, I could go on to implement RFC(2)822.
>
> But don't hold your breath: if you don't need a full parser for mail
> messages the simpler way is to write a (false but trivial)
> approximation with a lexer...
> 
> There may be such a program into Xaviers's spamoracle ?

Well, OCaml programming is so much fun that everybody wants to
reinvent the wheel. I really understand that; I am tempted
every day myself.

My wheel came into the world in the spring of 2000 and has grown
a lot since then. It is now called "ocamlnet" after the merger
with Patrick Doane's wheel, and includes not only a parser for RFC (2)822
messages, but also support for the MIME RFCs (2045-47) and RFC 2231,
parsing of dates, the ability to parse from pipelines chunk by
chunk, and last but not least even printers for these (partly
brain-dead) formats. You will also find an HTML parser and a lot of
other useful stuff. It is now more of a mobile construction set than
a wheel.

By the way: if anybody has something to contribute, any addition
that is useful, works, and will be maintained is still accepted.

You can find it here:

http://sourceforge.net/projects/ocamlnet
 
Gerd
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
------------------------------------------------------------

* Re: [Caml-list] Query: email parser in ocamllex/ocamlyacc
  2002-10-22 22:31   ` Gerd Stolpmann
@ 2002-10-22 22:43     ` Stefano Zacchiroli
  2002-10-22 23:29       ` Gerd Stolpmann
  0 siblings, 1 reply; 6+ messages in thread
From: Stefano Zacchiroli @ 2002-10-22 22:43 UTC (permalink / raw)
  To: caml-list

On Wed, Oct 23, 2002 at 12:31:34AM +0200, Gerd Stolpmann wrote:
> My wheel came into the world in the spring of 2000, and has grown
> since that a lot. It is now called "ocamlnet" after the fusion

BTW, ocamlnet IMO is lacking documentation.
All the .mli files are really well commented, but there is no out-of-band
documentation like the really good pxp manual, nor examples for the
various ocamlnet modules.

Are you planning to write something like that?

TIA,
Cheers.

-- 
Stefano Zacchiroli - undergraduate student of CS @ Univ. Bologna, Italy
zack@cs.unibo.it | ICQ# 33538863 | http://www.cs.unibo.it/~zacchiro
"I know you believe you understood what you think I said, but I am not
sure you realize that what you heard is not what I meant!" -- G.Romney

* Re: [Caml-list] Query: email parser in ocamllex/ocamlyacc
  2002-10-22 22:43     ` Stefano Zacchiroli
@ 2002-10-22 23:29       ` Gerd Stolpmann
  2002-10-23  7:32         ` Stefano Zacchiroli
  0 siblings, 1 reply; 6+ messages in thread
From: Gerd Stolpmann @ 2002-10-22 23:29 UTC (permalink / raw)
  To: Stefano Zacchiroli; +Cc: caml-list


On 2002.10.23 00:43, Stefano Zacchiroli wrote:
> On Wed, Oct 23, 2002 at 12:31:34AM +0200, Gerd Stolpmann wrote:
> > My wheel came into the world in the spring of 2000, and has grown
> > since that a lot. It is now called "ocamlnet" after the fusion
> 
> BTW, ocamlnet IMO is lacking documentation.
> All .mli are really well commented but there is no out-of-band
> documentation like the really good pxp manual or examples for the
> various ocamlnet modules.
> 
> Are you planning to write something like that?

Yes, a manual is really needed. I currently do not have enough time
to write it. Maybe I will find time for certain specific topics... I could
imagine that an introduction to netchannels with some references to
examples would already do most of the job.

Gerd

------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
------------------------------------------------------------

* Re: [Caml-list] Query: email parser in ocamllex/ocamlyacc
  2002-10-22 23:29       ` Gerd Stolpmann
@ 2002-10-23  7:32         ` Stefano Zacchiroli
  0 siblings, 0 replies; 6+ messages in thread
From: Stefano Zacchiroli @ 2002-10-23  7:32 UTC (permalink / raw)
  To: caml-list

On Wed, Oct 23, 2002 at 01:29:02AM +0200, Gerd Stolpmann wrote:
> Yes, a manual is really needed. I have currently not enough time
> to do it. Maybe I find time for certain special themes... I could
> imagine an introduction to netchannels with some references to
> examples would already do most of the job.

Yes, this would surely be good, but I'm also thinking about an
introduction to the CGI module with some examples.

This could help improve OCaml's visibility in the server-side
scripting world.

Cheers.

-- 
Stefano Zacchiroli - undergraduate student of CS @ Univ. Bologna, Italy
zack@cs.unibo.it | ICQ# 33538863 | http://www.cs.unibo.it/~zacchiro
"I know you believe you understood what you think I said, but I am not
sure you realize that what you heard is not what I meant!" -- G.Romney