Date: Sat, 28 Apr 2007 01:12:20 +0200
From: Oliver Bandel
To: caml-list@inria.fr
Subject: Re: [Caml-list] mboxlib reloaded ;-)
Message-ID: <20070427231220.GA1507@first.in-berlin.de>
In-Reply-To: <20070427162911.GA10099@furbychan.cocan.org>

Hi,

only a short note, because I will not explore it in detail tonight...

On Fri, Apr 27, 2007 at 05:29:11PM +0100, Richard Jones wrote:
> On Fri, Apr 27, 2007 at 03:54:25PM +0200, Oliver Bandel wrote:
> > Hello,
> >
> > after two years of doing nothing on it,
> > I today found my mboxlib, I started to
> > write in 2005.
> >
> > I have put the mli-file on the web and
> > maybe the library itself will follow
> > during the next time.
> >
> > Any feedback, questions and suggestions are welcome.
> >
> > http://me.in-berlin.de/~first/software/libraries/mboxlib/
>
> The source for COCANWIKI[1] contains extensive support for threading
> of mail messages, based on JWZ's algorithm:
>
> http://www.jwz.org/doc/threading.html

Nice... you speak of an optimized algorithm for threading.
I have not explored your solution or the paper in detail yet
(tomorrow I think I will have the time to do it), but IMHO the best
thing for handling message threads is to use a trie data structure
with message IDs as identifiers (instead of chars, as are normally
used).

So: did you reimplement the trie data structure as an abstraction
over message IDs, or did you do it differently?

>
> You are of course welcome to copy this. If there are any license
> issues let me know & I can fix them.
>
> I'd also like to point you to another useful JWZ doc:
>
> http://www.jwz.org/doc/mailsum.html

Well, the same here: tomorrow I can look at it in more detail;
but fast mbox handling is a problem I also ran into today, when I
used a test mbox of about 100 MB for the first time.
Normally I would use files of a few MB, because I think this is the
normal size; but I had some discussions on the Berlin Linux user
group, and some people were annoyed that mutt needs some seconds to
read in mbox files of about 80 MB.

So I then checked my mboxlib and saw that it is quite slow compared
to what I expected (expect! I did not try it on my development
machine, because I have no mutt installed there), and even though
native code is much faster, it's nevertheless slow...
...so I thought I have to redesign my scanner stage.
(I use the Str module and ocamllex mixed together; maybe a plain
hand-written OCaml scanner would be better here.)

Ciao,
   Oliver

P.S.: 12 seconds for 100 MB seems to be quite slow...
      I call the lexer very often, and that might be done smarter.
      Maybe your pages will show some useful approaches.
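
P.P.S.: To make the "plain hand-written scanner" idea concrete, here is
a minimal sketch of how the message boundaries could be found in a
single pass, without Str or ocamllex. It is only an illustration under
some assumptions: the whole mbox is already in memory as one string, a
message starts at every line beginning with "From " (a stricter parser
would also require a preceding blank line), and the name
`message_offsets` is hypothetical and not part of mboxlib.

```ocaml
(* Hypothetical helper, not part of mboxlib: return the byte offsets at
   which messages start, i.e. every line that begins with "From ". *)
let message_offsets (buf : string) : int list =
  let len = String.length buf in
  (* True when the five bytes at [pos] spell "From "; compares bytes
     directly so that no substring is allocated. *)
  let is_from_line pos =
    pos + 5 <= len
    && buf.[pos] = 'F' && buf.[pos + 1] = 'r' && buf.[pos + 2] = 'o'
    && buf.[pos + 3] = 'm' && buf.[pos + 4] = ' '
  in
  (* [pos] is always the first byte of a line. *)
  let rec scan pos acc =
    if pos >= len then List.rev acc
    else
      let acc = if is_from_line pos then pos :: acc else acc in
      (* Jump straight to the start of the next line. *)
      match String.index_from_opt buf pos '\n' with
      | None -> List.rev acc
      | Some nl -> scan (nl + 1) acc
  in
  scan 0 []

(* Usage: print the start offset of every message in a tiny mbox. *)
let () =
  let mbox = "From a\nSubject: x\n\n>From quoted\n\nFrom b\nSubject: y\n" in
  List.iter (Printf.printf "message starts at byte %d\n") (message_offsets mbox)
```

The point of the sketch is that the separator test runs once per line
instead of calling a lexer for every token; header parsing could then
be done lazily, only for the messages that are actually looked at.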