From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 9CA50BB91 for ; Tue, 11 Jan 2005 18:55:40 +0100 (CET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j0BHteOq024263 for ; Tue, 11 Jan 2005 18:55:40 +0100 Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id SAA16866 for ; Tue, 11 Jan 2005 18:55:39 +0100 (MET) Received: from smtp3.adl2.internode.on.net (smtp3.adl2.internode.on.net [203.16.214.203]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j0BHtbUh004376 for ; Tue, 11 Jan 2005 18:55:38 +0100 Received: from [192.168.1.200] (ppp201-179.lns1.syd3.internode.on.net [203.122.201.179]) by smtp3.adl2.internode.on.net (8.12.9/8.12.9) with ESMTP id j0BHtQjA060916; Wed, 12 Jan 2005 04:25:27 +1030 (CST) Subject: Re: [Caml-list] Thread safe Str From: skaller Reply-To: skaller@users.sourceforge.net To: Gerd Stolpmann Cc: Chris King , "O'Caml Mailing List" In-Reply-To: <1105445337.15581.82.camel@ice.gerd-stolpmann.de> References: <1105415669.3534.55.camel@pelican.wigram> <875c7e0705011023033fa114ef@mail.gmail.com> <1105430813.3534.116.camel@pelican.wigram> <1105445337.15581.82.camel@ice.gerd-stolpmann.de> Content-Type: text/plain Message-Id: <1105466124.2574.48.camel@pelican.wigram> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-4) Date: 12 Jan 2005 04:55:26 +1100 Content-Transfer-Encoding: 7bit X-Miltered: at concorde with ID 41E4131C.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at nez-perce with ID 41E41319.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 sourceforge:01 gerd:01 stolpmann:01 wrote:01 frisch:01 cardelli:01 frisch:01 dfa:01 cduce:01 cduce:01 iterator:01 dfa:01 iterator:01 glebe:01 X-Spam-Checker-Version: SpamAssassin 3.0.0 (2004-09-13) on yquem.inria.fr X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.0 X-Spam-Level: On Tue, 2005-01-11 at 23:08, Gerd Stolpmann wrote: > Capturing just for the purpose of string extraction is not problematic. Unfortunately you're wrong. I had in fact hoped this was not the case, but having read a couple of paper on it, I now know that, unfortunately, they are. (See below). Alain Frisch is one of the leaders in this area, perhaps he can explain it better. You can get Cardelli and Frisch paper here: http://felix.sf.net/papers/greedy.pdf another by Ville Laurikari here: http://felix.sf.net/papers/spire2000-tnfa.ps and his Master thesis on the topic here: http://felix.sf.net/papers/regex-submatch.ps [and if you can figure out how to actually build a tagged DFA after reading any of that please let me know .. ] Frisch's algorithm used in CDuce works with a forward pass that ignores captures, but records the automaton states, and then a backwards pass the extracts the information (CDuce actually builds trees). Unfortunately this has a problem: it requires a bidirectional iterator, whereas a DFA only requires an input iterator. (NFA's require forward iterators) -- John Skaller, mailto:skaller@users.sf.net voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net