From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id A8BEFBC69 for ; Thu, 19 Jul 2007 13:38:59 +0200 (CEST) Received: from furbychan.cocan.org (furbychan.cocan.org [80.68.91.176]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l6JBcxg7000676 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Thu, 19 Jul 2007 13:38:59 +0200 Received: from rich by furbychan.cocan.org with local (Exim 3.35 #1 (Debian)) id 1IBULO-0005li-00; Thu, 19 Jul 2007 12:38:54 +0100 Date: Thu, 19 Jul 2007 12:38:54 +0100 To: Luca de Alfaro Cc: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Fast XML parser Message-ID: <20070719113854.GA19281@furbychan.cocan.org> References: <28fa90930707181458p26eac6e6y7b45018b7c91ca65@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <28fa90930707181458p26eac6e6y7b45018b7c91ca65@mail.gmail.com> User-Agent: Mutt/1.5.9i From: Richard Jones X-Miltered: at concorde with ID 469F4D53.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; parser:01 wiki:01 sandbox:01 lib:01 -like:01 syntax:01 wiki:01 syntax:01 indents:01 ...,:98 wrote:01 parsing:01 caml-list:01 parse:02 string:02 On Wed, Jul 18, 2007 at 02:58:35PM -0700, Luca de Alfaro wrote: > I am interested in parsing Wiki markup language that has a few tags, like >
...
, ...,. > These tags are sparse, meaning that the ratio of number of tags / number of > bytes is low. > I would like, given a string (or a stream) with such tags, to parse it as > fast as possible. Efficiency is a primary consideration, and so is > simplicity of the implementation. > Do you have any advice about the library I should be using? There's some code in COCANWIKI which does exactly this: http://sandbox.merjis.com/release Look at the file scripts/lib/wikilib.ml. It's not a particularly clever implementation, but it has a great deal of testing in the real world. As well as -like syntax it also does a lot of standard wiki syntax like '* ' for bullet points, paragraphs, indents for preformatted sections and so on. And it outputs pure unadulterated XHTML. Rich. -- Richard Jones Red Hat