From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: ** X-Spam-Status: No, score=2.2 required=5.0 tests=AWL,DNS_FROM_RFC_POST, SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id D7896BBAF for ; Tue, 21 Apr 2009 09:19:34 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArQCAH4Q7UnRVdyuimdsb2JhbACVfj8BAQEKCQwHDwWsB49iAQMBA4N6BoYx X-IronPort-AV: E=Sophos;i="4.38,431,1233529200"; d="scan'208";a="28011192" Received: from mail-fx0-f174.google.com ([209.85.220.174]) by mail1-smtp-roc.national.inria.fr with ESMTP; 21 Apr 2009 09:19:34 +0200 Received: by fxm22 with SMTP id 22so2254608fxm.9 for ; Tue, 21 Apr 2009 00:19:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:in-reply-to :references:date:x-google-sender-auth:message-id:subject:from:to:cc :content-type:content-transfer-encoding; bh=6aAKXTnJYL6QZzGqY1FnekF9uE0pRXdiV/5WVM6ZkfU=; b=qYFLnl7By1ZrZ9YoVcKdDkXYfoCoH+YDEy49iWgcroZ1XKEDxEj/kQMf4InHVWWa+G NeC1Al8cgKqhVOciLFvexVEN51VDIJBuk48U9+t2XoZi+7JrRNJzKjPrCaVJNX9g2E6K /QdrbLH1zixLwE05q6g3eOscibdPRgKHuEe70= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=W/ucjyPXslbFGPkmXZSSHHWqPNX+SLUPXQuQMgap/yFE8a8xfqg1xrwzIhMXyfNcSg 02erfTGlS0rGMwW6nQm+Q44rHXGVYVkq0ED544dhzrzxhH3MMtx1052M9Be8/uf0H/Fd tCTeLZi1T/YkFsdF5eXYKKb2QJKZNBxtr1Brw= MIME-Version: 1.0 Sender: david.mentre@gmail.com Received: by 10.103.226.10 with SMTP id d10mr3602281mur.35.1240298374097; Tue, 21 Apr 2009 00:19:34 -0700 (PDT) In-Reply-To: <200904202215.27735.jon@ffconsultancy.com> References: <200904202215.27735.jon@ffconsultancy.com> Date: Tue, 21 Apr 2009 09:19:33 +0200 X-Google-Sender-Auth: 02f7945bf939ae0a Message-ID: <3d13dcfc0904210019x1f624bebv82855519b0885050@mail.gmail.com> Subject: Re: [Caml-list] Parallelized parsing From: David MENTRE To: Jon Harrop Cc: caml-list@inria.fr Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam: no; 0.00; syntax:01 integers:01 lexing:01 tokens:01 grammars:01 speedup:01 wikipedia:01 wiki:01 speedup:01 20,:98 2009:98 parsed:01 wrote:01 parsing:01 parsing:01 Hello Jon, On Mon, Apr 20, 2009 at 23:15, Jon Harrop wrote: > For example, Mathematica syntax for nested lists of integers looks like: > > =A0{{{1, 2}}, {{3, 4}, {4, 5}}, ..} > > and there are obvious divide-and-conquer approaches to lexing and parsing= that > grammar. You can recursively subdivide the string (e.g. memory mapped fro= m a > file) to build a tree of where the tokens { , and } appear by index and t= hen > recursively convert the tree into an AST. > > What other grammars can be lexed and/or parsed efficiently in parallel? Is it of any use? The overhead of parsing a single file in parallel is so high that you won't have any speedup, especially compared to the much simpler approach of parsing *several* files in parallel. It reminds me of parallel approaches used for 3D movies: a lot of research has been done to parallelize the rendering of a single picture[1] while companies like Pixar are using a much simpler approach in real life: render a whole picture per computer or core. And don't forget the Amdahl's law : http://en.wikipedia.org/wiki/Amdahl%27s_law Where is your real bottleneck? Yours, david [1] Hopefully, some of those algorithms have brought speedup in serialized setting, i.e. on a single core or computer, for example by optimizing cache use.