From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: ** X-Spam-Status: No, score=2.3 required=5.0 tests=AWL,DNS_FROM_RFC_POST, SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id 9169DBC37 for ; Wed, 30 Sep 2009 12:49:13 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqUBAHvUwkrRVdvfkGdsb2JhbACaQD8BAQEBCQkMBxMDrQ2QDAEDAwWEIgSIHQ X-IronPort-AV: E=Sophos;i="4.44,480,1249250400"; d="scan'208";a="47588770" Received: from mail-ew0-f223.google.com ([209.85.219.223]) by mail4-smtp-sop.national.inria.fr with ESMTP; 30 Sep 2009 12:49:13 +0200 Received: by ewy23 with SMTP id 23so990550ewy.26 for ; Wed, 30 Sep 2009 03:49:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:in-reply-to :references:date:x-google-sender-auth:message-id:subject:from:to:cc :content-type:content-transfer-encoding; bh=5CSbCGUZWQY1IVCZz95caspZqz2eZNpHm4ZaQH3XzJw=; b=xs/IsstsiiWNP1uy5VbfqQNYrAE40P0jGGVJHm5qcOusBVPgFolUfEUfldNVaCZPeW I5NLw6Biq5XBlzQLs7T2NrpTPcb94pCTCsGrPyY9AVovoN53IdrayXo8XM0Gewxdu6gs Ca3M2nENiLxXyySLjDAXaamFLtCT/uF3kbE1k= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=xUT6IqClk0zDHU+NRqS2IslUXP6BjpjHhyLR4e+cXyhCftBPdq/IGkfJjChL6/yx5w o33bn+0ePmBSvv7U25pAhg9bA5/mHs0iOFdPmv5ysFJwpL48cR/CJG9tjznRG32fX+S9 C7t0r5bmMDxhGiKCgDoh7vCROjb/6N7JIdQcE= MIME-Version: 1.0 Sender: mikkelfj@gmail.com Received: by 10.216.49.65 with SMTP id w43mr1285358web.211.1254307752932; Wed, 30 Sep 2009 03:49:12 -0700 (PDT) In-Reply-To: <20090930101622.GA15517@annexia.org> References: <20090928121745.GA19803@annexia.org> <9d3ec8300909280806me88dccbm2a6e4d98c9de191@mail.gmail.com> <20090930101622.GA15517@annexia.org> Date: Wed, 30 Sep 2009 12:49:12 +0200 X-Google-Sender-Auth: e966f090496a27d0 Message-ID: Subject: Re: [Caml-list] xpath or alternatives From: =?UTF-8?Q?Mikkel_Fahn=C3=B8e_J=C3=B8rgensen?= To: Richard Jones Cc: Till Varoquaux , Yaron Minsky , "caml-list@inria.fr" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam: no; 0.00; mikkel:01 rgensen:01 mikkel:01 0200,:01 rgensen:01 yaron:01 combinator:01 parser:01 struct:01 bool:01 bool:01 iter:01 subtrees:01 parser:01 2009:98 2009/9/30 Richard Jones : > On Wed, Sep 30, 2009 at 01:00:15AM +0200, Mikkel Fahn=C3=B8e J=C3=B8rgens= en wrote: >> In line with what Yaron suggests, you can use a combinator parser. > It's interesting you mention xmlm, because I couldn't write > the code using xmlm at all. If you can manage to convert an xml document into a json like tagged tree structure, then a simple solution like module Value =3D struct 56 type value_type =3D 57 Object of (string * value_type) list 58 | Array of value_type list 59 | String of string 60 | Int of int 61 | Float of float 62 | Bool of bool 63 | Null 64 end 65=09 .. 665 let get_object v =3D match v with Object x -> x 666 | _ -> fail "json object expected" .. 685 let pattern_path value names =3D 686 let rec again value =3D function 687 | "*" :: names -> List.iter (fun (n, v) -> try again v names 688 with Invalid_argument _ | Not_found -> ()) (get_object value) 689 | name :: names -> again (List.assoc name (get_object value)) nam= es 690 | [] -> raise (Found value) 691 in try again value names; raise Not_found with Found value -> value 692=09 combined with a path split function 22 let split c s =3D 23 let n =3D String.length s in 24 let rec again i lst =3D 25 begin try let k =3D String.rindex_from s i c in 26 again (k - 1) ((if i =3D k then "" else (String.sub s (k + 1) (i - k))) :: lst) 27 with _ -> (String.sub s 0 (i + 1)) :: lst 28 end 29 in again (n - 1) [] will do almost exactly what you are asking for - notice the "*" searches broadly in all subtrees. You can add your own xpath like functions as you discover a need for them. I believe that the xmlm examples has a tree transformation operation that would easily be adapted to produce a json like tree, if modified a little. let out_tree o t =3D let frag =3D function | E (tag, childs) -> `El (tag, childs) | D d -> `Data d in Xmlm.output_doc_tree frag o t > My best effort, using xml-light, is around 40 lines: If you spend those 40 lines on a layer on top of a lightweight xml parser, you might get away with 3 lines the next time.