From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id 990AEBC37 for ; Wed, 28 Oct 2009 03:22:23 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AlEGAGtH50pKfU4bkGdsb2JhbACEc40JiQU/AQEBAQkJDAcTA6xmgWOGcohoAQMDBYEtgjpTBIFfb4Z+ X-IronPort-AV: E=Sophos;i="4.44,636,1249250400"; d="scan'208";a="39067420" Received: from ey-out-2122.google.com ([74.125.78.27]) by mail1-smtp-roc.national.inria.fr with ESMTP; 28 Oct 2009 03:22:23 +0100 Received: by ey-out-2122.google.com with SMTP id 9so99296eyd.7 for ; Tue, 27 Oct 2009 19:22:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:in-reply-to :references:date:x-google-sender-auth:message-id:subject:from:to:cc :content-type:content-transfer-encoding; bh=Te4S9f6IARzBLr/vYd+mg18MPDmMhbNzF/9acv59dZw=; b=SBai5MjYhm92SPmM2NSFujKxGvlbOf9CNqsZRMTsfjMEdh7dLaEt5Q+wk1o7mEeLUo liwsI4gqbowZ54YjKijTsFjZUgvRmm92clsqDd6vRziO+qsz3G4HxIpjZtIlw5x4S5nI vk17UAflwb8ScD2rLk33LVIT8nKIbKQSLpqF0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=qF9ll9oUiTacfnSsiO/5fXQpIbIKAB9DyqZnAIzOHfwPT5gFC7aKWEqbijIhVSMwyA 2IiR5/5RhDSTGT0cofm7YDEqhxFRgJvSVVrA1nHWRB6lCT5bYGAwPLhqeoWrwMyO+OPj 6UlBVxP6RrCtm5FAf7G8UJxJoTCnFuzcmcaxA= MIME-Version: 1.0 Sender: daniel.c.buenzli@gmail.com Received: by 10.211.157.11 with SMTP id j11mr2962184ebo.63.1256696540843; Tue, 27 Oct 2009 19:22:20 -0700 (PDT) In-Reply-To: <20090930101622.GA15517@annexia.org> References: <20090928121745.GA19803@annexia.org> <9d3ec8300909280806me88dccbm2a6e4d98c9de191@mail.gmail.com> <20090930101622.GA15517@annexia.org> Date: Wed, 28 Oct 2009 10:22:20 +0800 X-Google-Sender-Auth: 4a762676119c2fb5 Message-ID: <91a3da520910271922j277470c6gc800773036c9de0e@mail.gmail.com> Subject: Re: [Caml-list] xpath or alternatives From: =?UTF-8?Q?Daniel_B=C3=BCnzli?= To: Richard Jones , =?UTF-8?Q?Mikkel_Fahn=C3=B8e_J=C3=B8rgensen?= Cc: "caml-list@inria.fr" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam: no; 0.00; buenzli:01 0200,:01 mikkel:01 rgensen:01 recursive:01 mismatch:01 mismatch:01 atts:01 atts:01 path':01 path':01 failwith:01 well-formed:01 failwith:01 printf:01 Sorry for the late reply. On Wed, Sep 30, 2009 at 01:00:15AM +0200, Mikkel Fahn=C3=B8e J=C3=B8rgensen= wrote: > Otherwise there is xmlm which is self-contained in single xml file, > and as I recall, has some sort of zipper navigator. (I initially > intended to use it before deciding on the json format): The cursor api was removed from the library in 1.0.0. On Wed, Sep 30, 2009 at 6:16 PM, Richard Jones wrote: > It's interesting you mention xmlm, because I couldn't write > the code using xmlm at all. Why ? That doesn't feel like an insurmontable task. Below is a function that extracts from a (sub)tree's sequence of signals the attributes' data of an absolute path (i.e. the particular xpath pattern you're after if I understand correctly). Each attribute's data is stored in a separate list. The function is simpler than it looks, in essence it's just a recursive case analysis on signals. In the function [aux], [pos] maintains the current path in the parse tree. [mismatch] counts the level of mismatch w.r.t. the [path] we are looking for. let absolute_path_atts i path atts =3D let rec aux i pos mismatch path accs =3D match Xmlm.input i with | `El_start (tag, atts) -> if mismatch > 0 then aux i (tag :: pos) (mismatch + 1) path accs else begin match path with | n :: path' when n =3D tag -> if path' <> [] then aux i (tag :: pos) 0 path' accs else let update_acc ((att, acc) as v) =3D try att, (List.assoc att atts) :: acc with Not_found -> v in aux i (tag :: pos) 0 [] (List.map update_acc accs) | _ -> aux i (tag :: pos) (mismatch + 1) path accs end | `El_end -> begin match pos with | _ :: [] -> List.rev_map (fun (att, acc) -> List.rev acc) accs | tag :: pos' -> if mismatch > 0 then aux i pos' (mismatch - 1) path accs else aux i pos' 0 (tag :: path) accs | [] -> assert false end | `Data _ -> aux i pos mismatch path accs | `Dtd _ -> assert false in let accs =3D List.rev_map (fun att -> att, []) atts in begin match Xmlm.peek i with | `El_start _ -> aux i [] 0 path accs | `Dtd _ | `El_end | `Data _ -> invalid_arg "no subtree here" end Now your function becomes something like this : let get_devices_from_xml xml =3D try let i =3D Xmlm.make_input (`String (0, xml)) in ignore (Xmlm.input i); (* `Dtd signal *) let path =3D ["", "domain"; "","devices"; "", "disk"; "", "source"] in match absolute_path_atts i path ["", "dev"; "", "file"] with | [devs; files] when Xmlm.eoi i -> devs @ files | _ -> failwith "xml document not well-formed" with | Xmlm.Error ((l,c), e) -> failwith (Printf.sprintf "%d:%d: %s" l c (Xmlm.error_message e)) I know this is still more effort than you'd like, but Xmlm is purposedly low-level and will remain. It provides only a robust xmlm parser convenient (I believe) to develop higher-level abstractions to process the insane uses of this standard. It would be nice to develop a module using xmlm to provide a (non-camlp4) dsl for xml queries. Unfortunately I do not have the time for that at the moment (unless someone wants to fund me to do that...). Best, Daniel