caml-list - the Caml user's mailing list
From: "Daniel Bünzli" <daniel.buenzli@erratique.ch>
To: "Richard Jones" <rich@annexia.org>,
	"Mikkel Fahnøe Jørgensen" <mikkel@dvide.com>
Cc: "caml-list@inria.fr" <caml-list@inria.fr>
Subject: Re: [Caml-list] xpath or alternatives
Date: Wed, 28 Oct 2009 10:22:20 +0800
Message-ID: <91a3da520910271922j277470c6gc800773036c9de0e@mail.gmail.com>
In-Reply-To: <20090930101622.GA15517@annexia.org>

Sorry for the late reply.

On Wed, Sep 30, 2009 at 01:00:15AM +0200, Mikkel Fahnøe Jørgensen wrote:

> Otherwise there is xmlm which is self-contained in single xml file,
> and as I recall, has some sort of zipper navigator. (I initially
> intended to use it before deciding on the json format):

The cursor api was removed from the library in 1.0.0.


On Wed, Sep 30, 2009 at 6:16 PM, Richard Jones <rich@annexia.org> wrote:

> It's interesting you mention xmlm, because I couldn't write
> the code using xmlm at all.

Why? That doesn't feel like an insurmountable task.

Below is a function that extracts from a (sub)tree's sequence of
signals the attributes' data of an absolute path (i.e. the particular
xpath pattern you're after if I understand correctly). Each
attribute's data is stored in a separate list. The function is simpler
than it looks, in essence it's just a recursive case analysis on
signals. In the function [aux], [pos] maintains the current path in
the parse tree.  [mismatch] counts the level of mismatch w.r.t. the
[path] we are looking for.

let absolute_path_atts i path atts =
  (* [pos]: stack of currently open tags; [mismatch]: number of open
     elements that are off [path]; [accs]: (attribute name, data found so
     far) pairs. *)
  let rec aux i pos mismatch path accs = match Xmlm.input i with
  | `El_start (tag, atts) ->
      if mismatch > 0 then aux i (tag :: pos) (mismatch + 1) path accs else
      begin match path with
      | n :: path' when n = tag ->
          if path' <> [] then aux i (tag :: pos) 0 path' accs else
          (* Full match: collect the requested attributes of this element. *)
          let update_acc ((att, acc) as v) =
            try att, (List.assoc att atts) :: acc with Not_found -> v
          in
          aux i (tag :: pos) 0 [] (List.map update_acc accs)
      | _ -> aux i (tag :: pos) (mismatch + 1) path accs
      end
  | `El_end ->
      begin match pos with
      | _ :: [] -> List.rev_map (fun (_, acc) -> List.rev acc) accs
      | tag :: pos' ->
          if mismatch > 0 then aux i pos' (mismatch - 1) path accs else
          (* Leaving a matched element: put its tag back on [path]. *)
          aux i pos' 0 (tag :: path) accs
      | [] -> assert false
      end
  | `Data _ -> aux i pos mismatch path accs
  | `Dtd _ -> assert false
  in
  let accs = List.rev_map (fun att -> att, []) atts in
  begin match Xmlm.peek i with
  | `El_start _ -> aux i [] 0 path accs
  | `Dtd _ | `El_end | `Data _ -> invalid_arg "no subtree here"
  end
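
To experiment with the case analysis without xmlm installed, here is a
minimal, self-contained sketch of the same traversal over a plain list of
signals. The [signal] type, [path_atts], [el] and the sample document are
stand-ins made up for illustration; they only mimic the shape of Xmlm's
signals and are not part of the library.

```ocaml
(* Stand-in types: as in Xmlm, a name is (namespace URI, local name) and an
   attribute is (name, value). Signals come from a list, not an input. *)
type name = string * string
type signal =
  [ `El_start of name * (name * string) list | `El_end | `Data of string ]

(* Same recursive case analysis as [aux], reading from a signal list. *)
let path_atts (signals : signal list) (path : name list) (atts : name list) =
  let rec aux signals pos mismatch path accs = match signals with
  | [] -> invalid_arg "unbalanced signals"
  | `El_start (tag, el_atts) :: rest ->
      if mismatch > 0 then aux rest (tag :: pos) (mismatch + 1) path accs else
      begin match path with
      | n :: path' when n = tag ->
          if path' <> [] then aux rest (tag :: pos) 0 path' accs else
          let update_acc ((att, acc) as v) =
            match List.assoc_opt att el_atts with
            | Some data -> att, data :: acc
            | None -> v
          in
          aux rest (tag :: pos) 0 [] (List.map update_acc accs)
      | _ -> aux rest (tag :: pos) (mismatch + 1) path accs
      end
  | `El_end :: rest ->
      begin match pos with
      | [ _ ] -> List.map (fun (_, acc) -> List.rev acc) accs
      | tag :: pos' ->
          if mismatch > 0 then aux rest pos' (mismatch - 1) path accs else
          aux rest pos' 0 (tag :: path) accs
      | [] -> invalid_arg "unbalanced signals"
      end
  | `Data _ :: rest -> aux rest pos mismatch path accs
  in
  aux signals [] 0 path (List.map (fun att -> att, []) atts)

(* Helper to build start signals in the empty namespace. *)
let el n atts : signal =
  `El_start (("", n), List.map (fun (k, v) -> (("", k), v)) atts)

(* Made-up document: two disks, one with a [dev] source, one with [file]. *)
let signals : signal list = [
  el "domain" [];
    el "name" []; `Data "vm"; `El_end;
    el "devices" [];
      el "disk" [];
        el "source" ["dev", "/dev/sda"]; `El_end;
        el "target" ["dev", "hda"]; `El_end;
      `El_end;
      el "disk" [];
        el "source" ["file", "/tmp/disk.img"]; `El_end;
      `El_end;
    `El_end;
  `El_end ]

let path = ["", "domain"; "", "devices"; "", "disk"; "", "source"]

let () =
  (* The [dev] attribute of <target> is off the path and is not collected. *)
  assert (path_atts signals path ["", "dev"; "", "file"]
          = [["/dev/sda"]; ["/tmp/disk.img"]])
```

The <name> and <target> elements are skipped through the [mismatch]
counter, which goes back to 0 when their end signal is seen.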

Now your function becomes something like this:

let get_devices_from_xml xml =
  try
    let i = Xmlm.make_input (`String (0, xml)) in
    ignore (Xmlm.input i); (* `Dtd signal *)
    let path = ["", "domain"; "","devices"; "", "disk"; "", "source"] in
    match absolute_path_atts i path ["", "dev"; "", "file"] with
    | [devs; files] when Xmlm.eoi i -> devs @ files
    | _ -> failwith "xml document not well-formed"
  with
  | Xmlm.Error ((l,c), e) ->
      failwith (Printf.sprintf "%d:%d: %s" l c (Xmlm.error_message e))
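
If writing the path as a list of names by hand is too verbose, it could
be derived from a slash-separated string. This is only a sketch:
[path_of_string] is a hypothetical helper, not something xmlm provides,
and it assumes every element is in the empty namespace.

```ocaml
(* Hypothetical helper, not part of Xmlm: turns
   "/domain/devices/disk/source" into a list of (namespace, local name)
   pairs, all in the empty namespace. Empty segments are dropped. *)
let path_of_string s =
  String.split_on_char '/' s
  |> List.filter (fun seg -> seg <> "")
  |> List.map (fun seg -> ("", seg))

let () =
  assert (path_of_string "/domain/devices/disk/source"
          = ["", "domain"; "", "devices"; "", "disk"; "", "source"])
```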

I know this is still more effort than you'd like, but Xmlm is
purposely low-level and will remain so. It provides only a robust xml
parser that is convenient (I believe) for developing higher-level
abstractions to process the insane uses of this standard. It would be
nice to develop a module using xmlm to provide a (non-camlp4) dsl for
xml queries. Unfortunately I do not have the time for that at the
moment (unless someone wants to fund me to do that...).

Best,

Daniel

