caml-list - the Caml user's mailing list
From: Till Varoquaux <till@pps.jussieu.fr>
To: Yaron Minsky <yminsky@gmail.com>
Cc: Richard Jones <rich@annexia.org>,
	"caml-list@inria.fr" <caml-list@inria.fr>
Subject: Re: [Caml-list] xpath or alternatives
Date: Mon, 28 Sep 2009 11:06:06 -0400
Message-ID: <9d3ec8300909280806me88dccbm2a6e4d98c9de191@mail.gmail.com>
In-Reply-To: <D772B66B-1C73-4B11-9E9D-A1C373AFDB38@gmail.com>

There are a few projects out there:

xtisp
http://www.xtisp.org

xstream
http://yquem.inria.fr/~frisch/xstream/

and of course the good old CDuce/XDuce/OCamlDuce. All in all, naive
querying is not hard, and tree automata (see e.g.
http://www.grappa.univ-lille3.fr/~filiot/tata/) can provide a good
middle ground between efficiency and simplicity. The problem you might
run into is that XML is a tricky format to deal with, and some of these
tools will choke on complex files (namespaces, switching character
encodings, weird entities in the DTD, etc.).
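To illustrate the "naive querying is not hard" point, here is a minimal
sketch of a naive evaluator for a path like Rich's
//devices/disk/source/@dev (quoted below). It assumes a hypothetical,
stripped-down tree type (xml, with Element and PCData constructors)
rather than any particular parser's representation, a made-up step type
covering only "//name" and "/name" steps, and a made-up sample document.
It does none of the namespace, encoding or entity handling mentioned
above. It needs OCaml >= 4.10 for List.concat_map.

```ocaml
(* Hypothetical minimal XML tree; real XML libraries define their own types. *)
type xml =
  | Element of string * (string * string) list * xml list (* tag, attributes, children *)
  | PCData of string                                       (* character data *)

(* Hypothetical path steps: just enough to express //devices/disk/source. *)
type step =
  | Desc of string  (* descendant-or-self elements called name *)
  | Child of string (* child elements called name *)

let has_name n = function
  | Element (m, _, _) -> m = n
  | PCData _ -> false

let children = function
  | Element (_, _, kids) -> kids
  | PCData _ -> []

(* The node itself followed by all of its descendants, in document order. *)
let rec descendants_or_self node =
  node :: List.concat_map descendants_or_self (children node)

(* Naive evaluation: apply each step to every node selected so far. *)
let rec eval steps node =
  match steps with
  | [] -> [ node ]
  | Child n :: rest ->
      List.concat_map (eval rest) (List.filter (has_name n) (children node))
  | Desc n :: rest ->
      List.concat_map (eval rest)
        (List.filter (has_name n) (descendants_or_self node))

(* Final "@a" step: the attribute's value, if the element carries it. *)
let attr a = function
  | Element (_, attrs, _) -> Option.to_list (List.assoc_opt a attrs)
  | PCData _ -> []

(* //devices/disk/source/@a, evaluated from the root element *)
let disk_sources a root =
  List.concat_map (attr a)
    (eval [ Desc "devices"; Child "disk"; Child "source" ] root)

(* Made-up sample, loosely shaped like a libvirt domain description. *)
let doc =
  Element ("domain", [ ("type", "kvm") ],
    [ Element ("devices", [],
        [ Element ("disk", [ ("type", "block") ],
            [ Element ("source", [ ("dev", "/dev/sda") ], []) ]);
          Element ("disk", [ ("type", "file") ],
            [ Element ("source", [ ("file", "/var/lib/images/guest.img") ], []) ])
        ]) ])

(* Same order as the two Perl findnodes calls: @dev results, then @file. *)
let disks = disk_sources "dev" doc @ disk_sources "file" doc
```

Evaluated from the root element, Desc behaves like XPath's leading //;
deeper in a path it would also match the context node itself, which
real XPath does not. A tree-automaton approach would typically compile
the whole path once instead of interpreting it step by step like this.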

Till

P.S.: Alain has a good paper on how to compile queries (as done in
cduce). I am just too lazy to look for it.


On Mon, Sep 28, 2009 at 8:48 AM, Yaron Minsky <yminsky@gmail.com> wrote:
> I don't have the code in front of me, but I've done something like this
> using the list monad, i.e., using bind (= concat-map) and map chained
> together, along with a couple of operators I wrote for lifting bits of XML
> documents into lists, say by returning the subnodes of the present node as
> a list.
>
> It was quite effective.  I got the inspiration from a similar tool we have
> for navigating s-expressions, which we should release at some point...
>
> Yaron Minsky
>
> On Sep 28, 2009, at 8:17 AM, Richard Jones <rich@annexia.org> wrote:
>
>>
>> I need to do some relatively simple extraction of fields from an XML
>> document.  In Perl I would use XPath; specifically, if $xml were an
>> XML document[1] stored as a string, then:
>>
>>   my $p = XML::XPath->new (xml => $xml);
>>   my @disks = $p->findnodes ('//devices/disk/source/@dev');
>>   push (@disks, $p->findnodes ('//devices/disk/source/@file'));
>>
>> This isn't type safe or pretty, but it is very easy to use for quick
>> and dirty extraction.
>>
>> What is the OCaml equivalent for this sort of code?
>>
>> Alain Frisch has a library called Xpath
>> (http://alain.frisch.fr/soft.html#xpath), but unfortunately this
>> relies on the now obsolete wlex program.
>>
>> Is there a completely alternative way to do this?  Better still, in 3
>> lines of code??
>>
>> Rich.
>>
>> [1] for XML doc, see: http://libvirt.org/formatdomain.html
>>
>> --
>> Richard Jones
>> Red Hat
>>
>> _______________________________________________
>> Caml-list mailing list. Subscription management:
>> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
>> Archives: http://caml.inria.fr
>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

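A minimal sketch of the list-monad style Yaron describes above. His
actual code isn't in the thread, so the tree type, the operator names
(>>=, return), the lifting helpers (subnodes, descendants_or_self,
named, attr) and the sample document are all hypothetical. Bind is
List.concat_map: each step maps one node to a list of nodes and the
results are flattened. Only bind is needed for this query; map chains
the same way. Converting a real XML parser's output into this type is
not shown.

```ocaml
(* Hypothetical minimal XML tree; a real parser's output would need converting. *)
type xml =
  | Element of string * (string * string) list * xml list (* tag, attributes, children *)
  | PCData of string

(* The list monad: bind is concat-map, return builds a singleton list. *)
let ( >>= ) xs f = List.concat_map f xs
let return x = [ x ]

(* Hypothetical "lifting" operators: each maps one node to a list. *)
let subnodes = function
  | Element (_, _, kids) -> kids
  | PCData _ -> []

let rec descendants node = subnodes node >>= fun k -> k :: descendants k
let descendants_or_self node = node :: descendants node

(* Keep the node only if it is an element with the given tag. *)
let named n node =
  match node with
  | Element (m, _, _) when m = n -> [ node ]
  | _ -> []

(* An attribute's value as a zero- or one-element list. *)
let attr a = function
  | Element (_, attrs, _) -> Option.to_list (List.assoc_opt a attrs)
  | PCData _ -> []

(* Rough equivalent of the two Perl findnodes calls: @dev, then @file. *)
let disk_sources root =
  let sources =
    return root >>= descendants_or_self >>= named "devices"
    >>= subnodes >>= named "disk" >>= subnodes >>= named "source"
  in
  (sources >>= attr "dev") @ (sources >>= attr "file")

(* Made-up sample, loosely shaped like a libvirt domain description. *)
let doc =
  Element ("domain", [ ("type", "kvm") ],
    [ Element ("devices", [],
        [ Element ("disk", [ ("type", "block") ],
            [ Element ("source", [ ("dev", "/dev/sda") ], []) ]);
          Element ("disk", [ ("type", "file") ],
            [ Element ("source", [ ("file", "/var/lib/images/guest.img") ], []) ])
        ]) ])
```

The query itself is the two-line pipeline in disk_sources plus the line
joining the @dev and @file results; the real work lives in the small
lifting helpers, which only have to be written once.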
