From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: ** X-Spam-Status: No, score=2.8 required=5.0 tests=DNS_FROM_RFC_POST,SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by yquem.inria.fr (Postfix) with ESMTP id 0DF7FBC37 for ; Mon, 28 Sep 2009 17:06:26 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AnQEAC9uwErRVd+2mWdsb2JhbACPXIIKiGA/AQEBAQEICwoHE609jhUBAwIFhBkFgViHMw X-IronPort-AV: E=Sophos;i="4.44,466,1249250400"; d="scan'208";a="33705390" Received: from mail-iw0-f182.google.com ([209.85.223.182]) by mail2-smtp-roc.national.inria.fr with ESMTP; 28 Sep 2009 17:06:08 +0200 Received: by iwn12 with SMTP id 12so2267241iwn.1 for ; Mon, 28 Sep 2009 08:06:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:in-reply-to :references:date:x-google-sender-auth:message-id:subject:from:to:cc :content-type:content-transfer-encoding; bh=bN3dT31as/99Zksk8yWOb8laXoqrhaDiMYbI1bIYYzY=; b=LOqG5AsZCHP6uI4oFjhB2HHjsSxRdFx/A2os/56Jdi1wIejAE2h03aP8gIXaVW0vA1 /1HChY/e+qPfYT/DGXl+WkSxGiLOqrs2nAHTU5F+w6OZYMxdlBkKhJqOzYWSuzOyQWfb w1Czff4WC5YJUfe5MTPSpEQ8fnvA75Fl8u5cU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=xqwNxwa+YWLk5NxnjDg8TFyk767rTdpdaXM7TPYytCTKIwT9/0+zYJqqq4HVM75hPj MHrTo2u7MfgaFgqeGMKArZEAh4iWjUbuAWy5wkCTks/ZBregM69slXgFumLKf/IHX3Rg 97Q8BpO7DRS/wrh2fkw/nLdgJUdRsjInTooAg= MIME-Version: 1.0 Sender: till.varoquaux@gmail.com Received: by 10.231.124.166 with SMTP id u38mr6322371ibr.17.1254150366860; Mon, 28 Sep 2009 08:06:06 -0700 (PDT) In-Reply-To: References: <20090928121745.GA19803@annexia.org> Date: Mon, 28 Sep 2009 11:06:06 -0400 X-Google-Sender-Auth: c15b982488f17da1 Message-ID: <9d3ec8300909280806me88dccbm2a6e4d98c9de191@mail.gmail.com> Subject: Re: [Caml-list] xpath or alternatives From: Till Varoquaux To: Yaron Minsky Cc: Richard Jones , "caml-list@inria.fr" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam: no; 0.00; frisch:01 cduce:01 xduce:01 ocamlduce:01 namespaces:01 cduce:01 yaron:01 minsky:01 yminsky:01 subnodes:01 node:01 yaron:01 minsky:01 ocaml:01 frisch:01 There are a few projects out here: xtisp http://www.xtisp.org xstream http://yquem.inria.fr/~frisch/xstream/ and of course the good old cduce/xduce/ocamlduce. All in all naive querying is not hard and tree automata: (e.g.) http://www.grappa.univ-lille3.fr/~filiot/tata/ can provide a good middle ground between efficiency and simplicity. The problem you might run into is that XML is a tricky format to deal with and some of these tools will choke up on complex files (namespaces,switching character encoding, weird entities in the DTD etc..). Till P.S.: Alain has a good paper on how to compile queries (as done in cduce). I am just too lazy to look for it. On Mon, Sep 28, 2009 at 8:48 AM, Yaron Minsky wrote: > I don't have the code in front of me, but I've done something like this > using the list monad. i.e., using bind (=3D concat-map) and map chained > together, along with a couple operators I wrote for lifting bits of XML > documents into lists, by say returning the subnodes of the present node a= s a > list. > > It was quite effective. =C2=A0I got the inspiration from a similar tool w= e have > for navigating s-expressions, which we should release at some point... > > Yaron Minsky > > On Sep 28, 2009, at 8:17 AM, Richard Jones wrote: > >> >> I need to do some relatively simple extraction of fields from an XML >> document. =C2=A0In Perl I would use xpath, very specifically if $xml was= an >> XML document[1] stored as a string, then: >> >> =C2=A0 my $p =3D XML::XPath->new (xml =3D> $xml); >> =C2=A0 my @disks =3D $p->findnodes ('//devices/disk/source/@dev'); >> =C2=A0 push (@disks, $p->findnodes ('//devices/disk/source/@file')); >> >> This isn't type safe or pretty, but it is very easy to use for quick >> and dirty extraction. >> >> What is the OCaml equivalent for this sort of code? >> >> Alain Frisch has a library called Xpath >> (http://alain.frisch.fr/soft.html#xpath), but unfortunately this >> relies on the now obsolete wlex program. >> >> Is there a completely alternative way to do this? =C2=A0Better still, in= 3 >> lines of code?? >> >> Rich. >> >> [1] for XML doc, see: http://libvirt.org/formatdomain.html >> >> -- >> Richard Jones >> Red Hat >> >> _______________________________________________ >> Caml-list mailing list. Subscription management: >> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list >> Archives: http://caml.inria.fr >> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners >> Bug reports: http://caml.inria.fr/bin/caml-bugs > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs >