From: luigi scarso
Newsgroups: gmane.comp.tex.context
Subject: Re: question for the xml-experts
Date: Fri, 20 Feb 2009 16:35:31 +0100
To: mailing list for ConTeXt users

On Fri, Feb 20, 2009 at 4:09 PM, Thomas A. Schmitz wrote:
>
> On Feb 19, 2009, at 3:10 PM, luigi scarso wrote:
>
>> see
>> http://codespeak.net/lxml/tutorial.html#namespaces
>
> Luigi,
>
> thanks so much for your patient replies. I have now begun to play with
> python's lxml. It offers a lot, maybe too much for a beginner. One advantage
> for my immediate needs that I see is that it offers the possibility to use
> Python's regular expressions and control structures, so this may make coding
> easier to maintain and adapt than in the rather clumsy xslt syntax; it may
> be a big help for the rather messy OpenOffice xml that I want to process.

also

Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> URI_OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
>>> URI_STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
>>> URI_TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
>>> URI_TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
>>> URI_DRAW = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
>>> URI_FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
>>> URI_XLINK = "http://www.w3.org/1999/xlink"
>>> URI_DC = "http://purl.org/dc/elements/1.1/"
>>> URI_META = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
>>> URI_NUMBER = "urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
>>> URI_PRESENTATION = "urn:oasis:names:tc:opendocument:xmlns:presentation:1.0"
>>> URI_SVG = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
>>> URI_CHART = "urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
>>> URI_DR3D = "urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
>>> URI_MATH = "http://www.w3.org/1998/Math/MathML"
>>> URI_FORM = "urn:oasis:names:tc:opendocument:xmlns:form:1.0"
>>> URI_SCRIPT = "urn:oasis:names:tc:opendocument:xmlns:script:1.0"
>>> URI_OOO = "http://openoffice.org/2004/office"
>>> URI_OOOW = "http://openoffice.org/2004/writer"
>>> URI_OOOC = "http://openoffice.org/2004/calc"
>>> URI_DOM = "http://www.w3.org/2001/xml-events"
>>> URI_XFORMS = "http://www.w3.org/2002/xforms"
>>> URI_XSD = "http://www.w3.org/2001/XMLSchema"
>>> URI_XSI = "http://www.w3.org/2001/XMLSchema-instance"
>>> URI_FIELD = "urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:field:1.0"
>>> NSMAP_OO = {
...     "office" : URI_OFFICE,
...     "style" : URI_STYLE,
...     "text" : URI_TEXT,
...     "table" : URI_TABLE,
...     "draw" : URI_DRAW,
...     "fo" : URI_FO,
...     "xlink" : URI_XLINK,
...     "dc" : URI_DC,
...     "meta" : URI_META,
...     "number" : URI_NUMBER,
...     "presentation" : URI_PRESENTATION,
...     "svg" : URI_SVG,
...     "chart" : URI_CHART,
...     "dr3d" : URI_DR3D,
...     "math" : URI_MATH,
...     "form" : URI_FORM,
...     "script" : URI_SCRIPT,
...     "ooo" : URI_OOO,
...     "ooow" : URI_OOOW,
...     "oooc" : URI_OOOC,
...     "dom" : URI_DOM,
...     "xforms" : URI_XFORMS,
...     "xsd" : URI_XSD,
...     "xsi" : URI_XSI,
...     "field" : URI_FIELD,
... }
>>> from lxml import etree
>>> tree = etree.parse(file('t.xml'))
>>>
>>> foo = tree.getroot()
>>> [child.tag for child in foo.iterdescendants(tag = '{%s}span'%URI_TEXT ) ]
['{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span']

Have a look at
http://opendocumentfellowship.com/projects/odfpy
too.

-- 
luigi
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________
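
The session above defines NSMAP_OO but never uses it, and it relies on the
Python 2 builtin file(). What follows is only a rough sketch, in Python 3
syntax, of how the prefix map and Python's regular expressions might be
combined. SAMPLE is a made-up stand-in for t.xml, and the fallback to the
standard library's ElementTree is an assumption added here so that the sketch
also runs where lxml is not installed; both libraries accept the calls used.

```python
import re

try:
    from lxml import etree  # the library used in the session above
except ImportError:
    # Assumption: the standard library accepts the same calls used below.
    import xml.etree.ElementTree as etree

URI_OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
URI_TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Only the two prefixes this sketch needs; the full map is in the session.
NSMAP_OO = {"office": URI_OFFICE, "text": URI_TEXT}

# Hypothetical stand-in for t.xml (not the real OpenOffice file).
SAMPLE = (
    '<office:document-content xmlns:office="%s" xmlns:text="%s">'
    "<office:body><office:text>"
    "<text:p>Plain <text:span>first \t span</text:span> and "
    "<text:span>second span</text:span></text:p>"
    "</office:text></office:body>"
    "</office:document-content>" % (URI_OFFICE, URI_TEXT)
)

root = etree.fromstring(SAMPLE)

# Lookup by fully qualified tag name, as in the session above.
by_tag = [el.text for el in root.iter("{%s}span" % URI_TEXT)]

# The same lookup written with the "text:" prefix resolved through NSMAP_OO.
by_prefix = [el.text for el in root.findall(".//text:span", namespaces=NSMAP_OO)]

# A regular expression that collapses runs of whitespace in each span.
cleaned = [re.sub(r"\s+", " ", text).strip() for text in by_prefix]

print(cleaned)
```

The prefix form keeps the paths short once NSMAP_OO exists; the qualified tag
form needs no map at all. Which one is easier to maintain for messy
OpenOffice input is a matter of taste.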