Re: question for the xml-experts

From: luigi scarso <luigi.scarso@gmail.com>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: question for the xml-experts
Date: Thu, 19 Feb 2009 11:39:57 +0100
Message-ID: <fe8d59da0902190239h4596a64fq686e0832560e3691@mail.gmail.com>
In-Reply-To: <E4D75231-7998-4A77-9716-CB691CB7F5B9@uni-bonn.de>

On Thu, Feb 19, 2009 at 9:54 AM, Thomas A. Schmitz
<thomas.schmitz@uni-bonn.de> wrote:
>
> On Feb 17, 2009, at 11:07 PM, luigi scarso wrote:
>
>> (sorry for my laziness)
>> If I have good xml, then mkiv is a good choice. As far as I know, mkiv
>> ~ xslt by lpeg, so the
>> "traditional"
>> xml--( xslt )-->tex--( mkiv )-->pdf
>> is like
>> xml--( mkiv )-->pdf
>> Note that in the last chain one mixes xml+tex: if the xml becomes complex,
>> this can end in a messy situation.
>>
>>
> Yes, you're right of course. I have a similar situation here: the xml
> produced by ooo is too messy, so I want to preprocess it to something that
> is easier to maintain and modify (e.g., I will, at some point, add index
> entries and a TOC); that's why I use xslt here. But I still produce xml
> which I process with mkiv.
>
>> But some documents need heavy preprocessing:
>> for example, I have one that comes from java class serialization,
>> and I need the power of python (lxml) to do a clean job.
>> Also, if the xml changes, I've found that lxml is more flexible than xslt.
>> In this case I have
>> xml--( lxml )-->tex--( mkiv )-->pdf
>>
>> The fact is that python and lua are not so different,
>> so I have to manage two languages
>> (python+lua) and tex;
>> with the 'traditional' workflow you have to manage 3 languages
>> xslt, lua and tex
>> and dividing responsibilities is not as easy as in the former.
>
> Interesting. I have tried to play around with python-lxml, but am having
> some trouble understanding it. Just to give me an idea: how would you
> transform this:
>
> <text:span text:style-name="T3">foo</text:span>
>
> to this
>
> <emph>foo</emph>
>
> with lxml? lxml seems to object to the ":" in the tag, even though it's
> declared in the document.
>
> Thomas

t.xml:
<foo xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
<text:span  text:style-name="T3">foo</text:span>
</foo>


# python
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> tree = etree.parse('t.xml')
>>> foo = tree.getroot()
>>> foo.tag
'foo'
>>>
>>> [child.tag for child in foo.iterdescendants() ]
['{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span']
>>> print foo.iterdescendants.__doc__
iterdescendants(self, tag=None)

        Iterate over the descendants of this element in document order.

        As opposed to ``el.iter()``, this iterator does not yield the element
        itself.  The generated elements can be restricted to a specific tag
        name with the 'tag' keyword.

>>>
>>> FOO = etree.Element("FOO")
>>> emph =  etree.Element("emph")
>>> [child.tag for child in foo.iterdescendants(tag = '{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span' ) ]
['{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span']
>>> span = [child for child in foo.iterdescendants(tag = '{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span' ) ][0]
>>> emph.text = span.text
>>> FOO.append(emph)
>>> etree.tostring(FOO)
'<FOO><emph>foo</emph></FOO>'
>>>
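
The session above builds a new tree. For rewriting the span in place, which is
closer to what was asked, a minimal sketch could look like the one below. The
point to note is that lxml does not address elements by prefix: it uses the
'{namespace-uri}localname' form seen in the output above, so 'text:span' is
not accepted as a tag name. The helper name spans_to_emph and its style
parameter are made up for illustration, and the fallback to the standard
library ElementTree is only there so the sketch also runs without lxml.

```python
try:
    from lxml import etree
except ImportError:  # assumption: fall back to the stdlib, same {uri}name notation
    import xml.etree.ElementTree as etree

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

SAMPLE = b"""<foo xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
<text:span text:style-name="T3">foo</text:span>
</foo>"""


def spans_to_emph(root, style="T3"):
    """Rename every text:span with the given style to a plain <emph>.

    Hypothetical helper, not part of lxml.
    """
    span_tag = "{%s}span" % TEXT_NS
    style_attr = "{%s}style-name" % TEXT_NS
    # Materialise the matches first, since the tags are changed in the loop.
    for span in list(root.iter(span_tag)):
        if span.get(style_attr) == style:
            span.tag = "emph"      # no namespace on the new tag
            span.attrib.clear()    # drop text:style-name
    # lxml only: remove the now unused xmlns:text declaration.
    if hasattr(etree, "cleanup_namespaces"):
        etree.cleanup_namespaces(root)
    return root


root = spans_to_emph(etree.fromstring(SAMPLE))
result = etree.tostring(root, encoding="unicode")
print(result)
```

Text and tail of the span are kept because only the tag and the attributes
are touched; spans with another style-name are left alone.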


http://codespeak.net/lxml/tutorial.html
http://codespeak.net/lxml/api.html


-- 
luigi
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________