ntg-context - mailing list for ConTeXt users
* question for the xml-experts
@ 2009-02-14 17:40 Thomas A. Schmitz
  2009-02-14 18:25 ` Wolfgang Schuster
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Thomas A. Schmitz @ 2009-02-14 17:40 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Hi all,

this is not a question about direct technical details, but more of a  
conceptual problem, and I would love to have your input and ideas on  
this. I will be editing several edited volumes in my field  
(humanities, classics). From experience, I know that it's impossible  
to make scholars in the humanities adhere to standards. Each and every  
one of them will turn in a paper (most of them written in half a dozen  
different versions of Word) with its own idiosyncrasies. At my last
conference, I asked them to please use Unicode for their Greek  
passages, and I got blank looks and the question "What the hell is  
Unicode?"

So: I want to extract the content of these papers and process it with  
ConTeXt. I thought the easiest route might be to convert them to
OpenOffice odt and then use the content.xml as a starting point. Since  
the formatting will be unusable anyways, it doesn't make sense to  
process the odt directly; instead, I want to transform the xml via  
xslt to a simplified format and then process that with ConTeXt. I have  
just discovered the tool xalan ( http://xml.apache.org/xalan-c/index.html 
  ) which allows me to use an xslt style sheet and direct the output  
to a new file. I will then need to clean up these xml files and write  
a mkiv xml setup for them.
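
An .odt file is a zip archive, and content.xml is one of its members, so
the extraction step can be scripted. The following is only a minimal
sketch using Python's standard library; the helper names
(read_content_xml, make_demo_odt) and the tiny in-memory archive are
assumptions made for illustration, not part of any existing tool.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def read_content_xml(odt_file):
    """Return the parsed root element of content.xml inside an .odt archive.

    odt_file may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(odt_file) as archive:
        with archive.open("content.xml") as stream:
            return ET.parse(stream).getroot()


def make_demo_odt():
    """Build a tiny in-memory stand-in for an .odt file (illustration only)."""
    content = (
        '<office:document-content xmlns:office="%s" xmlns:text="%s">'
        "<office:body><office:text>"
        "<text:p>Hello</text:p>"
        "</office:text></office:body>"
        "</office:document-content>" % (OFFICE_NS, TEXT_NS)
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content)
    buffer.seek(0)
    return buffer


root = read_content_xml(make_demo_odt())
# Collect the text of every text:p paragraph in document order.
paragraphs = [p.text for p in root.iter("{%s}p" % TEXT_NS)]
print(paragraphs)
```

For a real paper, read_content_xml would be called with the path of the
converted .odt file instead of the demo archive.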

So for those who know much more about this sort of workflow: does that  
make sense? Is there any better way to achieve these results, i.e.,  
have the content of a couple of papers in Word and/or rtf format and  
typeset it in a consistent ConTeXt environment? Is there any tool
better than xslt to convert the OpenOffice xml (anything in
lua that can parse xml)? Anything better than xalan to convert xml ->
xml? I'm just beginning to plan this, so I'd be most grateful for any  
pointers.

Thanks for reading this long message, all best

Thomas
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________



* Re: question for the xml-experts
  2009-02-14 17:40 question for the xml-experts Thomas A. Schmitz
@ 2009-02-14 18:25 ` Wolfgang Schuster
  2009-02-14 18:37   ` Thomas A. Schmitz
  2009-02-15  9:39   ` luigi scarso
  2009-02-14 18:31 ` Patrick Gundlach
  2009-02-15 10:14 ` Khaled Hosny
  2 siblings, 2 replies; 17+ messages in thread
From: Wolfgang Schuster @ 2009-02-14 18:25 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Hi Thomas,

Why don't you take a look at the OpenOffice export function? I saw that
it's possible to convert a document to xhtml, and this could be a start
for you.

Wolfgang


* Re: question for the xml-experts
  2009-02-14 17:40 question for the xml-experts Thomas A. Schmitz
  2009-02-14 18:25 ` Wolfgang Schuster
@ 2009-02-14 18:31 ` Patrick Gundlach
  2009-02-14 19:06   ` Thomas A. Schmitz
  2009-02-15 10:14 ` Khaled Hosny
  2 siblings, 1 reply; 17+ messages in thread
From: Patrick Gundlach @ 2009-02-14 18:31 UTC (permalink / raw)
  To: ntg-context

Hi Thomas,

> process the odt directly; instead, I want to transform the xml via
> xslt to a simplified format and then process that with ConTeXt. I have
> just discovered the tool xalan (
> http://xml.apache.org/xalan-c/index.html ) which allows me to use an
> xslt style sheet and direct the output  to a new file. I will then
> need to clean up these xml files and write  a mkiv xml setup for them.
>
> So for those who know much more about this sort of workflow: does that
> make sense?

Yes, it does. At my company we clean up (and reorganize) XML data with
XSLT all the time. We are happy users of saxon 9
(http://saxon.sourceforge.net/) which is an xslt 2.0 engine. Learning
XSLT is not trivial (but not too hard either), but once you get an
understanding of it nobody can stop you using XSLT for 'everything'.


Patrick
-- 
ConTeXt wiki and more: http://contextgarden.net

* Re: question for the xml-experts
  2009-02-14 18:25 ` Wolfgang Schuster
@ 2009-02-14 18:37   ` Thomas A. Schmitz
  2009-02-15  9:39   ` luigi scarso
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas A. Schmitz @ 2009-02-14 18:37 UTC (permalink / raw)
  To: mailing list for ConTeXt users


On Feb 14, 2009, at 7:25 PM, Wolfgang Schuster wrote:

> Hi Thomas,
>
> why don't you take a look at the OpenOffice export function, I saw  
> it's
> possible to convert a document to xhtml and this could be a start  
> for you.
>
> Wolfgang

Hi Wolfgang,

thanks for the suggestion! I had, in fact, tried the export functions  
(docbook and xhtml), but both drop too much formatting: all italics
etc. are silently dropped, and dynamic references are replaced with
their values. So unless I can manage to hack the export xslt files,  
this doesn't seem possible.

All best

Thomas

* Re: question for the xml-experts
  2009-02-14 18:31 ` Patrick Gundlach
@ 2009-02-14 19:06   ` Thomas A. Schmitz
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas A. Schmitz @ 2009-02-14 19:06 UTC (permalink / raw)
  To: mailing list for ConTeXt users


On Feb 14, 2009, at 7:31 PM, Patrick Gundlach wrote:

> Yes, it does. At my company we clean up (and reorganize) XML data with
> XSLT all the time. We are happy users of saxon 9
> (http://saxon.sourceforge.net/) which is an xslt 2.0 engine. Learning
> XSLT is not trivial (but not too hard either), but once you get an
> understanding of it nobody can stop you using XSLT for 'everything'.
>
>
> Patrick

Great, I will look into saxon and xslt!

Best

Thomas

* Re: question for the xml-experts
  2009-02-14 18:25 ` Wolfgang Schuster
  2009-02-14 18:37   ` Thomas A. Schmitz
@ 2009-02-15  9:39   ` luigi scarso
  2009-02-15 17:17     ` Thomas A. Schmitz
  1 sibling, 1 reply; 17+ messages in thread
From: luigi scarso @ 2009-02-15  9:39 UTC (permalink / raw)
  To: mailing list for ConTeXt users

If you know python
http://wiki.services.openoffice.org/wiki/PyUNO_bridge
http://opendocumentfellowship.com/projects/odfpy
For xml the choice is
http://codespeak.net/lxml/

A native xml db, with XQuery and python binding
http://www.oracle.com/technology/products/berkeley-db/xml/index.html



And this is my experience:
I program in TeX (with ConTeXt), lua / python (they are
similar) and xslt.
For every project, if I can, I use lxml to manage xml sources, because
it includes xslt but not vice versa.
The goal is to translate xml into tex in the quickest way, and let mkiv
do the hard work.
I don't have a good feeling about xslt, because it is not as powerful as
lxml, and it is clearly not a competitor of TeX.

If I need storage, dbxml is good, and XQuery+lxml is powerful enough.

OO also has a docbook exporter
http://www.docbook.org/
docbook is rich and comes with a good collection of xsl stylesheets to
translate xml to html, but maybe it is ... too much.

-- 
luigi

* Re: question for the xml-experts
  2009-02-14 17:40 question for the xml-experts Thomas A. Schmitz
  2009-02-14 18:25 ` Wolfgang Schuster
  2009-02-14 18:31 ` Patrick Gundlach
@ 2009-02-15 10:14 ` Khaled Hosny
  2 siblings, 0 replies; 17+ messages in thread
From: Khaled Hosny @ 2009-02-15 10:14 UTC (permalink / raw)
  To: mailing list for ConTeXt users



You may consider giving dbcontext a look; it is written in python and
seems to use xsl to translate DocBook's xml into TeX files to be typeset
by ConTeXt. http://dblatex.sourceforge.net/doc/pt02.html

Regards,
 Khaled


-- 
 Khaled Hosny
 Arabic localizer and member of Arabeyes.org team


* Re: question for the xml-experts
  2009-02-15  9:39   ` luigi scarso
@ 2009-02-15 17:17     ` Thomas A. Schmitz
  2009-02-17 22:07       ` luigi scarso
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas A. Schmitz @ 2009-02-15 17:17 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Luigi and Khaled,

thanks a lot for your replies! Luigi: I had a look at python lxml; it  
looks very powerful and interesting, and I will try and see if I can
make use of it. Why do you translate your xml sources into tex instead  
of using the mkiv mechanism for processing xml, is it because of speed?

Khaled: I have to see if I can tweak the OpenOffice docbook converter  
to keep more of the formatting; in its default state, it drops too  
much important stuff...

Right now, I have followed Patrick's advice. I've installed saxon9 and  
am writing an xslt stylesheet to translate the openoffice xml into a
cleaner and easier to handle format. I'm making progress... Maybe we  
should put something like this on the wiki and make it a collaborative  
effort - I can only write rules for stuff that occurs in my documents,  
and that is of course only a subset of what OpenOffice has, so it  
would be good to add rules as people find interesting features.
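
One reason such rules are tied to each document: OpenOffice generates
automatic style names such as "T3" per file, so the same name can mean
italic in one paper and something else in the next. A rough sketch of
looking up which automatic text styles are italic, using only Python's
standard library, might look like this. The sample XML fragment and the
function name italic_style_names are assumptions for illustration; a
real content.xml carries far more styles and properties.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}

# Hypothetical excerpt of the automatic-styles section of a content.xml.
SAMPLE = """\
<office:automatic-styles
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <style:style style:name="T1" style:family="text">
    <style:text-properties fo:font-weight="bold"/>
  </style:style>
  <style:style style:name="T3" style:family="text">
    <style:text-properties fo:font-style="italic"/>
  </style:style>
</office:automatic-styles>
"""


def italic_style_names(styles_root):
    """Return the names of all styles whose text properties are italic."""
    names = set()
    for style in styles_root.iter("{%s}style" % NS["style"]):
        props = style.find("style:text-properties", NS)
        if props is None:
            continue
        if props.get("{%s}font-style" % NS["fo"]) == "italic":
            names.add(style.get("{%s}name" % NS["style"]))
    return names


italic = italic_style_names(ET.fromstring(SAMPLE))
print(sorted(italic))
```

The resulting set could then drive a rule such as "turn every span with
one of these styles into an emphasis element".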

All best

Thomas



* Re: question for the xml-experts
  2009-02-15 17:17     ` Thomas A. Schmitz
@ 2009-02-17 22:07       ` luigi scarso
  2009-02-19  8:54         ` Thomas A. Schmitz
  0 siblings, 1 reply; 17+ messages in thread
From: luigi scarso @ 2009-02-17 22:07 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On Sun, Feb 15, 2009 at 6:17 PM, Thomas A. Schmitz
<thomas.schmitz@uni-bonn.de> wrote:
> Luigi and Khaled,
>
> thanks a lot for your replies! Luigi: I had a look at python lxml; it looks
> very powerful and interesting, and I will try and see if can make use of it.
> Why do you translate your xml sources into tex instead of using the mkiv
> mechanism for processing xml, is it because of speed?
(sorry for my laziness)
If I have good xml, then mkiv is a good choice. As far as I know, mkiv
~ xslt by lpeg, so the
"traditional"
xml--( xslt )-->tex--( mkiv )-->pdf
is like
xml-->( mkiv )-->pdf
Note that in the last chain one mixes xml+tex: if the xml becomes complex,
this can end in a messy situation.




But some documents need heavy preprocessing:
for example, I have one that comes from java class serialization,
and I need the power of python (lxml) to do a clean job.
Also, if the xml changes, I've found that lxml is more flexible than xslt.
In this case I have
xml--( lxml )-->tex--( mkiv )-->pdf
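
As a rough illustration of the xml-to-tex step in such a chain, here is
a small sketch. It uses Python's standard-library
xml.etree.ElementTree, whose API lxml.etree largely mirrors. The
element names (doc, title, p, emph) and the WRAPPERS table are
assumptions for illustration only, and the sketch does not escape
characters that are special in TeX.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from element names to the ConTeXt code that
# should surround their content.
WRAPPERS = {
    "emph": ("{\\em ", "}"),
    "title": ("\\section{", "}\n"),
    "p": ("", "\n\n"),
}


def to_tex(element):
    """Recursively serialize an element tree as ConTeXt source."""
    before, after = WRAPPERS.get(element.tag, ("", ""))
    parts = [before, element.text or ""]
    for child in element:
        parts.append(to_tex(child))
        # The tail is the text that follows the child inside its parent.
        parts.append(child.tail or "")
    parts.append(after)
    return "".join(parts)


doc = ET.fromstring(
    "<doc><title>Intro</title><p>A <emph>fine</emph> day.</p></doc>"
)
tex = to_tex(doc)
print(tex)
```

Elements without an entry in the table are passed through as plain
text, so unknown markup is dropped rather than raising an error.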

The fact is that python and lua are not so different,
so I have to manage two languages
(python+lua) and tex;
with the 'traditional' workflow you have to manage 3 languages,
xslt, lua and tex,
and dividing responsibilities is not as easy as in the former.

BTW, I have no test that says "this one is quicker than that one".

-- 
luigi

* Re: question for the xml-experts
  2009-02-17 22:07       ` luigi scarso
@ 2009-02-19  8:54         ` Thomas A. Schmitz
  2009-02-19  9:24           ` luigi scarso
                             ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Thomas A. Schmitz @ 2009-02-19  8:54 UTC (permalink / raw)
  To: mailing list for ConTeXt users


On Feb 17, 2009, at 11:07 PM, luigi scarso wrote:

> (sorry x my laziness)
> If I have a good xml , then mkiv is a good choice. As far I know, mkiv
> ~ xslt by lpeg, so
> "traditional"
> xml--( xslt )-->tex--( mkiv )-->pdf
> is  like
> xml-->( mkiv )-->pdf
> Note that in the last chain one mixes xml+tex: if xml become complex,
> this can end in a messy situation.
>
>
Yes, you're right of course. I have a similar situation here: the xml  
produced by ooo is too messy, so I want to preprocess it to something  
that is easier to maintain and modify (e.g., I will, at some point,  
add index entries and a TOC); that's why I use xslt here. But I still  
produce xml which I process with mkiv.

> But some  documents  need heavy preprocessing:
> for example, I have one that come from  java classes serialization,
> and I need the power of python (lxml) to do a clean work .
> Also, if xml changes , I 've found that lxml is more flexible than  
> xslt.
> In this case I have
> xml--( lxml )-->tex--( mkiv )-->pdf
>
> The fact is that python and lua are not so differents,
> so I've to manage two languages
> (python+lua) and tex;
> with 'traditional' workflow you have to manage 3 languages
> xslt,lua and tex
> and subdivide responsability is not so easy as the former .

Interesting. I have tried to play around with python-lxml, but am  
having some problems to understand it. Just to give me an idea: how  
would you transform this:

<text:span text:style-name="T3">foo</text:span>

to this

<emph>foo</emph>

with lxml? lxml seems to object to the ":" in the tag, even though  
it's declared in the document.

Thomas


* Re: question for the xml-experts
  2009-02-19  8:54         ` Thomas A. Schmitz
@ 2009-02-19  9:24           ` luigi scarso
  2009-02-19 10:39           ` luigi scarso
  2009-02-19 17:02           ` luigi scarso
  2 siblings, 0 replies; 17+ messages in thread
From: luigi scarso @ 2009-02-19  9:24 UTC (permalink / raw)
  To: mailing list for ConTeXt users

> Yes, you're right of course.
> I have a similar situation here: the xml
> produced by ooo is too messy, so I want to preprocess it to something that
> is easier to maintain and modify (e.g., I will, at some point, add index
> entries and a TOC); that's why I use xslt here. But I still produce xml
> which I process with mkiv.
so you have
xml --( xslt )-->xml--( mkiv ) --> pdf
where the second xml is not normative, while the first is.

In your situation I prefer
xml --( xslt )-->tex--( mkiv ) --> pdf
because there is not much difference between the stylesheets for
xml --( xslt )-->xml
and
xml --( xslt )-->tex
and there is a clear distinction of roles: xml carries the semantics,
tex the presentation.


This chain
xml --( xslt )-->xml--( mkiv ) --> pdf
can be reasonable
if the first xml comes out of a db extraction
(you must be quick and make the correct queries, so this xml is
typically in a row-major fashion, i.e. like a table),
and the second xml is book-oriented and simple.



BTW
"always choose whatever is right for your needs"

> Just to give me an idea: how would you
> transform this:
>
> <text:span text:style-name="T3">foo</text:span>
>
> to this
>
> <emph>foo</emph>
>
> with lxml? lxml seems to object to the ":" in the tag, even though it's
> declared in the document.
I will give it a look

-- 
luigi

* Re: question for the xml-experts
  2009-02-19  8:54         ` Thomas A. Schmitz
  2009-02-19  9:24           ` luigi scarso
@ 2009-02-19 10:39           ` luigi scarso
  2009-02-19 11:53             ` Thomas A. Schmitz
  2009-02-19 17:02           ` luigi scarso
  2 siblings, 1 reply; 17+ messages in thread
From: luigi scarso @ 2009-02-19 10:39 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On Thu, Feb 19, 2009 at 9:54 AM, Thomas A. Schmitz
<thomas.schmitz@uni-bonn.de> wrote:
> Interesting. I have tried to play around with python-lxml, but am having
> some problems to understand it. Just to give me an idea: how would you
> transform this:
>
> <text:span text:style-name="T3">foo</text:span>
>
> to this
>
> <emph>foo</emph>
>
> with lxml? lxml seems to object to the ":" in the tag, even though it's
> declared in the document.
>
> Thomas

t.xml:
<foo xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
<text:span  text:style-name="T3">foo</text:span>
</foo>


# python
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> tree = etree.parse(file('t.xml'))
>>> foo = tree.getroot()
>>> foo.tag
'foo'
>>>
>>> [child.tag for child in foo.iterdescendants() ]
['{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span']
>>> print foo.iterdescendants.__doc__
iterdescendants(self, tag=None)

        Iterate over the descendants of this element in document order.

        As opposed to ``el.iter()``, this iterator does not yield the element
        itself.  The generated elements can be restricted to a specific tag
        name with the 'tag' keyword.

>>>
>>> FOO = etree.Element("FOO")
>>> emph =  etree.Element("emph")
>>> [child.tag for child in foo.iterdescendants(tag = '{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span' ) ]
['{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span']
>>> span = [child for child in foo.iterdescendants(tag = '{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span' ) ][0]
>>> emph.text = span.text
>>> FOO.append(emph)
>>> etree.tostring(FOO)
'<FOO><emph>foo</emph></FOO>'
>>>
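
The session above builds a new tree and copies only the text of the
span. A variant that rewrites the span in place, so that the
surrounding text is kept, could look like the sketch below. It is
written for Python 3 and uses the standard-library
xml.etree.ElementTree instead of lxml (the {namespace}tag notation is
the same in both). The function name spans_to_emph and the idea of
passing a set of emphasis style names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN = "{%s}span" % TEXT_NS
STYLE_NAME = "{%s}style-name" % TEXT_NS

source = (
    '<foo xmlns:text="%s">'
    'before <text:span text:style-name="T3">foo</text:span> after'
    "</foo>" % TEXT_NS
)


def spans_to_emph(root, emphasis_styles):
    """Rename text:span elements with an emphasis style to <emph>."""
    # list() freezes the matches before any tags are changed.
    for span in list(root.iter(SPAN)):
        if span.get(STYLE_NAME) in emphasis_styles:
            span.tag = "emph"
            span.attrib.clear()
    return root


root = spans_to_emph(ET.fromstring(source), {"T3"})
result = ET.tostring(root, encoding="unicode")
print(result)
```

Because only the tag and attributes change, the text before and after
the span stays where it was in the parent element.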


http://codespeak.net/lxml/tutorial.html
http://codespeak.net/lxml/api.html


-- 
luigi

* Re: question for the xml-experts
  2009-02-19 10:39           ` luigi scarso
@ 2009-02-19 11:53             ` Thomas A. Schmitz
  2009-02-19 14:10               ` luigi scarso
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas A. Schmitz @ 2009-02-19 11:53 UTC (permalink / raw)
  To: mailing list for ConTeXt users


On Feb 19, 2009, at 11:39 AM, luigi scarso wrote:

> >>> FOO = etree.Element("FOO")
> >>> emph =  etree.Element("emph")
> >>> [child.tag for child in foo.iterdescendants(tag = '{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span' ) ]
> ['{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span']
> >>> span = [child for child in foo.iterdescendants(tag = '{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span' ) ][0]
> >>> emph.text = span.text
> >>> FOO.append(emph)
> >>> etree.tostring(FOO)
> '<FOO><emph>foo</emph></FOO>'
>

Excuse me for being dense: you mean all namespaces have to be  
explicitly expanded?

Thomas

* Re: question for the xml-experts
  2009-02-19 11:53             ` Thomas A. Schmitz
@ 2009-02-19 14:10               ` luigi scarso
  2009-02-20 15:09                 ` Thomas A. Schmitz
  0 siblings, 1 reply; 17+ messages in thread
From: luigi scarso @ 2009-02-19 14:10 UTC
  To: mailing list for ConTeXt users

On Thu, Feb 19, 2009 at 12:53 PM, Thomas A. Schmitz
<thomas.schmitz@uni-bonn.de> wrote:
>
> On Feb 19, 2009, at 11:39 AM, luigi scarso wrote:
>
>> >>> FOO = etree.Element("FOO")
>> >>> emph = etree.Element("emph")
>> >>> [child.tag for child in foo.iterdescendants(tag='{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span')]
>> ['{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span']
>> >>> span = [child for child in foo.iterdescendants(tag='{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span')][0]
>> >>> emph.text = span.text
>> >>> FOO.append(emph)
>> >>> etree.tostring(FOO)
>> '<FOO><emph>foo</emph></FOO>'
>
> Excuse me for being dense: you mean all namespaces have to be explicitly
> expanded?
See the namespaces section of the lxml tutorial:
http://codespeak.net/lxml/tutorial.html#namespaces
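
As a rough illustration of the namespace question: a tag can be written in the fully expanded "{uri}localname" form used in the session above, or with a prefix that is resolved through a prefix-to-URI mapping. The sketch below is only a minimal example under some assumptions: the XML fragment is a made-up stand-in for a real content.xml, and `iter()` is used instead of `iterdescendants()` so that the same code also runs with the standard library's ElementTree if lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # fall back to the API-compatible standard library module
    import xml.etree.ElementTree as etree

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical fragment standing in for an OpenOffice content.xml
SAMPLE = (
    '<text:p xmlns:text="%s">plain <text:span>foo</text:span></text:p>' % TEXT_NS
)

root = etree.fromstring(SAMPLE)

# 1. Fully expanded notation: {namespace-uri}localname
expanded = [el.text for el in root.iter("{%s}span" % TEXT_NS)]

# 2. Prefix notation; the prefix is looked up in the mapping passed in,
#    so it does not have to match the prefix used in the file itself
prefixed = [
    el.text for el in root.findall(".//t:span", namespaces={"t": TEXT_NS})
]

print(expanded, prefixed)
```

Both lookups find the same element; the prefix form is usually easier to read once several namespaces are involved.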




-- 
luigi

* Re: question for the xml-experts
  2009-02-19  8:54         ` Thomas A. Schmitz
  2009-02-19  9:24           ` luigi scarso
  2009-02-19 10:39           ` luigi scarso
@ 2009-02-19 17:02           ` luigi scarso
  2 siblings, 0 replies; 17+ messages in thread
From: luigi scarso @ 2009-02-19 17:02 UTC
  To: mailing list for ConTeXt users

> Yes, you're right of course. I have a similar situation here: the xml
> produced by ooo is too messy, so I want to preprocess it to something that
> is easier to maintain and modify (e.g., I will, at some point, add index
> entries and a TOC); that's why I use xslt here. But I still produce xml
> which I process with mkiv.
See also Writer2LaTeX:
http://www.hj-gym.dk/~hj/writer2latex/


-- 
luigi

* Re: question for the xml-experts
  2009-02-19 14:10               ` luigi scarso
@ 2009-02-20 15:09                 ` Thomas A. Schmitz
  2009-02-20 15:35                   ` luigi scarso
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas A. Schmitz @ 2009-02-20 15:09 UTC
  To: mailing list for ConTeXt users


On Feb 19, 2009, at 3:10 PM, luigi scarso wrote:

> see
> http://codespeak.net/lxml/tutorial.html#namespaces

Luigi,

thanks so much for your patient replies. I have now begun to play with
Python's lxml. It offers a lot, maybe too much for a beginner. One
advantage I see for my immediate needs is that it lets me use
Python's regular expressions and control structures, so the code may be
easier to maintain and adapt than the rather clumsy xslt syntax; it may
be a big help for the rather messy OpenOffice xml that I want to
process.
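
A small sketch of what that combination might look like: a regular expression tidies the text and an ordinary `if` decides which simplified element to emit. Everything specific here is an assumption made for illustration: the sample fragment, the output element names `p`, `emph` and `plain`, and the guess that automatic styles named T1, T2, ... mark emphasis.

```python
import re

try:
    from lxml import etree
except ImportError:  # fall back to the API-compatible standard library module
    import xml.etree.ElementTree as etree

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical fragment standing in for messy OpenOffice output
SAMPLE = (
    '<text:p xmlns:text="%s">'
    '<text:span text:style-name="T1">foo   bar</text:span>'
    '<text:span text:style-name="P3">baz</text:span>'
    '</text:p>' % TEXT_NS
)

# Assumption: style names of the form T<number> mean "emphasis" in this sample
EMPH_STYLE = re.compile(r"^T\d+$")


def simplify(paragraph):
    """Build a simplified <p> from the text:span children of a paragraph."""
    out = etree.Element("p")
    for span in paragraph.iter("{%s}span" % TEXT_NS):
        style = span.get("{%s}style-name" % TEXT_NS, "")
        # Collapse runs of whitespace left over from the word processor
        text = re.sub(r"\s+", " ", span.text or "").strip()
        tag = "emph" if EMPH_STYLE.match(style) else "plain"
        etree.SubElement(out, tag).text = text
    return out


result = etree.tostring(simplify(etree.fromstring(SAMPLE)), encoding="unicode")
print(result)
```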

I had already tried writer2latex a while ago. I found it very limited and
poorly documented, so I did not pursue that route.

Again, thanks for getting me started!

Thomas

* Re: question for the xml-experts
  2009-02-20 15:09                 ` Thomas A. Schmitz
@ 2009-02-20 15:35                   ` luigi scarso
  0 siblings, 0 replies; 17+ messages in thread
From: luigi scarso @ 2009-02-20 15:35 UTC
  To: mailing list for ConTeXt users

On Fri, Feb 20, 2009 at 4:09 PM, Thomas A. Schmitz
<thomas.schmitz@uni-bonn.de> wrote:
>
> On Feb 19, 2009, at 3:10 PM, luigi scarso wrote:
>
>> see
>> http://codespeak.net/lxml/tutorial.html#namespaces
>
> Luigi,
>
> thanks so much for your patient replies. I have now begun to play with
> Python's lxml. It offers a lot, maybe too much for a beginner. One advantage
> I see for my immediate needs is that it lets me use Python's regular
> expressions and control structures, so the code may be easier to maintain
> and adapt than the rather clumsy xslt syntax; it may be a big help for the
> rather messy OpenOffice xml that I want to process.

Also, the namespace URIs can be defined once and reused:


Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> URI_OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
URI_STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
URI_TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
URI_TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
URI_DRAW = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
URI_FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
URI_XLINK = "http://www.w3.org/1999/xlink"
URI_DC = "http://purl.org/dc/elements/1.1/"
URI_META = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
URI_NUMBER = "urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
URI_PRESENTATION = "urn:oasis:names:tc:opendocument:xmlns:presentation:1.0"
URI_SVG = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
URI_CHART = "urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
URI_DR3D = "urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
URI_MATH = "http://www.w3.org/1998/Math/MathML"
URI_FORM = "urn:oasis:names:tc:opendocument:xmlns:form:1.0"
URI_SCRIPT = "urn:oasis:names:tc:opendocument:xmlns:script:1.0"
URI_OOO = "http://openoffice.org/2004/office"
URI_OOOW = "http://openoffice.org/2004/writer"
URI_OOOC = "http://openoffice.org/2004/calc"
URI_DOM = "http://www.w3.org/2001/xml-events"
URI_XFORMS = "http://www.w3.org/2002/xforms"
URI_XSD = "http://www.w3.org/2001/XMLSchema"
URI_XSI = "http://www.w3.org/2001/XMLSchema-instance"
URI_FIELD = "urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:field:1.0"

>>> NSMAP_OO = {
    "office":       URI_OFFICE,
    "style":        URI_STYLE,
    "text":         URI_TEXT,
    "table":        URI_TABLE,
    "draw":         URI_DRAW,
    "fo":           URI_FO,
    "xlink":        URI_XLINK,
    "dc":           URI_DC,
    "meta":         URI_META,
    "number":       URI_NUMBER,
    "presentation": URI_PRESENTATION,
    "svg":          URI_SVG,
    "chart":        URI_CHART,
    "dr3d":         URI_DR3D,
    "math":         URI_MATH,
    "form":         URI_FORM,
    "script":       URI_SCRIPT,
    "ooo":          URI_OOO,
    "ooow":         URI_OOOW,
    "oooc":         URI_OOOC,
    "dom":          URI_DOM,
    "xforms":       URI_XFORMS,
    "xsd":          URI_XSD,
    "xsi":          URI_XSI,
    "field":        URI_FIELD,
}


>>> from lxml import etree
>>> tree = etree.parse('t.xml')
>>> foo = tree.getroot()
>>> [child.tag for child in foo.iterdescendants(tag='{%s}span' % URI_TEXT)]
['{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span']
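
The prefix map is defined above but not yet used in the session. One possible way to put it to work, sketched here under assumptions, is a small helper that expands a prefix and local name into the "{uri}localname" form. The helper name `qname`, the two-entry map and the sample fragment are made up for this example; the code falls back to the standard library's ElementTree if lxml is missing, which is why it uses `iter()` rather than `iterdescendants()`.

```python
try:
    from lxml import etree
except ImportError:  # fall back to the API-compatible standard library module
    import xml.etree.ElementTree as etree

# Two entries from the larger map, enough for this sketch
NSMAP_OO = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def qname(prefix, local, nsmap=NSMAP_OO):
    """Expand ('text', 'span') to '{urn:...:text:1.0}span' (hypothetical helper)."""
    return "{%s}%s" % (nsmap[prefix], local)


# Hypothetical fragment standing in for t.xml
SAMPLE = (
    '<office:text xmlns:office="%(office)s" xmlns:text="%(text)s">'
    '<text:p>a <text:span>foo</text:span></text:p>'
    '</office:text>' % NSMAP_OO
)

foo = etree.fromstring(SAMPLE)

# Same query as in the session, without spelling out the URI
tags = [child.tag for child in foo.iter(qname("text", "span"))]

# Plain text of every paragraph, with nested spans flattened
paragraphs = ["".join(p.itertext()) for p in foo.iter(qname("text", "p"))]

print(tags, paragraphs)
```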





Have a look at odfpy, too:
http://opendocumentfellowship.com/projects/odfpy




-- 
luigi

end of thread (last message: 2009-02-20 15:35 UTC)

Thread overview: 17+ messages
2009-02-14 17:40 question for the xml-experts Thomas A. Schmitz
2009-02-14 18:25 ` Wolfgang Schuster
2009-02-14 18:37   ` Thomas A. Schmitz
2009-02-15  9:39   ` luigi scarso
2009-02-15 17:17     ` Thomas A. Schmitz
2009-02-17 22:07       ` luigi scarso
2009-02-19  8:54         ` Thomas A. Schmitz
2009-02-19  9:24           ` luigi scarso
2009-02-19 10:39           ` luigi scarso
2009-02-19 11:53             ` Thomas A. Schmitz
2009-02-19 14:10               ` luigi scarso
2009-02-20 15:09                 ` Thomas A. Schmitz
2009-02-20 15:35                   ` luigi scarso
2009-02-19 17:02           ` luigi scarso
2009-02-14 18:31 ` Patrick Gundlach
2009-02-14 19:06   ` Thomas A. Schmitz
2009-02-15 10:14 ` Khaled Hosny
