* question for the xml-experts
From: Thomas A. Schmitz @ 2009-02-14 17:40 UTC
To: mailing list for ConTeXt users

Hi all,

this is not a question about direct technical details, but more of a conceptual problem, and I would love to have your input and ideas on this. I will be editing several edited volumes in my field (humanities, classics). From experience, I know that it's impossible to make scholars in the humanities adhere to standards. Each and every one of them will turn in a paper (most of them written in half a dozen different versions of Word) with its own idiosyncrasies. At my last conference, I asked them to please use Unicode for their Greek passages, and I got blank looks and the question "What the hell is Unicode?"

So: I want to extract the content of these papers and process it with ConTeXt. I thought the easiest route might be to convert them to OpenOffice odt and then use the content.xml as a starting point. Since the formatting will be unusable anyway, it doesn't make sense to process the odt directly; instead, I want to transform the xml via xslt to a simplified format and then process that with ConTeXt. I have just discovered the tool xalan ( http://xml.apache.org/xalan-c/index.html ), which allows me to use an xslt style sheet and direct the output to a new file. I will then need to clean up these xml files and write a mkiv xml setup for them.

So for those who know much more about this sort of workflow: does that make sense? Is there any better way to achieve these results, i.e., have the content of a couple of papers in Word and/or rtf format and typeset it in a consistent ConTeXt environment? Is there any tool better than xslt to convert the OpenOffice xml (anything in lua that can parse xml)? Anything better than xalan to convert xml -> xml? I'm just beginning to plan this, so I'd be most grateful for any pointers.

Thanks for reading this long message, all best

Thomas
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________
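Editorial sketch of the first step in the workflow described above, getting at content.xml: an .odt file is an ordinary zip archive, so Python's standard zipfile module can pull the member out. The helper names and the tiny in-memory sample are hypothetical, and the sample is only a stand-in, not a valid OpenDocument file.

```python
import io
import zipfile


def read_content_xml(odt_path):
    """Return the raw bytes of content.xml from an OpenDocument file.

    An .odt file is a zip archive; the document body lives in the
    member named 'content.xml'. `odt_path` may be a path or any
    file-like object that zipfile can read.
    """
    with zipfile.ZipFile(odt_path) as archive:
        return archive.read("content.xml")


def make_demo_odt():
    """Build a tiny in-memory stand-in for an .odt (hypothetical sample)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", "<doc>sample body</doc>")
    buffer.seek(0)
    return buffer


print(read_content_xml(make_demo_odt()).decode("utf-8"))
```

With a real paper one would pass the path of the converted file instead of the demo buffer; the bytes returned are what an xslt or Python clean-up step would then consume.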
* Re: question for the xml-experts
From: Wolfgang Schuster @ 2009-02-14 18:25 UTC
To: mailing list for ConTeXt users

Hi Thomas,

why don't you take a look at the OpenOffice export function, I saw it's possible to convert a document to xhtml and this could be a start for you.

Wolfgang
* Re: question for the xml-experts
From: Thomas A. Schmitz @ 2009-02-14 18:37 UTC
To: mailing list for ConTeXt users

On Feb 14, 2009, at 7:25 PM, Wolfgang Schuster wrote:

> Hi Thomas,
>
> why don't you take a look at the OpenOffice export function, I saw it's
> possible to convert a document to xhtml and this could be a start for you.
>
> Wolfgang

Hi Wolfgang,

thanks for the suggestion! I had, in fact, tried the export functions (docbook and xhtml), but both drop too much formatting: all italics etc. are silently dropped, and dynamic references are replaced with their values. So unless I can manage to hack the export xslt files, this doesn't seem possible.

All best

Thomas
* Re: question for the xml-experts
From: luigi scarso @ 2009-02-15 9:39 UTC
To: mailing list for ConTeXt users

If you know Python:
http://wiki.services.openoffice.org/wiki/PyUNO_bridge
http://opendocumentfellowship.com/projects/odfpy

For xml the choice is
http://codespeak.net/lxml/

A native xml db, with XQuery and Python bindings:
http://www.oracle.com/technology/products/berkeley-db/xml/index.html

And this is my experience: I program in TeX (with ConTeXt), lua / python (they are similar) and xslt. For every project, if I can, I use lxml to manage the xml sources, because it includes xslt but not vice versa. The goal is to translate xml into tex in the quickest way and let mkiv do the hard work. I don't have a good feeling about xslt, because it is not as powerful as lxml, and it is clearly not a competitor of TeX.

If I need storage, dbxml is good, and XQuery+lxml is powerful enough.

OO also has a docbook exporter:
http://www.docbook.org/
docbook is rich and comes with a good collection of xsl stylesheets to translate xml to html, but maybe it is ... too much.

--
luigi
* Re: question for the xml-experts
From: Thomas A. Schmitz @ 2009-02-15 17:17 UTC
To: mailing list for ConTeXt users

Luigi and Khaled,

thanks a lot for your replies! Luigi: I had a look at python lxml; it looks very powerful and interesting, and I will try and see if I can make use of it. Why do you translate your xml sources into tex instead of using the mkiv mechanism for processing xml? Is it because of speed? Khaled: I have to see if I can tweak the OpenOffice docbook converter to keep more of the formatting; in its default state, it drops too much important stuff...

Right now, I have followed Patrick's advice. I've installed saxon9 and am writing an xslt stylesheet to translate the OpenOffice xml into a cleaner and easier to handle format. I'm making progress... Maybe we should put something like this on the wiki and make it a collaborative effort. I can only write rules for stuff that occurs in my documents, and that is of course only a subset of what OpenOffice has, so it would be good to add rules as people find interesting features.

All best

Thomas
* Re: question for the xml-experts
From: luigi scarso @ 2009-02-17 22:07 UTC
To: mailing list for ConTeXt users

On Sun, Feb 15, 2009 at 6:17 PM, Thomas A. Schmitz <thomas.schmitz@uni-bonn.de> wrote:

> Luigi: I had a look at python lxml; it looks very powerful and interesting,
> and I will try and see if can make use of it. Why do you translate your xml
> sources into tex instead of using the mkiv mechanism for processing xml, is
> it because of speed?

(sorry for my laziness)

If I have a good xml, then mkiv is a good choice. As far as I know, mkiv ~ xslt by means of lpeg, so the "traditional" chain

xml--( xslt )-->tex--( mkiv )-->pdf

is like

xml-->( mkiv )-->pdf

Note that in the last chain one mixes xml+tex: if the xml becomes complex, this can end in a messy situation.

But some documents need heavy preprocessing: for example, I have one that comes from the serialization of java classes, and I need the power of python (lxml) to do a clean job. Also, if the xml changes, I have found that lxml is more flexible than xslt. In this case I have

xml--( lxml )-->tex--( mkiv )-->pdf

The fact is that python and lua are not so different, so I have to manage two languages (python+lua) and tex; with the "traditional" workflow you have to manage three languages (xslt, lua and tex), and dividing the responsibilities is not as easy as in the former case.

BTW, I have no test that says "this one is quicker than that one".

--
luigi
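Editorial sketch of the xml--( python )-->tex step in the chain above. It is not luigi's code: it uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra packages, the element names "para" and "emph" are hypothetical names for an already simplified format, and the escape table is deliberately incomplete.

```python
import xml.etree.ElementTree as ET

# A few characters that are special in TeX (deliberately incomplete).
TEX_ESCAPES = {"%": r"\%", "&": r"\&", "#": r"\#", "$": r"\$", "_": r"\_"}


def escape_tex(text):
    """Escape the characters listed in TEX_ESCAPES, leave the rest alone."""
    return "".join(TEX_ESCAPES.get(ch, ch) for ch in text)


def to_context(element):
    """Recursively turn a simplified XML tree into ConTeXt source."""
    inner = escape_tex(element.text or "")
    for child in element:
        inner += to_context(child)
        inner += escape_tex(child.tail or "")
    if element.tag == "emph":
        return "{\\em " + inner + "}"
    if element.tag == "para":
        return inner + "\n\n"  # a blank line ends the paragraph
    return inner  # unknown tags: keep the text, drop the markup


sample = "<doc><para>100% <emph>foo</emph> &amp; bar</para></doc>"
print(to_context(ET.fromstring(sample)))
```

The point of the design is the one made in the message: the XML keeps the semantics, and all presentation decisions (here, what "emph" looks like) live in one small mapping on the Python side.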
* Re: question for the xml-experts
From: Thomas A. Schmitz @ 2009-02-19 8:54 UTC
To: mailing list for ConTeXt users

On Feb 17, 2009, at 11:07 PM, luigi scarso wrote:

> (sorry x my laziness)
> If I have a good xml , then mkiv is a good choice. As far I know, mkiv
> ~ xslt by lpeg, so
> "traditional"
> xml--( xslt )-->tex--( mkiv )-->pdf
> is like
> xml-->( mkiv )-->pdf
> Note that in the last chain one mixes xml+tex: if xml become complex,
> this can end in a messy situation.

Yes, you're right of course. I have a similar situation here: the xml produced by ooo is too messy, so I want to preprocess it to something that is easier to maintain and modify (e.g., I will, at some point, add index entries and a TOC); that's why I use xslt here. But I still produce xml which I process with mkiv.

> But some documents need heavy preprocessing:
> for example, I have one that come from java classes serialization,
> and I need the power of python (lxml) to do a clean work .
> Also, if xml changes , I 've found that lxml is more flexible than xslt.
> In this case I have
> xml--( lxml )-->tex--( mkiv )-->pdf
>
> The fact is that python and lua are not so differents,
> so I've to manage two languages
> (python+lua) and tex;
> with 'traditional' workflow you have to manage 3 languages
> xslt,lua and tex
> and subdivide responsability is not so easy as the former .

Interesting. I have tried to play around with python-lxml, but am having some problems understanding it. Just to give me an idea: how would you transform this:

<text:span text:style-name="T3">foo</text:span>

to this

<emph>foo</emph>

with lxml? lxml seems to object to the ":" in the tag, even though it's declared in the document.

Thomas
* Re: question for the xml-experts
From: luigi scarso @ 2009-02-19 9:24 UTC
To: mailing list for ConTeXt users

> Yes, you're right of course.
> I have a similar situation here: the xml
> produced by ooo is too messy, so I want to preprocess it to something that
> is easier to maintain and modify (e.g., I will, at some point, add index
> entries and a TOC); that's why I use xslt here. But I still produce xml
> which I process with mkiv.

So you have

xml --( xslt )-->xml--( mkiv ) --> pdf

where the second xml is not normative, while the first is. In your situation I prefer

xml --( xslt )-->tex--( mkiv ) --> pdf

because there is not much difference between the stylesheets for

xml --( xslt )-->xml

and

xml --( xslt )-->tex

and there is a clear distinction of roles: xml carries the semantics, tex the presentation.

This chain

xml --( xslt )-->xml--( mkiv ) --> pdf

can be reasonable if the first xml comes out of a db extraction (you must be quick and make the correct queries, so this xml is typically in a row-major fashion, i.e. like a table), and the second xml is book-oriented and simple.

BTW, "always choose whatever is right for your needs".

> Just to give me an idea: how would you
> transform this:
>
> <text:span text:style-name="T3">foo</text:span>
>
> to this
>
> <emph>foo</emph>
>
> with lxml? lxml seems to object to the ":" in the tag, even though it's
> declared in the document.

I will have a look at it.

--
luigi
* Re: question for the xml-experts
From: luigi scarso @ 2009-02-19 10:39 UTC
To: mailing list for ConTeXt users

On Thu, Feb 19, 2009 at 9:54 AM, Thomas A. Schmitz <thomas.schmitz@uni-bonn.de> wrote:

> Just to give me an idea: how would you transform this:
>
> <text:span text:style-name="T3">foo</text:span>
>
> to this
>
> <emph>foo</emph>
>
> with lxml? lxml seems to object to the ":" in the tag, even though it's
> declared in the document.
>
> Thomas

t.xml:

<foo xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
<text:span text:style-name="T3">foo</text:span>
</foo>

# python
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> tree = etree.parse(file('t.xml'))
>>> foo = tree.getroot()
>>> foo.tag
'foo'
>>>
>>> [child.tag for child in foo.iterdescendants() ]
['{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span']
>>> print foo.iterdescendants.__doc__
iterdescendants(self, tag=None)

    Iterate over the descendants of this element in document order.

    As opposed to ``el.iter()``, this iterator does not yield the element
    itself. The generated elements can be restricted to a specific tag
    name with the 'tag' keyword.

>>>
>>> FOO = etree.Element("FOO")
>>> emph = etree.Element("emph")
>>> [child.tag for child in foo.iterdescendants(tag = '{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span' ) ]
['{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span']
>>> span = [child for child in foo.iterdescendants(tag = '{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span' ) ][0]
>>> emph.text = span.text
>>> FOO.append(emph)
>>> etree.tostring(FOO)
'<FOO><emph>foo</emph></FOO>'
>>>

http://codespeak.net/lxml/tutorial.html
http://codespeak.net/lxml/api.html

--
luigi
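Editorial sketch: a variant of the session above that rewrites the span in place instead of building a new tree. It is not luigi's code. It uses the standard library's xml.etree.ElementTree (which uses the same '{uri}local' notation for namespaced names) so that it runs without lxml, and treating the style "T3" as emphasis is an assumption taken from the example in the question.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

source = (
    '<foo xmlns:text="%s">'
    '<text:span text:style-name="T3">foo</text:span>'
    "</foo>" % TEXT_NS
)

root = ET.fromstring(source)

# Namespaced tags AND attributes are addressed as '{uri}local'.
span_tag = "{%s}span" % TEXT_NS
style_attr = "{%s}style-name" % TEXT_NS

# Materialise the matches first, then rename them.
for element in list(root.iter(span_tag)):
    if element.get(style_attr) == "T3":  # assumption: T3 means italics
        element.tag = "emph"
        element.attrib.clear()

print(ET.tostring(root, encoding="unicode"))
```

The prefix "text:" never appears in the Python code: once the document is parsed, only the namespace URI matters, which is why a lookup for a tag literally named "text:span" fails.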
* Re: question for the xml-experts
From: Thomas A. Schmitz @ 2009-02-19 11:53 UTC
To: mailing list for ConTeXt users

On Feb 19, 2009, at 11:39 AM, luigi scarso wrote:

> >>> FOO = etree.Element("FOO")
> >>> emph = etree.Element("emph")
> >>> [child.tag for child in foo.iterdescendants(tag = '{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span' ) ]
> ['{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span']
> >>> span = [child for child in foo.iterdescendants(tag = '{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span' ) ][0]
> >>> emph.text = span.text
> >>> FOO.append(emph)
> >>> etree.tostring(FOO)
> '<FOO><emph>foo</emph></FOO>'

Excuse me for being dense: you mean all namespaces have to be explicitly expanded?

Thomas
* Re: question for the xml-experts
From: luigi scarso @ 2009-02-19 14:10 UTC
To: mailing list for ConTeXt users

On Thu, Feb 19, 2009 at 12:53 PM, Thomas A. Schmitz <thomas.schmitz@uni-bonn.de> wrote:

> Excuse me for being dense: you mean all namespaces have to be explicitly
> expanded?

see
http://codespeak.net/lxml/tutorial.html#namespaces

--
luigi
* Re: question for the xml-experts
From: Thomas A. Schmitz @ 2009-02-20 15:09 UTC
To: mailing list for ConTeXt users

On Feb 19, 2009, at 3:10 PM, luigi scarso wrote:

> see
> http://codespeak.net/lxml/tutorial.html#namespaces

Luigi,

thanks so much for your patient replies. I have now begun to play with python's lxml. It offers a lot, maybe too much for a beginner. One advantage I see for my immediate needs is that it lets me use Python's regular expressions and control structures, so the code may be easier to maintain and adapt than in the rather clumsy xslt syntax; it may be a big help for the rather messy OpenOffice xml that I want to process.

I had already tried writer2latex a while ago. I found it very limited and lacking documentation, so I haven't pursued this track.

Again, thanks for getting me started!

Thomas
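Editorial sketch of the kind of clean-up that regular expressions and ordinary control structures make easy: finding out which automatic character styles actually mean "italic". The sample fragment is hand-written and hypothetical, the pattern for style names (T1, T2, ...) is an assumption based on the "T3" seen in this thread, and the standard library's ElementTree is used in place of lxml so that the example runs as is.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}

# Hypothetical, hand-written stand-in for a fragment of content.xml.
SAMPLE = """\
<root xmlns:style="%(style)s" xmlns:text="%(text)s" xmlns:fo="%(fo)s">
  <style:style style:name="T3" style:family="text">
    <style:text-properties fo:font-style="italic"/>
  </style:style>
  <style:style style:name="T4" style:family="text">
    <style:text-properties fo:font-weight="bold"/>
  </style:style>
  <text:p>A <text:span text:style-name="T3">foo</text:span> and a
  <text:span text:style-name="T4">bar</text:span>.</text:p>
</root>
""" % NS

# Assumption: automatic character styles are named T1, T2, ...
AUTO_STYLE = re.compile(r"^T\d+$")


def qname(prefix, local):
    """Build the '{uri}local' form that ElementTree uses for names."""
    return "{%s}%s" % (NS[prefix], local)


def italic_styles(root):
    """Collect the names of automatic styles that switch to italics."""
    names = set()
    for style in root.iter(qname("style", "style")):
        name = style.get(qname("style", "name"), "")
        props = style.find(qname("style", "text-properties"))
        if props is None or not AUTO_STYLE.match(name):
            continue
        if props.get(qname("fo", "font-style")) == "italic":
            names.add(name)
    return names


print(italic_styles(ET.fromstring(SAMPLE)))
```

The resulting set can then drive the renaming of spans, so that only spans whose style really is italic become "emph", whatever number the word processor happened to give the style.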
* Re: question for the xml-experts
From: luigi scarso @ 2009-02-20 15:35 UTC
To: mailing list for ConTeXt users

On Fri, Feb 20, 2009 at 4:09 PM, Thomas A. Schmitz <thomas.schmitz@uni-bonn.de> wrote:

> thanks so much for your patient replies. I have now begun to play with
> python's lxml. It offers a lot, maybe too much for a beginner. One advantage
> for my immediate needs that I see is that it offers the possibility to use
> Python's regular expressions and control structures, so this may make coding
> easier to maintain and adapt that in the rather clumsy xslt syntax; it may
> be a big help for the rather messy OpenOffice xml that I want to process.

also

Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> URI_OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
>>> URI_STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
>>> URI_TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
>>> URI_TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
>>> URI_DRAW = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
>>> URI_FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
>>> URI_XLINK = "http://www.w3.org/1999/xlink"
>>> URI_DC = "http://purl.org/dc/elements/1.1/"
>>> URI_META = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
>>> URI_NUMBER = "urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
>>> URI_PRESENTATION = "urn:oasis:names:tc:opendocument:xmlns:presentation:1.0"
>>> URI_SVG = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
>>> URI_CHART = "urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
>>> URI_DR3D = "urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
>>> URI_MATH = "http://www.w3.org/1998/Math/MathML"
>>> URI_FORM = "urn:oasis:names:tc:opendocument:xmlns:form:1.0"
>>> URI_SCRIPT = "urn:oasis:names:tc:opendocument:xmlns:script:1.0"
>>> URI_OOO = "http://openoffice.org/2004/office"
>>> URI_OOOW = "http://openoffice.org/2004/writer"
>>> URI_OOOC = "http://openoffice.org/2004/calc"
>>> URI_DOM = "http://www.w3.org/2001/xml-events"
>>> URI_XFORMS = "http://www.w3.org/2002/xforms"
>>> URI_XSD = "http://www.w3.org/2001/XMLSchema"
>>> URI_XSI = "http://www.w3.org/2001/XMLSchema-instance"
>>> URI_FIELD = "urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:field:1.0"
>>> NSMAP_OO = {
...     "office" : URI_OFFICE,
...     "style" : URI_STYLE,
...     "text" : URI_TEXT,
...     "table" : URI_TABLE,
...     "draw" : URI_DRAW,
...     "fo" : URI_FO,
...     "xlink" : URI_XLINK,
...     "dc" : URI_DC,
...     "meta" : URI_META,
...     "number" : URI_NUMBER,
...     "presentation" : URI_PRESENTATION,
...     "svg" : URI_SVG,
...     "chart" : URI_CHART,
...     "dr3d" : URI_DR3D,
...     "math" : URI_MATH,
...     "form" : URI_FORM,
...     "script" : URI_SCRIPT,
...     "ooo" : URI_OOO,
...     "ooow" : URI_OOOW,
...     "oooc" : URI_OOOC,
...     "dom" : URI_DOM,
...     "xforms" : URI_XFORMS,
...     "xsd" : URI_XSD,
...     "xsi" : URI_XSI,
...     "field" : URI_FIELD,
... }
>>> from lxml import etree
>>> tree = etree.parse(file('t.xml'))
>>>
>>> foo = tree.getroot()
>>> [child.tag for child in foo.iterdescendants(tag = '{%s}span'%URI_TEXT ) ]
['{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span']

Have a look at http://opendocumentfellowship.com/projects/odfpy too.

--
luigi
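Editorial sketch of what a prefix map like the one above buys you: with a dictionary from prefix to URI, path expressions can use the short "text:span" form instead of the expanded '{uri}span' form. The example uses the standard library's xml.etree.ElementTree, whose findall accepts such a dictionary as its second argument; the one-entry map is a cut-down, hypothetical version of NSMAP_OO.

```python
import xml.etree.ElementTree as ET

# One entry of the larger prefix map is enough for this example.
NSMAP = {"text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0"}

root = ET.fromstring(
    '<foo xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<text:span text:style-name="T3">foo</text:span>'
    "</foo>"
)

# The prefix in the path is resolved through NSMAP, not through the
# prefixes declared in the document itself.
spans = root.findall(".//text:span", NSMAP)
print([span.text for span in spans])
```

Because the lookup goes through the dictionary, the prefixes used in the Python code do not have to match those in the file; only the URIs have to agree.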
* Re: question for the xml-experts
From: luigi scarso @ 2009-02-19 17:02 UTC
To: mailing list for ConTeXt users

> Yes, you're right of course. I have a similar situation here: the xml
> produced by ooo is too messy, so I want to preprocess it to something that
> is easier to maintain and modify (e.g., I will, at some point, add index
> entries and a TOC); that's why I use xslt here. But I still produce xml
> which I process with mkiv.

also this
http://www.hj-gym.dk/~hj/writer2latex/

--
luigi
* Re: question for the xml-experts
  2009-02-14 17:40 question for the xml-experts Thomas A. Schmitz
  2009-02-14 18:25 ` Wolfgang Schuster
@ 2009-02-14 18:31 ` Patrick Gundlach
  2009-02-14 19:06   ` Thomas A. Schmitz
  2009-02-15 10:14 ` Khaled Hosny
  2 siblings, 1 reply; 17+ messages in thread
From: Patrick Gundlach @ 2009-02-14 18:31 UTC
To: ntg-context

Hi Thomas,

> process the odt directly; instead, I want to transform the xml via
> xslt to a simplified format and then process that with ConTeXt. I have
> just discovered the tool xalan (
> http://xml.apache.org/xalan-c/index.html ) which allows me to use an
> xslt style sheet and direct the output to a new file. I will then
> need to clean up these xml files and write a mkiv xml setup for them.
>
> So for those who know much more about this sort of workflow: does that
> make sense?

Yes, it does. At my company we clean up (and reorganize) XML data with
XSLT all the time. We are happy users of Saxon 9
(http://saxon.sourceforge.net/), which is an XSLT 2.0 engine. Learning
XSLT is not trivial (but not too hard either), and once you understand
it, nobody can stop you from using XSLT for 'everything'.

Patrick
-- 
ConTeXt wiki and more: http://contextgarden.net
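To make the xml -> xml clean-up step concrete, here is a minimal sketch of the kind of simplification discussed in this message. It is written in Python with the standard library rather than in XSLT, so it is a stand-in for the idea and not the Saxon or xalan workflow itself. The target element names (article, title, p) and the sample input are hypothetical, and nested paragraphs (for example inside footnotes) are not handled.

```python
import xml.etree.ElementTree as ET

URI_TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
URI_OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
T = "{%s}" % URI_TEXT  # prefix for fully qualified ODF text tags

# Hypothetical fragment shaped like an OpenOffice content.xml.
SAMPLE = (
    '<office:document-content xmlns:office="%s" xmlns:text="%s">'
    "<office:body><office:text>"
    '<text:h text:outline-level="1">Introduction</text:h>'
    '<text:p text:style-name="P1">Some '
    '<text:span text:style-name="T1">marked</text:span> text.</text:p>'
    "</office:text></office:body>"
    "</office:document-content>" % (URI_OFFICE, URI_TEXT)
)


def simplify(root):
    """Map ODF headings and paragraphs to a small, hypothetical target format.

    All style attributes and inline spans are dropped; only the plain text
    of each heading and paragraph is kept.
    """
    article = ET.Element("article")
    for node in root.iter():
        if node.tag == T + "h":
            ET.SubElement(article, "title").text = "".join(node.itertext())
        elif node.tag == T + "p":
            ET.SubElement(article, "p").text = "".join(node.itertext())
    return article


if __name__ == "__main__":
    simplified = simplify(ET.fromstring(SAMPLE))
    print(ET.tostring(simplified, encoding="unicode"))
```

An XSLT stylesheet would express the same mapping declaratively, one template per source element, which tends to scale better once footnotes, emphasis and index entries have to be preserved.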
* Re: question for the xml-experts
  2009-02-14 18:31 ` Patrick Gundlach
@ 2009-02-14 19:06   ` Thomas A. Schmitz
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas A. Schmitz @ 2009-02-14 19:06 UTC
To: mailing list for ConTeXt users

On Feb 14, 2009, at 7:31 PM, Patrick Gundlach wrote:

> Yes, it does. At my company we clean up (and reorganize) XML data with
> XSLT all the time. We are happy users of saxon 9
> (http://saxon.sourceforge.net/) which is an xslt 2.0 engine. Learning
> XSLT is not trivial (but not too hard either), but once you get an
> understanding of it nobody can stop you using XSLT for 'everything'.
>
> Patrick

Great, I will look into saxon and xslt!

Best

Thomas
* Re: question for the xml-experts
  2009-02-14 17:40 question for the xml-experts Thomas A. Schmitz
  2009-02-14 18:25 ` Wolfgang Schuster
  2009-02-14 18:31 ` Patrick Gundlach
@ 2009-02-15 10:14 ` Khaled Hosny
  2 siblings, 0 replies; 17+ messages in thread
From: Khaled Hosny @ 2009-02-15 10:14 UTC
To: mailing list for ConTeXt users

You may consider giving dbcontext a look. It is written in Python and
seems to use XSL to translate DocBook XML into TeX files to be typeset
by ConTeXt.

http://dblatex.sourceforge.net/doc/pt02.html

Regards,
Khaled

On Sat, Feb 14, 2009 at 06:40:51PM +0100, Thomas A. Schmitz wrote:
> Hi all,
>
> this is not a question about direct technical details, but more of a
> conceptual problem, and I would love to have your input and ideas on
> this. I will be editing several edited volumes in my field (humanities,
> classics). From experience, I know that it's impossible to make scholars
> in the humanities adhere to standards. Each and every one of them will
> turn in a paper (most of them written in half a dozen different versions
> of Word) with its own idiosyncrasies. At my last conference, I asked them
> to please use Unicode for their Greek passages, and I got blank looks and
> the question "What the hell is Unicode?"
>
> So: I want to extract the content of these papers and process it with
> ConTeXt. I thought the easiest route might be to convert them to OpenOffice
> odt and then use the content.xml as a starting point. Since the
> formatting will be unusable anyways, it doesn't make sense to process the
> odt directly; instead, I want to transform the xml via xslt to a
> simplified format and then process that with ConTeXt. I have just
> discovered the tool xalan ( http://xml.apache.org/xalan-c/index.html )
> which allows me to use an xslt style sheet and direct the output to a new
> file. I will then need to clean up these xml files and write a mkiv xml
> setup for them.
>
> So for those who know much more about this sort of workflow: does that
> make sense? Is there any better way to achieve these results, i.e., have
> the content of a couple of papers in Word and/or rtf format and typeset
> it in a consistent ConTeXt environment? Is there any tool better than
> xslt to convert the OpenOffice xml (anything in lua that can
> parse xml)? Anything better than xalan to convert xml -> xml? I'm just
> beginning to plan this, so I'd be most grateful for any pointers.
>
> Thanks for reading this long message, all best
>
> Thomas

-- 
Khaled Hosny
Arabic localizer and member of Arabeyes.org team
end of thread

Thread overview: 17+ messages

2009-02-14 17:40 question for the xml-experts Thomas A. Schmitz
2009-02-14 18:25 ` Wolfgang Schuster
2009-02-14 18:37 ` Thomas A. Schmitz
2009-02-15  9:39 ` luigi scarso
2009-02-15 17:17 ` Thomas A. Schmitz
2009-02-17 22:07 ` luigi scarso
2009-02-19  8:54 ` Thomas A. Schmitz
2009-02-19  9:24 ` luigi scarso
2009-02-19 10:39 ` luigi scarso
2009-02-19 11:53 ` Thomas A. Schmitz
2009-02-19 14:10 ` luigi scarso
2009-02-20 15:09 ` Thomas A. Schmitz
2009-02-20 15:35 ` luigi scarso
2009-02-19 17:02 ` luigi scarso
2009-02-14 18:31 ` Patrick Gundlach
2009-02-14 19:06 ` Thomas A. Schmitz
2009-02-15 10:14 ` Khaled Hosny