From: Christopher Creutzig
Newsgroups: gmane.comp.tex.context
Subject: Re: DOC/RTF to ConTeXt via XML
Date: Tue, 27 Sep 2005 11:03:57 +0200
Message-ID: <43390AFD.9070903@creutzig.de>
References: <20050927074229.9EF85127E2@ronja.ntg.nl> <4338FD57.3040109@capdm.com>

Duncan Hothersall wrote:

>>Question: Is it possible to design a doc or rtf template that Open Office can
>>convert to a sane, consistent xml format?
>
>
> OpenOffice.org does allow you to attach an XSLT stylesheet to an export
> process which therefore allows you to do a (limited) transformation from
> the visual markup which is its native format to a more structured one

Why „limited“? Complicated things are just, well, a bit complicated
to achieve. It is certainly possible to get a structured document from,
say, an average xhtml file. I would prefer not to write that code, though.
It would be rather boring and full of hard-to-read special cases.

> which you would need. But the biggest challenge is that all
> wordprocessors are designed for visual editing, meaning that there are,
> for example, 15 or so different ways to get a bulleted list in Word,
> creating 15 or so different RTF constructs, and coping with this can be
> a nightmare.

Yes, it can. (Although RTF is completely unrelated to this problem,
since OOo would read the Word file. And the OOo step greatly simplifies
the problem, since iirc the OOo format has just one or maybe two ways of
saving bulleted lists. Or were you referring to different bullets?)

The stricter your rules for the authors are, the easier it is to write
the required xslt program. If your authors expect to be able to write
chapter headers by manually switching to a font in the range of 20 to
24 pt and adding a number in front, you've got a hell of a coding session
in front of you. If, otoh, you take the dictatorial approach of telling
them in advance that manual font changes (maybe apart from pseudo-italics
and pseudo-bold which will be mapped to \em in the end) will simply be
ignored, your code will be much easier but you may have a problem with
the authors.

> The FO approach (Paul Tremblay's focus) is one way to process XML to
> paginated output, but there are many others. Personally I don't like the
> FO approach, for a variety of reasons, but I'm sure others have had
> success with it. But you should also explore DocBook-in-ConTeXt, which
> uses ConTeXt's native XML processing capabilities. And don't rule out

The advantage of using DocBook is that you get a very rich set of
capabilities. The disadvantage can be described in almost the same
words, plus, as I said before, DocBook is one of the most verbose
formats in common use. If you only use the format as an intermediate
step, that is irrelevant, but if your authors will send in files that
way, it is not.
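
Coming back to the strict rules for authors: here is a rough sketch of
how short the conversion gets once manual font changes are simply
ignored. It is written in Python rather than xslt only so that it can be
run stand-alone. The element names (article, title, p, span) and the
style/size attributes are made up for illustration; they are not the OOo
format, and a real converter would have to cover many more cases.

```python
import xml.etree.ElementTree as ET

# Hypothetical style names; a real template would define its own.
EMPHASIS = {"bold", "italic"}

# Only characters that take a plain backslash escape are handled;
# backslash, ^ and ~ are deliberately left out of this sketch.
TEX_ESCAPES = str.maketrans({c: "\\" + c for c in "%&#$_{}"})


def tex(text):
    """Escape a text fragment for ConTeXt (None becomes the empty string)."""
    return (text or "").translate(TEX_ESCAPES)


def inline(elem):
    """Convert mixed content: keep emphasis, drop every other font change."""
    out = [tex(elem.text)]
    for child in elem:
        body = inline(child)
        if child.tag == "span" and child.get("style") in EMPHASIS:
            # Pseudo-bold and pseudo-italics both end up as \em.
            out.append("{\\em " + body + "}")
        else:
            # Manual sizes, colours etc. are ignored; only the text survives.
            out.append(body)
        out.append(tex(child.tail))
    return "".join(out)


def to_context(xml_source):
    """Turn <article><title/><p/>...</article> into a ConTeXt fragment."""
    root = ET.fromstring(xml_source)
    blocks = []
    for elem in root:
        if elem.tag == "title":
            blocks.append("\\chapter{" + inline(elem) + "}")
        elif elem.tag == "p":
            blocks.append(inline(elem))
        # Unknown elements are skipped silently in this sketch.
    return "\n\n".join(blocks) + "\n"


SAMPLE = """<article>
<title>Results &amp; Discussion</title>
<p>This is <span style="bold">important</span> and <span size="22pt">this is not</span>.</p>
</article>"""

if __name__ == "__main__":
    print(to_context(SAMPLE))
```

The 22pt span comes out as plain text, which is exactly the point where
the authors may start to complain.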
> using a separate scripting language to convert XML into ConTeXt as a
> batch process, since that will give you the ultimate flexibility in
> accessing all of ConTeXt's abilities.

Personally, I'd use xslt for that. Navigating the xml tree is
extremely easy and writing out text instead of xml is not really a
problem.

>>Question: Does the entire journal have to be in programmed in xml or can
>>ConTeXt process xml locally? For example, I may have my own article done in
>>COnTeXt mixed with other articles done in rtf=>xml.
>
>
> You can just put XML into \startXMLdata ... \stopXMLdata blocks. I do
> this for MathML processing within a larger ConTeXt document.

I'd approach Idris' problem the other way round: Transform the xml
files to ConTeXt and leave the ConTeXt files as is. Then, texexec the
whole thing.

>>Any other advice (and/or pitfalls to watch for) would be appreciated. This
>>sounds very promising!
>
>
> Horses for courses. It's possible to get sucked into things like an FO
> implementation or an XML conversion and find that you have spent months
> perfecting it and it only shaves half an hour off your production time!

Amen.

Also, don't limit your authors to Word. Offering Word is obviously a
requirement, but if you go the way through OOo, there would be no point
in not offering an OOo template file. If you are using a standard xml
format, such as (a subset of) DocBook or TEI, you probably should accept
articles in that format, too. And, of course, ConTeXt.

Christopher