ntg-context - mailing list for ConTeXt users
From: Christopher Creutzig <christopher@creutzig.de>
Subject: Re: DOC/RTF to ConTeXt via XML
Date: Tue, 27 Sep 2005 11:03:57 +0200
Message-ID: <43390AFD.9070903@creutzig.de>
In-Reply-To: <4338FD57.3040109@capdm.com>

Duncan Hothersall wrote:
>>Question: Is it possible to design a doc or rtf template that Open Office can 
>>convert to a sane, consistent xml format? 
> 
> 
> OpenOffice.org does allow you to attach an XSLT stylesheet to an export
> process which therefore allows you to do a (limited) transformation from
> the visual markup which is its native format to a more structured one

 Why "limited"?  Complicated things are just, well, a bit complicated to
achieve.  It is certainly possible to get a structured document from,
say, an average xhtml file.  I would prefer not to write that code,
though.  It would be rather boring and full of hard-to-read special cases.

> which you would need. But the biggest challenge is that all
> wordprocessors are designed for visual editing, meaning that there are,
> for example, 15 or so different ways to get a bulleted list in Word,
> creating 15 or so different RTF constructs, and coping with this can be
> a nightmare.

 Yes, it can.  (Although RTF is completely unrelated to this problem,
since OOo would read the Word file.  And the OOo step greatly simplifies
the problem, since iirc the OOo format has just one or maybe two ways of
saving bulleted lists.  Or were you referring to different bullets?)  The
stricter your rules for the authors are, the easier it is to write the
required xslt program.  If your authors expect to be able to write
chapter headers by manually switching to a font in the range of 20 to 24
pt and adding a number in front, you've got a hell of a coding session
in front of you.  If, otoh, you take the dictatorial approach of
telling them in advance that manual font changes (maybe apart from
pseudo-italics and pseudo-bold which will be mapped to \em in the end)
will simply be ignored, your code will be much easier but you may have a
problem with the authors.
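
 To illustrate why strict rules keep the code short, here is a minimal
sketch in Python (not xslt, and not code anyone in this thread wrote).
The style names in STYLE_MAP and the run format labels are hypothetical;
a real template would define its own.  Escaping of TeX special
characters is left out.

```python
# Hypothetical named paragraph styles from the author template.
STYLE_MAP = {
    "Heading 1": "\\chapter{%s}",
    "Heading 2": "\\section{%s}",
}

def convert_paragraph(style, text):
    """Map a named style to a ConTeXt command; anything else is body text."""
    template = STYLE_MAP.get(style)
    if template is None:
        return text
    return template % text

def convert_runs(runs):
    """Join (text, format) runs: pseudo-italic and pseudo-bold become \\em,
    every other manual formatting (font size, font family) is ignored."""
    parts = []
    for text, fmt in runs:
        if fmt in ("italic", "bold"):
            parts.append("{\\em " + text + "}")
        else:
            parts.append(text)
    return "".join(parts)
```

 With these rules a manually enlarged "chapter header" simply comes out
as body text, which is exactly the point where the authors may complain.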

> The FO approach (Paul Tremblay's focus) is one way to process XML to
> paginated output, but there are many others. Personally I don't like the
> FO approach, for a variety of reasons, but I'm sure others have had
> success with it. But you should also explore DocBook-in-ConTeXt, which
> uses ConTeXt's native XML processing capabilities. And don't rule out

 The advantage of using DocBook is that you get a very rich set of
capabilities.  The disadvantage can be described in almost the same
words, plus, as I said before, DocBook is one of the most verbose
formats in common use.  If you only use the format as an intermediate
step, that is irrelevant, but if your authors will send in files that
way, it is not.

> using a separate scripting language to convert XML into ConTeXt as a
> batch process, since that will give you the ultimate flexibility in
> accessing all of ConTeXt's abilities.

 Personally, I'd use xslt for that.  Navigating the xml tree is
extremely easy and writing out text instead of xml is not really a problem.
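
 The same idea (walk the tree, write out text) can be sketched in Python
with the standard xml.etree.ElementTree module, for readers who do not
have an xslt processor at hand.  The element names (section, title,
para, emphasis) are an assumed DocBook-like subset, and the mapping to
\section and \em is only one possible choice.

```python
import xml.etree.ElementTree as ET

def inline(elem):
    """Return the text of elem, mapping <emphasis> children to {\\em ...}."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "emphasis":
            parts.append("{\\em " + inline(child) + "}")
        else:
            # Unknown inline markup: keep the text, drop the markup.
            parts.append(inline(child))
        parts.append(child.tail or "")
    return "".join(parts)

def to_context(root):
    """Write every <section> as a \\section followed by its paragraphs."""
    lines = ["\\starttext"]
    for section in root.iter("section"):
        title = section.find("title")
        if title is not None:
            lines.append("\\section{" + inline(title) + "}")
        for para in section.findall("para"):
            lines.append(inline(para))
            lines.append("")  # blank line ends the paragraph
    lines.append("\\stoptext")
    return "\n".join(lines)

SAMPLE = (
    "<article><section><title>Intro</title>"
    "<para>Some <emphasis>important</emphasis> text.</para>"
    "</section></article>"
)

if __name__ == "__main__":
    print(to_context(ET.fromstring(SAMPLE)))
```

 TeX special characters in the input are not escaped here; a real
converter would have to take care of that.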

>>Question: Does the entire journal have to be programmed in xml or can 
>>ConTeXt process xml locally? For example, I may have my own article done in 
>>ConTeXt mixed with other articles done in rtf=>xml.
> 
> 
> You can just put XML into \startXMLdata ... \stopXMLdata blocks. I do
> this for MathML processing within a larger ConTeXt document.

 I'd approach Idris' problem the other way round: Transform the xml
files to ConTeXt and leave the ConTeXt files as is.  Then, texexec the
whole thing.
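
 As a rough sketch of that workflow, assuming every article already
exists as a ConTeXt file (written by hand or produced by the
conversion), a small Python helper could write the master file.  The
file names are hypothetical.

```python
def master_file(article_files):
    """Build a master document that \\inputs each article in order."""
    lines = ["\\starttext"]
    for name in article_files:
        lines.append("\\input " + name)
    lines.append("\\stoptext")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Hypothetical names: one hand-written article, one converted from xml.
    with open("journal.tex", "w", encoding="utf-8") as handle:
        handle.write(master_file(["handwritten.tex", "converted.tex"]))
    # Afterwards: run texexec on journal.tex
```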

>>Any other advice (and/or pitfalls to watch for) would be appreciated. This 
>>sounds very promising!
> 
> 
> Horses for courses. It's possible to get sucked into things like an FO
> implementation or an XML conversion and find that you have spent months
> perfecting it and it only shaves half an hour off your production time!

 Amen.

 Also, don't limit your authors to Word.  Offering Word is obviously a
requirement, but if you route everything through OOo anyway, there is no
point in not offering an OOo template file.  If you are using a standard xml
format, such as (a subset of) DocBook or TEI, you probably should accept
articles in that format, too.  And, of course, ConTeXt.


Christopher
