From: Mari Voipio
Newsgroups: gmane.comp.tex.context
Subject: Re: Microsoft Word -> Context
Date: Tue, 03 Apr 2007 10:20:13 +0300
Message-ID: <4612002D.9040501@iki.fi>
To: mailing list for ConTeXt users

Vyatcheslav Yatskovsky wrote:
> What can community say about the sensibility of my idea? And did
> anyone attempt to implement some conversion tool?

As has been mentioned (and as you can find out by searching the mailing list archives), this pops up once in a while and has been discussed. However, as a former Word teacher and currently IT support and power user, I must say that making a really working converter that would save a substantial amount of time is very hard.

Why? Because at least 70% of Word users don't seem to know how to make a clearly structured document. And even if they once knew, all Word versions newer than 97 make it very hard to keep things consistent, at least with the default settings (which most users never change) that involve umpteen different yucky automatic features. I just got back a Word document that left with nice, clean, consistent styles and came back twice the size and a complete mess, thanks to Word 2003...

For example, if the user has formatted the titles by hand, the human eye easily sees that "that's a top-level heading and this is a second-level heading", but Word thinks they are just specially formatted normal text.
Consequently, if your converter recognizes only the built-in Heading 1 style as a top-level heading, you lose that in conversion anyway (even when converting to HTML, for example). Or, even worse, you can half-accidentally make new styles that count at the same level in the table of contents without looking like a heading... Ergo, everything has to be fixed by hand anyway.

If the journal you are doing is not very complicated but the problem is getting a consistent quality, I'd do something like this:

1) Make a separate environment file with all the layout information (this is the bit that will take a chunk of your time on the first go if you don't have a huge amount of experience from before).

2) Mark the Word files (journal articles) with simple typesetting codes while they are still in Word document format; i.e. put \chapter{} around the main title, \section{} around first-level headings etc. And remember to add the \starttext and \stoptext tags at the very beginning and end of the file; as the environment is in a separate file, nothing is needed above \starttext.

If you write a cheat sheet with examples, almost anyone can deal with this, if they have any idea of how document structure works (and your lady has to have it, as she's done it in Word). The human eye is a lot better at discerning what is a heading than an automatic system. I can even write that cheat sheet for you with references to the English version of Word, if that'll help.

Now, if you have a lot of mathematics in the material, this may be trickier. But then so is using MS Equation Editor, and a reasonable number of 'if it looks like this, typeset it like this' examples could work out.

BTW, you could probably make a VBA macro to do some of the markup job - but it'll still only work if the original writer uses heading styles properly!
At least in a business environment this seems to be the exception rather than the rule, especially with the newer Words that make all kinds of deductions of their own and mess with styles and heading levels and *everything* (frustrated? me? never...). But Word's replace function is actually quite good: you can search for formats, use wildcards etc., so in theory you can write a macro that looks for 14 pt Arial bold and puts \section{ in front of it and } after it. [I've done some HTML conversion this way, because Word's own HTML is a totally useless mess as it doesn't do CSS...]

Note! If your files contain graphics, for ConTeXt you have to ask people to send them in separately as pdf, png or jpg (instead of putting them inline in the Word file). I have found *this* hard to achieve once in a while, and I still often spend substantial time chasing down the originals of graphics I get in Word files.

3) When the basic markup is done in the Word doc, where you can see how the writer uses styles, save the file in text format.

4) Either make sure your typist's computer has a fully functioning WinConTeXt (you'll have to install and adjust a bit) with Cyrillic fonts and everything else, or just have her do the basic markup and then compile on your computer.

But a lot depends on how your journal looks, how complicated its content is, and whether your typist is willing to live with having to type in some strange tags, i.e. whether she'll want to learn anything new. [I've found that generally my fellow office workers don't want to deal with *anything* like this, but professional translators have no problems with ConTeXt code; and anybody with html-by-hand experience usually gets the drift very fast.]

Having switched a very long structured file from Word to ConTeXt, I can say that doing the layout and the basic markup takes some time. But in the long run I have saved that time many times over.

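
As a rough illustration of what steps 1 and 2 could end up looking like, here is a minimal sketch of an environment file plus one marked-up article. The file names, the layout values and the figure are invented for the example, and loading the environment with a single \environment line is an assumption (the message does not say how the environment gets loaded); treat it as a starting point, not as the actual setup behind the manuals mentioned here.

```tex
% --- env-journal.tex (hypothetical name) ---------------------------
% Step 1: every layout decision lives here, not in the articles.
\startenvironment env-journal
  \setuppapersize[A4][A4]
  \setuplayout[backspace=25mm, topspace=20mm, width=middle, height=middle]
  \setupbodyfont[11pt]
  \setuphead[chapter][style=\bfc]   % main title of an article
  \setuphead[section][style=\bfa]   % first-level headings
\stopenvironment

% --- article-01.tex (hypothetical name) ----------------------------
% Steps 2-3: the text as saved from Word, with the tags typed in by hand.
\environment env-journal            % assumed way of pulling in the layout
\starttext
\chapter{Title of the article}
First paragraph of the article, pasted in as plain text.

\section{A first-level heading}
More text from the Word file.

% Graphics arrive as separate pdf, png or jpg files, not inline in Word.
\placefigure[here]{Caption supplied by the author}
  {\externalfigure[figure-01][width=.8\textwidth]}
\stoptext
```

With this split, each further article needs only its own markup and the one line that names the environment; a layout change is made once in the environment file and reaches every article on the next run.
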
When I have to do a new manual, for instance, I can reuse my existing environment/layout definitions; implementing that takes about 10 seconds. Right now I have to start writing a product manual where some parts of the text come from an old Word file. I'll probably just cut and paste what I need from the pdf file, but it's still faster than fighting with Word over the original 9 MB (!) doc - and consistency can be guaranteed, unlike if I used Word, because the old file was made with Word 95 and 97 and we now use Word 2003, where the list functions and styles work slightly differently and files don't open quite as they used to.

These are very large files even when optimized, but if you are very curious, you can compare the following public documents that are in my domain:

Doc with Word original (attachments done in ConTeXt):
http://www.kpatents.com/pdf/downloads/pr-01-s.pdf

Doc that was converted from Word original to ConTeXt (this was my "practice piece"):
http://www.kpatents.com/pdf/downloads/pr-03.pdf

Similar doc with ConTeXt from the start:
http://www.kpatents.com/pdf/downloads/pr-23.pdf

I didn't originally make the first one (it predates my employment at the company), but I cleaned it up, and any changes are now made by me. When I started converting number 2 into ConTeXt, the instruction was that the manuals have to look alike. I did make some layout changes, partly for legibility (wider margins) and partly for practicality (couldn't get small caps out of my ConTeXt, so the footer is normal text, not small caps), but they are still fairly alike. Oh, and the first one has fixed graphic numbering (no captions); the others have the 'real thing'. And an index only turned up with ConTeXt, because indexing is much easier/more transparent in it.

NB. Cover pages are still all Word docs; I pdf them and insert them into my ConTeXt file.
One day I'll bother to learn enough to make the covers happen in ConTeXt, which would make changing them a lot faster (usually the only change is in the version number).

I don't know if this really helps, but at least it gives you some info on how others do things and what kind of experiences there are around this particular problem.

Mari from Finland