ntg-context - mailing list for ConTeXt users
* Microsoft Word -> Context
       [not found] <mailman.1.1175508001.8643.ntg-context@ntg.nl>
@ 2007-04-02 17:47 ` Vyatcheslav Yatskovsky
  2007-04-02 19:54   ` Andrea Valle
                     ` (3 more replies)
  2007-04-02 17:56 ` SciTe setup Vyatcheslav Yatskovsky
  1 sibling, 4 replies; 10+ messages in thread
From: Vyatcheslav Yatskovsky @ 2007-04-02 17:47 UTC (permalink / raw)
  To: ntg-context-request@ntg.nl

Hello,

My faculty receives papers in MS Word format. One poor computer-literate
lady is working very hard to typeset a journal of consistent quality from
those papers. All the work is done in MS Word, and I am considering
suggesting that she move to ConTeXt (she doesn't have the slightest idea
of it at the moment, by the way). I already have some fonts and header
files for typesetting math papers in Russian, and I think I could set
everything up for her and provide help if needed.

Then we would need something like Word2ConTeXt (or a macro written in VBA)
to convert incoming papers to ConTeXt code and then assemble them easily;
something that resembles the well-known Word2TeX application.

What does the community think of this idea? And has anyone attempted to
implement such a conversion tool?

Best regards,
Vyatcheslav Yatskovsky


* SciTe setup
       [not found] <mailman.1.1175508001.8643.ntg-context@ntg.nl>
  2007-04-02 17:47 ` Microsoft Word -> Context Vyatcheslav Yatskovsky
@ 2007-04-02 17:56 ` Vyatcheslav Yatskovsky
  2007-04-03  7:08   ` Hans Hagen
  1 sibling, 1 reply; 10+ messages in thread
From: Vyatcheslav Yatskovsky @ 2007-04-02 17:56 UTC (permalink / raw)
  To: ntg-context-request@ntg.nl

Hi,

A simple newbie 'abstract' question: how do I set up a fresh SciTE 1.73
installation to work with the ConTeXt scripts?

And a more concrete one: how do I load context.properties?

Best,
Vyatcheslav


* Re: Microsoft Word -> Context
  2007-04-02 17:47 ` Microsoft Word -> Context Vyatcheslav Yatskovsky
@ 2007-04-02 19:54   ` Andrea Valle
  2007-04-02 19:57   ` Karsten Heymann
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Andrea Valle @ 2007-04-02 19:54 UTC (permalink / raw)
  To: mailing list for ConTeXt users



The matter of exporting to and importing from ConTeXt has been discussed
many times, but as far as I know there is no actual solution.
The ideas so far involve doc --> HTML --> ConTeXt (via XML), or
doc --> OpenOffice XML --> XML in ConTeXt.


But no Word2ConTeXt... :(

(for my needs I'd prefer ConTeXt2Word)

Best

-a-


--------------------------------------------------
Andrea Valle
--------------------------------------------------
CIRMA - DAMS
Università degli Studi di Torino
--> http://www.cirma.unito.it/andrea/
--> andrea.valle@unito.it
--------------------------------------------------





* Re: Microsoft Word -> Context
  2007-04-02 17:47 ` Microsoft Word -> Context Vyatcheslav Yatskovsky
  2007-04-02 19:54   ` Andrea Valle
@ 2007-04-02 19:57   ` Karsten Heymann
  2007-04-03  6:30     ` luigi scarso
  2007-04-02 22:35   ` Ricard Roca
  2007-04-03  7:20   ` Mari Voipio
  3 siblings, 1 reply; 10+ messages in thread
From: Karsten Heymann @ 2007-04-02 19:57 UTC (permalink / raw)
  To: Yatskovsky, mailing list for ConTeXt users

Hello Vyatcheslav,

2007/4/2, Vyatcheslav Yatskovsky <yatskovsky@gmail.com>:
> Then we would need something like Word2ConTeXt (or a macro written in VBA) to
> convert incoming papers to ConTeXt code and then assemble them easily;
> something that resembles the well-known Word2TeX application.

I've recently created such a solution for a journal, hand-crafted for a
very specific document template. They now have to pre-format every
article with this template and export it to HTML, and my converter makes
ConTeXt of it. Be aware that this required a significant amount of time
(and money, as it was contract work). But the basic idea is quite simple:

* Pre-format the doc in Word by applying special paragraph styles to
  all paragraphs (which will be mapped nicely to CSS classes)
* Export the Word doc to HTML
* Make XML from it with htmltidy
* Filter out the huge amounts of unneeded stuff (CSS stuff, DIVs and the like)
* Go through the list of paragraphs, and for each paragraph type know what to do

I've implemented it in Python (using DOM and SAX, now that I know
more, I would start with ElementTree from the beginning).
Unfortunately, as it was contract work, I cannot give out the code,
but if specific questions arise, I will gladly share my experiences.
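
The last two steps of the list above can be sketched roughly as follows. This is only an illustration, not the contract code: it assumes the tidied export is well-formed, namespace-free XML in which each Word paragraph style shows up as a class attribute on a <p> element. The style names, STYLE_MAP and convert() are hypothetical, and TeX special characters are not escaped.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word paragraph styles (exported as CSS classes)
# to ConTeXt templates; a real document template defines its own names.
STYLE_MAP = {
    "ArticleTitle": "\\chapter{{{text}}}",
    "Heading": "\\section{{{text}}}",
    "Body": "{text}\n",
}


def convert(xhtml: str) -> str:
    """Turn tidied, namespace-free XHTML into ConTeXt source."""
    root = ET.fromstring(xhtml)
    lines = ["\\starttext"]
    for p in root.iter("p"):
        # itertext() flattens inline markup (spans, fonts) to plain text;
        # split/join collapses runs of whitespace.
        text = " ".join("".join(p.itertext()).split())
        template = STYLE_MAP.get(p.get("class"))
        if template is None or not text:
            continue  # unknown style or empty paragraph: drop it
        lines.append(template.format(text=text))
    lines.append("\\stoptext")
    return "\n".join(lines)


sample = """<html><body>
<p class="ArticleTitle">On <span>Typesetting</span></p>
<p class="Body">First paragraph.</p>
<p class="Junk">ignored</p>
</body></html>"""
print(convert(sample))
```

Keeping the style-to-command table separate from the traversal means a new paragraph style in the Word template only needs a new table entry.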

Yours
Karsten


* Re: Microsoft Word -> Context
  2007-04-02 17:47 ` Microsoft Word -> Context Vyatcheslav Yatskovsky
  2007-04-02 19:54   ` Andrea Valle
  2007-04-02 19:57   ` Karsten Heymann
@ 2007-04-02 22:35   ` Ricard Roca
  2007-04-03  7:20   ` Mari Voipio
  3 siblings, 0 replies; 10+ messages in thread
From: Ricard Roca @ 2007-04-02 22:35 UTC (permalink / raw)
  To: Yatskovsky, mailing list for ConTeXt users

Hi,

There are many Word to LaTeX converters, but no Word to ConTeXt
converters. Some LaTeX converters, however, are highly configurable, and
you can teach them to write {\em instead of \emph{, or \startitemize
instead of \begin{itemize}, and so on.
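
To make the substitutions concrete, here is a small sketch of the same rewrites done as a post-processing pass over LaTeX output. It is not the configuration file of any particular converter; the RULES table and latex_to_context() are hypothetical, and only the two substitutions mentioned above (plus the matching \end{itemize}) are covered.

```python
# Illustrative rewrite rules only; a real setup would need many more.
RULES = [
    ("\\emph{", "{\\em "),
    ("\\begin{itemize}", "\\startitemize"),
    ("\\end{itemize}", "\\stopitemize"),
]


def latex_to_context(source: str) -> str:
    """Apply plain string substitutions that turn LaTeX into ConTeXt."""
    for old, new in RULES:
        source = source.replace(old, new)
    return source


print(latex_to_context("\\begin{itemize}\\item \\emph{hello}\\end{itemize}"))
```

The closing brace of \emph{...} is left alone on purpose, since it also closes the {\em ...} group.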

If you use an MS operating system, Word-to-LaTeX (Windows only) can give
you a very clean output file in a format that is almost pure ConTeXt,
just by changing the application's configuration file. You can download
the program from http://kebrt.webz.cz/programs/word-to-latex/

The mail server does not accept my config file as an attachment, so I'll
put it on a new page in the garden. It's not a definitive solution,
however. You can play with the many options.

Best,

Ricard


* Re: Microsoft Word -> Context
  2007-04-02 19:57   ` Karsten Heymann
@ 2007-04-03  6:30     ` luigi scarso
  2007-04-03  8:40       ` Karsten Heymann
  0 siblings, 1 reply; 10+ messages in thread
From: luigi scarso @ 2007-04-03  6:30 UTC (permalink / raw)
  To: mailing list for ConTeXt users

[OT, sorry]
> I've implemented it in Python (using DOM and SAX, now that I know
> more, I would start with ElementTree from the beginning).
Did you find ElementTree better than the standard modules or lxml?

>I will gladly share my experiences.
At Epen I talked (informally) with some people about OO and ConTeXt, and
last summer I played with Python and OO.
There is some interest in this topic, but I think there are still some
points that need to be worked out.
Do you have any suggestions about this?

luigi


* Re: SciTe setup
  2007-04-02 17:56 ` SciTe setup Vyatcheslav Yatskovsky
@ 2007-04-03  7:08   ` Hans Hagen
  0 siblings, 0 replies; 10+ messages in thread
From: Hans Hagen @ 2007-04-03  7:08 UTC (permalink / raw)
  To: Yatskovsky, mailing list for ConTeXt users

Vyatcheslav Yatskovsky wrote:
> Hi,
>
> A simple newbie 'abstract' question: how do I set up a fresh SciTE 1.73
> installation to work with the ConTeXt scripts?
>
> And a more concrete one: how do I load context.properties?
>
there is a scite related manual in the document collection

you can load properties files in one of the user or global properties 
files; in cdwincontext you will find scite preconfigured

Hans


* Re: Microsoft Word -> Context
  2007-04-02 17:47 ` Microsoft Word -> Context Vyatcheslav Yatskovsky
                     ` (2 preceding siblings ...)
  2007-04-02 22:35   ` Ricard Roca
@ 2007-04-03  7:20   ` Mari Voipio
  2007-04-03 21:26     ` Henning Hraban Ramm
  3 siblings, 1 reply; 10+ messages in thread
From: Mari Voipio @ 2007-04-03  7:20 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Vyatcheslav Yatskovsky wrote:
> What does the community think of this idea? And has anyone attempted to
> implement such a conversion tool?

As has been mentioned (and as you can find out by searching the mailing
list archives), this pops up once in a while and has been discussed.


However, as a former Word teacher and currently an IT support person and
power user, I must say that making a converter that really works and
saves a substantial amount of time is very hard.

Why? Because at least 70% of Word users don't seem to know how to make a 
clearly structured document. And even if they'd once known that, all 
Word versions newer than 97 make it very hard to keep the consistency, 
at least with the default settings (which most users never change) that
involve umpteen different yucky automatic features. I just got back a 
Word document that left with nice clean consistent styles and came back 
twice the size and a complete mess, thanks to Word2003...


For example, if the user has formatted the titles by hand, the human eye
easily sees that "that's a top-level heading and this is a second-level
heading", but Word thinks they are just specially formatted normal text.
Consequently, if your converter recognizes only the built-in Heading 1
style as a top-level heading, you lose that in conversion anyway (even
when converting to HTML, for example). Or, even worse, you can
half-accidentally create new styles that count as the same level in the
table of contents without looking like a heading... Ergo, everything has
to be fixed by hand anyway.


If the journal you are doing is not very complicated but the problem is
getting consistent quality, I'd do something like this:


1) Make a separate environment file with all the layout information
(this is the bit that will take a chunk of your time on the first go if
you don't have a huge amount of prior experience).

2) Mark up the Word files (journal articles) with simple typesetting codes
while they are still in Word document format; i.e. put \chapter{} around
the main title, \section{} around first-level headings, etc. And remember
to add the \starttext and \stoptext tags at the very beginning and end of
the file; as the environment is in a separate file, nothing is needed
above \starttext.

If you write a cheat sheet with examples, almost anyone can deal with
this, as long as they have some idea of how document structure works (and
your lady must have, as she has been doing it in Word). The human eye is
a lot better at discerning what is a heading than an automatic system.
I can even write that cheat sheet for you with references to the English
version of Word, if that'll help.

Now, if you have a lot of mathematics in the stuff, this may be
trickier. But then so is using MS Equation Editor; a reasonable
number of examples of 'if it looks like this, typeset it like this' could
work out.

BTW, you could probably make a VBA macro to do some of the markup job,
but it'll still only work if the original writer uses heading styles
properly! At least in a business environment this seems to be the
exception rather than the rule, especially with the newer Words that make
all kinds of deductions of their own and mess with styles and heading
levels and *everything* (frustrated? me? never...). But Word's replace
function is actually quite good: you can search for formats, use
wildcards etc., so in theory you can write a macro that looks for 14 pt
Arial bold and puts \section{ in front of it and } after it. [I've done
some HTML conversion this way, because Word's own HTML is a totally
useless mess as it doesn't do CSS...]

Note! If your files  contain graphics, for ConTeXt you have to ask 
people to send them in separately as pdf, png or jpg (instead of putting 
them inline in the Word file). I have found *this* hard to achieve once 
in a while and I still often spend substantial time chasing down 
originals of graphics I get in Word files.


3) When the basic markup is done in the Word doc where you can see how 
the writer uses styles, save the file in text format.

4) Either make sure your typist's computer has a fully functioning 
WinConTeXt (you'll have to install and adjust a bit) with Cyrillic fonts 
and everything else, or just have her do the basic markup and then 
compile on your computer.
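
The format-based markup idea from step 2 (find 14 pt Arial bold, wrap it in \section{}) can be sketched in Python rather than VBA. The sketch assumes the paragraphs and their formatting have already been read out of the Word file by some other means; the Paragraph class, the HEADING_RULES table and mark_up() are hypothetical names, and the 16 pt rule is invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    """Hypothetical stand-in for a Word paragraph and its formatting."""
    text: str
    font: str = "Times New Roman"
    size_pt: float = 12.0
    bold: bool = False


# Hypothetical rules: (font, size in pt, bold) -> ConTeXt heading command.
HEADING_RULES = {
    ("Arial", 16.0, True): "chapter",
    ("Arial", 14.0, True): "section",
}


def mark_up(paragraphs):
    """Wrap hand-formatted headings in ConTeXt commands, keep body text."""
    out = ["\\starttext"]
    for p in paragraphs:
        command = HEADING_RULES.get((p.font, p.size_pt, p.bold))
        if command:
            out.append(f"\\{command}{{{p.text}}}")
        else:
            out.append(p.text)
        out.append("")  # a blank line ends the paragraph in TeX
    out.append("\\stoptext")
    return "\n".join(out)


doc = [
    Paragraph("Introduction", "Arial", 14.0, True),
    Paragraph("Body text of the article."),
]
print(mark_up(doc))
```

As noted in step 2, rules like these only help when authors format their headings consistently; anything the table does not recognize falls through as body text and still has to be checked by eye.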

But a lot depends on how your journal looks, how complicated its content
is, and whether your typist is willing to live with having to type in
some strange tags, i.e. whether she'll want to learn anything new.
[I've found that generally my fellow office workers don't want to deal 
with *anything* like this, but professional translators have no problems 
with ConTeXt code; and anybody with html-by-hand experience usually gets 
the drift very fast.]


Having switched a very long structured file from Word to ConTeXt, I can
say that doing the layout and the basic markup takes some time. But in
the long run I have saved that time many times over. For example, when I
have to do a new manual, I can reuse my existing environment/layout
definitions; setting that up takes about 10 seconds.

For example about now I have to start writing a product manual where
some parts of the text come from an old Word file. I'll probably just cut
and paste what I need from the pdf file, but it's still faster than
fighting with Word over the original 9 MB (!) doc - and consistency can
be guaranteed, unlike if I used Word, because the old file was made with
Word 95 and 97 and we now use Word 2003, where the list functions and
styles work slightly differently and old files don't open quite as they
used to.


These are very large files even optimized, but if you are very curious, 
you can compare the following public documents that are in my domain:

Doc with Word original (attachments done in ConTeXt): 
http://www.kpatents.com/pdf/downloads/pr-01-s.pdf

Doc that was converted from Word original to ConTeXt (this was my 
"practice piece"): http://www.kpatents.com/pdf/downloads/pr-03.pdf

Similar doc with ConTeXt from the start: 
http://www.kpatents.com/pdf/downloads/pr-23.pdf


I didn't originally make the first one (it predates my employment at the 
company), but I cleaned it up, and any changes are now made by me. When 
I started converting number 2 into ConTeXt, the instruction was that the 
manuals have to look alike. I did make some layout changes partly for 
legibility (wider margins) and some for practicality (couldn't get small 
caps out of my ConTeXt, so footer is normal text, not small caps), but 
they are still fairly alike. Oh, and the first one has fixed graphic
numbering (no captions), while the others have the 'real thing'. And an
index only turned up with ConTeXt, because indexing is much easier and
more transparent in it.

NB. Cover pages are still all Word docs, I pdf them and insert into my 
ConTeXt file. One day I'll bother to learn enough that I can make the 
covers happen in ConTeXt, would make changing them a lot faster (usually 
the only change is in the version number).



I don't know if this really helps, but at least that gives you some info 
on how others do things and what kind of experiences there are round 
this particular problem.



Mari from Finland


* Re: Microsoft Word -> Context
  2007-04-03  6:30     ` luigi scarso
@ 2007-04-03  8:40       ` Karsten Heymann
  0 siblings, 0 replies; 10+ messages in thread
From: Karsten Heymann @ 2007-04-03  8:40 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Hi Luigi,

2007/4/3, luigi scarso <luigi.scarso@gmail.com>:
> [OT, sorry]
>> I've implemented it in Python (using DOM and SAX, now that I know
>> more, I would start with ElementTree from the beginning).
> Did you find ElementTree better than the standard modules or lxml?

Definitely better than the standard modules. I did not use lxml so
far, but as far as I could see, it implements the ElementTree
interface too.

Yours
Karsten


* Re: Microsoft Word -> Context
  2007-04-03  7:20   ` Mari Voipio
@ 2007-04-03 21:26     ` Henning Hraban Ramm
  0 siblings, 0 replies; 10+ messages in thread
From: Henning Hraban Ramm @ 2007-04-03 21:26 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Am 2007-04-03 um 09:20 schrieb Mari Voipio:

> Note! If your files contain graphics, for ConTeXt you have to ask
> people to send them in separately as pdf, png or jpg (instead of
> putting them inline in the Word file). I have found *this* hard to
> achieve once in a while and I still often spend substantial time
> chasing down originals of graphics I get in Word files.

A good way is to save the docs as OpenOffice docs, unzip them and  
collect the images from their folder.
But pictures in Word documents are crap anyway, most of the time.
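
A minimal sketch of that trick, assuming the document has already been saved as .odt and that the embedded images sit under a Pictures/ folder inside the archive (the usual OpenOffice layout, but worth checking on a real file). The function name and the paths are hypothetical; only Python's standard zipfile and pathlib modules are used.

```python
import zipfile
from pathlib import Path


def extract_images(odt_path, target_dir):
    """Copy every file under Pictures/ in an .odt archive to target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(odt_path) as archive:
        for name in archive.namelist():
            # Skip directory entries; keep only real files in Pictures/.
            if name.startswith("Pictures/") and not name.endswith("/"):
                destination = target / Path(name).name
                destination.write_bytes(archive.read(name))
                saved.append(destination)
    return saved
```

Calling extract_images("article.odt", "images") would then leave the collected pictures in images/, ready to be converted to pdf, png or jpg as needed.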

For my main project at work (a city magazine, typeset with InDesign)
I used to get everything as Word docs until a few issues ago. After
struggling with useless text formatting (hyperlinks! blech!) we
copy-pasted only the plain text and redid the formatting manually.
I have now written an editorial system as a web application, where the
authors have to fill in fixed text boxes (title, intro, text, infos,
author etc.). When everything is ready, I pull the whole lot from the
database and apply formatting (InDesign Tagged Text, but it could be
anything) to ease the layout work.
Event timetable data works similarly, but via XML. (Why? InDesign can
place images with XML, but not with Tagged Text, and we need some
icons in the calendar. We could use XML for everything, but InDesign
is much faster with Tagged Text.)

Of course that's no solution for most Word-to-ConTeXt cases, only a
side note...
And BTW: I really like InDesign as a layout app, but its text
handling (regarding XML or Tagged Text import) is horrible! (Badly
coded: it doesn't understand different line endings or different text
encodings, only incomplete UTF-16 without BOM and predeclared Win or
Mac line endings... XML is always whitespace-sensitive...)
Enough OT.

> [I've found that generally my fellow office workers don't want to deal
> with *anything* like this, but professional translators have no
> problems with ConTeXt code; and anybody with html-by-hand experience
> usually gets the drift very fast.]

Unfortunately even my HTML coding colleagues fear the command line.
And providing GUIs for my nice automation scripts (e.g. CD cover  
generator with ConTeXt) is tedious...

> For example about now I have to start writing a product manual where
> some parts of the text come from an old Word file. I'll probably just cut
> and paste what I need from the pdf file, but it's still faster than
> fighting with Word over the original 9 MB (!) doc - and consistency can
> be guaranteed, unlike if I used Word, because the old file was made with
> Word 95 and 97 and we now use Word 2003, where the list functions and
> styles work slightly differently and old files don't open quite as they
> used to.

Yup, I get a lot of crashes if the Word versions don't fit. I use  
TextEdit.app then to extract the text, but then (like with most other  
Word converters) you have to clean up the hyperlink and versions crap.


Greetlings from Lake Constance!
Hraban
---
http://www.fiee.net/texnique/
http://wiki.contextgarden.net
https://www.cacert.org (I'm an assurer)
