From: "Karsten Heymann"
To: Yatskovsky, "mailing list for ConTeXt users"
Newsgroups: gmane.comp.tex.context
Subject: Re: Microsoft Word -> Context
Date: Mon, 2 Apr 2007 21:57:56 +0200

Hello Vyatcheslav,

2007/4/2, Vyatcheslav Yatskovsky:
> Then, we need something like Word2ConText (or a macro written in VBA) to convert
> incoming papers to ConText code and then easily assemble them. Something, that
> resembles famous Word2Tex application.

I've recently created such a solution for a journal, hand-crafted to a very specific document template. They now have to pre-format every article with this template and export it to HTML, and my converter makes ConTeXt of it. Be aware that this required a significant amount of time (and money, as it was contract work). But the basic idea is quite simple:

* preformat the doc in Word by applying special paragraph styles to all paragraphs (which will be mapped nicely to CSS classes)
* export the Word doc to HTML
* make XML from it with htmltidy
* filter out the huge amounts of unneeded stuff (CSS stuff, DIVs and the like)
* go through the list of paragraphs, and for each paragraph type know what to do

I've implemented it in Python (using DOM and SAX; now that I know more, I would start with ElementTree from the beginning).

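
To make the last two steps a bit more concrete, here is a minimal sketch of how they might look with ElementTree. It is not the converter described above, only an illustration under a few assumptions: the input is already well-formed XHTML (for example after running it through tidy with XML output), the style names `ArticleTitle`, `SectionHead` and `BodyText` are made up, and so is the mapping to ConTeXt commands. Escaping of TeX special characters is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: Word paragraph style (exported as a CSS class)
# -> (text before, text after) in the ConTeXt output.
STYLE_MAP = {
    "ArticleTitle": ("\\title{", "}"),
    "SectionHead": ("\\section{", "}"),
    "BodyText": ("", ""),
}


def local_name(tag):
    """Strip an XML namespace such as '{http://www.w3.org/1999/xhtml}p'."""
    return tag.rsplit("}", 1)[-1]


def paragraph_text(p):
    """Flatten nested markup (spans etc.) and collapse whitespace."""
    return " ".join("".join(p.itertext()).split())


def convert(xhtml):
    """Turn pre-styled XHTML paragraphs into ConTeXt source."""
    root = ET.fromstring(xhtml)
    out = []
    for elem in root.iter():
        if local_name(elem.tag) != "p":
            continue  # DIVs, style blocks and the like are simply ignored
        mapping = STYLE_MAP.get(elem.get("class"))
        if mapping is None:
            continue  # unknown style; a real tool should probably report it
        before, after = mapping
        # Note: no escaping of %, &, # etc. in this sketch.
        out.append(before + paragraph_text(elem) + after)
    return "\n\n".join(out)


# Made-up sample input, roughly what tidied Word HTML could look like.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="Section1">
<p class="ArticleTitle"><span style="font-size:14pt">Word to ConTeXt</span></p>
<p class="SectionHead">Introduction</p>
<p class="BodyText">Some <span lang="EN-US">body</span>
text.</p>
<p class="MsoNormal">ignored</p>
</div>
</body></html>"""

if __name__ == "__main__":
    print(convert(SAMPLE))
```

The point of the table-driven approach is that the journal's template and the converter only have to agree on the style names; anything not styled with a known paragraph style is dropped (or, better, reported back to whoever prepared the article).
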
Unfortunately, as it was contract work, I cannot give out the code, but if specific questions arise, I will gladly share my experiences.

Yours
Karsten