From: "Idris Samawi Hamid"
Newsgroups: gmane.comp.tex.context
Subject: Re: Doc to ConTeXt [was Re: HTML to ConTeXt]
Date: Fri, 09 Nov 2007 20:33:58 -0700
Organization: Colorado State University
To: "mailing list for ConTeXt users"

On Fri, 09 Nov 2007 18:30:36 -0700, Andrea Valle wrote:

> After wasting my time with an awful PDF-to-HTML converter from
> Acrobat, I discovered this, which you may all know already:
> http://pdftohtml.sourceforge.net/

Looks impressive...

> The HTML conversion is very, very good, both in the rendered result
> and in the source, but after some tweaking I got interested in the
> XML conversion it also offers.
> The XML format essentially encodes page-layout information; typically
> each line is an element. Plus, bold and italics are marked simply as
> <b> and <i>.
> I'm still struggling to understand XML processing in ConTeXt at a
> really practical level, so I switched back to Python.
> I used an incremental SAX parser with some replacements.
> This is today's draft.
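Andrea's actual script is not shown in the thread. As a rough illustration of the SAX approach he describes, here is a minimal Python sketch. It assumes the text/b/i element names that pdftohtml's XML mode emits (check them against your own output); the names Pdf2TexHandler, join_lines and convert are hypothetical. The sketch feeds the XML in chunks to an incremental xml.sax parser, turns each text element (one PDF line) into TeX with {\bf ...} and {\em ...} groups, and rejoins words hyphenated at a line break. Paragraph and footnote recovery are deliberately left out.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Characters that ConTeXt accepts with a plain backslash prefix in text.
# Backslash, braces, ^ and ~ are NOT escaped in this sketch (assumption:
# the extracted text does not contain them).
TEX_ESCAPES = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_"}


def escape_tex(s):
    """Escape the TeX special characters listed in TEX_ESCAPES."""
    return "".join(TEX_ESCAPES.get(ch, ch) for ch in s)


class Pdf2TexHandler(ContentHandler):
    r"""Hypothetical SAX handler: one <text> element = one PDF line.

    <b>...</b> becomes {\bf ...} and <i>...</i> becomes {\em ...}.
    """

    def __init__(self):
        super().__init__()
        self.lines = []       # finished lines, already converted to TeX
        self.current = None   # pieces of the line being read, or None

    def startElement(self, name, attrs):
        if name == "text":
            self.current = []
        elif self.current is not None and name == "b":
            self.current.append(r"{\bf ")
        elif self.current is not None and name == "i":
            self.current.append(r"{\em ")

    def endElement(self, name):
        if name == "text" and self.current is not None:
            self.lines.append("".join(self.current).strip())
            self.current = None
        elif self.current is not None and name in ("b", "i"):
            self.current.append("}")

    def characters(self, content):
        # SAX may deliver one text node in several calls; just append.
        if self.current is not None:
            self.current.append(escape_tex(content))


def join_lines(lines):
    """Join PDF lines into running text, undoing end-of-line hyphenation.

    Heuristic: a line ending in "-" followed by a line starting with a
    lowercase letter is glued without the hyphen, which also (wrongly)
    merges genuine compounds such as "well-" / "known".
    """
    out = ""
    for line in lines:
        if not line:
            continue
        if out.endswith("-") and line[:1].islower():
            out = out[:-1] + line
        elif out:
            out += " " + line
        else:
            out = line
    return out


def convert(chunks):
    """Feed XML byte chunks to an incremental SAX parser; return TeX text."""
    handler = Pdf2TexHandler()
    parser = xml.sax.make_parser()  # expat reader, supports feed()/close()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return join_lines(handler.lines)


# Usage (the file name is hypothetical), reading the XML in 64 KiB chunks:
#   with open("02imp.xml", "rb") as f:
#       print(convert(iter(lambda: f.read(65536), b"")))
```

Feeding chunks rather than the whole file keeps memory flat for long documents, which is the point of using an incremental parser here.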
> Original:
> http://www.semiotiche.it/andrea/membrana/02%20imp.pdf
>
> Recomposed (no setup at all, only \enableregime[utf]):
> http://www.semiotiche.it/andrea/membrana/02imp.pdf

Looks VERY impressive... Tell me, how did you set up the cropmarks etc.?

> pdf --> pdftoxml --> xml --> python script --> tex --> pdf
>
> I recovered paragraphs, bold, emphasis and footnotes, stripping dashes
> and reassembling the text with the footnote references.

Not bad as a first step. Did you also try pdftohtml --> html --> context?

> I guess you XML gurus could do this much more easily and cleanly.
> So, just for my very specific needs, I can probably take Word
> sources, convert them to PDF and then finally reach ConTeXt, as
> discussed.

Again, very nice stuff!

Best wishes
Idris
-- 
Professor Idris Samawi Hamid, Editor-in-Chief
International Journal of Shi`i Studies
Department of Philosophy
Colorado State University
Fort Collins, CO 80523
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________