From: "Idris Samawi Hamid" <ishamid@colostate.edu>
To: "mailing list for ConTeXt users" <ntg-context@ntg.nl>
Subject: Re: Doc to ConTeXt [was Re: HTML to ConTeXt]
Date: Fri, 09 Nov 2007 20:33:58 -0700
Message-ID: <op.t1j326pmnx1yh1@your-b27fb1c401>
In-Reply-To: <9AE14A44-6B23-450B-B3E5-DEE82341AFBC@di.unito.it>
On Fri, 09 Nov 2007 18:30:36 -0700, Andrea Valle <valle@di.unito.it> wrote:
> After wasting my time with an awful pdf to html converter by
> Acrobat, I discovered this, which you may all already know:
> http://pdftohtml.sourceforge.net/
Looks impressive...
> The html conversion is very, very good, both in the resulting rendering
> and in the sources, but after some tweaking I got interested in the xml
> conversion it allows.
> The xml format essentially encodes the page-related information;
> typically each line is an element. Plus, bold and italics are
> marked simply as <b> and <i>.
> I'm still struggling to get anything really practical out of XML
> processing in ConTeXt, so I switched back to Python.
> I used an incremental sax parser with some replacements.
> This is today's draft.
> Original:
> http://www.semiotiche.it/andrea/membrana/02%20imp.pdf
>
> Recomposed (no setup at all, only \enableregime[utf]):
> http://www.semiotiche.it/andrea/membrana/02imp.pdf
Looks VERY impressive... Tell me, how did you set up the cropmarks etc.?
> pdf --> pdftoxml --> xml --> python script --> tex --> pdf
>
> I recovered par, bold, em, footnotes, stripping dashes and
> reassembling the text with footnote references. Not bad as a first step.
Did you also try pdftohtml --> html --> context?
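For readers following the thread, here is a minimal sketch of the kind of script described above. It is not Andrea's actual code. It assumes the XML wraps each line in a <text> element with inline <b> and <i>, as the quoted description suggests; the class and function names are hypothetical, and {\bf ...} / {\em ...} are simply one plausible ConTeXt mapping. Footnote recovery and escaping of TeX special characters are left out.

```python
import xml.sax


class LineHandler(xml.sax.ContentHandler):
    """Collect the text of each <text> element, mapping <b>/<i> to ConTeXt."""

    # Assumed mapping from inline tags to ConTeXt font switches.
    OPEN = {"b": "{\\bf ", "i": "{\\em "}

    def __init__(self):
        super().__init__()
        self.lines = []      # one string per <text> element
        self.current = None  # pieces of the line being read, or None

    def startElement(self, name, attrs):
        if name == "text":
            self.current = []
        elif name in self.OPEN and self.current is not None:
            self.current.append(self.OPEN[name])

    def endElement(self, name):
        if name == "text" and self.current is not None:
            self.lines.append("".join(self.current))
            self.current = None
        elif name in self.OPEN and self.current is not None:
            self.current.append("}")

    def characters(self, content):
        # Text outside <text> elements (layout whitespace) is ignored.
        if self.current is not None:
            self.current.append(content)


def join_lines(lines):
    """Reassemble one paragraph, rejoining words split by a line-end dash."""
    out = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if out.endswith("-"):
            out = out[:-1] + line  # naive: also removes genuine hyphens
        elif out:
            out += " " + line
        else:
            out = line
    return out


def xml_to_context(xml_text):
    """Feed the XML to an incremental SAX parser and return ConTeXt source."""
    handler = LineHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.feed(xml_text)  # feed() may be called repeatedly with chunks
    parser.close()
    return join_lines(handler.lines)
```

Given two <text> lines reading "This is a <b>bold</b> exam-" and "ple with <i>emphasis</i>.", xml_to_context returns "This is a {\bf bold} example with {\em emphasis}." Treating every line-end dash as hyphenation is the simplest choice and will be wrong for real compound words, so a word list or manual check would be needed in practice.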
> I guess that you xml gurus could probably do it much more easily and cleanly.
> So, I mean, just for my very specific needs, I can probably take
> Word sources, convert to pdf and then finally reach ConTeXt as
> discussed.
Again, very nice stuff!
Best wishes
Idris
--
Professor Idris Samawi Hamid, Editor-in-Chief
International Journal of Shi`i Studies
Department of Philosophy
Colorado State University
Fort Collins, CO 80523
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/