public inbox archive for pandoc-discuss@googlegroups.com
* best strategy to convert several HTML files
@ 2014-06-20 23:12 Paulo Ney de Souza
From: Paulo Ney de Souza @ 2014-06-20 23:12 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

My previous question is closely related to the (probably deeper) question
of how to best convert several HTML files that link to one another - which
arises very often when dealing with CHM (Compiled HTML) and HTMLZ formats.

The strategy I use right now to convert a set of heavily interlinked HTML
files is to use Calibre to convert to HTMLZ, fix some links by hand or by
script, and then use Pandoc to convert to LaTeX, where one is freer to
generate TOCs, indexes, etc.

    Multi-interlinked HTML -------> HTMLZ ------> LaTeX
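
A minimal sketch of that path as a script, for illustration only. It assumes
Calibre's `ebook-convert` and `pandoc` are on the PATH, and that the HTMLZ
archive unpacks to a single `index.html` (which is what I understand Calibre
to produce, but check your own output). The function names, `book.htmlz`,
`book.tex` and the `htmlz` directory are made up for this example.

```python
import subprocess
import zipfile
from pathlib import Path


def build_commands(source, workdir):
    """Return the Calibre and Pandoc command lines for HTML -> HTMLZ -> LaTeX.

    All file names under workdir are hypothetical, chosen for this sketch.
    """
    workdir = Path(workdir)
    htmlz = workdir / "book.htmlz"
    index = workdir / "htmlz" / "index.html"  # assumed name inside the HTMLZ
    tex = workdir / "book.tex"
    calibre = ["ebook-convert", str(source), str(htmlz)]
    pandoc = ["pandoc", "-f", "html", "-t", "latex", "-s", "--toc",
              "-o", str(tex), str(index)]
    return calibre, pandoc


def convert(source, workdir):
    """Run both steps; the link fixing still has to happen in between."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    calibre, pandoc = build_commands(source, workdir)
    subprocess.run(calibre, check=True)
    # HTMLZ is a zip archive, so it can be unpacked with the standard library.
    with zipfile.ZipFile(workdir / "book.htmlz") as archive:
        archive.extractall(workdir / "htmlz")
    # Fix broken links in the unpacked HTML here, by hand or by script.
    subprocess.run(pandoc, check=True)
```

Calling `convert("manual.chm", "out")` would then leave `out/book.tex`,
provided both tools accept the input.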

This is about to become a moot point because the ePub reader will have to
address the issue in some form, so converting to ePub is probably going to
be a better path. But I can only imagine how hard it will be to hunt down
one wrongly encoded character in an ePub package. It would be nice if
Pandoc could do just as Calibre does: on seeing a wrongly encoded
character, continue, write a message to stderr, and report the file (in
the package) where it occurred, along with the line and position.
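
Until something like that exists, the hunt can be done outside Pandoc. Below
is a rough sketch, not Pandoc or Calibre code, that walks the text files in an
ePub (a zip archive) and reports every byte sequence that is not valid UTF-8,
with file, line and byte column. It assumes the package is meant to be UTF-8
throughout; `find_bad_bytes`, `scan_epub` and the suffix list are names I made
up for the example.

```python
import sys
import zipfile

# Assumed set of text-like members worth checking; adjust as needed.
TEXT_SUFFIXES = (".html", ".xhtml", ".htm", ".opf", ".ncx", ".css", ".xml")


def find_bad_bytes(data):
    """Yield (line, byte_column, bad_bytes) for each invalid UTF-8 sequence."""
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        pos = 0
        while pos < len(line):
            try:
                line[pos:].decode("utf-8")
                break  # the rest of the line is clean
            except UnicodeDecodeError as err:
                start = pos + err.start
                end = pos + err.end
                yield lineno, start + 1, line[start:end]
                pos = end  # continue scanning after the bad sequence


def scan_epub(epub):
    """Report bad sequences to stderr and return them as a list of tuples.

    `epub` may be a path or a binary file object.
    """
    hits = []
    with zipfile.ZipFile(epub) as package:
        for name in package.namelist():
            if not name.lower().endswith(TEXT_SUFFIXES):
                continue
            for line, col, bad in find_bad_bytes(package.read(name)):
                print(f"{name}:{line}:{col}: invalid UTF-8 bytes {bad!r}",
                      file=sys.stderr)
                hits.append((name, line, col, bad))
    return hits
```

Columns are counted in bytes, not characters, so they may be off from what an
editor shows on lines that contain multi-byte characters before the bad one.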

It would also be really nice if the ePub reader could deal with CHM and
HTMLZ, since the differences between the formats are very small.

Has anyone done any of these types of conversions before? What is your best
strategy for translating a set of linked HTML files?

Paulo Ney
