ntg-context - mailing list for ConTeXt users
From: Hans Hagen <pragma@wxs.nl>
To: ntg-context@ntg.nl
Subject: Re: EPUB XHTML Format
Date: Thu, 05 Sep 2013 15:55:08 +0200
Message-ID: <52288D3C.3040308@wxs.nl>
In-Reply-To: <CAANrE7pNxXJuZJrggCTirupUUy7uGWUE4adVw=xEu7pdHxRAxw@mail.gmail.com>

On 9/4/2013 7:55 PM, Thangalin wrote:
> Hi.
>
>     of course we could alternatively export all as <div
>     class="tag-subtag-..."> but i don't like that too much; html itself
>     is not rich enough for our purpose
>
> What about giving developers the ability to change the destination
> element? For example:
>
>     \setuplist[chapter][
>        xml={\starttag[h1]#1\stoptag}
>     ]
>
> Would produce, upon export:
>
>     <h1>Chapter</h1>

export doesn't happen at that level; something like that would add an 
ugly overhead; it's way easier to make some xslt script that converts 
the rather systematic export to something like that and it only has to 
be written once by someone (not me)
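
A minimal sketch of such a one-time conversion, written in Python rather than xslt. The element names (section, sectiontitle), the detail attribute and the HEADINGS table are assumptions made for illustration; check them against what your own export actually contains.

```python
import xml.etree.ElementTree as ET

# Hypothetical export fragment; real element and attribute names may differ.
SAMPLE = (
    '<document><section detail="chapter">'
    "<sectiontitle>Chapter</sectiontitle>"
    "</section></document>"
)

# Assumed mapping from a section's "detail" value to an HTML heading tag.
HEADINGS = {"chapter": "h1", "section": "h2", "subsection": "h3"}


def titles_to_headings(xml_text):
    """Rename the title of each known section kind to an HTML heading."""
    root = ET.fromstring(xml_text)
    for section in root.iter("section"):
        tag = HEADINGS.get(section.get("detail"))
        if tag is None:
            continue  # unknown section kinds are left untouched
        for title in section.findall("sectiontitle"):
            title.tag = tag
    return ET.tostring(root, encoding="unicode")


result = titles_to_headings(SAMPLE)
```

Because the export is systematic, a table like HEADINGS is the only part that should need adjusting per project.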

> Or (using "export" instead of "xml"; I don't care what it is named):
>
>     \setuplist[chapter][
>        export={\starttag[div]\startattribute[class]{chapter}#1\stopattribute\stoptag}
>     ]
>
> Similarly, this would produce:
>
>     <div class="chapter">Chapter</div>

you use some tex syntax but it all happens in lua; also, the only way to 
provide some kind of different tagging is to support plugins (read: lua 
functions) that could override default behaviour (but again, it's quite 
easy to do that as a postprocessing step)
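
As an illustration of the postprocessing route, the sketch below turns every element into a div whose class is built from the original tag name and, when present, a detail attribute. The function name to_divs and the use of a detail attribute are assumptions for the example, not part of the export's documented interface.

```python
import xml.etree.ElementTree as ET


def to_divs(element):
    """Rename element and all its descendants to <div class="tag[-detail]">.

    Any existing class attribute is overwritten; other attributes are kept.
    """
    for node in list(element.iter()):
        detail = node.attrib.pop("detail", None)
        original = node.tag
        node.tag = "div"
        node.set("class", original if detail is None else f"{original}-{detail}")
    return element


# Hypothetical input; real export element names may differ.
root = ET.fromstring(
    '<section detail="chapter"><sectiontitle>Chapter</sectiontitle></section>'
)
converted = ET.tostring(to_divs(root), encoding="unicode")
```

The same loop is where a per-tag override table could be consulted, which is roughly what a plugin mechanism would have to offer anyway.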

> This would offer the flexibility of custom XML documents without
> affecting the default behaviour.
>
>            * Generates XHTML headers (including <!DOCTYPE and <html...>)
>
>     not needed as we're 'standalone'
>
> Having the ability to produce the <!DOCTYPE...> and <html> elements
> could be as simple as:
>
>     \setupexport[
>        standalone=no,
>     ]
>
>            * Produces images as img tags, rather than float tags.
>
>     the css can deal with them (info is written to files for that)
>
> Yes, but they aren't standard. There is an ecosystem of tools (e.g.,
> Calibre, normalizing CSS templates, etc.), not to mention a widespread
> knowledge-base, that groks the minimal XHTML specification. Plus, using
> XML tags that are not in the minimal XHTML spec. means more testing on
> more devices to make sure that their XHTML parsers render correctly.

most of the xml we get here is a funny mix of whatever tags and html 
(often for tables) and normally there is way more structure than in the 
average html document; the export is meant to be close to the source and 
turning it into some html / div mixture makes it messy

for instance, we have more levels than H1..H6, so how to do H7? if 
someone has to deal with that, he/she can as well transform all into H1 
with some class which is a local solution then
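
One possible local solution, sketched under the assumption that the transformation knows each heading's nesting depth as an integer: clamp the tag at a chosen maximum and keep the real depth in a class. The function heading_for and the class naming are hypothetical.

```python
def heading_for(level, max_level=6):
    """Return (tag, css_class) for a heading at the given nesting depth.

    Depths beyond max_level reuse the deepest allowed tag; the class
    always records the real depth so css can still tell them apart.
    """
    if level < 1:
        raise ValueError("level must be 1 or greater")
    return f"h{min(level, max_level)}", f"level-{level}"


deep = heading_for(7)              # clamps to h6
flat = heading_for(7, max_level=1) # everything becomes h1 plus a class
```

Passing max_level=1 gives the "all into H1 with some class" variant described above.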

>     xhtml has no typical tags .. it's xml + css (or xslt) ...
>     unfortunately browsers have
>
> That is, a Strictly Conforming XHTML Document, as per:
>
> http://www.w3.org/TR/2000/REC-xhtml1-20000126/#docconf
>
>     the export of context is in fact just xml, and by tagging it as
>     xhtml we can apply css to it; but if someone has a workflow for
>     producing epub an option is to postprocess that xml file into
>     whatever epub one wants

indeed. that was the idea: export xml, tag it as xhtml (with the option 
to provide hyperlinks, an exception), provide some standard css as 
starter and then let users deal with matters the way they like; you can 
be pretty sure that what you want is not the same as what someone else 
wants; and if more people want it, they can together write a 
transformation script (or hire someone)

keep in mind that the export itself is already tricky enough and for me 
it doesn't pay off to provide tons of additional functionality (well, it 
doesn't pay off to export anyway)

> I could transform the ConTeXt-generated XML into strictly conforming
> XHTML, but it was a step I was hoping to avoid. Right now my process is:
>
>  1. Convert XML data to a ConTeXt .tex file.
>  2. Convert ConTeXt to either PDF or EPUB.
>  3. Stylize EPUB using CSS.

but writing the transform that suits you is just one step (with you 
spending the time on it) while extending the export into a complete 
transformation and configuration thing would put the burden on me -)

> I want to use ConTeXt here (instead of going directly from XML data to
> EPUB) because ConTeXt provides functionality such as multiple indexes,
> table-of-contents, and bundling the .epub. Having an extra step to
> generate strictly conforming XHTML is architecturally painful as it
> means transforming the document three times (XML -> ConTeXt, ConTeXt ->
> XML, then XML -> XHTML).

why is it painful? the export is quite generic and will not change; it 
is also flexible as it honors user defined sectioning and styling

>     Every time we look into epub there's another issue ... it's not a
>     standard but a reverse-engineered application mess (happens often
>     with xml: turn some application data structures into xml and call it
>     a standard)
>
>
> Some book vendors only accept validating EPUBs. ConTeXt is documented as
> being able to generate EPUBs. The documentation should state the EPUBs
> do not validate and do not generate strictly conforming XHTML.

well, i, luigi and some others did tests: the thing is that epub is 
evolving and we had quite some conflicting validations (and specs) and 
we try as well as possible to adapt

so you need to be more precise in "doesn't validate": it's proper xml 
and therefore proper xhtml (and nothing says that there should be html 
tags)
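
To make "doesn't validate" more precise, it can help to separate two questions: does the file parse as xml at all, and which element names fall outside the html vocabulary a given reader expects. The sketch below does both. KNOWN_HTML is a small illustrative set, not the xhtml element list, and neither function is a substitute for a real epub or dtd validator.

```python
import xml.etree.ElementTree as ET

# Illustrative subset only; extend it to whatever your target reader accepts.
KNOWN_HTML = {"html", "head", "title", "body", "div", "span", "p", "a", "img",
              "h1", "h2", "h3", "h4", "h5", "h6", "table", "tr", "td"}


def is_well_formed(text):
    """True if text parses as xml (properly nested and matched tags)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def unknown_tags(text):
    """Sorted element names that are not in KNOWN_HTML.

    Namespaced elements show up as "{namespace}name" and are therefore
    reported as unknown by this simple comparison.
    """
    root = ET.fromstring(text)
    return sorted({node.tag for node in root.iter()} - KNOWN_HTML)


ok = is_well_formed("<section><sectiontitle>A</sectiontitle></section>")
broken = is_well_formed("<p><b>x</p></b>")
extra = unknown_tags("<div><section/><p/></div>")
```

A report from unknown_tags is also a reasonable starting list for whatever transformation gets written.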

> I have spent the last three weeks converting documents from LaTeX to
> ConTeXt because the documentation stated that ConTeXt can produce EPUBs.
> While true, the documentation did not mention its shortcomings. Had I
> known in advance, I probably would have gone straight to EPUB using Java
> or, with a little revulsion, PHP classes. ;-) That said, I probably
> should have tested this feature sooner. :-)

the export is a reconstruction of the input, and the more structure the 
better; if you really need a multiple output format, you should use xml 
as source and then use context for pdf creation and xslt for html creation

i really see no problem with a transformation from the generic export to 
some epub (whatever variant your whatever device supports) ... really: 
you cannot expect me to provide an extensive configurable export system 
(for only one user) that will never suit all users so ... also, 
configuring it for some document is probably as much work as writing an 
xslt transformation

>     as i have no real use/demand for epub it's not something i look into
>     on a daily basis
>
>
> How can I help resolve these issues?
>
> Merely "testing" (which I am happy to do) isn't going to produce a
> strictly conforming XHTML document.

indeed it isn't producing an html document (with properly matched tags) 
but i'm not convinced that it isn't xhtml

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------