Re: EPUB XHTML Format

From: Hans Hagen <pragma@wxs.nl>
To: ntg-context@ntg.nl
Subject: Re: EPUB XHTML Format
Date: Thu, 05 Sep 2013 15:55:08 +0200
Message-ID: <52288D3C.3040308@wxs.nl>
In-Reply-To: <CAANrE7pNxXJuZJrggCTirupUUy7uGWUE4adVw=xEu7pdHxRAxw@mail.gmail.com>

On 9/4/2013 7:55 PM, Thangalin wrote:
> Hi.
>
>     of course we could alternatively export all as <div
>     class="tag-subtag-..."> but i don't like that too much; html itself
>     is not rich enough for our purpose
>
> What about giving developers the ability to change the destination
> element? For example:
>
>     \setuplist[chapter][
>        xml={\starttag[h1]#1\stoptag}
>     ]
>
> Would produce, upon export:
>
>     <h1>Chapter</h1>

export doesn't happen at that level; something like that would add an 
ugly overhead; it's way easier to make some xslt script that converts 
the rather systematic export to something like that and it only has to 
be written once by someone (not me)
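
a minimal sketch of such a conversion, written in python instead of xslt 
to keep it short; the element names in TAG_MAP and in the sample input 
are assumptions made up for illustration, not the actual export 
vocabulary, and namespaces and attributes are ignored:

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from export element names to (html tag, css class).
# These names are illustrative assumptions, not the real export vocabulary.
TAG_MAP = {
    "document": ("div", "document"),
    "sectiontitle": ("h1", "chapter"),
    "paragraph": ("p", None),
}


def convert(node):
    """Return a copy of node with mapped tags.

    Unmapped elements become <div class="original-tag-name">.
    Source attributes are dropped in this sketch.
    """
    tag, css = TAG_MAP.get(node.tag, ("div", node.tag))
    out = ET.Element(tag)
    if css:
        out.set("class", css)
    out.text = node.text
    out.tail = node.tail
    for child in node:
        out.append(convert(child))
    return out


sample = (
    "<document>"
    "<sectiontitle>Chapter</sectiontitle>"
    "<paragraph>Text</paragraph>"
    "</document>"
)
result = ET.tostring(convert(ET.fromstring(sample)), encoding="unicode")
print(result)
```

the mapping table is the only part that depends on the document, which 
is why such a script only has to be written once per target format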

> Or (using "export" instead of "xml"; I don't care what it is named):
>
>     \setuplist[chapter][
>        export={\starttag[div]\startattribute[class]{chapter}#1\stopattribute\stoptag}
>     ]
>
> Similarly, this would produce:
>
>     <div class="chapter">Chapter</div>

you use some tex syntax but it all happens in lua; also, the only way to 
provide some kind of different tagging is to support plugins (read: lua 
functions) that could override default behaviour (but again, it's quite 
easy to do that as a postprocessing step)

> This would offer the flexibility of custom XML documents without
> affecting the default behaviour.
>
>            * Generates XHTML headers (including <!DOCTYPE and <html...>)
>
>     not needed as we're 'standalone'
>
> Having the ability to produce the <!DOCTYPE...> and <html> elements
> could be as simple as:
>
>     \setupexport[
>        standalone=no,
>     ]
>
>            * Produces images as img tags, rather than float tags.
>
>     the css can deal with them (info is written to files for that)
>
> Yes, but they aren't standard. There is an ecosystem of tools (e.g.,
> Calibre, normalizing CSS templates, etc.), not to mention a widespread
> knowledge-base, that groks the minimal XHTML specification. Plus, using
> XML tags that are not in the minimal XHTML spec. means more testing on
> more devices to make sure that their XHTML parsers render correctly.

most of the xml we get here is a funny mix of whatever tags and html 
(often for tables) and normally there is way more structure than in the 
average html document; the export is meant to be close to the source and 
turning it into some html / div mixture makes it messy

for instance, we have more levels than H1..H6, so how to do H7? if 
someone has to deal with that, he/she can as well transform all into H1 
with some class which is a local solution then
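
as a sketch of such a local solution (python, with a hypothetical 
function name and a "level-N" class naming scheme that is only an 
assumption): either clamp deep levels to h6, or flatten everything to h1 
and let the class carry the depth

```python
def heading_for(level, flatten=False):
    """Map a section depth (1, 2, 3, ...) to an (html tag, css class) pair.

    html only defines h1..h6, so deeper levels need a policy:
    - flatten=False: clamp the tag to h6, keep the real depth in the class
    - flatten=True:  use h1 for every level, keep the depth in the class
    """
    if level < 1:
        raise ValueError("level must be >= 1")
    css = f"level-{level}"
    if flatten:
        return "h1", css
    return f"h{min(level, 6)}", css


print(heading_for(2))                # ('h2', 'level-2')
print(heading_for(7))                # ('h6', 'level-7')
print(heading_for(7, flatten=True))  # ('h1', 'level-7')
```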

>     xhtml has no typical tags .. it's xml + css (or xslt) ...
>     unfortunately browsers have
>
> That is, a Strictly Conforming XHTML Document, as per:
>
> http://www.w3.org/TR/2000/REC-xhtml1-20000126/#docconf
>
>     the export of context is in fact just xml, and by tagging it as
>     xhtml we can apply css to it; but if someone has a workflow for
>     producing epub an option is to postprocess that xml file into
>     whatever epub one wants

indeed. that was the idea: export xml, tag it as xhtml (with the option 
to provide hyperlinks, an exception), provide some standard css as 
starter and then let users deal with matters the way they like; you can 
be pretty sure that what you want is not the same as what someone else 
wants; and if more people want it, they can together write a 
transformation script (or hire someone)

keep in mind that the export itself is already tricky enough and for me 
it doesn't pay off to provide tons of additional functionality (well, it 
doesn't pay off to export anyway)

> I could transform the ConTeXt-generated XML into strictly conforming
> XHTML, but it was a step I was hoping to avoid. Right now my process is:
>
>  1. Convert XML data to a ConTeXt .tex file.
>  2. Convert ConTeXt to either PDF or EPUB.
>  3. Stylize EPUB using CSS.

but writing the transform that suits you is just one step (with you 
spending the time on it) while extending the export into a complete 
transformation and configuration thing would put the burden on me -)

> I want to use ConTeXt here (instead of going directly from XML data to
> EPUB) because ConTeXt provides functionality such as multiple indexes,
> table-of-contents, and bundling the .epub. Having an extra step to
> generate strictly conforming XHTML is architecturally painful as it
> means transforming the document three times (XML -> ConTeXt, ConTeXt ->
> XML, then XML -> XHTML).

why is it painful? the export is quite generic and will not change; it 
is also flexible as it honors user defined sectioning and styling

>     Every time we look into epub there's another issue ... it's not a
>     standard but a reverse engineered application mess (happens often
>     with xml: turn some application data structures into xml and call it
>     a standard)
>
>
> Some book vendors only accept validating EPUBs. ConTeXt is documented as
> being able to generate EPUBs. The documentation should state the EPUBs
> do not validate and do not generate strictly conforming XHTML.

well, i, luigi and some others did tests: the thing is that epub is 
evolving and we had quite some conflicting validations (and specs) and 
we try to adapt as well as possible

so you need to be more precise in "doesn't validate": it's proper xml 
and therefore proper xhtml (and nothing says that there should be html 
tags)
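
to make the distinction concrete: the sketch below (python, standard 
library only, hypothetical function name) checks well-formedness, i.e. 
"proper xml"; it does not check a file against any schema or dtd and 
says nothing about whether a particular epub validator accepts it

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as xml, False otherwise.

    This is a well-formedness check only; no dtd or schema is consulted.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b/></a>"))   # True
print(is_well_formed("<a><b></a>"))    # False, <b> is never closed
```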

> I have spent the last three weeks converting documents from LaTeX to
> ConTeXt because the documentation stated that ConTeXt can produce EPUBs.
> While true, the documentation did not mention its shortcomings. Had I
> known in advance, I probably would have gone straight to EPUB using Java
> or, with a little revulsion, PHP classes. ;-) That said, I probably
> should have tested this feature sooner. :-)

the export is a reconstruction of the input, and the more structure the 
better; if you really need multiple output formats, you should use xml 
as source and then use context for pdf creation and xslt for html creation

i really see no problem with a transformation from the generic export to 
some epub (whatever variant your whatever device supports) ... really: 
you cannot expect me to provide an extensive configurable export system 
(for only one user) that will never suit all users so ... also, 
configuring it for some document is probably as much work as writing an 
xslt transformation

>     as i have no real use/demand for epub it's not something i look into
>     on a daily basis
>
>
> How can I help resolve these issues?
>
> Merely "testing" (which I am happy to do) isn't going to produce a
> strictly conforming XHTML document.

indeed it isn't producing an html document (with properly matched tags) 
but i'm not convinced that it isn't xhtml

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------