From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailout-webserver.scc.kit.edu (mailout-webmail.scc.kit.edu [129.13.185.232])
    by krisdoz.my.domain (8.14.5/8.14.5) with ESMTP id s7AGDHbg020269
    for ; Sun, 10 Aug 2014 12:13:18 -0400 (EDT)
Received: from hekate.usta.de (asta-nat.asta.uni-karlsruhe.de [172.22.63.82])
    by scc-mailout-02.scc.kit.edu with esmtp (Exim 4.72 #1)
    id 1XGVkC-0005lg-Ui; Sun, 10 Aug 2014 18:13:16 +0200
Received: from donnerwolke.usta.de ([172.24.96.3])
    by hekate.usta.de with esmtp (Exim 4.77) (envelope-from )
    id 1XGVkC-0008P1-TY; Sun, 10 Aug 2014 18:13:16 +0200
Received: from iris.usta.de ([172.24.96.5] helo=usta.de)
    by donnerwolke.usta.de with esmtp (Exim 4.72) (envelope-from )
    id 1XGVkC-0003ed-Rq; Sun, 10 Aug 2014 18:13:16 +0200
Received: from schwarze by usta.de with local (Exim 4.77) (envelope-from )
    id 1XGVjR-0000JC-Vq; Sun, 10 Aug 2014 18:12:30 +0200
Date: Sun, 10 Aug 2014 18:12:29 +0200
From: Ingo Schwarze
To: Kristaps Dzonsons
Cc: discuss@mdocml.bsd.lv
Subject: Re: HTML5
Message-ID: <20140810161229.GD325@iris.usta.de>
References: <53E6AFDD.8010001@bsd.lv> <20140810022307.GC32716@iris.usta.de> <53E75A07.2070907@bsd.lv>
X-Mailinglist: mdocml-discuss
Reply-To: discuss@mdocml.bsd.lv
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <53E75A07.2070907@bsd.lv>
User-Agent: Mutt/1.5.21 (2010-09-15)

Hi Kristaps,

Kristaps Dzonsons wrote on Sun, Aug 10, 2014 at 01:39:51PM +0200:

> I agree. In short, unifying under HTML5 will simplify the code (no
> switching)--that much is clear. I don't care whether it's HTML5 or
> asciidoc

I care that we stay away from asciidoc. :-)

> so long as it gets the job done. And for browsers, it's
> between flavours of HTML. So let's consider why HTML5 instead of
> just HTML4 or XHTML1 as-is.
>
> First, note that the patch's HTML5 is called "polyglot" HTML5, which
> is to say, HTML5 with XML syntax.
>
> (Link: .)
>
> A "pro" is that polyglot HTML5 has the same doctype *and content
> type* for its XHTML and HTML modes. So we can create well-formed,
> parseable HTML5 mark-up using strict XML syntax, then serve it with
> text/html and be happily standards-compliant. As it is, we put a
> burden on the agency serving -Thtml or -Txhtml pages to know the
> difference.
>
> The "con" is that by unifying -Thtml/xhtml as HTML5--or anything,
> really--we lose strict HTML4 callers of -Ofragment (XHTML1 callers
> would be fine). The only caller right now is cgi.c, which
> stipulates HTML4. This can be fixed easily: remove the HTML4 parts
> of cgi.c's DOCTYPE and close the void img, meta, and link
> elements. See the "pro" above for why that's also a smart idea.

Fixing that sounds easy indeed and does not seem to have a downside.

As far as i see, the only thing that's really messed up between
HTML 4.01 and XHTML (and hence polyglot HTML 5) is void elements.
Void elements have to be <br/> in polyglot. That parses as <br>>
in HTML 4.01 if i understand correctly, so strictly speaking, a
document that is supposed to be both valid polyglot and valid
HTML 4.01 cannot use any void elements. But i think ignoring that
detail and just shrugging our shoulders with respect to the extra >
that HTML 4.01 parsing would output seems the best we can do, and
good enough. It is not likely to become a problem in practice,
i think.

> Another "pro" is that we get eqn. Ingo, this is the "feature"
> you're worried about.

Not really. I didn't think about that. What i meant is HTML 5 only
syntax creeping in for stuff that can be rendered in HTML 4 because
it looks a bit better or even just because it's more modern.

There is no sane way to render eqn in HTML 4, it's beyond the scope.
So a document using eqn will look crappy with a HTML 4 browser in
any case. Whether that is because the document doesn't attempt to
render the eqn content at all, or renders it in terminal-style, or
emits MathML that the browser cannot handle makes no difference.
So just go ahead emitting MathML for eqn, i have no problem with that.

> And it's a pretty big issue for me (so this
> is a "for me"): all of my equations (in eqn, or really in
> DocBook--which I use with docbook2mdoc for some scientific
> applications--which I'd like to convert into eqn) are lost. Also
> lost are LAPACK manuals, OpenGL, and a host of other eqn systems.
> If we stick to HTML4, we'd need to cripple ourselves with
> table-based equations. If we stick with XHTML1, we need to jump
> through namespace hoops. But with HTML5, we get embedded MathML.

Makes sense to me.

> The "con", yes, is that MathML is a scary feature, and it doesn't
> exist yet. YET.
>
> At the end of the day, the browser doesn't really care whether it's
> HTML4, XHTML1, or HTML5. It'll render regardless. Callers of
> -Ofragment will care, but right now that's just us. Ingo, you
> mentioned non-conforming browsers.
> Care to point me to one that
> will puke on the HTML5 output from the patch? Even lynx(1) can read
> that!

I have no specific browser in mind. My remark was more about syntax
bloat, that is, using elaborate syntax because we can, as opposed to
because it is useful (like for eqn).

> If we're really at loggerheads over it, /adding/ HTML5 is as easy as
> another switch statement or two. I think it's a good idea
> regardless of whether it's added or replacing for the reasons above.

If Anthony puts forward a good argument why he needs strict HTML 4.01
- so far, i don't understand why - i guess that's the way to go.
Otherwise, just drop 4.01 and XHTML and call the polyglot output
close enough.

> ...on to other matters: the style-sheet.
>
> The status quo bugs me because of the header and footer table.
> These have hard-coded widths and alignments to make them look decent
> without a stylesheet. This is inadvisable in any modern flavour of
> HTML. At the very least, we should replace the "width" and "height"
> for embedded styles. Unfortunately, that's a problem: inline styles
> can't be overridden without the "!important" qualifier in CSS, which
> is annoying. (That's why I used the width attributes in the first
> place.) So I think that putting just the table styles *before* the
> <link> is a good idea.

Makes perfect sense to me.

> The question goes: if we're
> going to do that, is there anything else that should go there?

Right now, i don't see anything. If anything crops up later, with
a reasoning as good as the above, it can be added later.

> The stylesheet I put in place does serve an important purpose: it
> prevents overriding styles. In man(7) and mdoc(7), for example, you
> can't have overlapping styles. E.g.,
>
> .Bf Sy
> Hi
> .Ar there
> .Ef
>
> The "there" doesn't have both styles: they reset when they're
> nested. (There are probably better ways to do it in CSS, but it
> needs to be done one way or another.)
> Without some sort of
> style-sheet, font modes will be nested. Yes, I'm at conflict with
> myself: on the one hand, mdoc(7) does this because consoles
> generally haven't supported overlapping fonts, and I don't like
> console hold-overs in HTML output. Or should we discard that
> convention? (groff's -Tps does it too!)

Font-changing blocks that can contain elements or other blocks
are rare in mdoc(7) and do not exist in man(7) in the first place:

  .Sh .Ss HEAD (discouraged to contain elements)
  .Dl
  .Bd -literal
  .Bf

For .Dl and .Bd, we definitely *want* embedded elements to be
both literal (fixed-width) and italic or bold, respectively.

The .Bf block is almost never useful in the first place; when do
you ever need to embolden or italicise a whole block of text?
And if you do, embedded markup should probably add up, so the
"there" above *should* be bold and italic.

For -Tascii, i'll stay bug-compatible with groff.
But for -Thtml, let's just do what makes sense.

Yours,
  Ingo
--
 To unsubscribe send an email to discuss+unsubscribe@mdocml.bsd.lv