From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from scc-mailout-kit-01-web.scc.kit.edu (scc-mailout-kit-01-web.scc.kit.edu [129.13.231.93])
    by fantadrom.bsd.lv (OpenSMTPD) with ESMTP id 017884a5
    for ; Fri, 25 Dec 2015 10:38:04 -0500 (EST)
Received: from asta-nat.asta.uni-karlsruhe.de ([172.22.63.82] helo=hekate.usta.de)
    by scc-mailout-kit-01.scc.kit.edu with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
    (envelope-from )
    id 1aCURO-0000qc-Hf; Fri, 25 Dec 2015 16:38:03 +0100
Received: from donnerwolke.usta.de ([172.24.96.3])
    by hekate.usta.de with esmtp (Exim 4.77)
    (envelope-from )
    id 1aCURO-000586-FB; Fri, 25 Dec 2015 16:38:02 +0100
Received: from athene.usta.de ([172.24.96.10])
    by donnerwolke.usta.de with esmtp (Exim 4.84)
    (envelope-from )
    id 1aCUBX-0006Wn-LG; Fri, 25 Dec 2015 16:21:39 +0100
Received: from localhost (1031@localhost [local]);
    by localhost (OpenSMTPD) with ESMTPA id 25e91211;
    Fri, 25 Dec 2015 16:38:02 +0100 (CET)
Date: Fri, 25 Dec 2015 16:38:02 +0100
From: Ingo Schwarze
To: "Anthony J. Bentley"
Cc: tech@mdocml.bsd.lv
Subject: Re: Use literal text for HTML section ids
Message-ID: <20151225153802.GI3859@athene.usta.de>
References: <1547.1451011025@CATHET.us>
X-Mailinglist: mdocml-tech
Reply-To: tech@mdocml.bsd.lv
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1547.1451011025@CATHET.us>
User-Agent: Mutt/1.5.23 (2014-03-12)

Hi Anthony,

Anthony J. Bentley wrote on Thu, Dec 24, 2015 at 07:37:05PM -0700:

> Currently mandoc(1) generates HTML ids by prefixing with 'x', then
> printing the ASCII values in hexadecimal. Presumably this was done
> to satisfy HTML 4's fairly strict requirements for id values,
> [A-Za-z][-A-Za-z0-9_:.]*
>
> Since we've gone full HTML 5, though, the requirement is much
> simpler: an id can contain anything except spaces. So we can print
> the contents of the section out directly, and just replace spaces
> with underscores.
>
> This makes section-linked URLs much more sensible:
> http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man1/ls.1#x546865204c6f6e6720466f726d6174
> versus
> http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man1/ls.1#The_Long_Format
>
> It even does the right thing for ids containing UTF-8, whether literal
> or percent-encoded in the URL (tested in Firefox and Lynx--they will
> correctly follow both types).
>
> This is not a change to apply lightly, as it breaks any existing links
> that include ids. But mandoc currently doesn't expose them (unless you
> view source, find the id, and append it to the URL manually), and I
> suspect the current unreadable format might make people reluctant to use
> them anyway. I think this change is worth it.

I like that, and i don't worry about deep links.

OK schwarze@, but please use bufncat() rather than bufcat_fmt()
when committing. I'll then merge to bsd.lv.

Thanks,
  Ingo

> Index: mdoc_html.c
> ===================================================================
> RCS file: /cvs/mdocml/mdoc_html.c,v
> retrieving revision 1.238
> diff -u -p -r1.238 mdoc_html.c
> --- mdoc_html.c	12 Oct 2015 00:08:15 -0000	1.238
> +++ mdoc_html.c	24 Dec 2015 21:52:02 -0000
> @@ -542,7 +542,6 @@ mdoc_sh_pre(MDOC_ARGS)
>  	}
>  
>  	bufinit(h);
> -	bufcat(h, "x");
>  
>  	for (n = n->child; n != NULL && n->type == ROFFT_TEXT; ) {
>  		bufcat_id(h, n->string);
> @@ -572,7 +571,6 @@ mdoc_ss_pre(MDOC_ARGS)
>  		return 1;
>  
>  	bufinit(h);
> -	bufcat(h, "x");
>  
>  	for (n = n->child; n != NULL && n->type == ROFFT_TEXT; ) {
>  		bufcat_id(h, n->string);
> @@ -1063,7 +1061,7 @@ mdoc_sx_pre(MDOC_ARGS)
>  	struct htmlpair	 tag[2];
>  
>  	bufinit(h);
> -	bufcat(h, "#x");
> +	bufcat(h, "#");
>  
>  	for (n = n->child; n; ) {
>  		bufcat_id(h, n->string);
> Index: html.c
> ===================================================================
> RCS file: /cvs/mdocml/html.c,v
> retrieving revision 1.191
> diff -u -p -r1.191 html.c
> --- html.c	13 Oct 2015 22:59:54 -0000	1.191
> +++ html.c	24 Dec 2015 21:52:02 -0000
> @@ -720,8 +720,8 @@ void
>  bufcat_id(struct html *h, const char *src)
>  {
>  
> -	/* Cf. . */
> +	/* Cf. . */
>  
> -	while ('\0' != *src)
> -		bufcat_fmt(h, "%.2x", *src++);
> +	for (; '\0' != *src; *src++)
> +		bufcat_fmt(h, "%c", *src == ' ' ? '_' : *src);
>  }
-- 
To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv
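
For readers who want to try the proposed id transformation outside of
mandoc, here is a minimal standalone sketch. It assumes nothing about
mandoc's internal buffer API: the function name make_section_id() and
its caller-supplied output buffer are hypothetical and exist only for
illustration. Like the patch above, it copies the section title
verbatim and replaces each space with an underscore.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of mandoc: copy src into dst
 * (capacity dstsz bytes), replacing each space with an underscore,
 * as the patch proposes for HTML 5 ids.  Output is truncated if
 * dst is too small and is always NUL-terminated when dstsz > 0.
 * Returns the number of bytes written, excluding the NUL.
 */
size_t
make_section_id(char *dst, size_t dstsz, const char *src)
{
	size_t	 i = 0;

	if (dstsz == 0)
		return 0;

	/* Leave room for the terminating NUL. */
	for (; *src != '\0' && i + 1 < dstsz; src++)
		dst[i++] = (*src == ' ') ? '_' : *src;

	dst[i] = '\0';
	return i;
}
```

Called with the title "The Long Format" and a sufficiently large
buffer, this yields "The_Long_Format", matching the fragment in the
second ls.1 URL quoted above. Only the space character is rewritten;
other bytes, including UTF-8 sequences, pass through unchanged.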