tech@mandoc.bsd.lv
From: "Anthony J. Bentley" <anthony@anjbe.name>
To: tech@mdocml.bsd.lv
Subject: Use literal text for HTML section ids
Date: Thu, 24 Dec 2015 19:37:05 -0700
Message-ID: <1547.1451011025@CATHET.us>

Hi,

Currently mandoc(1) generates HTML ids by prefixing with 'x', then
printing the ASCII values in hexadecimal. Presumably this was done
to satisfy HTML 4's fairly strict requirements for id values,
[A-Za-z][-A-Za-z0-9_:.]*
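
To make the current scheme concrete, here is a minimal standalone
sketch of it. The name id_hex and the fixed-size output buffer are
made up for illustration; mandoc itself appends to the struct html
buffer through bufcat() and bufcat_fmt(). The cast is also an addition
of this sketch: without it, bytes above 0x7f may be sign-extended on
platforms where char is signed.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the pre-patch encoding: write 'x' followed
 * by two lowercase hex digits per input byte into dst.
 * Returns 0 on success, -1 if dst is too small.
 */
int
id_hex(char *dst, size_t dstsz, const char *src)
{
	size_t	 pos = 0;

	if (dstsz < 2)
		return -1;
	dst[pos++] = 'x';	/* ids start with a letter, as HTML 4 requires */

	for (; *src != '\0'; src++) {
		if (dstsz - pos < 3)	/* two digits plus the NUL */
			return -1;
		/* Cast so that bytes >= 0x80 are not sign-extended. */
		snprintf(dst + pos, dstsz - pos, "%.2x",
		    (unsigned int)(unsigned char)*src);
		pos += 2;
	}
	dst[pos] = '\0';
	return 0;
}
```

Called on the ls(1) section title "The Long Format", this yields
x546865204c6f6e6720466f726d6174, which only uses characters the
HTML 4 pattern allows.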

Since we've gone full HTML 5, though, the requirement is much
simpler: an id must be non-empty and must not contain whitespace.
So we can print the section title out directly, and just replace
spaces with underscores.

This makes section-linked URLs much more sensible:
http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man1/ls.1#x546865204c6f6e6720466f726d6174
versus
http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man1/ls.1#The_Long_Format
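
The second fragment above can be reproduced with a small standalone
sketch of the proposed mapping. The name id_literal and the caller
supplied buffer are assumptions for illustration only; the patch
below does the same thing inside bufcat_id(). Like the patch, it maps
only the ASCII space and copies every other byte unchanged.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the patched bufcat_id(): copy src into
 * dst, replacing each ASCII space with an underscore.
 * Returns 0 on success, -1 if dst is too small.
 */
int
id_literal(char *dst, size_t dstsz, const char *src)
{
	size_t	 pos = 0;

	for (; *src != '\0'; src++) {
		if (pos + 1 >= dstsz)	/* keep room for the NUL */
			return -1;
		dst[pos++] = *src == ' ' ? '_' : *src;
	}
	if (pos >= dstsz)
		return -1;
	dst[pos] = '\0';
	return 0;
}
```

Passing "The Long Format" gives The_Long_Format; bytes that belong to
a UTF-8 sequence are passed through untouched.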

It even does the right thing for ids containing UTF-8, whether literal
or percent-encoded in the URL (tested in Firefox and Lynx--they will
correctly follow both types).
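
For reference, the percent-encoded form of such a fragment can be
sketched as follows. This is not part of the patch, and mandoc does
not need to do it; frag_pct is a hypothetical helper, and the rule
for which bytes to escape is deliberately simplified (everything
outside printable ASCII, plus '%' itself) rather than a full
implementation of the URI fragment grammar.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of mandoc: percent-encode '%' and
 * every byte outside the printable ASCII range 0x21..0x7e.
 * Returns 0 on success, -1 if dst is too small.
 */
int
frag_pct(char *dst, size_t dstsz, const char *src)
{
	size_t		 pos = 0;
	unsigned char	 c;

	for (; *src != '\0'; src++) {
		c = (unsigned char)*src;
		if (c > 0x20 && c < 0x7f && c != '%') {
			if (dstsz - pos < 2)	/* byte plus the NUL */
				return -1;
			dst[pos++] = (char)c;
		} else {
			if (dstsz - pos < 4)	/* "%XX" plus the NUL */
				return -1;
			snprintf(dst + pos, dstsz - pos, "%%%.2X",
			    (unsigned int)c);
			pos += 3;
		}
	}
	if (pos >= dstsz)
		return -1;
	dst[pos] = '\0';
	return 0;
}
```

An id such as "Café" (UTF-8 bytes 0xc3 0xa9 for the last character)
becomes Caf%C3%A9, while a plain ASCII id like The_Long_Format is
returned unchanged.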

This is not a change to apply lightly, as it breaks any existing links
that include ids. But mandoc currently doesn't expose them (unless you
view source, find the id, and append it to the URL manually), and I
suspect the current unreadable format might make people reluctant to use
them anyway. I think this change is worth it.


Index: mdoc_html.c
===================================================================
RCS file: /cvs/mdocml/mdoc_html.c,v
retrieving revision 1.238
diff -u -p -r1.238 mdoc_html.c
--- mdoc_html.c	12 Oct 2015 00:08:15 -0000	1.238
+++ mdoc_html.c	24 Dec 2015 21:52:02 -0000
@@ -542,7 +542,6 @@ mdoc_sh_pre(MDOC_ARGS)
 	}
 
 	bufinit(h);
-	bufcat(h, "x");
 
 	for (n = n->child; n != NULL && n->type == ROFFT_TEXT; ) {
 		bufcat_id(h, n->string);
@@ -572,7 +571,6 @@ mdoc_ss_pre(MDOC_ARGS)
 		return 1;
 
 	bufinit(h);
-	bufcat(h, "x");
 
 	for (n = n->child; n != NULL && n->type == ROFFT_TEXT; ) {
 		bufcat_id(h, n->string);
@@ -1063,7 +1061,7 @@ mdoc_sx_pre(MDOC_ARGS)
 	struct htmlpair	 tag[2];
 
 	bufinit(h);
-	bufcat(h, "#x");
+	bufcat(h, "#");
 
 	for (n = n->child; n; ) {
 		bufcat_id(h, n->string);
Index: html.c
===================================================================
RCS file: /cvs/mdocml/html.c,v
retrieving revision 1.191
diff -u -p -r1.191 html.c
--- html.c	13 Oct 2015 22:59:54 -0000	1.191
+++ html.c	24 Dec 2015 21:52:02 -0000
@@ -720,8 +720,8 @@ void
 bufcat_id(struct html *h, const char *src)
 {
 
-	/* Cf. <http://www.w3.org/TR/html4/types.html#h-6.2>. */
+	/* Cf. <http://www.w3.org/TR/html5/dom.html#the-id-attribute>. */
 
-	while ('\0' != *src)
-		bufcat_fmt(h, "%.2x", *src++);
+	for (; '\0' != *src; src++)
+		bufcat_fmt(h, "%c", *src == ' ' ? '_' : *src);
 }
