Use literal text for HTML section ids

tech@mandoc.bsd.lv

* Use literal text for HTML section ids
From: Anthony J. Bentley @ 2015-12-25  2:37 UTC
  To: tech

Hi,

Currently mandoc(1) generates HTML ids by prefixing with 'x', then
printing the ASCII values in hexadecimal. Presumably this was done
to satisfy HTML 4's fairly strict requirement that id values match
[A-Za-z][-A-Za-z0-9_:.]*

Since we've gone full HTML 5, though, the requirement is much
simpler: an id must be non-empty and can contain anything except
whitespace. So we can print the contents of the section out directly,
and just replace spaces with underscores.

This makes section-linked URLs much more sensible:
http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man1/ls.1#x546865204c6f6e6720466f726d6174
versus
http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man1/ls.1#The_Long_Format

It even does the right thing for ids containing UTF-8, whether they
appear literally or percent-encoded in the URL (tested in Firefox and
Lynx; both correctly follow either form).

This is not a change to apply lightly, as it breaks any existing links
that include ids. But mandoc currently doesn't expose them (unless you
view source, find the id, and append it to the URL manually), and I
suspect the current unreadable format might make people reluctant to use
them anyway. I think this change is worth it.

Index: mdoc_html.c
===================================================================
RCS file: /cvs/mdocml/mdoc_html.c,v
retrieving revision 1.238
diff -u -p -r1.238 mdoc_html.c
--- mdoc_html.c	12 Oct 2015 00:08:15 -0000	1.238
+++ mdoc_html.c	24 Dec 2015 21:52:02 -0000
@@ -542,7 +542,6 @@ mdoc_sh_pre(MDOC_ARGS)
 	}

 	bufinit(h);
-	bufcat(h, "x");

 	for (n = n->child; n != NULL && n->type == ROFFT_TEXT; ) {
 		bufcat_id(h, n->string);
@@ -572,7 +571,6 @@ mdoc_ss_pre(MDOC_ARGS)
 		return 1;

 	bufinit(h);
-	bufcat(h, "x");

 	for (n = n->child; n != NULL && n->type == ROFFT_TEXT; ) {
 		bufcat_id(h, n->string);
@@ -1063,7 +1061,7 @@ mdoc_sx_pre(MDOC_ARGS)
 	struct htmlpair	 tag[2];

 	bufinit(h);
-	bufcat(h, "#x");
+	bufcat(h, "#");

 	for (n = n->child; n; ) {
 		bufcat_id(h, n->string);
Index: html.c
===================================================================
RCS file: /cvs/mdocml/html.c,v
retrieving revision 1.191
diff -u -p -r1.191 html.c
--- html.c	13 Oct 2015 22:59:54 -0000	1.191
+++ html.c	24 Dec 2015 21:52:02 -0000
@@ -720,8 +720,8 @@ void
 bufcat_id(struct html *h, const char *src)
 {

-	/* Cf. <http://www.w3.org/TR/html4/types.html#h-6.2>. */
+	/* Cf. <http://www.w3.org/TR/html5/dom.html#the-id-attribute>. */

-	while ('\0' != *src)
-		bufcat_fmt(h, "%.2x", *src++);
+	for (; '\0' != *src; src++)
+		bufcat_fmt(h, "%c", *src == ' ' ? '_' : *src);
 }
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv


* Re: Use literal text for HTML section ids
From: Ingo Schwarze @ 2015-12-25 15:38 UTC
  To: Anthony J. Bentley; +Cc: tech

Hi Anthony,

Anthony J. Bentley wrote on Thu, Dec 24, 2015 at 07:37:05PM -0700:

> This is not a change to apply lightly, as it breaks any existing links
> that include ids. But mandoc currently doesn't expose them (unless you
> view source, find the id, and append it to the URL manually), and I
> suspect the current unreadable format might make people reluctant to use
> them anyway. I think this change is worth it.

I like that, and I don't worry about breaking deep links.

OK schwarze@, but please use bufncat() rather than bufcat_fmt()
when committing.  I'll then merge to bsd.lv.

Thanks,
  Ingo

