discuss@mandoc.bsd.lv
From: Ingo Schwarze <schwarze@usta.de>
To: anthony@anjbe.name
Cc: discuss@mandoc.bsd.lv
Subject: Re: HTML output: section headers with diacritics not in table of contents
Date: Sat, 26 Mar 2022 11:33:28 +0100
Message-ID: <Yj7r+NR9idXjijR7@asta-kit.de>
In-Reply-To: <10474-1648225308.014815@KUMT.SLa5.YYhl>

Hi Anthony,

Anthony J. Bentley wrote on Fri, Mar 25, 2022 at 10:21:48AM -0600:
> Ingo Schwarze writes:

>> Maybe mandoc should treat any \[uXXXX] sequence as a letter for
>> the purposes of tagging?  The code needed for that will look rather
>> awkward though, and even when implemented perfectly, the tags will
>> be UTF-8 rather than ASCII-encoded.  Would links like
>>
>>   https://man.archlinux.org/man/diff.1.de#%C3%9CBERSICHT
>>
>> really be all that useful?  What do people think?

> There would be no need for Mandoc to percent-encode UTF-8 here.
> In HTML5, a URL fragment (that is, the portion after the '#') may
> contain unescaped "URL code points," which are:
> 
>    "ASCII alphanumeric, U+0021 (!), U+0024 ($), U+0026 (&), U+0027 ('),
>     U+0028 LEFT PARENTHESIS, U+0029 RIGHT PARENTHESIS, U+002A (*),
>     U+002B (+), U+002C (,), U+002D (-), U+002E (.), U+002F (/), U+003A
>     (:), U+003B (;), U+003D (=), U+003F (?), U+0040 (@), U+005F (_),
>     U+007E (~), and code points in the range U+00A0 to U+10FFFD,
>     inclusive, excluding surrogates and noncharacters."

Thanks, that sounds like a useful hint.

Excluding surrogates is easy, and

  http://www.unicode.org/faq/private_use.html#noncharacters

tells me what "noncharacters" are.  Since those 66 codepoints are
stable, it is feasible to exclude them, too, without needing any
Unicode library.
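
To illustrate why no Unicode library is needed, here is a minimal
sketch of such a check in C.  The function names are hypothetical and
not taken from the mandoc sources.  It covers only the non-ASCII part
of the "URL code points" quoted above and assumes the input is an
already decoded code point, for example the value of a \[uXXXX]
escape.

```c
#include <stdbool.h>
#include <stdint.h>

/* Surrogates occupy the fixed range U+D800 to U+DFFF. */
bool
is_surrogate(uint32_t cp)
{
	return cp >= 0xD800 && cp <= 0xDFFF;
}

/*
 * The 66 noncharacters: U+FDD0 to U+FDEF (32 code points), plus
 * the last two code points of each of the 17 planes (34 code points).
 */
bool
is_noncharacter(uint32_t cp)
{
	if (cp >= 0xFDD0 && cp <= 0xFDEF)
		return true;
	return cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE;
}

/*
 * Non-ASCII URL code points: U+00A0 to U+10FFFD inclusive,
 * excluding surrogates and noncharacters.
 * (Hypothetical helper, not an existing mandoc function.)
 */
bool
is_url_code_point_nonascii(uint32_t cp)
{
	return cp >= 0x00A0 && cp <= 0x10FFFD &&
	    !is_surrogate(cp) && !is_noncharacter(cp);
}
```

With this sketch, U+00DC (the first letter of the German heading in
the example link above) is accepted, while U+D800, U+FDD0 and U+FFFE
are rejected.  The ASCII part of the list would still need a separate
check.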

All the same, before starting work on an implementation, I would
also appreciate your opinion, Anthony (and possibly that of similarly
prolific users and maintainers on other operating systems), on whether
such functionality seems desirable to you, because I am sitting on
the fence myself.

Yours,
  Ingo

