Date: Fri, 25 Mar 2022 22:15:14 +0100
From: Jan Stary
To: discuss@mandoc.bsd.lv
Subject: Re: HTML output: section headers with diacritics not in table of contents
References: <10474-1648225308.014815@KUMT.SLa5.YYhl>
In-Reply-To: <10474-1648225308.014815@KUMT.SLa5.YYhl>

On Mar 25 10:21:48, anthony@anjbe.name wrote:
> Hi Ingo,
>
> Ingo Schwarze writes:
> > Maybe mandoc should treat any \[uXXXX] sequence as a letter for
> > the purposes of tagging?  The code needed for that will look rather
> > awkward though, and even when implemented perfectly, the tags will
> > be UTF-8 rather than ASCII-encoded.  Would links like
> >
> >   https://man.archlinux.org/man/diff.1.de#%C3%9CBERSICHT
> >
> > really be all that useful?  What do people think?
>
> There would be no need for Mandoc to percent-encode UTF-8 here.
> In HTML5, a URL fragment (that is, the portion after the '#') may
> contain unescaped "URL code points," which are:
>
> "ASCII alphanumeric, U+0021 (!), U+0024 ($), U+0026 (&), U+0027 ('),
> U+0028 LEFT PARENTHESIS, U+0029 RIGHT PARENTHESIS, U+002A (*),
> U+002B (+), U+002C (,), U+002D (-), U+002E (.), U+002F (/), U+003A
> (:), U+003B (;), U+003D (=), U+003F (?), U+0040 (@), U+005F (_),
> U+007E (~), and code points in the range U+00A0 to U+10FFFD,
> inclusive, excluding surrogates and noncharacters."

Ah, right. I took the liberty of tweaking mandoc's HTML output
of the manpage below to have a #NÄMË instead of the current NÄMË
and it works just fine.

http://stare.cz/.tmp/mt.html#NÄMË

Thanks for the lolz.

	Jan


.Dd Mar 25, 2022
.Dt MT 666
.Os
.Sh NÄMË
.Nm Mötley Crüe
.Nd hëävy mëtäl ümläüt
.Sh SŸNÖPSŸS
.Nm
.Sh DËSCRÏPTÏÖN
.Nm
röcks äs fück.

--
 To unsubscribe send an email to discuss+unsubscribe@mandoc.bsd.lv
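
For anyone who wants to check a candidate tag against the "URL code points" list quoted above, here is a minimal sketch in Python. It is not mandoc code and not how mandoc decides on tags; the names URL_PUNCT, is_noncharacter, is_url_code_point and fragment_is_plain are made up for this illustration, and the noncharacter test assumes the usual Unicode definition (U+FDD0 to U+FDEF, plus every code point ending in FFFE or FFFF).

```python
# Hypothetical helper names; this only mirrors the quoted definition.

# Punctuation explicitly listed as URL code points in the quoted text.
URL_PUNCT = frozenset("!$&'()*+,-./:;=?@_~")


def is_noncharacter(cp):
    """Unicode noncharacters: U+FDD0..U+FDEF and any U+xxFFFE / U+xxFFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_url_code_point(ch):
    """True if the single character ch may appear unescaped."""
    cp = ord(ch)
    if cp < 0x80:
        # ASCII: only alphanumerics and the listed punctuation qualify.
        return ch.isalnum() or ch in URL_PUNCT
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogates are excluded.
        return False
    return 0xA0 <= cp <= 0x10FFFD and not is_noncharacter(cp)


def fragment_is_plain(fragment):
    """True if every character of the fragment is a URL code point."""
    return all(is_url_code_point(ch) for ch in fragment)


if __name__ == "__main__":
    for tag in ("NÄMË", "SŸNÖPSŸS", "ÜBERSICHT", "two words", "100%"):
        print(tag, fragment_is_plain(tag))
```

Under these assumptions, the three umlauted section names pass as they are, while a tag containing a space or a percent sign does not and would still need escaping or replacing.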