From: "Mario Blättermann" <firstname.lastname@example.org>
To: email@example.com
Subject: HTML output: section headers with diacritics not in table of contents
Date: Thu, 24 Mar 2022 18:13:23 +0100
Message-ID: <CAHi0vA_pfSMOHCd+JuG0efwz=WYUDs2N4=Mgv+BzHcP62T-jAw@mail.gmail.com>

Hello,

I recently switched from GNU man-db to mandoc. It's really a big step ahead, especially regarding the creation of HTML pages, but it has its own peculiarities …

To create an HTML man page, I use the following command:

    mandoc -T html -O toc ./manpage.1 > manpage.1.html

This works so far for English man pages. For man pages in other languages, I stumbled upon problems with the creation of toc entries. For example, "SYNOPSIS" is "ÜBERSICHT" in German. The "Ü" is displayed correctly, but the header is not clickable because it has no toc entry. You can see this in the Arch Linux online man pages [1]; as you might know, "Archmanweb" uses Mandoc.

The German keyboard produces the letter "Ü" as a single precomposed character named "LATIN CAPITAL LETTER U WITH DIAERESIS", but there is also a decomposed ("split") form of "Ü" made of two code points:

    U U+0055 LATIN CAPITAL LETTER U
    ̈ U+0308 COMBINING DIAERESIS

If I change it in the Groff source, toc creation works fine with this decomposed form.

Moreover, in the Vietnamese version of the same man page [2], even more toc entries are missing. Apparently no toc entry is created for several section headers that start with "T" followed by a letter with diacritics. Interestingly, though, "TÓM TẮT" has no entry, while "TÊN" does. I can't work out how Mandoc distinguishes between acceptable and unacceptable diacritics.

The described behavior is the same with plain Mandoc on my local system and with Archmanweb. However, the developers of Debiman apparently found a solution [3], maybe unintentionally …? In any case, their Mandoc is wrapped in a Go-based environment. Besides some extra features Archmanweb doesn't have (for example, better detection of cross-references to other man pages if they are not formatted as such), toc creation works there, even for the Vietnamese version [4].

Any idea what is wrong? At first I thought the problem was on my machine, but Archmanweb shows the same behavior. As a workaround, I could produce a few more toc entries by replacing the precomposed "Ü" with the decomposed "Ü" and similar, but as long as I don't know what rules Mandoc applies internally, it's almost impossible to fix this properly. I should mention that, as one of the maintainers of the manpages-l10n project [5], I have to maintain many languages, not only my own … I consider the online collections just as important as the local versions, especially for linking to a specific man page section or subsection in email or on the web, and for searching man pages which are not installed locally.

Any help with solving this problem would be appreciated.

[1] https://man.archlinux.org/man/diff.1.de
[2] https://man.archlinux.org/man/diff.1.vi
[3] https://manpages.debian.org/bullseye/manpages-de/diff.1.de.html
[4] https://manpages.debian.org/unstable/manpages-vi/diff.1.vi.html.gz
[5] https://salsa.debian.org/manpages-l10n-team/manpages-l10n

Best Regards,
Mario
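
Editorial illustration: the two encodings of "Ü" described in the message correspond to the Unicode normalization forms NFC (precomposed) and NFD (decomposed). The following is a minimal sketch using Python's standard unicodedata module to make the difference visible. The helper names describe and decompose_heading are hypothetical and chosen for this example only. The sketch does not explain why mandoc omits the toc entries; it only shows what the workaround changes in the source text.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


def decompose_heading(heading):
    """Convert a heading to NFD: base letters followed by combining marks."""
    return unicodedata.normalize("NFD", heading)


# "ÜBERSICHT" as typed on a German keyboard (precomposed first letter).
precomposed = "\u00dcBERSICHT"
decomposed = decompose_heading(precomposed)

# First character of the precomposed heading: one code point.
print(describe(precomposed[:1]))
# First two characters of the decomposed heading: base letter + combining mark.
print(describe(decomposed[:2]))
# The decomposed heading is one code point longer.
print(len(precomposed), len(decomposed))
```

Both strings render identically, which is why the difference is invisible in an editor; only the underlying code point sequence differs.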