From: "Mario Blättermann" <email@example.com>
To: firstname.lastname@example.org
Subject: Re: HTML output: section headers with diacritics not in table of contents
Date: Thu, 24 Mar 2022 19:00:30 +0100
Message-ID: <CAHi0vA_zFTPNjo61BFYnXVoiXWnsVGfhnVFk8KgXu7A_XXpnrg@mail.gmail.com>
In-Reply-To: <CANnVG6n9fEfuNapLzi=rPFmjZ36uZwkE6pu+sTaMauAGWVG=Ww@mail.gmail.com>

Hello Michael,

thanks for your quick answer.

On Thu, 24 Mar 2022 at 18:34, Michael Stapelberg <email@example.com> wrote:
>
> On Thu, 24 Mar 2022 at 18:13, Mario Blättermann <firstname.lastname@example.org> wrote:
>>
>> Hello,
>>
>> I recently switched from GNU man-db to mandoc. It's really a big
>> step ahead, especially regarding the creation of HTML pages, but it
>> has its own peculiarities …
>>
>> To create an HTML man page, I use the following command:
>>
>> mandoc -T html -O toc ./manpage.1 > manpage.1.html
>>
>> This works fine for English man pages. For man pages in other
>> languages, I ran into problems with creating toc entries. For
>> example, "SYNOPSIS" is "ÜBERSICHT" in German; the "Ü" is displayed
>> correctly, but the header is not clickable because it doesn't have a
>> toc entry. You can see this in the Archlinux online man pages [1];
>> as you might know, "Archmanweb" uses Mandoc.
>>
>> The German keyboard produces the letter "Ü" as a single character,
>> U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS, but there is also a
>> split "Ü" available: U+0055 LATIN CAPITAL LETTER U followed by
>> U+0308 COMBINING DIAERESIS. If I use the split form in the groff
>> source, toc creation works fine.
>>
>> Moreover, in the Vietnamese version of the same man page [2], even
>> more toc entries are missing. Apparently, since several section
>> headers start with "T" followed by a letter with diacritics, no toc
>> entry is created for those. But interestingly, TÓM TẮT doesn't have
>> an entry, while TÊN does have one.
>> I can't imagine how Mandoc distinguishes between
>> acceptable and unacceptable diacritics.
>>
>> The described behavior is the same with a pure Mandoc on my local
>> system and with Archmanweb. However, the developers of Debiman
>> apparently found a solution [3], maybe unconsciously …? In any case,
>> their Mandoc is wrapped in a Go-based environment. Besides some extra
>> features Archmanweb doesn't have (for example, better detection of
>> cross-references to other man pages if they are not formatted as
>> such), the toc creation works, even for the Vietnamese version [4].
>
> Hello, I’m the author of debiman :)
> The reason why it uses a different TOC implementation is historical:
> debiman introduced a TOC in 2017, whereas mandoc itself only gained -O toc in 2018.
>

OK, so Debiman uses its own TOC implementation. That means the problem
needs to be fixed either in Mandoc itself, which would be the preferred
solution, or by reimplementing the TOC in Python for Archmanweb. But the
latter wouldn't solve the problem for local users.

BTW, there are some more online man page collections using Mandoc: for
OpenBSD, NetBSD and FreeBSD. But none of the BSDs seems to have
translated man pages, so I can't test the behavior there.

> I’m glad to hear that our code is unicode clean in that regard.
> Good unicode/internationalization was one of the project’s goals,
> and is easy to accomplish in Go.
>

Yes, of course. But I don't have any programming skills, so I hope that
a Mandoc developer can fix it.

Best Regards,
Mario

>>
>>
>> Any idea what is wrong? At first I thought the problem was on my
>> machine, but Archmanweb shows the same behavior. As a workaround, I
>> could produce a few more toc entries by replacing the single-character
>> "Ü" (U+00DC) with the split "Ü" (U+0055 + U+0308) and similar, but as
>> long as I don't know what rules Mandoc applies internally, it's almost
>> impossible to fix.
>> I should mention that, as one of the maintainers of the
>> manpages-l10n project [5], I have to take care of many languages,
>> not only my own one …
>>
>> I consider the online collections just as important as the local
>> versions, especially for linking to a specific man page section or
>> subsection in email or on the web, and for searching in man pages
>> which are not installed locally. Any help with solving this problem
>> would be appreciated.
>>
>> [1] https://man.archlinux.org/man/diff.1.de
>> [2] https://man.archlinux.org/man/diff.1.vi
>> [3] https://manpages.debian.org/bullseye/manpages-de/diff.1.de.html
>> [4] https://manpages.debian.org/unstable/manpages-vi/diff.1.vi.html.gz
>> [5] https://salsa.debian.org/manpages-l10n-team/manpages-l10n
>>
>> Best Regards,
>> Mario
>> --
>> To unsubscribe send an email to email@example.com
>>
>
> --
> Best regards,
> Michael

--
To unsubscribe send an email to firstname.lastname@example.org
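The two spellings of "Ü" discussed in this thread differ only in Unicode normalization: the keyboard produces the precomposed (NFC) form, while the "split" form is the decomposed (NFD) form. The following Python sketch, which uses only the standard `unicodedata` module, makes that difference visible and prints the code points of the Vietnamese headers mentioned above. It does not model Mandoc's internal rules, which are not known from this thread; the helper names `describe` and `decomposed` are hypothetical, and whether the NFD workaround also helps for the Vietnamese headers is untested here.

```python
import unicodedata
from typing import List


def describe(text: str) -> List[str]:
    """Return "U+XXXX NAME" for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def decomposed(text: str) -> str:
    """Hypothetical helper: return the NFD (decomposed) form of text,
    e.g. U+00DC becomes U+0055 followed by U+0308."""
    return unicodedata.normalize("NFD", text)


keyboard = "\u00dcBERSICHT"   # "ÜBERSICHT" as typed on a German keyboard (NFC)
split = decomposed(keyboard)  # the "split" spelling described in the report

print(len(keyboard), len(split))  # 9 10
print(describe(keyboard[0]))  # ['U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS']
print(describe(split[:2]))    # ['U+0055 LATIN CAPITAL LETTER U', 'U+0308 COMBINING DIAERESIS']

# Both render as "ÜBERSICHT", yet they are different code point sequences.
print(keyboard == split)                                # False
print(unicodedata.normalize("NFC", split) == keyboard)  # True

# Code points of two Vietnamese headers from the report: "TÓM TẮT" and "TÊN".
for header in ("T\u00d3M T\u1eaeT", "T\u00caN"):
    print(header, describe(header))
```

Since the two forms are visually identical, listing code points like this is a quick way to check which form a translated groff source actually contains before comparing Mandoc's output.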