On Thu, 24 Mar 2022 at 18:13, Mario Blättermann <mario.blaettermann@gmail.com> wrote:

> Hello,
>
> I recently switched from GNU man-db to mandoc. It's really a big
> step ahead, especially regarding the creation of HTML pages, but it
> has its own peculiarities …
>
> For creating an HTML man page I use the following command:
>
> mandoc -T html -O toc ./manpage.1 > manpage.1.html
>
> This works so far for English man pages. For man pages in other
> languages, I stumbled upon problems with creating toc entries. For
> example, "SYNOPSIS" is "ÜBERSICHT" in German, and the "Ü" is
> displayed correctly, but the header is not clickable because it
> doesn't have a toc entry. You can see this in the Archlinux online man
> pages [1]; as you might know, "Archmanweb" uses Mandoc.
>
> The German keyboard produces the letter "Ü" as a single character
> named "LATIN CAPITAL LETTER U WITH DIAERESIS", but there's also a kind
> of split "Ü" available: U+0055 LATIN CAPITAL LETTER U followed by
> U+0308 COMBINING DIAERESIS. If I change it in the Groff source, toc
> creation works fine using this split form.
>
> Moreover, in the Vietnamese version of the same man page [2], even
> more toc entries are missing, apparently because multiple section
> headers start with "T" followed by letters with diacritics, and no toc
> entry is created for those. Interestingly, TÓM TẮT doesn't have an
> entry, but TÊN does. I can't imagine how Mandoc distinguishes between
> acceptable and unacceptable diacritics.
>
> The described behavior is the same with a pure Mandoc on my local
> system and with Archmanweb. However, the developers of Debiman
> obviously found a solution [3], maybe unconsciously …? In any case,
> their Mandoc is wrapped in a Go-based environment. Besides some extra
> features Archmanweb doesn't have (for example, better detection of
> cross-references to other man pages if they are not formatted as
> such), the toc creation works, even for the Vietnamese version [4].
>

Hello, I'm the author of debiman :)

The reason why it uses a different TOC implementation is historical:
debiman introduced a TOC in 2017, whereas mandoc itself only gained
-O toc in 2018.

I'm glad to hear that our code is Unicode-clean in that regard. Good
Unicode/internationalization support was one of the project's goals,
and is easy to accomplish in Go.

>
> Any idea what is wrong? Well, at first I thought the problem was on my
> machine, but Archmanweb shows the same behavior. As a workaround, I
> could produce a few more toc entries by replacing "Ü" with "Ü" and
> similar, but as long as I don't know what rules Mandoc applies
> internally, it's almost impossible to fix. I should mention that, as
> one of the maintainers of the manpages-l10n project [5], I have to
> maintain many languages, not only my own …
>
> I consider the online collections just as important as the local
> versions, especially for linking to a specific man page section or
> subsection in email or on the web, and for searching in man pages
> which are not installed locally. Any help with solving this problem
> would be appreciated.
>
> [1] https://man.archlinux.org/man/diff.1.de
> [2] https://man.archlinux.org/man/diff.1.vi
> [3] https://manpages.debian.org/bullseye/manpages-de/diff.1.de.html
> [4] https://manpages.debian.org/unstable/manpages-vi/diff.1.vi.html.gz
> [5] https://salsa.debian.org/manpages-l10n-team/manpages-l10n
>
> Best Regards,
> Mario
> --
> To unsubscribe send an email to discuss+unsubscribe@mandoc.bsd.lv
>

--
Best regards,
Michael
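
For anyone who wants to check which of the two "Ü" spellings from the
quoted message a given heading contains, here is a minimal Python
sketch. It only shows that the single-character and the split form are
different code point sequences and how Unicode normalization converts
between them; it does not reproduce mandoc's TOC logic, and the
variable and function names are my own, chosen for illustration.

```python
import unicodedata

# The two spellings of the German heading from the quoted message.
precomposed = "\u00dcBERSICHT"   # "Ü" as one code point (U+00DC)
decomposed = "U\u0308BERSICHT"   # "U" followed by COMBINING DIAERESIS (U+0308)


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# Both strings look the same when rendered, but they are not equal.
print(precomposed == decomposed)
print(describe(precomposed[:1]))
print(describe(decomposed[:2]))

# NFC composes the split form; NFD splits the single-character form.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed, nfd == decomposed)
```

Running describe() over a heading copied from the man page source
should show which form it uses, which may help when testing the
workaround described in the quoted message.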