From: Michael Stapelberg <firstname.lastname@example.org>
To: email@example.com
Subject: Re: HTML output: section headers with diacritics not in table of contents
Date: Thu, 24 Mar 2022 18:33:50 +0100
Message-ID: <CANnVG6n9fEfuNapLzi=rPFmjZ36uZwkE6pu+sTaMauAGWVG=Ww@mail.gmail.com>
In-Reply-To: <CAHi0vA_pfSMOHCd+JuG0efwz=WYUDs2N4=Mgv+BzHcP62T-jAw@mail.gmail.com>

On Thu, 24 Mar 2022 at 18:13, Mario Blättermann <firstname.lastname@example.org> wrote:

> Hello,
>
> recently I'm switched from GNU man-db to mandoc. It's really a big
> step ahead, especially regarding the creation of HTML pages, but it
> has its own peculiarities …
>
> For creating a HTML man page I use the following command:
>
> mandoc -T html -O toc ./manpage.1 > manpage.1.html
>
> This works so far for English man pages. For man pages in other
> languages, I stumbled upon problems with creating toc entries. For
> example, the "SYNOPSIS" is "ÜBERSICHT" in German, and the "Ü" is
> displayed correctly, but the header is not clickable because it
> doesn't have a toc entry. You can see this in the Archlinux online man
> pages [1]; as you might know, "Archmanweb" uses Mandoc.
>
> The German keyboard produces the letter "Ü" as a single character
> named "LATIN CAPITAL LETTER U WITH DIAERESIS", but there's a kind of
> splitted "Ü" available: U U+0055 LATIN CAPITAL LETTER U ̈ U+0308
> COMBINING DIAERESIS. If I change it in the Groff source, toc creation
> works fine using this splitted one.
>
> Moreover, in the Vietnamese version of the same man page [2], even
> more toc entries are missing. Obviously because multiple section
> headers start with "T", followed by diacritics, no toc entry is
> created for those. But interesting: TÓM TẮT doesn't have an entry, TÊN
> does have one. I can't imagine how Mandoc distinguishes between
> acceptable and unacceptable diacritics.
>
> The described behavior is the same with a pure Mandoc on my local
> system and with Archmanweb. However, the developers of Debiman
> obviously found a solution [3], maybe unconsciously …? In any case,
> their Mandoc is wrapped in a Go-based environment. Besides some extra
> features Archmanweb doesn't have (for example, better detection of
> cross-references to other man pages if they are not formatted as
> such), the toc creation works, even for the Vietnamese version [4].
>

Hello, I’m the author of debiman :)

The reason why it uses a different TOC implementation is historical:
debiman introduced a TOC in 2017, whereas mandoc itself only gained
-O toc in 2018.

I’m glad to hear that our code is Unicode-clean in that regard. Good
Unicode/internationalization support was one of the project’s goals,
and it is easy to accomplish in Go.

>
> Any idea what is wrong? Well, first I thought the problem is on my
> machine, but Archmanweb shows the same behavior. As a workaround, I
> could produce a few more toc entries by replacing "Ü" with "Ü" and
> similar, but as long as I don't know what rules Mandoc applies
> internally, it's almost impossible to fix. To mention, as one of the
> maintainers of the manpages-l10n project [5], I have to maintain many
> languages, not only my own one …
>
> I consider the online collections just as important as the local
> versions, especially for linking to a specific man page section or
> subsection in email or web, and for searching in man pages which are
> not installed locally. Any help with solving this problem would be
> appreciated.
>
> [1] https://man.archlinux.org/man/diff.1.de
> [2] https://man.archlinux.org/man/diff.1.vi
> [3] https://manpages.debian.org/bullseye/manpages-de/diff.1.de.html
> [4] https://manpages.debian.org/unstable/manpages-vi/diff.1.vi.html.gz
> [5] https://salsa.debian.org/manpages-l10n-team/manpages-l10n
>
> Best Regards,
> Mario
> --
> To unsubscribe send an email to email@example.com
>

-- 
Best regards,
Michael
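
Note on the two forms of "Ü" mentioned in the quoted report: the
single-character form and the "split" form look identical when rendered,
but they are different code point sequences and different bytes on disk.
The short Python sketch below only illustrates that difference. Python is
chosen here purely for illustration, the helper name describe() is
hypothetical, and nothing in it is meant to describe how mandoc or
debiman handle these strings internally.

```python
import unicodedata

# The two spellings of "Ü" from the report.
precomposed = "\u00dc"   # LATIN CAPITAL LETTER U WITH DIAERESIS (one code point)
decomposed = "U\u0308"   # LATIN CAPITAL LETTER U + COMBINING DIAERESIS (two code points)


def describe(s):
    """Hypothetical helper: list each code point with its Unicode name,
    and return the UTF-8 encoding of the whole string."""
    names = [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]
    return names, s.encode("utf-8")


for label, s in (("precomposed", precomposed), ("decomposed", decomposed)):
    names, raw = describe(s)
    print(label, names, raw.hex())

# The strings differ as code point sequences ...
print(precomposed == decomposed)
# ... but become equal once both are brought to the same normalization form.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```

In UTF-8 the precomposed form is the two bytes c3 9c, while the
decomposed form is the three bytes 55 cc 88, so the latter starts with a
plain ASCII "U". That matches the shape of the workaround described in
the report, though the report itself does not establish the cause.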