From: Michael Stapelberg <stapelberg@debian.org>
To: discuss@mandoc.bsd.lv
Subject: Re: HTML output: section headers with diacritics not in table of contents
Date: Thu, 24 Mar 2022 18:33:50 +0100
Message-ID: <CANnVG6n9fEfuNapLzi=rPFmjZ36uZwkE6pu+sTaMauAGWVG=Ww@mail.gmail.com>
In-Reply-To: <CAHi0vA_pfSMOHCd+JuG0efwz=WYUDs2N4=Mgv+BzHcP62T-jAw@mail.gmail.com>

On Thu, 24 Mar 2022 at 18:13, Mario Blättermann <mario.blaettermann@gmail.com> wrote:

> Hello,
>
> I recently switched from GNU man-db to mandoc. It's really a big
> step forward, especially regarding the creation of HTML pages, but it
> has its own peculiarities …
>
> To create an HTML man page, I use the following command:
>
>   mandoc -T html -O toc ./manpage.1 > manpage.1.html
>
> This works so far for English man pages. For man pages in other
> languages, I stumbled upon problems with creating toc entries. For
> example, "SYNOPSIS" is "ÜBERSICHT" in German; the "Ü" is
> displayed correctly, but the header is not clickable because it
> doesn't get a toc entry. You can see this in the Arch Linux online man
> pages [1]; as you might know, "Archmanweb" uses mandoc.
>
> The German keyboard produces the letter "Ü" as a single character
> named "LATIN CAPITAL LETTER U WITH DIAERESIS" (U+00DC), but there is
> also a decomposed variant available: U+0055 LATIN CAPITAL LETTER U
> followed by U+0308 COMBINING DIAERESIS. If I change it in the groff
> source, toc creation works fine with this decomposed form.
>
> Moreover, in the Vietnamese version of the same man page [2], even
> more toc entries are missing. Apparently, because several section
> headers start with "T" followed by letters with diacritics, no toc
> entry is created for them. Interestingly, though, TÓM TẮT doesn't
> have an entry, while TÊN does. I can't figure out how mandoc
> distinguishes between acceptable and unacceptable diacritics.
>
> The described behavior is the same with plain mandoc on my local
> system and with Archmanweb. However, the developers of Debiman
> obviously found a solution [3], maybe unintentionally …? In any case,
> their mandoc is wrapped in a Go-based environment. Besides some extra
> features Archmanweb doesn't have (for example, better detection of
> cross-references to other man pages if they are not formatted as
> such), the toc creation works, even for the Vietnamese version [4].

Hello, I’m the author of debiman :)
The reason why it uses a different TOC implementation is historical:
debiman introduced a TOC in 2017, whereas mandoc itself only gained -O toc
in 2018.

I’m glad to hear that our code is Unicode-clean in that regard.
Good Unicode/internationalization support was one of the project’s
goals, and is easy to accomplish in Go.
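
As an aside, the two spellings of "Ü" from the quoted report can be
told apart with a few lines of Python. This is only a minimal sketch
of the encoding difference; it makes no claim about how mandoc or
debiman build their TOC entries, and the names precomposed,
decomposed, and describe are made up for illustration.

```python
import unicodedata

# The two spellings described in the report (illustrative names).
precomposed = "\u00dc"   # one code point: U WITH DIAERESIS
decomposed = "U\u0308"   # two code points: U + COMBINING DIAERESIS


def describe(text):
    """Return a (code point, Unicode name) pair for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    print(describe(precomposed))
    print(describe(decomposed))
    # The strings render alike but do not compare equal as-is.
    print(precomposed == decomposed)
    # Normalizing to NFC (composed) or NFD (decomposed) makes them equal.
    print(unicodedata.normalize("NFC", decomposed) == precomposed)
    print(unicodedata.normalize("NFD", precomposed) == decomposed)
```

Running this shows that both strings look the same on screen, yet
only compare equal after normalizing them to a common form.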

> Any idea what is wrong? Well, at first I thought the problem was on
> my machine, but Archmanweb shows the same behavior. As a workaround,
> I could produce a few more toc entries by replacing the precomposed
> "Ü" with its decomposed form (U+0055 followed by U+0308) and
> similar, but as long as I don't know what rules mandoc applies
> internally, it's almost impossible to fix. I should mention that, as
> one of the maintainers of the manpages-l10n project [5], I have to
> maintain many languages, not only my own …
>
> I consider the online collections just as important as the local
> versions, especially for linking to a specific man page section or
> subsection in email or on the web, and for searching in man pages
> which are not installed locally. Any help with solving this problem
> would be appreciated.
>
> [1] https://man.archlinux.org/man/diff.1.de
> [2] https://man.archlinux.org/man/diff.1.vi
> [3] https://manpages.debian.org/bullseye/manpages-de/diff.1.de.html
> [4] https://manpages.debian.org/unstable/manpages-vi/diff.1.vi.html.gz
> [5] https://salsa.debian.org/manpages-l10n-team/manpages-l10n
> Best Regards,
> Mario
> --
>  To unsubscribe send an email to discuss+unsubscribe@mandoc.bsd.lv

Best regards,
