From: "Mario Blättermann" <mario.blaettermann@gmail.com>
To: discuss@mandoc.bsd.lv
Subject: HTML output: section headers with diacritics not in table of contents
Date: Thu, 24 Mar 2022 18:13:23 +0100
Message-ID: <CAHi0vA_pfSMOHCd+JuG0efwz=WYUDs2N4=Mgv+BzHcP62T-jAw@mail.gmail.com>


I recently switched from GNU man-db to mandoc. It's really a big
step forward, especially for creating HTML pages, but it has its own
peculiarities …

To create an HTML man page, I use the following command:

mandoc -T html -O toc ./manpage.1 > manpage.1.html

This works for English man pages. For man pages in other languages,
I ran into problems with the creation of toc entries. For example,
"SYNOPSIS" is "ÜBERSICHT" in German; the "Ü" is displayed correctly,
but the header is not clickable because it doesn't have a toc entry.
You can see this in the Arch Linux online man pages [1]; as you might
know, "Archmanweb" uses mandoc.

The German keyboard produces the letter "Ü" as a single precomposed
character, U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS, but there is
also a decomposed form: U+0055 LATIN CAPITAL LETTER U followed by
U+0308 COMBINING DIAERESIS. If I change the groff source to use the
decomposed form, toc creation works fine.
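
The two spellings can be told apart with Python's standard
unicodedata module. This is only an illustrative sketch: it shows the
precomposed (NFC) and decomposed (NFD) forms of the header, and says
nothing about how mandoc handles them internally. The helper name
code_points is made up for this example.

```python
import unicodedata

# Precomposed form: starts with U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS.
precomposed = "\u00dcBERSICHT"
# Decomposed form: U+0055 followed by U+0308 COMBINING DIAERESIS.
decomposed = unicodedata.normalize("NFD", precomposed)


def code_points(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(precomposed)[:2])  # ['U+00DC', 'U+0042']
print(code_points(decomposed)[:3])   # ['U+0055', 'U+0308', 'U+0042']
print(precomposed == decomposed)     # False: the strings differ code point by code point
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Both strings render identically in most fonts, which is why the
difference is easy to overlook in an editor.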

Moreover, in the Vietnamese version of the same man page [2], even
more toc entries are missing. Apparently no toc entry is created for
the several section headers that start with "T" followed by letters
with diacritics. Interestingly, TÓM TẮT doesn't have an entry, while
TÊN does. I can't see how mandoc distinguishes between acceptable
and unacceptable diacritics.
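
To compare the headers mentioned so far, one could dump their
non-ASCII characters together with code points and UTF-8 bytes. This
is a diagnostic sketch only: it assumes the headers are stored in
precomposed (NFC) form, which may not match the actual files, and it
does not explain mandoc's behavior. The function name describe is
hypothetical.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name, UTF-8 bytes as hex) for each non-ASCII character."""
    rows = []
    for ch in text:
        if ord(ch) > 0x7F:
            rows.append(
                (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8").hex())
            )
    return rows


# Headers from the German and Vietnamese pages, written with escapes
# so that the precomposed form is unambiguous (an assumption).
headers = ["\u00dcBERSICHT", "T\u00d3M T\u1eaeT", "T\u00caN"]

for header in headers:
    for row in describe(header):
        print(header, *row)
```

Running the same function on the lines taken from the real page
sources would show whether they use precomposed or combining
characters.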

The behavior described above is the same with plain mandoc on my
local system and with Archmanweb. However, the developers of Debiman
apparently found a solution [3], maybe unintentionally …? In any
case, their mandoc is wrapped in a Go-based environment. Besides some
extra features Archmanweb doesn't have (for example, better detection
of cross-references to other man pages if they are not formatted as
such), the toc creation works, even for the Vietnamese version [4].

Any idea what is wrong? At first I thought the problem was on my
machine, but Archmanweb shows the same behavior. As a workaround, I
could produce a few more toc entries by replacing the precomposed "Ü"
with its decomposed form (and similarly for other letters), but as
long as I don't know what rules mandoc applies internally, it's
almost impossible to fix. I should mention that, as one of the
maintainers of the manpages-l10n project [5], I have to maintain many
languages, not only my own …
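
The workaround could be sketched as follows, assuming section headers
are written on the same line as the man(7) macros .SH and .SS. It
only rewrites those lines into decomposed (NFD) form and leaves the
rest of the page untouched. The function name decompose_headers is
hypothetical, headers placed on the line after the macro are not
handled, and since the rules mandoc applies are unknown, this is not
a confirmed fix.

```python
import unicodedata


def decompose_headers(source):
    """Apply NFD normalization to .SH/.SS lines only; keep all other lines as they are."""
    out = []
    for line in source.splitlines(keepends=True):
        if line.startswith((".SH", ".SS")):
            line = unicodedata.normalize("NFD", line)
        out.append(line)
    return "".join(out)


sample = '.SH "\u00dcBERSICHT"\nText mit \u00fc bleibt unver\u00e4ndert.\n'
result = decompose_headers(sample)
print(result.splitlines()[0].encode("unicode_escape").decode("ascii"))
# .SH "U\u0308BERSICHT"
```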

I consider the online collections just as important as the local
versions, especially for linking to a specific man page section or
subsection in email or on the web, and for searching man pages which are
not installed locally. Any help with solving this problem would be

[1] https://man.archlinux.org/man/diff.1.de
[2] https://man.archlinux.org/man/diff.1.vi
[3] https://manpages.debian.org/bullseye/manpages-de/diff.1.de.html
[4] https://manpages.debian.org/unstable/manpages-vi/diff.1.vi.html.gz
[5] https://salsa.debian.org/manpages-l10n-team/manpages-l10n

Best Regards,

