* HTML output: section headers with diacritics not in table of contents
@ 2022-03-24 17:13 Mario Blättermann
  2022-03-24 17:33 ` Michael Stapelberg
  2022-03-25 12:27 ` Ingo Schwarze
From: Mario Blättermann @ 2022-03-24 17:13 UTC
  To: discuss


Recently I switched from GNU man-db to mandoc. It's really a big
step ahead, especially regarding the creation of HTML pages, but it
has its own peculiarities …

To create an HTML man page, I use the following command:

mandoc -T html -O toc ./manpage.1 > manpage.1.html

This works fine for English man pages. For man pages in other
languages, however, I ran into problems with TOC entries. For
example, "SYNOPSIS" is "ÜBERSICHT" in German; the "Ü" is
displayed correctly, but the header is not clickable because it
has no TOC entry. You can see this in the Arch Linux online man
pages [1]; as you might know, Archmanweb uses mandoc.

A German keyboard produces the letter "Ü" as a single precomposed
character, U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS, but a
decomposed "Ü" also exists: U+0055 LATIN CAPITAL LETTER U followed by
U+0308 COMBINING DIAERESIS. If I use the decomposed form in the groff
source, TOC creation works fine.

Moreover, in the Vietnamese version of the same man page [2], even
more TOC entries are missing. Apparently, when a section header
starts with "T" followed by a letter with diacritics, no TOC entry is
created for it. Interestingly, though, TÓM TẮT has no entry while TÊN
does. I can't work out how mandoc distinguishes between acceptable
and unacceptable diacritics.
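To compare the two Vietnamese headers, a small diagnostic can list the non-ASCII code points each one contains. This is a sketch for investigation only: it assumes the headers are stored precomposed (NFC) in the source, `non_ascii` is a hypothetical helper name, and the output says nothing about which rule mandoc actually applies.

```python
import unicodedata

# Headers quoted in the report: the first reportedly gets no TOC entry,
# the second does. Normalizing to NFC is an assumption about the source.
HEADERS = ["TÓM TẮT", "TÊN"]


def non_ascii(s: str) -> list[tuple[str, str]]:
    """List (code point, Unicode name) for every non-ASCII character in s."""
    s = unicodedata.normalize("NFC", s)
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in s if ord(c) > 0x7F]


for header in HEADERS:
    print(header, non_ascii(header))
```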

I see the same behavior with a plain mandoc on my local system and
on Archmanweb. However, the developers of Debiman apparently found a
solution [3], maybe unintentionally …? In any case, their mandoc is
wrapped in a Go-based environment. Besides some extra features that
Archmanweb lacks (for example, better detection of cross-references
to other man pages that are not formatted as such), TOC creation
works there, even for the Vietnamese version [4].

Any idea what is wrong? At first I thought the problem was on my
machine, but Archmanweb shows the same behavior. As a workaround, I
could get a few more TOC entries by replacing the precomposed "Ü"
with the decomposed one, and likewise for other letters, but as long
as I don't know which rules mandoc applies internally, a real fix is
almost impossible. I should add that, as one of the maintainers of
the manpages-l10n project [5], I have to maintain many languages, not
only my own …
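The workaround described above could be automated roughly as follows. This is a hypothetical sketch, not a tested fix: `decompose_headers` and `decompose_file` are made-up names, it assumes each header text sits on the same line as its `.SH`/`.SS` (or mdoc `.Sh`/`.Ss`) macro, so a header placed on the following line would be missed, and it relies only on the report's observation that the decomposed "Ü" yields a TOC entry. Whether that holds for the Vietnamese headers, and how decomposed characters render in other output modes, is unverified.

```python
import unicodedata
from collections.abc import Iterable, Iterator

# man(7) and mdoc(7) section/subsection macros
HEADER_MACROS = (".SH", ".SS", ".Sh", ".Ss")


def decompose_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines unchanged, except header macro lines, which are
    converted to NFD (base letters followed by combining marks)."""
    for line in lines:
        if line.startswith(HEADER_MACROS):
            yield unicodedata.normalize("NFD", line)
        else:
            yield line


def decompose_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: write a copy of src_path with decomposed headers."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(decompose_headers(src))
```

Usage, reusing the file name from the command earlier: call `decompose_file("manpage.1", "manpage.nfd.1")`, then run `mandoc -T html -O toc ./manpage.nfd.1 > manpage.1.html`. Only header lines are touched, so body text keeps its original precomposed characters.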

I consider the online collections just as important as the local
versions, especially for linking to a specific man page section or
subsection in email or on the web, and for searching man pages that
are not installed locally. Any help with solving this problem would
be appreciated.

[1] https://man.archlinux.org/man/diff.1.de
[2] https://man.archlinux.org/man/diff.1.vi
[3] https://manpages.debian.org/bullseye/manpages-de/diff.1.de.html
[4] https://manpages.debian.org/unstable/manpages-vi/diff.1.vi.html.gz
[5] https://salsa.debian.org/manpages-l10n-team/manpages-l10n

Best Regards,


Thread overview: 19+ messages
2022-03-24 17:13 HTML output: section headers with diacritics not in table of contents Mario Blättermann
2022-03-24 17:33 ` Michael Stapelberg
2022-03-24 18:00   ` Mario Blättermann
2022-03-25 12:27 ` Ingo Schwarze
2022-03-25 16:07   ` Mario Blättermann
2022-03-25 20:58     ` Jan Stary
2022-03-26 12:34     ` Ingo Schwarze
2022-03-26 13:35       ` Mario Blättermann
2022-03-25 16:21   ` Anthony J. Bentley
2022-03-25 21:15     ` Jan Stary
2022-03-26 10:33     ` Ingo Schwarze
2022-03-26 17:55       ` Anthony J. Bentley
2022-03-27 11:17         ` Ingo Schwarze
2022-03-27 11:44           ` Ingo Schwarze
2022-03-25 16:57   ` Mario Blättermann
2022-03-25 20:36     ` Jan Stary
2022-03-25 20:59       ` Mario Blättermann
2022-03-25 21:20         ` Jan Stary
2022-03-26  9:25           ` Ingo Schwarze
