discuss@mandoc.bsd.lv
From: "Mario Blättermann" <mario.blaettermann@gmail.com>
To: discuss@mandoc.bsd.lv
Subject: Re: HTML output: section headers with diacritics not in table of contents
Date: Thu, 24 Mar 2022 19:00:30 +0100
Message-ID: <CAHi0vA_zFTPNjo61BFYnXVoiXWnsVGfhnVFk8KgXu7A_XXpnrg@mail.gmail.com>
In-Reply-To: <CANnVG6n9fEfuNapLzi=rPFmjZ36uZwkE6pu+sTaMauAGWVG=Ww@mail.gmail.com>

Hello Michael,
thanks for your quick answer.

On Thu, 24 Mar 2022 at 18:34, Michael Stapelberg
<stapelberg@debian.org> wrote:
>
>
>
> On Thu, 24 Mar 2022 at 18:13, Mario Blättermann <mario.blaettermann@gmail.com> wrote:
>>
>> Hello,
>>
>> Recently I switched from GNU man-db to mandoc. It's really a big
>> step forward, especially for creating HTML pages, but it has its
>> own peculiarities …
>>
>> To create an HTML man page, I use the following command:
>>
>> mandoc -T html -O toc ./manpage.1 > manpage.1.html
>>
>> This works fine for English man pages. For man pages in other
>> languages, I ran into problems with TOC entries. For example,
>> "SYNOPSIS" is "ÜBERSICHT" in German; the "Ü" is displayed
>> correctly, but the header is not clickable because it has no TOC
>> entry. You can see this in the Archlinux online man pages [1]; as
>> you might know, "Archmanweb" uses Mandoc.
>>
>> The German keyboard produces the letter "Ü" as a single character,
>> U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS, but a decomposed "Ü"
>> is also available: U+0055 LATIN CAPITAL LETTER U followed by U+0308
>> COMBINING DIAERESIS. If I change the Groff source to use this
>> decomposed form, TOC creation works fine.
>>
>> Moreover, in the Vietnamese version of the same man page [2], even
>> more TOC entries are missing. Several section headers that start
>> with "T" followed by a letter with diacritics apparently get no TOC
>> entry. Interestingly, though, TÓM TẮT doesn't have an entry, while
>> TÊN does. I can't tell how Mandoc distinguishes between acceptable
>> and unacceptable diacritics.
>>
>> The described behavior is the same with plain Mandoc on my local
>> system and with Archmanweb. However, the Debiman developers
>> apparently found a solution [3], maybe unintentionally …? In any
>> case, their Mandoc is wrapped in a Go-based environment. Besides some
>> extra features that Archmanweb lacks (for example, better detection
>> of cross-references to other man pages that are not formatted as
>> such), its TOC creation works, even for the Vietnamese version [4].
>
>
> Hello, I’m the author of debiman :)
> The reason why it uses a different TOC implementation is historical:
> debiman introduced a TOC in 2017, whereas mandoc itself only gained -O toc in 2018.
>
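To make the "Ü" comparison quoted above concrete: the two spellings render identically but differ at the code-point level. The following Python sketch uses only the standard `unicodedata` module (the helper name `describe` is illustrative) to print the code points of each form and to show that Unicode normalization (NFC/NFD) converts between them. It demonstrates the encoding difference only; it says nothing about how Mandoc builds its TOC.

```python
import unicodedata

# Precomposed form, as typed on a German keyboard: one code point.
precomposed = "\u00dc"   # LATIN CAPITAL LETTER U WITH DIAERESIS
# Decomposed form: base letter followed by a combining mark, two code points.
decomposed = "U\u0308"   # LATIN CAPITAL LETTER U + COMBINING DIAERESIS

def describe(s):
    """Return (code point, Unicode name) pairs for every character in s."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in s]

print(describe(precomposed))
print(describe(decomposed))

# The strings look the same but are not equal as sequences of code points...
print(precomposed == decomposed)                                # False
# ...and become equal once both are brought into the same normalization form.
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Any tool that compares or transforms header text without normalizing first will therefore treat the two forms as different strings.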
OK, so Debiman uses its own TOC implementation. That means the problem
needs to be fixed either in Mandoc itself, which would be the preferred
solution, or by reimplementing the TOC in Python for Archmanweb. But the
latter wouldn't solve the problem for local users.
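For whoever looks into a fix in Mandoc, it may help to compare the affected headers code point by code point. The sketch below assumes the headers "ÜBERSICHT", "TÊN" and "TÓM TẮT" are stored in precomposed (NFC) form, as typed on a keyboard; the actual translated sources may differ. It lists only their non-ASCII characters and the lengths in NFC and NFD form, and it does not identify which rule Mandoc applies. `non_ascii` is a hypothetical helper name.

```python
import unicodedata

# Headers from the German and Vietnamese diff(1) pages, written with escapes
# so the assumed precomposed (NFC) form is unambiguous.
HEADERS = [
    "\u00dcBERSICHT",    # ÜBERSICHT: reported without a TOC entry
    "T\u00caN",          # TÊN: reported with a TOC entry
    "T\u00d3M T\u1eaeT", # TÓM TẮT: reported without a TOC entry
]

def non_ascii(s):
    """Return (code point, Unicode name) pairs for the non-ASCII characters in s."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>"))
            for c in s if ord(c) > 0x7F]

for header in HEADERS:
    nfd = unicodedata.normalize("NFD", header)
    print(header, non_ascii(header),
          "NFC length:", len(header), "NFD length:", len(nfd))
```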

BTW, there are a few more online man page collections using Mandoc, for
OpenBSD, NetBSD, and FreeBSD. But none of the BSDs seem to have
translated man pages, so I can't test the behavior there.

> I’m glad to hear that our code is Unicode-clean in that regard.
> Good Unicode and internationalization support was one of the project’s
> goals, and it is easy to accomplish in Go.
>
Yes, of course. But I don't have any programming skills, so I hope
that a Mandoc developer can fix it.

Best Regards,
Mario


>>
>>
>> Any idea what is wrong? At first I thought the problem was on my
>> machine, but Archmanweb shows the same behavior. As a workaround, I
>> could produce a few more TOC entries by replacing the precomposed "Ü"
>> with a decomposed one ("U" plus a combining diaeresis) and similar,
>> but as long as I don't know what rules Mandoc applies internally,
>> it's almost impossible to fix. Note that, as one of the maintainers
>> of the manpages-l10n project [5], I have to maintain many languages,
>> not only my own …
>>
>> I consider the online collections just as important as the local
>> versions, especially for linking to a specific man page section or
>> subsection in email or web, and for searching in man pages which are
>> not installed locally. Any help with solving this problem would be
>> appreciated.
>>
>> [1] https://man.archlinux.org/man/diff.1.de
>> [2] https://man.archlinux.org/man/diff.1.vi
>> [3] https://manpages.debian.org/bullseye/manpages-de/diff.1.de.html
>> [4] https://manpages.debian.org/unstable/manpages-vi/diff.1.vi.html.gz
>> [5] https://salsa.debian.org/manpages-l10n-team/manpages-l10n
>>
>> Best Regards,
>> Mario
>> --
>>  To unsubscribe send an email to discuss+unsubscribe@mandoc.bsd.lv
>>
>
>
> --
> Best regards,
> Michael
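The workaround mentioned in the quoted message (replacing precomposed letters with decomposed ones in the section headers) could be automated with a small filter. This is a sketch under several assumptions: the source is UTF-8, each header text is on the same line as its `.SH`/`.SS` (man) or `.Sh`/`.Ss` (mdoc) macro, and decomposing a header is acceptable for the translation. The decomposed form is reported to work for "Ü"; whether it yields TOC entries for every Vietnamese header is not established in this thread. The script name in the usage comment is hypothetical.

```python
import sys
import unicodedata

# Macros that introduce section and subsection headers:
# man(7) uses .SH/.SS, mdoc(7) uses .Sh/.Ss.
HEADER_MACROS = {".SH", ".SS", ".Sh", ".Ss"}

def decompose_headers(lines):
    """Yield each line unchanged, except that header lines are converted to
    NFD, so that e.g. U+00DC becomes U+0055 followed by U+0308."""
    for line in lines:
        parts = line.split(maxsplit=1)
        if parts and parts[0] in HEADER_MACROS:
            yield unicodedata.normalize("NFD", line)
        else:
            yield line

if __name__ == "__main__":
    # Hypothetical usage: python3 decompose_headers.py < diff.1 > diff-nfd.1
    sys.stdout.writelines(decompose_headers(sys.stdin))
```

Only header lines are touched, so body text keeps its precomposed characters. This remains a stopgap for individual pages, not a substitute for a fix in Mandoc.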


Thread overview: 19+ messages
2022-03-24 17:13 Mario Blättermann
2022-03-24 17:33 ` Michael Stapelberg
2022-03-24 18:00   ` Mario Blättermann [this message]
2022-03-25 12:27 ` Ingo Schwarze
2022-03-25 16:07   ` Mario Blättermann
2022-03-25 20:58     ` Jan Stary
2022-03-26 12:34     ` Ingo Schwarze
2022-03-26 13:35       ` Mario Blättermann
2022-03-25 16:21   ` Anthony J. Bentley
2022-03-25 21:15     ` Jan Stary
2022-03-26 10:33     ` Ingo Schwarze
2022-03-26 17:55       ` Anthony J. Bentley
2022-03-27 11:17         ` Ingo Schwarze
2022-03-27 11:44           ` Ingo Schwarze
2022-03-25 16:57   ` Mario Blättermann
2022-03-25 20:36     ` Jan Stary
2022-03-25 20:59       ` Mario Blättermann
2022-03-25 21:20         ` Jan Stary
2022-03-26  9:25           ` Ingo Schwarze
