discuss@mandoc.bsd.lv
From: Ingo Schwarze <schwarze@usta.de>
To: mario.blaettermann@gmail.com
Cc: discuss@mandoc.bsd.lv
Subject: Re: HTML output: section headers with diacritics not in table of contents
Date: Sat, 26 Mar 2022 13:34:12 +0100
Message-ID: <Yj8IRC2PWiP8ZDYO@asta-kit.de>
In-Reply-To: <CAHi0vA-GTGOP=8pC1gFOn6=yERZycuT3Mm_iOYrN1DsYY7Rozg@mail.gmail.com>

Hi Mario,

Mario Blättermann wrote on Fri, Mar 25, 2022 at 05:07:13PM +0100:
> On Fri, 25 Mar 2022 at 13:27, Ingo Schwarze <schwarze@usta.de> wrote:
>> Mario Blättermann wrote on Thu, Mar 24, 2022 at 06:13:23PM +0100:

>>> recently I switched from GNU man-db to mandoc. It's really a big
>>> step ahead, especially regarding the creation of HTML pages, but it
>>> has its own peculiarities …
>>>
>>> For creating an HTML man page I use the following command:
>>>
>>> mandoc -T html -O toc ./manpage.1 > manpage.1.html

>> You should really use the -O style=... and -O man=... options in
>> addition to the options you are already using.  Without "style",
>> CSS support is next to absent; no real style sheet is linked to,
>> and only a minimal style sheet is embedded with <style>, so minimal
>> that many features cannot work.

> As far as I understand, proper TOC creation depends on a CSS file?

No.  The TOC (in the sense Jan and i explained in earlier messages)
does not need CSS.

Then again, i guess what you meant here was "tagging depends
on a CSS file".  That statement would be mostly misleading, though
arguably true to a lesser extent.  To understand what i mean, look at
this HTML code (make sure to disable HTML in your mail user agent if you
have it enabled):

  <h1 class="Sh" id="DESCRIPTION">
    <a class="permalink" href="#DESCRIPTION">DESCRIPTION</a>
  </h1>

Tagging involves four aspects:

 1. The id= attribute of the h1 element shown above.
    The value of that attribute, "DESCRIPTION", is called
    the "tag" in mandoc(1), less(1), and ctags(1) parlance,
    admittedly a bit unfortunately as "h1" is called a tag
    in HTML parlance.  The mandoc/less/ctags tag is generated
    if and only if the mdoc(7) or man(7) parser finds a section
    title that looks sufficiently alphabetic.  It's intentional
    that i use imprecise wording here because what
    "sufficiently alphabetic" means is the technical detail
    that we are considering changing right now.  This tag
    and id=-attribute is generated even if you do not use a CSS
    file.

 2. The 'a class="permalink"' element.  As long as a tag was
    generated in no. 1 above, that element is also generated no
    matter whether you use a CSS file or not.

 3. Formatting of the h1 element depends on the stylesheet.
    The following CSS properties are absent when you fail to
    use a stylesheet:

	margin-top: 1.2em;
	margin-bottom: 0.6em;
	margin-left: -3.2em;

    Also, no tooltip is shown when you hover your mouse over
    the h1 element unless you use the CSS file.

    Arguably, none of this no. 3 is related to tagging.

 4. Formatting of the "a" element depends on the stylesheet.
    The following CSS properties are absent when you fail to
    use a stylesheet:

	color: inherit;
	font: inherit;
	text-decoration: inherit; 
	border-bottom: thin dotted;

    Arguably, this no. 4 is related to tagging because these
    properties determine how the presence of the tag is
    indicated by the rendering.  Then again, without using
    the stylesheet, the presence of the tag is usually also
    indicated in whatever way is the default for the browser,
    for example by a big blue font with a solid underline.
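To make aspects 1 and 2 concrete, here is a minimal Python sketch of how a
formatter could emit the h1 element shown above.  The rule in make_tag() is
a hypothetical stand-in for "sufficiently alphabetic" (plain ASCII letters,
digits, spaces, and hyphens only), and so is the space-to-underscore
conversion; neither is taken from the mandoc source.  Under this assumed
rule, a title with a diacritic such as "ÜBERSICHT" gets neither an id=
attribute nor a permalink, so a TOC would have nothing to link to, which
is the kind of symptom this thread is about.

```python
import html
import re

# HYPOTHETICAL approximation of "sufficiently alphabetic"; NOT mandoc's
# actual rule, which is deliberately left open above.  A title qualifies
# if it consists only of ASCII letters, digits, spaces, and hyphens and
# contains at least one letter.
_TAGGABLE = re.compile(r"[A-Za-z0-9 -]*[A-Za-z][A-Za-z0-9 -]*")


def make_tag(title):
    """Return the tag (id= value) for a section title, or None."""
    if not _TAGGABLE.fullmatch(title):
        return None
    # Assumption: spaces become underscores, e.g. "SEE ALSO" -> "SEE_ALSO".
    return title.replace(" ", "_")


def render_sh(title):
    """Render a section header in the style of the HTML shown above."""
    text = html.escape(title)
    tag = make_tag(title)
    if tag is None:
        # No tag: neither the id= attribute (aspect 1)
        # nor the permalink element (aspect 2) is emitted.
        return f'<h1 class="Sh">{text}</h1>'
    return (f'<h1 class="Sh" id="{tag}">'
            f'<a class="permalink" href="#{tag}">{text}</a></h1>')


print(render_sh("DESCRIPTION"))   # tagged, with permalink
print(render_sh("ÜBERSICHT"))     # untagged under the assumed rule
```

Note that nothing in this sketch involves CSS: the stylesheet only affects
aspects 3 and 4, i.e. how the resulting elements are rendered.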

>> Without "man", you get no hyperlinks from .Xr macros.

> It's not about hyperlinks, this works at least in Archmanweb, and on
> my local machine I don't need such links

I have no idea how Archmanweb might create hyperlinks for manual
page references unless you pass the -O man= option to mandoc.
Well, Archmanweb might perhaps tinker around with the generated
HTML code after the fact, using some crude heuristics.  I don't
know what Archmanweb does.
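For illustration, a minimal Python sketch of the invocation recommended
above, combining toc, style=, and man= in one comma-separated -O argument.
The style sheet name "mandoc.css" and the template "../html%S/%N.%S.html"
are placeholders to adapt; per mandoc(1), %N and %S in the -O man= template
are replaced by the name and section of the page an .Xr macro refers to.
The helper expand_man_link() is hypothetical and only mimics that
substitution to show what the link targets look like; mandoc itself does
the real work.

```python
import re
import subprocess

# Placeholder values (assumptions): adjust to your own setup.
STYLE = "mandoc.css"              # style sheet to link to
MAN_FMT = "../html%S/%N.%S.html"  # template for .Xr hyperlinks


def expand_man_link(fmt, name, section):
    """HYPOTHETICAL helper mimicking the %N / %S substitution of man=."""
    values = {"N": name, "S": section}
    return re.sub(r"%([NS])", lambda m: values[m.group(1)], fmt)


def mandoc_html_argv(source):
    """Build a mandoc(1) command line for HTML output with a TOC."""
    opts = ",".join(["toc", f"style={STYLE}", f"man={MAN_FMT}"])
    return ["mandoc", "-T", "html", "-O", opts, source]


def render(source, target):
    """Run mandoc (must be installed) and save its HTML output."""
    with open(target, "w", encoding="utf-8") as out:
        subprocess.run(mandoc_html_argv(source), stdout=out, check=True)
```

For example, render("./manpage.1", "manpage.1.html") replaces the command
quoted at the top, and a reference like ".Xr ls 1" would then point to
expand_man_link(MAN_FMT, "ls", "1"), i.e. "../html1/ls.1.html".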

>> Such mistranslations obviously not only happen for reserved words
>> like "SYNOPSIS", but also in the main text of manual pages.
>> That's why i hate translated manual pages so much.  Reading German
>> manual pages, i usually find them pretty unintelligible.

> OK, if you hate German man pages anyway, why get upset...?

For several reasons.

First and foremost, i really care about good documentation, so bad
documentation bothers me.

Secondly, once in a while machines maintained by other people
(not my own machines, of course) show me German error messages
and/or German documentation even if i don't ask for that, and
having to do extra configuration work just to get an intelligible
user interface on some random machine feels annoying to me.

Finally, i am interested in questions of languages (formal and
living) in general, even though i'm not a specialist in language
theory (neither for formal nor for living languages).

My former teacher in theoretical physics, Prof. Dahmen (whom i greatly
respected in other matters) always made a point of strongly insisting that
a thesis ought to be written in German because developing professional and
technical terminology in all possible fields is crucial (in his opinion)
to keep a living language alive.  Even though i always found the idea
intriguing, i never managed to make up my mind whether that opinion is
true or false, or rather: to which degree it is reasonable.

But for technical terms in computer science, i fear German already is
a dead language (in Prof. Dahmen's sense) whether we like it or not.
Firmly established translations do exist for many technical terms in
computer science (for example input = Eingabe), even more technical terms
are firmly established as loanwords in German (for example hyperlink =
Hyperlink, patch = Patch), but huge numbers of technical terms do not
have a generally accepted and used translation to German.  In such cases,
people sometimes simply use the English word when talking in German (for
example diff = Diff), which may sometimes indicate that the establishment
of a new loanword is in progress.  In many cases, a translation does not
really exist.  A striking example from the example manual page you picked
is the English technical term "unified diff".  The (admittedly meager)
German Wikipedia page https://de.wikipedia.org/wiki/Diff works around
the gap in the language by using the somewhat lengthy wording "Das
sogenannte vereinheitlichte Format (unified diff)" ("the so-called
unified format (unified diff)").  This solution feels
completely adequate to me: it is easy to understand by both professionals
and beginners, and the wording is also elegant from the perspective of
the language.

https://manpages.debian.org/bullseye/manpages-de/diff.1.de.html
says, by contrast:

  -u, -U ANZAHL, --unified[=ANZAHL]
    ANZAHL Zeilen (Vorgabe 3) des vereinheitlichten Kontexts ausgeben
    (literally: output NUMBER lines (default 3) of the unified context)

That's completely unintelligible for a German native speaker unless they
are also fluent in English *and* already know what the technical meanings
of "unified" *and* of "context" are in this particular context.  The only
way to understand this particular German wording is to translate the word
"vereinheitlicht" back to English and then recognize that "unified" and
"context" here both function as highly specialized technical terms
and *neither* of them is to be interpreted in the everyday sense of the
plain English words "unified" and "context".

I discuss this here in so much detail because i do care about such
matters and because i think such considerations do have some bearing
on the question of which functionality matters to which degree in a
formatting program for technical documentation.

You cannot design a program well without considering how it should,
and how it had better not, be used.

>> So while i'm not aggressively trying to *not* support translated manual
>> pages, i don't think translated manual pages are particularly relevant
>> either.

> OK, I understand. I don't expect any further efforts to get a better
> TOC creation from your side. Maybe I can discuss this with the
> Archmanweb developers.

I fear you misunderstand.  I didn't mean to say, "fuck you, go to hell".
I'd like to apologize if it sounded like that to you.

I regularly consider features for implementation even when i consider
them "not particularly relevant".  If something is not particularly
relevant and causes huge effort or disruption, it is likely to be
rejected.  But if something is easy to do, it might be worthwhile
even if it only provides marginal benefit.

No feature is implemented without carefully scrutinizing the design,
though.

Besides, i may be missing something and it might emerge that the defect
you are talking about causes more trouble than i currently think, and
the feature you are proposing provides more benefit than i currently
recognize.  In another mail, i said "i feel like sitting on the
fence."

And finally, while the questions of how the formatter should handle a
translated manual page and how translations can be improved to actually
become usable are clearly somewhat related, they are at the same time
close to orthogonal in the following sense: *if* formatters get better at
handling translated manuals, that also helps to make translated manuals
better, no matter how the latter may be achieved in the text itself.
Maybe not all hope is lost for reviving at least some of the most widely
used native languages for this particular technical field, for example
German, Spanish, and Japanese.  As i said, i'm not sure whether that
is desirable, feasible, and if so, how.

Yours,
  Ingo