From: Ingo Schwarze <schwarze@usta.de>
To: mario.blaettermann@gmail.com
Cc: discuss@mandoc.bsd.lv
Subject: Re: HTML output: section headers with diacritics not in table of contents
Date: Sat, 26 Mar 2022 13:34:12 +0100
Message-ID: <Yj8IRC2PWiP8ZDYO@asta-kit.de>
In-Reply-To: <CAHi0vA-GTGOP=8pC1gFOn6=yERZycuT3Mm_iOYrN1DsYY7Rozg@mail.gmail.com>

Hi Mario,

Mario Blättermann wrote on Fri, Mar 25, 2022 at 05:07:13PM +0100:
> On Fri, 25 March 2022 at 13:27, Ingo Schwarze <schwarze@usta.de> wrote:
>> Mario Blättermann wrote on Thu, Mar 24, 2022 at 06:13:23PM +0100:

>>> recently I'm switched from GNU man-db to mandoc. It's really a big
>>> step ahead, especially regarding the creation of HTML pages, but it
>>> has its own peculiarities …
>>>
>>> For creating a HTML man page I use the following command:
>>>
>>> mandoc -T html -O toc ./manpage.1 > manpage.1.html

>> You should really use the -O style=... and -O man=... options in
>> addition to the options you are already using. Without "style",
>> CSS support is next to absent; no real style sheet is linked to,
>> and only a minimal style sheet is embedded with <style>, so minimal
>> that many features cannot work.

> As far as I understand, proper TOC creation depends on a CSS file?

No. The TOC (in the sense Jan and i explained in earlier messages) does not need CSS.

Then again, i guess what you meant here probably was "tagging depends on a CSS file". That statement would be mostly misleading but arguably somewhat true to a lesser extent. To understand what i mean, look at this HTML code (make sure to disable HTML in your mail user agent if you have it enabled):

  <h1 class="Sh" id="DESCRIPTION">
  <a class="permalink" href="#DESCRIPTION">DESCRIPTION</a>
  </h1>

Tagging involves four aspects:

1. The id= attribute of the h1 element shown above.
The value of that attribute, "DESCRIPTION", is called the "tag" in mandoc(1), less(1), and ctags(1) parlance, admittedly somewhat unfortunately, since "h1" is called a tag in HTML parlance. The mandoc/less/ctags tag is generated if and only if the mdoc(7) or man(7) parser finds a section title that looks sufficiently alphabetic. It's intentional that i use imprecise wording here because what "sufficiently alphabetic" means is the technical detail that we are considering changing right now. This tag and id= attribute is generated even if you do not use a CSS file.

2. The 'a class="permalink"' element. As long as a tag was generated in no. 1 above, that element is also generated no matter whether you use a CSS file or not.

3. Formatting of the h1 element depends on the stylesheet. The following CSS properties are absent when you fail to use a stylesheet:

  margin-top: 1.2em;
  margin-bottom: 0.6em;
  margin-left: -3.2em;

Also, no tooltip is shown when you hover your mouse over the h1 element unless you use the CSS file. Arguably, none of this no. 3 is related to tagging.

4. Formatting of the "a" element depends on the stylesheet. The following CSS properties are absent when you fail to use a stylesheet:

  color: inherit;
  font: inherit;
  text-decoration: inherit;
  border-bottom: thin dotted;

Arguably, this no. 4 is related to tagging because these properties determine how the presence of the tag is indicated by the rendering. Then again, without using the stylesheet, the presence of the tag is usually also indicated in whatever way is the default for the browser, for example by a big blue font with a solid underline.

>> Without "man", you get no hyperlinks from .Xr macros.

> It's not about hyperlinks, this works at least in Archmanweb, and on
> my local machine I don't need such links

I have no idea how Archmanweb might create hyperlinks for manual page references unless you pass the -O man= option to mandoc.
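As a rough sketch of aspects no. 1 and 2 above, the following shell snippet lists the tags (the values of the id= attributes of h1 elements) in an HTML file and counts the permalink anchors. The first heading in the sample is the one quoted earlier in this message; the second heading, which lacks both the id= attribute and the permalink, is only an assumption about how an untagged heading might look, and "BESCHREIBUNG" is a made-up title. To try this on real output, a file could be generated with something like "mandoc -T html -O toc -O style=mandoc.css -O man=%N.%S.html ./manpage.1", where the style sheet name and the link template are again mere examples.

```shell
# Hypothetical sample input: the first heading is copied from the message,
# the second (no id=, no permalink) is an assumed untagged heading.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<h1 class="Sh" id="DESCRIPTION">
<a class="permalink" href="#DESCRIPTION">DESCRIPTION</a>
</h1>
<h1 class="Sh">BESCHREIBUNG</h1>
EOF

# Aspect 1: print the tag of every h1 element that carries an id= attribute.
tags=$(sed -n 's/.*<h1[^>]* id="\([^"]*\)".*/\1/p' "$sample")
echo "$tags"

# Aspect 2: count the lines containing a permalink anchor; in this sample,
# that count equals the number of tags found above.
permalinks=$(grep -c 'class="permalink"' "$sample")
echo "$permalinks"

rm -f "$sample"
```

Neither check looks at any style sheet, in line with the point that the tag and the permalink are generated whether or not CSS is used.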
Well, Archmanweb might perhaps tinker around with the generated HTML code after the fact, using some crude heuristics. I don't know what Archmanweb does.

>> Such mistranslations obviously not only happen for reserved words
>> like "SYNOPSIS", but also in the main text of manual pages.
>> That's why i hate translated manual pages so much. Reading German
>> manual pages, i usually find them pretty unintelligible.

> OK, if you hate German man pages anyway, why get upset...?

For several reasons. First and foremost, i really care about good documentation, so bad documentation bothers me.

Secondly, once in a while machines maintained by other people (not my own machines, of course) show me German error messages and/or German documentation even if i don't ask for that, and having to do extra configuration work just to get an intelligible user interface on some random machine feels annoying to me.

Finally, i am interested in questions of languages (formal and living) in general, even though i'm not a specialist in language theory (neither for formal nor for living languages). My former teacher in theoretical physics, Prof. Dahmen (whom i greatly respected in other matters), always made a point of strongly insisting that a thesis ought to be written in German because developing professional and technical terminology in all possible fields is crucial (in his opinion) to keep a living language alive. Even though i always found the idea intriguing, i never managed to make up my mind whether that opinion is true or false, or rather: to which degree it is reasonable.

But for technical terms in computer science, i fear German already is a dead language (in Prof. Dahmen's sense) whether we like it or not.
Firmly established translations do exist for many technical terms in computer science (for example input = Eingabe), even more technical terms are firmly established as loanwords in German (for example hyperlink = Hyperlink, patch = Patch), but huge numbers of technical terms do not have a generally accepted and used translation into German. In such cases, people sometimes simply use the English word when talking in German (for example diff = Diff), which may sometimes indicate that the establishment of a new loanword is in progress.

In many cases, a translation does not really exist. A striking example from the example manual page you picked is the English technical term "unified diff". The (admittedly meager) German Wikipedia page https://de.wikipedia.org/wiki/Diff works around the gap in the language by using the somewhat lengthy wording "Das sogenannte vereinheitlichte Format (unified diff)". This solution feels completely adequate to me: it is easy to understand for both professionals and beginners, and the wording is also elegant from the perspective of the language.

https://manpages.debian.org/bullseye/manpages-de/diff.1.de.html says, by contrast:

  -u, -U ANZAHL, --unified[=ANZAHL]
    ANZAHL Zeilen (Vorgabe 3) des vereinheitlichten Kontexts ausgeben

That's completely unintelligible for a German native speaker unless they are also fluent in English *and* already know what the technical meanings of "unified" *and* of "context" are in this particular context. The only way to understand this particular German wording is to translate the word "vereinheitlicht" back to English and then recognize that "unified" and "context" here both function as highly specialized technical terms and *neither* of them is to be interpreted in the everyday sense of the plain English words "unified" and "context".
I discuss this here in so much detail because i do care about such matters and because i think such considerations do have some bearing on the question of which functionality matters to which degree in a formatting program for technical documentation. You cannot design a program well without considering how it should be used and how it had better not be used.

>> So while i'm not aggressively trying to *not* support translated manual
>> pages, i don't think translated manual pages are particularly relevant
>> either.

> OK, I understand. I don't expect any further efforts to get a better
> TOC creation from your side. Maybe I can discuss this with the
> Archmanweb developers.

I fear you misunderstand. I didn't mean to say, "fuck you, go to hell". I'd like to apologize if it sounded like that to you.

I regularly consider features for implementation even when i consider them "not particularly relevant". If something is not particularly relevant and causes huge effort or disruption, it is likely to be rejected. But if something is easy to do, it might be worthwhile even if it only provides marginal benefit. No feature is implemented without carefully scrutinizing the design, though.

Besides, i may be missing something and it might emerge that the defect you are talking about causes more trouble than i so far think, and the feature you are proposing provides more benefit than i so far recognize. In another mail, i said "i feel like sitting on the fence."

And finally, the questions of how the formatter should handle a translated manual page and how translations can be improved to actually become usable are clearly somewhat related, yet at the same time close to orthogonal, in the following sense: *if* formatters get better at handling translated manuals, that also helps to make translated manuals better, no matter how improvements to the text itself may be achieved.
Maybe not all hope is lost for reviving at least some of the most widely used native languages for this particular technical field, for example German, Spanish, and Japanese. As i said, i'm not sure whether that is desirable, feasible, and if so, how.

Yours,
  Ingo
--
 To unsubscribe send an email to discuss+unsubscribe@mandoc.bsd.lv