discuss@mandoc.bsd.lv
From: Ingo Schwarze <schwarze@usta.de>
To: discuss@mdocml.bsd.lv
Subject: Re: Raw UTF-8?
Date: Wed, 7 Jul 2010 23:12:12 +0200
Message-ID: <20100707211212.GC19725@iris.usta.de>
In-Reply-To: <20100707191807.GA18154@britannica.bec.de>

Hi Joerg,

Joerg Sonnenberger wrote on Wed, Jul 07, 2010 at 09:18:08PM +0200:

> Consider my name -- I would strongly hope that output devices with
> proper Latin1/Latin15/UTF-8 support would use the diacritic, but fall
> back to the transliterated version otherwise.

You hope in vain.  Did you try?

Both old and new groff render that as 'J"\borg Sonnenberger',
which looks like "Jorg Sonnenberger" on a typical terminal.
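To make the effect concrete, here is a minimal sketch of what a
terminal does with that byte sequence.  It assumes a simple model in
which backspace moves the cursor one cell to the left and the next
character overwrites that cell; the function name render_overstrike
is made up for this illustration and is not part of groff or mandoc.

```python
def render_overstrike(text: str) -> str:
    """Simulate a simple terminal: backspace moves the cursor left,
    and the next printable character overwrites the cell there."""
    cells = []
    cursor = 0
    for ch in text:
        if ch == "\b":
            # Move left, but never past the start of the line.
            cursor = max(cursor - 1, 0)
        elif cursor < len(cells):
            # Overwrite: the earlier character in this cell is lost.
            cells[cursor] = ch
            cursor += 1
        else:
            cells.append(ch)
            cursor += 1
    return "".join(cells)


if __name__ == "__main__":
    # The double quote is overwritten by the "o" that follows the backspace.
    print(render_overstrike('J"\borg Sonnenberger'))
```

A pager that understands overstriking may behave differently, so
treat this only as a model of the typical case.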

Maybe the reason for using the unreliable backspace-encoding
variant instead of the transliteration "oe" is that more languages
than just German might use the "LATIN SMALL LETTER O WITH DIAERESIS",
as Unicode calls it, and who knows what a good transliteration from
those languages into ASCII might look like?

The point is, for correct results, you must transliterate before
encoding, when you still know the context, e.g. the language,
which is often required to figure out a correct transliteration.

Thus, you should really use

.An Joerg Sonnenberger

and never

.An J\(:org Sonnenberger

when documenting your programs.
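As a rough sketch of why the language must be known before
transliterating: the same character maps to different ASCII
replacements depending on the language.  The table below and the
function name transliterate are assumptions made up for this
illustration; the German entries follow the usual convention, the
Turkish ones show the common practice of simply dropping the dots,
and neither is meant to be complete or authoritative.

```python
# Illustrative, incomplete per-language tables (an assumption for this
# sketch, not taken from groff, mandoc, or any standard).
TRANSLITERATION = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"},
    "tr": {"ö": "o", "ü": "u"},
}


def transliterate(text: str, lang: str) -> str:
    """Replace non-ASCII letters using the table for the given language."""
    table = TRANSLITERATION.get(lang)
    if table is None:
        # Without the language, there is no way to pick a correct mapping.
        raise ValueError(f"no transliteration table for language {lang!r}")
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(transliterate("Jörg", "de"))
    print(transliterate("Jörg", "tr"))
```

Once the text has been reduced to an escape sequence or a bare code
point, the language is gone, and a formatter can only guess which
row of such a table applies.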


> You know that C99, just like many other modern languages (and
> dialects), allows full 8bit input?

I know that some do, and I have fought with Python code garbled
in that way, and all the more do I call it insane.

> The primary problem I have with using 8bit input for mandoc(1) (or groff
> in general) is that it doesn't have a way to specify the input character
> set. If that is addressed, the discussion would move to the more
> interesting point of transliteration.

In my experience, as soon as you start dealing with character sets,
chaos ensues.  UTF-8 has made matters worse, not better, because now
many people think it is OK to scatter crap all over the place.
In typesetting, the mentioned chaos is unfortunately unavoidable,
and you need to deal with it; but most of the time, it is also easier
to handle there because in most typesetting environments, you deal
with one language at a time, and you know beforehand with which one.

Unless we enjoy pain, bloat and code obfuscation *and* want to be
continuously distracted from serious development, we should keep
mandoc as far away from any kind of charset considerations as
possible.

Yours,
  Ingo