From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from www.sonnenberger.org (www.sonnenberger.org [92.79.50.50])
    by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id o67JOZXJ001857
    for ; Wed, 7 Jul 2010 15:24:36 -0400 (EDT)
Received: from britannica.bec.de (www.sonnenberger.org [192.168.1.10])
    by www.sonnenberger.org (Postfix) with ESMTP id F42366665E
    for ; Wed, 7 Jul 2010 21:24:28 +0200 (CEST)
Received: by britannica.bec.de (Postfix, from userid 1000)
    id 73F2D150A8; Wed, 7 Jul 2010 21:18:08 +0200 (CEST)
Date: Wed, 7 Jul 2010 21:18:08 +0200
From: Joerg Sonnenberger
To: discuss@mdocml.bsd.lv
Subject: Re: Raw UTF-8?
Message-ID: <20100707191807.GA18154@britannica.bec.de>
References: <4c33f0f0.0c87970a.3458.fffff43f@mx.google.com>
    <20100707185815.GA19725@iris.usta.de>
X-Mailinglist: mdocml-discuss
Reply-To: discuss@mdocml.bsd.lv
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20100707185815.GA19725@iris.usta.de>
User-Agent: Mutt/1.5.20 (2009-06-14)

On Wed, Jul 07, 2010 at 08:58:15PM +0200, Ingo Schwarze wrote:
> For the occasional proper name of an author, use transliteration
> to ASCII. I consider using non-ASCII-output escape sequences in
> there a discourtesy with respect to the author, because then some
> people will not be able to read the name.

Actually, I would consider the reverse the correct behavior. The escape
sequences should provide the transliteration depending on the device
capabilities. Consider my name -- I would strongly hope that output
devices with proper Latin1/Latin15/UTF-8 support use the diacritic,
but fall back to the transliterated version otherwise.

> > I use plain UTF-8 instead of the escapes documented in mandoc_char(7),
> > for a couple reasons. I'm just wondering, is this practice
> > discouraged in any way?
>
> Yes. Eight-Bit characters in roff, man and mdoc source code are syntax
> errors, just like they are in C and in any sane programming language.
> The current implementation passes them through, but it could as well
> throw them away, or abort the parser, subject to change without notice.

You know that C99, like many other modern languages and dialects, allows
full 8-bit input? The primary problem I have with using 8-bit input for
mandoc(1) (or groff in general) is that it doesn't have a way to specify
the input character set. If that is addressed, the discussion would move
on to the more interesting question of transliteration.

Joerg

-- 
To unsubscribe send an email to discuss+unsubscribe@mdocml.bsd.lv
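A minimal sketch of the device-dependent fallback argued for in this thread: one escape in the source, rendered as the real character when the output encoding can represent it and as an ASCII transliteration otherwise. The escape names follow the groff/mandoc_char(7) style (`\(:o`, `\(:u`, `\(ss`), but the `GLYPHS` table, the German-style transliterations (`oe`, `ue`) and the `render` helper are hypothetical illustrations, not mandoc's actual implementation or output.

```python
# Hypothetical table: special-character escape names (as in \(:o)
# mapped to the character they denote and an ASCII transliteration.
# The transliterations are German-style choices made for illustration;
# mandoc's own ASCII fallback may differ.
GLYPHS = {
    ":o": ("\u00f6", "oe"),  # o with diaeresis
    ":u": ("\u00fc", "ue"),  # u with diaeresis
    "ss": ("\u00df", "ss"),  # German sharp s
}


def render(name: str, encoding: str) -> bytes:
    """Render escape `name` for a device whose output encoding is `encoding`.

    Emit the real character if the device's encoding can represent it,
    otherwise fall back to the ASCII transliteration.
    """
    char, fallback = GLYPHS[name]
    try:
        return char.encode(encoding)
    except UnicodeEncodeError:
        return fallback.encode("ascii")


# "J\(:org Sonnenberger" rendered for different (hypothetical) devices.
for enc in ("ascii", "latin-1", "iso8859_15", "utf-8"):
    print(enc, b"J" + render(":o", enc) + b"rg")
```

Keeping the fallback decision in the renderer, rather than in the source text, is what lets a single `\(:o` serve both ASCII terminals and Latin-1/UTF-8 devices.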