Date: Wed, 7 Jul 2010 20:58:15 +0200
From: Ingo Schwarze
To: discuss@mdocml.bsd.lv
Subject: Re: Raw UTF-8?
Message-ID: <20100707185815.GA19725@iris.usta.de>
In-Reply-To: <4c33f0f0.0c87970a.3458.fffff43f@mx.google.com>

Hi Anthony,

> When using special characters in manpages,

I consider that a terrible idea. In a nutshell, such manuals are
useless on terminals. If some piece of information is important, you
should really encode it such that all readers can see it. If it is
unimportant, just leave it out instead of obfuscating it, which will
make some people wonder whether they are missing anything.

We should probably add a warning to discourage people from using
characters needing more than ASCII on output, saying something like
"this manual is not portable and will not display correctly in some
environments".

From my point of view, non-ASCII-output escape sequences are only
supported for backward compatibility with legacy manuals, and
displaying something semi-sensible in their place is done on a
best-effort basis, knowing that it is ultimately unreliable. Using
such escape sequences in new mdoc(7) source code, you would only show
that you don't care about the usability of your manuals.

For the occasional proper name of an author, use transliteration to
ASCII. I consider using non-ASCII-output escape sequences there a
discourtesy to the author, because then some people will not be able
to read the name.

> I use plain UTF-8 instead of the escapes documented in mandoc_char(7),
> for a couple reasons. I'm just wondering, is this practice
> discouraged in any way?

Yes. Eight-bit characters in roff, man and mdoc source code are
syntax errors, just like they are in C and in any sane programming
language. The current implementation passes them through, but it
could just as well throw them away or abort the parser; this behavior
is subject to change without notice.

> Is there a chance of this _not_ working in future versions of mandoc?

If it works, that is by mere chance, and it is not portable in any
way: not between output devices, not between platforms, and not
between different versions of mandoc.

Yours,
  Ingo
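
To illustrate the point about eight-bit characters in manual sources,
here is a minimal sketch of a check that reports every byte above
0x7F in a source file. It is not part of mandoc and does not reflect
how mandoc itself handles such input; the function name
find_non_ascii is hypothetical and chosen only for this example.

```python
def find_non_ascii(source: bytes):
    """Return (line, column, byte) for every byte above 0x7F.

    Hypothetical helper, not part of mandoc. Lines and columns are
    1-based; columns count bytes, not characters.
    """
    hits = []
    for lineno, line in enumerate(source.split(b"\n"), start=1):
        # Iterating over a bytes object yields integers 0..255.
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:
                hits.append((lineno, col, byte))
    return hits
```

Reading a manual in binary mode and passing its contents to
find_non_ascii gives an empty list for a pure ASCII source. A name
typed as raw UTF-8 shows up as one entry per offending byte, which is
enough to locate the spot and replace it with an ASCII
transliteration.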