From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailout.scc.kit.edu (mailout.scc.kit.edu [129.13.185.202])
    by krisdoz.my.domain (8.14.5/8.14.5) with ESMTP id s1SH65Dm004634
    for ; Fri, 28 Feb 2014 12:06:06 -0500 (EST)
Received: from hekate.usta.de (asta-nat.asta.uni-karlsruhe.de [172.22.63.82])
    by scc-mailout-02.scc.kit.edu with esmtp (Exim 4.72 #1)
    id 1WJQss-0002qA-8z; Fri, 28 Feb 2014 18:06:02 +0100
Received: from donnerwolke.usta.de ([172.24.96.3])
    by hekate.usta.de with esmtp (Exim 4.77)
    (envelope-from )
    id 1WJQss-00083w-AH; Fri, 28 Feb 2014 18:06:02 +0100
Received: from iris.usta.de ([172.24.96.5] helo=usta.de)
    by donnerwolke.usta.de with esmtp (Exim 4.72)
    (envelope-from )
    id 1WJQss-0003pU-6y; Fri, 28 Feb 2014 18:06:02 +0100
Received: from schwarze by usta.de with local (Exim 4.77)
    (envelope-from )
    id 1WJQss-0007jy-0r; Fri, 28 Feb 2014 18:06:02 +0100
Date: Fri, 28 Feb 2014 18:06:01 +0100
From: Ingo Schwarze
To: discuss@mdocml.bsd.lv
Cc: Ted Unangst , "Anthony J. Bentley"
Subject: Re: Accents vs. combining accents
Message-ID: <20140228170601.GB21476@iris.usta.de>
References:
X-Mailinglist: mdocml-discuss
Reply-To: discuss@mdocml.bsd.lv
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)

Hi Anthony,

Anthony J. Bentley wrote on Wed, Feb 26, 2014 at 12:51:38AM -0700:

> Mandoc misbehaves a bit when printing accents in UTF-8.

Right, see below for a patch which i intend to commit.

> In summary:
>
> - Under normal circumstances, \` and \' should print spacing accents
>   and not combining accents.

Correct.  The same is true for many other accent escape sequences.

> - Maybe we should consider printing real quotes (U+2018/9) on raw `
>   and ' in UTF-8 mode. Maybe worth bringing up on the groff list too.

That would cause similar issues with copy and paste like the ones
just discussed regarding hyphens and dashes.
So at least for manuals, it would probably have to be disabled right
away, just as manuals output the ASCII character for both - and \-.
I tend to agree with Dmitrij Czarkoff that plain ASCII input is
better left as-is, and people should use escape sequences if they
want specific fancy UTF-8 characters (except that in manuals, they
probably shouldn't; it merely harms portability to ask for fancy
characters).

[...]

> In situations with no composite request (I guess?), mandoc should
> print U+0060 and U+00B4 for \` and \' respectively, as groff does.

Correct.  Mandoc doesn't support escape sequences involving
composite characters at all, so mandoc has to use the codes
you cite in all cases.

Yours,
  Ingo

P.S.
I snipped your discussion of texinfo2man, which makes sense to me.


Index: chars.in
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/chars.in,v
retrieving revision 1.20
diff -u -p -r1.20 chars.in
--- chars.in 22 Jan 2014 20:58:35 -0000 1.20
+++ chars.in 28 Feb 2014 16:46:26 -0000
@@ -49,21 +49,21 @@ CHAR("c", "", 0)
 CHAR("}", "", 0)
 
 /* Accents. */
-CHAR("a\"", "\"", 779)
+CHAR("a\"", "\"", 733)
 CHAR("a-", "-", 175)
 CHAR("a.", ".", 729)
-CHAR("a^", "^", 770)
-CHAR("\'", "\'", 769)
-CHAR("aa", "\'", 769)
-CHAR("ga", "`", 768)
-CHAR("`", "`", 768)
-CHAR("ab", "`", 774)
-CHAR("ac", ",", 807)
-CHAR("ad", "\"", 776)
+CHAR("a^", "^", 94)
+CHAR("\'", "\'", 180)
+CHAR("aa", "\'", 180)
+CHAR("ga", "`", 96)
+CHAR("`", "`", 96)
+CHAR("ab", "`", 728)
+CHAR("ac", ",", 184)
+CHAR("ad", "\"", 168)
 CHAR("ah", "v", 711)
 CHAR("ao", "o", 730)
-CHAR("a~", "~", 771)
-CHAR("ho", ",", 808)
+CHAR("a~", "~", 126)
+CHAR("ho", ",", 731)
 CHAR("ha", "^", 94)
 CHAR("ti", "~", 126)
 
--
 To unsubscribe send an email to discuss+unsubscribe@mdocml.bsd.lv