From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailout.scc.kit.edu (mailout.scc.kit.edu [129.13.185.202])
    by krisdoz.my.domain (8.14.5/8.14.5) with ESMTP id s1GKuxLo007413
    for ; Sun, 16 Feb 2014 15:57:01 -0500 (EST)
Received: from hekate.usta.de (asta-nat.asta.uni-karlsruhe.de [172.22.63.82])
    by scc-mailout-02.scc.kit.edu with esmtp (Exim 4.72 #1)
    id 1WF8lk-0008SU-5Q; Sun, 16 Feb 2014 21:56:56 +0100
Received: from donnerwolke.usta.de ([172.24.96.3])
    by hekate.usta.de with esmtp (Exim 4.77)
    (envelope-from )
    id 1WF8lk-00052b-3k; Sun, 16 Feb 2014 21:56:56 +0100
Received: from iris.usta.de ([172.24.96.5] helo=usta.de)
    by donnerwolke.usta.de with esmtp (Exim 4.72)
    (envelope-from )
    id 1WF8lk-00005v-0D; Sun, 16 Feb 2014 21:56:56 +0100
Received: from schwarze by usta.de with local (Exim 4.77)
    (envelope-from )
    id 1WF8lj-0007Xv-V2; Sun, 16 Feb 2014 21:56:55 +0100
Date: Sun, 16 Feb 2014 21:56:55 +0100
From: Ingo Schwarze
To: Thomas Klausner
Cc: discuss@mdocml.bsd.lv
Subject: Re: FWD: man.conf mandoc -Tlocale
Message-ID: <20140216205655.GA18878@iris.usta.de>
References: <20140214130647.GF20867@iris.usta.de>
    <20140215084309.GA14964@danbala.tuwien.ac.at>
    <20140215094251.GA24366@iris.usta.de>
X-Mailinglist: mdocml-discuss
Reply-To: discuss@mdocml.bsd.lv
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20140215094251.GA24366@iris.usta.de>
User-Agent: Mutt/1.5.21 (2010-09-15)

Hi Thomas,

Dmitrij D. Czarkoff just pointed out to me in private mail
that my analysis wasn't quite right, so i reinvestigated,
and i have to correct this part:

Ingo Schwarze wrote on Sat, Feb 15, 2014 at 10:42:51AM +0100:
> Thomas Klausner wrote on Sat, Feb 15, 2014 at 09:43:09AM +0100:

>> One thing I remember being broken at some point: Does this still allow
>> examples to be copied, or do we have to be extra careful about marking
>> them up then?

> Yes.  Plain '-' as an input character is rendered as an UTF-8 hyphen:

That is *not* true.
Plain '-' always renders as plain '-'.

> $ mandoc -Tlocale ls.1 | hexdump -C | head -n 7 | tail -n 2
> 00000050  4e 08 4e 41 08 41 4d 08  4d 45 08 45 0a 20 20 20  |N.NA.AM.ME.E.   |
> 00000060  20 20 6c 08 6c 73 08 73  20 e2 80 93 20 6c 69 73  |  l.ls.s ... lis|

The reason for this is that we use \(en between .Nm and .Nd
in the NAME section, not a plain '-'.

> However, the input string "\-" is rendered as a plain ASCII minus sign,
> even with -Tutf8:
>
> $ mandoc -Tlocale ls.1 | hexdump -C | head -n 70 | tail -n 3
> 00000430  0a 0a 20 20 20 20 20 54  68 65 20 6f 70 74 69 6f  |..     The optio|
> 00000440  6e 73 20 61 72 65 20 61  73 20 66 6f 6c 6c 6f 77  |ns are as follow|
> 00000450  73 3a 0a 0a 20 20 20 20  20 2d 08 2d 31 08 31 20  |s:..     -.-1.1 |

That part is correct.

So, we have these mappings:

  input   output
  -----   ASCII   UTF-8
          -----   -----
  -       -       -
  \-      -       -
  \(hy    -       U+2010
  \(en    -       U+2013
  \(em    --      U+2014

See also these lines in chars.in:

  CHAR("-",   "-",   45)
  CHAR("hy",  "-",   8208)
  CHAR("en",  "-",   8211)
  CHAR("em",  "--",  8212)

So, unless people put \(hy, \(en, or \(em into their example code,
i would expect copy and paste to work just fine even in UTF-8 mode.

Yours,
  Ingo
--
 To unsubscribe send an email to discuss+unsubscribe@mdocml.bsd.lv
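
For readers who want to check the dash mappings listed in this message, here is a minimal Python sketch. It is not mandoc code: the names DASHES, render and paste_safe are made up for illustration, and the table is simply copied from the mail. It prints the UTF-8 bytes each input would produce and whether the result is still a plain ASCII minus sign, which is what matters for copy and paste.

```python
# Mapping copied from the table in this mail:
# input token -> (ASCII output, code point used in UTF-8 mode).
# DASHES, render() and paste_safe() are illustrative names, not mandoc APIs.
DASHES = {
    "-":     ("-",  0x2D),
    "\\-":   ("-",  0x2D),
    "\\(hy": ("-",  0x2010),
    "\\(en": ("-",  0x2013),
    "\\(em": ("--", 0x2014),
}


def render(token, utf8):
    """Return the output string for one input token in the given mode."""
    ascii_out, codepoint = DASHES[token]
    return chr(codepoint) if utf8 else ascii_out


def paste_safe(token):
    """True if the UTF-8 rendering is still a plain ASCII minus sign."""
    return render(token, utf8=True) == "-"


if __name__ == "__main__":
    for token in DASHES:
        raw = render(token, utf8=True).encode("utf-8")
        print(f"{token:5} {raw.hex(' '):9} paste-safe: {paste_safe(token)}")
```

For \(en the bytes come out as e2 80 93, the same sequence visible in the first hexdump quoted in this mail, while - and \- stay at the single byte 2d.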