From: Peter Stephenson
To: zsh-workers@sunsite.dk
Subject: Re: Quoting problem and crashes with ${(#)var}
In-Reply-To: Message from Bart Schaefer of "Mon, 12 Feb 2007 20:59:40 PST." <070212205940.ZM2872@torch.brasslantern.com>
Date: Tue, 13 Feb 2007 21:11:05 +0000

Bart Schaefer wrote:
> I'm a bit puzzled, given this test ...
>
> }       if (isset(MULTIBYTE) && ires > 127) {
>
> ... why ${(V)x} for x in 128 through 159 display as \u0080 through
> \u009f, but then 160 through 255 are treated as directly printable.

On my terminal, I've got different effects, which worries me more: if I
assign the UTF-8 representation of character 128 to a variable, ${(V)x}
tries to print it out directly (and it only shows up if I send it through
xxd or equivalent).  Quite possibly the shell is linked with different
libraries.  (However, the ZLE function insert-unicode-char correctly shows
it as a control character, ^ followed by A with a grave accent.)

Anyway, 128 to 159 aren't printable, 160 on are.  In Unicode:

0080    <control>
...
009F    <control>
        = APPLICATION PROGRAM COMMAND
00A0    NO-BREAK SPACE
        = NBSP
        x (space - 0020)
        x (figure space - 2007)
        x (narrow no-break space - 202F)
        x (word joiner - 2060)
        x (zero width no-break space - FEFF)
        # <noBreak> 0020

(V) is documented as "make special characters visible".  That's exactly
what you're getting (but I'm not, for some reason---I'd be interested in
knowing where on your system the printability test is taking place).

> Furthermore, if I run with LANG=C I get
>
> % for x in {1..254}; h[x]=${(V#)x}
> zsh: character not in range
>
> That seems wrong.  It does the right thing if "unsetopt multibyte"
> is also in effect, but why should I have to explicitly do so?

Well, because you've (explicitly or otherwise) got it set to a locale with
no knowledge of characters beyond 127; it only knows about the portable
character set.  It's simply telling you it doesn't know what to do with
them.
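
To make the two points above concrete, here is a minimal sketch in Python
(not zsh, and not how the shell does any of this internally).  It assumes
Python's unicodedata tables as the reference for what counts as a control
character, and it uses the ASCII codec as a stand-in for a locale that
only knows the portable character set.  The helper names is_control,
c1_controls and try_encode are made up for illustration.

```python
import unicodedata


def is_control(codepoint: int) -> bool:
    """True if Unicode puts the code point in general category Cc (control)."""
    return unicodedata.category(chr(codepoint)) == "Cc"


# Collect every control character in the range 128..255.
# Per the Unicode tables these are exactly the C1 controls, 128..159.
c1_controls = [cp for cp in range(128, 256) if is_control(cp)]


def try_encode(codepoint: int):
    """Encode with an ASCII-only codec (assumed analogue of LANG=C).

    Returns the encoded bytes, or None when the codec has no mapping,
    which is the situation the shell reports as "character not in range".
    """
    try:
        return chr(codepoint).encode("ascii")
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    print(c1_controls[0], c1_controls[-1])
    print(unicodedata.name(chr(160)))
    print(try_encode(65), try_encode(128))
```

The codec, like the C locale, is not refusing out of stubbornness; it has
no mapping for 128 to offer, so the only honest answer is a failure.
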
The shell can't guess, because there's nothing really for it to guess;
locale C is a statement of ignorance about the non-portable, post-ASCII
world.

You're probably expecting 128 to print out a single octet corresponding to
the value of a C unsigned char with 128 in it.  Yet for all the computer's
been told, character 128 on the terminal you're using is "really" the
symbol for a deity worshipped by a Venusian cargo cult which is represented
by a string of 17 0xff's followed by 0x73 0x57, the magic number indicating
the transfer of energy between worshippers when the Earth is high in the
sky.  (All right, maybe this particular example wasn't very realistic.)

What you're asking is for some kludged special case for LANG=C (presumably
we shouldn't second-guess any other character set).  It's doable, I
suppose, but I can't see the gain.  MULTIBYTE mode was never intended to be
backward compatible; that's exactly why NO_MULTIBYTE exists.

-- 
Peter Stephenson
Web page now at http://homepage.ntlworld.com/p.w.stephenson/