Date: Sat, 26 Apr 2008 20:09:56 +0100
From: Stephane Chazelas
To: zsh-workers@sunsite.dk, Samuel Thibault , 478019@bugs.debian.org
Subject: Re: Bug#478019: zsh: Should handle non-breaking space as word separator
Message-ID: <20080426190956.GA9225@sc.homeunix.net>
References: <20080426110003.GA16650@implementation> <20080426150548.GB6165@scru.org>
In-Reply-To: <20080426150548.GB6165@scru.org>

On Sat, Apr 26, 2008 at 04:05:48PM +0100, Clint Adams wrote:
> On Sat, Apr 26, 2008 at 12:00:03PM +0100, Samuel Thibault wrote:
> > Hello,
> >
> > On a french keyboard, '|' is typed by using alt-gr, and the non-breaking
> > space is often typed by using alt-gr space. That often leads to this:
> >
> > ¤ echo a | grep a
> > zsh: command not found:  grep
> >
> > Because zsh looks for a " grep" command, with leading non-breaking space
> > because my thumb remained a bit too long on the alt-gr key.
> >
> > This doesn't happen with bash, because bash treats non-breaking space as
> > a word separator. Could zsh do the same? (currently, I have defined
> > alias  grep=grep
> > alias  vi=vi
> > ...)
>
> Having locale-based (and multibyte) word separators sounds like a nightmare
> to me, but maybe someone has some ideas.

Making the shell syntax depend on the environment looks like a very
bad idea to me (think of scripts!).

There are already problems like that, such as

case $x in ([a-z]) ...;; esac

whose behavior is locale-dependent (depending on the locale's collation
order, [a-z] may also match accented or even uppercase letters), while
in most cases that's not what you want. And working around that POSIXly
is a nightmare:

LC_ALL=C command eval 'case $x in ([a-z]) ... esac'

(which doesn't even work in some shells because of bugs).
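For that specific case, one locale-independent alternative is to spell the letters out instead of using a range: a bracket expression without a range only matches the characters it lists, so it should not be affected by the locale's collation order. A minimal POSIX sh sketch, assuming all you want is "one lowercase ASCII letter" (is_ascii_lower is a hypothetical name used only for illustration, not something proposed in this thread):

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Succeeds (returns 0) if $1 is exactly one lowercase ASCII letter.
# The bracket expression lists the 26 letters explicitly instead of using
# the range [a-z], whose meaning can depend on the locale's collation order.
is_ascii_lower() {
  case $1 in
    ([abcdefghijklmnopqrstuvwxyz]) return 0 ;;
    (*) return 1 ;;
  esac
}

# Example usage: classify a few strings.
for c in a B 1 ab; do
  if is_ascii_lower "$c"; then
    printf '%s: yes\n' "$c"
  else
    printf '%s: no\n' "$c"
  fi
done
```

It's verbose, and it only covers the 26 ASCII letters, which is usually what a script means by [a-z] anyway.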
Another bad example, which causes more harm than good: in ksh93, the
decimal point is locale-dependent, so you can't do:

float Pi=3.14159265359

which is a syntax error in some locales (the French locale, for
instance, uses a comma as the decimal separator):

$ LC_ALL=fr_FR ksh93 -c 'float Pi=3.141592653589'
ksh93[1]: typeset: 3.141592653589: arithmetic syntax error

This one is even harder to overcome:

$ LC_ALL=fr_FR ksh93 -c '
LC_ALL=C command float Pi=3.141592653589; print $Pi'

$ LC_ALL=fr_FR ksh93 -c 'LC_ALL=C command eval float Pi=3.141592653589
print $((Pi))'
ksh93: line 2: 3.141592653589: arithmetic syntax error

$ LC_ALL=fr_FR ksh93 -c 'in_C_locale() { typeset LC_ALL=C; eval "$@"; }
in_C_locale float Pi=3.141592653589; echo $LC_ALL; print $((Pi))'
C
3.141592653589

All of which look like bugs to me.

Anyway, my point is that it's a bad idea to make the shell syntax
depend on the locale.

-- 
Stéphane