Message-Id: <200705100946.l4A9k9FI001151@news01.csr.com>
X-Seq: 23413
To: zsh-workers@sunsite.dk
Subject: Re: Silent UTF-8 assumption?
In-reply-to: <200705101156.19776.arvidjaar@newmail.ru>
References: <200705101156.19776.arvidjaar@newmail.ru>
Date: Thu, 10 May 2007 10:46:09 +0100
From: Peter Stephenson

Andrey Borzenkov wrote:
> This caught my attention:
>
> static wchar_t
> charref(char *x, char *y)
> {
>     wchar_t wc;
>     size_t ret;
>
>     if (!(patglobflags & GF_MULTIBYTE) || !(STOUC(*x) & 0x80))
>         return (wchar_t) STOUC(*x);
>
> well, this is definitely not valid for arbitrary multibyte character
> set.

We're not using an arbitrary character set; we're using one that has
the portable character set (i.e. ASCII) as a 7-bit subset, including
the property of UTF-8 that any true multibyte stream has the eighth bit
set in all octets. That's entirely for the practical reason that, if we
don't make that assumption, all hell will break loose, because we would
have to make *every* part of the shell that ever tests a character,
even an ASCII character, multibyte aware.

There's a good chance the multibyte character set in question is UTF-8,
but it doesn't necessarily have to be.

-- 
Peter Stephenson
Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK
Tel: +44 (0)1223 692070
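
For readers following the thread, the fast path under discussion can be
sketched roughly as below. This is an illustration only, not the zsh
source: sketch_charref, TO_UCHAR, the multibyte_on flag and the len
out-parameter are hypothetical names standing in for charref, STOUC and
(patglobflags & GF_MULTIBYTE), and the fallback to the raw byte on a
decoding error is an assumption. It relies on exactly the property
stated in the message: a byte with the eighth bit clear is always a
complete ASCII character, so only bytes with the eighth bit set need to
go through mbrtowc().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for zsh's STOUC(): read a char as unsigned. */
#define TO_UCHAR(c) ((unsigned char)(c))

/*
 * Decode the character that starts at x; the string ends at y.
 * multibyte_on is a hypothetical flag playing the role of
 * (patglobflags & GF_MULTIBYTE) in the quoted code.
 * The number of bytes consumed is stored in *len.
 */
static wchar_t
sketch_charref(const char *x, const char *y, int multibyte_on, size_t *len)
{
    wchar_t wc;
    mbstate_t st;
    size_t ret;

    assert(x != NULL && y != NULL && x < y && len != NULL);
    *len = 1;

    /* Fast path: eighth bit clear means a single-byte ASCII character. */
    if (!multibyte_on || !(TO_UCHAR(*x) & 0x80))
        return (wchar_t) TO_UCHAR(*x);

    /* Slow path: let the current locale decode the multibyte sequence. */
    memset(&st, 0, sizeof st);
    ret = mbrtowc(&wc, x, (size_t)(y - x), &st);
    if (ret == (size_t)-1 || ret == (size_t)-2) {
        /* Invalid or incomplete sequence: fall back to the raw byte. */
        return (wchar_t) TO_UCHAR(*x);
    }
    *len = ret;
    return wc;
}
```

The result of the slow path depends on the locale selected with
setlocale(), so only the ASCII branch and the multibyte-off branch
behave the same everywhere. In an encoding where a trailing byte of a
multibyte character may have the eighth bit clear, the fast-path test
alone would not be enough to tell whether a byte stands on its own,
which is the case the assumption above rules out.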