From: Martijn Dekker
To: zsh-workers@zsh.org
Date: Sun, 07 Jun 2015 04:21:45 +0200
Subject: Re: In POSIX mode, ${#var} measures length in bytes, not characters
Message-ID: <5573AAB9.5090709@inlv.org>
In-Reply-To: <6697841433637291@web4o.yandex.ru>

ZyX wrote on 07-06-15 at 02:34:
> Do you have a reference where “character” is defined?

Yes:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html#tag_06_02

POSIX specifically allows any character encoding, including multibyte characters, depending on the user's locale, on the condition that the portable character set (basically US-ASCII) is a subset of the locale's character set. Now that UTF-8, which includes multibyte characters, is the de facto standard locale encoding, it has become important for shells to get this right.

> This behaviour is the same in posh and dash:

Yes, dash and pdksh/mksh/posh unfortunately have this bug, too. But bash, ksh93 and yash correctly measure characters, not bytes. (yash is supposed to be the most POSIX-compliant of them all.)

Thanks,

- Martijn
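A small sh script makes the bytes-versus-characters difference concrete. This is only a sketch. It assumes a UTF-8 locale named C.UTF-8 is installed, and that name is an assumption that varies by system (en_US.UTF-8 is a common alternative). The script compares ${#var}, which should count characters, against the byte count reported by wc -c:

```shell
#!/bin/sh
# Sketch: compare ${#var} with the raw byte length of the same string.
# Assumption: a UTF-8 locale called C.UTF-8 exists on this system;
# substitute another UTF-8 locale name if it does not.
LC_ALL=C.UTF-8
export LC_ALL

# 'é' is U+00E9, which UTF-8 encodes as the two bytes 0xC3 0xA9 (octal 303 251).
var=$(printf 'caf\303\251')

# Length as measured by the shell: a character-counting shell should
# report 4 here; a shell that counts bytes reports 5.
chars=${#var}

# Length in bytes, independent of the shell (tr strips wc's padding).
bytes=$(printf '%s' "$var" | wc -c | tr -d ' ')

echo "characters: $chars"
echo "bytes:      $bytes"
```

Running the same script under each shell (for example `bash demo.sh`, `dash demo.sh`) shows which ones measure characters and which measure bytes, provided the locale above is actually available.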