From: ZyX
To: Ray Andrews, Zsh Users
Subject: Re: utf-8
Date: Wed, 17 Dec 2014 23:31:54 +0300

17.12.2014, 21:37, "Ray Andrews" :
> When we talk about utf-8 and zsh, what is the relevance of that? I mean
> what/when/where is zsh concerned with character encoding? Filenames I
> guess, and inside strings too, perhaps? Not in zsh syntax itself I
> presume. I guess that any data stream would/could be utf-8erized.
> Anywhere else? Or is this something where I'm not even asking the right
> question?

You can find where zsh explicitly supports UTF-8 by searching `man zshall` for the regular expression `(?i)utf-?8|unicode`. The results appear to be the following:

- Explicit support in RE patterns.
- COMBINING_CHARS option, which tells zsh that the terminal displays combining characters correctly (i.e. when calculating width, zsh should assume that combining characters are joined to the preceding non-combining ones and are therefore effectively zero cells wide).
- MULTIBYTE option, which affects string indexing and string length calculations, and also the `${(#)SOME_INTEGER_THAT_IS_GREATER_THAN_127}` parameter expansion flag.
- `$'\uXXXX'` and `$'\UXXXXXXXX'` escapes.
- Width calculations for Unicode characters whose East Asian Width property is F or W (i.e. fullwidth or double-width characters).
- `insert-unicode-char` widget.

Otherwise zsh supports whatever encoding the system locale specifies (which may or may not be UTF-8), not UTF-8 in particular.

// Note: I did not actually check the code, I only checked the documentation.

Also note that it would be very, very strange if zsh assumed filenames were in any particular encoding. File systems usually store filenames as plain byte strings that merely cannot contain certain bytes: on POSIX filesystems the only forbidden ones are `/` (because it is the directory separator) and `\0` (because allowing it would be nearly impossible given the legacy convention that C strings are zero-terminated). Any sane language knows that a path is a zero-terminated, `/`-separated byte string (with some additional assumptions if it is meant to run on Windows) *and nothing beyond that*. It cannot even assume that `abc/./../def` is equivalent to `def`: that is not true in general (for example, if `abc` is a symlink to another directory, `abc/..` refers to that directory's parent), so such normalization is only ever done explicitly. (Note: Python 3 is *not* sane.)
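A minimal Python sketch of the two filename points above, assuming a POSIX system with a byte-oriented filesystem such as Linux ext4 or tmpfs (macOS filesystems enforce UTF-8 and may reject the first name). The helper `demo` and the names `real`, `sub` and `abc` are hypothetical, made up for illustration. Passing `bytes` to the `os` functions makes Python skip decoding entirely, so the invalid-UTF-8 name `b"caf\xe9"` round-trips unchanged. The second half shows `os.path.normpath` rewriting `abc/./../def` to `def` purely lexically, while the kernel, following the symlink first, actually opens `real/def`.

```python
import os
import tempfile


def demo(base: str) -> dict:
    """Hypothetical helper: filenames are bytes, and '..' is not lexical."""
    results = {}

    # 1. b"caf\xe9" is "cafe" with e-acute in Latin-1 and is NOT valid UTF-8.
    #    A byte-oriented filesystem stores it as-is.
    raw_name = b"caf\xe9"
    raw_dir = os.fsencode(base)
    with open(os.path.join(raw_dir, raw_name), "wb") as f:
        f.write(b"data")
    # With a bytes argument, os.listdir returns raw bytes: no decoding happens.
    results["listed"] = raw_name in os.listdir(raw_dir)

    # 2. Build base/real/sub, a file base/real/def and a symlink base/abc -> real/sub.
    os.makedirs(os.path.join(base, "real", "sub"))
    with open(os.path.join(base, "real", "def"), "w") as f:
        f.write("reached real/def")
    os.symlink(os.path.join("real", "sub"), os.path.join(base, "abc"))

    tricky = "abc/./../def"
    # Purely lexical normalization: "abc/.." is simply cancelled out.
    results["lexical"] = os.path.normpath(tricky)
    # ...but base/def does not exist.
    results["lexical_exists"] = os.path.exists(os.path.join(base, results["lexical"]))
    # The kernel resolves abc -> real/sub first, so ".." lands in real/.
    with open(os.path.join(base, tricky)) as f:
        results["resolved_content"] = f.read()
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(tmp))
```

Python is used here only because its `os` module accepts bytes paths; the same experiment can be repeated by hand in zsh with `touch`, `ln -s` and `cat`.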