Subject: Re: Match length and multibyte characters
From: "Jun T."
Date: Sat, 12 Sep 2015 03:02:56 +0900
To: Erik Bernstein, zsh-users@zsh.org

2015/09/10 20:35, Erik Bernstein wrote:
> % array=(a ä a)
> % print ${${(O)array//(#m)*/${#MATCH}}[1]} ${${(ON)array%%*}[1]}
> 1 2
>
> Can maybe someone shed some light on whether the second version is
> supposed to work with multibyte characters and,

The second version returns 2 because ä is a 2-byte character in UTF-8.
This is a bug in the current zsh: the flags N, B and E all count bytes
rather than characters, so they do not work well with multibyte
characters in ${...#...}, ${...%...} etc.

The patch below may fix the bug.

BTW, in your example, it is better to replace the flag (O) by (On) so
that the lengths are sorted in numerical order. Otherwise, 10 comes
before 2.

diff --git a/Src/glob.c b/Src/glob.c
index dea1bf5..43d135b 100644
--- a/Src/glob.c
+++ b/Src/glob.c
@@ -2491,17 +2491,17 @@ get_match_ret(char *s, int b, int e, int fl, char *replstr,
         ll += 1 + (l - (e - b));
     if (fl & SUB_BIND) {
         /* position of start of matched portion */
-        sprintf(buf, "%d ", b + 1);
+        sprintf(buf, "%d ", MB_METASTRLEN2END(s, 0, s+b) + 1);
         ll += (bl = strlen(buf));
     }
     if (fl & SUB_EIND) {
         /* position of end of matched portion */
-        sprintf(buf + bl, "%d ", e + 1);
+        sprintf(buf + bl, "%d ", MB_METASTRLEN2END(s, 0, s+e) + 1);
         ll += (bl = strlen(buf));
     }
     if (fl & SUB_LEN) {
         /* length of matched portion */
-        sprintf(buf + bl, "%d ", e - b);
+        sprintf(buf + bl, "%d ", MB_METASTRLEN2END(s+b, 0, s+e));
         ll += (bl = strlen(buf));
     }
     if (bl)
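
The idea behind the three replaced expressions can be shown with a small standalone sketch. It is not zsh code: utf8_chars_between, match_begin, match_end and match_len are hypothetical names invented for this illustration, and the sketch assumes plain, valid UTF-8 input. The real MB_METASTRLEN2END macro works on zsh's internal string representation and the current locale, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of zsh): count the UTF-8 characters in
 * the byte range [start, end). A byte of the form 10xxxxxx is a
 * continuation byte and does not start a new character.
 * Assumes the range contains valid UTF-8.
 */
static int utf8_chars_between(const char *start, const char *end)
{
    const char *p;
    int n = 0;

    assert(start <= end);
    for (p = start; p < end; p++)
        if (((unsigned char)*p & 0xC0) != 0x80)
            n++;
    return n;
}

/* 1-based character position of the match start (b is a byte offset). */
static int match_begin(const char *s, int b)
{
    return utf8_chars_between(s, s + b) + 1;
}

/* Character position just past the match end (e is a byte offset). */
static int match_end(const char *s, int e)
{
    return utf8_chars_between(s, s + e) + 1;
}

/* Length of the match in characters rather than bytes. */
static int match_len(const char *s, int b, int e)
{
    return utf8_chars_between(s + b, s + e);
}
```

For the element ä (bytes 0xC3 0xA4) matched in full, b is 0 and e is 2, so the old expression e - b yields 2, while counting characters over the same byte range yields 1.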