List-Id: Zsh Users List
X-Seq: 20572
Date: Thu, 17 Sep 2015 17:14:01 +0200
Subject: Re: Match length and multibyte characters
From: Erik Bernstein
To: "Jun T."
Cc: zsh-users@zsh.org

>> % array=(a ä a)
>> % print ${${(O)array//(#m)*/${#MATCH}}[1]} ${${(ON)array%%*}[1]}
>> 1 2
>>
>> Can maybe someone shed some light on whether the second version is
>> supposed to work with multibyte characters and,
>
> The second version returns 2 because ä is a 2 byte character in UTF-8.
> This is a bug of the current zsh; all the flags N, B and E do not work
> well with multibyte characters in ${...#...}, ${...%...} etc.

Thanks for clearing that up. I was just unsure whether this is really a
bug or whether there's another flag that I have to apply in order to
make it work with Unicode characters, too.

> The patch below may fix the bug.

This is what I get after applying your patch:

/home/debian/zsh-5.0.7/obj/Src/../../Src/glob.c:2489: undefined reference to `MB_METASTRLEN2END'
/home/debian/zsh-5.0.7/obj/Src/../../Src/glob.c:2495: undefined reference to `MB_METASTRLEN2END'
/home/debian/zsh-5.0.7/obj/Src/../../Src/glob.c:2483: undefined reference to `MB_METASTRLEN2END'

This might be due to my old version, 5.0.7; I didn't try 5.1.1. In any
case, I'd rather work around this bug until it gets fixed upstream
than patch each zsh on all of my machines individually.

> BTW, in your example, it is better to replace the flag (O) by (On)

True. I used (On) during my tests but then forgot the crucial n in
my posting.

Best
erik
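
A minimal sketch of the kind of workaround mentioned above, assuming it is enough to measure each element with ${#...} (which gave the correct 1 in the first expression of the quoted example) instead of relying on the (N) flag. The helper name max_char_len is made up for illustration and is not part of zsh or of this thread. The character count also assumes zsh with the MULTIBYTE option set and a UTF-8 locale; other shells or locales may count bytes instead.

```shell
# Hypothetical helper (not from the thread): print the length of the
# longest argument, measured with ${#var} instead of the (N) flag.
# The body runs in a subshell, so max and item do not leak to the caller.
max_char_len() (
  max=0
  for item in "$@"; do
    # In zsh with MULTIBYTE set and a UTF-8 locale, ${#item} counts
    # characters rather than bytes.
    if [ "${#item}" -gt "$max" ]; then
      max=${#item}
    fi
  done
  printf '%s\n' "$max"
)

# Same data as in the quoted example.
max_char_len a ä a
```

The loop avoids both the pattern-matching flags and the sort, so it does not depend on (O) versus (On) either.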