From: Sebastian Gniazdowski
Date: Sun, 27 Sep 2015 10:13:11 +0200
Subject: Re: Substitution ${...///} slows down when certain UTF character occurs
To: zsh-workers@zsh.org
In-Reply-To: <150926134410.ZM17546@torch.brasslantern.com>
List-Id: Zsh Workers List
X-Seq: 36654

On 26 September 2015 at 22:44, Bart Schaefer wrote:
> On Sep 26, 2:19pm, Sebastian Gniazdowski wrote:
> }
> } I attach a script that does ${...///} substitution.
>
> I worry that the attachment hasn't come through correctly? When I
> unpack the base64 into text, I get (in part)
>
> str="c4d5148ca6 ce3a2d24203abfb385 30f5fe85434ae ... 5d468f6"
>
> Is the value of $str supposed to look like that? So the pattern in
> the ${str//...} replacement never matches?

Yes. I attached the string itself instead of the code that generated it:

# cat /dev/urandom | env LC_CTYPE=C tr -cd 'a-f0-9 ' | head -c 120000

> } It is very slow for some chars and very fast for others. How to explain
> } and hopefully fix this?
>
> Each time pattryrefs() fails to find a match, it increments the area
> to be searched by one character and then tries the entire pattern
> match again. So for a 120000-character string, it's doing a
> non-matching search 120000 times.

It is a big plus that the substitution is still practically instant for
strings of that length, as long as no unlucky Unicode character is
involved.

> I rewrote your test to use "float SECONDS" + "print $SECONDS" instead
> of forking off subshells for "time" and to use loops so I didn't have
> to comment things in and out. Observations:
>
> 1. It's only fast for the Yen symbol, which is the only one that does
> not have a byte with the high-order bit set. This case is avoiding
> this block in pattern.c:

For me (OSX / zsh 5.0.2) it was fast for the characters at even
positions in what I attached, i.e. for ¥, Ł and Ǟ. I didn't think the
result could differ between environments, so I have now run the test on
several machines. Ubuntu 12.10 / zsh 5.0.0 behaves the same. On
FreeBSD / zsh 5.1.1-dev-0 (HEAD 50721a1 and 8d5c0c) it is different:
the fast characters are ¥ and Ł. zsh 5.1.1-dev-0 (HEAD 50721a1 and
8d5c0c) on OSX behaves the same as the FreeBSD case.

Best regards,
Sebastian Gniazdowski
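
P.S. For anyone who wants to reproduce the test without the attachment, here is a minimal sketch. It is not the attached script. It assumes bash or zsh, builds a fresh random string in the spirit of the tr command above (so it is not the exact attached data), and the variable names and the choice of Ł as the pattern are only for illustration. Under zsh it switches SECONDS to floating point, as in the "float SECONDS" approach mentioned in the quoted text; under bash the elapsed time has whole-second resolution only.

```shell
# Sketch only: variable names and the chosen pattern are illustrative.
# Build a 120000-character string of hex digits and spaces.
# Reading a fixed amount from /dev/urandom and trimming afterwards
# avoids relying on SIGPIPE to stop the pipeline.
str=$(head -c 4000000 /dev/urandom | LC_ALL=C tr -cd 'a-f0-9 ')
str=${str:0:120000}

# A pattern that can never match, because $str is pure ASCII.
pat='Ł'

# zsh only: make SECONDS a float for sub-second resolution.
[ -n "${ZSH_VERSION:-}" ] && typeset -F SECONDS

start=$SECONDS
out=${str//$pat/}          # the non-matching substitution being timed
elapsed=$((SECONDS - start))

if [ "$out" = "$str" ]; then same=yes; else same=no; fi
printf 'length=%s unchanged=%s elapsed=%ss\n' "${#str}" "$same" "$elapsed"
```

Swapping other characters into pat (for example ¥ or Ǟ) and comparing the elapsed values is the kind of comparison described above; the numbers will depend on the zsh version and platform.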