Date: Mon, 28 Sep 2015 09:51:42 +0100
From: Peter Stephenson
To: zsh-workers@zsh.org
Subject: Re: Substitution ${...///} slows down when certain UTF character occurs
Message-id: <20150928095142.385a33eb@pwslap01u.europe.root.pri>
In-reply-to: <150927091121.ZM25721@torch.brasslantern.com>
References: <150926134410.ZM17546@torch.brasslantern.com> <150927091121.ZM25721@torch.brasslantern.com>
List-Id: Zsh Workers List
X-Seq: 36672
Organization: Samsung Cambridge Solution Centre

On Sun, 27 Sep 2015 09:11:21
-0700 Bart Schaefer wrote:
> Still I think the biggest issue is that unmetafication happening too
> low down.  Since pattry*() is being called repeatedly with the same two
> first arguments (prog and string) it might be possible to cache the
> unmetafied string after the first call.

It would be good to optimise the cases where the calling code in glob.c
(for the parameter-style operators only) matches at different places
along the string, too, which means unmetafying at a higher level.
That's quite a lot of work in the glob.c code, though, because we'll
need to deal with lengths and switch the multibyte handlers over to
unmetafied strings.

I think a reasonable strategy would be to change the call sequence for
pattrylen() and pattryrefs(), which are the key ones, to pass in an
optional unmetafied string; some of the remaining calls in glob.c could
be promoted to pattrylen(), which is a strict superset of pattry().
That would leave pattry() untouched for the majority of cases, which do
one-off matching.

Ideally we want to pass in either a metafied or an unmetafied string,
not both.  I don't know off the top of my head how much work it is to
fix up the PAT_PURES optimisation where we've already got an unmetafied
string, but it shouldn't be too much.

pws
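
For illustration only, here is a rough sketch of the "optional
unmetafied string" idea from the message above.  It is not the real
pattrylen() interface: unmetafy_copy(), try_literal(), find_literal()
and the unmeta_calls counter are all hypothetical names, and a plain
memcmp() stands in for the pattern matcher.  It assumes the usual zsh
encoding, where a metafied byte is stored as Meta (0x83) followed by
the byte XOR 32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed zsh encoding: Meta marker, then (original byte ^ 32). */
#define META 0x83

/* Counts conversions so the saving is observable; illustration only. */
int unmeta_calls = 0;

/* Hypothetical helper: malloc'd unmetafied copy, length stored in *len. */
char *unmetafy_copy(const char *s, size_t *len)
{
    size_t n = strlen(s), out = 0;
    char *buf = malloc(n + 1);

    assert(buf != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c == META && i + 1 < n)
            c = (unsigned char)((unsigned char)s[++i] ^ 32);
        buf[out++] = (char)c;
    }
    buf[out] = '\0';
    *len = out;
    unmeta_calls++;
    return buf;
}

/*
 * Hypothetical stand-in for a pattrylen()-style call.  If the caller
 * passes unmeta == NULL, the string is unmetafied here on every call;
 * otherwise the caller's copy (and its length) is used directly.
 */
int try_literal(const char *metafied, const char *unmeta, size_t unmeta_len,
                size_t offset, const char *lit, size_t lit_len)
{
    char *own = NULL;
    int matched;

    if (unmeta == NULL) {
        own = unmetafy_copy(metafied, &unmeta_len);
        unmeta = own;
    }
    matched = offset + lit_len <= unmeta_len &&
              memcmp(unmeta + offset, lit, lit_len) == 0;
    free(own);
    return matched;
}

/*
 * Caller in the style of the parameter-substitution code: try a match
 * at successive offsets.  With reuse != 0 the string is unmetafied once
 * up front and handed down; with reuse == 0 each attempt converts again.
 * Returns the first matching offset, or -1.
 */
long find_literal(const char *metafied, const char *lit, size_t lit_len,
                  int reuse)
{
    size_t len = 0;
    char *unmeta = unmetafy_copy(metafied, &len); /* loop bound needs len */
    long found = -1;

    for (size_t off = 0; off + lit_len <= len; off++) {
        if (try_literal(metafied, reuse ? unmeta : NULL, len,
                        off, lit, lit_len)) {
            found = (long)off;
            break;
        }
    }
    free(unmeta);
    return found;
}
```

Calling find_literal() with reuse set to 1 converts the string once no
matter how many offsets are tried; with reuse set to 0 the conversion
count grows with the number of attempts, which is the repeated work
being discussed.  The real change would also have to carry lengths
through the multibyte handling, which this sketch ignores.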