Date: Mon, 28 Sep 2015 12:30:24 +0100
From: Peter Stephenson
To: zsh-workers@zsh.org
Subject: Re: Substitution ${...///} slows down when certain UTF character occurs
Message-id: <20150928123024.4f6bd65c@pwslap01u.europe.root.pri>
In-reply-to: <20150928095142.385a33eb@pwslap01u.europe.root.pri>
References: <150926134410.ZM17546@torch.brasslantern.com>
 <150927091121.ZM25721@torch.brasslantern.com>
 <20150928095142.385a33eb@pwslap01u.europe.root.pri>
Organization: Samsung Cambridge Solution Centre
List-Id: Zsh Workers List
X-Seq: 36674

On Mon, 28 Sep 2015 09:51:42 +0100
Peter Stephenson wrote:
> I think a reasonable strategy would be to change the call sequence for
> pattrylen() and pattryrefs(), which are the key ones, to pass in an
> optional unmetafied string; some of the remaining calls in glob.c could
> be promoted to pattrylen(), which is a strict superset of pattry(). That
> would leave pattry() untouched for the majority of cases doing one-off
> matching.
>
> Ideally we only want to pass in either a metafied or unmetafied string.
> I don't know off the top of my head how much work it is to fix up the
> PAT_PURES optimisation where we've got an already unmetafied string but
> it shouldn't be too much.

The problem here is that we're comparing against a string compiled into
the pattern, which is metafied, and now we have an unmetafied trial
string. So we can't do a direct comparison any more without some extra
work. The options I can see:

1. Give up on the optimisation when we have an unmetafied string. That
   is, we'll still be comparing characters, but in the bowels of the
   pattern code; we won't optimise to a strcmp(). This seems a bad
   thing to do when the whole point of the change is as an
   optimisation.

2. Use a partial optimisation by unmetafying the pattern string on the
   fly. So we're not using memcmp() any more, but we'll have a tight
   loop over characters, and this can be done with local code at the
   point where we currently do the memcmp().

3. Compile both metafied and unmetafied variants into the pattern.
   This is wasteful.

4. Have both metafied and unmetafied variants for the pattern when
   using a pure string, but only produce, and cache, the unmetafied
   version when needed for comparison. This is more effective than
   caching the trial string because the pattern is only compiled once
   for many uses of it. We only lose out here if somebody is looping
   over a pattern (not just a trial string as in the glob code) many
   times, i.e. either redoing patcompile() or using a pre-compiled
   pattern, and the latter isn't all that common in the code (I'm not
   sure where it does happen, if it does). This seems to push the
   inefficiency out of inner loops to a frequency where it's probably
   not a noticeable factor any more.

5. Deal with both metafied and unmetafied strings in the calling code.
   This is a messy last resort.

I think both 2. and 4. look promising.

pws
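
For illustration, option 2 might look roughly like the sketch below. It
is not taken from the zsh source: pure_match_unmeta() and META_BYTE are
hypothetical names, and the sketch assumes the usual metafication
convention that a Meta byte (0x83) is followed by the real byte XORed
with 32. It walks the metafied pattern string, unmetafies one character
at a time, and compares against an unmetafied trial string whose length
is passed explicitly, since the trial string may contain NUL bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: mirrors zsh's Meta marker; the byte after it is stored
 * XORed with 32. */
#define META_BYTE 0x83

/*
 * Hypothetical helper, not a zsh function.
 * metapat:  metafied, NUL-terminated pure-string pattern.
 * trial:    unmetafied trial string (may contain NUL bytes).
 * triallen: length of trial in bytes.
 * Returns 1 if the whole trial string equals the pattern, else 0.
 */
static int
pure_match_unmeta(const char *metapat, const char *trial, size_t triallen)
{
    const unsigned char *p, *t, *tend;

    assert(metapat != NULL && trial != NULL);

    p = (const unsigned char *)metapat;
    t = (const unsigned char *)trial;
    tend = t + triallen;

    while (*p) {
        unsigned char c = *p++;

        if (c == META_BYTE) {
            if (!*p)
                return 0;       /* malformed: Meta with nothing after it */
            c = (unsigned char)(*p++ ^ 32);
        }
        /* Trial string exhausted early, or characters differ. */
        if (t == tend || *t++ != c)
            return 0;
    }
    /* Match only if the trial string is used up as well. */
    return t == tend;
}
```

Because the loop stops at the first mismatch and never allocates, the
cost stays close to the memcmp() it would replace. The price is one
extra test per byte for the Meta marker.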