zsh-workers
From: Peter Stephenson <p.stephenson@samsung.com>
To: zsh-workers@zsh.org
Subject: Re: Substitution ${...///} slows down when certain UTF character occurs
Date: Mon, 28 Sep 2015 12:30:24 +0100
Message-ID: <20150928123024.4f6bd65c@pwslap01u.europe.root.pri>
In-Reply-To: <20150928095142.385a33eb@pwslap01u.europe.root.pri>

On Mon, 28 Sep 2015 09:51:42 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:
> I think a reasonable strategy would be to change the call sequence for
> pattrylen() and pattryrefs(), which are the key ones, to pass in an
> optional unmetafied string; some of the remaining calls in glob.c could
> be promoted to pattrylen which is a strict superset of pattry.  That
> would leave pattry() untouched for the majority of cases doing one-off
> matching.
> 
> Ideally we only want to pass in either a metafied or unmetafied string.
> I don't know off the top of my head how much work it is to fix up the
> PAT_PURES optimisation where we've got an already unmetafied string but
> it shouldn't be too much.

The problem here is we're comparing against a string compiled into the
pattern which is metafied and now we have an unmetafied trial string.
So we can't do a direct comparison any more without some extra work.
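To make the mismatch concrete, here is a minimal standalone sketch, not
the real zsh code.  It assumes only the usual metafication scheme (the
Meta byte 0x83 followed by the original byte XOR 32); needs_meta() is a
simplified, hypothetical stand-in for the shell's imeta() test, and
metafy_buf() and naive_pure_match() are names made up for illustration.

```c
#include <stddef.h>
#include <string.h>

#define META 0x83   /* zsh's Meta marker byte */

/* Assumption: a simplified stand-in for imeta(); the real set of
 * bytes needing metafication is defined by the shell's type table. */
static int needs_meta(unsigned char c)
{
    return c == 0 || (c >= 0x83 && c <= 0xa2);
}

/* Metafy len bytes of in[] into out[], which must have room for
 * 2 * len bytes.  Returns the metafied length. */
static size_t metafy_buf(const unsigned char *in, size_t len,
                         unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (needs_meta(in[i])) {
            out[n++] = META;
            out[n++] = in[i] ^ 32;
        } else {
            out[n++] = in[i];
        }
    }
    return n;
}

/* The pure-string shortcut as a plain byte comparison.  It is only
 * correct when both arguments use the same representation. */
static int naive_pure_match(const unsigned char *pat, size_t patlen,
                            const unsigned char *trial, size_t triallen)
{
    return patlen == triallen && memcmp(pat, trial, patlen) == 0;
}
```

With the UTF-8 bytes for "a", U+015B, "b" (0x61 0xc5 0x9b 0x62), the
metafied form is one byte longer because 0x9b becomes 0x83 0xbb, so the
direct comparison against the unmetafied trial string fails even though
the two strings are logically equal.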

1. Give up on the optimisation when we have an unmetafied string.  That
is, we'll still be comparing characters, but in the bowels of the
pattern code --- we won't optimise to a strcmp().  This seems a bad
thing to do when the whole point of the change is to optimise.

2. Use a partial optimisation by unmetafying the pattern string on the
fly.  So we're not using memcmp any more, but we'll have a tight loop
over characters and this can be done with local code at the point where
we currently do the memcmp().

3. Compile both metafied and unmetafied variants into the pattern.  This
is wasteful.

4. Have both metafied and unmetafied variants for the pattern when using
a pure string, but only produce, and cache, the unmetafied version when
it is needed for comparison.  This is more effective than caching the
trial string because the pattern is compiled once and then used many
times.  We only lose out if somebody loops over a pattern (not just a
trial string, as in the glob code) many times, i.e. by either redoing
patcompile() or using a pre-compiled pattern, and the latter isn't all
that common in the code (I'm not sure where it happens, if it does).
This seems to push the inefficiency out of inner loops to a frequency
where it's probably no longer a noticeable factor.

5. Deal with both metafied and unmetafied strings in the calling code.
This is a messy last resort.
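
As a rough illustration of option 2, the loop below compares a metafied
pattern string against an unmetafied trial string, undoing the
metafication one byte at a time.  It is a sketch under assumptions, not
the code in pattern.c: pure_match_unmeta() is a hypothetical name, it
assumes the pure string has to match the whole trial string, and it
relies only on the Meta byte (0x83) being followed by the original byte
XOR 32.

```c
#include <stddef.h>

#define META 0x83   /* zsh's Meta marker byte */

/* Return 1 if the metafied pattern pat[0..patlen) equals the
 * unmetafied trial[0..triallen) once the pattern is unmetafied,
 * else 0.  No temporary buffer is allocated. */
static int pure_match_unmeta(const unsigned char *pat, size_t patlen,
                             const unsigned char *trial, size_t triallen)
{
    size_t t = 0;

    for (size_t p = 0; p < patlen; p++) {
        unsigned char c = pat[p];

        if (c == META) {
            if (++p == patlen)
                return 0;           /* dangling Meta: treat as no match */
            c = pat[p] ^ 32;        /* recover the original byte */
        }
        if (t == triallen || trial[t] != c)
            return 0;               /* trial too short or bytes differ */
        t++;
    }
    return t == triallen;           /* trial must be fully consumed */
}
```

This gives up the single memcmp() call but keeps the work in one tight
loop at the point of comparison, with no allocation.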

I think both 2. and 4. look promising.
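
For option 4, the lazy caching might look something like the following.
Everything here is hypothetical: struct pure_pat and its fields are
invented for the sketch and do not correspond to the real compiled
pattern structure, and memory ownership is simplified to a plain
malloc().  The point is only that the unmetafied copy is built on first
use and reused afterwards, so memcmp() still does the comparison.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define META 0x83   /* zsh's Meta marker byte */

/* Hypothetical holder for a compiled pure-string pattern. */
struct pure_pat {
    unsigned char *meta;     /* metafied string, always present */
    size_t metalen;
    unsigned char *unmeta;   /* unmetafied copy, NULL until needed */
    size_t unmetalen;
};

/* Return the unmetafied string, building and caching it on first
 * use.  Returns NULL only if allocation fails. */
static const unsigned char *pure_pat_unmeta(struct pure_pat *pp,
                                            size_t *lenp)
{
    if (!pp->unmeta) {
        /* The unmetafied form is never longer than the metafied one. */
        unsigned char *buf = malloc(pp->metalen ? pp->metalen : 1);
        size_t n = 0;

        if (!buf)
            return NULL;
        for (size_t i = 0; i < pp->metalen; i++) {
            unsigned char c = pp->meta[i];
            /* A dangling Meta at the very end is copied unchanged. */
            if (c == META && i + 1 < pp->metalen)
                c = pp->meta[++i] ^ 32;
            buf[n++] = c;
        }
        pp->unmeta = buf;
        pp->unmetalen = n;
    }
    *lenp = pp->unmetalen;
    return pp->unmeta;
}

/* Match an unmetafied trial string against the cached copy. */
static int pure_pat_match(struct pure_pat *pp,
                          const unsigned char *trial, size_t triallen)
{
    size_t len;
    const unsigned char *s = pure_pat_unmeta(pp, &len);

    return s && len == triallen && memcmp(s, trial, len) == 0;
}
```

The conversion cost is paid once per compiled pattern rather than once
per trial string, which is where the saving over option 2 would come
from if the same pattern is tried against many strings.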

pws

