Re: compmatch behaviour - Peter Stephenson

zsh-workers
 help / color / mirror / code / Atom feed

From: Peter Stephenson <pws@csr.com>
To: zsh-workers@sunsite.dk (Zsh hackers list)
Subject: Re: compmatch behaviour
Date: Mon, 19 May 2008 11:34:21 +0100	[thread overview]
Message-ID: <20080519113421.2ded4a42@news01> (raw)
In-Reply-To: <080518165753.ZM4385@torch.brasslantern.com>

On Sun, 18 May 2008 16:57:53 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> So if I comprehend your question, it's not that you need help figuring
> out what the code is doing.  You want help figuring out what it should
> do instead, because the space of all possible wide characters is much
> to big to brute-force it the way Sven originally did.
> 
> Right?

Exactly.  Thanks for looking.

> There are two situations being handled
> simultaneously here, and maybe the first thing to do is to separate
> them.  The first situation is where wpat is a correspondence class
> and we need to select the corresponding position out of lpat.  The
> second case is where lpat is an equivalence class and we need to try
> every possible character in the class at line position *lp.

Hmm... terminology first... Sven's "correspondence class" appears to be the
one with the "equiv" flag set, i.e. {...}.  So here the characters are
numbered and we are searching for a particular one.  So currently (no
generic character sets) we can do this without looping (conceptually---the
implementation needs a loop to find the character but only needs to test
that one character).

However, in my rewrite I want to be able to say "any upper case character"
so that it can match the corresponding lower case character.  I can
count through the characters or [:stuff:] elements taking account of
ranges specially... but then I may just have a pattern such as [:upper:],
so I still need to match against the line.  So this becomes like the other
case.

(In case that's not clear: in the existing implementation we have "{a-z}"
where a is indexed 1, b is 2, etc. etc.  Now I have "{[:upper:]}".  Here
the only index is 1, but it comes along with a pattern matching instruction
"match any upper case character", and on the other side of the correpondence
"{[:lower:]}" is interpreted specially when we call pattern_match() as "the
corresponding lower case character".)

> I think the first think that's needed is to change the Cpattern struct
> from a dense array indexed by the ascii value of a character, into ...
> well, something else, so that it's not necessary to iterate over it
> in the first case, and so the iteration is more sparse in the second
> case.  Of course, that may make pattern_match() more complicated ....

I've already done this, that's the easy bit.  The two flies in the ointment
are this loop and wide character modifications.

> I'm not sure there's any way to avoid iterating in the second case.
> Besides handling two possible ways of interpreting the character class,
> I think (without tracing very far up the call chain) that this is also
> constructing the prefix string that's going to be shown to the user in
> the event of a common prefix shared by ambiguous matches.  It's not
> enough to check whether the test character is in the class, you might
> have to know *which* character it is in the class, so to speak.

(so you can see how this reflects on my "{[:upper:]}" problem for
correspondence classes.)

Even if iteration is necessary, which is plausible, I can't help thinking
there should be a way of iterating over positions in the word/line and
trying to match at each point with the pattern, looking to see which
character matched, rather than blindly trying characters at each position,
which won't work for every upper case character in Unicode.  If we can work
out how to change it along those lines then I think we've solved it.

(This is certainly a different problem from quadratic behaviour; it's more
along the lines of the way the ${foo//.../...} code works, or at least I
hope it is because that problem is soluble.)

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070

next prev parent reply	other threads:[~2008-05-19 10:34 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-18 19:01 Peter Stephenson
2008-05-18 23:57 ` Bart Schaefer
2008-05-19 10:34   ` Peter Stephenson [this message]
2008-05-19 15:11     ` Bart Schaefer
2008-05-21 17:28 ` Andrey Borzenkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080519113421.2ded4a42@news01 \
    --to=pws@csr.com \
    --cc=zsh-workers@sunsite.dk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).