zsh-workers
From: Peter Stephenson <p.stephenson@samsung.com>
To: Zsh hackers list <zsh-workers@zsh.org>
Subject: Re: invalid characters and multi-byte [x-y] ranges
Date: Thu, 03 Sep 2015 10:00:37 +0100
Message-ID: <20150903100037.6e6ac852@pwslap01u.europe.root.pri>
In-Reply-To: <20150902230711.GA4967@chaz.gmail.com>

On Thu, 3 Sep 2015 00:07:11 +0100
Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
> is this (in a UTF-8 locale):
> 
> $ zsh -c $'[[ \xcc = [\uaa-\udd] ]]' && echo yes
> yes
> 
> expected or desirable?

This comes from the function charref() in pattern.c.  We discover the
sequence is incomplete / invalid and don't know what to do with it, so we
simply treat the single byte as a character:

	return (wchar_t) STOUC(*x);

(the macro ensures we get an unsigned value to cast).  Typically this
will do what you see (though wchar_t isn't guaranteed to have that
property).
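
To make the effect concrete, here is a minimal sketch of why the byte ends up inside the range. It is not the code from pattern.c: TO_UCHAR is a stand-in for STOUC(), and fallback_char() and in_range() are hypothetical helpers that only model the fallback and a naive [lo-hi] comparison on character values.

```c
#include <wchar.h>

/* Stand-in for zsh's STOUC(): force an unsigned byte value before widening. */
#define TO_UCHAR(c) ((unsigned char)(c))

/* Hypothetical helper: what the fallback does when the bytes at x
 * do not decode as a character, i.e. widen the single byte. */
wchar_t fallback_char(const char *x)
{
    return (wchar_t) TO_UCHAR(*x);
}

/* Hypothetical helper: naive range test on character values,
 * modelling a pattern such as [lo-hi]. */
int in_range(wchar_t c, wchar_t lo, wchar_t hi)
{
    return lo <= c && c <= hi;
}
```

With these, fallback_char("\xcc") yields 0xCC, and in_range(0xCC, 0xAA, 0xDD) is true, which is the "yes" in the report: the stray byte is numerically indistinguishable from U+00CC.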

I'm not sure what else to do here.  The function is used all over the
pattern code, so anything other than tweaking the code locally to return
another character (what?) would be horrific to get consistent.  We don't
want [[ $'\xcc' = $'\xdd' ]] to succeed, but ideally we do want
[[ $'\xcc' = $'\xcc' ]] to succeed by comparing raw bytes (we're not
morally forced to do that in a UTF-8 locale, I don't think, but it
wouldn't be very helpful if it didn't work).

If wchar_t is 32 bits (the only place where it wasn't used to be Cygwin,
but I think that's changed) we could cheat by adding (wchar_t)0x7FFFFF00
to it... that would fix your problem and (I hope) keep the two cases above
working, and minimise the likelihood of generating a valid character...
that's about the least horrific thing I can come up with.
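
A rough sketch of the offset idea, assuming a wchar_t of at least 32 bits. The names INVALID_BYTE_BASE, invalid_byte_to_wchar() and range_matches() are made up for illustration and are not taken from the zsh source; the only value from the message is 0x7FFFFF00.

```c
#include <wchar.h>

/* Offset suggested in the message; the macro name is hypothetical. */
#define INVALID_BYTE_BASE 0x7FFFFF00

/* The trick only works if wchar_t can hold the shifted value. */
_Static_assert(sizeof(wchar_t) >= 4, "sketch assumes a 32-bit wchar_t");

/* Hypothetical helper: map an undecodable byte to a value far outside
 * the range of valid characters.  The result lies between 0x7FFFFF00
 * and 0x7FFFFFFF, so it fits in a signed 32-bit wchar_t. */
wchar_t invalid_byte_to_wchar(const char *x)
{
    return (wchar_t) (INVALID_BYTE_BASE + (unsigned char) *x);
}

/* Hypothetical helper: naive range test on character values. */
int range_matches(wchar_t c, wchar_t lo, wchar_t hi)
{
    return lo <= c && c <= hi;
}
```

Under that assumption the byte 0xCC maps to 0x7FFFFFCC, which is outside [0xAA, 0xDD]; two 0xCC bytes still map to the same value and so compare equal, while 0xCC and 0xDD map to different values.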

pws


