zsh-workers
From: Stephane Chazelas <stephane.chazelas@gmail.com>
To: Sebastian Gniazdowski <sgniazdowski@gmail.com>
Cc: Zsh hackers list <zsh-workers@zsh.org>
Subject: Re: [PATCH] [[:blank:]] only matches on SPC and TAB
Date: Mon, 14 May 2018 07:36:11 +0100
Message-ID: <20180514063611.GA7263@chaz.gmail.com>
In-Reply-To: <CAKc7PVDyrTMsmBSEDcMC=CNVCjOnEDVtywRYA0=UnNCBpF=7JQ@mail.gmail.com>

2018-05-14 04:27:46 +0200, Sebastian Gniazdowski:
> On 13 May 2018 at 23:25, Stephane Chazelas <stephane.chazelas@gmail.com>
> wrote:
> 
> > I noticed that [[:blank:]] was not matching on non-ASCII blank
> > characters. In a typical UTF-8 GNU locale, [[:blank:]] normally
> > includes
> >
> 
> Let's be conservative. [[:blank:]] matches 2 characters, [[:space:]]
> matches Unicode ones that you want to add.
[...]

That's not true.

[[:blank:]] is horizontal spacing characters (like \h in perl),
[[:space:]] is all spacing characters (like \s in perl),
including vertical ones like \v, \f, \n...
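
To illustrate the distinction, here is a minimal sketch, assuming a
POSIX sh-style shell whose case patterns support character classes.
The function name classify is made up for this example. It uses only
ASCII characters so the result does not depend on the locale:

```shell
# classify CHAR: report which POSIX class a single character falls in.
# [[:blank:]] is tested first, so "space" here means "spacing but not
# horizontal", i.e. a vertical spacing character.
classify() {
  case $1 in
    [[:blank:]]) echo blank ;;
    [[:space:]]) echo space ;;
    *)           echo other ;;
  esac
}

tab=$(printf '\t')
# A literal newline; $(printf '\n') would be stripped by command
# substitution.
nl='
'

classify ' '      # blank
classify "$tab"   # blank
classify "$nl"    # space
classify x        # other
```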

On some systems (those that follow ISO/IEC 30112, such as GNU),
these classes exclude the characters that should not be treated as
delimiters (like U+00A0, the non-breaking space).

[[:blank:]], [[:space:]]... are POSIX character classes,
supported by most utilities that do wildcard or regexp matching.

I know of no other utility than zsh whose [[:space:]] includes
all the characters classified as "space" in the locale and where
[[:blank:]] doesn't include all the "blank" ones.

That struck me as very odd when I found it out yesterday, and it
is inconsistent with all other shells. But because it meant that
extra code had been added for it, I wondered whether it was
intentional.

It seems to me that if you wanted to match on only SPC and TAB
and not the other horizontal spacing characters classified as
such in the locale, you should use [ $'\t']. See also [[:IFS:]]
and [[:IFSSPACE:]] though they depend on the value of $IFS and
include \n by default (and \0 for [[:IFS:]]).
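
A minimal sketch of the explicit SPC-and-TAB match, assuming a POSIX
sh-style shell. The helper name is_spc_or_tab is made up for this
example, and the tab is built with printf so the snippet does not rely
on $'\t' support; in zsh the same set can be spelled with $'\t'
directly, as above:

```shell
tab=$(printf '\t')
nbsp=$(printf '\302\240')   # UTF-8 encoding of U+00A0 (non-breaking space)

# Succeeds only if the argument is exactly one SPC or one TAB,
# regardless of what the locale classifies as "blank".
is_spc_or_tab() {
  case $1 in
    [" $tab"]) return 0 ;;
    *)         return 1 ;;
  esac
}

for c in ' ' "$tab" "$nbsp" x; do
  if is_spc_or_tab "$c"; then echo yes; else echo no; fi
done
# prints: yes, yes, no, no (one per line)
```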

Now it's true that most people only care about SPC and TAB, and
since there's so much variation between systems as to what is
classified as "blank" (same for "alpha"... for that matter), it
probably doesn't matter that much. U+00A0 is probably the only
other horizontal spacing character that people are likely to
find in text that zsh is going to match [[:blank:]] against and
every other system doesn't consider it as "blank" (or "space"
for that matter).

-- 
Stephane

