zsh-workers
* [PATCH] [[:blank:]] only matches on SPC and TAB
@ 2018-05-13 21:25 Stephane Chazelas
  2018-05-13 21:49 ` [PATCH v2] " Stephane Chazelas
  2018-05-14  2:27 ` [PATCH] " Sebastian Gniazdowski
  0 siblings, 2 replies; 26+ messages in thread
From: Stephane Chazelas @ 2018-05-13 21:25 UTC (permalink / raw)
  To: Zsh hackers list

I noticed that [[:blank:]] was not matching on non-ASCII blank
characters. In a typical UTF-8 GNU locale, [[:blank:]] normally
includes

 U+0009 CHARACTER TABULATION
 U+0020 SPACE
 U+1680 OGHAM SPACE MARK
 U+2000 EN QUAD
 U+2001 EM QUAD
 U+2002 EN SPACE
 U+2003 EM SPACE
 U+2004 THREE-PER-EM SPACE
 U+2005 FOUR-PER-EM SPACE
 U+2006 SIX-PER-EM SPACE
 U+2008 PUNCTUATION SPACE
 U+2009 THIN SPACE
 U+200A HAIR SPACE
 U+205F MEDIUM MATHEMATICAL SPACE
 U+3000 IDEOGRAPHIC SPACE

On FreeBSD:

 U+0009 CHARACTER TABULATION
 U+0020 SPACE
 U+00A0 NO-BREAK SPACE
 U+FEFF ZERO WIDTH NO-BREAK SPACE

(Strangely enough U+00A0 is not classified as blank in single
byte charsets like ISO8859-1 there)

The code indeed matches on SPC and TAB explicitly both in the
multibyte and singlebyte cases (the non-breaking space is one
non-ASCII character that appears in a few singlebyte charsets
and is considered as blank on some systems (not GNU ones)).

In case that was not intentional, this patch should fix it:

diff --git a/Src/pattern.c b/Src/pattern.c
index fc7c737..d3eac44 100644
--- a/Src/pattern.c
+++ b/Src/pattern.c
@@ -3605,7 +3605,7 @@ mb_patmatchrange(char *range, wchar_t ch, int zmb_ind, wint_t *indptr, int *mtp)
 		    return 1;
 		break;
 	    case PP_BLANK:
-		if (ch == L' ' || ch == L'\t')
+		if (iswblank(ch))
 		    return 1;
 		break;
 	    case PP_CNTRL:
@@ -3840,7 +3840,7 @@ patmatchrange(char *range, int ch, int *indptr, int *mtp)
 		    return 1;
 		break;
 	    case PP_BLANK:
-		if (ch == ' ' || ch == '\t')
+		if (isblank(ch))
 		    return 1;
 		break;
 	    case PP_CNTRL:
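
For illustration only (not part of the patch), the difference between the
old and the new PP_BLANK test can be sketched in standalone C. The function
names blank_ascii_only and blank_locale are hypothetical; only iswblank()
is taken from the patch.

```c
#include <wchar.h>
#include <wctype.h>

/* Old behaviour: only SPC and TAB are treated as blank. */
int blank_ascii_only(wchar_t ch)
{
    return ch == L' ' || ch == L'\t';
}

/* Patched behaviour: defer to the "blank" class of the current locale. */
int blank_locale(wchar_t ch)
{
    return iswblank((wint_t)ch) != 0;
}
```

Both functions agree on SPC and TAB in every locale. They can only differ
for other characters, and only after the program has selected a locale
with setlocale(LC_ALL, ""), which zsh does at startup.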

-- 
Stephane



* [PATCH v2] [[:blank:]] only matches on SPC and TAB
  2018-05-13 21:25 [PATCH] [[:blank:]] only matches on SPC and TAB Stephane Chazelas
@ 2018-05-13 21:49 ` Stephane Chazelas
  2018-05-14  2:27 ` [PATCH] " Sebastian Gniazdowski
  1 sibling, 0 replies; 26+ messages in thread
From: Stephane Chazelas @ 2018-05-13 21:49 UTC (permalink / raw)
  To: Zsh hackers list

2018-05-13 22:25:53 +0100, Stephane Chazelas:
[...]
> In case that was not intentional, this patch should fix it:
[...]

It was missing the autoconf check:

diff --git a/Src/pattern.c b/Src/pattern.c
index fc7c737..d3eac44 100644
--- a/Src/pattern.c
+++ b/Src/pattern.c
@@ -3605,7 +3605,7 @@ mb_patmatchrange(char *range, wchar_t ch, int zmb_ind, wint_t *indptr, int *mtp)
 		    return 1;
 		break;
 	    case PP_BLANK:
-		if (ch == L' ' || ch == L'\t')
+		if (iswblank(ch))
 		    return 1;
 		break;
 	    case PP_CNTRL:
@@ -3840,7 +3840,7 @@ patmatchrange(char *range, int ch, int *indptr, int *mtp)
 		    return 1;
 		break;
 	    case PP_BLANK:
-		if (ch == ' ' || ch == '\t')
+		if (isblank(ch))
 		    return 1;
 		break;
 	    case PP_CNTRL:
diff --git a/configure.ac b/configure.ac
index d15a6cd..4f1eab8 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2563,10 +2563,10 @@ AC_HELP_STRING([--enable-multibyte], [support multibyte characters]),
 [AC_CACHE_VAL(zsh_cv_c_unicode_support,
   AC_MSG_NOTICE([checking for functions supporting multibyte characters])
   [zfuncs_absent=
-   for zfunc in iswalnum iswcntrl iswdigit iswgraph iswlower iswprint \
-iswpunct iswspace iswupper iswxdigit mbrlen mbrtowc towupper towlower \
-wcschr wcscpy wcslen wcsncmp wcsncpy wcrtomb wcwidth wmemchr wmemcmp \
-wmemcpy wmemmove wmemset; do
+   for zfunc in iswalnum iswblank iswcntrl iswdigit iswgraph iswlower \
+     iswprint iswpunct iswspace iswupper iswxdigit mbrlen mbrtowc \
+     towupper towlower wcschr wcscpy wcslen wcsncmp wcsncpy wcrtomb \
+     wcwidth wmemchr wmemcmp wmemcpy wmemmove wmemset; do
      AC_CHECK_FUNC($zfunc,
      [:], [zfuncs_absent="$zfuncs_absent $zfunc"])
     done



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-13 21:25 [PATCH] [[:blank:]] only matches on SPC and TAB Stephane Chazelas
  2018-05-13 21:49 ` [PATCH v2] " Stephane Chazelas
@ 2018-05-14  2:27 ` Sebastian Gniazdowski
  2018-05-14  4:41   ` Sebastian Gniazdowski
  2018-05-14  6:36   ` Stephane Chazelas
  1 sibling, 2 replies; 26+ messages in thread
From: Sebastian Gniazdowski @ 2018-05-14  2:27 UTC (permalink / raw)
  To: Zsh hackers list


On 13 May 2018 at 23:25, Stephane Chazelas <stephane.chazelas@gmail.com>
wrote:

> I noticed that [[:blank:]] was not matching on non-ASCII blank
> characters. In a typical UTF-8 GNU locale, [[:blank:]] normally
> includes
>

Let's be conservative. [[:blank:]] matches 2 characters, [[:space:]]
matches Unicode ones that you want to add. We have a choice. There is
existing code that was written with ASCII [[:blank:]] in mind, and
something might break. I'm currently coding a new plugin and literally
have chosen [[:blank:]] because it's not unicode spaces. Such
platform-like things shouldn't change this way.

-- 
Best regards,
Sebastian Gniazdowski


* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-14  2:27 ` [PATCH] " Sebastian Gniazdowski
@ 2018-05-14  4:41   ` Sebastian Gniazdowski
  2018-05-14  6:36   ` Stephane Chazelas
  1 sibling, 0 replies; 26+ messages in thread
From: Sebastian Gniazdowski @ 2018-05-14  4:41 UTC (permalink / raw)
  To: Zsh hackers list

On 14 May 2018 at 04:27, Sebastian Gniazdowski <sgniazdowski@gmail.com> wrote:
> I'm currently coding a new plugin and literally have chosen [[:blank:]] because it's not unicode spaces.


I thought I'd prove this. Here is the main code from my ini-file parser:

while read -r -t 1 __line; do
    if [[ "$__line" = (#b)[[:blank:]]#\[([^\]]##)\][[:blank:]]# ]]; then
        __cur_section="${match[1]}"
    elif [[ "$__line" = (#b)[[:blank:]]#([^[:blank:]=]##)[[:blank:]]#[=][[:blank:]]#(*) ]]; then
        match[2]="${match[2]%"${match[2]##*[! $'\t']}"}"
        __access_string="${__out_hash}[${__key_prefix}<$__cur_section>_${match[1]}]"
        : "${(P)__access_string::=${match[2]}}"
    fi
done < "$__ini_file"

[[:blank:]] is like a platform. I've really become paranoid that my
platform will change, so I'm even proving this. If I wanted users to
be able to use Unicode spaces in the ini-file, I would use [[:space:]].
Let's not discard this degree of freedom.

Whole code:
https://github.com/zdharma/the-z-invoker/blob/master/-zflai_read_ini_file
-- 
Best regards,
Sebastian Gniazdowski



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-14  2:27 ` [PATCH] " Sebastian Gniazdowski
  2018-05-14  4:41   ` Sebastian Gniazdowski
@ 2018-05-14  6:36   ` Stephane Chazelas
  2018-05-14  6:44     ` Stephane Chazelas
  2018-05-14  8:11     ` Sebastian Gniazdowski
  1 sibling, 2 replies; 26+ messages in thread
From: Stephane Chazelas @ 2018-05-14  6:36 UTC (permalink / raw)
  To: Sebastian Gniazdowski; +Cc: Zsh hackers list

2018-05-14 04:27:46 +0200, Sebastian Gniazdowski:
> On 13 May 2018 at 23:25, Stephane Chazelas <stephane.chazelas@gmail.com>
> wrote:
> 
> > I noticed that [[:blank:]] was not matching on non-ASCII blank
> > characters. In a typical UTF-8 GNU locale, [[:blank:]] normally
> > includes
> >
> 
> Let's be conservative. [[:blank:]] matches 2 characters, [[:space:]]
> matches Unicode ones that you want to add.
[...]

That's not true.

[[:blank:]] is horizontal spacing characters (like \h in perl),
[[:space:]] is all spacing characters (like \s in perl),
including vertical ones like \v, \f, \n...

On some systems (like the ones that follow ISO/IEC 30112 such as
GNU), that's excluding the ones that should not be considered as
delimiters (like U+00A0 the non-breaking space).

[[:blank:]], [[:space:]]... are POSIX character classes,
supported by most utilities that do wildcard or regexp matching.

I know of no other utility than zsh whose [[:space:]] includes
all the characters classified as "space" in the locale and where
[[:blank:]] doesn't include all the "blank" ones.

That struck me as very odd when I found that out yesterday and
is inconsistent with all other shells. But because that meant
extra code was added for that, I wondered if maybe that was
intentional.

It seems to me that if you wanted to match on only SPC and TAB
and not the other horizontal spacing characters classified as
such in the locale, you should use [ $'\t']. See also [[:IFS:]]
and [[:IFSSPACE:]] though they depend on the value of $IFS and
include \n by default (and \0 for [[:IFS:]]).

Now it's true that most people only care about SPC and TAB, and
since there's so much variation between systems as to what is
classified as "blank" (same for "alpha"... for that matter), it
probably doesn't matter that much. U+00A0 is probably the only
other horizontal spacing character that people are likely to
find in text that zsh is going to match [[:blank:]] against and
every other system doesn't consider it as "blank" (or "space"
for that matter).

-- 
Stephane



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-14  6:36   ` Stephane Chazelas
@ 2018-05-14  6:44     ` Stephane Chazelas
  2018-05-14  8:47       ` Peter Stephenson
  2018-05-14  8:11     ` Sebastian Gniazdowski
  1 sibling, 1 reply; 26+ messages in thread
From: Stephane Chazelas @ 2018-05-14  6:44 UTC (permalink / raw)
  To: Sebastian Gniazdowski, Zsh hackers list

2018-05-14 07:36:11 +0100, Stephane Chazelas:
[...]
> That struck me as very odd when I found that out yesterday and
> is inconsistent with all other shells. But because that meant
> extra code was added for that, I wondered if maybe that was
> intentional.
[...]

Looking at the Changelog, I see:

Tue Oct 13 21:42:47 1998  Andrew Main  <zefram@zsh.org>

        * Doc/Zsh/expn.yo, Src/glob.c: Add the [:blank:] character class
          required by POSIX, which has no corresponding ctype macro.

        * Doc/Zsh/expn.yo, Misc/globtests, Src/glob.c, Src/lex.c:
          Add POSIX globbing character classes ([:alnum:] etc.).
          (pws, 4209+4212)


Which explains why it's not using isblank() and strongly
suggests that it was not intentional.

Looking at POSIX:
http://pubs.opengroup.org/onlinepubs/9699919799.2018edition/functions/isblank.html

> First released in Issue 6. Derived from the ISO/IEC 9899:1999 standard.

So it's /relatively/ recent (late 90s). Do we also need an
autoconf check for isblank() or can we assume that all systems
zsh is supported on have it?
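
If such a check turned out to be needed, a fallback could look roughly like
the sketch below. HAVE_ISBLANK is the macro name that AC_CHECK_FUNCS would
conventionally define; whether zsh's configure would define it is an
assumption here, and blank_compat is a hypothetical name.

```c
#include <ctype.h>

/*
 * Hypothetical wrapper: use isblank() where configure found it,
 * otherwise fall back to the historical SPC/TAB test.
 */
int blank_compat(int ch)
{
#ifdef HAVE_ISBLANK
    return isblank(ch) != 0;
#else
    return ch == ' ' || ch == '\t';
#endif
}
```

Without HAVE_ISBLANK defined this compiles to the pre-patch behaviour, so
systems lacking isblank() would simply keep matching SPC and TAB only.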

-- 
Stephane



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-14  6:36   ` Stephane Chazelas
  2018-05-14  6:44     ` Stephane Chazelas
@ 2018-05-14  8:11     ` Sebastian Gniazdowski
  1 sibling, 0 replies; 26+ messages in thread
From: Sebastian Gniazdowski @ 2018-05-14  8:11 UTC (permalink / raw)
  To: Sebastian Gniazdowski, Zsh hackers list

On 14 May 2018 at 08:36, Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
> [[:blank:]], [[:space:]]... are POSIX character classes,
> supported by most utilities that do wildcard or regexp matching.
>
> I know of no other utility than zsh whose [[:space:]] includes
> all the characters classified as "space" in the locale and where
> [[:blank:]] doesn't include all the "blank" ones.

Do you think some middle way is possible? I mean, opening :blank: up to
that many new characters and watching the ML for user reports (who knows,
maybe there wouldn't be many or any, but yeah, "who knows") is like
compiling a 32-bit product with a 64-bit compiler and continuing to sell
it without a break. Well, to be honest, I was hired at one job where this
happened and it worked. By middle way I mean: look at the possible
characters, identify the ones that are crucial for making Zshell's
:blank: less odd, and include just a few in :blank:, after some thought
about whether any of the chosen characters has the potential to
break something. It's a difficult situation because from one point of
view, nothing should break: exotic spaces don't occur often and even
if they did, they shouldn't break anything, and the code should behave
more robustly. But from the other point of view, any character added to
:blank: has some twin piece of code somewhere that it will break.

> It seems to me that if you wanted to match on only SPC and TAB
> and not the other horizontal spacing characters classified as
> such in the locale, you should use [ $'\t']. See also [[:IFS:]]
> and [[:IFSSPACE:]] though they depend on the value of $IFS and
> include \n by default (and \0 for [[:IFS:]]).

I've grepped some projects to check if they used :blank::

- https://github.com/zsh-users/zsh-syntax-highlighting/blob/5b539663c0d740a0c00169d5ecbd58e47ff16252/highlighters/main/main-highlighter.zsh#L959

- https://github.com/zsh-users/zaw/blob/91c5e1a179ba543458e341a4d8e95c75f762f5c6/sources/ssh-hosts.zsh#L13

- Zshells distributed functions: _complete, _expand_alias,
_description, _main_complete, _bsd_pkg, compinstall, zcalc, etc. quite
many.

- my 3 past projects, most notably Zshelldoc, which parses scripts to
extract functions.

These uses aren't drastic, and I think all those projects would work
after replacing :blank: with :space:. But there's no way to be sure.
Who knows maybe some DevOp wanted to uplift some system at work and
proposed use of Zsh, and has :blank: in his scripts.

-- 
Best regards,
Sebastian Gniazdowski



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-14  6:44     ` Stephane Chazelas
@ 2018-05-14  8:47       ` Peter Stephenson
  2018-05-14 12:34         ` Stephane Chazelas
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Stephenson @ 2018-05-14  8:47 UTC (permalink / raw)
  To: Zsh hackers list

On Mon, 14 May 2018 07:44:31 +0100
Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
> Tue Oct 13 21:42:47 1998  Andrew Main  <zefram@zsh.org>
> 
>         * Doc/Zsh/expn.yo, Src/glob.c: Add the [:blank:] character
> class required by POSIX, which has no corresponding ctype macro.
> 
> Which explains why it's not using isblank() and strongly
> suggests that it was not intentional.

I think that's correct, but I tend to agree with Sebastian that some
caution is required here since it's not necessarily clear what action
with non-ASCII spaces is actually wanted when this is used.  I'd be
surprised if it actually broke anything, though.

pws



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-14  8:47       ` Peter Stephenson
@ 2018-05-14 12:34         ` Stephane Chazelas
  2018-05-14 13:50           ` Peter Stephenson
  2018-05-17  9:03           ` Sebastian Gniazdowski
  0 siblings, 2 replies; 26+ messages in thread
From: Stephane Chazelas @ 2018-05-14 12:34 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: Zsh hackers list

2018-05-14 09:47:33 +0100, Peter Stephenson:
> On Mon, 14 May 2018 07:44:31 +0100
> Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
> > Tue Oct 13 21:42:47 1998  Andrew Main  <zefram@zsh.org>
> > 
> >         * Doc/Zsh/expn.yo, Src/glob.c: Add the [:blank:] character
> > class required by POSIX, which has no corresponding ctype macro.
> > 
> > Which explains why it's not using isblank() and strongly
> > suggests that it was not intentional.
> 
> I think that's correct, but I tend to agree with Sebastian that some
> caution is required here since it's not necessarily clear what action
> with non-ASCII spaces is actually wanted when this is used.  I'd be
> surprised if it actually broke anything, though.
[...]

I was going to say that surely, when someone uses [:blank:] that
means they want to trust the locale on the definition of
"blank", and I can't see why that should be different from other
character classes, but I just noticed that the documentation
actually says:

     [:blank:]
               The character is either space or tab

Instead of "horizontal whitespace". And on GNU systems,
"isblank(3)" also says it's SPC and TAB:

     Returns true if C is a blank character; that is, a space or a tab.
     This function was originally a GNU extension, but was added in
     ISO C99.

While iswblank(3) is careful to refer to locale classification.

In practice, the only system where I could find a locale with a
single-byte charset with "blank" characters other than SPC and
TAB was NetBSD. And there, isblank(0xa0) under setlocale() in a
locale that uses ISO8859-1 for instance does return true (as
POSIX requires if that's how 0xa0 is classified in the locale).
However in the same locale, its sh (which is not multibyte
aware) outputs no in:

case $nbsb in
  [[:blank:][:space:]]) echo yes;;
  *) echo no
esac

(bash outputs yes for both blank and space as POSIX requires).
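
The isblank(0xa0) check described above can be reproduced with a small C
helper along these lines. is_blank_in_locale is a hypothetical name, and
locale names other than "C" are system-specific, so the helper reports -1
when the requested locale is not installed.

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/*
 * Returns 1 or 0 for isblank(byte) under the named LC_CTYPE locale,
 * or -1 if that locale is not available. The previous LC_CTYPE
 * setting is restored before returning.
 */
int is_blank_in_locale(const char *name, int byte)
{
    char saved[256];
    const char *cur = setlocale(LC_CTYPE, NULL);
    int result;

    /* Copy the name: the returned string may be overwritten later. */
    snprintf(saved, sizeof saved, "%s", cur ? cur : "C");

    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;
    result = isblank((unsigned char)byte) != 0;
    setlocale(LC_CTYPE, saved);
    return result;
}
```

In the "C" locale the result for 0xa0 is 0, since the C standard restricts
isblank() there to space and tab. Passing the name of an ISO8859-1 locale
on NetBSD is where the message above reports a true result.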

I don't think many people complained when multi-byte support was
added and English people were starting to have their [[:alpha:]]
match on Greek or Korean letters in addition to English ones
(fair enough as "alpha" means the first letter of the Greek
alphabet).

The main problem if we want to align with other shells and make
the shell POSIX compliant is that the documentation currently
states explicitly that it matches on space and tab only.

The question is would any script be broken if we changed it?

People still keep using [a-z] when they mean to match English
lower case letters while in effect nowadays, except in zsh and a
very few other utilities that match ranges based on code points,
that matches on hundreds more (like à, œ, ć, if not ch, fi...), I
wouldn't be surprised if people use [[:alnum:]] thinking it only
matches on Latin letters without diacritics and Arabic decimal
digits.

But then again, that still works more or less for them, as they
use it anyway against text that only contains English data.

To me the correct way to do a strict match against ASCII blanks
(or English letters, or ASCII punctuations) would be to use the
C locale.

-- 
Stephane



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-14 12:34         ` Stephane Chazelas
@ 2018-05-14 13:50           ` Peter Stephenson
  2018-05-14 15:51             ` Stephane Chazelas
  2018-05-17  9:03           ` Sebastian Gniazdowski
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Stephenson @ 2018-05-14 13:50 UTC (permalink / raw)
  To: Zsh hackers list

On Mon, 14 May 2018 13:34:25 +0100
Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
> I don't think many people complained when multi-byte support was
> added and English people were starting to have their [[:alpha:]]
> match on Greek or Korean letters in addition to English ones
> (fair enough as "alpha" means the first letter of the Greek
> alphabet).

It's certainly true that a whole heap of things like this switched to an
extended meaning years ago when multibyte started being enabled.

It wouldn't be ridiculous to change the documentation for this case and
require "unsetopt multibyte" for strict byte-by-byte comparisons, which
is already how it works in the vast majority of other cases.

pws



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-14 13:50           ` Peter Stephenson
@ 2018-05-14 15:51             ` Stephane Chazelas
  2018-05-14 16:31               ` Sebastian Gniazdowski
  2018-05-15 19:06               ` Oliver Kiddle
  0 siblings, 2 replies; 26+ messages in thread
From: Stephane Chazelas @ 2018-05-14 15:51 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: Zsh hackers list

2018-05-14 14:50:56 +0100, Peter Stephenson:
[...]
> It wouldn't be ridiculous to change the documentation for this case and
> require "unsetopt multibyte" for strict byte-by-byte comparisions, which
> is already how it works in the vast majority of other cases.
[...]

But note that here it's not about multibyte vs singlebyte but
whether [:blank:] honours the locale like the other POSIX
character classes (alpha, punct...) do.

There are locales on some systems (like NetBSD already
mentioned) that use a single-byte charset where more than SPC
and TAB are classified as "blank" (like 0xA0 (nbsp) in locales
using iso8859-x charsets or 0x9A in KOI8-R on NetBSD).

IMO, without the "multibyte" option, we should still call
isblank() which on most systems and most locales will match only
on SPC and TAB but is not guaranteed to (and does not in
practice like on NetBSD).

I just noticed that on NetBSD, in locales using UTF-8 or
GB18030, isblank() returns true on \v (vertical TAB), not in any
other locale! So does iswblank(). So out goes my claim that
"blank" should be for horizontal spaces. On OpenBSD (where only
UTF-8 charsets are supported in locales other than C/POSIX),
iswblank() matches on \v and \f. 

What a mess!

-- 
Stephane



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-14 15:51             ` Stephane Chazelas
@ 2018-05-14 16:31               ` Sebastian Gniazdowski
  2018-05-14 16:50                 ` Bart Schaefer
  2018-05-15 19:06               ` Oliver Kiddle
  1 sibling, 1 reply; 26+ messages in thread
From: Sebastian Gniazdowski @ 2018-05-14 16:31 UTC (permalink / raw)
  To: Zsh hackers list

On 14 May 2018 at 17:51, Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
> I just noticed that on NetBSD, in locales using UTF-8 or
> GB18030, isblank() returns true on \v (vertical TAB), not in any
> other locale! So does iswblank(). So out goes my claim that
> "blank" should be for horizontal spaces. On OpenBSD (where only
> UTF-8 charsets are supported in locales other than C/POSIX),
> iswblank() matches on \v and \f.
>
> What a mess!

Maybe seeing Zsh as a platform is a way. I suspect that the person
who decided to match non-horizontal space on NetBSD or OpenBSD via
:blank: was himself thinking in platform terms. So basically "in Zsh
world [[:blank:]] is ...". I feel that BSD coders like such
power-of-creation moments, "in NetBSD world it will be that way...".
Too bad that character classes shine in control code ("if :space:, then,
else"), so such play by a creator is actually very influential and
long-term. So one option is to leave [[:blank:]] as it is; I would be happy
to use it in code and comfortably include tabs in various initially
0x20-space comparisons (BTW., [ $'\t'] doesn't work in [[ ... ]], the
space needs to be backslashed). Another way is to make it very
normal, without bumps. Yet another is to join the systems that prevail in
numbers or market share.

-- 
Best regards,
Sebastian Gniazdowski



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-14 16:31               ` Sebastian Gniazdowski
@ 2018-05-14 16:50                 ` Bart Schaefer
  2018-05-14 19:52                   ` Daniel Tameling
  0 siblings, 1 reply; 26+ messages in thread
From: Bart Schaefer @ 2018-05-14 16:50 UTC (permalink / raw)
  To: Sebastian Gniazdowski; +Cc: Zsh hackers list

On Mon, May 14, 2018 at 9:31 AM, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
>
> Maybe seeing Zsh as a platform is a way.

That only works for native zsh mode.  In other emulations, zsh ought
to behave either the way other shells on the same OS do, or as the
POSIX spec requires, to the extent that is both possible and sane.  In
this case we have to decide whether it is reasonable to make a
distinction.

> (BTW., [ $'\t'] doesn't work in [[ ... ]], the
> space needs to be backslashed).

Just put the space inside the quotes:  [$' \t']



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-14 16:50                 ` Bart Schaefer
@ 2018-05-14 19:52                   ` Daniel Tameling
  2018-05-14 20:42                     ` Stephane Chazelas
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Tameling @ 2018-05-14 19:52 UTC (permalink / raw)
  To: zsh-workers

Stephane already quoted some man pages, but here is what the C99/C11
standards say:

"The isblank function tests for any character that is a standard blank
character or is one of a locale-specific set of characters for which
isspace is true and that is used to separate words within a line of
text. The standard blank characters are the following: space (' '),
and horizontal tab ('\t'). In the "C" locale, isblank returns true
only for the standard blank characters."

And Posix seems to say the same: it defines blank for the C locale
and states that in other locales it should at least encompass space
and tab.
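
The two guarantees quoted above (space and tab are always blank, and every
blank is also a space character) can be checked for the single-byte range
with a short C sketch. The function names are hypothetical.

```c
#include <ctype.h>
#include <limits.h>

/* Number of byte values the current locale classifies as blank. */
int count_blank(void)
{
    int c, n = 0;

    for (c = 0; c <= UCHAR_MAX; c++)
        if (isblank(c))
            n++;
    return n;
}

/* Number of byte values that are blank but not space; the standard
 * wording implies this should always be 0. */
int count_blank_not_space(void)
{
    int c, bad = 0;

    for (c = 0; c <= UCHAR_MAX; c++)
        if (isblank(c) && !isspace(c))
            bad++;
    return bad;
}
```

Everything between those two bounds is left to the locale, which is why
count_blank() may return more than 2 once setlocale(LC_ALL, "") has been
called on some systems.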

So in other locales it seems to be totally undefined what a blank is,
and everybody does what they think is a good choice. Thus the mess
Stephane observed. In fact, I looked at the musl library and found
this code:

int isblank(int c)
{
	return (c == ' ' || c == '\t');
}

int __isblank_l(int c, locale_t l)
{
	return isblank(c);
}

So they completely ignore the locale and just use the bare minimum
required by the standard. So after the patch, zsh would not only
behave differently on different platforms but would also change its
behavior if you link with a different libc.

Nevertheless, I'm slightly in favour of the patch. While defining our
own :blank: for other locales might give us consistency across
platforms, I think it will end up to be different than what everybody
else does and will thus lead to unexpected results for users -- in
particular if the libc's start to agree on isblank for different
locales. And at that point, it might be difficult to change the
behavior if it breaks backward compatibility.

In fact, it's the hope that the situation will improve in the future
that sways me towards the patch compared to the status-quo. But seeing
the mess Stephane uncovered made it a very tight race.

Finally, whether the patch gets applied or not, the documentation
should definitely be updated to reflect the issues around :blank:.

-- 
Daniel



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-14 19:52                   ` Daniel Tameling
@ 2018-05-14 20:42                     ` Stephane Chazelas
  2018-05-15 18:12                       ` Stephane Chazelas
  0 siblings, 1 reply; 26+ messages in thread
From: Stephane Chazelas @ 2018-05-14 20:42 UTC (permalink / raw)
  To: zsh-workers

I agree with Daniel here.

Although I think I would prefer a [[:blank:]] that consistently
matches SPC and TAB rather than something completely random
ranging from SPC and TAB (the minimum required by POSIX) and
whatever [[:space:]] matches (POSIX requires [:blank:] to be a
subset of [:space:]), I don't think it's zsh's role to fix the
POSIX character classes.

There's also the question of the consistency between

[[ $x = [[:blank:]] ]] (using zsh's own pattern matching
implementation)

[[ $x =~ [[:blank:]] ]] (using the system's EREs, so generally
influenced by the locale)

and the same with rematchpcre, that one only matching SPC and TAB
regardless of the locale AFAICT with all the character classes
only matching the minimum ASCII characters required by POSIX (as
if using the C locale with wildcards or ERE).
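
For comparison, the system ERE side of this can be exercised directly
through the POSIX regcomp()/regexec() interface. This is only a sketch of
what an ERE implementation does with [[:blank:]], not of how zsh wires up
its =~ operator; ere_is_single_blank is a hypothetical name.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if s consists of exactly one character in the ERE
 * [[:blank:]] class, 0 if not, and -1 if the pattern fails to compile.
 * The class is interpreted according to the current locale.
 */
int ere_is_single_blank(const char *s)
{
    regex_t re;
    int rc;

    if (regcomp(&re, "^[[:blank:]]$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Calling setlocale(LC_ALL, "") before the helper and feeding it a non-ASCII
blank is a quick way to see whether the system's regex library follows the
locale or only the ASCII minimum.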

-- 
Stephane



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-14 20:42                     ` Stephane Chazelas
@ 2018-05-15 18:12                       ` Stephane Chazelas
  2018-05-16  4:18                         ` Sebastian Gniazdowski
  0 siblings, 1 reply; 26+ messages in thread
From: Stephane Chazelas @ 2018-05-15 18:12 UTC (permalink / raw)
  To: zsh-workers

Note a "theoretical" problem with not doing iswblank() is that
[[:blank:]] is often used to parse the output of some commands.

When POSIX specifies the output of a command (generally only in
the POSIX locale) and that output has whitespace separated fields
(like in the output of id, ls -l, wc...) the separators are one
or more "blanks".

So we need to be able to match *those* "blanks" which are the
POSIX blanks.

Now in practice, I don't know of any current implementation of
any utility that would use anything but SPC or TAB in any
locale, so it's only a "theoretical" point.

Note that by some reading of the spec, and bash and yash have
made such readings, when the spec says tokens are delimited by
blanks, that's any blank in the locale.

$ yash -c $'echo\u2006test'
test

In the case of bash, that only works "properly" with single-byte
characters.
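
As a rough illustration of "fields delimited by one or more blanks", field
counting with the locale's blank class could be sketched in C as follows.
The function name is hypothetical, and this does not reproduce how yash or
bash tokenise input.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Count fields in s that are separated by one or more characters of
 * the current locale's "blank" class. */
size_t count_blank_separated_fields(const wchar_t *s)
{
    size_t fields = 0;
    int in_field = 0;

    for (; *s != L'\0'; s++) {
        if (iswblank((wint_t)*s)) {
            in_field = 0;           /* separator ends the current field */
        } else if (!in_field) {
            in_field = 1;           /* first character of a new field */
            fields++;
        }
    }
    return fields;
}
```

With only SPC and TAB in the input the result is the same in every locale.
A character such as U+2006 splits fields only where the locale classifies
it as blank.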

-- 
Stephane



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-14 15:51             ` Stephane Chazelas
  2018-05-14 16:31               ` Sebastian Gniazdowski
@ 2018-05-15 19:06               ` Oliver Kiddle
  2018-05-16 13:15                 ` Stephane Chazelas
  1 sibling, 1 reply; 26+ messages in thread
From: Oliver Kiddle @ 2018-05-15 19:06 UTC (permalink / raw)
  To: Zsh hackers list

Stephane Chazelas wrote:
> [...]
> > It wouldn't be ridiculous to change the documentation for this case and
> > require "unsetopt multibyte" for strict byte-by-byte comparisions, which
> > is already how it works in the vast majority of other cases.
> [...]
>
> But note that here it's not about multibyte vs singlebyte but
> whether [:blank:] honours the locale like the other POSIX
> character classes (alpha, punct...) do.

For consistency with the other character classes, I think the best is to
follow POSIX and the other shells and have [:blank:] call iswblank().
That is apply the patch plus whatever change the documentation needs to
reflect it.

I can't see it actually breaking scripts in practice. We do at least
have the option of using [$' \t'] if we want and could add [[:BLANK:]]
or similar if needed. It does seem wrong for non-breaking spaces to be
matched but that's an issue for NetBSD or whatever.

This isn't as bad as the idiocy of [a-z] matching B-Z.

> What a mess!

Indeed.

I also wish POSIX would standardise an alternative for the C locale
that's UTF-8 aware and with ISO rather than US format dates.

Oliver



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-15 18:12                       ` Stephane Chazelas
@ 2018-05-16  4:18                         ` Sebastian Gniazdowski
  0 siblings, 0 replies; 26+ messages in thread
From: Sebastian Gniazdowski @ 2018-05-16  4:18 UTC (permalink / raw)
  To: Zsh hackers list

On 15 May 2018 at 20:12, Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
> Note that by some reading of the spec, and bash and yash have
> made such readings, when the spec says tokens are delimited by
> blanks, that's any blank in the locale.
>
> $ yash -c $'echo\u2006test'
> test
>
> In the case of bash, that only works "properly" with single-byte
> characters.

With some more or less bizarre coding, this makes it possible to hide
information in the command line. Assuming a script has access to the full
command text (e.g. $jobtexts), it can check which argument is
preceded by e.g. non-breaking space(s), and which by regular
space(s). Zshell would behave as if there were no difference, while the
script could decide on something, e.g. that an nbsp-prefixed argument is
a fifo, not a regular file, and some custom function show_jobs_status()
could show fifos in a different color. I'm just doing a deep-implications
survey; if e.g. SQL designers had done this properly, there wouldn't
be so many flavors of SQL today.

-- 
Best regards,
Sebastian Gniazdowski



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-15 19:06               ` Oliver Kiddle
@ 2018-05-16 13:15                 ` Stephane Chazelas
  2018-05-16 13:40                   ` Peter Stephenson
  0 siblings, 1 reply; 26+ messages in thread
From: Stephane Chazelas @ 2018-05-16 13:15 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh hackers list

2018-05-15 21:06:01 +0200, Oliver Kiddle:
[...]
> For consistency with the other character classes, I think the best is to
> follow POSIX and the other shells and have [:blank:] call iswblank().
> That is apply the patch plus whatever change the documentation needs to
> reflect it.
[...]

3rd version of the patch with doc update and check for
isblank().

diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo
index 8b447e2..c791097 100644
--- a/Doc/Zsh/expn.yo
+++ b/Doc/Zsh/expn.yo
@@ -2004,7 +2004,7 @@ The character is 7-bit, i.e. is a single-byte character without
 the top bit set.
 )
 item(tt([:blank:]))(
-The character is either space or tab
+The character is a blank character
 )
 item(tt([:cntrl:]))(
 The character is a control character
diff --git a/NEWS b/NEWS
index 1db9da6..1786897 100644
--- a/NEWS
+++ b/NEWS
@@ -4,7 +4,14 @@ CHANGES FROM PREVIOUS VERSIONS OF ZSH
 
 Note also the list of incompatibilities in the README file.
 
-Changes from %.5 to 5.5.1
+Changes from 5.5.1 to FIXME
+---------------------------
+
+In shell patterns, [[:blank:]] now honours the locale instead of
+matching exclusively on space and tab, like for the other POSIX
+character classes or in extended regular expressions.
+
+Changes from 5.5 to 5.5.1
 -------------------------
 
 Apart from a fix for a configuration problem finding singal names from
diff --git a/Src/pattern.c b/Src/pattern.c
index fc7c737..97a6d9c 100644
--- a/Src/pattern.c
+++ b/Src/pattern.c
@@ -3605,7 +3605,7 @@ mb_patmatchrange(char *range, wchar_t ch, int zmb_ind, wint_t *indptr, int *mtp)
 		    return 1;
 		break;
 	    case PP_BLANK:
-		if (ch == L' ' || ch == L'\t')
+		if (iswblank(ch))
 		    return 1;
 		break;
 	    case PP_CNTRL:
@@ -3840,7 +3840,14 @@ patmatchrange(char *range, int ch, int *indptr, int *mtp)
 		    return 1;
 		break;
 	    case PP_BLANK:
-		if (ch == ' ' || ch == '\t')
+#if !defined(HAVE_ISBLANK) && !defined(isblank)
+/*
+ * isblank() is GNU and C99. There's a remote chance that some
+ * systems still don't support it.
+ */
+#define isblank(c) (c == ' ' || c == '\t')
+#endif
+		if (isblank(ch))
 		    return 1;
 		break;
 	    case PP_CNTRL:
diff --git a/configure.ac b/configure.ac
index 4329afb..c2efda5 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1304,6 +1304,7 @@ AC_CHECK_FUNCS(strftime strptime mktime timelocal \
 	       memcpy memmove strstr strerror strtoul \
 	       getrlimit getrusage \
 	       setlocale \
+	       isblank \
 	       uname \
 	       signgam tgamma \
 	       scalbn \
@@ -2564,10 +2565,10 @@ AC_HELP_STRING([--enable-multibyte], [support multibyte characters]),
 [AC_CACHE_VAL(zsh_cv_c_unicode_support,
   AC_MSG_NOTICE([checking for functions supporting multibyte characters])
   [zfuncs_absent=
-   for zfunc in iswalnum iswcntrl iswdigit iswgraph iswlower iswprint \
-iswpunct iswspace iswupper iswxdigit mbrlen mbrtowc towupper towlower \
-wcschr wcscpy wcslen wcsncmp wcsncpy wcrtomb wcwidth wmemchr wmemcmp \
-wmemcpy wmemmove wmemset; do
+   for zfunc in iswalnum iswblank iswcntrl iswdigit iswgraph iswlower \
+     iswprint iswpunct iswspace iswupper iswxdigit mbrlen mbrtowc \
+     towupper towlower wcschr wcscpy wcslen wcsncmp wcsncpy wcrtomb \
+     wcwidth wmemchr wmemcmp wmemcpy wmemmove wmemset; do
      AC_CHECK_FUNC($zfunc,
      [:], [zfuncs_absent="$zfuncs_absent $zfunc"])
     done

-- 
Stephane



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-16 13:15                 ` Stephane Chazelas
@ 2018-05-16 13:40                   ` Peter Stephenson
  2018-05-16 16:31                     ` Stephane Chazelas
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Stephenson @ 2018-05-16 13:40 UTC (permalink / raw)
  To: Stephane Chazelas, Zsh hackers list

On Wed, 16 May 2018 14:15:47 +0100
Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
> 2018-05-15 21:06:01 +0200, Oliver Kiddle:
> [...]
> > For consistency with the other character classes, I think the best
> > is to follow POSIX and the other shells and have [:blank:] call
> > iswblank(). That is apply the patch plus whatever change the
> > documentation needs to reflect it.  
> [...]
> 
> 3rd version of the patch with doc update and check for
> isblank().

Probably slightly better with the patch than without, in an imperfect world.

Is iswblank() guaranteed to be available?  It's covered by an extra set
of #ifdef's compared with the isblank() case but none of them is forcing
it to use C99 standard headers.

pws



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-16 13:40                   ` Peter Stephenson
@ 2018-05-16 16:31                     ` Stephane Chazelas
  2018-05-16 21:02                       ` [PATCH v4] " Stephane Chazelas
  2018-05-17 22:05                       ` [PATCH] " Oliver Kiddle
  0 siblings, 2 replies; 26+ messages in thread
From: Stephane Chazelas @ 2018-05-16 16:31 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: Zsh hackers list

2018-05-16 14:40:26 +0100, Peter Stephenson:
[...]
> Is iswblank() guaranteed to be available?  It's covered by an extra set
> of #ifdef's compared with the isblank() case but none of them is forcing
> it to use C99 standard headers.
[...]

In that v3 patch, I've added iswblank() in the list of functions
to check before enabling "unicode support". Maybe we should do
like for isblank() so that we can still have unicode support if
iswalpha()... are present but not iswblank() (and have
iswblank() check for spc and tab only then).
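
Roughly, the fallback could look like the following. This is a sketch under
the assumption that configure defines HAVE_ISWBLANK when the function is
found; pattern_blank() is a hypothetical wrapper used only for illustration,
and the macro argument is parenthesized as a defensive habit:

```c
#include <wchar.h>
#include <wctype.h>

/*
 * Assumption: configure defines HAVE_ISWBLANK when libc provides
 * iswblank(). Without it (and if no macro of that name exists),
 * fall back to the SPC+TAB approximation.
 */
#if !defined(HAVE_ISWBLANK) && !defined(iswblank)
# define iswblank(c) ((c) == L' ' || (c) == L'\t')
#endif

/* Hypothetical wrapper: returns 1 if ch is blank, 0 otherwise. */
static int pattern_blank(wint_t ch)
{
    return iswblank(ch) ? 1 : 0;
}
```

Compiled on its own, without a configure-generated config.h, this will
normally take the fallback branch, so only space and tab are reported as
blank.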

OK, I'll send a v4 patch tonight.

-- 
Stephane



* Re: [PATCH v4] [[:blank:]] only matches on SPC and TAB
  2018-05-16 16:31                     ` Stephane Chazelas
@ 2018-05-16 21:02                       ` Stephane Chazelas
  2018-05-17  8:29                         ` Peter Stephenson
  2018-05-17 22:05                       ` [PATCH] " Oliver Kiddle
  1 sibling, 1 reply; 26+ messages in thread
From: Stephane Chazelas @ 2018-05-16 21:02 UTC (permalink / raw)
  To: Peter Stephenson, Zsh hackers list

2018-05-16 17:31:19 +0100, Stephane Chazelas:
[...]
> > Is iswblank() guaranteed to be available?  It's covered by an extra set
> > of #ifdef's compared with the isblank() case but none of them is forcing
> > it to use C99 standard headers.
[...] 

I have to admit I'm not sure what you mean by that. And those
are the kind of thing I'm not very familiar with. AFAICT, the
AC_CHECK_FUNCS() checks that the iswblank symbol is available in
the libc. And Src/zsh_system.h looks like it should enable
enough of the feature test macros for the system headers to
expose it, but I may very well misunderstand things.

> In that v3 patch, I've added iswblank() in the list of functions
> to check before enabling "unicode support". Maybe we should do
> like for isblank() so that we can still have unicode support if
> iswalpha()... are present but not iswblank() (and have
> iswblank() check for spc and tab only then).
> 
> OK, I'll send a v4 patch tonight.


diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo
index 8b447e2..c791097 100644
--- a/Doc/Zsh/expn.yo
+++ b/Doc/Zsh/expn.yo
@@ -2004,7 +2004,7 @@ The character is 7-bit, i.e. is a single-byte character without
 the top bit set.
 )
 item(tt([:blank:]))(
-The character is either space or tab
+The character is a blank character
 )
 item(tt([:cntrl:]))(
 The character is a control character
diff --git a/NEWS b/NEWS
index 1db9da6..1786897 100644
--- a/NEWS
+++ b/NEWS
@@ -4,7 +4,14 @@ CHANGES FROM PREVIOUS VERSIONS OF ZSH
 
 Note also the list of incompatibilities in the README file.
 
-Changes from %.5 to 5.5.1
+Changes from 5.5.1 to FIXME
+---------------------------
+
+In shell patterns, [[:blank:]] now honours the locale instead of
+matching exclusively on space and tab, like for the other POSIX
+character classes or for extended regular expressions.
+
+Changes from 5.5 to 5.5.1
 -------------------------
 
 Apart from a fix for a configuration problem finding singal names from
diff --git a/Src/pattern.c b/Src/pattern.c
index fc7c737..737f5cd 100644
--- a/Src/pattern.c
+++ b/Src/pattern.c
@@ -3605,7 +3605,15 @@ mb_patmatchrange(char *range, wchar_t ch, int zmb_ind, wint_t *indptr, int *mtp)
 		    return 1;
 		break;
 	    case PP_BLANK:
-		if (ch == L' ' || ch == L'\t')
+#if !defined(HAVE_ISWBLANK) && !defined(iswblank)
+/*
+ * iswblank() is GNU and C99. There's a remote chance that some
+ * systems still don't support it (but would support the other ones
+ * if MULTIBYTE_SUPPORT is enabled).
+ */
+#define iswblank(c) (c == L' ' || c == L'\t')
+#endif
+		if (iswblank(ch))
 		    return 1;
 		break;
 	    case PP_CNTRL:
@@ -3840,7 +3848,14 @@ patmatchrange(char *range, int ch, int *indptr, int *mtp)
 		    return 1;
 		break;
 	    case PP_BLANK:
-		if (ch == ' ' || ch == '\t')
+#if !defined(HAVE_ISBLANK) && !defined(isblank)
+/*
+ * isblank() is GNU and C99. There's a remote chance that some
+ * systems still don't support it.
+ */
+#define isblank(c) (c == ' ' || c == '\t')
+#endif
+		if (isblank(ch))
 		    return 1;
 		break;
 	    case PP_CNTRL:
diff --git a/configure.ac b/configure.ac
index 4329afb..00c7318 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1304,6 +1304,7 @@ AC_CHECK_FUNCS(strftime strptime mktime timelocal \
 	       memcpy memmove strstr strerror strtoul \
 	       getrlimit getrusage \
 	       setlocale \
+	       isblank iswblank \
 	       uname \
 	       signgam tgamma \
 	       scalbn \
@@ -2564,6 +2565,12 @@ AC_HELP_STRING([--enable-multibyte], [support multibyte characters]),
 [AC_CACHE_VAL(zsh_cv_c_unicode_support,
   AC_MSG_NOTICE([checking for functions supporting multibyte characters])
   [zfuncs_absent=
+dnl
+dnl Note that iswblank is not included and checked separately.
+dnl As iswblank() was added to C long after the others, we still
+dnl want to enable unicode support even if iswblank is not available
+dnl (we then just do the SPC+TAB approximation)
+dnl
    for zfunc in iswalnum iswcntrl iswdigit iswgraph iswlower iswprint \
 iswpunct iswspace iswupper iswxdigit mbrlen mbrtowc towupper towlower \
 wcschr wcscpy wcslen wcsncmp wcsncpy wcrtomb wcwidth wmemchr wmemcmp \

-- 
Stephane



* Re: [PATCH v4] [[:blank:]] only matches on SPC and TAB
  2018-05-16 21:02                       ` [PATCH v4] " Stephane Chazelas
@ 2018-05-17  8:29                         ` Peter Stephenson
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Stephenson @ 2018-05-17  8:29 UTC (permalink / raw)
  To: Zsh hackers list

On Wed, 16 May 2018 22:02:51 +0100
Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
> 2018-05-16 17:31:19 +0100, Stephane Chazelas:
> [...]
> > > Is iswblank() guaranteed to be available?  It's covered by an
> > > extra set of #ifdef's compared with the isblank() case but none
> > > of them is forcing it to use C99 standard headers.  
> [...] 
> 
> I have to admit I'm not sure what you mean by that. And those
> are the kind of thing I'm not very familiar with. AFAICT, the
> AC_CHECK_FUNCS() checks that the iswblank symbol is available in
> the libc. And Src/zsh_system.h looks like it should enable
> enough of the feature test macros for the system headers to
> expose it, but I may very well misunderstand things.

I think what you've done covers it --- I just wasn't sure if it was safe
to bundle it with other stuff.

pws



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-14 12:34         ` Stephane Chazelas
  2018-05-14 13:50           ` Peter Stephenson
@ 2018-05-17  9:03           ` Sebastian Gniazdowski
  2018-05-17 10:10             ` Sebastian Gniazdowski
  1 sibling, 1 reply; 26+ messages in thread
From: Sebastian Gniazdowski @ 2018-05-17  9:03 UTC (permalink / raw)
  To: Peter Stephenson, Zsh hackers list

On 14 May 2018 at 14:34, Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
> I don't think many people complained when multi-byte support was
> added and English people were starting to have their [[:alpha:]]
> match on Greek or Korean letters in addition to English ones
> (fair enough as "alpha" means the first letter of the Greek
> alphabet).

This is a very interesting point. Think bank-systems. I think no one
ever predicted that control code of deployed programs will be
influenced from outside. The [[:alpha:]] case should find its way to
books on computer science as an example of something unbelievable. The
same goes for changing libc so that fopen() also returns NULL when, say,
the disk is nearly full. I wonder how this compares to the Y2K situation, hehe.
That said, I think that what lies behind those "upgrades" of
standard libraries is a motivation to do a one-time bungee jump,
risking broken bones, but hoping to fix past mistakes. The past
(ignoring non-ASCII strings) must have haunted enough people for
this to happen.

> --
> Stephane



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-17  9:03           ` Sebastian Gniazdowski
@ 2018-05-17 10:10             ` Sebastian Gniazdowski
  0 siblings, 0 replies; 26+ messages in thread
From: Sebastian Gniazdowski @ 2018-05-17 10:10 UTC (permalink / raw)
  To: Zsh hackers list

On 17 May 2018 at 11:03, Sebastian Gniazdowski <sgniazdowski@gmail.com> wrote:
> This is a very interesting point. Think bank-systems. I think no one
> ever predicted that control code of deployed programs will be
> influenced from outside. The [[:alpha:]] case should find its way to
> books on computer science as an example of something unbelievable.

Just one more thing. This situation is like Windows NT vs. Linux.
Windows engineers have put a large effort into implementing the
micro-kernel paradigm, focusing on, among other things, the driver-kernel
API, while Linus, as the kernel evolved, kept changing the APIs because he
could patch the source tree. Thus, he could fix past mistakes. So we now
have a better C with those new Unicode glyphs; I've used them explicitly
in my project giturl and I can tell that the C ecosystem will last longer
thanks to those upgrades. But those are bungee jumps, hehe.

-- 
Best regards,
Sebastian Gniazdowski



* Re: [PATCH] [[:blank:]] only matches on SPC and TAB
  2018-05-16 16:31                     ` Stephane Chazelas
  2018-05-16 21:02                       ` [PATCH v4] " Stephane Chazelas
@ 2018-05-17 22:05                       ` Oliver Kiddle
  1 sibling, 0 replies; 26+ messages in thread
From: Oliver Kiddle @ 2018-05-17 22:05 UTC (permalink / raw)
  To: Zsh hackers list

Stephane Chazelas wrote:
> In that v3 patch, I've added iswblank() in the list of functions
> to check before enabling "unicode support". Maybe we should do
> like for isblank() so that we can still have unicode support if
> iswalpha()... are present but not iswblank() (and have
> iswblank() check for spc and tab only then).
>
> OK, I'll send a v4 patch tonight.

I've pushed the v4 patch. Thanks

Checking symbol versions in libc on Solaris confirms that iswblank was
added much later than others such as iswspace – only a well patched
Solaris 10 will have it while the others are in Solaris 9 and perhaps
earlier. Whether anyone still builds zsh on a system that lacks it is
harder to tell.

Oliver



end of thread, other threads:[~2018-05-17 22:05 UTC | newest]

Thread overview: 26+ messages
2018-05-13 21:25 [PATCH] [[:blank:]] only matches on SPC and TAB Stephane Chazelas
2018-05-13 21:49 ` [PATCH v2] " Stephane Chazelas
2018-05-14  2:27 ` [PATCH] " Sebastian Gniazdowski
2018-05-14  4:41   ` Sebastian Gniazdowski
2018-05-14  6:36   ` Stephane Chazelas
2018-05-14  6:44     ` Stephane Chazelas
2018-05-14  8:47       ` Peter Stephenson
2018-05-14 12:34         ` Stephane Chazelas
2018-05-14 13:50           ` Peter Stephenson
2018-05-14 15:51             ` Stephane Chazelas
2018-05-14 16:31               ` Sebastian Gniazdowski
2018-05-14 16:50                 ` Bart Schaefer
2018-05-14 19:52                   ` Daniel Tameling
2018-05-14 20:42                     ` Stephane Chazelas
2018-05-15 18:12                       ` Stephane Chazelas
2018-05-16  4:18                         ` Sebastian Gniazdowski
2018-05-15 19:06               ` Oliver Kiddle
2018-05-16 13:15                 ` Stephane Chazelas
2018-05-16 13:40                   ` Peter Stephenson
2018-05-16 16:31                     ` Stephane Chazelas
2018-05-16 21:02                       ` [PATCH v4] " Stephane Chazelas
2018-05-17  8:29                         ` Peter Stephenson
2018-05-17 22:05                       ` [PATCH] " Oliver Kiddle
2018-05-17  9:03           ` Sebastian Gniazdowski
2018-05-17 10:10             ` Sebastian Gniazdowski
2018-05-14  8:11     ` Sebastian Gniazdowski
