From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 13 May 2018 22:25:53 +0100
From: Stephane Chazelas
To: Zsh hackers list
Subject: [PATCH] [[:blank:]] only matches on SPC and TAB
Message-ID: <20180513212553.GA29028@chaz.gmail.com>
Mail-Followup-To: Zsh hackers list
List-Id: Zsh Workers List
X-Seq: 42763
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.24 (2015-08-30)

I noticed that [[:blank:]] was not matching on non-ASCII blank
characters.

In a typical UTF-8 GNU locale, [[:blank:]] normally includes

  U+0009 CHARACTER TABULATION
  U+0020 SPACE
  U+1680 OGHAM SPACE MARK
  U+2000 EN QUAD
  U+2001 EM QUAD
  U+2002 EN SPACE
  U+2003 EM SPACE
  U+2004 THREE-PER-EM SPACE
  U+2005 FOUR-PER-EM SPACE
  U+2006 SIX-PER-EM SPACE
  U+2008 PUNCTUATION SPACE
  U+2009 THIN SPACE
  U+200A HAIR SPACE
  U+205F MEDIUM MATHEMATICAL SPACE
  U+3000 IDEOGRAPHIC SPACE

On FreeBSD:

  U+0009 CHARACTER TABULATION
  U+0020 SPACE
  U+00A0 NO-BREAK SPACE
  U+FEFF ZERO WIDTH NO-BREAK SPACE

(Strangely enough U+00A0 is not classified as blank in single
byte charsets like ISO8859-1 there)

The code indeed matches on SPC and TAB explicitly both in the
multibyte and singlebyte cases (the non-breaking space is one
non-ASCII character that appears in a few singlebyte charsets and
is considered as blank on some systems (not GNU ones)).
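For illustration only (this is not part of the patch), here is a minimal
sketch contrasting the explicit SPC/TAB comparison with a test that defers
to the locale's "blank" class through iswblank(). The helper names
blank_hardcoded, blank_locale and count_disagreements are hypothetical and
do not exist in zsh. The sketch also assumes that wchar_t values are Unicode
code points, which C itself does not guarantee.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: the explicit comparison, SPC and TAB only. */
static int blank_hardcoded(wint_t ch)
{
    return ch == L' ' || ch == L'\t';
}

/* Hypothetical helper: defer to the "blank" class of the current LC_CTYPE. */
static int blank_locale(wint_t ch)
{
    return iswblank(ch) != 0;
}

/*
 * Count the entries of cps[0..n-1] on which the two tests disagree
 * under the current locale. Assumption: each entry is a Unicode code
 * point and wchar_t uses the same values.
 */
static size_t count_disagreements(const wint_t *cps, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if (blank_hardcoded(cps[i]) != blank_locale(cps[i]))
            count++;
    return count;
}
```

To probe a real locale, a caller would first run setlocale(LC_ALL, "")
(declared in <locale.h>) and then pass code points such as 0x2003 or 0x00A0
from the lists above. Without that call the program stays in the "C" locale,
where only SPC and TAB are blank, so the two tests are not expected to
disagree on ASCII input.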

In case that was not intentional, this patch should fix it:

diff --git a/Src/pattern.c b/Src/pattern.c
index fc7c737..d3eac44 100644
--- a/Src/pattern.c
+++ b/Src/pattern.c
@@ -3605,7 +3605,7 @@ mb_patmatchrange(char *range, wchar_t ch, int zmb_ind, wint_t *indptr, int *mtp)
                 return 1;
             break;
         case PP_BLANK:
-            if (ch == L' ' || ch == L'\t')
+            if (iswblank(ch))
                 return 1;
             break;
         case PP_CNTRL:
@@ -3840,7 +3840,7 @@ patmatchrange(char *range, int ch, int *indptr, int *mtp)
                 return 1;
             break;
         case PP_BLANK:
-            if (ch == ' ' || ch == '\t')
+            if (isblank(ch))
                 return 1;
             break;
         case PP_CNTRL:

-- 
Stephane