From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm
X-Seq: 21205
Message-Id: <200504281141.j3SBfI4g019987@news01.csr.com>
To: zsh-workers@sunsite.dk (Zsh hackers list)
Subject: PATCH: character sets for internal zsh tests
Date: Thu, 28 Apr 2005 12:41:18 +0100
From: Peter Stephenson

After the last mail I sent, I was just thinking about quoting of
separators between array elements in vared, and I drifted into thinking
about how it would be useful to have tests for whether a character was a
separator, etc.  You can do things like [$IFS], but (1) they are a bit
fraught with difficulty because in general IFS can contain pretty much
anything including a "-" or a "!"; (2) you need to apply additional
rules in some cases such as "IFS whitespace" or word characters which
always include alphanumerics.  (See my hacks for [$WORDCHARS] in the Zle
function match-words-by-style, for example.)

This patch adds [[:sep:]], [[:wsep:]], [[:ident:]], [[:word:]].  These
are trivial because the tests are already available internally, so we
can get quite a lot from little effort.

The names are simply borrowed from the internal macros; let me know if
you think there are better names.  I think the last two are OK but maybe
[[:ifs:]] and [[:ifsw:]] or [[:ifsspace:]] would be better for the first
two.

Then I will add tests.

Index: Doc/Zsh/expn.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/expn.yo,v
retrieving revision 1.53
diff -u -r1.53 expn.yo
--- Doc/Zsh/expn.yo 24 Apr 2005 18:38:04 -0000 1.53
+++ Doc/Zsh/expn.yo 28 Apr 2005 11:30:17 -0000
@@ -1224,19 +1224,81 @@
 first character in the list.
 cindex(character classes)
 There are also several named classes of characters, in the form
-`tt([:)var(name)tt(:])' with the following meanings: `tt([:alnum:])'
-alphanumeric, `tt([:alpha:])' alphabetic,
-`tt([:ascii:])' 7-bit,
-`tt([:blank:])' space or tab,
-`tt([:cntrl:])' control character, `tt([:digit:])' decimal
-digit, `tt([:graph:])' printable character except whitespace,
-`tt([:lower:])' lowercase letter, `tt([:print:])' printable character,
-`tt([:punct:])' printable character neither alphanumeric nor whitespace,
-`tt([:space:])' whitespace character, `tt([:upper:])' uppercase letter,
-`tt([:xdigit:])' hexadecimal digit.  These use the macros provided by
+`tt([:)var(name)tt(:])' with the following meanings.
+The first set use the macros provided by
 the operating system to test for the given character combinations,
-including any modifications due to local language settings:  see
-manref(ctype)(3).  Note that the square brackets are additional
+including any modifications due to local language settings, see
+manref(ctype)(3):
+
+startitem()
+item(tt([:alnum:]))(
+The character is alphanumeric
+)
+item(tt([:alpha:]))
+(
+The character is alphabetic
+)
+item(tt([:ascii:]))(
+The character is 7-bit, i.e. is a single-byte character without
+the top bit set.
+)
+item(tt([:blank:]))(
+The character is either space or tab
+)
+item(tt([:cntrl:]))(
+The character is a control character
+)
+item(tt([:digit:]))(
+The character is a decimal digit
+)
+item(tt([:graph:]))(
+The character is a printable character other than whitespace
+)
+item(tt([:lower:]))(
+The character is a lowercase letter
+)
+item(tt([:print:]))(
+The character is printable
+)
+item(tt([:punct:]))(
+The character is printable but neither alphanumeric nor whitespace
+)
+item(tt([:space:]))(
+The character is whitespace
+)
+item(tt([:upper:]))(
+The character is an uppercase letter
+)
+item(tt([:xdigit:]))(
+The character is a hexadecimal digit
+)
+enditem()
+
+Another set of tests are handled internally by the shell and
+are not sensitive to the locale:
+
+startitem()
+item(tt([:ident:]))(
+The character is allowed to form part of a shell identifier, such
+as a parameter name
+)
+item(tt([:sep:]))(
+The character is a separator, i.e. is contained in the tt(IFS) parameter
+)
+item(tt([:word:]))(
+The character is treated as part of a word; this test is sensitive
+to the value of the tt(WORDCHARS) parameter
+)
+item(tt([:wsep:]))(
+The character is an IFS white space character; see the documentation
+for tt(IFS) in
+ifzman(the zmanref(zshparams) manual page)\
+ifnzman(noderef(Parameters Used By The Shell))\
+.
+)
+enditem()
+
+Note that the square brackets are additional
 to those enclosing the whole set of characters, so to test for a
 single alphanumeric character you need `tt([[:alnum:]])'.
 Named character sets can be used alongside other types,
Index: Src/pattern.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/pattern.c,v
retrieving revision 1.26
diff -u -r1.26 pattern.c
--- Src/pattern.c 26 Apr 2005 09:51:29 -0000 1.26
+++ Src/pattern.c 28 Apr 2005 11:30:19 -0000
@@ -193,8 +193,12 @@
 #define PP_SPACE  11
 #define PP_UPPER  12
 #define PP_XDIGIT 13
-#define PP_UNKWN  14
-#define PP_RANGE  15
+#define PP_IDENT  14
+#define PP_SEP    15
+#define PP_WORD   16
+#define PP_WSEP   17
+#define PP_UNKWN  18
+#define PP_RANGE  19
 
 #define P_OP(p)   ((p)->l & 0xff)
 #define P_NEXT(p) ((p)->l >> 8)
@@ -1118,6 +1122,14 @@
                     ch = PP_UPPER;
                 else if (!strncmp(patparse, "xdigit", len))
                     ch = PP_XDIGIT;
+                else if (!strncmp(patparse, "ident", len))
+                    ch = PP_IDENT;
+                else if (!strncmp(patparse, "sep", len))
+                    ch = PP_SEP;
+                else if (!strncmp(patparse, "word", len))
+                    ch = PP_WORD;
+                else if (!strncmp(patparse, "wsep", len))
+                    ch = PP_WSEP;
                 else
                     ch = PP_UNKWN;
                 patparse = nptr + 2;
@@ -2724,6 +2736,22 @@
             if (isxdigit(ch))
                 return 1;
             break;
+        case PP_IDENT:
+            if (iident(ch))
+                return 1;
+            break;
+        case PP_SEP:
+            if (isep(ch))
+                return 1;
+            break;
+        case PP_WORD:
+            if (iword(ch))
+                return 1;
+            break;
+        case PP_WSEP:
+            if (iwsep(ch))
+                return 1;
+            break;
         case PP_RANGE:
             range++;
             r1 = STOUC(UNMETA(range));

-- 
Peter Stephenson                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070
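
The difficulty with [$IFS] mentioned at the top of this message can be
illustrated with a minimal sketch in portable shell.  The function name
naive_is_sep and its second argument (a stand-in for IFS, so the real
IFS is left alone) are hypothetical and not part of the patch; the point
is only that a "!" at the start of the separator list negates the
bracket expression, which is the kind of surprise a dedicated class
such as [[:sep:]] is meant to avoid.

```shell
# Naive separator test: splice the separator list straight into a bracket
# expression, as one might with [$IFS].  "seps" stands in for IFS; both the
# function and its arguments are illustrative assumptions.
naive_is_sep() {
    ch=$1 seps=$2
    case $ch in
        [$seps]) return 0 ;;
    esac
    return 1
}

# With ordinary separators the naive test behaves as expected.
naive_is_sep ':' ' :' && echo "':' is a separator"
naive_is_sep 'a' ' :' || echo "'a' is not a separator"

# A leading "!" turns the bracket expression into a negated set, so the
# answers are inverted: [!:] matches any character except ":".
naive_is_sep 'a' '!:' && echo "'a' wrongly reported as a separator"
naive_is_sep ':' '!:' || echo "':' wrongly reported as not a separator"
```

A "-" between two other characters in the list causes a similar problem,
since it is then read as a range rather than a literal character.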