From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm
X-Seq: 21211
Message-Id: <200504281610.j3SGAZIg027077@news01.csr.com>
To: zsh-workers@sunsite.dk (Zsh hackers list)
Subject: Re: PATCH: character sets for internal zsh tests
In-reply-to: <1050428152622.ZM31757@candle.brasslantern.com>
References: <200504281141.j3SBfI4g019987@news01.csr.com>
    <1050428145443.ZM31609@candle.brasslantern.com>
    <200504281509.j3SF9VUn025694@news01.csr.com>
    <1050428152622.ZM31757@candle.brasslantern.com>
Date: Thu, 28 Apr 2005 17:10:34 +0100
From: Peter Stephenson

Bart Schaefer wrote:
> Just in case I wasn't clear before, in all-caps I think [:IFS:] and
> [:IFSSPACE:] are OK.  I didn't like lower-case [:ifs:] or [:ifsw:].

OK, let's keep the link between [:IFS:] and $IFS explicit.

Index: Doc/Zsh/expn.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/expn.yo,v
retrieving revision 1.53
diff -u -r1.53 expn.yo
--- Doc/Zsh/expn.yo	24 Apr 2005 18:38:04 -0000	1.53
+++ Doc/Zsh/expn.yo	28 Apr 2005 16:08:18 -0000
@@ -1224,19 +1224,82 @@
 first character in the list.
 cindex(character classes)
 There are also several named classes of characters, in the form
-`tt([:)var(name)tt(:])' with the following meanings: `tt([:alnum:])'
-alphanumeric, `tt([:alpha:])' alphabetic,
-`tt([:ascii:])' 7-bit,
-`tt([:blank:])' space or tab,
-`tt([:cntrl:])' control character, `tt([:digit:])' decimal
-digit, `tt([:graph:])' printable character except whitespace,
-`tt([:lower:])' lowercase letter, `tt([:print:])' printable character,
-`tt([:punct:])' printable character neither alphanumeric nor whitespace,
-`tt([:space:])' whitespace character, `tt([:upper:])' uppercase letter,
-`tt([:xdigit:])' hexadecimal digit. These use the macros provided by
+`tt([:)var(name)tt(:])' with the following meanings.
+The first set use the macros provided by
 the operating system to test for the given character combinations,
-including any modifications due to local language settings: see
-manref(ctype)(3). Note that the square brackets are additional
+including any modifications due to local language settings, see
+manref(ctype)(3):
+
+startitem()
+item(tt([:alnum:]))(
+The character is alphanumeric
+)
+item(tt([:alpha:]))
+(
+The character is alphabetic
+)
+item(tt([:ascii:]))(
+The character is 7-bit, i.e. is a single-byte character without
+the top bit set.
+)
+item(tt([:blank:]))(
+The character is either space or tab
+)
+item(tt([:cntrl:]))(
+The character is a control character
+)
+item(tt([:digit:]))(
+The character is a decimal digit
+)
+item(tt([:graph:]))(
+The character is a printable character other than whitespace
+)
+item(tt([:lower:]))(
+The character is a lowercase letter
+)
+item(tt([:print:]))(
+The character is printable
+)
+item(tt([:punct:]))(
+The character is printable but neither alphanumeric nor whitespace
+)
+item(tt([:space:]))(
+The character is whitespace
+)
+item(tt([:upper:]))(
+The character is an uppercase letter
+)
+item(tt([:xdigit:]))(
+The character is a hexadecimal digit
+)
+enditem()
+
+Another set of named classes is handled internally by the shell and
+is not sensitive to the locale:
+
+startitem()
+item(tt([:IDENT:]))(
+The character is allowed to form part of a shell identifier, such
+as a parameter name
+)
+item(tt([:IFS:]))(
+The character is used as an input field separator, i.e. is contained in the
+tt(IFS) parameter
+)
+item(tt([:IFSSPACE:]))(
+The character is an IFS white space character; see the documentation
+for tt(IFS) in
+ifzman(the zmanref(zshparams) manual page)\
+ifnzman(noderef(Parameters Used By The Shell))\
+.
+)
+item(tt([:WORD:]))(
+The character is treated as part of a word; this test is sensitive
+to the value of the tt(WORDCHARS) parameter
+)
+enditem()
+
+Note that the square brackets are additional
 to those enclosing the whole set of characters, so to test for a
 single alphanumeric character you need `tt([[:alnum:]])'. Named
 character sets can be used alongside other types,
Index: Src/pattern.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/pattern.c,v
retrieving revision 1.26
diff -u -r1.26 pattern.c
--- Src/pattern.c	26 Apr 2005 09:51:29 -0000	1.26
+++ Src/pattern.c	28 Apr 2005 16:08:18 -0000
@@ -193,8 +193,12 @@
 #define PP_SPACE 11
 #define PP_UPPER 12
 #define PP_XDIGIT 13
-#define PP_UNKWN 14
-#define PP_RANGE 15
+#define PP_IDENT 14
+#define PP_IFS 15
+#define PP_IFSSPACE 16
+#define PP_WORD 17
+#define PP_UNKWN 18
+#define PP_RANGE 19
 
 #define P_OP(p) ((p)->l & 0xff)
 #define P_NEXT(p) ((p)->l >> 8)
@@ -1118,6 +1122,14 @@
                         ch = PP_UPPER;
                     else if (!strncmp(patparse, "xdigit", len))
                         ch = PP_XDIGIT;
+                    else if (!strncmp(patparse, "IDENT", len))
+                        ch = PP_IDENT;
+                    else if (!strncmp(patparse, "IFS", len))
+                        ch = PP_IFS;
+                    else if (!strncmp(patparse, "IFSSPACE", len))
+                        ch = PP_IFSSPACE;
+                    else if (!strncmp(patparse, "WORD", len))
+                        ch = PP_WORD;
                     else
                         ch = PP_UNKWN;
                     patparse = nptr + 2;
@@ -2724,6 +2736,22 @@
             if (isxdigit(ch))
                 return 1;
             break;
+        case PP_IDENT:
+            if (iident(ch))
+                return 1;
+            break;
+        case PP_IFS:
+            if (isep(ch))
+                return 1;
+            break;
+        case PP_IFSSPACE:
+            if (iwsep(ch))
+                return 1;
+            break;
+        case PP_WORD:
+            if (iword(ch))
+                return 1;
+            break;
         case PP_RANGE:
             range++;
             r1 = STOUC(UNMETA(range));
Index: Test/D02glob.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D02glob.ztst,v
retrieving revision 1.9
diff -u -r1.9 D02glob.ztst
--- Test/D02glob.ztst	16 Mar 2005 11:51:15 -0000	1.9
+++ Test/D02glob.ztst	28 Apr 2005 16:08:18 -0000
@@ -323,3 +323,28 @@
   print glob.tmp/ra=1.0_et=3.5/???
 0:Bug with intermediate paths with plain strings but tokenized characters
 >glob.tmp/ra=1.0_et=3.5/foo
+
+  doesmatch() {
+    setopt localoptions extendedglob
+    print -n $1 $2' '
+    if [[ $1 = $~2 ]]; then print yes; else print no; fi;
+  }
+  doesmatch MY_IDENTIFIER '[[:IDENT:]]##'
+  doesmatch YOUR:IDENTIFIER '[[:IDENT:]]##'
+  IFS=$'\n' doesmatch $'\n' '[[:IFS:]]'
+  IFS=' ' doesmatch $'\n' '[[:IFS:]]'
+  IFS=':' doesmatch : '[[:IFSSPACE:]]'
+  IFS=' ' doesmatch ' ' '[[:IFSSPACE:]]'
+  WORDCHARS="" doesmatch / '[[:WORD:]]'
+  WORDCHARS="/" doesmatch / '[[:WORD:]]'
+0:Named character sets handled internally
+>MY_IDENTIFIER [[:IDENT:]]## yes
+>YOUR:IDENTIFIER [[:IDENT:]]## no
+>
+> [[:IFS:]] yes
+>
+> [[:IFS:]] no
+>: [[:IFSSPACE:]] no
+>  [[:IFSSPACE:]] yes
+>/ [[:WORD:]] no
+>/ [[:WORD:]] yes

-- 
Peter Stephenson
Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK
Tel: +44 (0)1223 692070
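
A note on the name lookup in the Src/pattern.c hunk: the strncmp() chain compares only the first len bytes of each candidate name, so "IFS" and "IFSSPACE" are told apart only by the terminating NUL of the shorter string, and an abbreviated name would be accepted. The sketch below is not part of the patch. It assumes len is the number of bytes between `[:' and `:]' (the hunk does not show where len is set), and the table and the functions lookup_prefix() and lookup_exact() are hypothetical names made up for illustration. Only the four new names are included, whereas the real chain tests the lower-case names first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Class codes copied from the patch; everything else here is hypothetical. */
enum { PP_IDENT = 14, PP_IFS = 15, PP_IFSSPACE = 16, PP_WORD = 17, PP_UNKWN = 18 };

struct class_name {
    const char *name;
    int code;
};

/* Same order as the else-if chain added by the patch. */
static const struct class_name internal_classes[] = {
    { "IDENT", PP_IDENT },
    { "IFS", PP_IFS },
    { "IFSSPACE", PP_IFSSPACE },
    { "WORD", PP_WORD },
};

#define N_CLASSES (sizeof(internal_classes) / sizeof(internal_classes[0]))

/* Mirrors the strncmp() chain: only the first len bytes are compared. */
int lookup_prefix(const char *text, size_t len)
{
    size_t i;

    assert(text != NULL);
    for (i = 0; i < N_CLASSES; i++)
        if (!strncmp(text, internal_classes[i].name, len))
            return internal_classes[i].code;
    return PP_UNKWN;
}

/* Stricter variant: the candidate name must also be exactly len bytes long. */
int lookup_exact(const char *text, size_t len)
{
    size_t i;

    assert(text != NULL);
    for (i = 0; i < N_CLASSES; i++)
        if (strlen(internal_classes[i].name) == len &&
            !strncmp(text, internal_classes[i].name, len))
            return internal_classes[i].code;
    return PP_UNKWN;
}
```

With these definitions, lookup_prefix("IFSSPACE:]", 8) yields PP_IFSSPACE because the comparison against "IFS" fails at its NUL byte, while lookup_prefix("I:]", 1) yields PP_IDENT and lookup_exact("I:]", 1) yields PP_UNKWN. Whether the stricter behaviour is wanted is a separate question from this patch, since the existing lower-case names are matched the same way.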