From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 23756 invoked by alias); 4 Sep 2015 10:53:33 -0000
Mailing-List: contact zsh-workers-help@zsh.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
List-Id: Zsh Workers List
X-Seq: 36419
Received: (qmail 26197 invoked from network); 4 Sep 2015 10:53:30 -0000
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on f.primenet.com.au
X-Spam-Level:
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,UNPARSEABLE_RELAY
	autolearn=ham autolearn_force=no version=3.4.0
X-Injected-Via-Gmane: http://gmane.org/
To: zsh-workers@zsh.org
From: Ismail Donmez
Subject: Re: invalid characters and multi-byte [x-y] ranges
Date: Fri, 4 Sep 2015 10:53:14 +0000 (UTC)
References: <20150902230711.GA4967@chaz.gmail.com>
	<20150903100037.6e6ac852@pwslap01u.europe.root.pri>
	<20150903100943.GB7821@chaz.gmail.com>
	<20150903151811.557a40ec@pwslap01u.europe.root.pri>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: sea.gmane.org
User-Agent: Loom/3.14 (http://gmane.org/)
X-Loom-IP: 213.14.100.128 (Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36)

Hi,

Peter Stephenson samsung.com> writes:

>
> On Thu, 3 Sep 2015 11:09:44 +0100
> Stephane Chazelas gmail.com> wrote:
> > A discussed approach there was to internally represent bytes not
> > forming part of a valid character as code points in the range
> > D800-DFFF (specifically DC80 DCFF for bytes 0x80 to 0xff)
>
> That's easy if wchar_t is actually Unicode.
>
> I'm not sure how to do it otherwise. We could treat it identically to
> the Unicode conversion of 0xdC00 + STOUCH(ch) to wchar_t, e.g. iconv
> UCS-4 to WCHAR_T, but is that guaranteed to work? This needs to be a
> robust fallback and it's not clear relying on iconv is the right thing
> to do.
>
> The safe option would be only to use this if #ifdef __STDC_ISO_10646__.
>
> On the other hand, it's probably not going to be worse than the previous
> code...

This seems to break glob tests:

./D02glob.ztst: starting.
*** /tmp/zsh.ztst.out.6468 Fri Sep 4 13:52:19 2015
--- /tmp/zsh.ztst.tout.6468 Fri Sep 4 13:52:19 2015
***************
*** 91,107 ****
  0: [[ [ = [[] ]]
  0: [[ ] = []] ]]
  0: [[ [] = [^]]] ]]
! 0: [[ fooxx = (#i)FOOXX ]]
  1: [[ fooxx = (#l)FOOXX ]]
! 0: [[ FOOXX = (#l)fooxx ]]
  1: [[ fooxx = (#i)FOO(#I)X(#i)X ]]
! 0: [[ fooXx = (#i)FOO(#I)X(#i)X ]]
! 0: [[ fooxx = ((#i)FOOX)x ]]
  1: [[ fooxx = ((#i)FOOX)X ]]
  1: [[ BAR = (bar|(#i)foo) ]]
! 0: [[ FOO = (bar|(#i)foo) ]]
! 0: [[ Modules = (#i)*m* ]]
! 0: [[ fooGRUD = (#i)(bar|(#I)foo|(#i)rod)grud ]]
  1: [[ FOOGRUD = (#i)(bar|(#I)foo|(#i)rod)grud ]]
  0: [[ readme = (#i)readme~README|readme ]]
  0: [[ readme = (#i)readme~README|readme~README ]]
--- 91,114 ----
  0: [[ [ = [[] ]]
  0: [[ ] = []] ]]
  0: [[ [] = [^]]] ]]
! 1: [[ fooxx = (#i)FOOXX ]]
! Test failed: [[ fooxx = (#i)FOOXX ]]
  1: [[ fooxx = (#l)FOOXX ]]
! 1: [[ FOOXX = (#l)fooxx ]]
! Test failed: [[ FOOXX = (#l)fooxx ]]
  1: [[ fooxx = (#i)FOO(#I)X(#i)X ]]
! 1: [[ fooXx = (#i)FOO(#I)X(#i)X ]]
! Test failed: [[ fooXx = (#i)FOO(#I)X(#i)X ]]
! 1: [[ fooxx = ((#i)FOOX)x ]]
! Test failed: [[ fooxx = ((#i)FOOX)x ]]
  1: [[ fooxx = ((#i)FOOX)X ]]
  1: [[ BAR = (bar|(#i)foo) ]]
! 1: [[ FOO = (bar|(#i)foo) ]]
! Test failed: [[ FOO = (bar|(#i)foo) ]]
! 1: [[ Modules = (#i)*m* ]]
! Test failed: [[ Modules = (#i)*m* ]]
! 1: [[ fooGRUD = (#i)(bar|(#I)foo|(#i)rod)grud ]]
! Test failed: [[ fooGRUD = (#i)(bar|(#I)foo|(#i)rod)grud ]]
  1: [[ FOOGRUD = (#i)(bar|(#I)foo|(#i)rod)grud ]]
  0: [[ readme = (#i)readme~README|readme ]]
  0: [[ readme = (#i)readme~README|readme~README ]]
***************
*** 110,122 ****
  0: [[ 633 = <1->33 ]]
  0: [[ 633 = <->33 ]]
  0: [[ 12345678901234567890123456789012345678901234567890123456789012345678901234567890foo = <42->foo ]]
! 0: [[ READ.ME = (#ia1)readme ]]
  1: [[ READ..ME = (#ia1)readme ]]
! 0: [[ README = (#ia1)readm ]]
! 0: [[ READM = (#ia1)readme ]]
! 0: [[ README = (#ia1)eadme ]]
! 0: [[ EADME = (#ia1)readme ]]
! 0: [[ READEM = (#ia1)readme ]]
  1: [[ ADME = (#ia1)readme ]]
  1: [[ README = (#ia1)read ]]
  0: [[ bob = (#a1)[b][b] ]]
--- 117,135 ----
  0: [[ 633 = <1->33 ]]
  0: [[ 633 = <->33 ]]
  0: [[ 12345678901234567890123456789012345678901234567890123456789012345678901234567890foo = <42->foo ]]
! 1: [[ READ.ME = (#ia1)readme ]]
! Test failed: [[ READ.ME = (#ia1)readme ]]
  1: [[ READ..ME = (#ia1)readme ]]
! 1: [[ README = (#ia1)readm ]]
! Test failed: [[ README = (#ia1)readm ]]
! 1: [[ READM = (#ia1)readme ]]
! Test failed: [[ READM = (#ia1)readme ]]
! 1: [[ README = (#ia1)eadme ]]
! Test failed: [[ README = (#ia1)eadme ]]
! 1: [[ EADME = (#ia1)readme ]]
! Test failed: [[ EADME = (#ia1)readme ]]
! 1: [[ READEM = (#ia1)readme ]]
! Test failed: [[ READEM = (#ia1)readme ]]
  1: [[ ADME = (#ia1)readme ]]
  1: [[ README = (#ia1)read ]]
  0: [[ bob = (#a1)[b][b] ]]
***************
*** 138,144 ****
  0: [[ aaaXaaabY = (#a1)(a##b)##Y ]]
  0: [[ aaaXbaabY = (#a1)(a##b)##Y ]]
  1: [[ read.me = (#ia1)README~READ.ME ]]
! 0: [[ read.me = (#ia1)README~READ_ME ]]
  1: [[ read.me = (#ia1)README~(#a1)READ_ME ]]
  0: [[ test = *((#s)|/)test((#e)|/)* ]]
  0: [[ test/path = *((#s)|/)test((#e)|/)* ]]
--- 151,158 ----
  0: [[ aaaXaaabY = (#a1)(a##b)##Y ]]
  0: [[ aaaXbaabY = (#a1)(a##b)##Y ]]
  1: [[ read.me = (#ia1)README~READ.ME ]]
! 1: [[ read.me = (#ia1)README~READ_ME ]]
! Test failed: [[ read.me = (#ia1)README~READ_ME ]]
  1: [[ read.me = (#ia1)README~(#a1)READ_ME ]]
  0: [[ test = *((#s)|/)test((#e)|/)* ]]
  0: [[ test/path = *((#s)|/)test((#e)|/)* ]]
***************
*** 177,180 ****
  0: [[ test.bash = *.?(#c1,2)sh ]]
  0: [[ test.bash = *.?(#c1,)sh ]]
  0: [[ test.zsh = *.?(#c1,)sh ]]
! 0 tests failed.
--- 191,194 ----
  0: [[ test.bash = *.?(#c1,2)sh ]]
  0: [[ test.bash = *.?(#c1,)sh ]]
  0: [[ test.zsh = *.?(#c1,)sh ]]
! 14 tests failed.
Test ./D02glob.ztst failed: output differs from expected as shown above for:
  globtest globtests
Was testing: zsh globbing