List-Id: Zsh Workers List
X-Seq: 34604
Date: Sun, 22 Feb 2015 18:26:19 +0000
From: Peter Stephenson
To: "Zsh Hackers' List"
Subject: Re: PATCH: parse from even deeper in hell
Message-ID: <20150222182619.1851e983@ntlworld.com>
References: <20150219101315.477f7f95@pwslap01u.europe.root.pri>
 <20150219220311.7dfdc4ec@ntlworld.com>
 <20150220100006.24224469@pwslap01u.europe.root.pri>

On Fri, 20 Feb 2015 11:12:39 +0100
Mikael Magnusson wrote:
> > The question is where to put this in on history read.  I think it's
> > going to affect non-lexical history, too, but the error on reading won't
> > be flagged up.
>
> I don't think so, unmetafy() doesn't care about the table. And as I
> checked earlier, both the old and new version of the string in my
> history file is unmetafied to the correct UTF-8 string. The 'only'
> problem is that the lexer is looking at some bytes before it's
> unmetafied and some stuff that should have been metafied to avoid
> being parsed as tokens, isn't, because they weren't special in the old
> version. That's why I think running unmetafy before lexing is
> needed...
> And if the lexer wants metafied text then we'd just have to
> metafy it again right away.

See if this fixes the problems, then.

Note we're almost out of meta characters with this limitation --- we
can't expand beyond the range of 32 we currently reserve if we need to
keep compatibility with history.  We're only just getting away with it
with 0xa0 because 0x80 isn't a meta character, as for historical reasons
they start at 0x83.

pws

diff --git a/Src/hist.c b/Src/hist.c
index 381c7e2..acc4259 100644
--- a/Src/hist.c
+++ b/Src/hist.c
@@ -3377,11 +3377,45 @@ histsplitwords(char *lineptr, short **wordsp, int *nwordsp, int *nwordposp,
     char *start = lineptr;
 
     if (uselex) {
-        LinkList wordlist = bufferwords(NULL, lineptr, NULL,
-                                        LEXFLAGS_COMMENTS_KEEP);
+        LinkList wordlist;
         LinkNode wordnode;
-        int nwords_max;
+        int nwords_max, remeta = 0;
+        char *ptr;
+
+        /*
+         * Handle the special case that we're reading from an
+         * old shell with fewer meta characters, so we need to
+         * metafy some more.  (It's not clear why the history
+         * file is metafied at all; some would say this is plain
+         * stupid.  But we're stuck with it now without some
+         * hairy workarounds for compatibility).
+         *
+         * This is rare so doesn't need to be that efficient; just
+         * allocate space off the heap.
+         *
+         * Note that it's currently believed this all comes out in
+         * the wash in the non-uselex case owing to where unmetafication
+         * and metafication happen.
+         */
+        for (ptr = lineptr; *ptr; ptr++) {
+            if (*ptr != Meta && imeta(*ptr))
+                remeta++;
+        }
+        if (remeta) {
+            char *ptr2, *line2;
+            ptr2 = line2 = (char *)zhalloc((ptr - lineptr) + remeta + 1);
+            for (ptr = lineptr; *ptr; ptr++) {
+                if (*ptr != Meta && imeta(*ptr)) {
+                    *ptr2++ = Meta;
+                    *ptr2++ = *ptr ^ 32;
+                } else
+                    *ptr2++ = *ptr;
+            }
+            lineptr = line2;
+        }
 
+        wordlist = bufferwords(NULL, lineptr, NULL,
+                               LEXFLAGS_COMMENTS_KEEP);
         nwords_max = 2 * countlinknodes(wordlist);
         if (nwords_max > nwords) {
             *nwordsp = nwords = nwords_max;
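
For anyone who wants to poke at the re-metafication pass outside the shell, here is a rough standalone sketch of the same two-pass loop. It is not the zsh code: needs_meta() is a hypothetical stand-in for imeta(), the values 0x83 for Meta and 0x83..0xa2 for the reserved range are assumptions read off the message above ("range of 32", "start at 0x83"), and malloc() replaces zhalloc(). The sketch writes the terminating NUL explicitly because malloc() does not zero memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Assumed values, taken from the message rather than from zsh.h:
 * Meta is 0x83 and the reserved range is the 32 bytes 0x83..0xa2.
 * The real imeta() is a table lookup and may cover other bytes too.
 */
#define META        0x83u
#define META_FIRST  0x83u
#define META_LAST   0xa2u

/* Hypothetical stand-in for imeta(): is this byte in the reserved range? */
static int needs_meta(unsigned char c)
{
    return c >= META_FIRST && c <= META_LAST;
}

/*
 * Return a malloc'd copy of line in which every reserved byte other
 * than the Meta marker itself is rewritten as Meta followed by
 * (byte ^ 32).  Same loop shape as the patch; the caller frees it.
 */
static char *remetafy(const char *line)
{
    const unsigned char *p;
    unsigned char *out, *q;
    size_t len = 0, extra = 0;

    assert(line != NULL);

    /* First pass: measure the line and count bytes needing a Meta. */
    for (p = (const unsigned char *)line; *p; p++, len++)
        if (*p != META && needs_meta(*p))
            extra++;

    out = q = malloc(len + extra + 1);
    if (!out)
        return NULL;

    /* Second pass: copy, inserting Meta and flipping bit 5 as needed. */
    for (p = (const unsigned char *)line; *p; p++) {
        if (*p != META && needs_meta(*p)) {
            *q++ = (unsigned char)META;
            *q++ = (unsigned char)(*p ^ 32);
        } else
            *q++ = *p;
    }
    *q = '\0';
    return (char *)out;
}

/* Return 1 if no reserved byte encodes (via ^ 32) to a reserved byte. */
static int encoding_is_safe(void)
{
    unsigned c;

    for (c = META_FIRST; c <= META_LAST; c++)
        if (needs_meta((unsigned char)(c ^ 32)))
            return 0;
    return 1;
}
```

Under those assumed values, 0xa0 encodes to 0x80, which lies just below the reserved range, so encoding_is_safe() returns 1. Widening the range by one byte to 0xa3 would encode to 0x83, i.e. Meta itself, which seems to be the limit the message describes.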