* [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name [not found] ` <CAH+w=7aqAO37qpSvtmJXMPWq3zP=pUAZCe3i-BFyn0knRmFjVQ@mail.gmail.com> @ 2020-05-23 19:33 ` Bart Schaefer 2020-05-23 19:44 ` Peter Stephenson 0 siblings, 1 reply; 11+ messages in thread From: Bart Schaefer @ 2020-05-23 19:33 UTC (permalink / raw) To: Zsh hackers list [Moving to -workers from -users] On Tue, May 19, 2020 at 2:25 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > So, the following? Or should this also have the equivalent of > > if (*w == Tilde || *w == Equals || *w == String) > *x = *w; > > like zle_tricky.c? Any comment on this? > diff --git a/Src/lex.c b/Src/lex.c > index a541def..615da5a 100644 > --- a/Src/lex.c > +++ b/Src/lex.c > @@ -1868,8 +1868,12 @@ exalias(void) > hwend(); > if (interact && isset(SHINSTDIN) && !strin && incasepat <= 0 && > tok == STRING && !nocorrect && !(inbufflags & INP_ALIAS) && > - (isset(CORRECTALL) || (isset(CORRECT) && incmdpos))) > - spckword(&tokstr, 1, incmdpos, 1); > + (isset(CORRECTALL) || (isset(CORRECT) && incmdpos))) { > + char *x = dupstring(tokstr); > + untokenize(x); > + spckword(&x, 1, incmdpos, 1); > + tokenize(tokstr = x); > + } > > if (!tokstr) { > zshlextext = tokstrings[tok]; ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name 2020-05-23 19:33 ` [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name Bart Schaefer @ 2020-05-23 19:44 ` Peter Stephenson 2020-05-23 19:54 ` Bart Schaefer 0 siblings, 1 reply; 11+ messages in thread From: Peter Stephenson @ 2020-05-23 19:44 UTC (permalink / raw) To: zsh-workers On Sat, 2020-05-23 at 12:33 -0700, Bart Schaefer wrote: > [Moving to -workers from -users] > > On Tue, May 19, 2020 at 2:25 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > > > So, the following? Or should this also have the equivalent of > > > > if (*w == Tilde || *w == Equals || *w == String) > > *x = *w; > > > > like zle_tricky.c? > > Any comment on this? I don't think you need to put back tokens. If the spell check worked, the result is just a normal word. If the spell check didn't work, just leave it the way it originally was --- that may need some work but I'd say it was a better bet than removing and re-adding tokens, which is bound to hit some nasty cases. pws ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name 2020-05-23 19:44 ` Peter Stephenson @ 2020-05-23 19:54 ` Bart Schaefer 2020-05-23 20:30 ` Peter Stephenson 0 siblings, 1 reply; 11+ messages in thread From: Bart Schaefer @ 2020-05-23 19:54 UTC (permalink / raw) To: zsh-workers On Sat, May 23, 2020 at 12:45 PM Peter Stephenson <p.w.stephenson@ntlworld.com> wrote: > > On Sat, 2020-05-23 at 12:33 -0700, Bart Schaefer wrote: > > [Moving to -workers from -users] > > > > On Tue, May 19, 2020 at 2:25 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > > > > > So, the following? Or should this also have the equivalent of > > > > > > if (*w == Tilde || *w == Equals || *w == String) > > > *x = *w; > > > > Any comment on this? > > I don't think you need to put back tokens. Good point. What about the special handling for Tilde et al? That precedes the spckword() call in zle_tricky.c ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name 2020-05-23 19:54 ` Bart Schaefer @ 2020-05-23 20:30 ` Peter Stephenson 2020-05-23 21:18 ` Bart Schaefer 0 siblings, 1 reply; 11+ messages in thread From: Peter Stephenson @ 2020-05-23 20:30 UTC (permalink / raw) To: zsh-workers On Sat, 2020-05-23 at 12:54 -0700, Bart Schaefer wrote: > On Sat, May 23, 2020 at 12:45 PM Peter Stephenson > <p.w.stephenson@ntlworld.com> wrote: > > > > On Sat, 2020-05-23 at 12:33 -0700, Bart Schaefer wrote: > > > [Moving to -workers from -users] > > > > > > On Tue, May 19, 2020 at 2:25 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > > > > > > > So, the following? Or should this also have the equivalent of > > > > > > > > if (*w == Tilde || *w == Equals || *w == String) > > > > *x = *w; > > > > > > Any comment on this? > > > > I don't think you need to put back tokens. > > Good point. > > What about the special handling for Tilde et al? That precedes the > spckword() call in zle_tricky.c Yes, I guess you need to skip over those before untokenizing. That matches this line in spckword(): if (*t == Tilde || *t == Equals || *t == String) t++; The next block is then skipping other tokens. I wonder if that would be the right place to put the surgery? E.g. if thereʼs a Dash or anything else we want to be able to treat literally, dupstring at that point within spckword() and replace those? However, thatʼs late enough that itʼs past the point where we check the word isnʼt already found in a hash table --- and presumably we need at least to have the dashes fixed by that point. But still not sure we actually need two separate token-based interventions. pws ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name 2020-05-23 20:30 ` Peter Stephenson @ 2020-05-23 21:18 ` Bart Schaefer 2020-05-24 17:11 ` Peter Stephenson 0 siblings, 1 reply; 11+ messages in thread From: Bart Schaefer @ 2020-05-23 21:18 UTC (permalink / raw) To: Peter Stephenson; +Cc: zsh-workers On Sat, May 23, 2020 at 1:31 PM Peter Stephenson <p.w.stephenson@ntlworld.com> wrote: > > On Sat, 2020-05-23 at 12:54 -0700, Bart Schaefer wrote: > > > > > > > > > > if (*w == Tilde || *w == Equals || *w == String) > > > > > *x = *w; > > Yes, I guess you need to skip over those before untokenizing. zle_tricky is actually re-tokenizing them again after untokenizing but before calling spckword(). > That matches this line in spckword(): > > if (*t == Tilde || *t == Equals || *t == String) > t++; Right, and then it uses those tokens to decide which hash table to search. OK, I think that answers the question. > But still not sure we > actually need two separate token-based interventions. Which two are we talking about? You mean you think we can unify lex.c and zle_tricky.c inside spckword()? Yes, we could move the whole thing to utils.c, and then spckword() would be expecting to take a tokenized string rather than an untokenized one. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name 2020-05-23 21:18 ` Bart Schaefer @ 2020-05-24 17:11 ` Peter Stephenson 2020-05-24 18:39 ` Bart Schaefer 0 siblings, 1 reply; 11+ messages in thread From: Peter Stephenson @ 2020-05-24 17:11 UTC (permalink / raw) To: zsh-workers On Sat, 2020-05-23 at 14:18 -0700, Bart Schaefer wrote: > > But still not sure we > > actually need two separate token-based interventions. > > Which two are we talking about? You mean you think we can unify lex.c > and zle_tricky.c inside spckword()? Yes, we could move the whole > thing to utils.c, and then spckword() would be expecting to take a > tokenized string rather than an untokenized one. Yes, at the moment spckword() is expecting a sort of slightly tokenised here and there string, which isn't a particular great interface. pws ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name 2020-05-24 17:11 ` Peter Stephenson @ 2020-05-24 18:39 ` Bart Schaefer 2020-05-25 5:46 ` Bart Schaefer 0 siblings, 1 reply; 11+ messages in thread From: Bart Schaefer @ 2020-05-24 18:39 UTC (permalink / raw) To: Peter Stephenson; +Cc: zsh-workers On Sun, May 24, 2020 at 10:12 AM Peter Stephenson <p.w.stephenson@ntlworld.com> wrote: > > > You mean you think we can unify lex.c > > and zle_tricky.c inside spckword()? > > Yes, at the moment spckword() is expecting a sort of slightly tokenised > here and there string, which isn't a particular great interface. There's a problem with this to which I don't see an immediate solution: spckword() modifies the original input string in place, to return the best guess. In both lex.c and zle_tricky.c, that result string is then used in later code, which means if it goes in tokenized it ought to come back out that way as well. In zle_tricky.c it is actually strcmp()'d against the input string, even though zle_tricky.c then immediately untokenizes it again to actually insert it on the command line. You said: > If the spell check worked, > the result is just a normal word. We'll at least have to change spckword() from a void function to one that returns a status of whether it found a guess, because otherwise "check worked" is a matter of that strcmp(). lex.c seems to want it tokenized, because it then does something similar to untokenize() except it also converts Nularg (which untokenize() skips over). Maybe there are guaranteed not to be any Nularg at that point. This is all very messy. Also, confirm that hash tables are stored untokenized? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name 2020-05-24 18:39 ` Bart Schaefer @ 2020-05-25 5:46 ` Bart Schaefer 2020-05-25 16:35 ` Peter Stephenson 0 siblings, 1 reply; 11+ messages in thread From: Bart Schaefer @ 2020-05-25 5:46 UTC (permalink / raw) To: Peter Stephenson; +Cc: zsh-workers > spckword() modifies the original input string in place, to return the > best guess. In both lex.c and zle_tricky.c, that result string is > then used in later code, which means if it goes in tokenized it ought > to come back out that way as well. Maybe this is just the best way to deal with this mess. diff --git a/Src/utils.c b/Src/utils.c index 5158a70..118e132 100644 --- a/Src/utils.c +++ b/Src/utils.c @@ -3151,8 +3151,12 @@ spckword(char **s, int hist, int cmd, int ask) if (*t == Tilde || *t == Equals || *t == String) t++; for (; *t; t++) - if (itok(*t)) - return; + if (itok(*t)) { + if (*t == Dash) + *t = '-'; + else + return; + } best = NULL; for (t = *s; *t; t++) if (*t == '/') ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name 2020-05-25 5:46 ` Bart Schaefer @ 2020-05-25 16:35 ` Peter Stephenson 2020-05-25 21:45 ` Bart Schaefer 0 siblings, 1 reply; 11+ messages in thread From: Peter Stephenson @ 2020-05-25 16:35 UTC (permalink / raw) To: zsh-workers On Sun, 2020-05-24 at 22:46 -0700, Bart Schaefer wrote: > > spckword() modifies the original input string in place, to return the > > best guess. In both lex.c and zle_tricky.c, that result string is > > then used in later code, which means if it goes in tokenized it ought > > to come back out that way as well. > > Maybe this is just the best way to deal with this mess. I think this might be a good approach, given all the complexities the other way, but I'd suggest a dupstring if we find a Dash --- three aren't so many cases that need it, mostly limited to idiotic parsing requirements in POSIX-compliant contexts, but I don't think we can guarantee they don't intersect with cases where spell checking is live. pws > diff --git a/Src/utils.c b/Src/utils.c > index 5158a70..118e132 100644 > --- a/Src/utils.c > +++ b/Src/utils.c > @@ -3151,8 +3151,12 @@ spckword(char **s, int hist, int cmd, int ask) > if (*t == Tilde || *t == Equals || *t == String) > t++; > for (; *t; t++) > - if (itok(*t)) > - return; > + if (itok(*t)) { > + if (*t == Dash) > + *t = '-'; > + else > + return; > + } > best = NULL; > for (t = *s; *t; t++) > if (*t == '/') ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name 2020-05-25 16:35 ` Peter Stephenson @ 2020-05-25 21:45 ` Bart Schaefer 2020-05-30 21:25 ` Bart Schaefer 0 siblings, 1 reply; 11+ messages in thread From: Bart Schaefer @ 2020-05-25 21:45 UTC (permalink / raw) To: zsh-workers On Mon, May 25, 2020 at 9:36 AM Peter Stephenson <p.w.stephenson@ntlworld.com> wrote: > > I think this might be a good approach, given all the complexities > the other way, but I'd suggest a dupstring if we find a Dash --- > [...] I don't think we can guarantee they don't intersect > with cases where spell checking is live. This definitely doesn't apply to zle_tricky.c, because it unconditionally untokenizes both before and after spckword() and then inserts the string returned. Which leaves lex.c when CORRECT_ALL is in effect. I've tried several variations but can't engineer a situation where spckword() is called on a pattern where Dash matters, because it's only called for tokens of type STRING and when !incasepat. The string would also have to not include any other tokens preceding the Dash, because spckword() bails out on the first successful itok(). I don't believe there's any legitimate pattern that starts with a Dash? But a leading Dash has to be handled separately anyway, I missed that. Hmm, so a command word starting with a hyphen has never been spell-checked? Should we fix that, as below? Gmail is probably going to mess up lines again, but this still might not be the final form of the patch. A couple of early bail-out cases reordered for minor efficiency improvement. diff --git a/Src/utils.c b/Src/utils.c index 5158a70..5151b89 100644 --- a/Src/utils.c +++ b/Src/utils.c @@ -3128,11 +3128,13 @@ spckword(char **s, int hist, int cmd, int ask) int preflen = 0; int autocd = cmd && isset(AUTOCD) && strcmp(*s, ".") && strcmp(*s, ".."); - if ((histdone & HISTFLAG_NOEXEC) || **s == '-' || **s == '%') + if (!(*s)[0] || !(*s)[1]) return; - if (!strcmp(*s, "in")) + if ((histdone & HISTFLAG_NOEXEC) || + /* Leading % is a job, else leading hyphen is an option */ + (cmd ? **s == '%' : (**s == '-' || **s == Dash))) return; - if (!(*s)[0] || !(*s)[1]) + if (!strcmp(*s, "in")) return; if (cmd) { if (shfunctab->getnode(shfunctab, *s) || @@ -3151,8 +3153,12 @@ spckword(char **s, int hist, int cmd, int ask) if (*t == Tilde || *t == Equals || *t == String) t++; for (; *t; t++) - if (itok(*t)) - return; + if (itok(*t)) { + if (*t == Dash) + *t = '-'; + else + return; + } best = NULL; for (t = *s; *t; t++) if (*t == '/') ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name 2020-05-25 21:45 ` Bart Schaefer @ 2020-05-30 21:25 ` Bart Schaefer 0 siblings, 0 replies; 11+ messages in thread From: Bart Schaefer @ 2020-05-30 21:25 UTC (permalink / raw) To: zsh-workers On Mon, May 25, 2020 at 2:45 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > Hmm, so a command word starting with a hyphen has never been > spell-checked? Should we fix that, as below? It's been 5 days without comment on this, so I'm going to commit it and see if anything/anyone squawks. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2020-05-30 21:26 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <CAOoO2vg=-9P7v=ATOzrbh6VF35o_xzK_-yF+EA6OgTHsQBik-A@mail.gmail.com> [not found] ` <CAH+w=7bKpm7GO2ZhbA5Apr0sBDz5=kmX1=WK2X0nTcWxpkHPPQ@mail.gmail.com> [not found] ` <ab8b7655-83e8-45b2-889f-043314faea62@www.fastmail.com> [not found] ` <CAH+w=7aqAO37qpSvtmJXMPWq3zP=pUAZCe3i-BFyn0knRmFjVQ@mail.gmail.com> 2020-05-23 19:33 ` [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name Bart Schaefer 2020-05-23 19:44 ` Peter Stephenson 2020-05-23 19:54 ` Bart Schaefer 2020-05-23 20:30 ` Peter Stephenson 2020-05-23 21:18 ` Bart Schaefer 2020-05-24 17:11 ` Peter Stephenson 2020-05-24 18:39 ` Bart Schaefer 2020-05-25 5:46 ` Bart Schaefer 2020-05-25 16:35 ` Peter Stephenson 2020-05-25 21:45 ` Bart Schaefer 2020-05-30 21:25 ` Bart Schaefer
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/zsh/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).