zsh-workers
 help / color / mirror / code / Atom feed
* [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name
       [not found]     ` <CAH+w=7aqAO37qpSvtmJXMPWq3zP=pUAZCe3i-BFyn0knRmFjVQ@mail.gmail.com>
@ 2020-05-23 19:33       ` Bart Schaefer
  2020-05-23 19:44         ` Peter Stephenson
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Schaefer @ 2020-05-23 19:33 UTC (permalink / raw)
  To: Zsh hackers list

[Moving to -workers from -users]

On Tue, May 19, 2020 at 2:25 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> So, the following?  Or should this also have the equivalent of
>
>             if (*w == Tilde || *w == Equals || *w == String)
>                 *x = *w;
>
> like zle_tricky.c?

Any comment on this?

> diff --git a/Src/lex.c b/Src/lex.c
> index a541def..615da5a 100644
> --- a/Src/lex.c
> +++ b/Src/lex.c
> @@ -1868,8 +1868,12 @@ exalias(void)
>      hwend();
>      if (interact && isset(SHINSTDIN) && !strin && incasepat <= 0 &&
>         tok == STRING && !nocorrect && !(inbufflags & INP_ALIAS) &&
> -       (isset(CORRECTALL) || (isset(CORRECT) && incmdpos)))
> -       spckword(&tokstr, 1, incmdpos, 1);
> +       (isset(CORRECTALL) || (isset(CORRECT) && incmdpos))) {
> +       char *x = dupstring(tokstr);
> +       untokenize(x);
> +       spckword(&x, 1, incmdpos, 1);
> +       tokenize(tokstr = x);
> +    }
>
>      if (!tokstr) {
>         zshlextext = tokstrings[tok];

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name
  2020-05-23 19:33       ` [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name Bart Schaefer
@ 2020-05-23 19:44         ` Peter Stephenson
  2020-05-23 19:54           ` Bart Schaefer
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Stephenson @ 2020-05-23 19:44 UTC (permalink / raw)
  To: zsh-workers

On Sat, 2020-05-23 at 12:33 -0700, Bart Schaefer wrote:
> [Moving to -workers from -users]
> 
> On Tue, May 19, 2020 at 2:25 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> > 
> > So, the following?  Or should this also have the equivalent of
> > 
> >             if (*w == Tilde || *w == Equals || *w == String)
> >                 *x = *w;
> > 
> > like zle_tricky.c?
> 
> Any comment on this?

I don't think you need to put back tokens.  If the spell check worked,
the result is just a normal word. If the spell check didn't work, just
leave it the way it originally was --- that may need some work but I'd
say it was a better bet than removing and re-adding tokens, which is
bound to hit some nasty cases.

pws


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name
  2020-05-23 19:44         ` Peter Stephenson
@ 2020-05-23 19:54           ` Bart Schaefer
  2020-05-23 20:30             ` Peter Stephenson
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Schaefer @ 2020-05-23 19:54 UTC (permalink / raw)
  To: zsh-workers

On Sat, May 23, 2020 at 12:45 PM Peter Stephenson
<p.w.stephenson@ntlworld.com> wrote:
>
> On Sat, 2020-05-23 at 12:33 -0700, Bart Schaefer wrote:
> > [Moving to -workers from -users]
> >
> > On Tue, May 19, 2020 at 2:25 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> > >
> > > So, the following?  Or should this also have the equivalent of
> > >
> > >             if (*w == Tilde || *w == Equals || *w == String)
> > >                 *x = *w;
> >
> > Any comment on this?
>
> I don't think you need to put back tokens.

Good point.

What about the special handling for Tilde et al?  That precedes the
spckword() call in zle_tricky.c

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name
  2020-05-23 19:54           ` Bart Schaefer
@ 2020-05-23 20:30             ` Peter Stephenson
  2020-05-23 21:18               ` Bart Schaefer
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Stephenson @ 2020-05-23 20:30 UTC (permalink / raw)
  To: zsh-workers

On Sat, 2020-05-23 at 12:54 -0700, Bart Schaefer wrote:
> On Sat, May 23, 2020 at 12:45 PM Peter Stephenson
> <p.w.stephenson@ntlworld.com> wrote:
> > 
> > On Sat, 2020-05-23 at 12:33 -0700, Bart Schaefer wrote:
> > > [Moving to -workers from -users]
> > > 
> > > On Tue, May 19, 2020 at 2:25 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> > > > 
> > > > So, the following?  Or should this also have the equivalent of
> > > > 
> > > >             if (*w == Tilde || *w == Equals || *w == String)
> > > >                 *x = *w;
> > > 
> > > Any comment on this?
> > 
> > I don't think you need to put back tokens.
> 
> Good point.
> 
> What about the special handling for Tilde et al?  That precedes the
> spckword() call in zle_tricky.c

Yes, I guess you need to skip over those before untokenizing.  That matches
this line in spckword():

    if (*t == Tilde || *t == Equals || *t == String)
	t++;

The next block is then skipping other tokens.  I wonder if that
would be the right place to put the surgery?  E.g. if thereʼs
a Dash or anything else we want to be able to treat literally, dupstring
at that point within spckword() and replace those?

However, thatʼs late enough that itʼs past the point where we check the
word isnʼt already found in a hash table --- and presumably we need at
least to have the dashes fixed by that point.  But still not sure we
actually need two separate token-based interventions.

pws


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name
  2020-05-23 20:30             ` Peter Stephenson
@ 2020-05-23 21:18               ` Bart Schaefer
  2020-05-24 17:11                 ` Peter Stephenson
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Schaefer @ 2020-05-23 21:18 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-workers

On Sat, May 23, 2020 at 1:31 PM Peter Stephenson
<p.w.stephenson@ntlworld.com> wrote:
>
> On Sat, 2020-05-23 at 12:54 -0700, Bart Schaefer wrote:
> > > > >
> > > > >             if (*w == Tilde || *w == Equals || *w == String)
> > > > >                 *x = *w;
>
> Yes, I guess you need to skip over those before untokenizing.

zle_tricky is actually re-tokenizing them again after untokenizing but
before calling spckword().

> That matches this line in spckword():
>
>     if (*t == Tilde || *t == Equals || *t == String)
>         t++;

Right, and then it uses those tokens to decide which hash table to
search.  OK, I think that answers the question.

> But still not sure we
> actually need two separate token-based interventions.

Which two are we talking about?  You mean you think we can unify lex.c
and zle_tricky.c inside spckword()?  Yes, we could move the whole
thing to utils.c, and then spckword() would be expecting to take a
tokenized string rather than an untokenized one.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name
  2020-05-23 21:18               ` Bart Schaefer
@ 2020-05-24 17:11                 ` Peter Stephenson
  2020-05-24 18:39                   ` Bart Schaefer
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Stephenson @ 2020-05-24 17:11 UTC (permalink / raw)
  To: zsh-workers

On Sat, 2020-05-23 at 14:18 -0700, Bart Schaefer wrote:
> > But still not sure we
> > actually need two separate token-based interventions.
> 
> Which two are we talking about?  You mean you think we can unify lex.c
> and zle_tricky.c inside spckword()?  Yes, we could move the whole
> thing to utils.c, and then spckword() would be expecting to take a
> tokenized string rather than an untokenized one.

Yes, at the moment spckword() is expecting a sort of slightly tokenised
here and there string, which isn't a particular great interface.

pws


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name
  2020-05-24 17:11                 ` Peter Stephenson
@ 2020-05-24 18:39                   ` Bart Schaefer
  2020-05-25  5:46                     ` Bart Schaefer
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Schaefer @ 2020-05-24 18:39 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-workers

On Sun, May 24, 2020 at 10:12 AM Peter Stephenson
<p.w.stephenson@ntlworld.com> wrote:
>
> > You mean you think we can unify lex.c
> > and zle_tricky.c inside spckword()?
>
> Yes, at the moment spckword() is expecting a sort of slightly tokenised
> here and there string, which isn't a particular great interface.

There's a problem with this to which I don't see an immediate solution:

spckword() modifies the original input string in place, to return the
best guess.  In both lex.c and zle_tricky.c, that result string is
then used in later code, which means if it goes in tokenized it ought
to come back out that way as well.

In zle_tricky.c it is actually strcmp()'d against the input string,
even though zle_tricky.c then immediately untokenizes it again to
actually insert it on the command line.  You said:
> If the spell check worked,
> the result is just a normal word.
We'll at least have to change spckword() from a void function to one
that returns a status of whether it found a guess, because otherwise
"check worked" is a matter of that strcmp().

lex.c seems to want it tokenized, because it then does something
similar to untokenize() except it also converts Nularg (which
untokenize() skips over).  Maybe there are guaranteed not to be any
Nularg at that point.

This is all very messy.  Also, confirm that hash tables are stored untokenized?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name
  2020-05-24 18:39                   ` Bart Schaefer
@ 2020-05-25  5:46                     ` Bart Schaefer
  2020-05-25 16:35                       ` Peter Stephenson
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Schaefer @ 2020-05-25  5:46 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-workers

> spckword() modifies the original input string in place, to return the
> best guess.  In both lex.c and zle_tricky.c, that result string is
> then used in later code, which means if it goes in tokenized it ought
> to come back out that way as well.

Maybe this is just the best way to deal with this mess.

diff --git a/Src/utils.c b/Src/utils.c
index 5158a70..118e132 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -3151,8 +3151,12 @@ spckword(char **s, int hist, int cmd, int ask)
     if (*t == Tilde || *t == Equals || *t == String)
     t++;
     for (; *t; t++)
-    if (itok(*t))
-        return;
+    if (itok(*t)) {
+        if (*t == Dash)
+        *t = '-';
+        else
+        return;
+    }
     best = NULL;
     for (t = *s; *t; t++)
     if (*t == '/')

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name
  2020-05-25  5:46                     ` Bart Schaefer
@ 2020-05-25 16:35                       ` Peter Stephenson
  2020-05-25 21:45                         ` Bart Schaefer
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Stephenson @ 2020-05-25 16:35 UTC (permalink / raw)
  To: zsh-workers

On Sun, 2020-05-24 at 22:46 -0700, Bart Schaefer wrote:
> > spckword() modifies the original input string in place, to return the
> > best guess.  In both lex.c and zle_tricky.c, that result string is
> > then used in later code, which means if it goes in tokenized it ought
> > to come back out that way as well.
> 
> Maybe this is just the best way to deal with this mess.

I think this might be a good approach, given all the complexities
the other way, but I'd suggest a dupstring if we find a Dash ---
three aren't so many cases that need it, mostly limited to
idiotic parsing requirements in POSIX-compliant contexts,
but I don't think we can guarantee they don't intersect
with cases where spell checking is live.

pws

> diff --git a/Src/utils.c b/Src/utils.c
> index 5158a70..118e132 100644
> --- a/Src/utils.c
> +++ b/Src/utils.c
> @@ -3151,8 +3151,12 @@ spckword(char **s, int hist, int cmd, int ask)
>      if (*t == Tilde || *t == Equals || *t == String)
>      t++;
>      for (; *t; t++)
> -    if (itok(*t))
> -        return;
> +    if (itok(*t)) {
> +        if (*t == Dash)
> +        *t = '-';
> +        else
> +        return;
> +    }
>      best = NULL;
>      for (t = *s; *t; t++)
>      if (*t == '/')


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name
  2020-05-25 16:35                       ` Peter Stephenson
@ 2020-05-25 21:45                         ` Bart Schaefer
  2020-05-30 21:25                           ` Bart Schaefer
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Schaefer @ 2020-05-25 21:45 UTC (permalink / raw)
  To: zsh-workers

On Mon, May 25, 2020 at 9:36 AM Peter Stephenson
<p.w.stephenson@ntlworld.com> wrote:
>
> I think this might be a good approach, given all the complexities
> the other way, but I'd suggest a dupstring if we find a Dash ---
> [...] I don't think we can guarantee they don't intersect
> with cases where spell checking is live.

This definitely doesn't apply to zle_tricky.c, because it
unconditionally untokenizes both before and after spckword() and then
inserts the string returned.

Which leaves lex.c when CORRECT_ALL is in effect.  I've tried several
variations but can't engineer a situation where spckword() is called
on a pattern where Dash matters, because it's only called for tokens
of type STRING and when !incasepat.  The string would also have to not
include any other tokens preceding the Dash, because spckword() bails
out on the first successful itok().  I don't believe there's any
legitimate pattern that starts with a Dash?  But a leading Dash has to
be handled separately anyway, I missed that.

Hmm, so a command word starting with a hyphen has never been
spell-checked?  Should we fix that, as below?

Gmail is probably going to mess up lines again, but this still might
not be the final form of the patch.  A couple of early bail-out cases
reordered for minor efficiency improvement.

diff --git a/Src/utils.c b/Src/utils.c
index 5158a70..5151b89 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -3128,11 +3128,13 @@ spckword(char **s, int hist, int cmd, int ask)
     int preflen = 0;
     int autocd = cmd && isset(AUTOCD) && strcmp(*s, ".") && strcmp(*s, "..");

-    if ((histdone & HISTFLAG_NOEXEC) || **s == '-' || **s == '%')
+    if (!(*s)[0] || !(*s)[1])
     return;
-    if (!strcmp(*s, "in"))
+    if ((histdone & HISTFLAG_NOEXEC) ||
+    /* Leading % is a job, else leading hyphen is an option */
+    (cmd ? **s == '%' : (**s == '-' || **s == Dash)))
     return;
-    if (!(*s)[0] || !(*s)[1])
+    if (!strcmp(*s, "in"))
     return;
     if (cmd) {
     if (shfunctab->getnode(shfunctab, *s) ||
@@ -3151,8 +3153,12 @@ spckword(char **s, int hist, int cmd, int ask)
     if (*t == Tilde || *t == Equals || *t == String)
     t++;
     for (; *t; t++)
-    if (itok(*t))
-        return;
+    if (itok(*t)) {
+        if (*t == Dash)
+        *t = '-';
+        else
+        return;
+    }
     best = NULL;
     for (t = *s; *t; t++)
     if (*t == '/')

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name
  2020-05-25 21:45                         ` Bart Schaefer
@ 2020-05-30 21:25                           ` Bart Schaefer
  0 siblings, 0 replies; 11+ messages in thread
From: Bart Schaefer @ 2020-05-30 21:25 UTC (permalink / raw)
  To: zsh-workers

On Mon, May 25, 2020 at 2:45 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> Hmm, so a command word starting with a hyphen has never been
> spell-checked?  Should we fix that, as below?

It's been 5 days without comment on this, so I'm going to commit it
and see if anything/anyone squawks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-05-30 21:26 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAOoO2vg=-9P7v=ATOzrbh6VF35o_xzK_-yF+EA6OgTHsQBik-A@mail.gmail.com>
     [not found] ` <CAH+w=7bKpm7GO2ZhbA5Apr0sBDz5=kmX1=WK2X0nTcWxpkHPPQ@mail.gmail.com>
     [not found]   ` <ab8b7655-83e8-45b2-889f-043314faea62@www.fastmail.com>
     [not found]     ` <CAH+w=7aqAO37qpSvtmJXMPWq3zP=pUAZCe3i-BFyn0knRmFjVQ@mail.gmail.com>
2020-05-23 19:33       ` [PATCH?] Re: Autocorrect for commands with a hyphen (dash) in the name Bart Schaefer
2020-05-23 19:44         ` Peter Stephenson
2020-05-23 19:54           ` Bart Schaefer
2020-05-23 20:30             ` Peter Stephenson
2020-05-23 21:18               ` Bart Schaefer
2020-05-24 17:11                 ` Peter Stephenson
2020-05-24 18:39                   ` Bart Schaefer
2020-05-25  5:46                     ` Bart Schaefer
2020-05-25 16:35                       ` Peter Stephenson
2020-05-25 21:45                         ` Bart Schaefer
2020-05-30 21:25                           ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).