zsh-workers
* PATCH: Apply spell correction to autocd
@ 2005-02-27 20:44 Bart Schaefer
  2005-02-28  6:54 ` Bart Schaefer
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Schaefer @ 2005-02-27 20:44 UTC
  To: zsh-workers

I don't know whether this is going to require tweaking for wide-char file
names, but it's at least as good as the current bin_cd() implementation.

Index: Src/utils.c
===================================================================
RCS file: /extra/cvsroot/zsh/zsh-4.0/Src/utils.c,v
retrieving revision 1.21
diff -c -r1.21 utils.c
--- Src/utils.c	18 Feb 2005 17:05:17 -0000	1.21
+++ Src/utils.c	27 Feb 2005 20:35:06 -0000
@@ -1652,6 +1664,7 @@
     char ic = '\0';
     int ne;
     int preflen = 0;
+    int autocd = cmd && isset(AUTOCD) && strcmp(*s, ".") && strcmp(*s, "..");
 
     if ((histdone & HISTFLAG_NOEXEC) || **s == '-' || **s == '%')
 	return;
@@ -1720,6 +1733,19 @@
 	if (!*t && cmd) {
 	    if (hashcmd(guess, pathchecked))
 		return;
+	    if (autocd) {
+		char **pp, *g = guess;
+		for (pp = cdpath; *pp; pp++) {
+		    char *buf = zhtricat(*pp, "/", *s);
+		    spckword(&buf, 0, 0, 0);
+		    if (best && strcmp(best, guess)) {
+			best = buf + strlen(*pp) + 1;
+			break;
+		    } else if (u != g)
+			best = u;
+		}
+		guess = g;
+	    }
 	    d = 100;
 	    scanhashtable(reswdtab, 1, 0, 0, spscan, 0);
 	    scanhashtable(aliastab, 1, 0, 0, spscan, 0);

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com

Zsh: http://www.zsh.org | PHPerl Project: http://phperl.sourceforge.net   



* Re: PATCH: Apply spell correction to autocd
  2005-02-27 20:44 PATCH: Apply spell correction to autocd Bart Schaefer
@ 2005-02-28  6:54 ` Bart Schaefer
  2005-02-28 10:44   ` Peter Stephenson
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Schaefer @ 2005-02-28  6:54 UTC
  To: zsh-workers

On Feb 27,  8:44pm, Bart Schaefer wrote:
} Subject: PATCH: Apply spell correction to autocd
}
} I don't know whether this is going to require tweaking for wide-char file
} names, but it's at least as good as the current bin_cd() implementation.

I was in a bit of a hurry when I worked out that patch, and it occurred to
me later that this implementation prefers names by cdpath order rather than
by comparison distance, so I went back to look again, and found several
interesting things.

The first is this snippet of spckword():

	if ((u = spname(guess)) != guess)
	    best = u;

The condition tested here is always true, because spname() never returns
anything other than NULL or a pointer to an internal static buffer.  This
might as well be:

	best = spname(guess);

However, I'm not sure that's the intended semantic, which might be:

	if ((u = spname(guess)) && strcmp(u, guess))
	    best = u;
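
A minimal sketch of why the pointer test cannot fail, assuming a
hypothetical stand-in fake_spname() that, like spname() as described
above, returns either NULL or a pointer to its own static buffer.  The
stand-in and its single hard-coded correction are illustrative only and
are not zsh code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for spname(): returns NULL for an empty name,
 * otherwise a pointer to an internal static buffer.  It "corrects"
 * exactly one misspelling ("Scr" -> "Src") for demonstration. */
static char *fake_spname(const char *name)
{
    static char newname[64];

    assert(name != NULL);
    if (!*name)
        return NULL;
    if (strcmp(name, "Scr") == 0)
        name = "Src";
    snprintf(newname, sizeof(newname), "%s", name);
    return newname;
}

/* Pointer comparison, as in the original snippet.  The result never
 * points at the caller's string, so this is true for every input. */
static int differs_by_pointer(const char *guess)
{
    char *u = fake_spname(guess);

    return u != guess;
}

/* Content comparison, as in the alternative semantic: true only when
 * a result was returned and its text differs from the input. */
static int differs_by_content(const char *guess)
{
    char *u = fake_spname(guess);

    return u && strcmp(u, guess) != 0;
}
```

With this stand-in, differs_by_pointer("Src") is true even though nothing
was corrected, whereas differs_by_content("Src") is false and
differs_by_content("Scr") is true.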

The next thing that I noticed is that there's no way to recover the
comparison distance computed by spname().  That probably doesn't matter, as
it's always less than 3 if spname() returned anything useful.  This is a bit
different than the scheme applied to scanning the hash tables, which
uses a threshold distance of 1/4 of the length of the input.  In other
words, zsh can correct more mistakes in hashed strings than in file
paths, unless the component directory names are very short.

The reason I was interested in the distance computed by spname() was that
it seemed reasonable to loop over the entire cdpath to find the best of
all possible matches, and also to use that distance as the starting value
of d in the next section of spckword():

 	    d = 100;
 	    scanhashtable(reswdtab, 1, 0, 0, spscan, 0);

That is, I'd prefer not to choose something from the hash tables if
there's a cdpath directory that's a better fit.  Presently (even before
my patch) zsh always prefers the hash table unless there's an exact
match from spname(), even if the hashed value is a less precise match.
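
The selection described above could be sketched roughly as follows.  This
is only an illustration under stated assumptions: edit_distance() is a
plain Levenshtein distance standing in for zsh's spdist() (which also
weighs keyboard adjacency and case), and pick_best() is a hypothetical
helper, not a function in utils.c.  The running distance d is shared
between stages, so a later stage wins only with a strictly better match:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Plain Levenshtein distance using one rolling row.  A stand-in for
 * spdist(); inputs are limited to 63 characters each. */
static int edit_distance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b), i, j;
    int row[64];

    assert(la < 64 && lb < 64);
    for (j = 0; j <= lb; j++)
        row[j] = (int)j;
    for (i = 1; i <= la; i++) {
        int diag = row[0];              /* previous row, column j - 1 */

        row[0] = (int)i;
        for (j = 1; j <= lb; j++) {
            int up = row[j];            /* previous row, column j */
            int v = diag + (a[i - 1] == b[j - 1] ? 0 : 1);

            if (up + 1 < v)
                v = up + 1;
            if (row[j - 1] + 1 < v)
                v = row[j - 1] + 1;
            row[j] = v;
            diag = up;
        }
    }
    return row[lb];
}

/* Scan all candidates and return the index of the closest one whose
 * distance is strictly below *d, or -1 if none is.  *d is updated, so
 * the caller can pass it on to the next stage.  Strict less-than means
 * earlier candidates win ties. */
static int pick_best(const char *const *names, size_t n,
                     const char *typed, int *d)
{
    int best = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        int dist = edit_distance(names[i], typed);

        if (dist < *d) {
            best = (int)i;
            *d = dist;
        }
    }
    return best;
}
```

Starting with d = 100, a scan of { "docs", "dock", "src" } for the input
"doc" picks "docs" (the first of two equally close names) and leaves
d = 1; a following stage that starts from d = 1 then rejects anything
that is not an even closer fit.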

Finally, spname() is a bit inconsistent, because it returns NULL if it
finds a match with a distance >= 3 in any leading path component, but
returns a copy of the input string even when it finds no match at all
in the final path component.  I suppose that's intended to allow one to
create new files in existing directories, correcting only the existing
part of the path, but it makes spname() ugly to use (and CORRECT_ALL
less useful from the user's perspective) in any case where the final
component is required to exist, such as for "cd".

So I'm not going to commit that patch -- which would be better off not
having to call spckword() recursively in any case -- pending resolution
of some of these issues.  Anybody have any comments?

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com

Zsh: http://www.zsh.org | PHPerl Project: http://phperl.sourceforge.net   



* Re: PATCH: Apply spell correction to autocd
  2005-02-28  6:54 ` Bart Schaefer
@ 2005-02-28 10:44   ` Peter Stephenson
  2005-02-28 18:14     ` Bart Schaefer
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Stephenson @ 2005-02-28 10:44 UTC
  To: zsh-workers

Bart Schaefer wrote:
> So I'm not going to commit that patch -- which would be better off not
> having to call spckword() recursively in any case -- pending resolution
> of some of these issues.  Anybody have any comments?

I don't think the internal spellchecking stuff has ever had a major
overhaul (as distinct from having extra bits grafted on).  It's not
surprising if it's weird.  I expect tidying it up would be a good idea.

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070





* Re: PATCH: Apply spell correction to autocd
  2005-02-28 10:44   ` Peter Stephenson
@ 2005-02-28 18:14     ` Bart Schaefer
  2005-02-28 18:17       ` Clint Adams
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Schaefer @ 2005-02-28 18:14 UTC
  To: zsh-workers

On Feb 28, 10:44am, Peter Stephenson wrote:
}
} I don't think the internal spellchecking stuff has ever had a major
} overhaul (as distinct from having extra bits grafted on).  It's not
} surprising if it's weird.  I expect tidying it up would be a good idea.

OK, here's a stab at it.  See embedded comments (gasp).  Apply this instead
of the previous (20882) patch, not on top of it.

Index: Src/utils.c
===================================================================
RCS file: /extra/cvsroot/zsh/zsh-4.0/Src/utils.c,v
retrieving revision 1.21
diff -c -r1.21 utils.c
--- Src/utils.c	18 Feb 2005 17:05:17 -0000	1.21
+++ Src/utils.c	28 Feb 2005 18:06:44 -0000
@@ -1647,11 +1659,12 @@
 mod_export void
 spckword(char **s, int hist, int cmd, int ask)
 {
-    char *t, *u;
+    char *t;
     int x;
     char ic = '\0';
     int ne;
     int preflen = 0;
+    int autocd = cmd && isset(AUTOCD) && strcmp(*s, ".") && strcmp(*s, "..");
 
     if ((histdone & HISTFLAG_NOEXEC) || **s == '-' || **s == '%')
 	return;
@@ -1715,8 +1728,7 @@
 	}
 	if (access(unmeta(guess), F_OK) == 0)
 	    return;
-	if ((u = spname(guess)) != guess)
-	    best = u;
+	best = spname(guess);
 	if (!*t && cmd) {
 	    if (hashcmd(guess, pathchecked))
 		return;
@@ -1726,12 +1738,28 @@
 	    scanhashtable(shfunctab, 1, 0, 0, spscan, 0);
 	    scanhashtable(builtintab, 1, 0, 0, spscan, 0);
 	    scanhashtable(cmdnamtab, 1, 0, 0, spscan, 0);
+	    if (autocd) {
+		char **pp;
+		for (pp = cdpath; *pp; pp++) {
+		    char bestcd[PATH_MAX + 1];
+		    int thisdist;
+		    /* Less than d here, instead of less than or equal  *
+		     * as used in spscan(), so that an autocd is chosen *
+		     * only when it is better than anything so far, and *
+		     * so we prefer directories earlier in the cdpath.  */
+		    if ((thisdist = mindist(*pp, *s, bestcd)) < d) {
+			best = dupstring(bestcd);
+			d = thisdist;
+		    }
+		}
+	    }
 	}
     }
     if (errflag)
 	return;
     if (best && (int)strlen(best) > 1 && strcmp(best, guess)) {
 	if (ic) {
+	    char *u;
 	    if (preflen) {
 		/* do not correct the result of an expansion */
 		if (strncmp(guess, best, preflen))
@@ -2421,10 +2449,14 @@
 {
     char *p, spnameguess[PATH_MAX + 1], spnamebest[PATH_MAX + 1];
     static char newname[PATH_MAX + 1];
-    char *new = newname, *old;
-    int bestdist = 200, thisdist;
+    char *new = newname, *old = oldname;
+    int bestdist = 0, thisdist, thresh, maxthresh = 0;
 
-    old = oldname;
+    /* This loop corrects each directory component of the path, stopping *
+     * when any correction distance would exceed the distance threshold. *
+     * NULL is returned only if the first component cannot be corrected; *
+     * otherwise a copy of oldname with a corrected prefix is returned.  *
+     * Rationale for this, if there ever was any, has been forgotten.    */
     for (;;) {
 	while (*old == '/')
 	    *new++ = *old++;
@@ -2436,15 +2468,29 @@
 	    if (p < spnameguess + PATH_MAX)
 		*p++ = *old;
 	*p = '\0';
-	if ((thisdist = mindist(newname, spnameguess, spnamebest)) >= 3) {
-	    if (bestdist < 3) {
+	/* Every component is allowed a single distance 2 correction or two *
+	 * distance 1 corrections.  Longer ones get additional corrections. */
+	thresh = (int)(p - spnameguess) / 4 + 1;
+	if (thresh < 3)
+	    thresh = 3;
+	if ((thisdist = mindist(newname, spnameguess, spnamebest)) >= thresh) {
+	    /* The next test is always true, except for the first path    *
+	     * component.  We could initialize bestdist to some large     *
+	     * constant instead, and then compare to that constant here,  *
+	     * because an invariant is that we've never exceeded the      *
+	     * threshold for any component so far; but I think that looks *
+	     * odd to the human reader, and we may make use of the total  *
+	     * distance for all corrections at some point in the future.  */
+	    if (bestdist < maxthresh) {
 		strcpy(new, spnameguess);
 		strcat(new, old);
 		return newname;
 	    } else
 	    	return NULL;
-	} else
-	    bestdist = thisdist;
+	} else {
+	    maxthresh = bestdist + thresh;
+	    bestdist += thisdist;
+	}
 	for (p = spnamebest; (*new = *p++);)
 	    new++;
     }
@@ -2487,6 +2533,7 @@
 static int
 spdist(char *s, char *t, int thresh)
 {
+    /* TODO: Correction for non-ASCII and multibyte-input keyboards. */
     char *p, *q;
     const char qwertykeymap[] =
     "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\
@@ -2520,7 +2567,7 @@
 
     if (!strcmp(s, t))
 	return 0;
-/* any number of upper/lower mistakes allowed (dist = 1) */
+    /* any number of upper/lower mistakes allowed (dist = 1) */
     for (p = s, q = t; *p && tulower(*p) == tulower(*q); p++, q++);
     if (!*p && !*q)
 	return 1;
@@ -2544,7 +2591,7 @@
 	    int t0;
 	    char *z;
 
-	/* mistyped letter */
+	    /* mistyped letter */
 
 	    if (!(z = strchr(keymap, p[0])) || *z == '\n' || *z == '\t')
 		return spdist(p + 1, q + 1, thresh - 1) + 1;
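
For readers following the spname() hunk above, the per-component
threshold it introduces can be isolated as a small sketch.  The function
names component_threshold() and correction_allowed() are hypothetical
and do not appear in the patch; only the arithmetic is taken from it:

```c
#include <assert.h>
#include <stddef.h>

/* Threshold for one path component of the given length, following the
 * arithmetic in the patch: a quarter of the length plus one, but never
 * less than 3. */
static int component_threshold(size_t len)
{
    int thresh = (int)len / 4 + 1;

    return thresh < 3 ? 3 : thresh;
}

/* A correction is accepted when its distance is strictly below the
 * threshold, mirroring the ">= thresh" rejection test in the patch. */
static int correction_allowed(int dist, size_t len)
{
    assert(dist >= 0);
    return dist < component_threshold(len);
}
```

With this formula a component needs at least 12 characters before the
threshold rises above 3, so short directory names are still limited to
corrections of distance 2 or less.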



* Re: PATCH: Apply spell correction to autocd
  2005-02-28 18:14     ` Bart Schaefer
@ 2005-02-28 18:17       ` Clint Adams
  0 siblings, 0 replies; 5+ messages in thread
From: Clint Adams @ 2005-02-28 18:17 UTC
  To: Bart Schaefer; +Cc: zsh-workers

>  spdist(char *s, char *t, int thresh)
>  {
> +    /* TODO: Correction for non-ASCII and multibyte-input keyboards. */
>      char *p, *q;
>      const char qwertykeymap[] =
>      "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\

How hard would it be to call an Eprog here?


