* PATCH: Apply spell correction to autocd

From: Bart Schaefer @ 2005-02-27 20:44 UTC
To: zsh-workers

I don't know whether this is going to require tweaking for wide-char file
names, but it's at least as good as the current bin_cd() implementation.

Index: Src/utils.c
===================================================================
RCS file: /extra/cvsroot/zsh/zsh-4.0/Src/utils.c,v
retrieving revision 1.21
diff -c -r1.21 utils.c
--- Src/utils.c   18 Feb 2005 17:05:17 -0000   1.21
+++ Src/utils.c   27 Feb 2005 20:35:06 -0000
@@ -1652,6 +1664,7 @@
     char ic = '\0';
     int ne;
     int preflen = 0;
+    int autocd = cmd && isset(AUTOCD) && strcmp(*s, ".") && strcmp(*s, "..");
 
     if ((histdone & HISTFLAG_NOEXEC) || **s == '-' || **s == '%')
         return;
@@ -1720,6 +1733,19 @@
         if (!*t && cmd) {
             if (hashcmd(guess, pathchecked))
                 return;
+            if (autocd) {
+                char **pp, *g = guess;
+                for (pp = cdpath; *pp; pp++) {
+                    char *buf = zhtricat(*pp, "/", *s);
+                    spckword(&buf, 0, 0, 0);
+                    if (best && strcmp(best, guess)) {
+                        best = buf + strlen(*pp) + 1;
+                        break;
+                    } else if (u != g)
+                        best = u;
+                }
+                guess = g;
+            }
             d = 100;
             scanhashtable(reswdtab, 1, 0, 0, spscan, 0);
             scanhashtable(aliastab, 1, 0, 0, spscan, 0);

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com

Zsh: http://www.zsh.org | PHPerl Project: http://phperl.sourceforge.net
* Re: PATCH: Apply spell correction to autocd

From: Bart Schaefer @ 2005-02-28 6:54 UTC
To: zsh-workers

On Feb 27, 8:44pm, Bart Schaefer wrote:
} Subject: PATCH: Apply spell correction to autocd
}
} I don't know whether this is going to require tweaking for wide-char file
} names, but it's at least as good as the current bin_cd() implementation.

I was in a bit of a hurry when I worked out that patch, and it occurred to
me later that this implementation prefers names by cdpath order rather than
by comparison distance, so I went back to look again, and found several
interesting things.

The first is this snippet of spckword():

        if ((u = spname(guess)) != guess)
            best = u;

The condition tested here is always true, because spname() never returns
anything other than NULL or a pointer to an internal static buffer.  This
might as well be:

        best = spname(guess);

However, I'm not sure that's the intended semantics, which might be:

        if ((u = spname(guess)) && strcmp(u, guess))
            best = u;

The next thing that I noticed is that there's no way to recover the
comparison distance computed by spname().  That probably doesn't matter,
as the distance is always less than 3 if spname() returned anything
useful.  This is a bit different from the scheme applied to scanning the
hash tables, which uses a threshold distance of 1/4 of the length of the
input.  In other words, zsh can correct more mistakes in hashed strings
than in file paths, unless the component directory names are very short.
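A minimal sketch of why the pointer test above is always true. toy_spname() is a hypothetical stand-in, not zsh's spname(): it only mimics the return contract described here (NULL, or a pointer to its own static buffer), and its single "correction" of sl to ls is made up for illustration. old_test() and proposed_test() are likewise hypothetical names for the existing condition and the possibly intended one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for spname(): returns NULL or a pointer to its own
 * static buffer, never the caller's pointer.  The toy "correction" rewrites
 * "sl" to "ls" and copies every other non-empty name unchanged. */
static char *toy_spname(const char *oldname)
{
    static char newname[256];

    assert(oldname != NULL);
    if (*oldname == '\0')
        return NULL;                    /* "no correction possible" */
    if (strcmp(oldname, "sl") == 0)
        strcpy(newname, "ls");
    else {
        strncpy(newname, oldname, sizeof newname - 1);
        newname[sizeof newname - 1] = '\0';
    }
    return newname;
}

/* The existing test: a pointer comparison.  True for every input, since
 * the result is never the same pointer as guess (NULL differs too). */
static int old_test(char *guess)
{
    return toy_spname(guess) != guess;
}

/* The possibly intended test: only a non-NULL result whose text actually
 * differs from guess counts as a correction. */
static int proposed_test(char *guess)
{
    char *u = toy_spname(guess);
    return u != NULL && strcmp(u, guess) != 0;
}
```

With this toy, old_test() reports a correction for "sl", for "ls", and even for the empty string (where the result is NULL, so best would be cleared), while proposed_test() reports one only for "sl".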
The reason I was interested in the distance computed by spname() was that
it seemed reasonable to loop over the entire cdpath to find the best of all
possible matches, and also to use that distance as the starting value of d
in the next section of spckword():

            d = 100;
            scanhashtable(reswdtab, 1, 0, 0, spscan, 0);

That is, I'd prefer not to choose something from the hash tables if there's
a cdpath directory that's a better fit.  Presently (even before my patch)
zsh always prefers the hash table unless there's an exact match from
spname(), even if the hashed value is a less precise match.

Finally, spname() is a bit inconsistent, because it returns NULL if it
finds a match with a distance >= 3 in any leading path component, but
returns a copy of the input string even when it finds no match at all in
the final path component.  I suppose that's intended to allow one to create
new files in existing directories, correcting only the existing part of the
path, but it makes spname() ugly to use (and CORRECT_ALL less useful from
the user's perspective) in any case where the final component is required
to exist, such as for "cd".

So I'm not going to commit that patch -- which would be better off not
having to call spckword() recursively in any case -- pending resolution of
some of these issues.  Anybody have any comments?
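A rough sketch of the selection proposed above, under stated assumptions: lev() is plain Levenshtein distance standing in for zsh's spdist()/mindist() (which score keyboard typos differently and apply thresholds, both omitted here); candidate directories and hashed names arrive as plain string arrays instead of coming from cdpath and the hash tables; and best_of() and choose() are hypothetical helpers, not zsh functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXNAME 64   /* arbitrary length limit for this sketch */

/* Plain Levenshtein distance: a stand-in for zsh's spdist()/mindist(). */
static int lev(const char *a, const char *b)
{
    int prev[MAXNAME + 1], cur[MAXNAME + 1];
    size_t la = strlen(a), lb = strlen(b), i, j;

    assert(la <= MAXNAME && lb <= MAXNAME);
    for (j = 0; j <= lb; j++)
        prev[j] = (int)j;
    for (i = 1; i <= la; i++) {
        cur[0] = (int)i;
        for (j = 1; j <= lb; j++) {
            int sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            int del = prev[j] + 1, ins = cur[j - 1] + 1;
            int m = sub < del ? sub : del;
            cur[j] = m < ins ? m : ins;
        }
        memcpy(prev, cur, (lb + 1) * sizeof prev[0]);
    }
    return prev[lb];
}

/* Examine every candidate directory instead of stopping at the first hit.
 * Strict '<' keeps the earliest candidate on ties.  Returns the winner's
 * index (or -1) and stores its distance in *dist (100 if none, the same
 * starting value spckword() gives d). */
static int best_of(const char *word, const char *const *cands, int n,
                   int *dist)
{
    int i, best = -1;

    *dist = 100;
    for (i = 0; i < n; i++) {
        int nd = lev(cands[i], word);
        if (nd < *dist) {
            *dist = nd;
            best = i;
        }
    }
    return best;
}

/* Seed the hash-table scan with the best directory's distance instead of
 * 100.  Like spscan() it accepts ties ('<='), so a hashed name wins only
 * when it is at least as close as the best directory. */
static const char *choose(const char *word,
                          const char *const *dirs, int ndirs,
                          const char *const *hashed, int nhashed)
{
    int d, i;
    int bi = best_of(word, dirs, ndirs, &d);
    const char *best = bi >= 0 ? dirs[bi] : NULL;

    for (i = 0; i < nhashed; i++) {
        int nd = lev(hashed[i], word);
        if (nd <= d) {
            d = nd;
            best = hashed[i];
        }
    }
    return best;
}
```

For the word "tmp" with directories "top" and "tmp2", both are at distance 1 and the strict < keeps "top"; a hashed "tap" (also distance 1) then wins under the <= rule, while a hashed "tmux" (distance 2) does not. The revised patch later in this thread takes the opposite ordering: it scans the hash tables first and lets a cdpath directory win only with a strictly smaller distance.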
* Re: PATCH: Apply spell correction to autocd

From: Peter Stephenson @ 2005-02-28 10:44 UTC
To: zsh-workers

Bart Schaefer wrote:
> So I'm not going to commit that patch -- which would be better off not
> having to call spckword() recursively in any case -- pending resolution
> of some of these issues.  Anybody have any comments?

I don't think the internal spellchecking stuff has ever had a major
overhaul (as distinct from having extra bits grafted on).  It's not
surprising if it's weird.  I expect tidying it up would be a good idea.

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070

**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.
**********************************************************************
* Re: PATCH: Apply spell correction to autocd

From: Bart Schaefer @ 2005-02-28 18:14 UTC
To: zsh-workers

On Feb 28, 10:44am, Peter Stephenson wrote:
}
} I don't think the internal spellchecking stuff has ever had a major
} overhaul (as distinct from having extra bits grafted on).  It's not
} surprising if it's weird.  I expect tidying it up would be a good idea.

OK, here's a stab at it.  See embedded comments (gasp).

Apply this instead of the previous (20882) patch, not on top of it.

Index: Src/utils.c
===================================================================
RCS file: /extra/cvsroot/zsh/zsh-4.0/Src/utils.c,v
retrieving revision 1.21
diff -c -r1.21 utils.c
--- Src/utils.c   18 Feb 2005 17:05:17 -0000   1.21
+++ Src/utils.c   28 Feb 2005 18:06:44 -0000
@@ -1647,11 +1659,12 @@
 mod_export void
 spckword(char **s, int hist, int cmd, int ask)
 {
-    char *t, *u;
+    char *t;
     int x;
     char ic = '\0';
     int ne;
     int preflen = 0;
+    int autocd = cmd && isset(AUTOCD) && strcmp(*s, ".") && strcmp(*s, "..");
 
     if ((histdone & HISTFLAG_NOEXEC) || **s == '-' || **s == '%')
         return;
@@ -1715,8 +1728,7 @@
         }
         if (access(unmeta(guess), F_OK) == 0)
             return;
-        if ((u = spname(guess)) != guess)
-            best = u;
+        best = spname(guess);
         if (!*t && cmd) {
             if (hashcmd(guess, pathchecked))
                 return;
@@ -1726,12 +1738,28 @@
             scanhashtable(shfunctab, 1, 0, 0, spscan, 0);
             scanhashtable(builtintab, 1, 0, 0, spscan, 0);
             scanhashtable(cmdnamtab, 1, 0, 0, spscan, 0);
+            if (autocd) {
+                char **pp;
+                for (pp = cdpath; *pp; pp++) {
+                    char bestcd[PATH_MAX + 1];
+                    int thisdist;
+                    /* Less than d here, instead of less than or equal  *
+                     * as used in spscan(), so that an autocd is chosen *
+                     * only when it is better than anything so far, and *
+                     * so we prefer directories earlier in the cdpath.  */
+                    if ((thisdist = mindist(*pp, *s, bestcd)) < d) {
+                        best = dupstring(bestcd);
+                        d = thisdist;
+                    }
+                }
+            }
         }
     }
     if (errflag)
         return;
     if (best && (int)strlen(best) > 1 && strcmp(best, guess)) {
         if (ic) {
+            char *u;
             if (preflen) {
                 /* do not correct the result of an expansion */
                 if (strncmp(guess, best, preflen))
@@ -2421,10 +2449,14 @@
 {
     char *p, spnameguess[PATH_MAX + 1], spnamebest[PATH_MAX + 1];
     static char newname[PATH_MAX + 1];
-    char *new = newname, *old;
-    int bestdist = 200, thisdist;
+    char *new = newname, *old = oldname;
+    int bestdist = 0, thisdist, thresh, maxthresh = 0;
 
-    old = oldname;
+    /* This loop corrects each directory component of the path, stopping *
+     * when any correction distance would exceed the distance threshold. *
+     * NULL is returned only if the first component cannot be corrected; *
+     * otherwise a copy of oldname with a corrected prefix is returned.  *
+     * Rationale for this, if there ever was any, has been forgotten.    */
     for (;;) {
         while (*old == '/')
             *new++ = *old++;
@@ -2436,15 +2468,29 @@
             if (p < spnameguess + PATH_MAX)
                 *p++ = *old;
         *p = '\0';
-        if ((thisdist = mindist(newname, spnameguess, spnamebest)) >= 3) {
-            if (bestdist < 3) {
+        /* Every component is allowed a single distance 2 correction or two *
+         * distance 1 corrections.  Longer ones get additional corrections. */
+        thresh = (int)(p - spnameguess) / 4 + 1;
+        if (thresh < 3)
+            thresh = 3;
+        if ((thisdist = mindist(newname, spnameguess, spnamebest)) >= thresh) {
+            /* The next test is always true, except for the first path    *
+             * component.  We could initialize bestdist to some large      *
+             * constant instead, and then compare to that constant here,   *
+             * because an invariant is that we've never exceeded the       *
+             * threshold for any component so far; but I think that looks *
+             * odd to the human reader, and we may make use of the total   *
+             * distance for all corrections at some point in the future.   */
+            if (bestdist < maxthresh) {
                 strcpy(new, spnameguess);
                 strcat(new, old);
                 return newname;
             } else
                 return NULL;
-        } else
-            bestdist = thisdist;
+        } else {
+            maxthresh = bestdist + thresh;
+            bestdist += thisdist;
+        }
         for (p = spnamebest; (*new = *p++);)
             new++;
     }
@@ -2487,6 +2533,7 @@
 static int
 spdist(char *s, char *t, int thresh)
 {
+    /* TODO: Correction for non-ASCII and multibyte-input keyboards. */
     char *p, *q;
     const char qwertykeymap[] =
 "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\
@@ -2520,7 +2567,7 @@
     if (!strcmp(s, t))
         return 0;
 
-/* any number of upper/lower mistakes allowed (dist = 1) */
+    /* any number of upper/lower mistakes allowed (dist = 1) */
     for (p = s, q = t; *p && tulower(*p) == tulower(*q); p++, q++);
     if (!*p && !*q)
         return 1;
@@ -2544,7 +2591,7 @@
             int t0;
             char *z;
 
-            /* mistyped letter */
+            /* mistyped letter */
 
             if (!(z = strchr(keymap, p[0])) || *z == '\n' || *z == '\t')
                 return spdist(p + 1, q + 1, thresh - 1) + 1;
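The per-component threshold that this patch introduces in spname() can be replayed without touching the filesystem. In this sketch, component_thresh() copies the patch's formula, while replay() is a hypothetical driver that takes precomputed component lengths and mindist() results (assumed inputs; the real loop derives them from the path itself) and follows the same bestdist/maxthresh bookkeeping.

```c
#include <assert.h>

/* Threshold from the revised spname(): a component whose best correction
 * has distance >= this value is not corrected.  Components shorter than
 * 12 characters get 3, i.e. at most distance 2 is accepted. */
static int component_thresh(int len)
{
    int thresh = len / 4 + 1;
    return thresh < 3 ? 3 : thresh;
}

/* Hypothetical driver: replays the loop's bookkeeping from precomputed
 * component lengths and mindist() results.  Returns -1 where spname()
 * would return NULL, the index of the first component that reaches its
 * threshold when a corrected prefix would be returned, or n when every
 * component is within its threshold. */
static int replay(const int *len, const int *dist, int n)
{
    int bestdist = 0, maxthresh = 0, i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        int thresh = component_thresh(len[i]);

        if (dist[i] >= thresh)
            /* Only false for the first component, where maxthresh is 0. */
            return bestdist < maxthresh ? i : -1;
        maxthresh = bestdist + thresh;   /* uses the total before this one */
        bestdist += dist[i];
    }
    return n;
}
```

For example, components of length 3, 3 and 4 with distances 1, 2 and 3 give replay() == 2: the third component reaches its threshold of 3, and since bestdist (3) is below maxthresh (4), the corrected two-component prefix would be returned. A distance of 3 in a 12-character component is accepted, because the threshold there is 4.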
* Re: PATCH: Apply spell correction to autocd

From: Clint Adams @ 2005-02-28 18:17 UTC
To: Bart Schaefer
Cc: zsh-workers

>  spdist(char *s, char *t, int thresh)
>  {
> +    /* TODO: Correction for non-ASCII and multibyte-input keyboards. */
>      char *p, *q;
>      const char qwertykeymap[] =
>  "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\

How hard would it be to call an Eprog here?