From: Yitzhak Mandelbaum
Subject: Re: [Caml-list] Patch to 3.10.0 compiler enabling simple spell-checking
Date: Mon, 29 Oct 2007 20:17:10 -0400
To: Caml List

Very cool! Do you think there's any way you could separate it from the compiler, like Lerner et al.'s SEMINAL work, which separates type error messages from the compiler? Separation could help ensure that this idea (and any other, similar ones) doesn't accidentally introduce bugs into the compiler, and would make it much easier for you to maintain. A very simple hack might be to wrap ocamlc in a script that parses such error messages and then tokenizes the source file, looking for similar strings?

Cheers,
Yitzhak

On Oct 29, 2007, at 5:11 PM, Edgar Friendly wrote:

> One random little feature of GNAT that comes in handy for me is its
> habit of, when I misspell an identifier, giving me a possible correction
> in its compile error message. Spending some time with the 3.10.0
> sources, I have created a "second draft" patch creating this
> functionality in my favored language.
>
> Example:
> ========
>
> # /home/thelema/Projects/ocaml-custom/bin/ocamlc -o coml -I +lablgtk2 lablgtk.cma gtkInit.cmo coml.ml
> File "coml.ml", line 61, characters 16-25:
> Unbound value is_arcive, possible misspelling of is_archive
>
> Impacts:
> ========
>
> Efficiency in the case of finding a mistake should be quite good,
> although this shouldn't matter too much since the compiler quits pretty
> early in compilation when it finds an unbound identifier.
>
> In the case of no unbound identifiers, the cost is an extra try/with
> block around the standard lookup. I haven't made any benchmarks, though.
>
> I expect this code to have few long-term maintenance issues: the
> major source of code changes was adding a "* string list" to a number of
> exceptions to carry the list of possible correct spellings to the point
> where they get output by the compiler. These exceptions are still usable
> as before with an empty list in this spot.
>
> It's possible the code has created opportunities for uncaught exceptions
> in the compiler, as I only checked for instances of "Not_found" in a few
> files -- those which dealt with the Unbound_* exceptions. Someone who
> knows the internals better might find places where the "Found_nearly"
> exception, which carries possible corrections, might escape.
>
>
> Dedicated to:
> Yaron Minsky and the team at Jane Street
>
> E.

-----------------------------
Yitzhak Mandelbaum
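Yitzhak's "very simple hack" above could look roughly like the following OCaml sketch. It assumes the compiler's error output and the source file have already been read into strings (actually running ocamlc and reading files is left out), looks for an "Unbound value NAME" message, splits the source into identifier-like tokens, and proposes tokens within Levenshtein distance 2 of NAME. The function names (`edit_distance`, `identifiers`, `unbound_name`, `suggestions`) and the threshold are illustrative choices, not part of either message.

```ocaml
(* Hypothetical post-processor for "Unbound value" errors. It works on
   strings only: a real wrapper would run ocamlc, capture its error
   output and read the offending source file. *)

(* Classic dynamic-programming Levenshtein distance. *)
let edit_distance a b =
  let la = String.length a and lb = String.length b in
  let d = Array.make_matrix (la + 1) (lb + 1) 0 in
  for i = 0 to la do d.(i).(0) <- i done;
  for j = 0 to lb do d.(0).(j) <- j done;
  for i = 1 to la do
    for j = 1 to lb do
      let cost = if a.[i - 1] = b.[j - 1] then 0 else 1 in
      d.(i).(j) <-
        min (min (d.(i - 1).(j) + 1) (d.(i).(j - 1) + 1))
          (d.(i - 1).(j - 1) + cost)
    done
  done;
  d.(la).(lb)

(* Approximation of the characters allowed in an OCaml identifier. *)
let is_ident_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' | '\'' -> true
  | _ -> false

(* Split source text into identifier-like tokens, without duplicates. *)
let identifiers src =
  let buf = Buffer.create 16 and acc = ref [] in
  let flush () =
    if Buffer.length buf > 0 then begin
      let tok = Buffer.contents buf in
      if not (List.mem tok !acc) then acc := tok :: !acc;
      Buffer.clear buf
    end
  in
  String.iter (fun c -> if is_ident_char c then Buffer.add_char buf c else flush ()) src;
  flush ();
  List.rev !acc

(* Index of the first occurrence of [sub] in [s], if any. *)
let find_sub s sub =
  let ls = String.length s and lsub = String.length sub in
  let rec go i =
    if i + lsub > ls then None
    else if String.sub s i lsub = sub then Some i
    else go (i + 1)
  in
  go 0

(* Extract NAME from a line containing "Unbound value NAME". *)
let unbound_name line =
  let prefix = "Unbound value " in
  match find_sub line prefix with
  | None -> None
  | Some i ->
    let start = i + String.length prefix in
    let stop = ref start in
    while !stop < String.length line && is_ident_char line.[!stop] do incr stop done;
    if !stop > start then Some (String.sub line start (!stop - start)) else None

(* Tokens within [max_dist] edits of the unbound name, excluding the name itself. *)
let suggestions ?(max_dist = 2) line src =
  match unbound_name line with
  | None -> []
  | Some name ->
    List.filter
      (fun tok -> tok <> name && edit_distance tok name <= max_dist)
      (identifiers src)
```

For a source file that defines `is_archive` and later uses `is_arcive`, `suggestions "Unbound value is_arcive" src` would include `is_archive`. Tokenizing by character class is cruder than reusing the real lexer (it also picks up words inside strings and comments), which is the price of staying outside the compiler.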
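Edgar's description of the in-compiler approach (a try/with around the standard lookup, with the Unbound_* exceptions extended by a `string list` of candidates that is simply empty when there are none) can be illustrated with a toy model. This is not the patch's code: `Env`, `Unbound_value`, `lookup_value`, `report` and the distance threshold of 2 are hypothetical, and the patch's separate Found_nearly exception is folded into a single exception here.

```ocaml
(* Toy model, not the actual patch: Env and Unbound_value are
   hypothetical stand-ins for the compiler's environment and its
   Unbound_* exceptions, extended with a list of candidate spellings. *)
module Env = Map.Make (String)

exception Unbound_value of string * string list

(* Levenshtein distance by dynamic programming. *)
let edit_distance a b =
  let la = String.length a and lb = String.length b in
  let d = Array.make_matrix (la + 1) (lb + 1) 0 in
  for i = 0 to la do d.(i).(0) <- i done;
  for j = 0 to lb do d.(0).(j) <- j done;
  for i = 1 to la do
    for j = 1 to lb do
      let cost = if a.[i - 1] = b.[j - 1] then 0 else 1 in
      d.(i).(j) <-
        min (min (d.(i - 1).(j) + 1) (d.(i).(j - 1) + 1))
          (d.(i - 1).(j - 1) + cost)
    done
  done;
  d.(la).(lb)

(* Normal lookup inside try/with; the near-miss search only runs when
   the identifier is missing, so successful lookups pay for the handler only. *)
let lookup_value name env =
  try Env.find name env
  with Not_found ->
    let near =
      Env.fold
        (fun key _ acc -> if edit_distance key name <= 2 then key :: acc else acc)
        env []
    in
    raise (Unbound_value (name, List.rev near))

(* An empty candidate list reproduces the old message unchanged. *)
let report = function
  | Unbound_value (name, []) -> Printf.sprintf "Unbound value %s" name
  | Unbound_value (name, cands) ->
    Printf.sprintf "Unbound value %s, possible misspelling of %s"
      name (String.concat " or " cands)
  | e -> Printexc.to_string e
```

The empty-list case is what keeps existing raise sites compatible, and it also shows where Edgar's concern comes from: any caller that only handles `Not_found` will now see a different exception escape.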