caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Re: de Bruijn indices
@ 2000-10-05 23:29 Patrick M Doane
  2000-10-06  8:15 ` Markus Mottl
  0 siblings, 1 reply; 14+ messages in thread
From: Patrick M Doane @ 2000-10-05 23:29 UTC (permalink / raw)
  To: David McClain; +Cc: caml-list

> Could you provide a reference to de Bruijin indexing? I recall reading
> about it some time ago, and after having searched my books here, I
> cannot find any references to it. So I probably saw it in a paper some
> time ago... Anyway, I would be interested since I have an interpreter
> name-lookup problem heretoo.

You should take a look at Frank Pfenning's course notes for his class on
Computation and Deduction. The chapter on compilation covers the use of de
Bruijn indices as well some interesting judgements abouts the compilation
process. This chapter is located at:

  http://www.cs.cmu.edu/~fp/courses/comp-ded/notes/chap6.pdf

The rest of the course notes are excellent for those interested in
formally representing theorems about programming languages in a logical
framework. 

Patrick Doane



^ permalink raw reply	[flat|nested] 14+ messages in thread
* de Bruijn indices (Re: WWW Page of Team PLClub)
@ 2000-10-09  7:19 Eijiro Sumii
  2000-10-10 14:04 ` de Bruijn indices Gerard Huet
  0 siblings, 1 reply; 14+ messages in thread
From: Eijiro Sumii @ 2000-10-09  7:19 UTC (permalink / raw)
  To: Pierre.Weis; +Cc: Xavier.Leroy, caml-list, sumii

I don't actually use De Bruijn indices so often (because I usually use
higher-order abstract syntax whenever appropriate), but anyway...

> All this is perfectly true and correct, but in my humble opinion you
> forgot to mention the dark side of De Bruijn indices:

In my experience of writing some interpreters, compilers, type
checkers, partial evaluators, etc.,

> - they are perfect for machines but not convenient for human beings!
> (We cannot easily consider that the same variable have plenty of
> different names in an expression.)

this inconvenience can be mitigated by including _both_ a name (for
human use) and an index (for machine use) in each variable, and

> - you have to manipulate indices anyway to mimmick alpha-conversion
> when performing beta-reduction (the operation is named the ``lifting''
> of De Bruijn indices), and you know that this is at least as difficult
> and error prone as performing alpha-renaming.

such errors in lifting, shifting, etc. of De Bruijn indices are much
easier to find than those in alpha-conversion, because they lead to
immediate problems (which occurs in simple programs) rather than
subtle ones (which occurs in only programs that "shadow" variables in
particular ways).

Eijiro



^ permalink raw reply	[flat|nested] 14+ messages in thread
* RE: de Bruijn indices
@ 2000-10-10 18:09 Greg Morrisett
  2000-10-12 14:57 ` Chet Murthy
  0 siblings, 1 reply; 14+ messages in thread
From: Greg Morrisett @ 2000-10-10 18:09 UTC (permalink / raw)
  To: caml-list

Well, I can say from my own experience that de Bruijn 
isn't always best.  Neither NuPRL nor the TAL type-checker
uses indices, prefering named variables instead.  For
both, preserving original names is really useful when
debugging.  

When implementing TAL, I assumed de Bruijn would be faster,
since what we do is a lot of (a) comparisons between
terms, (b) lots of beta-reductions.  To make comparisons
go fast, it's important to make the common case pointer
equality.  Like Shao, we found hash consing to be very
effective for this.  

At first blush, it seems that using de Bruijn would be better 
in this situation, as any two alpha-equivalent terms will
hash-cons together.  However, it was fairly easy to arrange
most of the alpha-equivalent terms to use the same names
(TAL is the output of a compiler after all), so we could
get good hash-consing to start with -- all we had to do 
was preserve it.  Furthermore, there's the potential to get
more hash-consing with a named scheme.  For instance, for
(\y.\z.y z) and (\z.\y.y z), you can at least share the
"y z".  Not so for de Bruijn.

The way we made substitution work well and fast was, like
Simon, to keep track of the free variables of a term.  
(We computed this information lazily and then memoized it
in the term.)  Shao did something similar in the Flint
IL.  This had two effects:  one, we could minimize re-naming
bound variables, and two, we could cut off substitution
earlier.  I'm not sure how these optimizations translate
into a de Bruijn setting (I'm sure there's some way, but
I haven't thought it through.)

I also tried out a form of explicit substitutions with
the named scheme (Shao did this in the Flint IL), but
found that the overhead was not worth it.  I think in
part this was because we were forced to do deep
sub-type comparisons on terms, and hence, had to push
the substitutions down to the leaves.  I think if you
buy into de Bruijn, then you more or less have to buy
into explicit substitutions too.

On the other hand, we never got around to implementing
a de Bruijn version of the type-checker.  I would like
to do this someday and also try the higher-order abstract 
syntax tricks.

The folks at CMU are working on an interface that allows
you to choose which representation you want.  Like the
Flint folks and us, they hope to do some detailed
comparisons.  I really think the community needs a bunch
of "systems" paper on this topic that do a careful examination 
of the tradeoffs for important applications like type-checkers,
compilers, and theorem provers.  Like realistic techniques
for garbage collection, too much is locked in the brains of 
implementors.

-Greg



^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: de Bruijn indices
@ 2000-10-10 18:30 John R Harrison
  0 siblings, 0 replies; 14+ messages in thread
From: John R Harrison @ 2000-10-10 18:30 UTC (permalink / raw)
  To: caml-list; +Cc: John Harrison


It's been interesting for me to see a discussion of de Bruijn versus
name-carrying term representations, since it's an issue I've spent some
time thinking about and on which I've changed my opinion at least once. The
different versions of the HOL theorem proving system:

  HOL88     - http://www.cl.cam.ac.uk/Research/HVG/FTP/#hol88
  hol90     - http://www.cl.cam.ac.uk/Research/HVG/FTP/#hol90
  HOL Light - http://www.cl.cam.ac.uk/users/jrh/hol-light/index.html
  hol98     - http://www.cl.cam.ac.uk/Research/HVG/FTP/#hol98

have gone through several different term representations over the years:

  HOL88     - name-carrying
  hol90     - name-carrying at first then de Bruijn
  HOL Light - de Bruijn at first then name-carrying
  hol98     - de Bruijn at first then explicit substitutions

The original HOL88 name-carrying implementation was probably inherited
from Edinburgh or Cambridge LCF. This was quite satisfactory, but over
the years several obscure bugs were found in variable renaming which
could have caused unsoundness, an embarrassment in a system that
stresses its logical consistency. To avoid having the soundness of the
system depend on tricky programming, hol90 changed to using a de Bruijn
representation.

However, all versions of HOL maintain an abstract type of terms with a
name-carrying interface, so the de Bruijn indices must be an internal
detail that is hidden from the user. This makes de Bruijn terms grossly
inefficient for very large abstraction-rich terms. The trivial operation
of recursively descending a term can involve a quadratic amount of work,
since each time a binder is traversed, a variable must be substituted
for a de Bruijn index throughout the body. For this reason, although I
was initially enthusiastic about de Bruijn terms, I rapidly decided they
didn't fit with the HOL worldview and changed HOL Light (my small clean
reimplementation of the system in CAML Light) back to a name-carrying
approach.

Very recently, Bruno Barras has been the driving force behind yet
another term representation for hol98 that admits an efficient ML-style
reduction algorithm. See:

  @INPROCEEDINGS{tphols2000-Barras,
        crossref        = "tphols2000",
        title           = "Programming and computing in {HOL}",
        author          = "Bruno Barras",
        pages           = "17--37"}

  @PROCEEDINGS{tphols2000,
        editor          = "M. Aagaard and J. Harrison",
        booktitle       = "Theorem Proving in Higher Order Logics:
                           13th International Conference, TPHOLs 2000",
        series          = "Lecture Notes in Computer Science",
        volume          = 1869,
        year            = 2000,
        publisher       = "Springer-Verlag"}

All of this goes to show that despite intensive activity in the field,
the basic question of the most appropriate term representation -- even
for a particular narrowly focused application -- is not yet clearly
agreed on.

Cheers,

John.



^ permalink raw reply	[flat|nested] 14+ messages in thread
* RE: de Bruijn indices
@ 2000-10-11 11:26 Simon Peyton-Jones
  2000-10-11 20:12 ` Markus Mottl
  0 siblings, 1 reply; 14+ messages in thread
From: Simon Peyton-Jones @ 2000-10-11 11:26 UTC (permalink / raw)
  To: Greg Morrisett, caml-list

| comparisons.  I really think the community needs a bunch
| of "systems" paper on this topic that do a careful examination 
| of the tradeoffs for important applications like type-checkers,
| compilers, and theorem provers.  Like realistic techniques
| for garbage collection, too much is locked in the brains of 
| implementors.

Indeed, if any participant of this discussion is so motivated, I would
dearly love to publish a paper about the practical aspects of
efficient management of substitution and bound variables in the
Journal of Functional Programming.  It doesn't have to be a "new idea".
Just articulating the unpublished experience and laying out design tradeoffs
that we've seen in this discussion would be excellent. 

Send me a paper!  (Or send it to Greg, who's also a JFP editor.)

Simon



^ permalink raw reply	[flat|nested] 14+ messages in thread
* RE: de Bruijn indices
@ 2000-10-12 17:33 Greg Morrisett
  0 siblings, 0 replies; 14+ messages in thread
From: Greg Morrisett @ 2000-10-12 17:33 UTC (permalink / raw)
  To: 'Chet Murthy'; +Cc: caml-list

> It wasn't the substitution that was killing me, but the
> alpha-conversion, and checks for alpa-conversion.

But alpha-conversion is a form of substitution...  

> Now, there _is_ possibly one way of doing explicit names that would be
> fast enough to be a contender -- where the names are chosen uniquely,
> using a global counter, so no capture check is required.  I can
> imagine that that would be fast.
> 
> Of course, it would result in uggggggly names -- uglier than deBruijn
> numbers, unless similar tricks were played in pretty-printing.

This is essentially what we did in TAL -- each variable had
a string-based name and a number field.  Numbers defaulted
to zero (and weren't printed).  When we needed to alpha-convert 
something, we incremented a global counter and used this for
the new number (keeping the string the same.)  You do end
up with foo_12349 this way, but only if something has to be
alpha-converted -- and that turned out to be pretty rare.

Again, I haven't re-implemented things with De Bruijn, and
would like too.  It may be that TAL would run faster with
a well-implemented scheme.  

Two side notes:

1. The reason I can't easily re-implement TAL using de Bruijn
instead of explicit names, is that I exposed my term datatypes
to make pattern matching easy.  If I'd been coding in something
like Java, I suspect I would've abstracted terms enough that
re-writing the code would've been easier.  One problem with
ML is that the convenience of pattern matching tends to direct
you (or at least me) to expose too much representation information.
Abstract patterns would help a lot...

2. I seem to recall that there are subtle issues with
typed calculi (especially dependently-typed calculi) and
de Bruijn?  Certainly explicit substitutions are tricky
in this setting...

-Greg



^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: de Bruijn indices
@ 2000-10-12 19:09 John R Harrison
  0 siblings, 0 replies; 14+ messages in thread
From: John R Harrison @ 2000-10-12 19:09 UTC (permalink / raw)
  To: caml-list; +Cc: John Harrison


Chet writes:

| It wasn't the substitution that was killing me, but the
| alpha-conversion, and checks for alpa-conversion.

I find that quite hard to understand. Profiles I've done in HOL Light
indicate that alpha conversion is not that common, and in the vast
majority of successful alpha-equivalence tests the two terms are
actually equal. Indeed after descending a couple of levels, subterms are
usually pointer-equal. Likewise, unsuccessful alpha-equivalence tests
almost always happen when the terms differ grossly near the top level.
Thus, in practice alpha-conversion is little done while
alpha-equivalence testing even of huge terms is not far from being
constant time. Are the typical profiles in Nuprl or Coq so very
different?

Substitution is less efficient than with dB terms, but I've found it
generally acceptable in practice, and certainly a price worth paying for
the conceptual simplicity of presenting a name-carrying interface to the
user rather than exposing floating dB indices.

| Again, I started out as an "explicit name fanatic", and was converted
| to the "creed of deBruijn indices", by hard experience.  I would
| recommend that before anyone write off deBruijn indices, they attempt
| to build a _large_ system (ok, so Coq isn't large by commercial
| standards, but by the standards of academic systems it ain't small)
| both ways.

Surely the size of the system is irrelevant: what matters is the size of
the problems it deals with. But anyway, from my own experience (and I
don't think HOL Light is smaller than Coq according to any plausible
metric) the Damascene conversion happened the other way round.

Cheers,

John.



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2000-10-13 11:52 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-10-05 23:29 de Bruijn indices Patrick M Doane
2000-10-06  8:15 ` Markus Mottl
2000-10-09  7:19 de Bruijn indices (Re: WWW Page of Team PLClub) Eijiro Sumii
2000-10-10 14:04 ` de Bruijn indices Gerard Huet
2000-10-10 17:29   ` Chet Murthy
2000-10-11 22:35     ` John Max Skaller
2000-10-10 18:09 Greg Morrisett
2000-10-12 14:57 ` Chet Murthy
2000-10-12 18:08   ` Benjamin C. Pierce
2000-10-12 18:19   ` Trevor Jim
2000-10-10 18:30 John R Harrison
2000-10-11 11:26 Simon Peyton-Jones
2000-10-11 20:12 ` Markus Mottl
2000-10-12 17:33 Greg Morrisett
2000-10-12 19:09 John R Harrison

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).