From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id RAA00542; Tue, 18 Nov 2003 17:12:49 +0100 (MET)
X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id RAA00810 for <caml-list@pauillac.inria.fr>; Tue, 18 Nov 2003 17:12:48 +0100 (MET)
Received: from mail1.tpgi.com.au (mail.tpgi.com.au [203.12.160.57])
	by nez-perce.inria.fr (8.11.1/8.11.1) with ESMTP id hAIGCj118903
	for <caml-list@inria.fr>; Tue, 18 Nov 2003 17:12:46 +0100 (MET)
Received: from syd-ts22-2600-136.tpgi.com.au (syd-ts22-2600-136.tpgi.com.au [203.26.30.136])
	by mail1.tpgi.com.au (8.11.6/8.11.6) with ESMTP id hAIGCbh28746;
	Wed, 19 Nov 2003 03:12:38 +1100
Subject: Re: [Caml-list] GC and file descriptors
From: skaller <skaller@ozemail.com.au>
Reply-To: skaller@ozemail.com.au
To: Brian Hurt <bhurt@spnz.org>
Cc: Caml Mailing List <caml-list@inria.fr>
In-Reply-To: <Pine.LNX.4.44.0311171422420.5009-100000@localhost.localdomain>
References: <Pine.LNX.4.44.0311171422420.5009-100000@localhost.localdomain>
Content-Type: text/plain
Message-Id: <1069168323.18363.83.camel@pelican>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.2 (1.2.2-4) 
Date: 19 Nov 2003 02:12:03 +1100
Content-Transfer-Encoding: 7bit
X-Loop: caml-list@inria.fr
X-Spam: no; 0.00; caml-list:01 ozemail:01 arbitrarily:01 solvable:01 finalizers:01 bug:01 model:01 dependencies:01 intending:01 finalisers:01 enforcing:01 interscript:01 'src:99 endline:01 dangling:01 
Sender: owner-caml-list@pauillac.inria.fr
Precedence: bulk

On Tue, 2003-11-18 at 08:20, Brian Hurt wrote:
> On 18 Nov 2003, skaller wrote:
> 

> > But it is irrelevant, because programmers do not intend to write
> > arbitrary code. They intend to write comprehensible code, whose
> > correctness is therefore necessarily easy to prove.
> 
> What they intend to do and what they do do are often completely different.  
> It is a rare programmer who *intentionally* adds bugs to his code.  Yet 
> bugs still happen.

Of course. But my point is that a lot of the arguments
I have seen that 'we cannot do that because the general
problem is NP-complete' are irrelevant.

> But my point here is that determining the *exact* time that an object 
> becomes garbage can be arbitrarily complex.  Most of the time, I agree, it 
> won't be.  

The point is though: when it matters, the onus is on
the programmer to make sure the design is able to 
determine an 'exact enough' time, using whatever
tools the language may provide.

In this case the problem is *necessarily* solvable
(the assumption being the programmer checked that it
could be done before starting).

So we don't need something that determines when
*arbitrary* objects can't be refered to synchronously,
we need something that will do that job only in
cases that are simple enough to base a design
on it.

An obvious candidate is a pure stack protocol
(automatic store) and another is a hierarchical 
system (ref count).

When there are circularities .. due to the complexity
it would be unwise to count on synchronous finalisation,
since it isn't clear when that is (by definition it
is rather hard to determine ..)

> Once you allow for the fact that finalizers/destructors may not happen in 
> a defined order or at defined times, why not go whole-hog?  

Hard question. One job a language is often required to do
is emulate the behaviour of a program written in another
language .. that's not always easy to do when the object,
type, allocation or execution models are different.


> In the most intractible cases, neither the compiler nor the programmer may 
> be able to determine when the last reference is released.  But even in the 
> simple cases, the programmer may have the intent to release the resources 
> and simply doesn't.  It's called a bug.  

I agree. Bugs happen. Perhaps I can give an example:
in Java, variables must be initialised. Unlike Ocaml though,
they do not have to be initialised at the point of
declaration. Java requires instead 'manifest initialisation':
its a compromise between initialisation at the point
of declaration like Ocaml, or arbitrary initialisation
(onus on programmer) as in C++.

The language picked a constraint that is easy to check
for both the compiler and for humans which is more
flexible than the Ocaml model, but still assures
freedom from unitialised variable errors. It still
can't handle complex cases, which can be done in C++,
but they're rarely needed because if they seem
to be needed there is a good chance the design
is flawed.

> > 
> > However, two things spring to mind. First: if close is a primitive
> > operation, we need a high level way of not only tracking
> > what to close, but also of doing that tracking efficiently.
> > 
> > Secondly, since the resource is represented in memory,
> > and much of the time the dependencies which are *not*
> > represented in memory could be, using the code
> > of the garbage collector (and not just the same algorithm),
> > makes some sense.
> 
> Why special-case close?  

I wan't intending to, just picked an example to talk about.

> It's what we've been talking about, but it could 
> just as easily be "release a mutex", etc.

Agree.


> > You are making an incorrect assumption here. You're assuming
> > that finalisers only release resources. Consider again
> > the case where the final action in generating a document
> > is to create a table of contents (know nnow that all chapters
> > are seen).

> I wouldn't put that into a destructor.  DON'T USE DESTRUCTORS FOR 
> ENFORCING THE ORDER OF OPERATIONS. 

Normally I wouldn't. But here there was no choice.
You have to understand first that Interscript is a library
with a driver program that looks like this:

	read a line
	if at end of file, delete user space else
	if the line starts with @ execute it
	otherwise if we're in tangle mode 
		write to current source file
	otheriwse we have to be in document mode
		write to current document
	repeat


Now, the user code looks like this:

	@h = tangler('src/x.ml')
	@select(h)
	print_endline "I'm an ocaml program"
	<EOF>

Do you see the problem? The open file lives
in USER space. The language syntax does not
require the user close open files, because
that would leave a dangling reference.

But the engine has no idea which things
are files that need closing, and which
are documents that need tables of contents.

The ONLY way to finalise
each object correctly is in the destructor
method (__del__ method). I could add a 
'finalise' method to each object and call it
but that would not help -- the __del__
method is precisely that anyhow,
and it gets invoked automatically so it
relieves me of the task of finding
all the objects.

You can see that the problem arises
from a design decision not to require
the USER to close files.

> If it matters when the buffers are flushed, manually flush the buffers.  

See above. It isn't always possible: the program doesn't
always know which files are open. This is quite typical
in object oriented programs -- the program doesn't know
what kinds of objects exist, there is no master
that can call the 'finalise' method.

In C++ code, particularly GUI code, callbacks often
lead to suicide 'delete this' simply because there
is no one around to delete an object.


> > 
> > On thing is for sure .. I *hate* seeing
> > "program terminated with uncaught "Not_found" exception"
> > because I do hundreds of Hashtbl.find calls....

> There is a religous debate between returning 'a and throwing an exception 
> if the item isn't found, or returning 'a option.  

I don't think it is religious. There is a genuine problem here.
I do not know how to state it exactly, but I'll try.

Checking error codes and poping the stack is not only
tedious and obscures the normal control flow,
it does so to such an extent in certain cases
as to be totally and unequivocably out of the question.
For example in math calculations you simply cannot
afford to check every single function call either
before or after invocation:

	x = (a + b/c ^ d)

for example would turn into many lines of spagetti.
I will say:

	"the error detection is too localised"

for this kind of code. On the other hand the
uncaught exception of dynamic EH clearly
shows that that mechanism leads to

	"the error detection is too global"

What alternatives are there? 

One is to  have exception specifications on functions,
but that is known not to work very well. The first
difficulty is that once you have them, they must
become part of the type system. They then 'snowball'
through the whole program (in the same way 'const'
does to C programs).

It isn't possible to deduce what exceptions can
be thrown when functions are passed as arguments
to functions, so this pollution of the type system
would also be manifest in code in the form
of annotations .. basically higher order functions
would be screwed completely by this.

So exception specifications are out for
'engineering' reasons.

Another alternative is static exception handling.
I tried that in Felix. It is very good for a
wide class of problems. Static EH implies
block structured code with the handler visible
from the throw point... its really a structured goto.

This solves many cases where we really wanted
'alternate control flow constructs' rather
than error handling, but not all. And it doesn't
work where a more dynamic transmission of
error notifications is required.

What else? Continuations? Can monads be used?
We really do need a mechanism with better
locality (easily control scope).


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners