Subject: Re: [Caml-list] GC and file descriptors
From: skaller
Reply-To: skaller@ozemail.com.au
To: Brian Hurt
Cc: Caml Mailing List
Message-Id: <1069168323.18363.83.camel@pelican>
Date: 19 Nov 2003 02:12:03 +1100

On Tue, 2003-11-18 at 08:20, Brian Hurt wrote:
> On 18 Nov 2003, skaller wrote:
>
> > But it is irrelevant, because programmers do not intend to write
> > arbitrary code. They intend to write comprehensible code, whose
> > correctness is therefore necessarily easy to prove.
>
> What they intend to do and what they do do are often completely different.
> It is a rare programmer who *intentionally* adds bugs to his code. Yet
> bugs still happen.

Of course.
But my point is that a lot of the arguments I have seen of the form
'we cannot do that because the general problem is NP-complete'
are irrelevant.

> But my point here is that determining the *exact* time that an object
> becomes garbage can be arbitrarily complex. Most of the time, I agree, it
> won't be.

The point is though: when it matters, the onus is on the programmer
to make sure the design is able to determine an 'exact enough'
time, using whatever tools the language may provide.

In this case the problem is *necessarily* solvable (the assumption
being that the programmer checked that it could be done before
starting).

So we don't need something that synchronously determines when
*arbitrary* objects can no longer be referred to; we need something
that will do that job only in cases that are simple enough to base
a design on.

One obvious candidate is a pure stack protocol (automatic storage);
another is a hierarchical system (reference counting).

When there are circularities .. due to the complexity it would
be unwise to count on synchronous finalisation, since it isn't
clear when that is (by definition it is rather hard to determine ..)

> Once you allow for the fact that finalizers/destructors may not happen in
> a defined order or at defined times, why not go whole-hog?

Hard question. One job a language is often required to do
is emulate the behaviour of a program written in another language ..
that's not always easy to do when the object, type, allocation
or execution models are different.

> In the most intractable cases, neither the compiler nor the programmer may
> be able to determine when the last reference is released. But even in the
> simple cases, the programmer may have the intent to release the resources
> and simply doesn't. It's called a bug.

I agree. Bugs happen. Perhaps I can give an example:
in Java, local variables must be initialised before use. Unlike Ocaml
though, they do not have to be initialised at the point of declaration.
Java requires instead 'manifest initialisation': it's a compromise
between initialisation at the point of declaration, as in Ocaml,
and arbitrary initialisation (onus on the programmer), as in C++.

The language picked a constraint that is easy to check, for both
the compiler and humans, and which is more flexible than the Ocaml
model but still assures freedom from uninitialised variable errors.

It still can't handle the complex cases that can be done in C++,
but they're rarely needed: if they seem to be needed, there is
a good chance the design is flawed.

> > However, two things spring to mind. First: if close is a primitive
> > operation, we need a high level way of not only tracking
> > what to close, but also of doing that tracking efficiently.
> >
> > Secondly, since the resource is represented in memory,
> > and much of the time the dependencies which are *not*
> > represented in memory could be, using the code
> > of the garbage collector (and not just the same algorithm),
> > makes some sense.
>
> Why special-case close?

I wasn't intending to, I just picked an example to talk about.

> It's what we've been talking about, but it could
> just as easily be "release a mutex", etc.

Agree.

> > You are making an incorrect assumption here. You're assuming
> > that finalisers only release resources. Consider again
> > the case where the final action in generating a document
> > is to create a table of contents (knowing now that all chapters
> > are seen).
>
> I wouldn't put that into a destructor. DON'T USE DESTRUCTORS FOR
> ENFORCING THE ORDER OF OPERATIONS.

Normally I wouldn't. But here there was no choice.
You have to understand first that Interscript is a library
with a driver program that looks like this:

    read a line
    if at end of file, delete user space
    else if the line starts with @, execute it
    otherwise if we're in tangle mode, write it to the current source file
    otherwise we have to be in document mode: write it to the current document
    repeat

Now, the user code looks like this:

    @h = tangler('src/x.ml')
    @select(h)
    print_endline "I'm an ocaml program"

Do you see the problem? The open file lives in USER space.
The language syntax does not require the user to close open files,
because that would leave a dangling reference. But the engine
has no idea which things are files that need closing, and
which are documents that need tables of contents.

The ONLY way to finalise each object correctly is in
the destructor method (__del__ method). I could add a 'finalise'
method to each object and call it, but that would not help --
the __del__ method is precisely that anyhow, and it gets invoked
automatically, so it relieves me of the task of finding all the
objects.

You can see that the problem arises from a design decision
not to require the USER to close files.

> If it matters when the buffers are flushed, manually flush the buffers.

See above. It isn't always possible: the program doesn't always
know which files are open. This is quite typical in object oriented
programs -- the program doesn't know what kinds of objects exist,
and there is no master that can call the 'finalise' method.

In C++ code, particularly GUI code, callbacks often lead to
suicide ('delete this') simply because there is no one around
to delete an object.

> > One thing is for sure .. I *hate* seeing
> > "program terminated with uncaught "Not_found" exception"
> > because I do hundreds of Hashtbl.find calls....
>
> There is a religious debate between returning 'a and throwing an exception
> if the item isn't found, or returning 'a option.

I don't think it is religious. There is a genuine problem here.
I do not know how to state it exactly, but I'll try.
Checking error codes and popping the stack is not only tedious
and obscures the normal control flow; in certain cases it does so
to such an extent as to be totally and unequivocally out of the
question.

For example in math calculations you simply cannot afford to check
every single function call either before or after invocation:

    x = (a + b/c ^ d)

for example would turn into many lines of spaghetti.
I will say: "the error detection is too localised" for this
kind of code.

On the other hand the uncaught exception of dynamic EH
clearly shows that that mechanism leads to "the error
detection is too global".

What alternatives are there?

One is to have exception specifications on functions,
but that is known not to work very well. The first difficulty
is that once you have them, they must become part of the
type system. They then 'snowball' through the whole program
(in the same way 'const' does in C programs).

It isn't possible to deduce what exceptions can be thrown
when functions are passed as arguments to functions,
so this pollution of the type system would also be manifest
in code in the form of annotations .. basically higher
order functions would be screwed completely by this.

So exception specifications are out for 'engineering' reasons.

Another alternative is static exception handling.
I tried that in Felix. It is very good for a wide class
of problems. Static EH implies block structured code
with the handler visible from the throw point... it's really
a structured goto.

This solves many cases where we really wanted
'alternate control flow constructs' rather than error
handling, but not all. And it doesn't work where
a more dynamic transmission of error notifications
is required.

What else? Continuations? Can monads be used?
We really do need a mechanism with better locality
(easily controlled scope).
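
The trade-off described above can be sketched in Python, the language Interscript is written in. This is only an illustration under stated assumptions: the `symbols` table and every function name below are hypothetical, with a dict lookup raising `KeyError` standing in for `Hashtbl.find` raising `Not_found`, and a `None` result standing in for `'a option`.

```python
from typing import Iterable, Optional

# Hypothetical lookup table, standing in for the Hashtbl in the discussion.
symbols = {"a": 1, "b": 2}


def find(name: str) -> int:
    """Exception style: raises KeyError when the name is missing."""
    return symbols[name]


def find_opt(name: str) -> Optional[int]:
    """Option style: returns None when the name is missing."""
    return symbols.get(name)


def total_checked(names: Iterable[str]) -> Optional[int]:
    """'Too localised': every call site has to test the result itself."""
    total = 0
    for name in names:
        value = find_opt(name)
        if value is None:
            return None
        total += value
    return total


def total_scoped(names: Iterable[str], default: int = 0) -> int:
    """One handler scoped to this computation: the call sites stay
    uncluttered, and the failure cannot escape to the top level."""
    try:
        return sum(find(name) for name in names)
    except KeyError:
        return default
```

Calling `find` directly with no enclosing handler reproduces the "too global" case: a missing key terminates the program at the top level. Neither variant is offered as the answer to the question posed above; they only make the two extremes, and one possible middle ground, concrete.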