caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* RE: reference initialization
@ 2000-05-20 20:13 Simon Peyton-Jones
  2000-05-22 16:10 ` Pierre Weis
  0 siblings, 1 reply; 26+ messages in thread
From: Simon Peyton-Jones @ 2000-05-20 20:13 UTC (permalink / raw)
  To: Pierre Weis, hwxi; +Cc: caml-list

| val initialize :
|  int -> 'a -> ((int -> 'a) -> (int -> 'a -> unit) -> unit) -> 'a array
|  [initialize n x f] returns a fresh array of length [n],
|  with elements initialized by function [f].
|  All the elements of the new array must be assigned once and only
|  once by the function [f]. [f] received two functions as arguments,
|  one to access elements of the new array, and the other to set the
|  elements of the new array. [f] can access to element [i] of the new
|  array provided [f] has already properly initialized element [i].

This is all good stuff.  It might be worth mentioning Haskell's approach
in this context.  The main array former is

	array :: Ix ix => (ix,ix) -> [(ix,elt)] -> Array ix elt

'array' takes an upper and lower bound, both of type 'ix'.  ix is a type
variable that must be bound to a type in class Ix, which supports indexing
operations.  So that I can avoid discussion of type classes, let's
specialise
array much as 'initialize' is above, to vectors indexed by Ints:

	array :: (Int,Int) -> [(Int,elt)] -> Array Int elt

So array takes an uppper and lower bound, and a list of (index,value) pairs;
it returns an array with each element filled in.  The list should mention
each element just once, but that is not statically checked.

In conjunction with list comprehensions, this gives rise to quite nice code.
For example, to tabulate a function we might write

	array (1,n) [(i, f i) | i <- [1..n]]

Referring to existing elements is easy via recursion:

	fibs = arrray (1,n) ([(1,1),(2,1)] ++
			       [(i, fibs!(i-1) + fibs!(i-2)) | i <- [3..n]])

Notice how easily the boundary cases are specified.

Laziness means that the compiler does not need to figure out which
order to do the initialisation. In a strict language it would be a little
harder -- or perhaps one could say "it's done in the order the index,value
pairs are given".


You may wonder about the efficiency issue.  After all, it doesn't seem
great to build an intermediate list just before constructing an array.
But a bit of short-cut deforestation does the trick, and eliminates the
intermediate list.  I have to admit that a lot of things have to Work Right
in the compiler to really get the for-loop you intended, but that's what
compilers are for.


None of this is specific to Haskell or to a lazy language.  Caml could
well use it.  I mention it on this list becuase it's an aspect of the
Haskell design that I think has worked particularly well, and which 
might be of use to Camlers.

A lot of the benefit comes from the re-use (for arrays) of the 
list comprehension notation.  I forget whether Caml has such notation,
but again, it's nothing to do with laziness and it's a notation which
is really useful.

Simon





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-20 20:13 reference initialization Simon Peyton-Jones
@ 2000-05-22 16:10 ` Pierre Weis
  0 siblings, 0 replies; 26+ messages in thread
From: Pierre Weis @ 2000-05-22 16:10 UTC (permalink / raw)
  To: Simon Peyton-Jones; +Cc: caml-list

> So array takes an uppper and lower bound, and a list of (index,value) pairs;
> it returns an array with each element filled in.  The list should mention
> each element just once, but that is not statically checked.
> 
> In conjunction with list comprehensions, this gives rise to quite nice code.
> For example, to tabulate a function we might write
> 
> 	array (1,n) [(i, f i) | i <- [1..n]]
> 
> Referring to existing elements is easy via recursion:
> 
> 	fibs = arrray (1,n) ([(1,1),(2,1)] ++
> 			       [(i, fibs!(i-1) + fibs!(i-2)) | i <- [3..n]])
> 
> Notice how easily the boundary cases are specified.
> 
> Laziness means that the compiler does not need to figure out which
> order to do the initialisation. In a strict language it would be a little
> harder -- or perhaps one could say "it's done in the order the index,value
> pairs are given".

This would be perfectly reasonable.

> You may wonder about the efficiency issue.  After all, it doesn't seem
> great to build an intermediate list just before constructing an array.
> But a bit of short-cut deforestation does the trick, and eliminates the
> intermediate list.  I have to admit that a lot of things have to Work Right
> in the compiler to really get the for-loop you intended, but that's what
> compilers are for.

Wao! You've got a really impressive compiler.

> None of this is specific to Haskell or to a lazy language.  Caml could
> well use it.  I mention it on this list becuase it's an aspect of the
> Haskell design that I think has worked particularly well, and which 
> might be of use to Camlers.

Oups. I frankly doubt that we can define in Caml something like your fibs:

fibs =
 arrray (1,n) ([(1,1),(2,1)] ++ [(i, fibs!(i-1) + fibs!(i-2)) | i <- [3..n]])

Don't forget our evaluation regime that will try to completely
evaluate the list

([(1,1),(2,1)] ++ [(i, fibs!(i-1) + fibs!(i-2)) | i <- [3..n]])

before calling the array creation function. What's the meaning of
fibs! then, since fibs has no existance yet ?

I think that in these examples, lazy evaluation is important: the list
is lazyly evaluated, and the fibs array elements are assigned one at a
time, as soon as the corresponding pair of the list has been computed.

As far as I can imagine the way vectors are handled in a lazy
language, the compiler must ``initialize'' vectors elements with a
dummy ``undefined value'', and updates this dummy value as usual as
soon as a value is provided for the dummy.

This way you always have the ``not yet initialized'' test for free,
since attempts to access such an element will raise some ``bottom'' or
``undefined'' exception. However, for the ``completely initialized
test'' you also need to access all the elements of the new vector
(once more accessing an undefined element will automatically raise an
exception). And I suspect also that the ``initialized only once'' check
will be tricky... (or similar to an ML solution).

So, I don't know how to implement your elegant solution in a strict
language (unless by adding some lazy evaluation regime for some
functions, such as ... array!)

> A lot of the benefit comes from the re-use (for arrays) of the 
> list comprehension notation.  I forget whether Caml has such notation,
> but again, it's nothing to do with laziness and it's a notation which
> is really useful.

Yes this is really a interesting notation. However, I studied it once,
and as far as I remember I had the feeling that lazy evaluation was
also mandatory to handle ``real'' cases other than the trivial ones (such
as. [(i, f i) | i <- [1..n]] that can be easily done in Caml): complex
mutually recursive lists, where f and the bounds where functions of
other lists given in comprehension.

Also, I remember my strange feeling at reading pieces of code like:

 [i | i <- [1..2**32]]

which is not completely reasonable in a strict language.

Anyway, you're right that we could may give it a try ...

Thank you for your interesting remarks.

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://cristal.inria.fr/~weis/





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-22 15:28               ` Pierre Weis
@ 2000-05-22 22:29                 ` Max Skaller
  0 siblings, 0 replies; 26+ messages in thread
From: Max Skaller @ 2000-05-22 22:29 UTC (permalink / raw)
  To: Pierre Weis; +Cc: caml-list

Pierre Weis wrote:
 
> So, adding a test to detect this case we can initialize the vector
> properly, without using Obj.magic.
> 
> exception Not_yet_initialized of int;;
> exception Already_initialized of int;;
> exception Never_initialized of int;;
> 
> let initialize n f =
>  if n = 0 then [||] else
>  let init_v = Array.make n false in
>  let v = ref [||] in
>  let get i = if init_v.(i) then !v.(i) else raise (Not_yet_initialized i) in
>  let set i ei =
>    if !v = [||] then v := Array.make n ei;

Hmmm. This should work, even if 'ei' has a finaliser or mutable
field: 'ei' isn't a 'dummy' value, but a real value that the client
wanted in the array. On the other hand, a dummy value the client
is forced to supply may have dire consequences where the type
is either a class instance , or finalised: here
either construction or destruction may have arbitrary
semantics.

So this (the code you gave) is much better than having to supply a dummy
value.

The same problem occurs with 'forced' variable initialisation:
dummy values may have unwanted side-effects.
There is a tension here, since ocaml is not a purely
functional language.

-- 
John (Max) Skaller at OTT [Open Telecommications Ltd]
mailto:maxs@in.ot.com.au      -- at work
mailto:skaller@maxtal.com.au  -- at home




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-19  6:54             ` Max Skaller
@ 2000-05-22 15:28               ` Pierre Weis
  2000-05-22 22:29                 ` Max Skaller
  0 siblings, 1 reply; 26+ messages in thread
From: Pierre Weis @ 2000-05-22 15:28 UTC (permalink / raw)
  To: Max Skaller; +Cc: caml-list

> Pierre Weis wrote:
> > 
> > Finally I think I've done the ``Further work part'' I mentioned for
> > vector initialization!
> 
> > let initialize n x0 f =
> 
> x0 is both a problem and unnecesary: it is hard to pick
> a sensible x0 sometimes, and it is not necessary, since
> Obj.magic can be used internally: there is no loss of
> safety, since the code checks all usage, and such a magic
> value cannot be accessed.
>
> John (Max) Skaller at OTT [Open Telecommications Ltd]
> mailto:maxs@in.ot.com.au      -- at work
> mailto:skaller@maxtal.com.au  -- at home

You're right, getting rid of this x0 would be better. Still, I don't
understand how you can manage to write the f function if you cannot
figure out at least one random value of the type of the elements of
the vector you want f to initialize. Also, I don't like to use Obj.magic
to create an heterogeneous vector, even if I can prove that no Caml
program can observe it: it breaks some invariants that the runtime
system, the memory manager, or the debugger could observe.

However, since we know that the function f gives plenty of suitable initial
values: in particular at the first call to the set function.

So, adding a test to detect this case we can initialize the vector
properly, without using Obj.magic.

exception Not_yet_initialized of int;;
exception Already_initialized of int;;
exception Never_initialized of int;;

let initialize n f =
 if n = 0 then [||] else
 let init_v = Array.make n false in
 let v = ref [||] in
 let get i = if init_v.(i) then !v.(i) else raise (Not_yet_initialized i) in
 let set i ei =
   if !v = [||] then v := Array.make n ei;
   if not init_v.(i) then (!v.(i) <- ei; init_v.(i) <- true) else
   raise (Already_initialized i) in
 (f get set : unit);
 for i = 0 to n - 1 do if not init_v.(i) then raise (Never_initialized i) done;
 !v;;

(*
val initialize :
 int -> ((int -> 'a) -> (int -> 'a -> unit) -> unit) -> 'a array
 [initialize n f] returns a fresh array of length [n],
 with elements initialized by function [f].
 All the elements of the new array must be assigned once and only
 once by the function [f]. [f] received two functions as arguments,
 one to access elements of the new array, and the other to set the
 elements of the new array. [f] can access to element [i] of the new
 array provided [f] has already properly initialized element [i].
 Raise [Not_yet_initialized i] if element [i] is accessed before being
 assigned.
 Raise [Already_initialized i] if element [i] is assigned twice.
 Raise [Never_initialized i] if element [i] has never been assigned at
 the end of initialization.
 [Array.initialize n f] uses [2 n] words of heap space.
*)

Thanks for your simulating remark.

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://cristal.inria.fr/~weis/





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-18 16:16           ` Pierre Weis
@ 2000-05-19  6:54             ` Max Skaller
  2000-05-22 15:28               ` Pierre Weis
  0 siblings, 1 reply; 26+ messages in thread
From: Max Skaller @ 2000-05-19  6:54 UTC (permalink / raw)
  To: Pierre Weis; +Cc: Hongwei Xi, caml-list

Pierre Weis wrote:
> 
> Finally I think I've done the ``Further work part'' I mentioned for
> vector initialization!

> let initialize n x0 f =

x0 is both a problem and unnecesary: it is hard to pick
a sensible x0 sometimes, and it is not necessary, since
Obj.magic can be used internally: there is no loss of
safety, since the code checks all usage, and such a magic
value cannot be accessed.


-- 
John (Max) Skaller at OTT [Open Telecommications Ltd]
mailto:maxs@in.ot.com.au      -- at work
mailto:skaller@maxtal.com.au  -- at home




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-16  2:53         ` Hongwei Xi
@ 2000-05-18 16:16           ` Pierre Weis
  2000-05-19  6:54             ` Max Skaller
  0 siblings, 1 reply; 26+ messages in thread
From: Pierre Weis @ 2000-05-18 16:16 UTC (permalink / raw)
  To: Hongwei Xi; +Cc: caml-list

Finally I think I've done the ``Further work part'' I mentioned for
vector initialization!

We end up with a new initialization function for arrays that ensures
the proper and unique initialisation of each element of the array, but
is simpler to use and understand:

(*
val initialize :
 int -> 'a -> ((int -> 'a) -> (int -> 'a -> unit) -> unit) -> 'a array
 [initialize n x f] returns a fresh array of length [n],
 with elements initialized by function [f].
 All the elements of the new array must be assigned once and only
 once by the function [f]. [f] received two functions as arguments,
 one to access elements of the new array, and the other to set the
 elements of the new array. [f] can access to element [i] of the new
 array provided [f] has already properly initialized element [i].
 Raise [Not_yet_initialized i] if element [i] is accessed before being
 assigned.
 Raise [Already_initialized i] if element [i] is assigned twice.
 Raise [Never_initialized i] if element [i] has never been assigned at
 the end of initialization.
 [Array.initialize n x f] uses [2 n] words of heap space.
*)
exception Not_yet_initialized of int;;
exception Already_initialized of int;;
exception Never_initialized of int;;

let initialize n x0 f =
 if n = 0 then [||] else
 let init_v = Array.make n false in
 let v = Array.make n x0 in
 let get i = if init_v.(i) then v.(i) else raise (Not_yet_initialized i) in
 let set i ei =
   if not init_v.(i) then (v.(i) <- ei; init_v.(i) <- true) else
   raise (Already_initialized i) in
 (f get set : unit);
 for i = 0 to n - 1 do if not init_v.(i) then raise (Never_initialized i) done;
 v;;

The examples can now be written more naturally, with a minimum of
modification (beside correction of original bugs in chooses) :

let fibs n =
 let init_fibs get set =
   set 0 1; set 1 1;
   for i = 2 to n - 1 do
    set i (get (i - 1) + get (i - 2))
   done in
 initialize n 0 init_fibs;;

let chooses n =
 let init_chooses get set =
  let set_ei i ei =
    set i ei;
    if n - i <> i then set (n - i) ei in
  set_ei 0 1;
  for i = 1 to n / 2 do
   set_ei i (get (i - 1) * (n - i + 1) / i)
  done in
 initialize (n + 1) 0 init_chooses;;

let invs f n =
 let init_inv get set =
  for i = 0 to n - 1 do set (f i) i done in
 initialize n 0 init_inv;;

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://cristal.inria.fr/~weis/





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-15 21:33       ` Pierre Weis
@ 2000-05-16  2:53         ` Hongwei Xi
  2000-05-18 16:16           ` Pierre Weis
  0 siblings, 1 reply; 26+ messages in thread
From: Hongwei Xi @ 2000-05-16  2:53 UTC (permalink / raw)
  To: Pierre Weis; +Cc: caml-list

> When you need more complex initializations, then you need a smarter
> version of Array.init (still you want to keep the same good properties
> as the simple version).
> 
> For instance, a simple generalization of Array.init, that allows
> access to the locations that are already initialized into the array,
> could be written as:
> 
>   exception Uninitialized_access of int;;
> 
>   let initialize f n =
>    if n = 0 then [||] else
>    let get v i j =
>     if j < i && j >= 0 then v.(j) else raise (Uninitialized_access j) in
>    let v = Array.make n (f (get [||] 0) 0) in
>    for i = 1 to n - 1 do
>     v.(i) <- f (get v i) i
>    done;
>    v;;
> 
> Using initialize, we can easily define a vector containing the
> fibonacci numbers:
> 
>   let init_fib g = function
>     | 0 | 1 -> 1
>     | n -> g (n - 1) + g (n - 2);;
> 
>   let fib10 = initialize init_fib 10;;
>   val fib10 : int array = [|1; 1; 2; 3; 5; 8; 13; 21; 34; 55|]

This looks pretty neat to me. Probably 'initialize' can be implemented
in C so that we can squeeze out some extra efficiency.

> Initialize provides the following properties:
> 
> If v is defined as ``let v = Array.initialize f n'' then
> 
>   1) v is completely initialized:
>      for all i, v.(i) has been assigned exactly once during the
>      initialization of v
>   2) not yet initialized locations in v are never accessed during the
>      initialization process:
>      for all i, for all j, if j > i then the location j is not accessed during
>      the initialization of v.(i)
>   3) elements of v are values computed by f:
>      for all i, v.(i) = f (fun i -> (Array.sub v 0 (i - 1)).(i)) (i)
> 
>   Once more there is no need to dynamically check accesses to a vector
>   initialized unsing the function initialize.

Yes, I thought along very much the same line.

> 3) New random_initialize function for ``random'' initialization
> ----------------------------------------------------------------
> 
> Now, you can still argue that you have even more complex
> initialisation schemes: then you need to define a more complex
> initialization function. For instance:
> 
> [random_initialize f i0 n] returns a vector [v] of length [n],
> initialized using the function [f], starting from index [i0]. The
> function [f] takes two arguments, an access function [g : int -> 'a]
> that gives access to already initialized elements of the vector [v],
> and an index [i]; given those arguments [f] returns a pair [next, a],
> where [a] is the value to be stored at index [i] of vector [v], and
> [next] the index of the next element of [v] to initialize. [f] should
> raise the predefined exception [Exit] to mean that index [i] is out of
> range, presumably at the end of initialization.
> 
> The following properties hold:
> 
> If v is defined as ``let v = random_initialize f i0 n'' then
> 
>   1) v is completely initialized:
>      for all i, v.(i) has been assigned exactly once during the
>      initialization of v
>   2) not yet initialized locations in v are never accessed during the
>      initialization process:
>      for all i, for all j, if v.(j) has not yet been assigned then the
>      location j is not accessed during the initialization of v.(i)
>   3) all elements of v are values computed by calls to f.
> 
> Implementation
> 
>   (* Already_initialized is raised when there is an attempt to
>      initialize an array location more than once. *)
>   exception Already_initialized of int;;
> 
>   (* Partial_initialization is raised when there is a location that is
>      not initialized at the end of initilization. *)
>   exception Partial_initialization of int;;
> 
>   let random_initialize f i0 n =
>    if n = 0 then [||] else
>    let initialized = Array.make n 0 in
>    let set v i a =
>     if initialized.(i) = 0 then (initialized.(i) <- 1; v.(i) <- a)
>     else raise (Already_initialized i) in
>    let get v j =
>     if j >= 0 && j < n && initialized.(j) = 1 then v.(j)
>     else raise (Uninitialized_access j) in
>    let nexti0, a0 = f (get [||]) i0 in
>    let v = Array.make n a0 in
>    set v 0 a0;
>    let rec init i =
>     let nexti, ai = f (get v) i in
>     set v i ai;
>     init nexti in
>    try init nexti0 with
>    | Exit ->
>       for i = 0 to n - 1 do
>        if initialized.(i) = 0 then raise (Partial_initialization i)
>       done;
>       Array.init n (fun i -> v.(i));;
>
> Example: we can define a suitable initialization function for your
> example of combinatorial numbers,
> 
>   let init_chooses n get i =
>    let next i j = if j <> i then j else n + 1 in
>    let ai i = get (i - 1) * (n - i + 1) / i in
>    if i = 0 then (next 0 n, 1) else
>    if i = n then (next n 1, 1) else
>    if i > n then raise Exit else 
>    if i <= n / 2 then (next i (n - i), ai i)
>    else (next i (n - i + 1), get (n - i));;
> 
>   let chooses n = random_initialize (init_chooses n) 0 (n + 1);;
> 
>   let v4 = chooses 4;;
>   val v4 : int array = [|1; 4; 6; 4; 1|]

I don't know about this. It seems a bit too involved to me
(at this moment)

> 4) Further work
> ---------------
> 
> The general vector initialization function should probably take as
> argument f a function that uses 2 functions get and set to handle the
> initialized vector. In any case, the properties 1, 2, and 3 should
> still hold.
> 
> 5) Conclusion
> -------------
> 
> I'm not sure we need a complex additional machinery to solve a problem
> that can be elegantly solved with some library functions (admittedly
> at the price of some linear additional cost at initialization time);

Me neither (at this moment).

> however, following Xavier, I would also be glad to read articles on
> dependant type systems and their applications to vector initialisation
> (if any);

Sorry. I don't have any paper on this subject :-)

> also, I would be delighted to try such an experimental type
> system, if it is tractable in practice (I mean, we don't want a type
> system that obliges the programmer to rewrite his pattern matchings in
> a strange and awful maneer, nor a type system that emits error
> messages that are incredibly more difficult to understand than those
> of usual ML typecheckers, nor a type system that is so slow that
> modern machines are turned into good old Apple IIs, don't we ?).

These are very legitimate points. The design of DML actually
addresses all these points in certain ways as our goal is also
towards having a practical programming system.

> I mean, we don't want a type system that obliges the programmer to
> rewrite his pattern matchings in a strange and awful maneer

The problem here is that unlike ML, sequentiality of pattern
matching must be taken into consideration in DML, but this can
be done automatically using Laville's approach (with some
modification).

> type system that emits error messages that are incredibly more
> difficult to understand than those of usual ML typecheckers

I find that in practice, the location of a type error in a DML
program is reported very accurately (mainly because that the user
has to provide dependent type annotations). Unless one uses the
type system to encode sophisticated information, the error messages
are not so hard to understand. Of course, one needs to gain some
exprience first but this is the same thing with ML.

> nor a type system that is so slow that modern machines are turned
> into good old Apple IIs, don't we ?).

Yes, type-checking can be slow, but this is also the case in ML.
I recently used Okasaki's parsing combinators to implement
parsers; I noticed that ML type-checking in this case is
indeed much slower (because larger higher-order types are involved)
than average cases.

A strong point of DML is its compatibility with ML. If you don't
use dependent types, you won't pay the price (no type-checking
slowdown). One can essentially just build a front-end. For OCaml,
one may simply first try whether it is effective to eliminate run-
time array bound checks; if it is good, further work can be done to
support dependent refinement types; if it is not, then it is nothing
more than a front-end. We need to make *no* changes to OCaml's backend!

> also, I would be delighted to try such an experimental type system

I would be happy to write such a front-end for OCaml, but I have
some serious difficulties:

(1) I am no expert of the OCaml front-end and often have
serious difficulty figuring out the code, which is neat but
has almost none comments.

(2) The syntax for OCaml is evolving so fast recently; this makes
it too difficult for me to support such a front-end even if it is
written.

If these problems can be addressed, such a front-end can be
quickly constructed (my estimate: no more than 3 person * months
work).

Best Regards,

--Hongwei



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-15 17:47     ` Hongwei Xi
  2000-05-15 21:33       ` Pierre Weis
@ 2000-05-15 22:20       ` Dave Mason
  1 sibling, 0 replies; 26+ messages in thread
From: Dave Mason @ 2000-05-15 22:20 UTC (permalink / raw)
  To: caml-list

>>>>> On Mon, 15 May 2000 13:47:15 -0400 (EDT), Hongwei  Xi <hwxi@ECECS.UC.EDU> said:


> What I had in mind is actually something pretty simple (since it
> won't be very useful if it is not simple :-))

> Is it possible to have something like the following in the library:

I propose something more like:

    Array.init_with_array: int -> (int -> (int -> 'a) -> 'a) -> 'a Array

let Array.init_with_array n f =
  let a = Array.create n in
    for i = 0 to n - 1 do
      a.(i) <- f i (fun j -> if j<i then
                                a.(i)
                             else raise ``initialization error'')
    done

The second parameter could instead be a version of the array but with
the bound limited to 0..i, but this would allow the function f to save
that version somewhere which might not be good.

This would solve an additional subset of the cases... but is NOT a
complete solution (personally, I'm fine with the initial value).

../Dave



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-15 17:47     ` Hongwei Xi
@ 2000-05-15 21:33       ` Pierre Weis
  2000-05-16  2:53         ` Hongwei Xi
  2000-05-15 22:20       ` Dave Mason
  1 sibling, 1 reply; 26+ messages in thread
From: Pierre Weis @ 2000-05-15 21:33 UTC (permalink / raw)
  To: Hongwei Xi; +Cc: caml-list

I think you're right to consider proper initialization of vectors as a
desirable property.

I also agree with you that initializing a vector with a dummy value is
not a good practice and not an acceptable solution. However, the
Java's way of testing further accesses to elements of vectors does not
seem to be more appealing, since it does not ensure proper
initialization of vectors, and thus spurious exceptions, due to access
into improperly initialized vectors, can occur at anytime after the
creation of the faulty vector.

The Caml way to handle this kind of problems is to provide automatic
means to ensure maximum safety, in this case we must warranty proper
initialization of vectors. I introduced Array.init to address this
issue; however, as we already know, Array.init has some limitations;
in the following, I propose other initialization functions to overcome these
limitations, while still providing safety of vector initializations.

1) Array.init
-------------

Array.init is simple to use and convenient for simple initializations
(when the value to assign to location i can be expressed as a function
of i). Safety is provided by properties automatically ensured by
Array.init, such as:

If v is defined as ``let v = Array.init f n'' then

  1) v is completely initialized:
     for all i, v.(i) has been assigned exactly once during the
     initialization of v 
  2) elements of v are values computed by f:
     for all i, v.(i) = f (i)

  In addition, if v is defined with Array.init there is no need to
  perform runtime tests when accessing items inside vector v: v is
  properly initialized.

2) New initialize function
--------------------------

When you need more complex initializations, then you need a smarter
version of Array.init (still you want to keep the same good properties
as the simple version).

For instance, a simple generalization of Array.init, that allows
access to the locations that are already initialized into the array,
could be written as:

  exception Uninitialized_access of int;;

  let initialize f n =
   if n = 0 then [||] else
   let get v i j =
    if j < i && j >= 0 then v.(j) else raise (Uninitialized_access j) in
   let v = Array.make n (f (get [||] 0) 0) in
   for i = 1 to n - 1 do
    v.(i) <- f (get v i) i
   done;
   v;;

Using initialize, we can easily define a vector containing the
fibonacci numbers:

  let init_fib g = function
    | 0 | 1 -> 1
    | n -> g (n - 1) + g (n - 2);;

  let fib10 = initialize init_fib 10;;
  val fib10 : int array = [|1; 1; 2; 3; 5; 8; 13; 21; 34; 55|]

Initialize provides the following properties:

If v is defined as ``let v = Array.initialize f n'' then

  1) v is completely initialized:
     for all i, v.(i) has been assigned exactly once during the
     initialization of v
  2) not yet initialized locations in v are never accessed during the
     initialization process:
     for all i, for all j, if j > i then the location j is not accessed during
     the initialization of v.(i)
  3) elements of v are values computed by f:
     for all i, v.(i) = f (fun i -> (Array.sub v 0 (i - 1)).(i)) (i)

  Once more there is no need to dynamically check accesses to a vector
  initialized unsing the function initialize.

Evidently, if you want to initialize the other way round, you may
define the corresponding function, say initialize_down:

  let initialize_down f n =
   if n = 0 then [||] else
   let get v i j =
    if j > i && j < n then v.(j) else raise (Uninitialized_access j) in
   let v = Array.make n (f (get [||] 0) 0) in
   for i = n - 1 to 1 do
    v.(i) <- f (get v i) i
   done;
   v;;

3) New random_initialize function for ``random'' initialization
----------------------------------------------------------------

Now, you can still argue that you have even more complex
initialisation schemes: then you need to define a more complex
initialization function. For instance:

[random_initialize f i0 n] returns a vector [v] of length [n],
initialized using the function [f], starting from index [i0]. The
function [f] takes two arguments, an access function [g : int -> 'a]
that gives access to already initialized elements of the vector [v],
and an index [i]; given those arguments [f] returns a pair [next, a],
where [a] is the value to be stored at index [i] of vector [v], and
[next] the index of the next element of [v] to initialize. [f] should
raise the predefined exception [Exit] to mean that index [i] is out of
range, presumably at the end of initialization.

The following properties hold:

If v is defined as ``let v = random_initialize f i0 n'' then

  1) v is completely initialized:
     for all i, v.(i) has been assigned exactly once during the
     initialization of v
  2) not yet initialized locations in v are never accessed during the
     initialization process:
     for all i, for all j, if v.(j) has not yet been assigned then the
     location j is not accessed during the initialization of v.(i)
  3) all elements of v are values computed by calls to f.

Implementation

  (* Already_initialized is raised when there is an attempt to
     initialize an array location more than once. *)
  exception Already_initialized of int;;

  (* Partial_initialization is raised when there is a location that is
     not initialized at the end of initilization. *)
  exception Partial_initialization of int;;

  let random_initialize f i0 n =
   if n = 0 then [||] else
   let initialized = Array.make n 0 in
   let set v i a =
    if initialized.(i) = 0 then (initialized.(i) <- 1; v.(i) <- a)
    else raise (Already_initialized i) in
   let get v j =
    if j >= 0 && j < n && initialized.(j) = 1 then v.(j)
    else raise (Uninitialized_access j) in
   let nexti0, a0 = f (get [||]) i0 in
   let v = Array.make n a0 in
   set v 0 a0;
   let rec init i =
    let nexti, ai = f (get v) i in
    set v i ai;
    init nexti in
   try init nexti0 with
   | Exit ->
      for i = 0 to n - 1 do
       if initialized.(i) = 0 then raise (Partial_initialization i)
      done;
      Array.init n (fun i -> v.(i));;

Example: we can define a suitable initialization function for your
example of combinatorial numbers,

  let init_chooses n get i =
   let next i j = if j <> i then j else n + 1 in
   let ai i = get (i - 1) * (n - i + 1) / i in
   if i = 0 then (next 0 n, 1) else
   if i = n then (next n 1, 1) else
   if i > n then raise Exit else 
   if i <= n / 2 then (next i (n - i), ai i)
   else (next i (n - i + 1), get (n - i));;

  let chooses n = random_initialize (init_chooses n) 0 (n + 1);;

  let v4 = chooses 4;;
  val v4 : int array = [|1; 4; 6; 4; 1|]

4) Further work
---------------

The general vector initialization function should probably take as
argument f a function that uses 2 functions get and set to handle the
initialized vector. In any case, the properties 1, 2, and 3 should
still hold.

5) Conclusion
-------------

I'm not sure we need a complex additional machinery to solve a problem
that can be elegantly solved with some library functions (admittedly
at the price of some linear additional cost at initialization time);
however, following Xavier, I would also be glad to read articles on
dependant type systems and their applications to vector initialisation
(if any); also, I would be delighted to try such an experimental type
system, if it is tractable in practice (I mean, we don't want a type
system that obliges the programmer to rewrite his pattern matchings in
a strange and awful maneer, nor a type system that emits error
messages that are incredibly more difficult to understand than those
of usual ML typecheckers, nor a type system that is so slow that
modern machines are turned into good old Apple IIs, don't we ?).

Best regards,

-- 
Pierre Weis



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-15  6:58           ` Max Skaller
@ 2000-05-15 17:56             ` Hongwei Xi
  0 siblings, 0 replies; 26+ messages in thread
From: Hongwei Xi @ 2000-05-15 17:56 UTC (permalink / raw)
  To: Max Skaller; +Cc: Pierre Weis, caml-list

On Mon, 15 May 2000, Max Skaller wrote:

> Hongwei Xi wrote:
> 
> > Certainly, we can replace ? with 0. But what is really achieved?
> > I would say it is simply an illusion that a program is made safer
> > by initializing each array upon its allocation. 
> 
> What's happening here is this: ocaml is basically a _functional_
> programming language. In such a language there is no such thing
> as a variable, _everything_ is a constant. In this view,
> the notion that there can be an uninitialised variable
> is absurd, since there are no variables!

It is not absurd. You may think an uninitialized value having
type (exists alpha.alpha), or type 'top' by making every type a
subtype of 'top'.

--Hongwei




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-15  8:49   ` Xavier Leroy
@ 2000-05-15 17:47     ` Hongwei Xi
  2000-05-15 21:33       ` Pierre Weis
  2000-05-15 22:20       ` Dave Mason
  0 siblings, 2 replies; 26+ messages in thread
From: Hongwei Xi @ 2000-05-15 17:47 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: 'caml-list@inria.fr'

What I had in mind is actually something pretty simple (since it won't
be very useful if it is not simple :-))

> My gut feeling about this approach is that the type system could
> probably work well for arrays that are initialized linearly, e.g.
> 
>         let a = Array.create n in
>         for i = 0 to n - 1 do
>           a.(i) <- ...
>           (* at this point, the type system knows that 
>              a.(0)...a.(i-1) are initialized and
>              a.(i)...a.(n-1) are not *)
>         done
>         (* at this point, the type system knowns that all elements of a
>            are initialized *)
> 
> But notice that most of these cases are easily expressed using Array.init!

But one common case is not covered: when you initialize A[i], you may
need values stored in A[j] for some 0 <= j < i. Is it possible to make
'init' handle this case as well. I must say that I have problems writing
such a function. This is certainly a problem that people who are
interested in generic programming should study (if it has not be
studied yet).

> However, the type system is going to break horribly on more complex
> initialization patterns, e.g. the following code for tabulating the
> inverse of a permutation f over [0...n-1] :
> 
>         let a = Array.create n in
>         for i = 0 to n - 1 do a.(f(i)) <- i done
> 
> So, I don't think the (Caml) programmer will gain much from a type
> system/static analysis for array initialization.  (For a lower-level
> language such as TAL, the story may be different, though.)

In this case, I could imagine that there are programmers who 
would like to verify that this code indeed initialize every
array cell; this is clearly a case where initialization upon
allocation doesn't make much sense.

Is it possible to have something like the following in the library:

Array.init': int -> (int -> (int * 'a)) -> 'a Array

let Array.init' n f =
  let a = Array.create n in
  for i = 0 to n - 1 do
    let (j, v) = f i
    in a.(j) <- v
  done
  (* then check that all cells are initilized *)




^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: reference initialization
  2000-05-12 20:38 ` Hongwei Xi
  2000-05-15  8:49   ` Xavier Leroy
@ 2000-05-15  9:36   ` Eijiro Sumii
  1 sibling, 0 replies; 26+ messages in thread
From: Eijiro Sumii @ 2000-05-15  9:36 UTC (permalink / raw)
  To: caml-list; +Cc: sumii

> Yes, in theory, it requires null check at every use. However,
> I assume that a marjority of such null checks can be readily
> eliminated using flow analysis,

I'm afraid that this is not so easy, especially under separate
compilation.  For example, if we want to separately-compile a function
like "fun ref x -> !x + 1" (in the ocaml syntax), and if x could be
null, then the null check elimination would be hard.  Such situations
seem ubiquitous in practice.

On the other hand, ML types work as an interface to specify whether x
is always initialized ("int ref") or not ("int option ref").  And
static typing enforces such a protocol between modules.

> Note that Java is imperative, which makes flow analysis easier
> (compared to ML).

I wonder how flow analysis can be easier in Java than in ML.  While ML
has higher-order functions, Java has inner classes.

> A common saying is that a program is made safer if references
> are always initialized. I find this is contrary to the truth.
> In this case, a program is made safer if you insert run-time
> checks (and rely on a compiler to remove redundant checks)
> 
> If I use 'get' and 'set', is there a compiler to eliminate such
> checks (i.e., after 'set', 'get' should do no tag checking)?
> 
> Yes, ML allows the programmer to tag values (using option
> types) and thus shifts the burden to the programmer. In Java,
> this is done automatically (and we can rely on a compiler to
> remove redundant null checks).

ML could also do it automatically, by analyzing whether a value of an
"option" type always has the "Some" tag.  The analysis would be
similar to standard flow analyses or type-based analyses.

// Eijiro Sumii <sumii@saul.cis.upenn.edu>
//
// currently visiting: Department of Computer and Information Science,
// School of Engineering and Applied Science, University of Pennsylvania



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-12 20:38 ` Hongwei Xi
@ 2000-05-15  8:49   ` Xavier Leroy
  2000-05-15 17:47     ` Hongwei Xi
  2000-05-15  9:36   ` Eijiro Sumii
  1 sibling, 1 reply; 26+ messages in thread
From: Xavier Leroy @ 2000-05-15  8:49 UTC (permalink / raw)
  To: Hongwei Xi, Stephanie Weirich; +Cc: 'caml-list@inria.fr'

> [The Java solution with null pointers]
> Yes, in theory, it requires null check at every use. However,
> I assume that a marjority of such null checks can be readily
> eliminated using flow analysis, though things can become difficult
> when arrays are involved. Note that Java is imperative, which
> makes flow analysis easier (compared to ML).

Don't count too much on the Java compiler being able to eliminate null
checks.  I don't think today's Java compiler technology is good enough
to do that.  Also:

- To eliminate null checks efficiently requires global (whole program)
analysis, which is nearly unapplicable to Java due to dynamic class
loading.
- To eliminate null checks for pointers fetched from an array requires
very subtle analyses, e.g. you need to keep track of which array
elements were initialized with non-null references and which elements
still hold the default null value.  (See below.)
- Those null checks work for arrays of objects, but not for arrays of
numerical quantities, which are initialized with default values just
like in ML.

This whole discussion seems to be going in circles.  If you want the
additional safety of run-time checking for uninitialized array
entries, you can get it in Caml by using option types, at a
performance cost.  If you'd rather avoid the performance cost, you
have to be a little more careful in your coding.  Finally, in many
cases you can have your cake and it eat too by avoiding low-level
operations such as "allocate array then fill it" and using
higher-level operations such as "tabulate this function in an array"
(a la Array.init).

Knowing the background of Hongwei Xi, I think I've guessed where he is
getting at: one could envision a refined type system for arrays that
keep track of (a conservative estimate of) the indices of elements
that have been initialized.  TAL does it for heap-allocated tuples,
and it would be interesting to see whether DML-style dependent types
allow such a type-checking for the more general case of arrays.
So, Hongwei, we'll read your next paper on this topic with interest :-)

My gut feeling about this approach is that the type system could
probably work well for arrays that are initialized linearly, e.g.

        let a = Array.create n in
        for i = 0 to n - 1 do
          a.(i) <- ...
          (* at this point, the type system knows that 
             a.(0)...a.(i-1) are initialized and
             a.(i)...a.(n-1) are not *)
        done
        (* at this point, the type system knowns that all elements of a
           are initialized *)

But notice that most of these cases are easily expressed using Array.init!

However, the type system is going to break horribly on more complex
initialization patterns, e.g. the following code for tabulating the
inverse of a permutation f over [0...n-1] :

        let a = Array.create n in
        for i = 0 to n - 1 do a.(f(i)) <- i done

So, I don't think the (Caml) programmer will gain much from a type
system/static analysis for array initialization.  (For a lower-level
language such as TAL, the story may be different, though.)

- Xavier Leroy



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-12 19:59         ` Hongwei Xi
@ 2000-05-15  6:58           ` Max Skaller
  2000-05-15 17:56             ` Hongwei Xi
  0 siblings, 1 reply; 26+ messages in thread
From: Max Skaller @ 2000-05-15  6:58 UTC (permalink / raw)
  To: Hongwei Xi; +Cc: Pierre Weis, caml-list

Hongwei Xi wrote:

> Certainly, we can replace ? with 0. But what is really achieved?
> I would say it is simply an illusion that a program is made safer
> by initializing each array upon its allocation. 

What's happening here is this: ocaml is basically a _functional_
programming language. In such a language there is no such thing
as a variable, _everything_ is a constant. In this view,
the notion that there can be an uninitialised variable
is absurd, since there are no variables!

This is not the case for procedural programming, IHMO.
But the framework of ocaml is functional first, with
adaptions for procedural programming. In particular,
because of the way the underlying run-time system works,
uninitialised pointers would cause the garbage collector to core dump.
(and almost all ocaml values are represented by pointers).


-- 
John (Max) Skaller at OTT [Open Telecommications Ltd]
mailto:maxs@in.ot.com.au      -- at work
mailto:skaller@maxtal.com.au  -- at home



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-12 17:07       ` Pierre Weis
  2000-05-12 19:59         ` Hongwei Xi
@ 2000-05-14 14:37         ` John Max Skaller
  1 sibling, 0 replies; 26+ messages in thread
From: John Max Skaller @ 2000-05-14 14:37 UTC (permalink / raw)
  To: Pierre Weis; +Cc: Hongwei Xi, caml-list

Pierre Weis wrote:
 
> You should consider that there is a general initialisation function in
> the Array module, named Array.init, that allocates for you a fresh
> array then fill it with values obtained by calling an arbitrary
> supplied function:
> 
> # Array.init;;
> - : int -> f:(int -> 'a) -> 'a array = <fun>
> 
> Using it, you don't need to bother with any dummy initialization value:
> 
> let combine_arrays a b =
>   let alen = Array.length a in
>   let blen = Array.length b in
>   let init i = if i < alen then a.(i) else b.(i) in
>   Array.init (alen + blen) init
> 
> This code ensures the ``always initialized strategy'' of ML, and seems
> to me elegant and clear (but note that it uses higher-order
> functionality). Have I missed something ?

I think so: the init function is a callback; code which
'naturally' initialises an array by storing into it needs
to be 'control inverted': in general this is very hard,
if not impossible, to do by hand. The general case requires
the client to write an 'interpreter' for the initialisation
algorithm, maintaining state in the callback's closure.

If we consider a simple initialisation like:

	for i=0 to n-1 do x.(i) <- i done

the control inverse is

	fun i -> i

which is simple. However, for more complex algorithms,
such as a sort, this seems non-trivial. For example,
the naive functional solution to fibonacci:

	let rec fib i = match i with 
		| 0 | 1 -> 1 
		| _ -> fib (i-1) + fib (i-2)

works, but is unacceptably inefficient. The actual control
inverse of the usual procedural array initialisation algorithm,
which keeps track of the last two fibonacci numbers, requires
a wrapper function to create the storage in a closure:

	let fib_creator () =
		let a = ref 1 and b = ref 1 in
		let fib i = (* ignore the i *)
			let result = !a + !b in
			b := !a; a := result;
			result
		in fib

Here, fib_creator is a lazy data structure, and fib an iterator
over it. A 'purer' solution could be constructed
using streams I think.

I have written a tool which performs control inversion mechanically,
although in a different context (procedural code 'reading' messages
is control inverted to be event driven). There is some sort of
deep duality theorem here.

But I think the effect is: code which is easy to write procedurally
must be converted to use continuation passing, and that isn't
a trivial task.

[On the other hand, it is usually easy to write a dispatcher for
callback driven code, provided it is the control inverse of
a single thread of control]

So I think you are missing something: iter provides a reasonable
solution for only 'half' of those problems where the callback
driven style is 'natural'. In ocaml, the only way I'm aware of
to encode continuations 'naturally' requires using (hardware
or bytecode) threads.

[BTW: if we had this duality theorem, we would know how to
mix functional and stateful code in the same language seamlessly:
it would seem Charity offers some hints here]

-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-11 18:59     ` Hongwei Xi
  2000-05-12 17:07       ` Pierre Weis
  2000-05-13  7:07       ` Daniel de Rauglaudre
@ 2000-05-13  7:09       ` Daniel de Rauglaudre
  2 siblings, 0 replies; 26+ messages in thread
From: Daniel de Rauglaudre @ 2000-05-13  7:09 UTC (permalink / raw)
  To: Hongwei Xi; +Cc: caml-list

On Thu, May 11, 2000 at 02:59:16PM -0400, Hongwei Xi wrote:

> I want to combine two arrays into one. Here is the code in OCaml.
>...

Woops, I forgot another solution:

==========================
let combine_arrays = Array.append
==========================

-- 
Daniel de RAUGLAUDRE
daniel.de_rauglaudre@inria.fr
http://cristal.inria.fr/~ddr/



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-11 18:59     ` Hongwei Xi
  2000-05-12 17:07       ` Pierre Weis
@ 2000-05-13  7:07       ` Daniel de Rauglaudre
  2000-05-13  7:09       ` Daniel de Rauglaudre
  2 siblings, 0 replies; 26+ messages in thread
From: Daniel de Rauglaudre @ 2000-05-13  7:07 UTC (permalink / raw)
  To: Hongwei Xi; +Cc: caml-list

On Thu, May 11, 2000 at 02:59:16PM -0400, Hongwei Xi wrote:

> I want to combine two arrays into one. Here is the code in OCaml.
>  ...

Simple (library) solution:
=======================
let combine_arrays a b =
  let alen = Array.length a in
  let blen = Array.length b in
  Array.init (alen + blen)
    (fun i -> if i < alen then a.(i) else b.(i - alen))
=======================

Simple (custom) solution:
=======================
let combine_arrays a b =
  let alen = Array.length a in
  let blen = Array.length b in
  if alen = 0 && blen = 0 then [| |]
  else
    let c = Array.make (alen + blen) (if alen = 0 then b.(0) else a.(0)) in
    in begin
      for i = 0 to alen - 1 do
	c.(i) <- a.(i)
      done;
      for i = 0 to blen -1 do
	c.(alen + i) <- b.(i)
      done
    end
=======================

Dirty solution (don't tell them I told you):
=======================
let combine_arrays a b =
  let alen = Array.length a in
  let blen = Array.length b in
  let c = Array.make (alen + blen) (Obj.magic 0)
  in begin
    for i = 0 to alen - 1 do
      c.(i) <- a.(i)
    done;
    for i = 0 to blen -1 do
      c.(alen + i) <- b.(i)
    done
  end
=======================

-- 
Daniel de RAUGLAUDRE
daniel.de_rauglaudre@inria.fr
http://cristal.inria.fr/~ddr/



^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: reference initialization
  2000-05-11 14:28 Stephanie Weirich
@ 2000-05-12 20:38 ` Hongwei Xi
  2000-05-15  8:49   ` Xavier Leroy
  2000-05-15  9:36   ` Eijiro Sumii
  0 siblings, 2 replies; 26+ messages in thread
From: Hongwei Xi @ 2000-05-12 20:38 UTC (permalink / raw)
  To: Stephanie Weirich; +Cc: 'caml-list@inria.fr'

> The real difference between ML references and Java pointers is that ML
> separates "reference" from "possibly unitialized", while Java combines the
> two into one construct. It's not to difficult to implement Java-style
> pointers in ML, just use an option ref. For example:
> 
> type 'a ptr = a' option ref
> exception NullPointer
> let new () = ref None
> let get x = match !x with Some y -> y | None -> raise NullPointer
> let set x y = x := Some y 
> 
> ML, of course, lacks the syntactic support to use these pointers as
> gracefully as Java can. On the other hand, the problem with _Java_ is
> efficiency loss, as the programmer cannot syntactically enforce that the
> reference is initialized -- requiring a null check at every use.
> 
> -Stephanie

Yes, in theory, it requires null check at every use. However,
I assume that a marjority of such null checks can be readily
eliminated using flow analysis, though things can become difficult
when arrays are involved. Note that Java is imperative, which
makes flow analysis easier (compared to ML).

A common saying is that a program is made safer if references
are always initialized. I find this is contrary to the truth.
In this case, a program is made safer if you insert run-time
checks (and rely on a compiler to remove redundant checks)

If I use 'get' and 'set', is there a compiler to eliminate such
checks (i.e., after 'set', 'get' should do no tag checking)?

Yes, ML allows the programmer to tag values (using option
types) and thus shifts the burden to the programmer. In Java,
this is done automatically (and we can rely on a compiler to
remove redundant null checks). Which is better?

--Hongwei




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-12 17:07       ` Pierre Weis
@ 2000-05-12 19:59         ` Hongwei Xi
  2000-05-15  6:58           ` Max Skaller
  2000-05-14 14:37         ` John Max Skaller
  1 sibling, 1 reply; 26+ messages in thread
From: Hongwei Xi @ 2000-05-12 19:59 UTC (permalink / raw)
  To: Pierre Weis; +Cc: caml-list

> Not exactly, you just have to find one element into a. For instance:
>  let alen = Array.length a in
>  if alen = 0 then b (or Array.copy b if you need a fresh array) else
>  let ? = a.(0) in
>  let c = ...
> 
>   for i = 1 to alen - 1 do
>    ...

But, why? Why should we first initialize c with such a value.
I don't think you have made a program safer by doing so.

> # Array.init;;
> - : int -> f:(int -> 'a) -> 'a array = <fun>
> 
> Using it, you don't need to bother with any dummy initialization value:
> 
> let combine_arrays a b =
>   let alen = Array.length a in
>   let blen = Array.length b in
>   let init i = if i < alen then a.(i) else b.(i-alen) in
>   Array.init (alen + blen) init

This is great in this case. But suppose that I want to compute
the combinatorial numbers 'n chooses k' for k= 0,..,n.

let chooses (n) =
  let result = Array.array (n+1) ? in begin
    result.(0) <- 1; result.(n) <- 1;
    for i = 1 to n/2 do
      result.(i) <- result.(i-1) * (n-i+1) / i;
      result.(n-i) <- result.(i);
    done;
    result
  end
(* code is not tested *)

Certainly, we can replace ? with 0. But what is really achieved?
I would say it is simply an illusion that a program is made safer
by initializing each array upon its allocation. Actually, a program
is likely made faster but unsafer by doing so (since no nullity
checking is needed but a wrong value may be supplied to continue
a computation that should be aborted)

Best,

--Hongwei

\~~~~/ \\   //  \\    //    @       Mail: hwxi@ececs.uc.edu
C-o^o,  ))__||   \\__//_  // \\     Url: http://www.ececs.uc.edu/~hwxi
(  ^ )  ))__||    \--/-\\     \\    Tel: +1 513 871 4947 (home)
/ \V\   ))  ||     //   \\     \\   Tel: +1 513 556 4762 (office)
------ //   || o  //     \\     \\//Fax: +1 513 556 7326 (department)
Rhodes Hall 811-D
Department of ECE & CS  
University of Cincinnati
P. O. Box 210030
Cincinnati, OH 45221-0030

On Fri, 12 May 2000, Pierre Weis wrote:

> 
> > (2) If you really want to make sure that 'c' is well-initialized,
> > you should probably check this after those two loops. The question
> > is how to incorporate the checking result into the type system.
> > (3) If you initialize 'c' with a (wrong) value, it seems to me
> > that nothing is achieved.
> > (4) Also, the problem cannot be solved using option type.
> > 
> > This is a precise senario that I had in mind, where the kind of
> > mandatory array initialization in ML-like langugages is simply
> > inappropriate, isn't it?
> 
> You should consider that there is a general initialisation function in
> the Array module, named Array.init, that allocates for you a fresh
> array then fill it with values obtained by calling an arbitrary
> supplied function:
> 
> 
> This code ensures the ``always initialized strategy'' of ML, and seems
> to me elegant and clear (but note that it uses higher-order
> functionality). Have I missed something ?
> 
> Best regards,
> 
> PS: An even shorter version of combine_arrays should be
>     let combine_arrays = Array.append
> 
> Pierre Weis
> 
> INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://cristal.inria.fr/~weis/
> 
> 





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-11 18:59     ` Hongwei Xi
@ 2000-05-12 17:07       ` Pierre Weis
  2000-05-12 19:59         ` Hongwei Xi
  2000-05-14 14:37         ` John Max Skaller
  2000-05-13  7:07       ` Daniel de Rauglaudre
  2000-05-13  7:09       ` Daniel de Rauglaudre
  2 siblings, 2 replies; 26+ messages in thread
From: Pierre Weis @ 2000-05-12 17:07 UTC (permalink / raw)
  To: Hongwei Xi; +Cc: caml-list

> Okay, I withdraw my argument that the Java strategy is better then
> the ML strategy However, I'd like to use the following example
> to make my point clear.
> 
> I want to combine two arrays into one. Here is the code in OCaml.
> 
> let combine_arrays a b =
>   let alen = Array.length a in
>   let blen = Array.length b in
>   let c = Array.make (alen + blen) ?
>   in begin
>     for i = 0 to alen - 1 do
>       c.(i) <- a.(i)
>     done;
>     for i = 0 to blen -1 do
>       c.(alen + i) <- b.(i)
>     done
>   end
> 
> Of course, you need to provide ? to make the above code work.
> Here is my argument:
> 
> (1) If you try to provide ?, the code becomes repulsive.

Not exactly, you just have to find one element into a. For instance:
 let alen = Array.length a in
 if alen = 0 then b (or Array.copy b if you need a fresh array) else
 let ? = a.(0) in
 let c = ...

  for i = 1 to alen - 1 do
   ...

> (2) If you really want to make sure that 'c' is well-initialized,
> you should probably check this after those two loops. The question
> is how to incorporate the checking result into the type system.
> (3) If you initialize 'c' with a (wrong) value, it seems to me
> that nothing is achieved.
> (4) Also, the problem cannot be solved using option type.
> 
> This is a precise senario that I had in mind, where the kind of
> mandatory array initialization in ML-like langugages is simply
> inappropriate, isn't it?

You should consider that there is a general initialisation function in
the Array module, named Array.init, that allocates for you a fresh
array then fill it with values obtained by calling an arbitrary
supplied function:

# Array.init;;
- : int -> f:(int -> 'a) -> 'a array = <fun>

Using it, you don't need to bother with any dummy initialization value:

let combine_arrays a b =
  let alen = Array.length a in
  let blen = Array.length b in
  let init i = if i < alen then a.(i) else b.(i) in
  Array.init (alen + blen) init

This code ensures the ``always initialized strategy'' of ML, and seems
to me elegant and clear (but note that it uses higher-order
functionality). Have I missed something ?

Best regards,

PS: An even shorter version of combine_arrays should be
    let combine_arrays = Array.append

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://cristal.inria.fr/~weis/





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-11 13:58   ` Pierre Weis
@ 2000-05-11 18:59     ` Hongwei Xi
  2000-05-12 17:07       ` Pierre Weis
                         ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Hongwei Xi @ 2000-05-11 18:59 UTC (permalink / raw)
  To: Pierre Weis; +Cc: caml-list

Okay, I withdraw my argument that the Java strategy is better then
the ML strategy However, I'd like to use the following example
to make my point clear.

I want to combine two arrays into one. Here is the code in OCaml.

let combine_arrays a b =
  let alen = Array.length a in
  let blen = Array.length b in
  let c = Array.make (alen + blen) ?
  in begin
    for i = 0 to alen - 1 do
      c.(i) <- a.(i)
    done;
    for i = 0 to blen -1 do
      c.(alen + i) <- b.(i)
    done
  end

Of course, you need to provide ? to make the above code work.
Here is my argument:

(1) If you try to provide ?, the code becomes repulsive.
(2) If you really want to make sure that 'c' is well-initialized,
you should probably check this after those two loops. The question
is how to incorporate the checking result into the type system.
(3) If you initialize 'c' with a (wrong) value, it seems to me
that nothing is achieved.
(4) Also, the problem cannot be solved using option type.

This is a precise senario that I had in mind, where the kind of
mandatory array initialization in ML-like langugages is simply
inappropriate, isn't it?

Cheers,

--Hongwei

\~~~~/ \\   //  \\    //    @       Mail: hwxi@ececs.uc.edu
C-o^o,  ))__||   \\__//_  // \\     Url: http://www.ececs.uc.edu/~hwxi
(  ^ )  ))__||    \--/-\\     \\    
/ \V\   ))  ||     //   \\     \\   Tel: +1 513 556 4762 (office)
------ //   || o  //     \\     \\//Fax: +1 513 556 7326 (department)
 




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-10  4:50 ` reference initialization Hongwei Xi
  2000-05-11 13:58   ` Pierre Weis
@ 2000-05-11 16:02   ` John Prevost
  1 sibling, 0 replies; 26+ messages in thread
From: John Prevost @ 2000-05-11 16:02 UTC (permalink / raw)
  To: Hongwei Xi; +Cc: caml-list

>>>>> "hx" == Hongwei Xi <hwxi@ececs.uc.edu> writes:

    hx> I have given some thoughts on this one.

    hx> I find that it is really not a good design choice to
    hx> initialize every reference in all ML-like languages (yes, I
    hx> well realize the difficulty in not doing so). Here is my
    hx> argument.

The contrary argument is twofold.

First--if you want a reference which you may assign no initial value
to, you can use an option.  Certainly, this is yet another bit of
overhead.  (But the good thing is that unlike in Java, you have the
choice to always receive a not-null value, and therefore you don't
have to check all the time!)

Second--as for uninitialized values and loops and the like: try to
find a better way to do it.  I don't remember the last time I had to
use a reference in a loop to get what I wanted.  I hardly remember the
last time I used a reference.

Another reason to prefer this second approach (especially in a loop)
is that O'Caml's GC mechanism implies that destructive updates of heap
values are slower than variable changes, so the tail-recursive loop:

let test l =
  let rec loop n = function 
    | [] -> n
    | _ :: t -> loop (n + 1) t
  in loop 0 l

is more efficient than:

let test l =
  let l' = ref l in
  let n = ref 0 in

 let n = ref 0 in
  while !l' <> [] do
    incr n;
    l' := List.tl l'
  done;
  !n

even if you ignore the cost of repeatedly dereferencing--just because
of the cost of updating.  IMO, it's easier to read, too, though you
need to be used to the idiom.

Note that a loop like "for i = 1 to n" is as efficient (or more so,
possibly, I don't remember) as the equivalent tail recursive loop.
Why?  Because like the tail recursive call, all updates are temporary
storage in the stack, not values in the heap.


I have not seen the message you are replying to, so I don't know if it
addresses your specific concerns--but it might, I hope, clarify
something about ML style type systems that I find very important: the
ability to *choose* what features a variable has.  (Optional value,
mutable value) rather than always having to deal with optional mutable
values.

John.




^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: reference initialization
@ 2000-05-11 14:28 Stephanie Weirich
  2000-05-12 20:38 ` Hongwei Xi
  0 siblings, 1 reply; 26+ messages in thread
From: Stephanie Weirich @ 2000-05-11 14:28 UTC (permalink / raw)
  To: 'Hongwei  Xi', 'caml-list@inria.fr'



> -----Original Message-----
> From: Hongwei Xi [mailto:hwxi@ececs.uc.edu]
> Sent: Wednesday, May 10, 2000 12:50 AM
> To: caml-list@inria.fr
> Subject: reference initialization
> 
> 
> > Wrong. You have references, which are quite better than pointers
> > (they are typed, and necessarily initialized)
> 
> I have given some thoughts on this one.
> 
[... parts elided ...]
> 
> Can you still say that the ML strategy is better than the Java
> strategy? I thus argue that it is better using dynamic checking
> to detect reading from uninitialized reference than simply
> assigning a value to every reference upon its creation.
> 
> To summarize, my bottom line question is: what is really achieved
> by assigning a reference a (wrong) initial value? Isn't this just
> like an ostrich solving its problem by burying its head in sand?
> 
> Of course, another problem with the ML strategy is efficiency loss
> (which, though, is often negligible as discussed here before)

The real difference between ML references and Java pointers is that ML
separates "reference" from "possibly unitialized", while Java combines the
two into one construct. It's not to difficult to implement Java-style
pointers in ML, just use an option ref. For example:

type 'a ptr = a' option ref
exception NullPointer
let new () = ref None
let get x = match !x with Some y -> y | None -> raise NullPointer
let set x y = x := Some y 

ML, of course, lacks the syntactic support to use these pointers as
gracefully as Java can. On the other hand, the problem with _Java_ is
efficiency loss, as the programmer cannot syntactically enforce that the
reference is initialized -- requiring a null check at every use.

-Stephanie




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: reference initialization
  2000-05-10  4:50 ` reference initialization Hongwei Xi
@ 2000-05-11 13:58   ` Pierre Weis
  2000-05-11 18:59     ` Hongwei Xi
  2000-05-11 16:02   ` John Prevost
  1 sibling, 1 reply; 26+ messages in thread
From: Pierre Weis @ 2000-05-11 13:58 UTC (permalink / raw)
  To: Hongwei Xi; +Cc: caml-list

> I find that it is really not a good design choice to initialize
> every reference in all ML-like languages (yes, I well realize
> the difficulty in not doing so). Here is my argument.
[...]

> (1) ML strategy: initialize x with any value 'v' of an appropriate
[...]

> (2) Java Strategy: let us say we have the same scenario as before
> but 'x' is not initialized. If I read from x before assigning a value
[...]

> Can you still say that the ML strategy is better than the Java
> strategy?

Yes, I can.

> I thus argue that it is better using dynamic checking
> to detect reading from uninitialized reference than simply
> assigning a value to every reference upon its creation.
> 
> To summarize, my bottom line question is: what is really achieved
> by assigning a reference a (wrong) initial value? Isn't this just
> like an ostrich solving its problem by burying its head in sand?
> 
> Of course, another problem with the ML strategy is efficiency loss
> (which, though, is often negligible as discussed here before)

The ML style is better, since the default regime is to initialize
references and in most cases, you perfectly know the correct
initialization value at creation time.

For the few remaining cases, where you cannot guess the initial value,
then you must use another scheme that indeed involves dynamic testing;
I will not consider initialization with ``any value 'v' of an
appropriate type'' as a receivable solution, since this ``solution''
is far too error prone (as you had already mentioned).

Another well-known trick to initialize references is to use
options. At creation time you define the reference as being ``ref
None'', and later assign it to ``Some v'', when v can be computed.

Then in your program you have to pattern match the contents of the
reference to get its value, and raise an exception if necessary,
getting the Java semantics.

To elegantly handle this new scheme, you may even write a new
dereferencing prefix operator:

let ( !! ) r =
 match !r with
 | None -> invalid_arg "not yet initialized"
 | Some v -> v;;

Then you just substitute !!r to !r into your program.

One step further is to aly define a new assignment operator to hide the
``Some'' manipulations:

let ( =: ) r v = r := Some v;; 

Now, for instance:

let r = ref None;;

r =: 1;;

r =: 2 + !!r;;

This way, the ``cannot initialize my reference at creation time''
problem, is somewhat solved, but still evident in your code (and this
is a good property, since there is a real difficulty in your source
code, that can raise an exception when accessing the ``r'' reference).

Conclusion: I prefer the ML style, which elegantly solves the common
case and let you encode fairly simply the hairy cases. The efficiency
loss problem deceives the same remark and the same solution: there is
no loss in the most common case (in that case you know what is the
initial value and have to compute it anyway), and in the hairy cases
you can use the None/Some trick, an this hairy case is evident in the
source code.

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://cristal.inria.fr/~weis/





^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: reference initialization
@ 2000-05-11 13:48 Dave Berry
  0 siblings, 0 replies; 26+ messages in thread
From: Dave Berry @ 2000-05-11 13:48 UTC (permalink / raw)
  To: Hongwei Xi, caml-list

IMO, the "ML strategy" is to use an option type.  SML has 
	datatype 'a option = NONE | SOME of 'a
although I prefer 
	datatype 'a option = NULL | VALUE of 'a.
I don't know if Caml provides such a type as standard, although
it's easy to define your own.

Then your code checks whether the value has been assigned (dynamically,
as you require), and can take whatever action is appropriate.  If you want
to throw an exception, you can.

With this approach, the ML type system tells you which values may be null.
This has an advantage over the Java approach, where any value may be
null or uninitialised.

Dave.

-----Original Message-----
From: Hongwei Xi [mailto:hwxi@ececs.uc.edu]
Sent: Wednesday, May 10, 2000 5:50 AM
To: caml-list@inria.fr
Subject: reference initialization


> Wrong. You have references, which are quite better than pointers
> (they are typed, and necessarily initialized)

Suppose I use a reference 'x'. If I know what the initial value
of 'x' should be, I'll of course prefer to initialize it with that
value. Now suppose I don't, that is, I intend to assign a value
to 'v' later (maybe in a loop or in a conditional branch)

(1) ML strategy: initialize x with any value 'v' of an appropriate
type (sometimes, such a 'v' is difficult to find or takes time to
construct (consider 'x' to be a large array)).




^ permalink raw reply	[flat|nested] 26+ messages in thread

* reference initialization
  2000-04-25 18:16 Caml wish list Pierre Weis
@ 2000-05-10  4:50 ` Hongwei Xi
  2000-05-11 13:58   ` Pierre Weis
  2000-05-11 16:02   ` John Prevost
  0 siblings, 2 replies; 26+ messages in thread
From: Hongwei Xi @ 2000-05-10  4:50 UTC (permalink / raw)
  To: caml-list

> Wrong. You have references, which are quite better than pointers
> (they are typed, and necessarily initialized)

I have given some thoughts on this one.

I find that it is really not a good design choice to initialize
every reference in all ML-like languages (yes, I well realize
the difficulty in not doing so). Here is my argument.

The common wisdom is that if we initialize every reference
immediately after it is created then we shall never read from
an uninitialized reference. But this is a lame argument.

Suppose I use a reference 'x'. If I know what the initial value
of 'x' should be, I'll of course prefer to initialize it with that
value. Now suppose I don't, that is, I intend to assign a value
to 'v' later (maybe in a loop or in a conditional branch)

(1) ML strategy: initialize x with any value 'v' of an appropriate
type (sometimes, such a 'v' is difficult to find or takes time to
construct (consider 'x' to be a large array)).

Now suppose I make a mistake, forgetting to assign a value to 'x'
later. What happens is that the computation now happily use the
*wrong* initial value and sings the song "well-typed-program-can-
never-go-wrong" until a (most likely) wrong answer or exception is
returned.

(2) Java Strategy: let us say we have the same scenario as before
but 'x' is not initialized. If I read from x before assigning a value
to 'x', the execution abnormally stops (yes, we need run-time checking).
I think this is much better than to carry the execution further until a
wrong answer or exception is returned.

Can you still say that the ML strategy is better than the Java
strategy? I thus argue that it is better using dynamic checking
to detect reading from uninitialized reference than simply
assigning a value to every reference upon its creation.

To summarize, my bottom line question is: what is really achieved
by assigning a reference a (wrong) initial value? Isn't this just
like an ostrich solving its problem by burying its head in sand?

Of course, another problem with the ML strategy is efficiency loss
(which, though, is often negligible as discussed here before)

--Hongwei Xi

\~~~~/ \\   //  \\    //    @       Mail: hwxi@ececs.uc.edu
C-o^o,  ))__||   \\__//_  // \\     Url: http://www.ececs.uc.edu/~hwxi
(  ^ )  ))__||    \--/-\\     \\    
/ \V\   ))  ||     //   \\     \\   Tel: +1 513 556 4762 (office)
------ //   || o  //     \\     \\//Fax: +1 513 556 7326 (department)






^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2000-05-24  8:05 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-05-20 20:13 reference initialization Simon Peyton-Jones
2000-05-22 16:10 ` Pierre Weis
  -- strict thread matches above, loose matches on Subject: below --
2000-05-11 14:28 Stephanie Weirich
2000-05-12 20:38 ` Hongwei Xi
2000-05-15  8:49   ` Xavier Leroy
2000-05-15 17:47     ` Hongwei Xi
2000-05-15 21:33       ` Pierre Weis
2000-05-16  2:53         ` Hongwei Xi
2000-05-18 16:16           ` Pierre Weis
2000-05-19  6:54             ` Max Skaller
2000-05-22 15:28               ` Pierre Weis
2000-05-22 22:29                 ` Max Skaller
2000-05-15 22:20       ` Dave Mason
2000-05-15  9:36   ` Eijiro Sumii
2000-05-11 13:48 Dave Berry
2000-04-25 18:16 Caml wish list Pierre Weis
2000-05-10  4:50 ` reference initialization Hongwei Xi
2000-05-11 13:58   ` Pierre Weis
2000-05-11 18:59     ` Hongwei Xi
2000-05-12 17:07       ` Pierre Weis
2000-05-12 19:59         ` Hongwei Xi
2000-05-15  6:58           ` Max Skaller
2000-05-15 17:56             ` Hongwei Xi
2000-05-14 14:37         ` John Max Skaller
2000-05-13  7:07       ` Daniel de Rauglaudre
2000-05-13  7:09       ` Daniel de Rauglaudre
2000-05-11 16:02   ` John Prevost

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).