Bigarrays and temporar C pointers

caml-list - the Caml user's mailing list

* Bigarrays and temporar C pointers
From: Daniel Bünzli @ 2005-01-15 23:40 UTC
  To: caml-list caml-list

Hello,

Suppose that I have a C library which gives access to data through a
temporary pointer, with the following interface:

> void *map(void);  /* Returns a valid pointer to data */
> int map_size(void);  /* Returns the size of the data in bytes. */
> void unmap(void);  /* Invalidates the last pointer returned by map. */

The data must only stay mapped for a short period of time: map, get
the size, process the data, and unmap.

I would like to be able to process the data in OCaml with bigarrays. To
do so I provide the OCaml function `map'. This function maps the
pointer, passes it as a bigarray to a user callback that processes the
data, and then unmaps the pointer.

> open Bigarray;;
>
> type ('a, 'b) data = ('a, 'b, c_layout) Array1.t
>
> val map : ('a, 'b) kind -> (('a, 'b) data -> unit) -> unit

Map is implemented as follows (the C primitives are at the end of the mail):

> external _map_ptr : ('a, 'b) kind -> ('a, 'b) data = "stub_map_ptr"
> external _unmap_ptr : ('a, 'b) data -> unit = "stub_unmap_ptr"
>
> let map k f =
>   let a = _map_ptr k in
>   f a;
>   _unmap_ptr a

My problem is that the provided bigarray may escape the scope of the
user callback (e.g. if the user stores it in a global reference),
potentially allowing the user to access data through the pointer after
it has been invalidated.

In fact, for the bigarray itself this is not a problem: I set its
dimension to zero when I unmap it in _unmap_ptr (see the C
implementation below), so accesses outside the user callback raise
exceptions. However, according to my experiments and my wanderings
through the bigarray implementation, this doesn't work if the user
extracts a subarray with Array1.sub and stores it in a global variable.
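The sharing behind this can be observed from OCaml alone, with no C stubs involved. In the OCaml runtime, Array1.sub allocates a new bigarray value that points into the parent's storage and records its own dimension, which is why clearing the parent's data pointer and dimension in stub_unmap_ptr does not reach the sub-array. The sketch below (the function name sub_shares_storage is made up for illustration) writes through a sub-array and reads the change back from the parent:

```ocaml
open Bigarray

(* Take a 3-element view of an 8-element array, starting at index 2, and
   write through the view. Returns what the parent now holds at index 2
   together with the view's own dimension. *)
let sub_shares_storage () =
  let parent = Array1.create float64 c_layout 8 in
  Array1.fill parent 0.0;
  let view = Array1.sub parent 2 3 in
  view.{0} <- 42.0;
  (parent.{2}, Array1.dim view)

let () =
  let seen, d = sub_shares_storage () in
  Printf.printf "parent.{2} = %g, Array1.dim view = %d\n" seen d
```

It prints `parent.{2} = 42, Array1.dim view = 3`: the view reads and writes the parent's memory but answers with its own dimension, so zeroing the parent's dimension leaves the view fully usable.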

Is there a way to make this completely safe, or can I only warn users
not to let data escape from the callback?

Thanks for your help,

Daniel

The implementation of the C primitives:

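> #include <caml/mlvalues.h>
> #include <caml/bigarray.h>
> /* plus the C library's header declaring map, map_size and unmap */
>
> extern int bigarray_element_size[]; /* bigarray_stubs.c */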
>
> CAMLprim value stub_map_ptr (value kind)
> {
>   void *p = map ();
>   long dim = map_size () / bigarray_element_size[Int_val (kind)];
>   int flag = Int_val (kind) | BIGARRAY_C_LAYOUT | BIGARRAY_EXTERNAL;
>   return alloc_bigarray(flag, 1, p, &dim);
> }
>
> CAMLprim value stub_unmap_ptr (value b)
> {
>   struct caml_bigarray *arr = Bigarray_val(b);
>   arr->data = NULL;
>   arr->dim[0] = 0;
>   unmap();
>   return Val_unit;
> }


* Re: [Caml-list] Bigarrays and temporar C pointers
From: John Prevost @ 2005-01-16  2:28 UTC
  To: Daniel Bünzli; +Cc: caml-list caml-list

Well, assuming you really need to work in this strange way, I have a
couple of thoughts on how to do it.  Note that it's going to be rather
unsound to work with this in any case, but at least you will get
exceptions instead of core dumps or worse.  The heart of the matter is
that you should *not* allow the user to manipulate the data array
directly.

module Scary_map_thingy_1 =
 (struct
    exception Scary_map_unmapped
    exception Scary_map_conflict
    (* Some array while mapped, None once the callback has returned. *)
    type ('a, 'b) t = ('a, 'b, c_layout) Array1.t option ref
    let dim a = match !a with None -> raise Scary_map_unmapped
                            | Some a' -> Array1.dim a'
    (* replicate other functionality from Array1 below *)

    let already_mapped = ref false
    let map k f =
      if !already_mapped then raise Scary_map_conflict else
      let arr = _map_ptr k in
      let a = ref (Some arr) in
      (* Unmap and invalidate the handle on every exit path. *)
      let release () =
        _unmap_ptr arr;
        a := None;
        already_mapped := false in
      already_mapped := true;
      (try f a with exn -> release (); raise exn);
      release ()
  end : sig
    type ('a, 'b) t
    exception Scary_map_unmapped (* It escaped scope *)
    exception Scary_map_conflict (* Tried to map inside map *)
    val dim : ('a, 'b) t -> int
    (* rest of replicated API *)
    val map : ('a, 'b) kind -> (('a, 'b) t -> unit) -> unit
  end)

So the approach here is to wrap the value in such a way that it
doesn't matter if it escapes its scope.  This is not really any better
than what you have now: you still have to warn the user *not* to let
it escape, since it won't work.  But the benefit is that it is
guaranteed to fail, with Scary_map_unmapped, if the user tries it.
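A runnable way to check that claim is to simulate the C side in pure OCaml. The sketch below is a minimal, flattened version of the first approach; fake_map_ptr and fake_unmap_ptr are hypothetical stand-ins for the stub_map_ptr/stub_unmap_ptr primitives (they allocate and ignore an ordinary bigarray), and the callback deliberately smuggles its handle out through a reference:

```ocaml
open Bigarray

(* Hypothetical stand-ins for stub_map_ptr / stub_unmap_ptr: "mapping"
   allocates a small array and "unmapping" does nothing. *)
let fake_map_ptr k = Array1.create k c_layout 4
let fake_unmap_ptr (_ : ('a, 'b, c_layout) Array1.t) = ()

exception Unmapped

(* The handle passed to the callback; set to None once the callback returns. *)
type ('a, 'b) handle = ('a, 'b, c_layout) Array1.t option ref

let dim (h : ('a, 'b) handle) =
  match !h with
  | None -> raise Unmapped
  | Some a -> Array1.dim a

let map k (f : ('a, 'b) handle -> unit) =
  let arr = fake_map_ptr k in
  let h = ref (Some arr) in
  let release () = fake_unmap_ptr arr; h := None in
  (try f h with exn -> release (); raise exn);
  release ()

(* A callback that lets its handle escape, then a probe of the escaped copy. *)
let escape_is_caught () =
  let escaped = ref None in
  map float64 (fun h -> escaped := Some h);
  match !escaped with
  | None -> false
  | Some h -> (try ignore (dim h); false with Unmapped -> true)

let () = Printf.printf "escaped handle caught: %b\n" (escape_is_caught ())
```

It prints `escaped handle caught: true`: the escaped reference still exists, but every access through it raises instead of touching an invalid pointer.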

The second approach is to prevent the user from accessing the data directly:

module Scary_map_thingy_2 =
 (struct
    exception Scary_map_conflict
    let already_mapped = ref false
    let map k f =
      if !already_mapped then raise Scary_map_conflict else
      let a = _map_ptr k in
      (* Unmap on every exit path, including exceptions raised by f. *)
      let release () =
        _unmap_ptr a;
        already_mapped := false in
      already_mapped := true;
      (try
         for i = 0 to Array1.dim a - 1 do
           f i a.{i}
         done
       with exn -> release (); raise exn);
      release ()
  end : sig
    exception Scary_map_conflict (* Tried to map inside map *)
    val map : ('a, 'b) kind -> (int -> 'a -> unit) -> unit
  end)

In this second case, the approach is to prevent the caller from ever
getting a handle on the actual data array.  Instead, the data is
mapped, the caller is handed every (index, value) pair from the array
in turn, and then the data is unmapped.  This is much more restrictive,
but also much safer.
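Because the callback in this second approach returns unit, any result has to be accumulated in a reference the caller owns. A hypothetical fold-style variant keeps the same guarantee (the callback only ever sees an index and a scalar) while returning a value. In this sketch the C side is again faked in OCaml: fake_map_ptr hands back a four-element float64 array holding 1.0 to 4.0, and fold_mapped is an illustrative name, not part of any library:

```ocaml
open Bigarray

(* Hypothetical stand-ins for the C stubs: "mapping" returns a small
   array holding 1.0, 2.0, 3.0, 4.0; "unmapping" is a no-op. *)
let fake_map_ptr () =
  let a = Array1.create float64 c_layout 4 in
  for i = 0 to 3 do a.{i} <- float_of_int (i + 1) done;
  a
let fake_unmap_ptr (_ : (float, float64_elt, c_layout) Array1.t) = ()

(* Fold over the mapped data: the callback only ever sees an index and a
   scalar value, never the array itself, so nothing can escape. The data
   is unmapped on both the normal and the exceptional exit path. *)
let fold_mapped (f : 'acc -> int -> float -> 'acc) (init : 'acc) : 'acc =
  let a = fake_map_ptr () in
  let acc = ref init in
  (try
     for i = 0 to Array1.dim a - 1 do
       acc := f !acc i a.{i}
     done
   with exn -> fake_unmap_ptr a; raise exn);
  fake_unmap_ptr a;
  !acc

let () = Printf.printf "sum = %g\n" (fold_mapped (fun s _ x -> s +. x) 0.0)
```

Summing the elements prints `sum = 10`. The result type is chosen by the caller, but nothing the callback receives can be used to reach the array after it is unmapped.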

Finally, this kind of approach might be best if mapping and unmapping
are not particularly expensive, and you trust the user to behave well:

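module Scary_map_thing =
 (struct
    (* Placeholder kind: replace int / int_elt / Bigarray.int with the one
       kind you actually need. *)
    let data = (ref None : (int, int_elt, c_layout) Array1.t option ref)
    let hold_count = ref 0
    let current () =
      match !data with
        | Some a -> a
        | None -> invalid_arg "Scary_map_thing: not mapped"
    let hold () =
      (* Map first, so a failed map leaves the count unchanged. *)
      (match !data with
        | Some _ -> ()
        | None -> data := Some (_map_ptr Bigarray.int));
      incr hold_count
    let unhold () =
      decr hold_count;
      if !hold_count = 0 then begin
        _unmap_ptr (current ());
        data := None
      end
    let work f =
      hold ();
      let result = try f () with exn -> (unhold (); raise exn) in
      unhold ();
      result
    let dim () = work (fun () -> Array1.dim (current ()))
    let get i = work (fun () -> (current ()).{i})
    let set i x = work (fun () -> (current ()).{i} <- x)
    (* rest of modified Array1 calls here *)
  end : sig
    val work : (unit -> 'a) -> 'a
    val dim : unit -> int
    val get : int -> int
    val set : int -> int -> unit
    (* rest of modified Array1 calls *)
  end)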

In this last approach, instead of wrapping the array up in a data
structure, we wrap it up in a module.  The module either has a
currently mapped copy of the data, or it doesn't.  If you call
Scary_map_thing.dim (), you get the dimension of the data no matter
what: if the data was unmapped when you called dim, it is mapped, the
dimension is read, and then it is unmapped again.  If you have a *lot*
of work to do and wish to avoid mapping and unmapping constantly, you
can wrap your function up in work like this:

  let myfunc () =
    Scary_map_thing.work (fun () ->
      for i = 0 to Scary_map_thing.dim () - 1 do
        Scary_map_thing.set i (Scary_map_thing.get i + 1)
      done)

This maps the data once, uses it a lot, then unmaps it at the end.
This version also doesn't fail if you call a function that tries to
map while already mapped; it just increments a counter.
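The counting behaviour is easy to check with a flat, self-contained simulation of the third approach. It assumes a single int kind and replaces the C primitives with hypothetical fakes: fake_map_ptr returns the same backing array every time and counts its calls, and fake_unmap_ptr does nothing. As in the module above, the mapping is performed before hold_count is incremented, so a failed map does not leave the counter stuck:

```ocaml
open Bigarray

(* Hypothetical stand-ins for the C stubs. The fake "mapping" always hands
   back the same backing store and counts how often it is called. *)
let backing = Array1.create Bigarray.int c_layout 5
let () = Array1.fill backing 0
let map_calls = ref 0
let fake_map_ptr () = incr map_calls; backing
let fake_unmap_ptr (_ : (int, int_elt, c_layout) Array1.t) = ()

(* State: the currently mapped array (if any) and how many callers hold it. *)
let data : (int, int_elt, c_layout) Array1.t option ref = ref None
let hold_count = ref 0

let current () =
  match !data with
  | Some a -> a
  | None -> invalid_arg "current: data is not mapped"

let hold () =
  (match !data with
   | Some _ -> ()
   | None -> data := Some (fake_map_ptr ()));
  incr hold_count

let unhold () =
  decr hold_count;
  if !hold_count = 0 then begin
    fake_unmap_ptr (current ());
    data := None
  end

(* Run f with the data mapped; nested calls only bump the counter. *)
let work f =
  hold ();
  let result = try f () with exn -> (unhold (); raise exn) in
  unhold ();
  result

let dim () = work (fun () -> Array1.dim (current ()))
let get i = work (fun () -> (current ()).{i})
let set i x = work (fun () -> (current ()).{i} <- x)

(* The myfunc example: one mapping for the whole loop. *)
let increment_all () =
  work (fun () ->
    for i = 0 to dim () - 1 do set i (get i + 1) done)

let () =
  increment_all ();
  Printf.printf "map calls: %d, backing.{0} = %d\n" !map_calls backing.{0}
```

It prints `map calls: 1, backing.{0} = 1`: the whole loop in increment_all runs under a single mapping, whereas each standalone call to dim () outside work maps and unmaps on its own.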

This third solution may be the best one overall, especially because
there are a number of ways it can be improved.  For example:

  * If you need to be able to map as multiple different kinds, then
    you can provide more useful state in the code, so that you track
    what kind it is currently mapped as, and adjust the mapping as
    needed in order to work safely.  In this case, Scary_map_thing.dim
    would have the type ('a, 'b) kind -> int, and likewise all of the
    modified Array1 calls would take kinds instead of unit or actual
    arrays.

  * If you want to avoid mapping and unmapping in a more general way, and
    use threads, you could start up a worker thread in the module that
    keeps track of the last time the data was used, and unmaps it if
    it hasn't been used in a certain amount of time.

Finally, please note that none of the skeletal solutions I describe
above are thread-safe.  If more than one thread can be working at a
time (like with the unmap-on-timeout extension), you need to be more
careful about modifying the internal state.  Note that I think
solution 3 is the only one that can cleanly handle threads at all,
since it's the only one that can handle multiple people wanting to
work with the data all at once.

Hope these ideas were useful,

John.


* Re: [Caml-list] Bigarrays and temporar C pointers
From: Daniel Bünzli @ 2005-01-16 12:31 UTC
  To: John Prevost; +Cc: caml-list caml-list

On 16 Jan. 05, at 03:28, John Prevost wrote:

> Well, assuming you really need to work in this strange way,
Yes, some parts of OpenGL work like this: vertex buffer objects [1]. In
fact there are many more things that are not allowed while a buffer
is mapped, and it is not possible to enforce every constraint (however,
most, if not all, of these errors just lead to GL errors, not to core
dumps).

Anyway, thanks for your time and code, especially for the handling of
exceptions occurring in the callback, which I had completely forgotten.
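For reference, OCaml 4.08 and later provide Fun.protect, which packages exactly this cleanup pattern (the thread predates it). A minimal sketch of the original map written that way, with hypothetical fake stubs standing in for the C primitives and a mapped flag to make the effect visible:

```ocaml
open Bigarray

(* Hypothetical stand-ins for the C stubs; [mapped] records whether the
   fake pointer is currently mapped. *)
let mapped = ref false
let fake_map_ptr k = mapped := true; Array1.create k c_layout 4
let fake_unmap_ptr (_ : ('a, 'b, c_layout) Array1.t) = mapped := false

(* Same shape as the original map, but the unmap runs on every exit path,
   including when the callback raises. *)
let map k f =
  let a = fake_map_ptr k in
  Fun.protect ~finally:(fun () -> fake_unmap_ptr a) (fun () -> f a)

let () =
  (try map float64 (fun _ -> failwith "callback failed")
   with Failure _ -> ());
  Printf.printf "still mapped after exception: %b\n" !mapped
```

It prints `still mapped after exception: false`. Unlike the try ... with versions, this map also returns the callback's result, so its type is ('a, 'b) kind -> (('a, 'b, c_layout) Array1.t -> 'r) -> 'r.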

In fact I hadn't considered making the type, as you suggest,

> type ('a, 'b) data = ('a, 'b, c_layout) Array1.t

abstract and replicating the "allowed" functionality of Array1 in the
module. I hope that won't prevent the bigarray optimisations present
in the compiler.

> let set = Array1.set
> let get = Array1.get
> ...

The only problem I see is that the user loses the ability to use
existing bigarray code and the lighter a.{i} syntax to read and write
the array. On the other hand I can prevent the user from extracting
subarrays and I'm on the safe side again.
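A minimal sketch of that tradeoff, assuming only dim, get and set are worth exporting: with the array type made abstract, client code simply has no way to call Array1.sub, Array1.blit or similar on a mapped handle. The entry point with_floats is a hypothetical stand-in for the C-backed map (it allocates a zero-filled float64 array). Abstraction alone does not stop a handle from escaping, since the callback can still return or store it; it closes the sub-array loophole and is meant to be combined with invalidation on unmap. Whether ocamlopt compiles Mapped.get and Mapped.set as efficiently as direct a.{i} accesses is not something this sketch establishes.

```ocaml
open Bigarray

(* Only dim, get and set are exported; the array type is abstract, so
   Array1.sub, Array1.blit and friends cannot be applied to a handle. *)
module Mapped : sig
  type ('a, 'b) t
  val dim : ('a, 'b) t -> int
  val get : ('a, 'b) t -> int -> 'a
  val set : ('a, 'b) t -> int -> 'a -> unit
  (* Hypothetical entry point standing in for the C-backed map. *)
  val with_floats : int -> ((float, float64_elt) t -> 'r) -> 'r
end = struct
  type ('a, 'b) t = ('a, 'b, c_layout) Array1.t
  let dim = Array1.dim
  let get = Array1.get
  let set = Array1.set
  let with_floats n f =
    let a = Array1.create float64 c_layout n in
    Array1.fill a 0.0;
    f a
end

(* Client code: it can only use the exported operations. *)
let total n =
  Mapped.with_floats n (fun m ->
    for i = 0 to Mapped.dim m - 1 do Mapped.set m i (float_of_int i) done;
    let s = ref 0.0 in
    for i = 0 to Mapped.dim m - 1 do s := !s +. Mapped.get m i done;
    !s)

let () = Printf.printf "total 4 = %g\n" (total 4)
```

Here total 4 fills the array with 0.0 to 3.0 and prints `total 4 = 6`; any attempt to pass m to Array1.sub is rejected by the type checker.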

There's a tradeoff and I cannot make up my mind right now.

Daniel

[1] <http://oss.sgi.com/projects/ogl-sample/registry/ARB/vertex_buffer_object.txt>
