caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Re: ocaml limitations
@ 1999-12-13 19:18 David McClain
  0 siblings, 0 replies; 5+ messages in thread
From: David McClain @ 1999-12-13 19:18 UTC (permalink / raw)
  To: caml-list, John Carol Langford

I have implemented an extension to OCAML for large numeric arrays. This is a
C module (actually C++) and the measurements that I have made for it show
only a 10% speed penalty compared to native OCAML arrays. Of course it
depends on how you make this measurement. The overhead is confined to an
initial exploration of the type of datum, and so it can be amortized quite
nicely when you really do have large arrays. My measurements were made by
faking the maximum allowable OCAML array size to some small value like 100
elements or so, so that the external extension would get exercised. Under
this circumstance the overhead was measured at 10-11%.  BTW this code is on
the Hump and available from there to my Web site.

- David McClain

-----Original Message-----
From: John Carol Langford <jcl@gs176.sp.cs.cmu.edu>
To: caml-list@inria.fr <caml-list@inria.fr>
Date: Monday, December 13, 1999 8:25 AM
Subject: ocaml limitations


>I have been encountering some fundamental limitations of the ocaml
>language and compiler that are killing my performance - to the tune of
>a factor of 10 off equivalent C code.  This is a serious problem
>because the program I'm working on is both RAM and cpu intensive.
>
>The performance problems from two limitations.  The first is in the
>compiler and runtime - it's the limitation on the array size on 32 bit
>machines - I only have linux PC's available to work on.  It appears
>that the garbage collector needs some type information to work with
>arrays and enough bits are set aside for type information that not
>enough bits are allowed to specify a large array size.  You can get
>around this large array size problem by simulating a large array with
>an array of arrays, but there is a significant performance penalty.
>
>The second problem is a language failure - there is no 'short int'
>type in ocaml.  Due to the combinatorics of my problem it would be
>very convenient to use 16 bit integers.  Using 32 bit integers instead
>doubles the footprint of the program - which is unacceptable in this
>case.  Consequently, I simulate 16 bit integers using masking games -
>which again incurs a performance penalty.
>
>These two problems together add up to using a function considerably
>more complicated then an array dereference on the inner loop:
>
>let get_first i = i land (num_array -1)
>let get_second i = i lsr (log_num_array+1)
>let get_third i = (i lsr log_num_array) land 1
>
>let get_split split i =
>  let first_index = get_first i
>  and second_index = get_second i
>  and third_index = get_third i in
>  let ret = split.(first_index).(second_index) in
>  let real_ret =
>    if third_index = 1 then ret
>    else ret lsr 16 in
>  real_ret land 65535
>
>Naturally, the performance (relative to what the hardware is capable
>of) is terrible.  Consequently, I'm wondering if there are plans to
>remove either (or both) of these limitations in the near future and
>lacking that if there are better workarounds.  Any suggestions?
>
>-John
>




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: ocaml limitations
  1999-12-14 12:31 ` Xavier Leroy
@ 1999-12-15  3:06   ` John Carol Langford
  0 siblings, 0 replies; 5+ messages in thread
From: John Carol Langford @ 1999-12-15  3:06 UTC (permalink / raw)
  To: Xavier Leroy, caml-list, David McClain


I grabbed David McClain's code, figured out how to use it, then
stripped it down to the minimal set necessary for what I want to do.
Here are the timings for a little benchmark I made:  (P-120, linux)

Time when redirecting to a c short array: ~40 seconds

Time when "arena"ing a c long array and playing masking games to get
simulated shorts: 25.68 seconds

Time to simulate the long array and play masking games in pure ocaml:
54.90 seconds

Time for native C: 7.13 seconds

'arena' was a big win.  Without the need to play shift & mask games
'arena' will get you down to 14 seconds or so.  The remaining
factor of 2 is probably loop optimizations in the C compiler.

-John

The benchmark in ocaml with 'arena' and short simulation:
let max = 1 lsl 23

let big = General_array.foreign_unsafe_create max 

let set arr loc valu = 
  let real_loc = loc / 2 in
  let mixed_val = Array.unsafe_get arr real_loc in
  let final_val = 
    if loc mod 2 = 1 then (mixed_val land 4294901760) lor valu
    else (mixed_val land 65535) lor (valu lsl 16) in
  Array.unsafe_set arr real_loc final_val

let get arr loc = 
  let ret = Array.unsafe_get arr (loc/2) in
  let real_ret = 
    if loc mod 2 = 1 then ret
    else ret lsr 16 in
  real_ret land 65535
    
let _ = 
  let accum =ref 0 in
  let ml_big = General_array.arena big in
  for i = 0 to max -1 do
    set ml_big i i
  done;
  for j = 0 to 9 do
    for i = 0 to max -1 do
      accum := get ml_big i
    done;
  done;
  print_int !accum; print_newline()

The benchmark in c:
int main(void)
{
  int i,j;
  int max = 1 << 23;
  int accum = 0;
  short int *big = (short int *)malloc(sizeof(short) *max);
  
  for(i=0;i<max-1;i++)
    big[i] = (short)i;
  for(j=0; j<10; j++)
    for(i=0;i<max-1;i++)
      accum = big[i];
  printf("%i\n",accum);
  return 0;
}

The support code:
// foreign_array.c -- Implementation of foreign array objects
// hacked to specialize by jcl@cs.cmu.edu
// DM/MCFA  01/99
// ---------------------------------------------------------------

#include <stdlib.h>
#include <stdio.h>

#include "mlvalues.h"
#include "memory.h"
#include "alloc.h"
#include "fail.h"
#include <assert.h>

// ----------------------------------------------------------------------

inline long int *pdata(value arr)
{
	return (long *)Field(arr,1);
}

inline void foreign_array_delete(value arr)
{
	free(pdata(arr));
}

inline value foreign_array_unsafe_get(value arr, value ix)
{
  //  printf("get %i = %i\n",Long_val(ix), pdata(arr)[Long_val(ix)]);
	return pdata(arr)[Long_val(ix)];
}

inline value foreign_array_unsafe_set(value arr, value ix, value val)
{
  //  printf("set %i to %i\n",Long_val(ix), Long_val(val));
	pdata(arr)[Long_val(ix)] = (long)val;
	return Val_unit;
}

inline value foreign_arena(value arr)
{
  return pdata(arr);  
}

value foreign_array_unsafe_create(value vlen)
{
        CAMLparam1(vlen);
	CAMLlocal1(res);

	long int *p;

	long len = Long_val(vlen);

	res = alloc_final(2, foreign_array_delete,1,1);
	p = (long int *)malloc(sizeof(long) *len);

	Store_field(res,1,(value)p);
	CAMLreturn res;
}

general_array.ml:
(* general_array.ml -- Implementation of (essentially) unlimited size arrays *)
(* hacked by jcl@cs.cmu.edu *)
(* DM/MCFA  01/99 *)

type foreign_array  (* opaque foreign array object pointer *)
type raw_ptr        (* opaque raw pointer to foreign array data *)
      
external foreign_array_length :
  foreign_array -> int
  = "foreign_array_length"

(* unsafe_get_double and unsafe_set_double are provided to allow *)
(* an array to respond to OCAML without requiring that it be an *)
(* array of doubles. *)      
external foreign_unsafe_get :
  foreign_array -> int -> int
  = "foreign_array_unsafe_get"
      
external foreign_unsafe_set :
  foreign_array -> int -> int -> unit
  = "foreign_array_unsafe_set"
      
external foreign_unsafe_create :
  int -> foreign_array
  = "foreign_array_unsafe_create"

external foreign_arena : 
   foreign_array -> raw_ptr = "foreign_arena"      

let arena f = ((Obj.magic (foreign_arena f)): int array)




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: ocaml limitations
  1999-12-13  4:34 John Carol Langford
  1999-12-13 12:34 ` William Chesters
@ 1999-12-14 12:31 ` Xavier Leroy
  1999-12-15  3:06   ` John Carol Langford
  1 sibling, 1 reply; 5+ messages in thread
From: Xavier Leroy @ 1999-12-14 12:31 UTC (permalink / raw)
  To: John Carol Langford, caml-list

> The performance problems from two limitations.  The first is in the
> compiler and runtime - it's the limitation on the array size on 32 bit
> machines - I only have linux PC's available to work on.  It appears
> that the garbage collector needs some type information to work with
> arrays and enough bits are set aside for type information that not
> enough bits are allowed to specify a large array size.  You can get
> around this large array size problem by simulating a large array with
> an array of arrays, but there is a significant performance penalty.

This is correct.  We plan to address this by adding "large arrays"
that would be allocatd outside the Caml heap, and handled via an extra
indirection.

> The second problem is a language failure - there is no 'short int'
> type in ocaml.  Due to the combinatorics of my problem it would be
> very convenient to use 16 bit integers.  Using 32 bit integers instead
> doubles the footprint of the program - which is unacceptable in this
> case.  Consequently, I simulate 16 bit integers using masking games -
> which again incurs a performance penalty.  

Right.  Notice that the code you sent only simulates 15 bit integers
(because there are only 31 bits in a value of type "int").  Arrays of
16 bit integers can be expressed using strings and byte accesses, but
the performance isn't going to be much better than with your code.

> Consequently, I'm wondering if there are plans to
> remove either (or both) of these limitations in the near future and
> lacking that if there are better workarounds.

We have plans to support large integer and float arrays with various
sizes of elements at some point in the future, but the design isn't
complete yet.  In particular, there is a tension between flexible but
slow accesses (e.g. multidimensionnal arrays with automatic conversion
between the C and Fortran layouts) on the one hand, and basic but fast
primitives.

- Xavier Leroy




^ permalink raw reply	[flat|nested] 5+ messages in thread

* ocaml limitations
  1999-12-13  4:34 John Carol Langford
@ 1999-12-13 12:34 ` William Chesters
  1999-12-14 12:31 ` Xavier Leroy
  1 sibling, 0 replies; 5+ messages in thread
From: William Chesters @ 1999-12-13 12:34 UTC (permalink / raw)
  To: caml-list

John Carol Langford writes:
 > The performance problems from two limitations.  The first is in the
 > compiler and runtime - it's the limitation on the array size on 32 bit
 > machines - I only have linux PC's available to work on.

I think this has been worked on by David McClain:
http://caml.inria.fr/caml-list/1041.html

 > The second problem is a language failure - there is no 'short int'
 > type in ocaml.

The underlying problem is that for important reasons of GC efficiency
etc., ocaml (like nearly all other GCd languages) works on word-sized
objects only, i.e. 32 or 64 bits.  To support access to a smaller
granularity, like 16 bits, it's necessary to do something like how the
`string' type is implemented, with some slightly hacky arithmetic for
working out its length.  I don't think that would be difficult to
duplicate for 16 bit numbers, seeing as how the compiler code is so
pretty but would have to be careful to use unsafe_get.




^ permalink raw reply	[flat|nested] 5+ messages in thread

* ocaml limitations
@ 1999-12-13  4:34 John Carol Langford
  1999-12-13 12:34 ` William Chesters
  1999-12-14 12:31 ` Xavier Leroy
  0 siblings, 2 replies; 5+ messages in thread
From: John Carol Langford @ 1999-12-13  4:34 UTC (permalink / raw)
  To: caml-list

I have been encountering some fundamental limitations of the ocaml
language and compiler that are killing my performance - to the tune of
a factor of 10 off equivalent C code.  This is a serious problem
because the program I'm working on is both RAM and cpu intensive.

The performance problems from two limitations.  The first is in the
compiler and runtime - it's the limitation on the array size on 32 bit
machines - I only have linux PC's available to work on.  It appears
that the garbage collector needs some type information to work with
arrays and enough bits are set aside for type information that not
enough bits are allowed to specify a large array size.  You can get
around this large array size problem by simulating a large array with
an array of arrays, but there is a significant performance penalty.

The second problem is a language failure - there is no 'short int'
type in ocaml.  Due to the combinatorics of my problem it would be
very convenient to use 16 bit integers.  Using 32 bit integers instead
doubles the footprint of the program - which is unacceptable in this
case.  Consequently, I simulate 16 bit integers using masking games -
which again incurs a performance penalty.  

These two problems together add up to using a function considerably
more complicated then an array dereference on the inner loop:

let get_first i = i land (num_array -1) 
let get_second i = i lsr (log_num_array+1)
let get_third i = (i lsr log_num_array) land 1

let get_split split i = 
  let first_index = get_first i 
  and second_index = get_second i 
  and third_index = get_third i in
  let ret = split.(first_index).(second_index) in
  let real_ret = 
    if third_index = 1 then ret
    else ret lsr 16 in
  real_ret land 65535

Naturally, the performance (relative to what the hardware is capable
of) is terrible.  Consequently, I'm wondering if there are plans to
remove either (or both) of these limitations in the near future and
lacking that if there are better workarounds.  Any suggestions?

-John




^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~1999-12-15  9:51 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1999-12-13 19:18 ocaml limitations David McClain
  -- strict thread matches above, loose matches on Subject: below --
1999-12-13  4:34 John Carol Langford
1999-12-13 12:34 ` William Chesters
1999-12-14 12:31 ` Xavier Leroy
1999-12-15  3:06   ` John Carol Langford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).