From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: weis
Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id MAA25670 for caml-redistribution; Mon, 13 Dec 1999 12:44:42 +0100 (MET)
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id FAA23018 for <caml-list@pauillac.inria.fr>; Mon, 13 Dec 1999 05:34:05 +0100 (MET)
Received: from gs176.sp.cs.cmu.edu (GS176.SP.CS.CMU.EDU [128.2.198.136])
	by nez-perce.inria.fr (8.8.7/8.8.7) with ESMTP id FAA06109
	for <caml-list@inria.fr>; Mon, 13 Dec 1999 05:34:03 +0100 (MET)
Received: from gs176.sp.cs.cmu.edu (localhost [127.0.0.1])
	by gs176.sp.cs.cmu.edu (8.9.3/8.9.3) with ESMTP id XAA12728
	for <caml-list@inria.fr>; Sun, 12 Dec 1999 23:34:02 -0500
Message-Id: <199912130434.XAA12728@gs176.sp.cs.cmu.edu>
To: caml-list@inria.fr
Subject: ocaml limitations
Date: Sun, 12 Dec 1999 23:34:02 -0500
From: John Carol Langford <jcl@gs176.sp.cs.cmu.edu>
Sender: weis

I have been encountering some fundamental limitations of the ocaml
language and compiler that are killing my performance - to the tune of
a factor of 10 off equivalent C code.  This is a serious problem
because the program I'm working on is both RAM and cpu intensive.

The performance problems stem from two limitations.  The first is in
the compiler and runtime - it's the limitation on the array size on
32-bit machines - I only have Linux PCs available to work on.  It appears
that the garbage collector needs some type information to work with
arrays and enough bits are set aside for type information that not
enough bits are allowed to specify a large array size.  You can get
around this large array size problem by simulating a large array with
an array of arrays, but there is a significant performance penalty.
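
A minimal sketch of the array-of-arrays workaround, for illustration
only.  The names big_make, big_get and big_set and the chunk size are
hypothetical, and the index split here (high bits pick the chunk) is
an assumption that differs from the layout used in my code further
down.  Every access pays for a shift, a mask and a second bounds-checked
load, which is where the penalty comes from.

```ocaml
(* Hypothetical two-level "big array"; chunk size is a power of two
   so the index can be split with a shift and a mask. *)
let log_chunk = 16
let chunk = 1 lsl log_chunk

(* Allocate enough fixed-size chunks to hold n elements. *)
let big_make n init =
  let nchunks = (n + chunk - 1) lsr log_chunk in
  Array.init nchunks (fun _ -> Array.make chunk init)

(* High bits select the chunk, low bits the slot inside it. *)
let big_get a i = a.(i lsr log_chunk).(i land (chunk - 1))
let big_set a i v = a.(i lsr log_chunk).(i land (chunk - 1)) <- v
```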

The second problem is a language failure - there is no 'short int'
type in ocaml.  Due to the combinatorics of my problem it would be
very convenient to use 16 bit integers.  Using 32 bit integers instead
doubles the footprint of the program - which is unacceptable in this
case.  Consequently, I simulate 16 bit integers using masking games -
which again incurs a performance penalty.  
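
The masking games look roughly like the sketch below.  The helper
names (pack, low, high, set_low, set_high) are hypothetical.  Note the
assumption about value ranges: an OCaml int is 31 bits wide on a
32-bit machine, so the upper field there holds only 15 bits, and the
sketch is only safe for upper values below 32768.

```ocaml
(* Two small unsigned values stored in one int: the low field in
   bits 0-15, the high field in the bits from 16 upward. *)
let mask16 = 0xFFFF

let pack hi lo = ((hi land mask16) lsl 16) lor (lo land mask16)

(* Extract each field. *)
let low w = w land mask16
let high w = (w lsr 16) land mask16

(* Replace one field and leave the other untouched. *)
let set_low w v = (w land (lnot mask16)) lor (v land mask16)
let set_high w v = (w land mask16) lor ((v land mask16) lsl 16)
```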

These two problems together add up to using a function considerably
more complicated than an array dereference in the inner loop:

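(* num_array (the number of sub-arrays, assumed to be a power of two)
   and log_num_array (its base-2 logarithm) are defined elsewhere. *)

(* Low log_num_array bits: which sub-array. *)
let get_first i = i land (num_array - 1)
(* Bits above the selector bit: which word within the sub-array. *)
let get_second i = i lsr (log_num_array + 1)
(* Bit log_num_array: which 16-bit half of the word. *)
let get_third i = (i lsr log_num_array) land 1

let get_split split i =
  let first_index = get_first i
  and second_index = get_second i
  and third_index = get_third i in
  let ret = split.(first_index).(second_index) in
  (* third_index = 1 selects the low half, 0 the high half. *)
  let real_ret =
    if third_index = 1 then ret
    else ret lsr 16 in
  real_ret land 65535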

Naturally, the performance (relative to what the hardware is capable
of) is terrible.  Consequently, I'm wondering whether there are plans
to remove either (or both) of these limitations in the near future
and, failing that, whether there are better workarounds.  Any
suggestions?

-John