caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Xavier Leroy <xavier.leroy@inria.fr>
To: Norman Ramsey <nr@eecs.harvard.edu>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] implementing bit vectors in OCaml
Date: Mon, 2 Jun 2003 11:26:16 +0200	[thread overview]
Message-ID: <20030602112616.A3740@pauillac.inria.fr> (raw)
In-Reply-To: <20030601170317.002EF12F9CE@flatcoat.eecs.harvard.edu>; from nr@eecs.harvard.edu on Sun, Jun 01, 2003 at 01:03:17PM -0400

> We have a program that is spending a lot of time in set operations,
> and we're thinking of trying an imperative implementation based on
> bit vectors.
> I would hope that the basis of such an implementation would be an array
> of native integers, but on scrutinizing the manual (espeically the chapter
> on interfacing to C), I have concluded that such a thing is not possible.
> Our choices appear to be
> 
>   * An array of native integers, which will be implemented as an array
>     of pointers to native integers, because integers are boxed.
> 
>   * An array of tagged integers, which will be less efficient as
>     dividing by 31 is more expensive than shifting.
> 
> How would the gurus recommend that we proceed?  Is there a better,
> still efficient data structure for a set of small integers?

As others mentioned, you can use strings, which are really arrays of
bytes.  For instance:

type t = string

let create nbits = String.make ((nbits + 7) / 8) '\000'

let is_set s n =
  (Char.code s.[n lsr 3]) land (1 lsl (n land 7)) <> 0

let set s n =
  let i = n lsr 3 in
  s.[i] <- Char.unsafe_chr (Char.code s.[i] lor (1 lsl (n land 7)))

let clear s n =
  let i = n lsr 3 in
  s.[i] <- Char.unsafe_chr (Char.code s.[i] land
                                         (0xFF lxor (1 lsl (n land 7))))

Operations on whole sets such as union and intersection will not be as
fast as they could be with an array of 32- or 64-bit integers
(you're processing the set 8 bits at a time instead of 32 or 64 bits
at a time), though.

Another option is to use 1-dimensional bigarrays of kind int32 or
nativeint.  In bytecode, bigarray accesses are much slower than string
accesses.  However, with sufficient type constraints, ocamlopt can
generate good inline code for bigarray accesses.

For very sparse sets, binary trees (as in the Set module) or Patricia
trees (as in J.-C. Filliâtre's library) can be more efficient, though.

> Could the compiler gods be persuaded to provide unboxed
> representations for arrays of untagged integers, as is already done
> for `float array'?

The special case for float arrays is already a bit of a hack and it
isn't clear this was a really good idea -- although it sure helps with
benchmarks :-)  

The problem with this special case is that every polymorphic array
operation (when the array type is not known statically) is turned into
a run-time test

        if this_is_a_float_array
        then box_float(load_float_from_array)
        else load_word_from_array

If additional special cases were added, these generic array operations
would become even more costly and generate even bigger code...

Hope this helps,

- Xavier Leroy

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


  parent reply	other threads:[~2003-06-02  9:26 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-06-01 17:03 Norman Ramsey
2003-06-01 17:50 ` Ville-Pertti Keinonen
2003-06-02  8:21 ` Diego Olivier Fernandez Pons
2003-06-02  8:54 ` Hendrik Tews
2003-06-02  8:59 ` Claude Marche
2003-06-02  9:26 ` Xavier Leroy [this message]
2003-06-02 10:11   ` Marcin 'Qrczak' Kowalczyk
2003-06-02 10:10 ` Fabrice Le Fessant
2003-06-02 13:29 Gregory Morrisett

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20030602112616.A3740@pauillac.inria.fr \
    --to=xavier.leroy@inria.fr \
    --cc=caml-list@inria.fr \
    --cc=nr@eecs.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).