caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] implementing bit vectors in OCaml
@ 2003-06-01 17:03 Norman Ramsey
  2003-06-01 17:50 ` Ville-Pertti Keinonen
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Norman Ramsey @ 2003-06-01 17:03 UTC (permalink / raw)
  To: caml-list

We have a program that is spending a lot of time in set operations,
and we're thinking of trying an imperative implementation based on bit vectors.
I would hope that the basis of such an implementation would be an array
of native integers, but on scrutinizing the manual (espeically the chapter
on interfacing to C), I have concluded that such a thing is not possible.
Our choices appear to be

  * An array of native integers, which will be implemented as an array
    of pointers to native integers, because integers are boxed.

  * An array of tagged integers, which will be less efficient as
    dividing by 31 is more expensive than shifting.

How would the gurus recommend that we proceed?  Is there a better,
still efficient data structure for a set of small integers?

Could the compiler gods be persuaded to provide unboxed
representations for arrays of untagged integers, as is already done
for `float array'?


Norman

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] implementing bit vectors in OCaml
  2003-06-01 17:03 [Caml-list] implementing bit vectors in OCaml Norman Ramsey
@ 2003-06-01 17:50 ` Ville-Pertti Keinonen
  2003-06-02  8:21 ` Diego Olivier Fernandez Pons
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Ville-Pertti Keinonen @ 2003-06-01 17:50 UTC (permalink / raw)
  To: Norman Ramsey; +Cc: caml-list


> We have a program that is spending a lot of time in set operations,
> and we're thinking of trying an imperative implementation based on bit 
> vectors.
> I would hope that the basis of such an implementation would be an array
> of native integers, but on scrutinizing the manual (espeically the 
> chapter
> on interfacing to C), I have concluded that such a thing is not 
> possible.
> Our choices appear to be
>
>   * An array of native integers, which will be implemented as an array
>     of pointers to native integers, because integers are boxed.
>
>   * An array of tagged integers, which will be less efficient as
>     dividing by 31 is more expensive than shifting.

I'd think that an obvious alternative would be to use custom blocks, 
which can basically be arrays of abstract values of whatever size you 
want to treat them as.

This should be evident based on the information on interfacing to C, if 
you are willing to implement the interface in C.  Of course the obvious 
problem with this approach would be the inefficiency of not being able 
to inline operations on this array.

If the optimizer were smart enough (I don't think it currently is), 
OCaml code treating strings (or even arrays of floats) as bit vectors 
might be a reasonably efficient approach.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] implementing bit vectors in OCaml
  2003-06-01 17:03 [Caml-list] implementing bit vectors in OCaml Norman Ramsey
  2003-06-01 17:50 ` Ville-Pertti Keinonen
@ 2003-06-02  8:21 ` Diego Olivier Fernandez Pons
  2003-06-02  8:54 ` Hendrik Tews
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Diego Olivier Fernandez Pons @ 2003-06-02  8:21 UTC (permalink / raw)
  To: Norman Ramsey; +Cc: caml-list

    Bonjour,

> Is there a better, still efficient data structure for a set of small
> integers?

I am not sure since it's code is quite intricated (maybe could Jerôme
Vouillon give more information), but LibRE seems to use strings as
arrays of small integers.

        Diego Olivier

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] implementing bit vectors in OCaml
  2003-06-01 17:03 [Caml-list] implementing bit vectors in OCaml Norman Ramsey
  2003-06-01 17:50 ` Ville-Pertti Keinonen
  2003-06-02  8:21 ` Diego Olivier Fernandez Pons
@ 2003-06-02  8:54 ` Hendrik Tews
  2003-06-02  8:59 ` Claude Marche
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Hendrik Tews @ 2003-06-02  8:54 UTC (permalink / raw)
  To: caml-list

Norman Ramsey writes:
   
   Could the compiler gods be persuaded to provide unboxed
   representations for arrays of untagged integers, as is already done
   for `float array'?
   
Its already there, in the bigarray library.

Bye,

Hendrik

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] implementing bit vectors in OCaml
  2003-06-01 17:03 [Caml-list] implementing bit vectors in OCaml Norman Ramsey
                   ` (2 preceding siblings ...)
  2003-06-02  8:54 ` Hendrik Tews
@ 2003-06-02  8:59 ` Claude Marche
  2003-06-02  9:26 ` Xavier Leroy
  2003-06-02 10:10 ` Fabrice Le Fessant
  5 siblings, 0 replies; 9+ messages in thread
From: Claude Marche @ 2003-06-02  8:59 UTC (permalink / raw)
  To: Norman Ramsey; +Cc: caml-list

>>>>> "Norman" == Norman Ramsey <nr@eecs.harvard.edu> writes:

    Norman> We have a program that is spending a lot of time in set operations,
    Norman> and we're thinking of trying an imperative implementation based on bit vectors.

Did you have a look at the caml hump ? The Bitv module may suit your
needs. 

http://www.lri.fr/~filliatr/software.en.html


-- 
| Claude Marché           | mailto:Claude.Marche@lri.fr |
| LRI - Bât. 490          | http://www.lri.fr/~marche/  |
| Université de Paris-Sud | phoneto: +33 1 69 15 64 85  |
| F-91405 ORSAY Cedex     | faxto: +33 1 69 15 65 86    |

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] implementing bit vectors in OCaml
  2003-06-01 17:03 [Caml-list] implementing bit vectors in OCaml Norman Ramsey
                   ` (3 preceding siblings ...)
  2003-06-02  8:59 ` Claude Marche
@ 2003-06-02  9:26 ` Xavier Leroy
  2003-06-02 10:11   ` Marcin 'Qrczak' Kowalczyk
  2003-06-02 10:10 ` Fabrice Le Fessant
  5 siblings, 1 reply; 9+ messages in thread
From: Xavier Leroy @ 2003-06-02  9:26 UTC (permalink / raw)
  To: Norman Ramsey; +Cc: caml-list

> We have a program that is spending a lot of time in set operations,
> and we're thinking of trying an imperative implementation based on
> bit vectors.
> I would hope that the basis of such an implementation would be an array
> of native integers, but on scrutinizing the manual (espeically the chapter
> on interfacing to C), I have concluded that such a thing is not possible.
> Our choices appear to be
> 
>   * An array of native integers, which will be implemented as an array
>     of pointers to native integers, because integers are boxed.
> 
>   * An array of tagged integers, which will be less efficient as
>     dividing by 31 is more expensive than shifting.
> 
> How would the gurus recommend that we proceed?  Is there a better,
> still efficient data structure for a set of small integers?

As others mentioned, you can use strings, which are really arrays of
bytes.  For instance:

type t = string

let create nbits = String.make ((nbits + 7) / 8) '\000'

let is_set s n =
  (Char.code s.[n lsr 3]) land (1 lsl (n land 7)) <> 0

let set s n =
  let i = n lsr 3 in
  s.[i] <- Char.unsafe_chr (Char.code s.[i] lor (1 lsl (n land 7)))

let clear s n =
  let i = n lsr 3 in
  s.[i] <- Char.unsafe_chr (Char.code s.[i] land
                                         (0xFF lxor (1 lsl (n land 7))))

Operations on whole sets such as union and intersection will not be as
fast as they could be with an array of 32- or 64-bit integers
(you're processing the set 8 bits at a time instead of 32 or 64 bits
at a time), though.

Another option is to use 1-dimensional bigarrays of kind int32 or
nativeint.  In bytecode, bigarray accesses are much slower than string
accesses.  However, with sufficient type constraints, ocamlopt can
generate good inline code for bigarray accesses.

For very sparse sets, binary trees (as in the Set module) or Patricia
trees (as in J.-C. Filliâtre's library) can be more efficient, though.

> Could the compiler gods be persuaded to provide unboxed
> representations for arrays of untagged integers, as is already done
> for `float array'?

The special case for float arrays is already a bit of a hack and it
isn't clear this was a really good idea -- although it sure helps with
benchmarks :-)  

The problem with this special case is that every polymorphic array
operation (when the array type is not known statically) is turned into
a run-time test

        if this_is_a_float_array
        then box_float(load_float_from_array)
        else load_word_from_array

If additional special cases were added, these generic array operations
would become even more costly and generate even bigger code...

Hope this helps,

- Xavier Leroy

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] implementing bit vectors in OCaml
  2003-06-01 17:03 [Caml-list] implementing bit vectors in OCaml Norman Ramsey
                   ` (4 preceding siblings ...)
  2003-06-02  9:26 ` Xavier Leroy
@ 2003-06-02 10:10 ` Fabrice Le Fessant
  5 siblings, 0 replies; 9+ messages in thread
From: Fabrice Le Fessant @ 2003-06-02 10:10 UTC (permalink / raw)
  To: Norman Ramsey; +Cc: caml-list


>  We have a program that is spending a lot of time in set operations,
>  and we're thinking of trying an imperative implementation based on bit vectors.
>  I would hope that the basis of such an implementation would be an array
>  of native integers, but on scrutinizing the manual (espeically the chapter
>  on interfacing to C), I have concluded that such a thing is not possible.
>  Our choices appear to be
>  
>    * An array of native integers, which will be implemented as an array
>      of pointers to native integers, because integers are boxed.
>  
>    * An array of tagged integers, which will be less efficient as
>      dividing by 31 is more expensive than shifting.

You can also lose some space, and use only 16 bits by integer, and the division will be less expensive.

- Fabrice

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] implementing bit vectors in OCaml
  2003-06-02  9:26 ` Xavier Leroy
@ 2003-06-02 10:11   ` Marcin 'Qrczak' Kowalczyk
  0 siblings, 0 replies; 9+ messages in thread
From: Marcin 'Qrczak' Kowalczyk @ 2003-06-02 10:11 UTC (permalink / raw)
  To: caml-list

Dnia pon 2. czerwca 2003 11:26, Xavier Leroy napisał:

> The special case for float arrays is already a bit of a hack and it
> isn't clear this was a really good idea -- although it sure helps with
> benchmarks :-)

I don't like it, perhaps because I don't use float arrays. It makes most of 
the Array module useless as far as performance is concerned because they do 
the array kind check in all inner loops.

float array should be a separate type. Unfortunately OCaml can't provide 
uniform syntax for accessing elements in this case and it would require 
separate functions like those in Array.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [Caml-list] implementing bit vectors in OCaml
@ 2003-06-02 13:29 Gregory Morrisett
  0 siblings, 0 replies; 9+ messages in thread
From: Gregory Morrisett @ 2003-06-02 13:29 UTC (permalink / raw)
  To: Norman Ramsey, caml-list

>   * An array of tagged integers, which will be less efficient as
>     dividing by 31 is more expensive than shifting.

Wait -- you can still use tagged integers.  Why do you need
to divide by 31?  You can still use Bits.lsr/lsl/land/etc, no?

-Greg

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2003-06-02 13:30 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-06-01 17:03 [Caml-list] implementing bit vectors in OCaml Norman Ramsey
2003-06-01 17:50 ` Ville-Pertti Keinonen
2003-06-02  8:21 ` Diego Olivier Fernandez Pons
2003-06-02  8:54 ` Hendrik Tews
2003-06-02  8:59 ` Claude Marche
2003-06-02  9:26 ` Xavier Leroy
2003-06-02 10:11   ` Marcin 'Qrczak' Kowalczyk
2003-06-02 10:10 ` Fabrice Le Fessant
2003-06-02 13:29 Gregory Morrisett

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).