caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Sparse structure
@ 2005-07-07 19:27 Chris King
       [not found] ` <1120765639.9384.42.camel@titania>
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Chris King @ 2005-07-07 19:27 UTC (permalink / raw)
  To: O'Caml Mailing List

I'm trying to create a sparse structure; i.e. a large structure which
doesn't allocate storage for "unused" fields.  Given a structure:

type foo = { a: int option; b: float option }

the best solution I can come up with, short of using Obj.magic, is to
use a hash table like this:

type foo_key = A_key | B_key
type foo_value = A_value of int | B_value of float
type sparse = (foo_key, foo_value) Hashtbl.t

which works, but the extra variant type required means more CPU time
and more keystrokes.  Is there a better solution than this?

The structure will have hundreds of fields, each with a different
type.  Most of the fields will be unused, but usage will be determined
at runtime so using objects is not an option.

Thanks!

- Chris K


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Sparse structure
       [not found] ` <1120765639.9384.42.camel@titania>
@ 2005-07-07 20:04   ` Chris King
  0 siblings, 0 replies; 6+ messages in thread
From: Chris King @ 2005-07-07 20:04 UTC (permalink / raw)
  To: David Teller; +Cc: O'Caml Mailing List

On 7/7/05, David Teller <David.Teller@ens-lyon.org> wrote:
> What's the problem with just using 'a options ?

Most of the values are going to be constant constructors (rather than
large pieces of data), so they'll only take up one word anyway.  My
problem is that I'm going to have hundreds of them, and those hundreds
of words is the space I'd like to save.

I was thinking of grouping the fields hierarchically in multiple
structures, with each one wrapped in an option, but there are reasons
I'd rather not do this (mostly because it'd make fields more difficult
to access).

- Chris


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Sparse structure
       [not found] ` <87r7eamlv9.fsf@linux-france.org>
@ 2005-07-07 20:16   ` Chris King
  0 siblings, 0 replies; 6+ messages in thread
From: Chris King @ 2005-07-07 20:16 UTC (permalink / raw)
  To: David MENTRE; +Cc: O'Caml Mailing List

On 7/7/05, David MENTRE <dmentre@linux-france.org> wrote:
> Unless I haven't understood what you want to do, you don't need the
> foo_key type. The foo_value type already contains the key through the
> sum type.
> 
> So I would use:
> type foo_value = A_value of int | B_value of float
> type sparse = (int (*or whatever is your key*), foo_value) Hashtbl.t

It's not foo_key I have a problem with (that's what I want), it's
foo_value I want to ditch.  I tried using only foo_value and Obj.tag
to generate hash keys based on the foo_value tag, which worked fine
for sticking data into the hash table but not so well for getting it
back out.

> Think also at the readability of your code. Except in very few
> situations, having maintainable code is more important that memory or
> CPU efficient code.

Memory is definitely an issue here; I'm going to have thousands of
these structures each with hundreds of unused fields.  CPU time is not
so much, which is why I'm leaning toward my initial solution, but
having to access structure fields like this:

(match Sparse.get A_key with A_value v -> v | _ -> assert false)

doesn't do much for readability (hence why I want to dump the foo_value type).

- Chris


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Sparse structure
  2005-07-07 19:27 Sparse structure Chris King
       [not found] ` <1120765639.9384.42.camel@titania>
       [not found] ` <87r7eamlv9.fsf@linux-france.org>
@ 2005-07-08  0:57 ` Jacques Garrigue
  2005-07-08 13:29   ` Chris King
  2 siblings, 1 reply; 6+ messages in thread
From: Jacques Garrigue @ 2005-07-08  0:57 UTC (permalink / raw)
  To: colanderman; +Cc: caml-list

From: Chris King <colanderman@gmail.com>

> I'm trying to create a sparse structure; i.e. a large structure which
> doesn't allocate storage for "unused" fields.  Given a structure:
> 
> type foo = { a: int option; b: float option }
> 
> the best solution I can come up with, short of using Obj.magic, is to
> use a hash table like this:
> 
> type foo_key = A_key | B_key
> type foo_value = A_value of int | B_value of float
> type sparse = (foo_key, foo_value) Hashtbl.t
> 
> which works, but the extra variant type required means more CPU time
> and more keystrokes.  Is there a better solution than this?
> 
> The structure will have hundreds of fields, each with a different
> type.  Most of the fields will be unused, but usage will be determined
> at runtime so using objects is not an option.

This is a classical problem, with no good solution that I know.

If you want to avoid Obj.magic, then your solution seems OK, yet
* if your values are really sparse, Map might be better than Hashtbl
  (more compact representation)
* if you have really hundreds of fields, then you have better switch
  to polymorphic variants, as normal ones only allow about 240 cases.

If you are ready to use a bit of Obj, then there are some other
solutions.
For instance, you could do something like this.

module Int = struct type t = int let compare : int -> int -> int = compare end
module M = Map.Make(Int)
type +'a elt
type 'a map = 'a elt M.t

let addA (x : 'a) (m : [> `A of 'a] map) =
  M.add (Obj.magic `A) (Obj.magic x) m

let addB (x : 'a) (m : [> `B of 'a] map) =
  M.add (Obj.magic `B) (Obj.magic x) m

let getA (m : [> `A of 'a] map) : 'a =
  Obj.magic (M.find (Obj.magic `A) m)

let getB (m : [> `B of 'a] map) : 'a =
  Obj.magic (M.find (Obj.magic `B) m)

Note that the result is completely type safe, and you don't need all
maps to contain the same potential fields.
The downside is that you need to define set and get for all fields.
On the other hand, it should be easy to define camlp4 macros
enforcing the same type annotations, avoiding such definitions.

(Polymorphic variants here are only used as phantom types. Note
however that this choice is useful: it ensures that two keys cannot
conflict, as if `A and `B had the same hash values, this would be
detected when constructing the type [> `A of 'a | `B of 'b])

Jacques Garrigue


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Sparse structure
  2005-07-08  0:57 ` Jacques Garrigue
@ 2005-07-08 13:29   ` Chris King
  2005-07-08 23:29     ` Jacques Garrigue
  0 siblings, 1 reply; 6+ messages in thread
From: Chris King @ 2005-07-08 13:29 UTC (permalink / raw)
  To: Jacques Garrigue; +Cc: caml-list

On 7/7/05, Jacques Garrigue <garrigue@math.nagoya-u.ac.jp> wrote:
> module Int = struct type t = int let compare : int -> int -> int = compare end
> module M = Map.Make(Int)
> type +'a elt
> type 'a map = 'a elt M.t

Thanks, that works great!  I'm curious though, what is the purpose of
the elt type?  Is it to enforce the use of the map type instead of
M.t?

- Chris


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Sparse structure
  2005-07-08 13:29   ` Chris King
@ 2005-07-08 23:29     ` Jacques Garrigue
  0 siblings, 0 replies; 6+ messages in thread
From: Jacques Garrigue @ 2005-07-08 23:29 UTC (permalink / raw)
  To: colanderman; +Cc: caml-list

From: Chris King <colanderman@gmail.com>

> > module Int = struct type t = int let compare : int -> int -> int = compare end
> > module M = Map.Make(Int)
> > type +'a elt
> > type 'a map = 'a elt M.t
> 
> Thanks, that works great!  I'm curious though, what is the purpose of
> the elt type?  Is it to enforce the use of the map type instead of
> M.t?

Not exactly. [map] is only an abbreviation, used to shorten types in
the rest of the code. [elt] is more fundamental: it both hides the
contents of the map, by being abstract (so you can only access them
through Obj.magic), and has a type parameter which will be used to
enforce the relation between key and data type.

By the way, I've started experimenting with a camlp4 syntax extension
for that, and it seems to works nicely. I'll post it when it gets
cleaner.

Jacques Garrigue


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2005-07-08 23:29 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-07-07 19:27 Sparse structure Chris King
     [not found] ` <1120765639.9384.42.camel@titania>
2005-07-07 20:04   ` [Caml-list] " Chris King
     [not found] ` <87r7eamlv9.fsf@linux-france.org>
2005-07-07 20:16   ` Chris King
2005-07-08  0:57 ` Jacques Garrigue
2005-07-08 13:29   ` Chris King
2005-07-08 23:29     ` Jacques Garrigue

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).