caml-list - the Caml user's mailing list
* [Caml-list] Primitive sizes
@ 2004-09-29  6:43 Jonathan Bryant
  2004-09-29 15:02 ` Brian Hurt
  0 siblings, 1 reply; 5+ messages in thread
From: Jonathan Bryant @ 2004-09-29  6:43 UTC (permalink / raw)
  To: caml-list

I would like to know the sizes of the "primitive" types in OCaml (I
assume that they vary per platform, but one can hope that they are
standard...)  If they do vary, is there any way to define new types
(similar to C's typedef)?  I would like to create 8-, 16-, 32-, and
64-bit integers, 32- and 64-bit floats, and 16-bit characters.  I know I
could just create Int32s and Int64s and manipulate the bits, ignoring the
ones I don't need, but is there a way to allocate just the necessary
memory without interfacing to C?  If not, can anyone point me in a good
direction to learn how to interface with C (by "good" I mean that a
tutorial is preferable to a language specification...)?

--Jonathan Bryant
  AIM: JonBoy3182

  "The three principal virtues of a programmer are Laziness,
   Impatience, and Hubris."
	-- Perl man page

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners



* Re: [Caml-list] Primitive sizes
  2004-09-29  6:43 [Caml-list] Primitive sizes Jonathan Bryant
@ 2004-09-29 15:02 ` Brian Hurt
  2004-09-30  6:16   ` Jonathan Bryant
  0 siblings, 1 reply; 5+ messages in thread
From: Brian Hurt @ 2004-09-29 15:02 UTC (permalink / raw)
  To: Jonathan Bryant; +Cc: caml-list

On Wed, 29 Sep 2004, Jonathan Bryant wrote:

> I would like to know the sizes of the "primitive" types in OCaml (I
> assume that they vary per platform, but one can hope that they are
> standard...)  

ints are either 31 or 63 bits, depending on whether you're on a 32- or 
64-bit machine (one bit is stolen for the tag bit).  Int32 and Int64 have 
the obvious bit sizes, but they are boxed integers (as opposed to ints, 
which are unboxed).  Chars are 8 bits and unboxed- but can't be used as 
short integers directly.

This should be a FAQ, if it isn't already.  We just recently had a 
discussion on this very mailing list on why ints are one bit short- I'd 
search the archives and read the discussion before bringing that 
discussion up again.

> If they do vary , is there any way to define new types
> (similar to C typedef macro)?  I would like to create 8-, 16-, 32-, and
> 64-bit integers, 32- and 64-bit floats, and 16-bit characters.  I know i
> could just create Int32s and Int64s and manipulate the bits ignoring the
> ones I don't need, but is there a way to allocate just the necessary
> memory without interfacing to C?  If not, can anyone point me in a good
> direction to learn how to interface with C (by "good" I mean that a
> tutorial is better/more preferable than a language specification...)?

The Ocaml manual has a good section on interfacing to C.  But I have to 
ask the question: why bother?  Especially with the integers?

First off, Ocaml holds every value in a single word- which is defined as 
the size of a pointer on the current machine.  If you have a char list, 
every single char in that list takes up three words- one word for the list 
cell's header, one word for the next pointer, and the char itself takes up 
one word.  Likewise, if you have a char array, every element in the array 
takes up one whole word (this is why strings are not char arrays).  This 
allows Ocaml to share code- a function that handles an 'a array can 
handle an array of chars, ints, floats, booleans, or foos.  If the type 
isn't unboxed (int, char, bool) the array or list holds a reference to 
the value- which is still just a word.

The humorous thing is that C doesn't save as much as most people think it 
does in using smaller types- this is because pretty much all C compilers 
these days pad the data.  Accessing data that is aligned is significantly 
faster than accessing data that isn't aligned (and on many CPUs, you can't 
access misaligned data), so the C compiler inserts padding- unused bytes- 
to keep the data aligned.  For example, how large is the following 
structure on a 32-bit platform (ints are 4 bytes)?

struct foo {
    char c;
    int i;
};

You might say five bytes- four for the int and one for the char.  You'd be
wrong- the compiler will almost certainly add three bytes of padding
between c and i to keep i aligned- meaning the size of the structure is
actually 8 bytes.  The char takes up a full four bytes all by its 
lonesome.

Changing the order doesn't help.  Consider the following structure:

struct foo2 {
    int i;
    char c;
};

Now, the int doesn't follow the char.  The char can't be misaligned, so 
you don't need padding, do you?  Well, yes you still do need padding.  The 
C standard requires that every element of an array of structures be 
properly aligned- effectively, that given a pointer p to such an array, 
the access:
    ((struct foo2 *) p)[1].i
to i is still aligned.  So the compiler adds three bytes of padding after 
c: the size of the structure is still 8 bytes, and the char is still 
taking up a full four bytes.

Padding also shows up on local variables and function arguments in C.  
Consider the function:

void bar (char c) {
    char t;
    ...
}

How much memory do the argument c and the local variable t take up?  
Again- the compiler needs to keep the stack aligned, so variables and 
arguments get padded- both typically take up a full word.

If you have multiple variables of the same type, the shorter types do save 
some memory.  For example, this structure also only takes up two words of 
memory:
struct foo3 {
    int i;
    char c;
    char d;
};

But this requires you sort your variables, and happens less often than
people think.  This is why Ocaml isn't the memory hog a naive analysis
might make you think it is.

In nine years of professional C programming and 15 years of hobbyist 
programming, I have come to the conclusion that the main use of the 
various C int types- which, by the way, include not only char, short, 
int, and long in both signed and unsigned varieties, but also size_t, 
ssize_t, off_t, ptrdiff_t, pid_t, etc.- is to introduce bugs by allowing 
you to pick the wrong int type.

So the question becomes- why do you need the other integer types?



-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian


* Re: [Caml-list] Primitive sizes
  2004-09-29 15:02 ` Brian Hurt
@ 2004-09-30  6:16   ` Jonathan Bryant
  2004-09-30 20:54     ` Brian Hurt
  0 siblings, 1 reply; 5+ messages in thread
From: Jonathan Bryant @ 2004-09-30  6:16 UTC (permalink / raw)
  To: caml-list

Ok.  Maybe I need to make clear what I'm doing.  I'm trying to write
native Java methods and have them call an OCaml library.  The problem is
that I need Java compatible primitives to pass back from the methods. 
In Java (ON ALL PLATFORMS), the primitives fall this way (in case you
don't know them off the top of your head):
Type      Bits
-------   ----
byte      8
short     16
int       32
long      64
float     32
double    64
char      16
boolean   8

Is there a way to create these, or should I just stick to C?  I
understand how to do them in C/C++, but I was hoping it was possible to
do this in OCaml as well.

-- 
--Jonathan Bryant
  AIM: JonBoy3182

  "The three principal virtues of a programmer are Laziness,
   Impatience, and Hubris."
	-- Perl man page

  "Usenet is like a herd of performing elephants with diarrhea -- massive,
   difficult to redirect, awe-inspiring, entertaining, and a source of
   mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford
  OAS AAS LLS
  ZG214

  TeKE For Life
  ZN759


* Re: [Caml-list] Primitive sizes
  2004-09-30  6:16   ` Jonathan Bryant
@ 2004-09-30 20:54     ` Brian Hurt
  2004-10-01  9:36       ` Richard Jones
  0 siblings, 1 reply; 5+ messages in thread
From: Brian Hurt @ 2004-09-30 20:54 UTC (permalink / raw)
  To: Jonathan Bryant; +Cc: caml-list

On Thu, 30 Sep 2004, Jonathan Bryant wrote:

> Ok.  Maybe I need to make clear what I'm doing.  I'm trying to write
> native Java methods and have them call an OCaml library.  The problem is
> that I need Java compatible primitives to pass back from the methods. 
> In Java (ON ALL PLATFORMS), the primitives fall this way (in case you
> don't know them off the top of your head):

I don't know what Java's calling convention is.  My advice:

>  Type     Bits
>  -----    ----
> byte    - 8
> short   - 16
> char    - 16
> boolean - 8

The above I'd just promote to ints for Ocaml, and truncate as necessary.

> int     - 32

I'd use Int32's.

> long    - 64

I'd use Int64's.

> float   - 32
> double  - 64

I'd be inclined to promote both of these to Ocaml floats, which are 64-bit 
IEEE 754 floats (the same as Java's double).  The extra precision won't 
hurt anything except stupidly strict binary compatibility.

I assume you're using one of the Ocaml->JVM compilers kicking around.  
Tying Ocaml native mode (or VM) code to Java would be way too much work.

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian


* Re: [Caml-list] Primitive sizes
  2004-09-30 20:54     ` Brian Hurt
@ 2004-10-01  9:36       ` Richard Jones
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Jones @ 2004-10-01  9:36 UTC (permalink / raw)
  Cc: caml-list

On Thu, Sep 30, 2004 at 03:54:33PM -0500, Brian Hurt wrote:
> I don't know what Java's calling convention is.  My advice:
> 
> >  Type     Bits
> >  -----    ----
> > byte    - 8
> > short   - 16
> > char    - 16
> > boolean - 8
> 
> The above I'd just promote to ints for Ocaml, and truncate as necessary.

We have a similar problem with the ocamldbi Dbi interface.  It maps
SQL int4 and serial types (i.e. full 32-bit signed ints) to OCaml int.

The reason is convenience.  It's MUCH more convenient to deal with
OCaml int than boxed Int32.

However if anyone has a database with more than about 1 billion rows
in it (2^30 - 1 is the largest 31-bit OCaml int), then they may find
they have a problem ...

Rich.

-- 
Richard Jones. http://www.annexia.org/ http://www.j-london.com/
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
"One serious obstacle to the adoption of good programming languages is
the notion that everything has to be sacrificed for speed. In computer
languages as in life, speed kills." -- Mike Vanier
