From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <weis@pauillac.inria.fr>
Delivered-To: caml-list@yquem.inria.fr
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39])
	by yquem.inria.fr (Postfix) with ESMTP id CC349BCAF
	for <caml-list@yquem.inria.fr>; Thu, 16 Jun 2005 21:02:08 +0200 (CEST)
Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35])
	by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j5GJ28fM024599
	for <caml-list@yquem.inria.fr>; Thu, 16 Jun 2005 21:02:08 +0200
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id VAA17662 for <caml-list@pauillac.inria.fr>; Thu, 16 Jun 2005 21:02:07 +0200 (MET DST)
Received: from nef2.ens.fr (nef2.ens.fr [129.199.96.40])
	by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j5GJ27xk004724
	for <caml-list@inria.fr>; Thu, 16 Jun 2005 21:02:07 +0200
Received: from clipper.ens.fr (clipper-gw.ens.fr [129.199.1.22])
          by nef2.ens.fr (8.13.2/1.01.28121999) with ESMTP id j5GJ27vG092219
          ; Thu, 16 Jun 2005 21:02:07 +0200 (CEST)
X-Envelope-To: caml-list@inria.fr
Received: from (george@localhost) by clipper.ens.fr (8.13.1/jb-1.1)
Date: Thu, 16 Jun 2005 21:02:07 +0200
From: Nicolas George <nicolas.george@ens.fr>
To: Caml mailing list <caml-list@inria.fr>
Cc: David MENTRE <dmentre@linux-france.org>
Subject: Re: [Caml-list] How to handle endianness and binary string conversion for 32 bits integers (Int32)?
Message-ID: <20050616190206.GA553@clipper.ens.fr>
References: <87slzima67.fsf@linux-france.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="pf9I7BMVVzbSWLtt"
Content-Disposition: inline
In-Reply-To: <87slzima67.fsf@linux-france.org>
User-Agent: Mutt/1.5.9i
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.5.10 (nef2.ens.fr [129.199.96.32]); Thu, 16 Jun 2005 21:02:07 +0200 (CEST)
X-Miltered: at concorde with ID 42B1CCB0.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Miltered: at nez-perce with ID 42B1CCAF.002 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Spam: no; 0.00; caml-list:01 binary:01 integers:01 endian:01 integers:01 binary:01 buffer:01 bounded:01 endian:01 runtime:01 compilers:01 inverting:01 lexical:01 integer:01 structures:01 
X-Attachments: type="application/pgp-signature" name="signature.asc" 
X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on yquem.inria.fr
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled 
	version=3.0.2
X-Spam-Level: 


--pf9I7BMVVzbSWLtt
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

L'octidi 28 prairial, an CCXIII, David MENTRE a =E9crit=A0:
>  1. convert between big and little endian 32 bits integers;

Don't do that.

>  2. convert between 32 bits integers and string binary representation
>     (to store integers in Buffer and string data structures);

What you mean to do is represent an integer in a bounded interval as a
fixed-length sequence of finite-valued objects. Said that way, children
learn how to do it in school: it's writing the number in some base. Since
bytes in a string can take 256 values, one will obviously use base 256.

The first (rightmost) "digit" will be (n mod 256).
The second "digit" will be ((n / 256) mod 256).
The third "digit" will be ((n / (256 * 256)) mod 256)
The fourth (leftmost) "digit" will be ((n / (256 * 256 * 256)) mod 256).

And so on, but since your numbers are less than 256*256*256*256, all
remaining "digits" are 0. So all you have to do is store these four bytes in
your string, in any order you may prefer.

"Big endian" is when you store the fourth, the third, the second and the
first; it is the nearest to the way we humans write numbers; and the lexical
order is the same as the numeric order. "Small endian" is when you store the
first, the second, the third and the fourth.

But, and that is important, this does not depend on the hardware it runs on:
it is purely arithmetic.

The reverse operation is simply

n =3D d1 + d2 * 256 + d3 * 256 * 256 + d4 * 256 * 256 * 256

>  3. detect machine endianness at runtime.

Don't do that. I develop: there are no guarantees that numbers are either in
big or little endian. I have heard that some architectures exist where
8-bits bytes in 16-bits words are in little endian, but 16-bits words in
32-bit words are in big endian, which gives 3412 as a global order.

Using the internal representation of integers can so never be reliable. On
the contrary, compilers ensure that arithmetic in reasonable interval is the
real Peano arithmetic, for all architectures.

Using the internal representation of numbers may allow to gain some cycles
on the packing-unpacking, but it is probably nothing in regard to anything
that will be done with the data (disc access or network for example).
Furthermore, if you have to worry about inverting the order of the bytes in
the number, the gain will be even smaller.

--pf9I7BMVVzbSWLtt
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (SunOS)

iD8DBQFCscyusGPZlzblTJMRAj+cAJ4ipTWJrFG4XjFOqm+H/Y8OFqzZUwCfZlme
PRQvgAk60U5ZV3HVAu0V56Q=
=jglo
-----END PGP SIGNATURE-----

--pf9I7BMVVzbSWLtt--