From: Nicolas George
To: Caml mailing list
Cc: David MENTRE
Date: Thu, 16 Jun 2005 21:02:07 +0200
Subject: Re: [Caml-list] How to handle endianness and binary string conversion for 32 bits integers (Int32)?
Message-ID: <20050616190206.GA553@clipper.ens.fr>
In-Reply-To: <87slzima67.fsf@linux-france.org>

On Octidi 28 Prairial, year CCXIII, David MENTRE wrote:
> 1. convert between big and little endian 32 bits integers;

Don't do that.

> 2. convert between 32 bits integers and string binary representation
>    (to store integers in Buffer and string data structures);

What you want to do is represent an integer from a bounded interval as a
fixed-length sequence of finite-valued objects. Put that way, it is something
children learn in school: writing the number in some base. Since each byte in
a string can take 256 values, the obvious choice is base 256.

The first (rightmost) "digit" is (n mod 256).
The second "digit" is ((n / 256) mod 256).
The third "digit" is ((n / (256 * 256)) mod 256).
The fourth (leftmost) "digit" is ((n / (256 * 256 * 256)) mod 256).

And so on, but since your numbers are less than 256*256*256*256, all
remaining "digits" are 0.

So all you have to do is store these four bytes in your string, in whatever
order you prefer. "Big endian" is when you store the fourth, the third, the
second and the first; it is the closest to the way we humans write numbers,
and the lexical order is the same as the numeric order. "Little endian" is
when you store the first, the second, the third and the fourth.

But, and this is important, none of this depends on the hardware it runs on:
it is purely arithmetic.
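The four "digits" and the two storage orders described above can be sketched
in OCaml roughly as follows. This is only an illustration: the names digit,
int32_to_be and int32_to_le are mine, not standard library functions, and the
code assumes a recent OCaml where String.init is available. Because Int32 is
signed, the sketch uses a logical shift and a mask in place of "/" and "mod";
the two agree when n is read as an unsigned 32-bit value, whereas Int32.div
and Int32.rem would return negative digits for values with the top bit set.

```ocaml
(* Base-256 "digit" number [i] of [n], where digit 0 is the rightmost
   (least significant) one. Shift-and-mask plays the role of
   (n / 256^i) mod 256 for the unsigned reading of [n]. *)
let digit (n : int32) (i : int) : char =
  let shifted = Int32.shift_right_logical n (8 * i) in
  Char.chr (Int32.to_int (Int32.logand shifted 0xFFl))

(* Big endian: fourth, third, second, first digit. *)
let int32_to_be (n : int32) : string =
  String.init 4 (fun k -> digit n (3 - k))

(* Little endian: first, second, third, fourth digit. *)
let int32_to_le (n : int32) : string =
  String.init 4 (fun k -> digit n k)
```

To store a value in a Buffer, one would then write something like
Buffer.add_string buf (int32_to_be n). Nothing here looks at how the machine
lays out integers in memory.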
The reverse operation is simply

n = d1 + d2 * 256 + d3 * 256 * 256 + d4 * 256 * 256 * 256

> 3. detect machine endianness at runtime.

Don't do that.

Let me elaborate: there is no guarantee that numbers are stored in either big
or little endian. I have heard that architectures exist where the 8-bit bytes
within a 16-bit word are in little endian, but the 16-bit words within a
32-bit word are in big endian, which gives 3412 as the overall order. Relying
on the internal representation of integers can therefore never be reliable.
By contrast, compilers ensure that arithmetic within a reasonable interval is
real Peano arithmetic, on all architectures.

Using the internal representation of numbers may save a few cycles on packing
and unpacking, but that is probably negligible compared with whatever else is
done with the data (disk or network access, for example). Furthermore, if you
have to worry about reversing the byte order of the number, the gain will be
even smaller.
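The reverse operation given earlier (n = d1 + d2 * 256 + d3 * 256 * 256 +
d4 * 256 * 256 * 256) can be sketched the same way. Again, this is a minimal
illustration under my own assumptions: the names rebuild, int32_of_be and
int32_of_le are hypothetical, and the offset argument pos is an addition of
mine for reading from the middle of a string. The sum is evaluated in Horner
form, and Int32 arithmetic wraps modulo 2^32, so byte sequences with the top
bit set come back as negative int32 values.

```ocaml
(* Rebuild an int32 from four bytes of [s] starting at offset [pos].
   [order] lists the byte offsets from the most significant digit (d4)
   down to the least significant one (d1). *)
let rebuild (order : int list) (s : string) (pos : int) : int32 =
  let byte k = Int32.of_int (Char.code s.[pos + k]) in
  (* Horner form of d1 + d2*256 + d3*256^2 + d4*256^3 *)
  List.fold_left
    (fun acc k -> Int32.add (Int32.mul acc 256l) (byte k))
    0l order

(* Big endian: the most significant digit is stored first. *)
let int32_of_be : string -> int -> int32 = rebuild [0; 1; 2; 3]

(* Little endian: the most significant digit is stored last. *)
let int32_of_le : string -> int -> int32 = rebuild [3; 2; 1; 0]
```

Like the packing direction, this uses only arithmetic on byte values, so it
gives the same result whatever the byte order of the host machine.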