From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sympa.inria.fr (Postfix) with ESMTPS id 1B8D17EE4B for ; Fri, 27 Sep 2013 16:06:19 +0200 (CEST) Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of yotambarnoy@gmail.com) identity=pra; client-ip=209.85.216.182; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="yotambarnoy@gmail.com"; x-sender="yotambarnoy@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail3-smtp-sop.national.inria.fr: domain of yotambarnoy@gmail.com designates 209.85.216.182 as permitted sender) identity=mailfrom; client-ip=209.85.216.182; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="yotambarnoy@gmail.com"; x-sender="yotambarnoy@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-qc0-f182.google.com) identity=helo; client-ip=209.85.216.182; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="yotambarnoy@gmail.com"; x-sender="postmaster@mail-qc0-f182.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArgBALWPRVLRVdi2m2dsb2JhbABZhBHBXwgWDgEBAQEBBgsLCRQogmwBGx4DEhBdAREBBQEiiAYBAw+ZG4MDjFKDCoQIChknDWSJAAEFDJNqA5d9kA4YKYRpIA X-IPAS-Result: ArgBALWPRVLRVdi2m2dsb2JhbABZhBHBXwgWDgEBAQEBBgsLCRQogmwBGx4DEhBdAREBBQEiiAYBAw+ZG4MDjFKDCoQIChknDWSJAAEFDJNqA5d9kA4YKYRpIA X-IronPort-AV: E=Sophos;i="4.90,993,1371074400"; d="scan'208";a="28328979" Received: from mail-qc0-f182.google.com ([209.85.216.182]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-SHA; 27 Sep 2013 16:06:18 +0200 Received: by mail-qc0-f182.google.com with SMTP id n4so1771419qcx.27 for ; Fri, 27 Sep 2013 07:06:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:from:date:message-id:subject:to:content-type; bh=Eez6WTSufrGxHYpzfVYOVpc/L3hv65PCJqgRouFAc8A=; b=CbefwV23Y0868QJMh3MXv8MvNf/kg/zSfFaN4u1oKhYnGy+d7u2Mrus71Z8ZyZ0qSv d0wMwxrtLUa/2J6Zqqp8USMTlb5WI0fuE/+pD0FslDtHI1VSUT+7ez+cp+Ao0fijqwhK x+hJGxuh17eAB/uXtXpauk54rRpMie9Tj+PnjMERoSkC80+YJ/6vwxqS7wspjdv2+267 Du13CymuHpPXyOIgELpQzD74oGbedeBMQqAOhLg/uUOf0b1zhr0HZ8pug63npKIdJE3E /JRvqn/LWBRrqxM9HJeMQdPiZ4dofHd1OWEL5w648P3Tlt2ol6myLEXqB3yVZTf4E35f 07mw== X-Received: by 10.49.35.203 with SMTP id k11mr9265299qej.5.1380290776859; Fri, 27 Sep 2013 07:06:16 -0700 (PDT) MIME-Version: 1.0 Received: by 10.224.139.20 with HTTP; Fri, 27 Sep 2013 07:05:56 -0700 (PDT) From: Yotam Barnoy Date: Fri, 27 Sep 2013 10:05:56 -0400 Message-ID: To: Ocaml Mailing List Content-Type: multipart/alternative; boundary=047d7b62206e71141004e75dfde1 Subject: [Caml-list] Proposal: re-design of ocaml headers --047d7b62206e71141004e75dfde1 Content-Type: text/plain; charset=ISO-8859-1 Following up on the previous thread I started in this general topic, I present some more thinking I've done on the redesign of ocaml's headers. The purpose of this redesign is to lift the tag number restriction as well as size limits on 32-bit platforms. At the same time, header bits can be used to indicate floats, allowing cheaper usage of floats in data structures, and even to indicate the presence of pointers, making the traversal of some data structures by the GC unnecessary. The basic idea of this redesign is that most allocations need only a small amount of space, but a large number of tags is a necessity. If you're allocating a large block of memory (>8KB) then you can spare another word for the size. The pfbits (as shown below) are an efficient way of representing both floats and pointers in a data structure, at the cost of disallowing random access. From what I can gather, random access to this data is never needed, since the GC, Marshal module, and polymorphic comparison all process the whole data structure rather than referring to specific parts of it. Open issue: making float representation efficient on the stack. + For 16-bit and 32-bit architectures: +---------------+----+----+-----+-------+------+ | wosize | ext|cust|noptr| color | tag | +---------------+----+----+-----+-------+------+ bits 31 21 20 19 18 17 16 15 0 - noptr: no pointers present - ext: uses extension word - cust(om): uses custom word. Custom word is normally used to indicate floats and pointers. 32 bit extension word (present only if ext is 1) +---------------------------------------------+ | wosize | +---------------------------------------------+ bits 31 0 32 bit custom word (default usage - present only if cust is 1): +----+----------------------------------------+ |nofp| pfbits | +----+----------------------------------------+ bits 31 30 0 - nofp: a structure with no floats. All pfbits are used for pointers, with a 1 signifying a pointer and a 0 signifying a value. - pfbits: indicates which double words are floats and pointers. Starting at the highest bit: - a 0 indicates neither a pointer nor a float - a 10 indicates a float (double) - a 11 indicates a pointer - If noptr is set, each bit indicates a float. If nofp is set, each bit indicates a pointer. + For 64-bit architectures: +----------------+--------+----+----+-----+-------+------+ | pfbits | wosize |cust|nofp|noptr| color | tag | +----------------+--------+----+----+-----+-------+------+ bits 63 40 39 21 20 19 18 17 16 15 0 - noptr: a structure with no pointers. All pfbits are used for floats, with a 1 signifying a float and a 0 signifying a non-float. - nofp: a structure with no floats. All pfbits are used for pointers, with a 1 signifying a pointer and a 0 signifying a value. - If both noptr and nofp are set, wosize is extended to include the pfbits. - cust(om): uses custom double word. Custom double word is normally used to indicate more floats and pointers, but functionality can change with certain tags. - If the custom bit is set, wosize is expanded to include the pfbits in the main header. - pfbits: indicates which double words are floats and pointers. Starting at the highest bit: - a 0 indicates neither a pointer nor a float - a 10 indicates a float (double) - a 11 indicates a pointer - If noptr is set, each bit indicates a float. If nofp is set, each bit indicates a pointer. 64 bit custom header (default usage indicated - present only if cust is 1): +--------------------------------------------------------+ | pfbits | +--------------------------------------------------------+ bits 63 0 - pfbits: indicates which double words are floats and pointers. Starting at the highest bit: - a 0 indicates neither a pointer nor a float - a 10 indicates a float (double) - a 11 indicates a pointer - If noptr is set, each bit indicates a float. If nofp is set, each bit indicates a pointer. + Tags: - I think it's a good idea to move custom tags to the low end of the spectrum, and add more there if any are needed. This way, if the tag field is ever expanded, it's not necessary to move the custom tags again. - 0: Closure tag - 1: Infix tag (must be 1 mod 4) - 2: Lazy tag - 3: Object tag - 4: Forward tag - 5: Abstract tag - 6: String tag - 7: Double tag - 8: Custom tag - 9: Proposed tag: custom array. Half of custom header is used to indicate array member size, so one could have an array of tuples, saving both memory and indirections. - 100: Start of user tags Yotam Barnoy --047d7b62206e71141004e75dfde1 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
Following up on the previous thread I started in this gene= ral topic, I present some more thinking I've done on the redesign of oc= aml's headers. The purpose of this redesign is to lift the tag number r= estriction as well as size limits on 32-bit platforms. At the same time, he= ader bits can be used to indicate floats, allowing cheaper usage of floats = in data structures, and even to indicate the presence of pointers, making t= he traversal of some data structures by the GC unnecessary.

The basic idea of this redesign is that most allocations need only= a small amount of space, but a large number of tags is a necessity. If you= 're allocating a large block of memory (>8KB) then you can spare ano= ther word for the size.

The pfbits (as shown below) are an efficient way of representing both f= loats and pointers in a data structure, at the cost of disallowing random a= ccess. From what I can gather, random access to this data is never needed, = since the GC, Marshal module, and polymorphic comparison all process the wh= ole data structure rather than referring to specific parts of it.

Open issue: making float representation efficient on the sta= ck.


+ For 16-bit and 32-bit architectures:
=A0=A0=A0=A0 +---------------+--= --+----+-----+-------+------+
=A0=A0=A0=A0 |=A0=A0=A0=A0 wosize=A0=A0=A0 | ext|cust|noptr| color | tag=A0= |
=A0=A0=A0=A0 +---------------+----+----+-----+-------+------+
bits= =A0 31=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 21=A0 20=A0=A0 19=A0=A0 18=A0=A0 17=A0= =A0 16 15=A0=A0 0

- noptr: no pointers present
- ext:=A0 uses ext= ension word
- cust(om): uses custom word. Custom word is normally used to indicate floa= ts and pointers.

32 bit extension word (present only if ext is 1)=A0=A0=A0=A0 +---------------------------------------------+
=A0=A0=A0= =A0 |=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 wosize=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 |
=A0=A0=A0=A0 +---------------------------------------------+
bits=A0 31= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0

32 bit custom wor= d (default usage - present only if cust is 1):
=A0=A0=A0=A0 +----+------= ----------------------------------+
=A0=A0=A0=A0 |nofp|=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 pfbits=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 |
=A0=A0=A0=A0 +----+--= --------------------------------------+
bits=A0=A0 31=A0 30=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0 0

- nofp: a st= ructure with no floats. All pfbits are used for pointers, with a 1 signifyi= ng a pointer and a 0 signifying a value.
- pfbits: indicates which double words are floats and pointers. Star= ting at the highest bit:
=A0=A0=A0 - a 0 indicates neither a pointer nor= a float
=A0=A0=A0 - a 10 indicates a float (double)
=A0=A0=A0 - a 11= indicates a pointer
=A0=A0=A0 - If noptr is set, each bit indicates a float. If nofp is set, ea= ch bit indicates a pointer.

+ For 64-bit architectures:

=A0= =A0=A0=A0 +----------------+--------+----+----+-----+-------+------+
=A0= =A0=A0=A0 |=A0=A0=A0=A0 pfbits=A0=A0=A0=A0 | wosize |cust|nofp|noptr| color= | tag=A0 |
=A0=A0=A0=A0 +----------------+--------+----+----+-----+-------+------+
= bits=A0 63=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 40 39=A0=A0=A0 21=A0 20=A0=A0 1= 9=A0=A0 18=A0=A0 17=A0=A0 16 15=A0=A0 0

- noptr: a structure with no= pointers. All pfbits are used for floats, with a 1 signifying a float and = a 0 signifying a non-float.
- nofp: a structure with no floats. All pfbits are used for pointers, with = a 1 signifying a pointer and a 0 signifying a value.
- If both noptr and= nofp are set, wosize is extended to include the pfbits.
- cust(om): use= s custom double word. Custom double word is normally used to indicate more = floats and pointers, but functionality can change with certain tags.
=A0=A0=A0 - If the custom bit is set, wosize is expanded to include the pfb= its in the main header.
- pfbits: indicates which double words are float= s and pointers. Starting at the highest bit:
=A0=A0=A0 - a 0 indicates n= either a pointer nor a float
=A0=A0=A0 - a 10 indicates a float (double)
=A0=A0=A0 - a 11 indicates a= pointer
=A0=A0=A0 - If noptr is set, each bit indicates a float. If nof= p is set, each bit indicates a pointer.

64 bit custom header (defaul= t usage indicated - present only if cust is 1):
=A0=A0=A0=A0 +--------------------------------------------------------+
= =A0=A0=A0=A0 |=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0 pfbits=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0 |
=A0=A0=A0=A0 +-------------------------------------= -------------------+
bits=A0 63=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0

- pfbits: indicates which double words are floats and pointers. Startin= g at the highest bit:
=A0=A0=A0 - a 0 indicates neither a pointer nor a = float
=A0=A0=A0 - a 10 indicates a float (double)
=A0=A0=A0 - a 11 in= dicates a pointer
=A0=A0=A0 - If noptr is set, each bit indicates a float. If nofp is set, ea= ch bit indicates a pointer.

+ Tags:
- I think it's a good ide= a to move custom tags to the low end of the spectrum, and add more there if= any are needed. This way, if the tag field is ever expanded, it's not = necessary to move the custom tags again.

- 0: Closure tag
- 1: Infix tag (must be 1 mod 4)
- 2: Lazy tag- 3: Object tag
- 4: Forward tag
- 5: Abstract tag
- 6: String t= ag
- 7: Double tag
- 8: Custom tag
- 9: Proposed tag: custom array= . Half of custom header is used to indicate array member size, so one could= have an array of tuples, saving both memory and indirections.
- 100: Start of user tags

Yotam Barnoy
--047d7b62206e71141004e75dfde1--