On Mon, Sep 30, 2013 at 10:48 AM, Goswin von Brederlow <goswin-v-b@web.de> wrote:

> >
> > + For 16-bit and 32-bit architectures:
> >      +---------------+----+----+-----+-------+------+
> >      |     wosize    | ext|cust|noptr| color | tag  |
> >      +---------------+----+----+-----+-------+------+
> > bits  31           21  20   19   18   17   16 15   0
> >
> > - noptr: no pointers present
> > - ext:  uses extension word
> > - cust(om): uses custom word. Custom word is normally used to indicate
> > floats and pointers.
> >
> > 32 bit extension word (present only if ext is 1)
> >      +---------------------------------------------+
> >      |                   wosize                    |
> >      +---------------------------------------------+
> > bits  31                                          0
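To make the layout concrete, here is a minimal decoder for the 32-bit header exactly as drawn. It is a sketch, not code from either proposal. `decode_header` and the constant names are hypothetical. The order of the optional words is also an assumption: when both are present, the extension word comes first and the custom word second.

```python
# Hypothetical decoder for the 32-bit header diagram.
# Assumption: if both optional words are present, the extension word
# comes first, then the custom word.
WOSIZE_SHIFT = 21          # wosize: bits 31..21 (11 bits)
EXT_BIT = 1 << 20
CUST_BIT = 1 << 19
NOPTR_BIT = 1 << 18
COLOR_SHIFT = 16           # color: bits 17..16
TAG_MASK = 0xFFFF          # tag: bits 15..0


def decode_header(words):
    """Decode the header starting at words[0].

    Returns (fields, n), where n is the number of header words consumed.
    """
    hd = words[0]
    fields = {
        "wosize": hd >> WOSIZE_SHIFT,
        "ext": bool(hd & EXT_BIT),
        "cust": bool(hd & CUST_BIT),
        "noptr": bool(hd & NOPTR_BIT),
        "color": (hd >> COLOR_SHIFT) & 0x3,
        "tag": hd & TAG_MASK,
    }
    n = 1
    if fields["ext"]:
        # The extension word holds the full size; the inline wosize is ignored.
        fields["wosize"] = words[n]
        n += 1
    if fields["cust"]:
        fields["custom"] = words[n]
        n += 1
    return fields, n
```

In this layout the inline wosize is 11 bits wide, so any block larger than 2047 words needs the extension word.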
>
> Why use a full bit for ext? I would define wosize == 0 to mean that an
> extension word with the actual size is present. That way sizes below
> 16KB can be encoded without an extension word.
>
>
Great point! Of course, that makes perfect sense. I had a feeling I was
wasting the wosize bits with the extension word but couldn't quite put
2 and 2 together.
BTW, down the thread is a newer version of the design that reduces the tag
space to 8000 tags, which I do think is sufficient.
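The "wosize == 0 means an extension word follows" variant can be sketched as below. Several details are assumptions not spelled out in the thread. Dropping the ext bit is taken to widen wosize to bits 31..20 (12 bits). The other header fields are assumed to be OR-ed in separately. `encode_size` and `decode_size` are hypothetical names.

```python
# Sketch of the "wosize == 0 means extension word" variant.
# Assumption: dropping the ext bit widens wosize to bits 31..20 (12 bits);
# the remaining header fields are OR-ed in by the caller.
WOSIZE_SHIFT = 20
MAX_INLINE_WOSIZE = (1 << 12) - 1      # 4095 words


def encode_size(wosize):
    """Return (size_bits_for_header, extension_words)."""
    if 0 < wosize <= MAX_INLINE_WOSIZE:
        return wosize << WOSIZE_SHIFT, []
    # A zero wosize field tells the reader to look at the extension word.
    return 0, [wosize]


def decode_size(words):
    """Return (wosize, number_of_header_words_used)."""
    inline = words[0] >> WOSIZE_SHIFT
    if inline != 0:
        return inline, 1
    return words[1], 2
```

On a 32-bit target, the largest block that fits without an extension word is 4095 * 4 = 16380 bytes. That matches the "below 16KB" figure above.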


> > 32-bit custom word (default usage, present only if cust is 1):
> >      +----+----------------------------------------+
> >      |nofp|              pfbits                    |
> >      +----+----------------------------------------+
> > bits   31  30                                     0
> >
> > - nofp: a structure with no floats. All pfbits are used for pointers, with
> > a 1 signifying a pointer and a 0 signifying a value.
> > - pfbits: indicates which double words are floats and pointers. Starting at
> > the highest bit:
> >     - a 0 indicates neither a pointer nor a float
> >     - a 10 indicates a float (double)
> >     - a 11 indicates a pointer
> >     - If noptr is set, each bit indicates a float. If nofp is set, each bit
> > indicates a pointer.
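Reading the pfbits is a small prefix-code decode that starts at bit 30. The sketch below is one possible reading of the rules above. `classify_fields` is a hypothetical name. It also assumes that each field consumes exactly one code. The thread does not say how a double spanning two 32-bit fields should be counted.

```python
PFBITS_WIDTH = 31          # pfbits occupy bits 30..0 of the custom word
NOFP_BIT = 1 << 31


def classify_fields(custom_word, noptr, nfields):
    """Return 'ptr', 'float' or 'other' for each field of the block.

    Assumption: one code per field, read from bit 30 downward.
    """
    nofp = bool(custom_word & NOFP_BIT)
    pos = PFBITS_WIDTH - 1

    def next_bit():
        nonlocal pos
        if pos < 0:
            raise ValueError("pfbits exhausted")
        bit = (custom_word >> pos) & 1
        pos -= 1
        return bit

    kinds = []
    for _ in range(nfields):
        if nofp:                      # one bit per field: 1 = pointer
            kinds.append("ptr" if next_bit() else "other")
        elif noptr:                   # one bit per field: 1 = float
            kinds.append("float" if next_bit() else "other")
        elif next_bit() == 0:         # code 0: neither
            kinds.append("other")
        else:                         # code 10: float, code 11: pointer
            kinds.append("ptr" if next_bit() else "float")
    return kinds
```

The mixed encoding is variable-length: a field can cost 1 or 2 bits, so how many fields a single custom word can describe depends on the block's contents.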
>
> There are 3 kinds of values:
>
> 1) pointers with bit 0 == 0
> 2) non-pointers with bit 0 == 1
> 3) floats with all bits used for the type (spanning 2 fields in 32bit)
>
> So if pfbits indicates a float then a field (or 2) is a float and all
> bits are used for the value. Otherwise the bit 0 of the field will
> tell you whether it is a pointer or not. So why would you want to
> duplicate that information in the pfbits?
>
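Goswin's alternative needs only the float flags. For every other field, bit 0 of the word itself decides between pointer and integer. Below is a sketch for a 64-bit model with one word per field. `pointer_fields` and `double_bits` are hypothetical helpers.

```python
import struct


def double_bits(x):
    """Raw 64-bit pattern of a double, as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def pointer_fields(fields, is_float):
    """Indices of fields the GC should follow (64-bit model).

    Only floats carry a flag; for every other field bit 0 decides:
    0 means pointer, 1 means tagged integer.
    """
    ptrs = []
    for i, (word, flt) in enumerate(zip(fields, is_float)):
        if flt:
            continue            # every bit belongs to the double: never inspect
        if word & 1 == 0:
            ptrs.append(i)
    return ptrs
```

Try it on a block with fields `[85, 0x1000, double_bits(1.5)]`. With `is_float=[False, False, True]` only field 1 is followed. Without the float flag, field 2 is followed as well, because the bit pattern of 1.5 happens to end in 0. That is why the float flags cannot be dropped.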

I was thinking of doing it for efficiency. If we're already indicating
what's what, we might as well represent shortcuts to the pointers, which
would cut down on the amount of reading, no? In the average case, the GC
would need to access a lot less memory.
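A rough way to see the trade-off is to count how many of a block's own fields each approach loads. This is a simplified sketch with hypothetical helper names. Its mask puts field i at bit i, which is not the high-bit-first pfbits order. It also ignores the cost of loading and decoding the custom word, and the marker still has to visit every pointer it finds. At best it is a proxy for the savings, not a measurement.

```python
def loads_tag_scan(is_float):
    """Field loads when the GC reads each non-float field to test bit 0."""
    return sum(1 for flt in is_float if not flt)


def pointer_indices(mask):
    """Field indices set in a pointer mask (simplified: bit i describes field i).

    The GC only loads these fields, so the number of loads is the popcount.
    """
    out = []
    while mask:
        low = mask & -mask          # isolate the lowest set bit
        out.append(low.bit_length() - 1)
        mask ^= low                 # clear it and continue
    return out
```

Take a 10-field block with no floats and pointers in fields 3 and 7. The tag-bit scan loads all 10 fields, while the mask leads straight to the 2 pointers.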


> It might be nice to support C values like untagged ints or unaligned
> pointers. If Custom tag is set then the pfbits become ocaml value
> bits. The GC will only inspect fields with pfbit set. All other fields
> are ignored. The custom_operations handle compare, hash, serialize and
> deserialize so nothing else will access the data.
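With custom blocks, the pfbits mark which fields hold OCaml values. Everything else is opaque C data that the GC never reads. The sketch below uses a hypothetical `visit_custom_block` and the same simplification as before: bit i of the mask describes field i.

```python
def visit_custom_block(fields, value_mask):
    """Hypothetical GC visit of a custom block.

    Only fields whose bit is set in value_mask (bit i = field i) hold OCaml
    values; the rest are raw C data and are never inspected.
    Returns the indices of fields that hold OCaml pointers.
    """
    roots = []
    for i, word in enumerate(fields):
        if not (value_mask >> i) & 1:
            continue            # untagged int, unaligned C pointer, ...
        if word & 1 == 0:       # an OCaml value with bit 0 clear is a pointer
            roots.append(i)
    return roots
```

Consider fields `[1000, 0x2000, 11]`: a raw C int, an OCaml pointer and a tagged OCaml int. With mask `0b110`, only field 1 is reported. With every field flagged (`0b111`), field 0 is reported too, because the untagged 1000 is even. That misread is what the pfbits avoid.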
>
> Another thing is int32 and int64. I guess if you want to unbox those
> then having 2 bits per field in pfbits makes sense again. But then I
> would allocate them as:
>
>     - a 00 indicates a tagged value (int or pointer)
>     - a 01 indicates a non-pointer: int, int32, native int, C pointer
>     - a 10 indicates a float (double)
>     - a 11 indicates an int64
>
> The higher bit would indicate a 64bit value, meaning spanning 2 fields
> on 32bit. Note that those 4 values allow mixing ocaml values, C values,
> int32, int64 and float in a block.
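The 2-bit codes can be laid out on a 32-bit target as follows. The high bit marks a 64-bit value, and such a value takes two consecutive fields. This sketch assumes one code per value rather than one per field, which the proposal leaves open. `layout_32bit` is a hypothetical name.

```python
# Per-value codes as listed above; the high bit marks a 64-bit value.
KINDS = {
    0b00: "tagged",   # OCaml int or pointer
    0b01: "raw",      # int, int32, native int, C pointer
    0b10: "float",    # double
    0b11: "int64",
}


def layout_32bit(codes):
    """Return (kind, first_field, width) for each code on a 32-bit target.

    Assumption: one 2-bit code per value, so a 64-bit value uses one code
    but occupies two consecutive 32-bit fields.
    """
    layout, field = [], 0
    for code in codes:
        width = 2 if code & 0b10 else 1
        layout.append((KINDS[code], field, width))
        field += width
    return layout
```

Under this encoding, only the `tagged` entries ever need to be inspected by the GC.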
>
> I would combine the noptr and nofp bits into a single 2bit field:
>
>     - a 00 indicates no pointers and no double size, no pfbits
>     - a 01 indicates no double size, pfbits indicate tagged / non-pointer
>     - a 10 indicates no pointers but double size, pfbits indicate size
>     - a 11 indicates both pointers and double size, 2 pfbits per field
>
> Note: tagged integers can be stored as 00 or 01. I think this would be
> required for polymorphic types. An 'a could be int or pointer. In both
> cases 00 will work.
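With the combined 2-bit mode field, the number of pfbits spent per field depends on the mode. The sketch below splits the pfbits into per-field codes, reading from the highest bit down. It assumes the 31 pfbits of the earlier custom-word diagram. `field_codes` and `BITS_PER_FIELD` are hypothetical names.

```python
PFBITS = 31  # assumption: pfbits keep bits 30..0 of the custom word

# Bits consumed per field under each value of the proposed 2-bit mode field.
BITS_PER_FIELD = {
    0b00: 0,  # no pointers, no double size: no pfbits at all
    0b01: 1,  # 0 = tagged value, 1 = non-pointer
    0b10: 1,  # no pointers: 0 = single size, 1 = double size
    0b11: 2,  # pointers and double size: a full 2-bit code per field
}


def field_codes(mode, pfbits, nfields):
    """Split pfbits, read from the highest bit down, into one code per field."""
    bits = BITS_PER_FIELD[mode]
    if bits == 0:
        return []                    # nothing to decode in this mode
    if nfields * bits > PFBITS:
        raise ValueError("block has more fields than the pfbits can describe")
    codes, pos = [], PFBITS
    for _ in range(nfields):
        pos -= bits
        codes.append((pfbits >> pos) & ((1 << bits) - 1))
    return codes
```

Under this assumption, mode 11 can describe at most 15 fields, while the one-bit modes can describe 31. Larger blocks would need some other encoding.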
>
>
I really like this idea -- unboxing more types could be really useful. I'm
not sure double 'size' would work, however. It should be fine for the
marshal module, but polymorphic comparison would get messed up because
floats have to be compared differently. So I think 10 in the bit field
should indicate no pointers but floats, while 11 could allow both pointers
and double size, with the 2-bits specifying if it's a float or an int64 (as
you've outlined). Of course, one cannot have both shortcuts to pointers and
enhanced unboxing, so let me know what you think about the performance
increase from shortcutting the tag bit.
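The comparison problem is easy to demonstrate. Suppose a header only says "this field is 64 bits wide". A generic compare must then choose how to read those bits, and a double read as an int64 orders differently from the same double read as a double. The sketch below uses hypothetical function names and leaves NaN out.

```python
import struct


def as_int64(x):
    """Reinterpret the 64 bits of a double as a signed int64."""
    return struct.unpack("<q", struct.pack("<d", x))[0]


def sign_compare(a, b):
    return (a > b) - (a < b)


def compare_as_double(a, b):
    return sign_compare(a, b)


def compare_as_int64(a, b):
    return sign_compare(as_int64(a), as_int64(b))
```

As doubles, -1.0 > -2.0 and 0.0 == -0.0. Read as int64, the negative values come out in reversed order, and 0.0 and -0.0 differ, since -0.0 has only the sign bit set. So the header would need to say "float" rather than just "double size".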

Yotam