caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Maximum non-constant constructors
@ 2005-03-16 19:51 Tom Hawkins
  2005-03-16 20:42 ` [Caml-list] " Alain Frisch
  0 siblings, 1 reply; 11+ messages in thread
From: Tom Hawkins @ 2005-03-16 19:51 UTC (permalink / raw)
  To: caml-list

I just discovered another limitation: the maximum number of non-constant 
constructors is 246.

My modified OCaml parser generator (ocamlyacc with ints -- see previous 
post) is producing a parser that consumes 342 different tokens types.

Are there any ways to bypass this limitation?

-Tom


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Maximum non-constant constructors
  2005-03-16 19:51 Maximum non-constant constructors Tom Hawkins
@ 2005-03-16 20:42 ` Alain Frisch
  2005-03-16 23:22   ` Tom Hawkins
  2005-03-16 23:32   ` [Caml-list] " Richard Jones
  0 siblings, 2 replies; 11+ messages in thread
From: Alain Frisch @ 2005-03-16 20:42 UTC (permalink / raw)
  To: Tom Hawkins; +Cc: caml-list

Tom Hawkins wrote:
> I just discovered another limitation: the maximum number of non-constant 
> constructors is 246.
> 
> My modified OCaml parser generator (ocamlyacc with ints -- see previous 
> post) is producing a parser that consumes 342 different tokens types.
> 
> Are there any ways to bypass this limitation?

Polymorphic variants.

-- Alain


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Maximum non-constant constructors
  2005-03-16 20:42 ` [Caml-list] " Alain Frisch
@ 2005-03-16 23:22   ` Tom Hawkins
  2005-03-17 15:23     ` Stefan Monnier
  2005-03-16 23:32   ` [Caml-list] " Richard Jones
  1 sibling, 1 reply; 11+ messages in thread
From: Tom Hawkins @ 2005-03-16 23:22 UTC (permalink / raw)
  To: caml-list

Alain Frisch wrote:
> Tom Hawkins wrote:
> 
>> I just discovered another limitation: the maximum number of 
>> non-constant constructors is 246.
>>
>> My modified OCaml parser generator (ocamlyacc with ints -- see 
>> previous post) is producing a parser that consumes 342 different 
>> tokens types.
>>
>> Are there any ways to bypass this limitation?
> 
> 
> Polymorphic variants.

Good idea, but ocamlyacc does not accept `Some_token as valid syntax.

I'm starting to think I should invest some time developing a new parser 
generator for OCaml.

-Tom

> 
> -- Alain
> 
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
> 
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Maximum non-constant constructors
  2005-03-16 20:42 ` [Caml-list] " Alain Frisch
  2005-03-16 23:22   ` Tom Hawkins
@ 2005-03-16 23:32   ` Richard Jones
  2005-03-17 12:36     ` Jean-Christophe Filliatre
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Jones @ 2005-03-16 23:32 UTC (permalink / raw)
  To: caml-list

On Wed, Mar 16, 2005 at 09:42:35PM +0100, Alain Frisch wrote:
> Polymorphic variants.

Has anyone ever seen (in real life) a collision in the hash function
which encodes polymorphic variants?  Just wondering ...  It seems like
it could occur.

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Maximum non-constant constructors
  2005-03-16 23:32   ` [Caml-list] " Richard Jones
@ 2005-03-17 12:36     ` Jean-Christophe Filliatre
  2005-03-17 14:00       ` Eric Cooper
  0 siblings, 1 reply; 11+ messages in thread
From: Jean-Christophe Filliatre @ 2005-03-17 12:36 UTC (permalink / raw)
  To: Richard Jones; +Cc: caml-list


Richard Jones writes:
 > 
 > Has anyone ever seen (in real life) a collision in the hash function
 > which encodes polymorphic variants?  Just wondering ...  It seems like
 > it could occur.

Yes it could occur, but there is a check a link time for such collisions.

-- 
Jean-Christophe

(http://www.lri.fr/~filliatr)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Maximum non-constant constructors
  2005-03-17 12:36     ` Jean-Christophe Filliatre
@ 2005-03-17 14:00       ` Eric Cooper
  2005-03-17 17:32         ` Marcin 'Qrczak' Kowalczyk
  2005-03-17 17:55         ` Jon Harrop
  0 siblings, 2 replies; 11+ messages in thread
From: Eric Cooper @ 2005-03-17 14:00 UTC (permalink / raw)
  To: caml-list

On Thu, Mar 17, 2005 at 01:36:35PM +0100, Jean-Christophe Filliatre wrote:
> 
> Richard Jones writes:
>  > 
>  > Has anyone ever seen (in real life) a collision in the hash function
>  > which encodes polymorphic variants?  Just wondering ...  It seems like
>  > it could occur.
> 
> Yes it could occur, but there is a check a link time for such collisions.

Out of curiousity, I wrote a short program to enumerate possible
collisions.  (If you examine the hash_variant function in
typing/btype.ml, it's clear that any base-223 representation of a
multiple of 2^31 in which the "digits" are legal identifier characters
will hash to zero, and will therefore be an "invisible prefix".)

For example, for all strings XXX, the variants `XXX and `zyctRecXXX
collide.

Here's a small sampling of other invisible prefixes:

    CibPXd UMEDdm d1usNc hS1P1' jagJhn oZshTt Atmtemb CAoStes DHobutv
    PeoQMeo SQufoxX alzzdgn dRtEXEl eeAnNdc glMavfi stYKKKs vbasThr

My guess is that none of them are likely to be chosen by
humans, but might occur in program-generated code.

-- 
Eric Cooper             e c c @ c m u . e d u


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Maximum non-constant constructors
  2005-03-16 23:22   ` Tom Hawkins
@ 2005-03-17 15:23     ` Stefan Monnier
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Monnier @ 2005-03-17 15:23 UTC (permalink / raw)
  To: caml-list

> I'm starting to think I should invest some time developing a new parser
> generator for OCaml.

BTW, has anyone ported SML/NJ's ml-yacc to OCaml?
I really like its automatic error recovery.


        Stefan


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Maximum non-constant constructors
  2005-03-17 14:00       ` Eric Cooper
@ 2005-03-17 17:32         ` Marcin 'Qrczak' Kowalczyk
  2005-03-17 18:27           ` Richard Jones
  2005-03-17 17:55         ` Jon Harrop
  1 sibling, 1 reply; 11+ messages in thread
From: Marcin 'Qrczak' Kowalczyk @ 2005-03-17 17:32 UTC (permalink / raw)
  To: caml-list

Eric Cooper <ecc@cmu.edu> writes:

> (If you examine the hash_variant function in typing/btype.ml, it's
> clear that any base-223 representation of a multiple of 2^31 in
> which the "digits" are legal identifier characters will hash to
> zero, and will therefore be an "invisible prefix".)

Beware of the birthday paradox: the probability of finding two values
with the same hash is much larger than the probability of finding a
single value with the given hash (a square root of the previous one).

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Maximum non-constant constructors
  2005-03-17 14:00       ` Eric Cooper
  2005-03-17 17:32         ` Marcin 'Qrczak' Kowalczyk
@ 2005-03-17 17:55         ` Jon Harrop
  2005-03-19  2:28           ` Jacques Garrigue
  1 sibling, 1 reply; 11+ messages in thread
From: Jon Harrop @ 2005-03-17 17:55 UTC (permalink / raw)
  To: caml-list

On Thursday 17 March 2005 14:00, Eric Cooper wrote:
> For example, for all strings XXX, the variants `XXX and `zyctRecXXX
> collide.

So they do:

# type a = [ `ABC | `zyctRecABC ];;
Variant tags `ABC and `zyctRecABC have same hash value. Change one of them.
#

> My guess is that none of them are likely to be chosen by
> humans, but might occur in program-generated code.

Good point! I think this justifies the existence of a library function to 
compare the hashes of string names of polymorphic variants. Does such a 
function exist?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Maximum non-constant constructors
  2005-03-17 17:32         ` Marcin 'Qrczak' Kowalczyk
@ 2005-03-17 18:27           ` Richard Jones
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Jones @ 2005-03-17 18:27 UTC (permalink / raw)
  Cc: caml-list

On Thu, Mar 17, 2005 at 06:32:06PM +0100, Marcin 'Qrczak' Kowalczyk wrote:
> Beware of the birthday paradox: the probability of finding two values
> with the same hash is much larger than the probability of finding a
> single value with the given hash (a square root of the previous one).

Yes, I was thinking about this too:

http://en.wikipedia.org/wiki/OCaml#Code_examples

(Good excuse to get rid of the terrible "99 bottles of beer" example
too).

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Maximum non-constant constructors
  2005-03-17 17:55         ` Jon Harrop
@ 2005-03-19  2:28           ` Jacques Garrigue
  0 siblings, 0 replies; 11+ messages in thread
From: Jacques Garrigue @ 2005-03-19  2:28 UTC (permalink / raw)
  To: jon; +Cc: caml-list

From: Jon Harrop <jon@ffconsultancy.com>
> On Thursday 17 March 2005 14:00, Eric Cooper wrote:
> > For example, for all strings XXX, the variants `XXX and `zyctRecXXX
> > collide.
> > My guess is that none of them are likely to be chosen by
> > humans, but might occur in program-generated code.
> 
> Good point! I think this justifies the existence of a library function to 
> compare the hashes of string names of polymorphic variants. Does such a 
> function exist?

Here it is:

let hash_variant s =
  let accu = ref 0 in
  for i = 0 to String.length s - 1 do
    accu := 223 * !accu + Char.code s.[i]
  done;
  (* reduce to 31 bits *)
  accu := !accu land (1 lsl 31 - 1);
  (* make it signed for 64 bits architectures *)
  if !accu > 0x3FFFFFFF then !accu - (1 lsl 31) else !accu

It is also available in the C runtime (where it is useful for
interfacing) but if you really need it in your program just copy the
above code.

To give you an idea of the probability of conflict, here are the
conclicts I found in a 235882 word dictionary (/usr/share/dict/web2 on
FreeBSD)
Tag `Cretacic conflicts with `cornigerous
Tag `gedackt conflicts with `aquicolous
Tag `myeloencephalitis conflicts with `adequative
Tag `Nemorensian conflicts with `condor
Tag `nonrioter conflicts with `anematosis
Tag `omphaloma conflicts with `crimelessness
Tag `persecute conflicts with `paraenetic
Tag `soredioid conflicts with `genitorial
Tag `subpopulation conflicts with `oxyquinone
Tag `sympathy conflicts with `herbman
Tag `trophaeum conflicts with `prepossession
Tag `unbreaded conflicts with `neback
Tag `undragooned conflicts with `nuisance
Tag `unlistened conflicts with `disturb
Tag `veratrine conflicts with `absolutely
You will get different conflict if you change the capitalization, but
not more.

So much for the birthday paradox: variants fully use 31 bits, so that
the probability only gets more than 0.5 when you have more than 32000
constructors in the same type... Of course this is assuming a perfect
distribution, but as the above dictionary shows, reality is close to
that.

By the way, the check is not at link time, as was stated in another
message, but at compile time. It is the typing of variants itself that
guarantees that no such conflict will go undetected. So you will be
warned early enough.
The same hash function is also used for method names (with the same
compile-time check), but fortunately it is hard to imagine an object
with more than 10000 methods.

Jacques Garrigue


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2005-03-19  2:28 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-03-16 19:51 Maximum non-constant constructors Tom Hawkins
2005-03-16 20:42 ` [Caml-list] " Alain Frisch
2005-03-16 23:22   ` Tom Hawkins
2005-03-17 15:23     ` Stefan Monnier
2005-03-16 23:32   ` [Caml-list] " Richard Jones
2005-03-17 12:36     ` Jean-Christophe Filliatre
2005-03-17 14:00       ` Eric Cooper
2005-03-17 17:32         ` Marcin 'Qrczak' Kowalczyk
2005-03-17 18:27           ` Richard Jones
2005-03-17 17:55         ` Jon Harrop
2005-03-19  2:28           ` Jacques Garrigue

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).