caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Computing with big numbers?
@ 2008-12-01 12:36 Alan Schmitt
  2008-12-01 12:52 ` [Caml-list] " Martin Jambon
  2008-12-01 13:47 ` Dario Teixeira
  0 siblings, 2 replies; 9+ messages in thread
From: Alan Schmitt @ 2008-12-01 12:36 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 1083 bytes --]

Hello,

In preparation for a talk I'm going to give, I wanted to estimate how  
good 128 bits MD5 hashes were: how many hashes must be taken before  
the probability for a collision become non negligible? (I'm assuming  
equi-probability of every hash.)

The brute force approach is simply to enumerate the non-collision  
probability for k different hashes, and compute until it becomes lower  
than 1. This probability is (writing N for 2 ^ 128):
N * (N-1) * (N - 2) * ... * (N - k)
---------------------------------------
N^k

I tried computing this using the bignum library that comes with OCaml,  
and it slows down to a crawl very fast (for k ~ 1000).

So I tried to be more subtle and approximate the result (using  
Stirling's approximation of factorials), but OCaml's floats are not  
precise enough to yield anything significant. (I'm trying to compute  
the log of the approximation of N! / (N^k * (N-k)!), which is N (ln N)  
- N - (k (ln N) + (N - k)(ln (N - k)) - (N - k)).)

Is there a library with better floating point precision than the OCaml  
one?

Thanks,

Alan

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 194 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Computing with big numbers?
  2008-12-01 12:36 Computing with big numbers? Alan Schmitt
@ 2008-12-01 12:52 ` Martin Jambon
  2008-12-01 14:29   ` Alan Schmitt
  2008-12-01 13:47 ` Dario Teixeira
  1 sibling, 1 reply; 9+ messages in thread
From: Martin Jambon @ 2008-12-01 12:52 UTC (permalink / raw)
  To: Alan Schmitt; +Cc: caml-list

Alan Schmitt wrote:
> Hello,
> 
> In preparation for a talk I'm going to give, I wanted to estimate how
> good 128 bits MD5 hashes were: how many hashes must be taken before the
> probability for a collision become non negligible? (I'm assuming
> equi-probability of every hash.)
> 
> The brute force approach is simply to enumerate the non-collision
> probability for k different hashes, and compute until it becomes lower
> than 1. This probability is (writing N for 2 ^ 128):
> N * (N-1) * (N - 2) * ... * (N - k)
> ---------------------------------------
> N^k
> 
> I tried computing this using the bignum library that comes with OCaml,
> and it slows down to a crawl very fast (for k ~ 1000).
> 
> So I tried to be more subtle and approximate the result (using
> Stirling's approximation of factorials), but OCaml's floats are not
> precise enough to yield anything significant. (I'm trying to compute the
> log of the approximation of N! / (N^k * (N-k)!), which is N (ln N) - N -
> (k (ln N) + (N - k)(ln (N - k)) - (N - k)).)
> 
> Is there a library with better floating point precision than the OCaml one?

If I understand your problem correctly, this is the so-called birthday
problem with 2^128 days in a year. The Wikipedia article gives useful
approximations:

http://en.wikipedia.org/wiki/Birthday_problem


Martin

-- 
http://mjambon.com/


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Computing with big numbers?
  2008-12-01 12:36 Computing with big numbers? Alan Schmitt
  2008-12-01 12:52 ` [Caml-list] " Martin Jambon
@ 2008-12-01 13:47 ` Dario Teixeira
  2008-12-01 14:37   ` Alan Schmitt
  1 sibling, 1 reply; 9+ messages in thread
From: Dario Teixeira @ 2008-12-01 13:47 UTC (permalink / raw)
  To: caml-list, Alan Schmitt

Hi,

> In preparation for a talk I'm going to give, I wanted
> to estimate how good 128 bits MD5 hashes were: how many
> hashes must be taken before the probability for a collision
> become non negligible? (I'm assuming equi-probability of
> every hash.)

I reckon that by saying "how good 128 bits MD5 hashes were"
you are aware of the recent attacks that make MD5's effective
security less than 128-bit.  The Wikipedia has a good summary:
http://en.wikipedia.org/wiki/MD5

Cheers,
Dario Teixeira






^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Computing with big numbers?
  2008-12-01 12:52 ` [Caml-list] " Martin Jambon
@ 2008-12-01 14:29   ` Alan Schmitt
  0 siblings, 0 replies; 9+ messages in thread
From: Alan Schmitt @ 2008-12-01 14:29 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 482 bytes --]

On 1 déc. 08, at 13:52, Martin Jambon wrote:

> If I understand your problem correctly, this is the so-called birthday
> problem with 2^128 days in a year. The Wikipedia article gives useful
> approximations:
>
> http://en.wikipedia.org/wiki/Birthday_problem

Thank you for the link, this was very informative. And there is even a  
table linking desired probability of collision to number of outputs in  
this page:

http://en.wikipedia.org/wiki/Birthday_attack

Alan

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 194 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Computing with big numbers?
  2008-12-01 13:47 ` Dario Teixeira
@ 2008-12-01 14:37   ` Alan Schmitt
  2008-12-04 16:06     ` Florian Hars
  0 siblings, 1 reply; 9+ messages in thread
From: Alan Schmitt @ 2008-12-01 14:37 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 499 bytes --]

On 1 déc. 08, at 14:47, Dario Teixeira wrote:

> I reckon that by saying "how good 128 bits MD5 hashes were"
> you are aware of the recent attacks that make MD5's effective
> security less than 128-bit.  The Wikipedia has a good summary:
> http://en.wikipedia.org/wiki/MD5

Thanks for the link. I knew of the attacks but did not know they had  
gotten so good. But I don't think this applies here, as the hashes I'm  
looking at are the one used by Unison to identify file contents.

Alan

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 194 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Computing with big numbers?
  2008-12-01 14:37   ` Alan Schmitt
@ 2008-12-04 16:06     ` Florian Hars
  2008-12-04 16:40       ` David Thomas
                         ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Florian Hars @ 2008-12-04 16:06 UTC (permalink / raw)
  To: Alan Schmitt; +Cc: caml-list

Alan Schmitt schrieb:
> But I don't think this applies here, as the hashes I'm
> looking at are the one used by Unison to identify file contents.

Then it is *especially* relevant, as it is quite trivial to generate
several files with  different content and the same MD5 hash, all you
need is a Playstation 3:
http://www.win.tue.nl/hashclash/Nostradamus/

- Florian
-- 
But our moral language is fragmented; our contemporaries reject the Kantian
hunch that choosing those things most admirable and plausible as ends in
themselves is the best practice; autonomous sources of the good are everywhere
brown and broken. Thus we have PHP. http://lambda-the-ultimate.org/node/1463


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Computing with big numbers?
  2008-12-04 16:06     ` Florian Hars
@ 2008-12-04 16:40       ` David Thomas
  2008-12-04 21:51       ` Martin Jambon
  2008-12-05  7:07       ` Alan Schmitt
  2 siblings, 0 replies; 9+ messages in thread
From: David Thomas @ 2008-12-04 16:40 UTC (permalink / raw)
  To: caml-list

That depends on the threat model.  If the question is, "presuming no active attack, how likely is it to break?", then the cryptanalytic results against the hash are irrelevant.  If the question is "how secure is it if someone is maliciously manipulating files", then they are certainly relevant.

If you're operating between reasonably secure machines, where an attacker having write access is already more catastrophic than a failure of Unison, then the first is what matters.  If someone else has control over some of the files, then you've gotta watch the second.


--- On Thu, 12/4/08, Florian Hars <hars@bik-gmbh.de> wrote:

> From: Florian Hars <hars@bik-gmbh.de>
> Subject: Re: [Caml-list] Computing with big numbers?
> To: "Alan Schmitt" <alan.schmitt@polytechnique.org>
> Cc: caml-list@inria.fr
> Date: Thursday, December 4, 2008, 8:06 AM
> Alan Schmitt schrieb:
> > But I don't think this applies here, as the hashes
> I'm
> > looking at are the one used by Unison to identify file
> contents.
> 
> Then it is *especially* relevant, as it is quite trivial to
> generate
> several files with  different content and the same MD5
> hash, all you
> need is a Playstation 3:
> http://www.win.tue.nl/hashclash/Nostradamus/
> 
> - Florian
> -- 
> But our moral language is fragmented; our contemporaries
> reject the Kantian
> hunch that choosing those things most admirable and
> plausible as ends in
> themselves is the best practice; autonomous sources of the
> good are everywhere
> brown and broken. Thus we have PHP.
> http://lambda-the-ultimate.org/node/1463
> 
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list:
> http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs


      


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Computing with big numbers?
  2008-12-04 16:06     ` Florian Hars
  2008-12-04 16:40       ` David Thomas
@ 2008-12-04 21:51       ` Martin Jambon
  2008-12-05  7:07       ` Alan Schmitt
  2 siblings, 0 replies; 9+ messages in thread
From: Martin Jambon @ 2008-12-04 21:51 UTC (permalink / raw)
  To: Florian Hars; +Cc: Alan Schmitt, caml-list

Florian Hars wrote:
> Alan Schmitt schrieb:
>> But I don't think this applies here, as the hashes I'm
>> looking at are the one used by Unison to identify file contents.
> 
> Then it is *especially* relevant, as it is quite trivial to generate
> several files with  different content and the same MD5 hash, all you
> need is a Playstation 3:
> http://www.win.tue.nl/hashclash/Nostradamus/

It's like being able to manufacture a lock and the key that goes with
it: big deal!

:-)


Martin

-- 
http://mjambon.com/


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Computing with big numbers?
  2008-12-04 16:06     ` Florian Hars
  2008-12-04 16:40       ` David Thomas
  2008-12-04 21:51       ` Martin Jambon
@ 2008-12-05  7:07       ` Alan Schmitt
  2 siblings, 0 replies; 9+ messages in thread
From: Alan Schmitt @ 2008-12-05  7:07 UTC (permalink / raw)
  To: Florian Hars; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 633 bytes --]

On 4 déc. 08, at 17:06, Florian Hars wrote:

> Alan Schmitt schrieb:
>> But I don't think this applies here, as the hashes I'm
>> looking at are the one used by Unison to identify file contents.
>
> Then it is *especially* relevant, as it is quite trivial to generate
> several files with  different content and the same MD5 hash, all you
> need is a Playstation 3:
> http://www.win.tue.nl/hashclash/Nostradamus/

Thanks for the link, but I wasn't considering a malicious attack: if  
someone were able to modify my files, I would not worry about Unison  
not detecting (thus not propagating) a malicious change.

Alan

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 194 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2008-12-05  7:08 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-01 12:36 Computing with big numbers? Alan Schmitt
2008-12-01 12:52 ` [Caml-list] " Martin Jambon
2008-12-01 14:29   ` Alan Schmitt
2008-12-01 13:47 ` Dario Teixeira
2008-12-01 14:37   ` Alan Schmitt
2008-12-04 16:06     ` Florian Hars
2008-12-04 16:40       ` David Thomas
2008-12-04 21:51       ` Martin Jambon
2008-12-05  7:07       ` Alan Schmitt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).