From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.9 required=5.0 tests=AWL,DNS_FROM_SECURITYSAGE, SPF_FAIL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id E040FBB84 for ; Thu, 13 Nov 2008 21:56:43 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AmwAAEIjHEnUTWUFkGdsb2JhbACTXgEBAQEJCQwHEQS+Z4NX X-IronPort-AV: E=Sophos;i="4.33,598,1220220000"; d="scan'208";a="19932449" Received: from mx1.wp.pl ([212.77.101.5]) by mail1-smtp-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 13 Nov 2008 21:56:43 +0100 Received: (wp-smtpd smtp.wp.pl 24331 invoked from network); 13 Nov 2008 21:56:38 +0100 Received: from phpdk6.ph.kcl.ac.uk (HELO [137.73.5.196]) (d0@[137.73.5.196]) (envelope-sender ) by smtp.wp.pl (WP-SMTPD) with AES256-SHA encrypted SMTP for ; 13 Nov 2008 21:56:38 +0100 Message-ID: <491C9472.6050603@wp.pl> Date: Thu, 13 Nov 2008 20:56:18 +0000 From: Dawid Toton User-Agent: Thunderbird 1.5.0.12 (X11/20081014) MIME-Version: 1.0 To: caml-list@yquem.inria.fr Subject: Portable hash Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-WP-AV: skaner antywirusowy poczty Wirtualnej Polski S. A. X-WP-SPAM: NO 0000000 [IWOE] X-Spam: no; 0.00; hash:01 hashes:01 hash:01 compiler:01 hashes:01 hashtbl:01 hashtbl:01 serialize:01 sexp:01 sexp:01 marshal:01 marshal:01 digest:01 digest:01 int:01 I'm looking for a way to calculate hashes of values of variuos types. It has to be: 1. proper function (i.e. x=y => (hash x)=(hash y) ) 2. and should not change with a platform or compiler version. 3. It'd also help to have practically no collisions. For some time I needed not to move data across machines and what I have been (wrongly) using for this so far is: let hash x = Digest.string (Marshal.to_string x []) Recently I realized that it's incorrect as Marshal.to_string is not pure function and doesn't satisfy my first requirement. Indeed in some very rare cases I got different results from the same value. Only the 3rd point is satisfied very well. I have to change my function before I run into problems and I'm considering: 1) Small hashes: let hash x = |Hashtbl.hash_param large_int large_int x Does anybody know what are properies of | |Hashtbl.hash_param? I mean: is the implementation stable? Can I have good distribution when all the data is examined (large parameters)? Unfortunately it returns int of platform-dependent length (and even platform-depentent less significant bits of result?). How hard would it be to tailor it to, say, work always with 31 bits? || 2) To serialize values with Sexp: let hash to_sexp x = Digest.string (string_of_sexp (to_sexp x)) | |This way I have the stablility because I can just keep implementation of Sexp untouched. But the performance is going to be even worse than with my original solution.| Has anybody solved already this (or similar) problem? Dawid Toton