From mboxrd@z Thu Jan 1 00:00:00 1970 X-Sympa-To: caml-list@inria.fr Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p6S9mpdh031215 for ; Thu, 28 Jul 2011 11:48:51 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AuQEADcwMU5KfVK2kGdsb2JhbAA0AQEBAQMUAhMfATsCAw0BBgUYCSILAgIJAwIBAgECJgEFASQVDwEBH4Qvk3aDVIsuCBQBAQEBCQkNBxQEIYh+AqUhCotrgxKFAokoAgMGhSuBEASNfYR8hQeBJoM/gmA8gT+CHw X-IronPort-AV: E=Sophos;i="4.67,281,1309730400"; d="vcf'?scan'208";a="104097492" Received: from mail-wy0-f182.google.com ([74.125.82.182]) by mail4-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-SHA; 28 Jul 2011 11:48:35 +0200 Received: by wyg24 with SMTP id 24so31153wyg.27 for ; Thu, 28 Jul 2011 02:48:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:subject :references:in-reply-to:x-enigmail-version:content-type; bh=PIzupDA9nKQzAe3jaJ4lnMNp1CoWyAaB3uMs3JUig6c=; b=iRbvHYrw0zK2Sd1X0mptZBnD6O8L2N6Bb05waCazSUn5LJfjqY+kF23Fn5iS0q68R9 S+JuzmvlwP7WPn3qUhLs+e0U8G/KIcZa0wpSfRqOQWarxWH2YKUKEAkfMvMMtz/AFFzO nXZI+Yqd460w/I02T7P9MznJGwKKbXB+lHTFA= Received: by 10.227.58.143 with SMTP id g15mr1179040wbh.17.1311846514391; Thu, 28 Jul 2011 02:48:34 -0700 (PDT) Received: from [192.168.0.3] (lns-bzn-48f-81-56-241-169.adsl.proxad.net [81.56.241.169]) by mx.google.com with ESMTPS id em16sm673967wbb.67.2011.07.28.02.48.31 (version=SSLv3 cipher=OTHER); Thu, 28 Jul 2011 02:48:32 -0700 (PDT) Sender: Fabrice Le Fessant Message-ID: <4E313071.7090309@inria.fr> Date: Thu, 28 Jul 2011 11:48:33 +0200 From: Fabrice Le Fessant User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110424 Lightning/1.0b2 Thunderbird/3.1.10 MIME-Version: 1.0 To: caml-list@inria.fr References: <4E307DFE.3040502@gmail.com> In-Reply-To: <4E307DFE.3040502@gmail.com> X-Enigmail-Version: 1.1.2 Content-Type: multipart/mixed; boundary="------------030701010909090907010503" X-Validation-by: fabrice.le_fessant@inria.fr Subject: Re: [Caml-list] Hashtbl performance This is a multi-part message in MIME format. --------------030701010909090907010503 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit I think it is a well known bug of the hash function: let list = ref [];; for i = 1 to 30 do list := i :: !list; Printf.printf "%d " (Hashtbl.hash !list) done;; 1 65601 8392706 797307010 578301955 797307010 8392706 65601 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 As you can see, long lists all have the same hash value. It is due to the fact that the hash function does a depth-first traversal of the datastructure, starting with the last field, instead of a breadth-first traversal, and, as it only visits 10 nodes, they are always the same ones for all lists of length > 10. Thus, you should consider using your own hash function, probably only considering the ints in the list and its length. Then, use the functor in the Hashtbl module. --Fabrice On 07/27/2011 11:07 PM, Toby Kelsey wrote: > I have written two versions of a small program, one using an (int list) data structure and the other an ((int*int) list); and I tested using Hashtbl, Set and (Jean-Christophe Filliatre's) Trie to cache these elements in each version. > The relative run times of the programs turns out to be: > > Hashtbl Set Trie > (int list) 1 2.1 2.0 > ((int*int) list) 34.7 2.6 2.1 > > > There is a slight inherent speed difference between the 2 versions but the major effect is the cache type (in fact the caching difference must be larger than these ratios due to non-cache computations). I expected Hashtbl might be a bit slower than more specialised data structures, but the large speed difference with different data structures was unexpected. Presumably the slow-down is due to excessive hash collisions. > > I had expected that the generic Hashtbl would be written to give adequate speed for all types of data and when I look at OCaml code on the web Hashtbl is usually used, so it seems most OCaml programmers believe the standard Hashtbl is a reasonable choice for most data types as well. > > I haven't tested the Batteries or OCaml-Core hash tables so these may be more consistent, but if not, my question is can you predict how well Hashtbl will work for different types of data and so what to use it with, or it is just not reliable enough for general-purpose caching/memoization? > > If anyone wants to look at the code, the (int list) Hashtbl version is at and the other versions are in temporarily. Apart from the caching and the int/int pair changes they should be identical. > > Toby > --------------030701010909090907010503 Content-Type: text/x-vcard; charset=utf-8; name="fabrice_le_fessant.vcf" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="fabrice_le_fessant.vcf" begin:vcard fn:Fabrice LE FESSANT n:LE FESSANT;Fabrice org:INRIA Saclay -- Ile-de-France;P2P & OCaml adr;quoted-printable:;;Parc Orsay Universit=C3=A9 ;Orsay CEDEX;;91893;France email;internet:fabrice.le_fessant@inria.fr title;quoted-printable:Charg=C3=A9 de Recherche tel;work:+33 1 74 85 42 14 tel;fax:+33 1 74 85 42 49 url:http://fabrice.lefessant.net/ version:2.1 end:vcard --------------030701010909090907010503--