From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id KAA17340; Tue, 27 Nov 2001 10:41:16 +0100 (MET) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id KAA17329 for ; Tue, 27 Nov 2001 10:41:15 +0100 (MET) Received: from mailb.telia.com (mailb.telia.com [194.22.194.6]) by nez-perce.inria.fr (8.11.1/8.10.0) with ESMTP id fAR9fDD00861; Tue, 27 Nov 2001 10:41:14 +0100 (MET) Received: from gateway (h225n2fls34o849.telia.com [217.208.235.225]) by mailb.telia.com (8.11.6/8.11.6) with SMTP id fAR9fCh22479; Tue, 27 Nov 2001 10:41:13 +0100 (CET) From: "Mattias Waldau" To: "Xavier Leroy" Cc: Subject: RE: [Caml-list] Beware of compare (and Ocaml beaten by Java) Date: Tue, 27 Nov 2001 10:41:10 +0100 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 In-Reply-To: <20011127102130.A16765@pauillac.inria.fr> Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk > I assume you mean "compare at type int -> int -> int", because for > types represented as pointers, pointer subtraction wouldn't give > reliable results -- for one thing, the GC can move blocks around. Yes. > > Even on integer arguments, "-" cannot be used due to arithmetic > overflow: > > compare min_int max_int = -1 (* correct *) > min_int - max_int = 1 (* incorrect *) > > So, replacing "compare" by "-" is only valid for small integer types > such as "char" and enumerated datatypes, and I felt this wasn't > important enough to implement. (Given your example, you'll disagree, > of course.) Yes, you are right, overflow is a problem. Maybe using "-" instead of compare should just be a tip instead next to the documentation of compare. > > > The core of the slow program is > > > > (* compare two substrings of the SAME text > > [compare x y] returns [0] if [x=y], a negative integer if > > [xy] *) > > let rec same_substr_compare str idx1 idx2 : int = > > let length = String.length str in > > (* shortest string is smaller *) > > if idx1 = length then -1 else > > if idx2 = length then 1 else > > (* compare one char *) > > let res = compare str.[idx1] str.[idx2] in > > (* char was equal, recurse *) > > if res = 0 then same_substr_compare str (idx1+1) (idx2+1) > > (* char was different, finished *) > > else res ;; > > I get the impression that same_substr_compare never returns 0; is this > intentional? Since I compare substrings of the same string from different positions to the end of the string, there will never be any identical substrings (one of them will always be shorter). > > > 3. Mergesort (=Array.stable_sort) is faster than > > heapsort (=Array.sort). (runtime down from 60s to 40s). > > (I also tried quicksort (=Sort.array), but after 8 hours > > it still hadn't finished.) > > This is a bit surprising: you might have hit one of those cases where > Quicksort is O(n^2), but it could also be the case that Sort.array > malfunctions because your comparison function is not a proper > comparison function (it doesn't return 0 for equal things). It worked on smaller datasets, so I think I found an O(n^2)-case. My Java-collegue had the same problem and had to implement merge sort. I didn't look into it a lot. > > - Xavier Leroy ------------------- Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr