caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: "Mattias Waldau" <mattias.waldau@abc.se>
To: <caml-list@inria.fr>
Subject: [Caml-list] Beware of compare (and Ocaml beaten by Java)
Date: Mon, 26 Nov 2001 19:40:00 +0100	[thread overview]
Message-ID: <AAEBJHFJOIPMMIILCEPBCECKDFAA.mattias.waldau@abc.se> (raw)
In-Reply-To: <20011127015149.A10358@earth.cs.mu.oz.au>

Why isn't compare compiled as '-'? According to the definition
of compare this should be okay.

The reason why I am asking this can be found below.

/mattias


Suffix arrays are very efficient for fast string searching 
in big texts. I implemented a first nice version, which 
took 40 s to run on my P4, when applied to the bible (4.5 MB)

The core of the slow program is

(* compare two substrings of the SAME text 
   [compare x y] returns [0] if [x=y], a negative integer if
   [x<y], and a positive integer if [x>y] *)
let rec same_substr_compare str idx1 idx2 : int =
  let length = String.length str in
  (* shortest string is smaller *)
  if idx1 = length then -1 else
  if idx2 = length then 1 else 
  (* compare one char *)
  let res = compare str.[idx1] str.[idx2] in
  (* char was equal, recurse *)
  if res = 0 then same_substr_compare str (idx1+1) (idx2+1)
  (* char was different, finished *)
  else res ;;

(* create and sort index array for string *)
let create_index str : int array =
  let idx = Array.init (String.length str) (fun ii -> ii) in
  Array.stable_sort (same_substr_compare str) idx;
  idx ;;

However, a colleague implemented it using Java. His code 
only takes 15 s to run.

Thus, I had to further tune the code by replacing 'compare' 
with '-', inlining better and removing bound-checking 
on strings and arrays. The new version runs in 10 s. 

During this lesson I learned that
1. compare is incredible slow! By replacing

	let res = compare str.[idx1] str.[idx2] in

with
	
	let res = (-) (int_of_char str.[idx1]) (int_of_char str.[idx2]) in

the run-time of the complete program went down 
from 20 s to 10 s.

2. Tail-recursive comparation is faster than using a 
loop and exit (runtime down from 80s to 60s). See the file
other_compares.ml in the nice version for these other attempts, 
one example is

exception Result of int ;;
let substr_compare str idx1 idx2 : int =
  let res = compare str.[idx1] str.[idx2] in
  if res <> 0 then res 
  else try
    let length = String.length str in
    let max_size = if idx1 > idx2 then
      length - idx1 
    else length - idx2 in
    for offset = 1 to max_size - 1 do
      let res = compare str.[idx1+offset] str.[idx2+offset] in
      if res<>0 then raise (Result res);
    done;
    0
  with Result res -> res ;;

3. Mergesort (=Array.stable_sort) is faster than 
heapsort (=Array.sort). (runtime down from 60s to 40s).
(I also tried quicksort (=Sort.array), but after 8 hours
it still hadn't finished.)

4. The rest is inline and removing bounds checking.

This is the first time I got beaten by Java, I will 
have to reevaluate Java. Note that Java still does 
bounds checking!

/mattias

The nice version is at 
http://www.abc.se/~m10217/download/mans.tar.bz2

The fast version is at
http://www.abc.se/~m10217/download/mans.opt.tar.bz2

-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr


  parent reply	other threads:[~2001-11-26 18:39 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2001-10-02 17:53 [Caml-list] replay debugger Yoann Padioleau
2001-10-04 12:39 ` Xavier Leroy
2001-11-26 14:51   ` Fergus Henderson
2001-11-26 16:14     ` Xavier Leroy
2001-11-26 18:40     ` Mattias Waldau [this message]
2001-11-27  9:21       ` [Caml-list] Beware of compare (and Ocaml beaten by Java) Xavier Leroy
2001-11-27  9:41         ` Mattias Waldau
2001-11-30  9:12           ` Pierre Weis
2001-12-03 21:37             ` Chris Hecker
2001-11-27 17:03         ` [Caml-list] bytegen.comp_expr error when doing object copying Neil Inala
2001-11-28 20:15           ` Xavier Leroy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AAEBJHFJOIPMMIILCEPBCECKDFAA.mattias.waldau@abc.se \
    --to=mattias.waldau@abc.se \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).