On Thu, Oct 02, 2003 at 05:31:22AM -0700, Ram Bhamidipaty wrote: > Someone suggested that my previous question about dynamically resizing > arrays hinted that my solution may be going in a non-functional > direction. That might be true. > > So here is the problem I am trying to solve. I would like to solve it > in a "functional way". Use Map. You won't notice a difference. [...] > I have already written a python application that can read this data file and > generate the data I am looking for. I was not happy with the python solution > because it was not very fast. Even with using a heap to store the top 100 > largest files I was not able to create a python solution that could beat > a "grep | sort" pipeline (on unix of course). [...] ./stats < /shm/d 0.11s user 0.00s system 78% cpu 0.139 total grep '^F' /shm/d 0.02s user 0.00s system 22% cpu 0.090 total sort -n -t' ' -k2,2 0.32s user 0.05s system 71% cpu 0.517 total tail -100 0.02s user 0.00s system 3% cpu 0.514 total 86287 263948 1912177 /shm/d Did I win something? ;-) [...] > My goal of implementing this in OCaml is to beat the grep+sort > combination. If I can create a solution that can output all the > date I want - in one pass - AND still be faster than the grep+sort > partial solution - _that_ would be cool! > > Having said all that - I wanted to use an array to hold the data for > each directory. I hoped that using an array would be faster than a > hash table since I know that the directory numbers are assigned > sequentially. Map! Map! Stop thinking in such an imperative way ;-) Key points of my implementation: 1. It also uses heap (simple one from pure-fun) 2. It is compiled with ocamlopt (ocamlc version is 6 times slower) 3. It doesn't use Scanf. For such linear task as this Scanf takes 10 or more times to parse input then actual computations. 4. There is little trick, that heap holds 101 elements, and always discards last one, so we don't need to check if newly inserted element is really bigger then min we just deleted. 5. The code isn't very pretty (it would be better to make heap functor with item count, and so on). It is left as an exercise to the reader. You're free to use it. I also attach program I used to create input. -- : Michal Moskal :: http://www.kernel.pl/~malekith : GCS {C,UL}++++$ a? !tv : When in doubt, use brute force. -- Ken Thompson : {E-,w}-- {b++,e}>+++ h