From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id JAA15369; Wed, 21 May 2003 09:23:48 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id JAA15321 for ; Wed, 21 May 2003 09:23:47 +0200 (MET DST) Received: from teutates.kfunigraz.ac.at (TEUTATES.kfunigraz.ac.at [143.50.129.26]) by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id h4L7NkH14403 for ; Wed, 21 May 2003 09:23:47 +0200 (MET DST) Received: from localhost (localhost.localdomain [127.0.0.1]) by teutates.kfunigraz.ac.at (Postfix) with ESMTP id 3448B3D222B; Wed, 21 May 2003 09:23:45 +0200 (CEST) Received: from teutates.kfunigraz.ac.at ([127.0.0.1]) by localhost (teutates.kfunigraz.ac.at [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 17696-04; Wed, 21 May 2003 09:23:42 +0200 (CEST) Received: from stud.uni-graz.at (IGAM08AV.kfunigraz.ac.at [143.50.39.35]) by teutates.kfunigraz.ac.at (Postfix) with ESMTP id 2A22C3D2225; Wed, 21 May 2003 09:23:40 +0200 (CEST) Message-ID: <3ECB189C.5090400@stud.uni-graz.at> Date: Wed, 21 May 2003 08:11:40 +0200 From: Siegfried Gonzi Organization: Universitaet Graz User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.8) Gecko/20020204 X-Accept-Language: en-us MIME-Version: 1.0 To: Michal Moskal Cc: caml-list@inria.fr Subject: Re: [Caml-list] Reading a file References: <4.3.2.7.2.20030517225010.04b748a0@localhost> <4.3.2.7.2.20030519120753.04545700@localhost> <200305201007.17990.wolfgang.mueller2@uni-bayreuth.de> <3EC9EA84.3070404@stud.uni-graz.at> <20030520132032.GA9564@roke.freak> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: by amavisd-new at stud.uni-graz.at X-Spam: no; 0.00; siegfried:01 gonzi:01 stud:99 caml-list:01 michal:01 moskal:01 erg:01 bottleneck:01 ocaml's:01 psilab:01 python:01 clumsy:01 verbose:01 bigloo:01 posses:99 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk Michal Moskal wrote: > >If you expand each line of megabyte file to list of characters -- it >cannot be fast. > Enclosed the OCaml version in question: 'split' has been pinched up from comp.lang.functional. A year ago I had a conversation there and someone posted this split function tailored to my request: split "nil,2.23,3.34,nil" (-1.0) = [-1.0,2.23,3.34,-1.0] 'extractFloats' opens a file and applies split to every line and stores the result into a list: == let split s c = let rec loop start acc = try let next = String.index_from s start c in let substring = String.sub s start (next-start) in loop (next+1) (substring :: acc) with Not_found -> let len = String.length s in let substring = String.sub s start (len-start) in List.rev (substring :: acc) in loop 0 [] ;; let frob userval s = match s with | "n/a" -> userval | "nil" -> userval | _ -> float_of_string s ;; let extractFloats file del nanProxy = let rec readLoop i acc = try let line = input_line file in let floatL = List.map (frob nanProxy) (split line del) in readLoop (i+1) (floatL :: acc) with End_of_file -> List.rev acc in readLoop 0 [] ;; let f = open_in "/home/gonzi/test.txt";; let erg2 = extractFloats f ',' (-1.0);; let rows = List.length erg2;; rows;; ==== Enclosed also the Clean function. This version would be way more readable than the Ocaml version. But I do not know how to translate it to OCaml. My Clean function reads line after line and passes this string-line on to RealsFromString. The latter function converts the string-line to a char-list: [x\\x <-: string-line] and uses takeWhile, toString, dropWhile and toReal in order to get the double numbers. As I said the function is incredibly fast and takes for a 50MB file about 15 seconds. Ocaml takes 8minutes. If I try to read the file line by line only (without the conversion to double numbers) then Ocaml would take about 1 minutes. Where is the bottleneck here? List.map or what? I think everybody has one specific task which he tries to implement in every programming language he encounters. My specific task is this floating-point extraction from string-files. I didn't play around with different OCaml solutions, because I had to play a bit with OCaml's Psilab implementation (if you need something like Python+Numeric+Dislin you could give Psilab a try). If you need the whole Clean program drop me a note. By the way: my Scheme version is clumsy and is more or less similar to the OCaml version. I wrote this verbose Scheme (Bigloo) version a year ago when I was a beginner of Scheme. The performance of the Scheme (Bigloo) version is about 30 seconds for this 50MB file and is therefore similar to the C++-template version which takes about 30 seconds. Oh yes: do not draw to close out a comment when I write "clumsy" which implies OCaml is clumsy too; I have the strong believing that OCaml's exception handling mechanism is more or less better than Clean's one because Clean does not posses such a thing as exception handling, so to speak. S. Gonzi ==== //////////////////////////////////////////////// // The dead as Latin functional language // whith the most readable syntax out there // and one of the /fastest functional languages/: // Clean (In the meantime open(source?) // for Linux/Unix). But as life plays: // nobody jumps onto the Clean-bandwagon. Is this // a pity or a bless? Why doesn't the "most" // readable syntax plays a role in real life? // Do not get me wrong, but why does always the // "punctuation syntax" win in real life? //////////////////////////////////////////////// FExtractReals:: HeaderKeys File -> [[Real]] FExtractReals h file | sfend file = [] # (line,nextline) = sfreadline file = [(RealsFromString line h.del h.nan h.nanProxy) : (FExtractReals h nextline)] RealsFromString:: String Char String Real -> [Real] RealsFromString line del nan nanProxy= searchDel [x\\x<-:line] where searchDel:: [Char] -> [Real] searchDel [] = [] searchDel linerest # val = toString( takeWhile notDelNl linerest ) # rest = dropWhile ((<>)del) linerest = [toRealNaN val nan : searchDel (drop 1 rest)] notDelNl::Char -> Bool notDelNl x | x==del = False | x==' ' = False | x=='\t' = False | x=='\n' = False = True toRealNaN:: String String -> Real toRealNaN s nan | s==nan = nanProxy = toReal(s) ==== ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners