caml-list - the Caml user's mailing list
From: Siegfried Gonzi <siegfried.gonzi@stud.uni-graz.at>
To: Michal Moskal <malekith@pld-linux.org>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Reading a file
Date: Wed, 21 May 2003 08:11:40 +0200
Message-ID: <3ECB189C.5090400@stud.uni-graz.at>
In-Reply-To: <20030520132032.GA9564@roke.freak>

Michal Moskal wrote:

>
>If you expand each line of megabyte file to list of characters -- it
>cannot be fast.
>
Enclosed is the OCaml version in question:

'split' was pinched from comp.lang.functional. A year ago I had
a conversation there, and someone posted this split function tailored to
my request: split "nil,2.23,3.34,nil" (-1.0) = [-1.0,2.23,3.34,-1.0]
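
For comparison, here is a minimal sketch of the same request in a current OCaml, assuming version 4.04 or later, where the standard library provides String.split_on_char. The names to_float and parse_line are hypothetical and not part of the posted code:

```ocaml
(* Sketch only. String.split_on_char is the standard-library splitter
   (OCaml 4.04 or later); to_float and parse_line are hypothetical names. *)

(* Map the "missing value" markers to a user-supplied proxy value. *)
let to_float nan_proxy = function
  | "nil" | "n/a" -> nan_proxy
  | s -> float_of_string s

(* Split one line at the delimiter and convert every field. *)
let parse_line del nan_proxy line =
  List.map (to_float nan_proxy) (String.split_on_char del line)

let () =
  parse_line ',' (-1.0) "nil,2.23,3.34,nil"
  |> List.iter (fun x -> Printf.printf "%g " x);
  print_newline ()
```

Like float_of_string itself, this raises Failure on a field that is neither a marker nor a valid number.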

'extractFloats' reads an already opened file, applies split to every
line, and collects the results in a list:

==
let split s c =
  let rec loop start acc =
    try
      let next = String.index_from s start c in
      let substring = String.sub s start (next - start) in
      loop (next + 1) (substring :: acc)
    with
      Not_found ->
        let len = String.length s in
        let substring = String.sub s start (len - start) in
        List.rev (substring :: acc)
  in loop 0 []
;;


let frob userval s =
    match s with
    | "n/a" -> userval
    | "nil" -> userval
    | _ -> float_of_string s
;;

let extractFloats file del nanProxy =
  let rec readLoop i acc =
    try
      let line = input_line file in
      let floatL = List.map (frob nanProxy) (split line del) in
      readLoop  (i+1) (floatL :: acc)
    with
      End_of_file ->
        List.rev acc
  in
  readLoop 0 []
;;



let f = open_in "/home/gonzi/test.txt";;
let erg2 = extractFloats f ',' (-1.0);;
let rows = List.length erg2;;
rows;;
====
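
One thing worth noting about readLoop above: the recursive call sits inside the try block, so it is not a tail call, and every line read leaves an exception handler on the stack. The sketch below shows one way to keep the loop tail-recursive, assuming OCaml 4.02 or later for the "exception" case in match. The name read_rows and its parse parameter are hypothetical, and whether this accounts for the timings reported here has not been measured:

```ocaml
(* Sketch only: read_rows and parse are hypothetical names.
   parse turns one line into whatever the caller wants to collect. *)
let read_rows parse ic =
  let rec loop acc =
    match input_line ic with
    | line -> loop (parse line :: acc)        (* tail call, outside any handler *)
    | exception End_of_file -> List.rev acc   (* end of input: restore order *)
  in
  loop []

(* Possible use with the functions from the post:
     read_rows (fun l -> List.map (frob (-1.0)) (split l ',')) f        *)
```

The handler is installed only around input_line, so the stack does not grow with the number of lines.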

Enclosed is also the Clean function. I find this version far more
readable than the OCaml version, but I do not know how to translate it
to OCaml. My Clean function reads line after line and passes each
line on to RealsFromString. The latter function converts the
line to a char-list: [x\\x <-: string-line] and uses takeWhile,
toString, dropWhile and toReal in order to get the double numbers. As I
said, the function is incredibly fast and takes about 15 seconds for a
50MB file.

OCaml takes 8 minutes. If I only read the file line by line
(without the conversion to double numbers), then OCaml takes
about 1 minute. Where is the bottleneck here? List.map, or what?

I think everybody has one specific task which he tries to implement in
every programming language he encounters. My specific task is this
floating-point extraction from string-files.

I didn't play around with different OCaml solutions, because I
had to play a bit with OCaml's Psilab implementation (if you need
something like Python+Numeric+Dislin you could give Psilab a try).

If you need the whole Clean program, drop me a note. By the way: my
Scheme version is clumsy and is more or less similar to the OCaml
version. I wrote this verbose Scheme (Bigloo) version a year ago when I
was a Scheme beginner. The Scheme (Bigloo) version takes about
30 seconds for this 50MB file and is therefore similar to the
C++ template version, which also takes about 30 seconds.

Oh yes: do not read too much into my writing "clumsy", as if it
implied that OCaml is clumsy too. I strongly believe that OCaml's
exception handling mechanism is better than Clean's,
because Clean does not possess such a thing as exception handling, so to
speak.

S. Gonzi

====
////////////////////////////////////////////////
// The dead-as-Latin functional language
// with the most readable syntax out there
// and one of the /fastest functional languages/:
// Clean (in the meantime open (source?)
// for Linux/Unix). But as life plays:
// nobody jumps onto the Clean bandwagon. Is this
// a pity or a blessing? Why doesn't the "most"
// readable syntax play a role in real life?
// Do not get me wrong, but why does the
// "punctuation syntax" always win in real life?
////////////////////////////////////////////////
FExtractReals:: HeaderKeys File -> [[Real]]
FExtractReals h file
		  | sfend file = []
		  # (line,nextline) = sfreadline file
		  = [(RealsFromString line h.del h.nan h.nanProxy) :
		     (FExtractReals h nextline)]



RealsFromString:: String Char String Real -> [Real]
RealsFromString line del nan nanProxy= searchDel [x\\x<-:line]
where
	searchDel:: [Char] -> [Real]
	searchDel [] = []
	searchDel linerest
		  # val = toString( takeWhile notDelNl linerest )
		  # rest = dropWhile ((<>)del) linerest
		  = [toRealNaN val nan : searchDel (drop 1 rest)]
	notDelNl::Char -> Bool
	notDelNl x
	       | x==del = False
	       | x==' ' = False
	       | x=='\t' = False
	       | x=='\n' = False
	       = True
	toRealNaN:: String String -> Real
	toRealNaN s nan
		  | s==nan = nanProxy
		  = toReal(s)
====
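
As a rough illustration of what a direct OCaml transliteration of RealsFromString could look like, here is a sketch that keeps the char-list approach. It assumes OCaml 4.06 or later for List.init; every name in it is hypothetical. It mirrors the Clean logic rather than aiming for speed, and it raises Failure on an empty or malformed field, which may differ from what Clean's toReal does:

```ocaml
(* Sketch only: a transliteration of the Clean code, all names hypothetical. *)

(* takeWhile / dropWhile written out so only the standard library is needed. *)
let rec take_while p = function
  | x :: xs when p x -> x :: take_while p xs
  | _ -> []

let rec drop_while p = function
  | x :: xs when p x -> drop_while p xs
  | l -> l

(* Clean's [x \\ x <-: line]: the string as a list of characters. *)
let chars_of_string s = List.init (String.length s) (String.get s)

(* Clean's toString on a char list. *)
let string_of_chars cs =
  let b = Buffer.create 16 in
  List.iter (Buffer.add_char b) cs;
  Buffer.contents b

let reals_from_string line del nan nan_proxy =
  (* A field ends at the delimiter or at whitespace. *)
  let not_del_nl c = not (c = del || c = ' ' || c = '\t' || c = '\n') in
  let to_real_nan s = if s = nan then nan_proxy else float_of_string s in
  let rec search_del = function
    | [] -> []
    | rest ->
      let v = string_of_chars (take_while not_del_nl rest) in
      (* Skip to the delimiter, then drop the delimiter itself. *)
      let after = match drop_while (fun c -> c <> del) rest with
        | [] -> []
        | _ :: tl -> tl
      in
      to_real_nan v :: search_del after
  in
  search_del (chars_of_string line)
```

For example, reals_from_string "nil,2.23,3.34,nil" ',' "nil" (-1.0) evaluates to [-1.0; 2.23; 3.34; -1.0].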







