caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Reading a large text file
@ 2004-04-28 15:28 André Luiz Moura
  2004-04-28 16:28 ` Richard Jones
  2004-05-01 14:03 ` Brian Hurt
  0 siblings, 2 replies; 14+ messages in thread
From: André Luiz Moura @ 2004-04-28 15:28 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 648 bytes --]

Hi List, 
 
I wrote an OCaml function to read a text file with a million of lines with  five separate columns for spaces. I based myself on previous messages of Caml-list, but however the work is taking time frightful (many minutes). 
This function written in C, for example, does not take more than 4 seconds. 
 
I am executing the program in a PC Pentium III of 128 MB of RAM under Windows platform. How would be implemented the function to decide this problem?

File format:
<vertex #> <x> <y> [attributes] [boundary marker] 

Thanks, 

André






---------------------------------
Yahoo! Messenger - Fale com seus amigos online. Instale agora!

[-- Attachment #2: Type: text/html, Size: 974 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Reading a large text file
  2004-04-28 15:28 [Caml-list] Reading a large text file André Luiz Moura
@ 2004-04-28 16:28 ` Richard Jones
  2004-05-01 14:03 ` Brian Hurt
  1 sibling, 0 replies; 14+ messages in thread
From: Richard Jones @ 2004-04-28 16:28 UTC (permalink / raw)
  Cc: caml-list

On Wed, Apr 28, 2004 at 12:28:13PM -0300, Andr? Luiz Moura wrote:
> Hi List, 
>  
> I wrote an OCaml function to read a text file with a million of lines with  five separate columns for spaces. I based myself on previous messages of Caml-list, but however the work is taking time frightful (many minutes). 
> This function written in C, for example, does not take more than 4 seconds. 
>  
> I am executing the program in a PC Pentium III of 128 MB of RAM under Windows platform. How would be implemented the function to decide this problem?
> 
> File format:
> <vertex #> <x> <y> [attributes] [boundary marker] 

Post some example code?

Rich.

-- 
Richard Jones. http://www.annexia.org/ http://www.j-london.com/
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
NET::FTPSERVER is a full-featured, secure, configurable, database-backed
FTP server written in Perl: http://www.annexia.org/freeware/netftpserver/

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Reading a large text file
  2004-04-28 15:28 [Caml-list] Reading a large text file André Luiz Moura
  2004-04-28 16:28 ` Richard Jones
@ 2004-05-01 14:03 ` Brian Hurt
  2004-05-01 15:43   ` Rahul Siddharthan
  2004-05-17  5:28   ` Eric Stokes
  1 sibling, 2 replies; 14+ messages in thread
From: Brian Hurt @ 2004-05-01 14:03 UTC (permalink / raw)
  To: André Luiz Moura; +Cc: caml-list


I'm still digging through a serious backlog of email, so your question may 
have already been answered.  If so, ignore this.

But what I'm guessing is happening is that you are appending (adding to 
the end of) your list, and that this is what is killing you.  To add an 
element to the *end* of a list, Ocaml has to completely reallocate the 
entire list- turning what you might think is an O(1) operation into an 
O(n) operation.

The solution to this is to build the list backwards- instead of adding 
things to the end of the list, add them to the begining.  Then, when 
you're done, just reverse the list using List.rev.

Lets look at the example of just reading the lines from a file and making 
a list of them.  This code is bad:

let readfile chan =
    let rec loop lst =
        try
            loop (lst @ [ (input_line chan) ])
        with
            | End_of_file -> lst
    in
    loop []
;;

It works, but to read n lines requires O(n^2) work, because the @ has to
reallocate the entire list every iteration.  Worse yet it isn't tail
recursive (a recursive call inside a try/catch isn't a tail call).

A better implementation of this would be:
let readfile chan =
    let rec loop rlst =
       let line, eof = 
           try
               (input_line chan), false
           with
               | End_of_file -> "", true
       in
       if not eof then
           loop (line :: rlst)
       else
           List.rev rlst
   in
   loop []
;;

Now, the first thing to notice is that we're using the :: operator (which
is O(1)), not the @ operator (which is O(n))- we're prepending things onto
the list, not appending them.  We're building up the list backwards, and
then, when we hit the end of the file, reversing the entire list.  This is
a standard pattern in Ocaml.

The second thing to notice is that the recursive call has been hoisted up 
out of the try/catch block.  We've introduced a new boolean flag to note 
when we've hit the end of the file- catching the exception simply sets the 
flag to true.  So now our function is tail recursive, and operates in 
constant stack space.

Brian

On Wed, 28 Apr 2004, André Luiz Moura wrote:

> Hi List, 
>  
> I wrote an OCaml function to read a text file with a million of lines with  five separate columns for spaces. I based myself on previous messages of Caml-list, but however the work is taking time frightful (many minutes). 
> This function written in C, for example, does not take more than 4 seconds. 
>  
> I am executing the program in a PC Pentium III of 128 MB of RAM under Windows platform. How would be implemented the function to decide this problem?
> 
> File format:
> <vertex #> <x> <y> [attributes] [boundary marker] 
> 
> Thanks, 
> 
> André
> 
> 
> 
> 
> 
> 
> ---------------------------------
> Yahoo! Messenger - Fale com seus amigos online. Instale agora!

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Reading a large text file
  2004-05-01 14:03 ` Brian Hurt
@ 2004-05-01 15:43   ` Rahul Siddharthan
  2004-05-01 16:00     ` [Caml-list] [OcamlDoc] langage support sejourne kevin
  2004-05-01 18:05     ` [Caml-list] Reading a large text file Richard Jones
  2004-05-17  5:28   ` Eric Stokes
  1 sibling, 2 replies; 14+ messages in thread
From: Rahul Siddharthan @ 2004-05-01 15:43 UTC (permalink / raw)
  To: Brian Hurt; +Cc: Andr? Luiz Moura, caml-list

Brian Hurt said on May  1, 2004 at 09:03:56:
> But what I'm guessing is happening is that you are appending (adding to 
> the end of) your list, and that this is what is killing you.  To add an 
> element to the *end* of a list, Ocaml has to completely reallocate the 
> entire list- turning what you might think is an O(1) operation into an 
> O(n) operation.

I'm pretty puzzled by that: why would it have to do that?  Arrays,
yes, but lists, can't it just traverse the existing list to its end
and then add a new element?  It's still O(n) but no reallocation.

> The solution to this is to build the list backwards- instead of adding 
> things to the end of the list, add them to the begining.  Then, when 
> you're done, just reverse the list using List.rev.

Yes that seems best.

Rahul

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Caml-list] [OcamlDoc] langage support.
  2004-05-01 15:43   ` Rahul Siddharthan
@ 2004-05-01 16:00     ` sejourne kevin
  2004-05-14  7:15       ` Maxence Guesdon
  2004-05-01 18:05     ` [Caml-list] Reading a large text file Richard Jones
  1 sibling, 1 reply; 14+ messages in thread
From: sejourne kevin @ 2004-05-01 16:00 UTC (permalink / raw)
  To: caml-list

Hi evrybody!

Is there a way in Ocamldoc to make multilingual
documentation ? I think of something like that :

(**
	[En] Reverse a list.
	[Fr] Retourne une liste.
	[Es] ...
*)
Let rev = List.rev ;;

And generate documentation file in evry langage use in
the code?

Kevin


	

	
		
Yahoo! Mail : votre e-mail personnel et gratuit qui vous suit partout ! 
Créez votre Yahoo! Mail sur http://fr.benefits.yahoo.com/

Dialoguez en direct avec vos amis grâce à Yahoo! Messenger !Téléchargez Yahoo! Messenger sur http://fr.messenger.yahoo.com

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Reading a large text file
  2004-05-01 15:43   ` Rahul Siddharthan
  2004-05-01 16:00     ` [Caml-list] [OcamlDoc] langage support sejourne kevin
@ 2004-05-01 18:05     ` Richard Jones
  2004-05-01 18:25       ` Charles Forsyth
  2004-05-01 19:25       ` skaller
  1 sibling, 2 replies; 14+ messages in thread
From: Richard Jones @ 2004-05-01 18:05 UTC (permalink / raw)
  Cc: caml-list

On Sat, May 01, 2004 at 11:43:51AM -0400, Rahul Siddharthan wrote:
> Brian Hurt said on May  1, 2004 at 09:03:56:
> > But what I'm guessing is happening is that you are appending (adding to 
> > the end of) your list, and that this is what is killing you.  To add an 
> > element to the *end* of a list, Ocaml has to completely reallocate the 
> > entire list- turning what you might think is an O(1) operation into an 
> > O(n) operation.
> 
> I'm pretty puzzled by that: why would it have to do that?  Arrays,
> yes, but lists, can't it just traverse the existing list to its end
> and then add a new element?  It's still O(n) but no reallocation.

The short answer is no, because in OCaml (unlike in LISP) lists are
immutable.  In LISP terminology, there's no way to 'set cdr' on an
OCaml 'cons structure'.  The disadvantage to this is that you can't do
certain destructive operations on lists, like you can so easily in
LISP.  The advantage is that you can't do certain destructive
operations on lists!  In other words, your code is more likely to be
bug free.

However, OCaml is a practical language and so allows you to create a
mutable cons structure if you so desire.  eg:

# type 'a mylist = { head : 'a; mutable tail : 'a mylist option };;
type 'a mylist = { head : 'a; mutable tail : 'a mylist option; }

# let xs = { head = 10; tail = None };;
val xs : int mylist = {head = 10; tail = None}
# let xs = { head = 5; tail = Some xs};;
val xs : int mylist = {head = 5; tail = Some {head = 10; tail = None}}

# let ys = { head = 20; tail = None };;		(* new tail *)
val ys : int mylist = {head = 20; tail = None}

# xs.tail <- Some ys;;				(* replace tail of xs *)
- : unit = ()
# xs;;
- : int mylist = {head = 5; tail = Some {head = 20; tail = None}}

Rich.

-- 
Richard Jones. http://www.annexia.org/ http://www.j-london.com/
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
MAKE+ is a sane replacement for GNU autoconf/automake. One script compiles,
RPMs, pkgs etc. Linux, BSD, Solaris. http://www.annexia.org/freeware/makeplus/

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Reading a large text file
  2004-05-01 18:05     ` [Caml-list] Reading a large text file Richard Jones
@ 2004-05-01 18:25       ` Charles Forsyth
  2004-05-01 19:25       ` skaller
  1 sibling, 0 replies; 14+ messages in thread
From: Charles Forsyth @ 2004-05-01 18:25 UTC (permalink / raw)
  To: caml-list

>>The short answer is no, because in OCaml (unlike in LISP) lists are
>>immutable.  In LISP terminology, there's no way to 'set cdr' on an
>>OCaml 'cons structure'.  The disadvantage to this is that you can't do
>>certain destructive operations on lists, like you can so easily in
>>LISP.  The advantage is that you can't do certain destructive
>>operations on lists!  In other words, your code is more likely to be
>>bug free.

the concurrent programming language Limbo does much the same,
and has a similar penalty for appending to a list.  nevertheless, it has
advantages as you say, and i thought i'd add that the property
turns out to be quite helpful in concurrent programs, because you
can pass another process the current value of a list (for instance by sending
it on a channel) and be sure it always sees the `right' value.
i've found it particularly useful for lock-free concurrent access to caches
implemented by hash tables (array of list of T), when the cache
acts as a hint (as for instance in DNS implementation).
as you say, you can always program a mutable list when you need one.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Reading a large text file
  2004-05-01 18:05     ` [Caml-list] Reading a large text file Richard Jones
  2004-05-01 18:25       ` Charles Forsyth
@ 2004-05-01 19:25       ` skaller
  2004-05-01 19:51         ` Alain.Frisch
  1 sibling, 1 reply; 14+ messages in thread
From: skaller @ 2004-05-01 19:25 UTC (permalink / raw)
  To: Richard Jones; +Cc: caml-list

On Sun, 2004-05-02 at 04:05, Richard Jones wrote:
> On Sat, May 01, 2004 at 11:43:51AM -0400, Rahul Siddharthan wrote:

> The short answer is no, because in OCaml (unlike in LISP) lists are
> immutable. 

> However, OCaml is a practical language and so allows you to create a
> mutable cons structure if you so desire.  eg:

It can now do slightly better than that. It is possible to use
the new 'private' keyword to *guard* your mutable list.

module MList = struct 
type 'a mylist = private { head : 'a; mutable tail : 'a mylist option; }
..
let splice a b = ...(* makes a new mylist of a @ b *)
end

Here, lists are immutable *publically*. However the splice
function provides a concatenation of two ordinary lists
into a mylist in 1 pass, with O(1) stack usage <grin>
and the result is in forward order, not reversed.

The list is still immutable! You can't modify it.
The mutable field is only used during construction
to append to the end, preserving the forward ordering.

Its possible for C language implementations to do
that kind of thing right now for actual Ocaml lists.
But now you can play the game in Ocaml.

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Reading a large text file
  2004-05-01 19:25       ` skaller
@ 2004-05-01 19:51         ` Alain.Frisch
  2004-05-01 20:40           ` skaller
  0 siblings, 1 reply; 14+ messages in thread
From: Alain.Frisch @ 2004-05-01 19:51 UTC (permalink / raw)
  To: skaller; +Cc: caml-list

On 2 May 2004, skaller wrote:

> It can now do slightly better than that. It is possible to use
> the new 'private' keyword to *guard* your mutable list.
>
> module MList = struct
> type 'a mylist = private { head : 'a; mutable tail : 'a mylist option; }
> ..
> let splice a b = ...(* makes a new mylist of a @ b *)
> end
> Here, lists are immutable *publically*.

Not quite.

First, the "private" keyword should be used only in the signature, not the
structure, otherwise the implementation of the module has no special
right. Something like:

module M :
 sig type t = private *** end
=
 struct type t = *** end

Second, the client cannot create values of the "private" type.  This is
not related at all with the mutability of the fields. It can only
deconstruct values (with pattern matching). So you have to expose
constructors explicitly (splice is one of them, but you probably want a
simple Cons).


-- Alain

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Reading a large text file
  2004-05-01 19:51         ` Alain.Frisch
@ 2004-05-01 20:40           ` skaller
  2004-05-01 21:11             ` [Caml-list] Private types skaller
  2004-05-01 21:33             ` [Caml-list] Reading a large text file Alain.Frisch
  0 siblings, 2 replies; 14+ messages in thread
From: skaller @ 2004-05-01 20:40 UTC (permalink / raw)
  To: Alain.Frisch; +Cc: caml-list

On Sun, 2004-05-02 at 05:51, Alain.Frisch@ens.fr wrote:
> On 2 May 2004, skaller wrote:
> 
> > It can now do slightly better than that. It is possible to use
> > the new 'private' keyword to *guard* your mutable list.
> >
> > module MList = struct
> > type 'a mylist = private { head : 'a; mutable tail : 'a mylist option; }
> > ..
> > let splice a b = ...(* makes a new mylist of a @ b *)
> > end
> > Here, lists are immutable *publically*.
> 
> Not quite.
> 
> First, the "private" keyword should be used only in the signature, not the
> structure, otherwise the implementation of the module has no special
> right. Something like:
> 
> module M :
>  sig type t = private *** end
> =
>  struct type t = *** end

Thanks for the correction. Also note I chose a *really bad name*
when I called it 'splice': the intent was to construct
a new list as stated, so I should have just called it 'concat'.

Of course you could make a list with an actual splice mutator,
but then it wouldn't be immutable.

> Second, the client cannot create values of the "private" type.  This is
> not related at all with the mutability of the fields. It can only
> deconstruct values (with pattern matching). 

Right. Well stated.

Hmm ..

 module M : sig
  ?? type 'a t = private 'a list
    val uniq_cons: 'a t -> 'a -> 'a list
    val empty: unit -> 'a list
  end = struct
    type 'a t = 'a list
    let uniq_cons lst a = if List.mem a lst then lst else a::lst
    let empty () = []
  end
  ;;              ;;
Syntax error: 'end' expected, the highlighted 'sig' might be unmatched

Arggg.. what's wrong here? It has nothing to do with
either private or type variables, and the above sig
fails immediately in a module type statement too.

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Caml-list] Private types
  2004-05-01 20:40           ` skaller
@ 2004-05-01 21:11             ` skaller
  2004-05-01 21:33             ` [Caml-list] Reading a large text file Alain.Frisch
  1 sibling, 0 replies; 14+ messages in thread
From: skaller @ 2004-05-01 21:11 UTC (permalink / raw)
  To: caml-list

Ahh, well this works:

module type M_t = sig
type 'a t = private { x: 'a list }
val uniq_cons: 'a t -> 'a -> 'a t
val empty: unit -> 'a t
end
 
module M : M_t = struct
type 'a t = { x : 'a list }
let uniq_cons lst a = if List.mem a lst.x then lst else { x = a::lst.x }
let empty () = { x = [] }
end
 
let x = M.empty ()
let x = M.uniq_cons x 1
let x = M.uniq_cons x 1
let x = M.uniq_cons x 2
 
let print { M.x=x } = print_endline (
  "[" ^
  String.concat "," (List.map string_of_int x) ^
  "]"
)
;;                                                                            print x
;;

[skaller@pelican] /mnt/user2/work/flx>ocaml xx.ml
[2,1]

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Reading a large text file
  2004-05-01 20:40           ` skaller
  2004-05-01 21:11             ` [Caml-list] Private types skaller
@ 2004-05-01 21:33             ` Alain.Frisch
  1 sibling, 0 replies; 14+ messages in thread
From: Alain.Frisch @ 2004-05-01 21:33 UTC (permalink / raw)
  To: skaller; +Cc: caml-list

On 2 May 2004, skaller wrote:

>   ?? type 'a t = private 'a list
>
> Arggg.. what's wrong here?

As stated in the manual, "private" can only be applied to the type
*representation* (sum type, record type), not the type equation.

http://caml.inria.fr/ocaml/htmlman/manual021.html

-- Alain

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] [OcamlDoc] langage support.
  2004-05-01 16:00     ` [Caml-list] [OcamlDoc] langage support sejourne kevin
@ 2004-05-14  7:15       ` Maxence Guesdon
  0 siblings, 0 replies; 14+ messages in thread
From: Maxence Guesdon @ 2004-05-14  7:15 UTC (permalink / raw)
  To: sejourne kevin, caml-list

On Sat, 1 May 2004 18:00:24 +0200 (CEST)
sejourne kevin <sejourne_kevin@yahoo.fr> wrote:

> Hi evrybody!
> 
> Is there a way in Ocamldoc to make multilingual
> documentation ? I think of something like that :
> 
> (**
> 	[En] Reverse a list.
> 	[Fr] Retourne une liste.
> 	[Es] ...
> *)
> Let rev = List.rev ;;
> 
> And generate documentation file in evry langage use in
> the code?

No there is no such feature. You can add it yourself in a custom
generator by handling @fr, @en, @es, ..., tags.

Hope this helps,

-- 
Maxence Guesdon

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Reading a large text file
  2004-05-01 14:03 ` Brian Hurt
  2004-05-01 15:43   ` Rahul Siddharthan
@ 2004-05-17  5:28   ` Eric Stokes
  1 sibling, 0 replies; 14+ messages in thread
From: Eric Stokes @ 2004-05-17  5:28 UTC (permalink / raw)
  To: André Luiz Moura; +Cc: caml-list

If you don't care about line breaks you might try Buffer.add_channel. 
This should perform
close to O(1) for the append operations, and it will bypass the call to 
input_line, which I have
found to be an order of magnitude slower than the read system call. You 
can always break
the lines up later.

On May 1, 2004, at 7:03 AM, Brian Hurt wrote:

>
> I'm still digging through a serious backlog of email, so your question 
> may
> have already been answered.  If so, ignore this.
>
> But what I'm guessing is happening is that you are appending (adding to
> the end of) your list, and that this is what is killing you.  To add an
> element to the *end* of a list, Ocaml has to completely reallocate the
> entire list- turning what you might think is an O(1) operation into an
> O(n) operation.
>
> The solution to this is to build the list backwards- instead of adding
> things to the end of the list, add them to the begining.  Then, when
> you're done, just reverse the list using List.rev.
>
> Lets look at the example of just reading the lines from a file and 
> making
> a list of them.  This code is bad:
>
> let readfile chan =
>     let rec loop lst =
>         try
>             loop (lst @ [ (input_line chan) ])
>         with
>             | End_of_file -> lst
>     in
>     loop []
> ;;
>
> It works, but to read n lines requires O(n^2) work, because the @ has 
> to
> reallocate the entire list every iteration.  Worse yet it isn't tail
> recursive (a recursive call inside a try/catch isn't a tail call).
>
> A better implementation of this would be:
> let readfile chan =
>     let rec loop rlst =
>        let line, eof =
>            try
>                (input_line chan), false
>            with
>                | End_of_file -> "", true
>        in
>        if not eof then
>            loop (line :: rlst)
>        else
>            List.rev rlst
>    in
>    loop []
> ;;
>
> Now, the first thing to notice is that we're using the :: operator 
> (which
> is O(1)), not the @ operator (which is O(n))- we're prepending things 
> onto
> the list, not appending them.  We're building up the list backwards, 
> and
> then, when we hit the end of the file, reversing the entire list.  
> This is
> a standard pattern in Ocaml.
>
> The second thing to notice is that the recursive call has been hoisted 
> up
> out of the try/catch block.  We've introduced a new boolean flag to 
> note
> when we've hit the end of the file- catching the exception simply sets 
> the
> flag to true.  So now our function is tail recursive, and operates in
> constant stack space.
>
> Brian
>
> On Wed, 28 Apr 2004, André Luiz Moura wrote:
>
>> Hi List,
>>
>> I wrote an OCaml function to read a text file with a million of lines 
>> with  five separate columns for spaces. I based myself on previous 
>> messages of Caml-list, but however the work is taking time frightful 
>> (many minutes).
>> This function written in C, for example, does not take more than 4 
>> seconds.
>>
>> I am executing the program in a PC Pentium III of 128 MB of RAM under 
>> Windows platform. How would be implemented the function to decide 
>> this problem?
>>
>> File format:
>> <vertex #> <x> <y> [attributes] [boundary marker]
>>
>> Thanks,
>>
>> André
>>
>>
>>
>>
>>
>>
>> ---------------------------------
>> Yahoo! Messenger - Fale com seus amigos online. Instale agora!
>
> -- 
> "Usenet is like a herd of performing elephants with diarrhea -- 
> massive,
> difficult to redirect, awe-inspiring, entertaining, and a source of
> mind-boggling amounts of excrement when you least expect it."
>                                 - Gene Spafford
> Brian
>
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives: 
> http://caml.inria.fr
> Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: 
> http://caml.inria.fr/FAQ/
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2004-05-17  5:28 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-04-28 15:28 [Caml-list] Reading a large text file André Luiz Moura
2004-04-28 16:28 ` Richard Jones
2004-05-01 14:03 ` Brian Hurt
2004-05-01 15:43   ` Rahul Siddharthan
2004-05-01 16:00     ` [Caml-list] [OcamlDoc] langage support sejourne kevin
2004-05-14  7:15       ` Maxence Guesdon
2004-05-01 18:05     ` [Caml-list] Reading a large text file Richard Jones
2004-05-01 18:25       ` Charles Forsyth
2004-05-01 19:25       ` skaller
2004-05-01 19:51         ` Alain.Frisch
2004-05-01 20:40           ` skaller
2004-05-01 21:11             ` [Caml-list] Private types skaller
2004-05-01 21:33             ` [Caml-list] Reading a large text file Alain.Frisch
2004-05-17  5:28   ` Eric Stokes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).