caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Eric Stokes <eric.stokes@csun.edu>
To: "André Luiz Moura" <aluizmoura@yahoo.com.br>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Reading a large text file
Date: Sun, 16 May 2004 22:28:26 -0700	[thread overview]
Message-ID: <0604BD51-A7C3-11D8-B8C2-000A95A1E69A@csun.edu> (raw)
In-Reply-To: <Pine.LNX.4.44.0405010847480.9460-100000@localhost.localdomain>

If you don't care about line breaks you might try Buffer.add_channel. 
This should perform
close to O(1) for the append operations, and it will bypass the call to 
input_line, which I have
found to be an order of magnitude slower than the read system call. You 
can always break
the lines up later.

On May 1, 2004, at 7:03 AM, Brian Hurt wrote:

>
> I'm still digging through a serious backlog of email, so your question 
> may
> have already been answered.  If so, ignore this.
>
> But what I'm guessing is happening is that you are appending (adding to
> the end of) your list, and that this is what is killing you.  To add an
> element to the *end* of a list, Ocaml has to completely reallocate the
> entire list- turning what you might think is an O(1) operation into an
> O(n) operation.
>
> The solution to this is to build the list backwards- instead of adding
> things to the end of the list, add them to the begining.  Then, when
> you're done, just reverse the list using List.rev.
>
> Lets look at the example of just reading the lines from a file and 
> making
> a list of them.  This code is bad:
>
> let readfile chan =
>     let rec loop lst =
>         try
>             loop (lst @ [ (input_line chan) ])
>         with
>             | End_of_file -> lst
>     in
>     loop []
> ;;
>
> It works, but to read n lines requires O(n^2) work, because the @ has 
> to
> reallocate the entire list every iteration.  Worse yet it isn't tail
> recursive (a recursive call inside a try/catch isn't a tail call).
>
> A better implementation of this would be:
> let readfile chan =
>     let rec loop rlst =
>        let line, eof =
>            try
>                (input_line chan), false
>            with
>                | End_of_file -> "", true
>        in
>        if not eof then
>            loop (line :: rlst)
>        else
>            List.rev rlst
>    in
>    loop []
> ;;
>
> Now, the first thing to notice is that we're using the :: operator 
> (which
> is O(1)), not the @ operator (which is O(n))- we're prepending things 
> onto
> the list, not appending them.  We're building up the list backwards, 
> and
> then, when we hit the end of the file, reversing the entire list.  
> This is
> a standard pattern in Ocaml.
>
> The second thing to notice is that the recursive call has been hoisted 
> up
> out of the try/catch block.  We've introduced a new boolean flag to 
> note
> when we've hit the end of the file- catching the exception simply sets 
> the
> flag to true.  So now our function is tail recursive, and operates in
> constant stack space.
>
> Brian
>
> On Wed, 28 Apr 2004, André Luiz Moura wrote:
>
>> Hi List,
>>
>> I wrote an OCaml function to read a text file with a million of lines 
>> with  five separate columns for spaces. I based myself on previous 
>> messages of Caml-list, but however the work is taking time frightful 
>> (many minutes).
>> This function written in C, for example, does not take more than 4 
>> seconds.
>>
>> I am executing the program in a PC Pentium III of 128 MB of RAM under 
>> Windows platform. How would be implemented the function to decide 
>> this problem?
>>
>> File format:
>> <vertex #> <x> <y> [attributes] [boundary marker]
>>
>> Thanks,
>>
>> André
>>
>>
>>
>>
>>
>>
>> ---------------------------------
>> Yahoo! Messenger - Fale com seus amigos online. Instale agora!
>
> -- 
> "Usenet is like a herd of performing elephants with diarrhea -- 
> massive,
> difficult to redirect, awe-inspiring, entertaining, and a source of
> mind-boggling amounts of excrement when you least expect it."
>                                 - Gene Spafford
> Brian
>
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives: 
> http://caml.inria.fr
> Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: 
> http://caml.inria.fr/FAQ/
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


      parent reply	other threads:[~2004-05-17  5:28 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-04-28 15:28 André Luiz Moura
2004-04-28 16:28 ` Richard Jones
2004-05-01 14:03 ` Brian Hurt
2004-05-01 15:43   ` Rahul Siddharthan
2004-05-01 16:00     ` [Caml-list] [OcamlDoc] langage support sejourne kevin
2004-05-14  7:15       ` Maxence Guesdon
2004-05-01 18:05     ` [Caml-list] Reading a large text file Richard Jones
2004-05-01 18:25       ` Charles Forsyth
2004-05-01 19:25       ` skaller
2004-05-01 19:51         ` Alain.Frisch
2004-05-01 20:40           ` skaller
2004-05-01 21:11             ` [Caml-list] Private types skaller
2004-05-01 21:33             ` [Caml-list] Reading a large text file Alain.Frisch
2004-05-17  5:28   ` Eric Stokes [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0604BD51-A7C3-11D8-B8C2-000A95A1E69A@csun.edu \
    --to=eric.stokes@csun.edu \
    --cc=aluizmoura@yahoo.com.br \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).