caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Alex Baretta <alex@baretta.com>
To: brogoff <brogoff@speakeasy.net>
Cc: Ocaml <caml-list@inria.fr>
Subject: Re: [Caml-list] looping recursion
Date: Thu, 29 Jul 2004 08:28:00 +0200	[thread overview]
Message-ID: <410898F0.6030804@baretta.com> (raw)
In-Reply-To: <Pine.LNX.4.58.0407280924150.10739@shell1.speakeasy.net>

brogoff wrote:
> On Wed, 28 Jul 2004, Alex Baretta wrote:
> 
>>brogoff wrote:
>>
>>>Sometimes lists are best. If lists are OK for 100, 1_000, or 10_000 items,
>>>why not 100_000 or 1_000_000? Map and friends shouldn't blow stack.
>>>
>>>-- Brian
>>
>>Actually, it's not that bad.
> 
> 
> Well, I'm still on the caml-list, so of course it isn't *that* bad. Also, as
> I said, if you need a tail recursive map over built in lists, you have at
> least two options. Unfortunately, my preference is to use Obj, which IMO
> points to a deficiency in the core language. Most, maybe all, of my issues
> with OCaml, amount to petty complaints, no showstoppers or dealbreakers.
> Hey, at least I'm not whining about the license!

Yes, I guess we simply can live with it. I can't wait to have STL 
iterators in Ocaml, but meanwhile I can live with this functional 
nonsense: lists, maps, sets... ;)

> There are quite a few cases where singly linked lists are the most efficient
> data structure, and they're just right for the problem. Streams and laziness
> are  not at all efficient in comparison. Stack overflow commonly happens whenm
> one of my users (yup, I have users of my code who aren't me ;-) decides to
> run on data much larger than I'd anticipated.

Huge lists are usually not an efficient data structure. Lists are local 
structures: at any point you can only observe a fixed number of head 
elements (usually one at a time) and an unspecified tail. Such locality 
generally makes caching the entire data structure useless. Long lists 
(lists of a priori unbounded length) arise from user input: typically 
the are obtained by reading in a file of a priori unbounded length. In 
such cases lists are very much inefficient from the point of view of 
memory complexity: lists cost O(n) where n is the size of the input; 
streams cost O(1). Both cost O(n) in terms of time complexity. Usually 
the potential speed benefit of lists is amply counterbalanced by 
reduction in memory footprint, which usually means no memory swapping in 
the kernel.

Here's an obvious example (not too obvious: I fell for it at first). I 
have a XML syntax specifying relational databases. Essentially an entire 
relational database can be represented with it. This XML lexicon is 
meant for now only for the database initial schema definition and for 
human-readable backups. I initially wrote the backup algorithm in terms 
of transformations on a list of PXP trees which were finally serialized. 
Algebraically, this is perfect. The only problem is that this algorithm 
stores the entire contents of a database in memory before serializing 
everything. Ooops, here's a SIGKILL landing in! Now I only use PXP trees 
to represent the database schema, which I immediately begin to 
serialize. All the while I pass around continuations which are invoked 
iteratively on subnodes. If a subnode is an table-data node I declare an 
SQL cursor in the database and begin walking through the table. Result, 
where my initial memory footprint bought me a SIGKILL on a 2GB server 
now I have bounded memory usage (a few megabytes). I don't swap, so the 
the backup process actually runs an order of magnitude faster. And I 
actually get my output at the end ;)

> Also, lists are builtin, and have syntactic support in OCaml, by which I mean
> nice, easy to read syntax. So the language encourages you to use them.

They are sooo cool! I actually ask all trainees to reimplement the List 
module. But then the industrial practice is to use lists only where 
there is no input or as the result of some kind of slicing or research 
applied to bigger and more efficient data structures. I actually use a 
list iteration to schedule tasks in my real-time control kernel, but of 
course, I don't expect my collegues to write more than a few tasks at a 
time for my kernel to schedule: safety, liveness and UI.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


  parent reply	other threads:[~2004-07-29  6:27 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-07-27 23:43 briand
2004-07-28  0:27 ` John Prevost
2004-07-28  0:38   ` John Prevost
2004-07-28  1:17     ` skaller
2004-07-28  1:05   ` briand
2004-07-28  1:43     ` Brian Hurt
2004-07-28  2:49       ` briand
2004-07-28  3:12         ` Brian Hurt
2004-07-28  3:20         ` Brian Hurt
2004-07-28  5:54         ` brogoff
2004-07-28  7:22           ` Alex Baretta
2004-07-28 16:38             ` brogoff
2004-07-28 19:40               ` Jon Harrop
2004-07-28 20:18                 ` Brandon J. Van Every
2004-07-29  6:01                   ` Alex Baretta
2004-07-28 21:22                 ` brogoff
2004-07-29  9:13                   ` Daniel Andor
2004-07-29  9:25                     ` Keith Wansbrough
2004-07-29  9:41                       ` Nicolas Cannasse
2004-07-29  9:57                       ` Xavier Leroy
2004-07-29 10:44                         ` Daniel Andor
2004-07-29 12:56                           ` brogoff
2004-07-29 10:11                     ` skaller
2004-07-29 12:41                     ` brogoff
2004-07-29  6:28               ` Alex Baretta [this message]
2004-07-29 14:58                 ` brogoff
2004-07-29 16:12                   ` Brian Hurt
2004-07-29 17:49                     ` james woodyatt
2004-07-29 19:25                       ` Brian Hurt
2004-07-29 20:01                         ` brogoff
2004-07-30  4:42                           ` james woodyatt
2004-07-29 17:44                   ` james woodyatt
2004-07-29 23:12                     ` skaller
2004-07-29 22:42                   ` Alex Baretta
2004-07-30  2:38                     ` Corey O'Connor
     [not found]                     ` <200407300136.14042.jon@jdh30.plus.com>
2004-07-30 12:45                       ` Alex Baretta
2004-07-30 17:07                     ` brogoff
2004-07-30 18:25                       ` [Caml-list] kaplan-okasaki-tarjan deque (was "looping recursion") james woodyatt
2004-07-30 21:20                         ` brogoff
2004-07-31  5:37                           ` james woodyatt
2004-07-28  7:27       ` [Caml-list] looping recursion skaller
2004-07-28 14:36         ` Brian Hurt
2004-07-28 22:05           ` skaller
2004-07-28  0:37 ` skaller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=410898F0.6030804@baretta.com \
    --to=alex@baretta.com \
    --cc=brogoff@speakeasy.net \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).