caml-list - the Caml user's mailing list
From: Yang Shouxun <yangsx@fltrp.com>
To: caml-list@inria.fr
Subject: Re: [Caml-list] stack overflow
Date: Wed, 9 Apr 2003 17:23:30 +0800
Message-ID: <200304091723.30890.yangsx@fltrp.com>
In-Reply-To: <20030409081451.GA18772@mail4.ai.univie.ac.at>

On Wednesday 09 April 2003 16:14, Markus Mottl wrote:
> On Wed, 09 Apr 2003, Yang Shouxun wrote:
> > Yes, the decision tree building function is not tail recursive. I heard
> > people saying C4.5 (in C) also has a stack overflow problem when the
> > training dataset becomes very large.
>
> I can't imagine that this is the problem: either the data is
> well-distributed, in which case the stack size will grow roughly
> logarithmically with the size of the data due to partitioning. And if not,
> the maximum depth of the tree is limited by the number of available input
> variables anyway. You'd need many, many thousands of those before this
> becomes a problem, which even large, industrial datasets that I know do
> not exceed.

My training data contain statistical values for word combinations (or 
collocations) extracted from a corpus. The number is indeed very large.

> > I don't know how to write a tail recursive version to build trees.
> > If there are not that many continuous attributes and the dataset is
> > not so large, the tree stops growing before stack overflow.
>
> The trick is to use continuation passing style (CPS): you pass a function
> closure (continuation) containing everything that's needed in subsequent
> computations.  Instead of returning a result, the sub-function calls the
> continuation with the result, which makes the functions tail-recursive.

I learned this style in Scheme, yet I feel paralyzed when trying to use it
to build trees. The type declaration may make my point clearer.
--8<--
type  dtree = Dnode of dnode | Dtree of (dnode * int * dtree list)
--8<--
The problem is that a node is not complete until the recursive calls that
build its subtrees return, and a single node may make several such calls.
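
For illustration, here is a minimal sketch of how CPS can cope with several
recursive calls per node: the children are built one at a time, and the
continuation of each call carries the children built so far. The dnode
definition, the split function and the "leaf"/"split" labels are hypothetical
stand-ins, since the real ones are not shown in this thread.

```ocaml
(* Hypothetical stand-ins: the real dnode type and splitting criterion
   are not shown in this thread. *)
type dnode = string
type dtree = Dnode of dnode | Dtree of (dnode * int * dtree list)

(* Toy split: peel one element off an int list, which yields a
   maximally unbalanced (very deep) tree. *)
let split = function
  | [] | [ _ ] -> None
  | x :: rest -> Some (x, [ [ x ]; rest ])

(* build data k: build the tree for data, then pass it to k.
   Every call is a tail call; pending work lives in heap closures. *)
let rec build data k =
  match split data with
  | None -> k (Dnode "leaf")
  | Some (attr, parts) ->
      build_children parts [] (fun kids -> k (Dtree ("split", attr, kids)))

(* Build the subtrees one at a time, left to right, accumulating them. *)
and build_children parts acc k =
  match parts with
  | [] -> k (List.rev acc)
  | p :: rest -> build p (fun t -> build_children rest (t :: acc) k)

(* Depth computed with an explicit work list, so checking the result
   does not itself recurse deeply. *)
let depth tree =
  let rec go best = function
    | [] -> best
    | (Dnode _, d) :: rest -> go (max best d) rest
    | (Dtree (_, _, kids), d) :: rest ->
        go (max best d)
          (List.fold_left (fun todo c -> (c, d + 1) :: todo) rest kids)
  in
  go 0 [ (tree, 1) ]

let () =
  let data = List.init 100_000 (fun i -> i) in
  let tree = build data (fun t -> t) in
  Printf.printf "depth = %d\n" (depth tree)
```

The cost is that the pending work moves from the stack to the heap, so memory
use still grows with the depth of the tree; only the stack limit is avoided.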

> But anyway, I think there must be some fishy operation going on. Why not
> use the debugger to find out? Or even better: send a link to the code :-)

I suppose the program is not buggy insofar as it works as expected. The flaw
is in the design itself: it must recurse (as far as I can implement it), yet
it cannot afford to recurse too deeply. Sorry, I don't have a homepage.

> > Can one know the maximal number of calls before it overflows the stack?
>
> It depends: byte-code uses its own stack, which you can query using the
> Gc-module. Otherwise, for native-code, call the ulimit-program (Unix),
> which displays resource limits including stack usage or interface to
> the system call "getrlimit".

I'm running Debian unstable. I checked just now on my laptop and "ulimit -s"
returned "unlimited". I suppose the desktop that actually ran the program was
similarly configured.
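
As a rough sketch of the Gc-module query mentioned above, the stack limit
known to the OCaml runtime can be read from the record returned by Gc.get.
The helper name stack_limit_words is made up for this example, and which
stack the limit governs depends on the OCaml version, so treat the number as
indicative only.

```ocaml
(* Stack limit known to the OCaml runtime, in words.  Which stack this
   governs (bytecode only, or native code too) depends on the OCaml
   version, so check the Gc documentation for yours. *)
let stack_limit_words () = (Gc.get ()).Gc.stack_limit

let () =
  let words = stack_limit_words () in
  let bytes_per_word = Sys.word_size / 8 in
  Printf.printf "stack_limit: %d words (about %d KiB)\n" words
    (words * bytes_per_word / 1024)
```

The limit can be changed at run time with Gc.set, passing a copy of the
record with a different stack_limit field.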

> In any case, it would be interesting to see your code. Are you going to
> release it under some free license?

Yes, I'm going to release it under GPL. As you can see, I basically use free 
software and am willing to pay it back. I intend to register a project for it 
on savannah soon. Be warned that my code may look rather ugly.

I also downloaded your AIFAD and had a cursory look at it. I found it does not
handle continuous attributes yet and your design goal is quite different from
mine. So I wrote mine from scratch and called it DTLR (Decision Tree Learner
for Retrieval).

If you are interested, I can send a copy to you tomorrow. It does not yet
implement all the features I planned and has no documentation apart from some
comments, but it is enough for my own needs right now.


