From: Yang Shouxun <yangsx@fltrp.com>
To: caml-list@inria.fr
Subject: Re: [Caml-list] stack overflow
Date: Wed, 9 Apr 2003 17:23:30 +0800
Message-ID: <200304091723.30890.yangsx@fltrp.com>
In-Reply-To: <20030409081451.GA18772@mail4.ai.univie.ac.at>
On Wednesday 09 April 2003 16:14, Markus Mottl wrote:
> On Wed, 09 Apr 2003, Yang Shouxun wrote:
> > Yes, the decision tree building function is not tail recursive. I heard
> > people saying C4.5 (in C) also has a stack overflow problem when the
> > training dataset becomes very large.
>
> I can't imagine that this is the problem: either the data is
> well-distributed, in which case the stack size will grow roughly
> logarithmically with the size of the data due to partitioning. And if not,
> the maximum depth of the tree is limited by the number of available input
> variables anyway. You'd need many, many thousands of those before this
> becomes a problem, which even large, industrial datasets that I know do
> not exceed.
My training data contain statistical values for word combinations (or
collocations) extracted from a corpus. The number is indeed very large.
> > I don't know how to write a tail recursive version to build trees.
> > If there are not that many continuous attributes and the dataset is
> > not so large, the tree stops growing before stack overflow.
>
> The trick is to use continuation passing style (CPS): you pass a function
> closure (continuation) containing everything that's needed in subsequent
> computations. Instead of returning a result, the sub-function calls the
> continuation with the result, which makes the functions tail-recursive.
I learned this style in Scheme, yet I feel paralyzed when I try to use it
to build trees. The type declaration may make my point clearer.
--8<--
type dtree = Dnode of dnode | Dtree of (dnode * int * dtree list)
--8<--
The problem is that a node is not complete until the recursive calls that
build its subtrees have returned, and a single node may make several such calls.
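A minimal sketch of how CPS can cope with a node that has several children. It assumes a simplified tree type and two made-up splitting rules, not the actual DTLR code: `tree`, `build`, `build_all`, `depth`, `halve` and `chain` are all hypothetical names. The idea is to build the children one at a time and let the continuation remember the remaining siblings and the parent.

```ocaml
(* Simplified, hypothetical stand-in for the dtree type above. *)
type tree = Leaf of int | Node of int * tree list

(* [build split n k] builds the tree for problem [n] and passes it to the
   continuation [k] instead of returning it.  [split] is a hypothetical
   partitioning step: it returns the subproblems of [n], or [] for a leaf. *)
let rec build split n k =
  match split n with
  | [] -> k (Leaf n)
  | parts -> build_all split parts [] (fun kids -> k (Node (n, kids)))

(* Build the subtrees one at a time.  The "rest of the work" (remaining
   siblings, then the parent) lives in heap-allocated closures. *)
and build_all split parts acc k =
  match parts with
  | [] -> k (List.rev acc)
  | p :: rest ->
      build split p (fun t -> build_all split rest (t :: acc) k)

(* Depth measured with an explicit work list, so that inspecting a very
   deep tree does not itself recurse deeply. *)
let depth t =
  let rec go best = function
    | [] -> best
    | (Leaf _, d) :: rest -> go (max best d) rest
    | (Node (_, kids), d) :: rest ->
        go (max best d)
          (List.fold_left (fun acc c -> (c, d + 1) :: acc) rest kids)
  in
  go 0 [ (t, 1) ]

(* Two made-up splitting rules: a balanced one and a degenerate chain. *)
let halve n = if n <= 1 then [] else [ n / 2; n - (n / 2) ]
let chain n = if n <= 0 then [] else [ n - 1 ]
```

Calling `build chain 1_000_000 (fun t -> t)` should yield a tree a million levels deep, because every call in `build` and `build_all` is a tail call. The pending work is not eliminated; it moves from the stack to closures on the heap.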
> But anyway, I think there must be some fishy operation going on. Why not
> use the debugger to find out? Or even better: send a link to the code :-)
I suppose the program is not buggy in so far as it works as expected. The flaw
is in the design itself: it must recurse (at least as far as I can implement
it), and it cannot afford to recurse too deeply. Sorry, I don't have a homepage.
> > Can one know the maximal number of calls before it overflows the stack?
>
> It depends: byte-code uses its own stack, which you can query using the
> Gc-module. Otherwise, for native-code, call the ulimit-program (Unix),
> which displays resource limits including stack usage or interface to
> the system call "getrlimit".
I'm running Debian unstable. I checked just now on my laptop and "ulimit -s"
returned "unlimited". I suppose the desktop that actually ran the program was
similarly configured.
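For the byte-code side of the answer quoted above, a small sketch of querying the stack limit through the standard `Gc` module. The names `stack_limit_words` and `stack_limit_bytes` are made up for illustration. The value is reported in words, and whether it means anything for native code depends on the OCaml version, so there it should be treated as informative only.

```ocaml
(* Gc.get returns the current GC parameters; stack_limit is given in words. *)
let stack_limit_words = (Gc.get ()).Gc.stack_limit

(* Convert words to bytes using the platform word size (in bits). *)
let stack_limit_bytes = stack_limit_words * (Sys.word_size / 8)

let () =
  Printf.printf "stack limit: %d words (%d bytes)\n"
    stack_limit_words stack_limit_bytes
```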
> In any case, it would be interesting to see your code. Are you going to
> release it under some free license?
Yes, I'm going to release it under GPL. As you can see, I basically use free
software and am willing to pay it back. I intend to register a project for it
on savannah soon. Be warned that my code may look rather ugly.
I also downloaded your AIFAD and had a cursory look at it. I found that it does
not handle continuous attributes yet and that your design goal is quite
different from mine. So I wrote mine from scratch and called it DTLR (Decision
Tree Learner for Retrieval).
If you are interested, I can send you a copy tomorrow. It does not implement
all the features I planned and has no documentation apart from some comments,
but it is enough for my own needs right now.