caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] stack overflow
@ 2003-04-09  2:10 Yang Shouxun
  2003-04-09  2:19 ` brogoff
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Yang Shouxun @ 2003-04-09  2:10 UTC (permalink / raw)
  To: caml-list

Dear OCaml users,

I've written a modified version of C4.5 program in OCaml. However, when the 
input is big, say over 50000, the program (native code on Debian) died for 
stack overflow. Otherwise, it runs as expected.

Can anybody explain possible reasons causing stack overflow in OCaml?

Thanks for your time!
shouxun

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] stack overflow
  2003-04-09  2:10 [Caml-list] stack overflow Yang Shouxun
@ 2003-04-09  2:19 ` brogoff
  2003-04-09  2:45   ` Yang Shouxun
  2003-04-09  2:43 ` [Caml-list] stack overflow David Brown
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: brogoff @ 2003-04-09  2:19 UTC (permalink / raw)
  To: Yang Shouxun; +Cc: caml-list

The most likely explanation is that you created a very large list, say of 
length over 50_000, and tried to apply some non-tail-recursive operation to it, 
perhaps even implicitly. There was a very recent thread on this topic. 

The second explanation is that you wrote some (non-tail) recursive function 
and it blew the stack. 

If you switch to the bytecode compiler and run with stack trace, you'll find out 
pretty quickly. 

-- Brian


On Wed, 9 Apr 2003, Yang Shouxun wrote:

> Dear OCaml users,
> 
> I've written a modified version of C4.5 program in OCaml. However, when the 
> input is big, say over 50000, the program (native code on Debian) died for 
> stack overflow. Otherwise, it runs as expected.
> 
> Can anybody explain possible reasons causing stack overflow in OCaml?
> 
> Thanks for your time!
> shouxun
> 
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
> Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> 

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] stack overflow
  2003-04-09  2:10 [Caml-list] stack overflow Yang Shouxun
  2003-04-09  2:19 ` brogoff
@ 2003-04-09  2:43 ` David Brown
       [not found] ` <200304091034.45256.yangsx@fltrp.com>
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: David Brown @ 2003-04-09  2:43 UTC (permalink / raw)
  To: Yang Shouxun; +Cc: caml-list

On Wed, Apr 09, 2003 at 10:10:37AM +0800, Yang Shouxun wrote:

> I've written a modified version of C4.5 program in OCaml. However, when the 
> input is big, say over 50000, the program (native code on Debian) died for 
> stack overflow. Otherwise, it runs as expected.
> 
> Can anybody explain possible reasons causing stack overflow in OCaml?

Where do you catch the End_of_file exception.  A common mistake is to
add a try clause inside of a tail recursive call.  The call is no longer
tail-recursive, and makes a chain of exception handlers for each
invocation.  e.g.

   let rec loop () =
     try let line = input_line () in
     ...;
     loop ()
     with End_of_file -> ...

instead of

   let rec loop () =
     let line = input_line () in
     ...;
     loop ()
   in
   try loop ()
   with End_of_file -> ...

Dave

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] stack overflow
  2003-04-09  2:19 ` brogoff
@ 2003-04-09  2:45   ` Yang Shouxun
  2003-04-09  8:14     ` Markus Mottl
  0 siblings, 1 reply; 17+ messages in thread
From: Yang Shouxun @ 2003-04-09  2:45 UTC (permalink / raw)
  To: caml-list

On Wednesday 09 April 2003 10:19, brogoff@speakeasy.net wrote:
> The most likely explanation is that you created a very large list, say of
> length over 50_000, and tried to apply some non-tail-recursive operation to
> it, perhaps even implicitly. There was a very recent thread on this topic.
>
> The second explanation is that you wrote some (non-tail) recursive function
> and it blew the stack.

Yes, the decision tree building function is not tail recursive. I heared 
people saying C4.5 (in C) also has stack overflow problem when the training 
dataset becomes very large.

I don't know how to write a tail recursive version to build trees.  If there 
are not that many continuous attributes and the dataset is not so large, the 
tree stops growing before stack overflow.

Can one know the maximal number of calls before it overflow the stack?

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] stack overflow
       [not found]   ` <16019.34434.468479.586884@barrow.artisan.com>
@ 2003-04-09  2:53     ` Yang Shouxun
  0 siblings, 0 replies; 17+ messages in thread
From: Yang Shouxun @ 2003-04-09  2:53 UTC (permalink / raw)
  To: caml-list

On Wednesday 09 April 2003 10:33, John Gerard Malecki wrote:
> Yang Shouxun wrote (2003-04-09T10:34:45+0800):
>  > On Wednesday 09 April 2003 10:17, John Gerard Malecki wrote:
>  > > What is C4.5?
>  >
>  > It's a decision tree learner.
>
> Interesting.  Do you have a web site where I can learn more?

You can download the source code for C4.5 from 
http://www.cse.unsw.edu.au/~quinlan/ and some papers about it. Markus wrote 
AIFAD (http://www.oefai.at/~markus/aifad).

> I notice that Brian Rogoff just published some info which is very
> handy.  Use the byte-code compiler and run with the environment
> variable OCAMLRUNPARAM='b=1' and you will get a stack trace.  It could
> be that the problem is not your code but the library routines which
> you are using.
>
> Good luck.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] stack overflow
  2003-04-09  2:10 [Caml-list] stack overflow Yang Shouxun
                   ` (2 preceding siblings ...)
       [not found] ` <200304091034.45256.yangsx@fltrp.com>
@ 2003-04-09  6:45 ` David Monniaux
  2003-04-13 15:42 ` John Max Skaller
  4 siblings, 0 replies; 17+ messages in thread
From: David Monniaux @ 2003-04-09  6:45 UTC (permalink / raw)
  To: Yang Shouxun; +Cc: caml-list

On Wed, 9 Apr 2003, Yang Shouxun wrote:

> I've written a modified version of C4.5 program in OCaml. However, when the 
> input is big, say over 50000, the program (native code on Debian) died for 
> stack overflow. Otherwise, it runs as expected.

In the case of native code, OCaml uses the ordinary system-provided stack.
Please make sure the system stack size limit is not too low (ulimit -a on
certain shells to see it, ulimit -s to set it, in kilobytes).

David Monniaux            http://www.di.ens.fr/~monniaux
Laboratoire d'informatique de l'École Normale Supérieure,
Paris, France

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] stack overflow
  2003-04-09  2:45   ` Yang Shouxun
@ 2003-04-09  8:14     ` Markus Mottl
  2003-04-09  9:23       ` Yang Shouxun
  0 siblings, 1 reply; 17+ messages in thread
From: Markus Mottl @ 2003-04-09  8:14 UTC (permalink / raw)
  To: Yang Shouxun; +Cc: caml-list

On Wed, 09 Apr 2003, Yang Shouxun wrote:
> Yes, the decision tree building function is not tail recursive. I heared 
> people saying C4.5 (in C) also has stack overflow problem when the training 
> dataset becomes very large.

I can't imagine that this is the problem: either the data is
well-distributed, in which case the stack size will grow roughly
logarithmically with the size of the data due to partitioning. And if not,
the maximum depth of the tree is limited by the number of available input
variables anyway. You'd need many, many thousands of those before this
becomes a problem, which even large, industrial datasets that I know do
not exceed.

> I don't know how to write a tail recursive version to build trees.
> If there are not that many continuous attributes and the dataset is
> not so large, the tree stops growing before stack overflow.

The trick is to use continuation passing style (CPS): you pass a function
closure (continuation) containing everything that's needed in subsequent
computations.  Instead of returning a result, the sub-function calls the
continuation with the result, which makes the functions tail-recursive.

But anyway, I think there must be some fishy operation going on. Why not
use the debugger to find out? Or even better: send a link to the code :-)

> Can one know the maximal number of calls before it overflow the stack?

It depends: byte-code uses its own stack, which you can query using the
Gc-module. Otherwise, for native-code, call the ulimit-program (Unix),
which displays resource limits including stack usage or interface to
the system call "getrlimit".

In any case, it would be interesting to see your code. Are you going to
release it under some free license?

Regards,
Markus Mottl

-- 
Markus Mottl          http://www.oefai.at/~markus          markus@oefai.at

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] stack overflow
  2003-04-09  8:14     ` Markus Mottl
@ 2003-04-09  9:23       ` Yang Shouxun
  2003-04-09 11:34         ` Markus Mottl
  2003-04-09 14:14         ` CPS folds " Neel Krishnaswami
  0 siblings, 2 replies; 17+ messages in thread
From: Yang Shouxun @ 2003-04-09  9:23 UTC (permalink / raw)
  To: caml-list

On Wednesday 09 April 2003 16:14, Markus Mottl wrote:
> On Wed, 09 Apr 2003, Yang Shouxun wrote:
> > Yes, the decision tree building function is not tail recursive. I heared
> > people saying C4.5 (in C) also has stack overflow problem when the
> > training dataset becomes very large.
>
> I can't imagine that this is the problem: either the data is
> well-distributed, in which case the stack size will grow roughly
> logarithmically with the size of the data due to partitioning. And if not,
> the maximum depth of the tree is limited by the number of available input
> variables anyway. You'd need many, many thousands of those before this
> becomes a problem, which even large, industrial datasets that I know do
> not exceed.

My training data contain statistical values for word combinations (or 
collocations) extracted from a corpus. The number is indeed very large.

> > I don't know how to write a tail recursive version to build trees.
> > If there are not that many continuous attributes and the dataset is
> > not so large, the tree stops growing before stack overflow.
>
> The trick is to use continuation passing style (CPS): you pass a function
> closure (continuation) containing everything that's needed in subsequent
> computations.  Instead of returning a result, the sub-function calls the
> continuation with the result, which makes the functions tail-recursive.

I've learned this style in Scheme. Yet I feel paralyzed when trying to write 
in it to build trees. The type declaration may make my point clearer.
--8<--
type  dtree = Dnode of dnode | Dtree of (dnode * int * dtree list)
--8<--
The problems are that unless the next call returns, the tree is not complete 
yet and it may have several calls on itself.

> But anyway, I think there must be some fishy operation going on. Why not
> use the debugger to find out? Or even better: send a link to the code :-)

I suppose the program is not buggy so far as it works as expected.  It's buggy 
in the design itself: it must recurse (so far as I can implement) and it 
cannot afford recurse too deeply. Sorry, I don't have a homepage.

> > Can one know the maximal number of calls before it overflow the stack?
>
> It depends: byte-code uses its own stack, which you can query using the
> Gc-module. Otherwise, for native-code, call the ulimit-program (Unix),
> which displays resource limits including stack usage or interface to
> the system call "getrlimit".

I'm running Debian unstale. I checked just now on my laptop and "ulimit -s" 
reurned "unlimited". I suppose the desktop that actually ran the program was 
similarly configured.

> In any case, it would be interesting to see your code. Are you going to
> release it under some free license?

Yes, I'm going to release it under GPL. As you can see, I basically use free 
software and am willing to pay it back. I intend to register a project for it 
on savannah soon. Be warned that my code may look rather ugly.

I also downloaded your AIFAD and had a cursive look at it. I found it does not 
handle continuous attributes yet and your design goal is quite different from 
mine. So I wrote mine from scratch and called it DTLR (Decision Tree Learner 
for Retrieval).

If you are interested, I can send a copy to you tomorrow. It does not 
implement all the features I planned, without documentation except some 
comments, but it is enough for my own needs right now.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] stack overflow
  2003-04-09  9:23       ` Yang Shouxun
@ 2003-04-09 11:34         ` Markus Mottl
  2003-04-10  4:12           ` Parallel CPS? (was Re: [Caml-list] stack overflow) Yang Shouxun
  2003-04-09 14:14         ` CPS folds " Neel Krishnaswami
  1 sibling, 1 reply; 17+ messages in thread
From: Markus Mottl @ 2003-04-09 11:34 UTC (permalink / raw)
  To: Yang Shouxun; +Cc: caml-list

On Wed, 09 Apr 2003, Yang Shouxun wrote:
> My training data contain statistical values for word combinations (or
> collocations) extracted from a corpus. The number is indeed very large.

Funny, I am currently also applying my tool to NLP (natural language
processing): because of the isomorphism between context-free grammars and
algebraic datatyes, it is possible to learn propositions about derivation
trees (or even more general: learn non-recursive functions). The problem
there is rather the size of CFG extracted from a large, annotated
corpus for German (many, many thousands of productions), which really
looks messy.

> I've learned this style in Scheme. Yet I feel paralyzed when trying to write 
> in it to build trees. The type declaration may make my point clearer.
> --8<--
> type  dtree = Dnode of dnode | Dtree of (dnode * int * dtree list)
> --8<--
> The problems are that unless the next call returns, the tree is not complete 
> yet and it may have several calls on itself.

But that's what the closure is for: it abstracts away the subtree that
still needs to be computed.

> I'm running Debian unstale. I checked just now on my laptop and "ulimit -s" 
> reurned "unlimited". I suppose the desktop that actually ran the program was 
> similarly configured.

Given that you already run into problems for comparatively small sizes,
I suppose that you are using the byte-code interpreter? Its builtin
stack space is 256KB, i.e. 64K-words.

> I also downloaded your AIFAD and had a cursive look at it. I found it
> does not handle continuous attributes yet and your design goal is quite
> different from mine. So I wrote mine from scratch and called it DTLR
> (Decision Tree Learner for Retrieval).

Yes, I haven't yet implemented handling of continuous attributes, because
I am aiming at an even more general system, where you can specify abstract
algebras (signatures) that describe how to handle values of some abstract
types, i.e. not only continuous (numeric) values. I have already done
so separately in another project, but wasn't very satisfied with the
design. Furthermore, I'd like to integrate it into AIFAD.

> If you are interested, I can send a copy to you tomorrow. It does not 
> implement all the features I planned, without documentation except some 
> comments, but it is enough for my own needs right now.

That would be great! - Thanks!

Regards,
Markus

-- 
Markus Mottl          http://www.oefai.at/~markus          markus@oefai.at

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* CPS folds (was Re: [Caml-list] stack overflow)
  2003-04-09  9:23       ` Yang Shouxun
  2003-04-09 11:34         ` Markus Mottl
@ 2003-04-09 14:14         ` Neel Krishnaswami
  2003-04-09 16:54           ` brogoff
  1 sibling, 1 reply; 17+ messages in thread
From: Neel Krishnaswami @ 2003-04-09 14:14 UTC (permalink / raw)
  To: caml-list

Yang Shouxun writes:
> On Wednesday 09 April 2003 16:14, Markus Mottl wrote:
> >
> > The trick is to use continuation passing style (CPS): you pass a
> > function closure (continuation) containing everything that's
> > needed in subsequent computations.  Instead of returning a result,
> > the sub-function calls the continuation with the result, which
> > makes the functions tail-recursive.
> 
> I've learned this style in Scheme. Yet I feel paralyzed when trying
> to write in it to build trees. The problems are that unless the next
> call returns, the tree is not complete yet and it may have several
> calls on itself.

You are 90% of the way to figuring out how to use CPS in this
situation, actually! The trick is to use continuation functions to
represent the computation yet to be done. I'll illustrate with
an example -- let's take a tree type:

type 'a tree =
  | Leaf
  | Node of 'a tree * 'a * 'a tree

Now, let's write a regular, recursive fold for this datatype:

let rec fold ~leaf ~node tree =
  match tree with
  | Leaf -> leaf
  | Node(left, x, right) ->
      let v1 = fold ~leaf ~node left in
      let v2 = fold ~leaf ~node right in
      node v1 x v2

val fold : leaf:'a -> node:('a -> 'b -> 'a -> 'a) -> 'b tree -> 'a

As you expect, this particular fold exhibits non-constant
stack growth, because there are function calls that aren't in
tail position:

  | Node(left, x, right) ->
      let v1 = fold ~leaf ~node left in   (* Not a tail-call *)
      let v2 = fold ~leaf ~node right in  (* Not a tail-call *)
      node v1 x v2

What we want to do is to make all of the recursive calls to fold
tail-recursive, so that they don't grow the stack. Let's be
stupid simple for a second, and throw away all of things keeping that
from happening:

let rec tr_fold ~leaf ~node tree =
  match tree with
  | Leaf -> leaf
  | Node(left, x, right) ->
      tr_fold ~leaf ~node left 

Now tr_fold is tail-recursive, at the cost of forgetting how to
call itself on the right subtree and applying the node function.
Let's fix that by adding a new parameter to tr_fold that will
keep track of that -- and the value that tracks "stuff to execute"
is a function:

let rec tr_fold' ~leaf ~node tree k =
  match tree with
  | Leaf -> k leaf  (* Call leaf to "stuff to execute" *)
  | Node(left, x, right) ->
      tr_fold' ~leaf ~node left  (* Now a tail-call! *)
	(fun v1 ->
	  let v2 = tr_fold' ~leaf ~node right k in (* Not a tail-call *)
	  node v1 x v2)

tr_fold' :
  leaf:'a -> node:('a -> 'b -> 'c -> 'c) -> 'b tree -> ('a -> 'c) -> 'c

Now we just need to repeat the exercise on the second let body,
and we get:

let rec kfold ~leaf ~node tree k =
  match tree with
  | Leaf -> k leaf
  | Node(left, x, right) ->
      kfold ~leaf ~node left 
	(fun v1 ->
	  kfold ~leaf ~node right
	    (fun v2 -> k (node v1 x v2)))

val kfold :
  leaf:'a -> node:('a -> 'b -> 'a -> 'a) -> 'b tree -> ('a -> 'c) -> 'c

At this point, kfold has no stack growth in it, because every
self-call it makes is a tail call. All of the control context is
stored in the closures -- the continuations -- we have build during
its execution. You can see the close correspondence between the
original and the CPS version by playing a few games with formatting:

  | Node(left, x, right) ->
      let v1 = fold ~leaf ~node left in   
      let v2 = fold ~leaf ~node right in  
      node v1 x v2

vs:

  | Node(left, x, right) ->
      kfold ~leaf ~node left  (fun v1 ->
      kfold ~leaf ~node right (fun v2 -> 
        k (node v1 x v2)))

You can see that what we do in each case is "compute the left/right
subtree and then bind the result to v1/v2". Finally, you can get back
the original fold by passing in the identity function as the "starter"
continuation:

let fold ~leaf ~node tree = kfold ~leaf ~node tree (fun x -> x)

val fold : leaf:'a -> node:('a -> 'b -> 'a -> 'a) -> 'b tree -> 'a


(As an aside: Yes, I didn't completely CPS-convert the program -- the
node function is not in CPS-form in kfold.)

-- 
Neel Krishnaswami
neelk@alum.mit.edu

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CPS folds (was Re: [Caml-list] stack overflow)
  2003-04-09 14:14         ` CPS folds " Neel Krishnaswami
@ 2003-04-09 16:54           ` brogoff
  2003-04-09 17:23             ` Mike Lin
  0 siblings, 1 reply; 17+ messages in thread
From: brogoff @ 2003-04-09 16:54 UTC (permalink / raw)
  To: Neel Krishnaswami; +Cc: caml-list

On Wed, 9 Apr 2003, Neel Krishnaswami wrote:
> Yang Shouxun writes:
> > On Wednesday 09 April 2003 16:14, Markus Mottl wrote:
> > >
> > > The trick is to use continuation passing style (CPS): you pass a
> > > function closure (continuation) containing everything that's
> > > needed in subsequent computations.  Instead of returning a result,
> > > the sub-function calls the continuation with the result, which
> > > makes the functions tail-recursive.
> > 
> > I've learned this style in Scheme. Yet I feel paralyzed when trying
> > to write in it to build trees. The problems are that unless the next
> > call returns, the tree is not complete yet and it may have several
> > calls on itself.

Before going any further, see if you can recompile the program in bytecode, and 
detect the error location using stack tracing. 

CPS is not that helpful in cases where you can't find some clever 
defunctionalization of the CPS function into accumulator passing style or 
something else. All those closures that get allocated so that you don't 
hit the stack have a cost, too. Measure before using. 

That said, all of this CPS stuff is fascinating, and well worth learning for any 
computing professional with an interest in functional programming. If you know 
Scheme, buy yourself a copy of "Essentials of Programming Languages". There are 
some really great chapters on CPS in that book. 

-- Brian


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CPS folds (was Re: [Caml-list] stack overflow)
  2003-04-09 16:54           ` brogoff
@ 2003-04-09 17:23             ` Mike Lin
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Lin @ 2003-04-09 17:23 UTC (permalink / raw)
  To: Neel Krishnaswami; +Cc: caml-list

If y'all want to see a real example, Yaxpo is a whole XML parser 
structured in CPS. One of the things it has to do is, of course, to 
build up a DOM tree.

http://mikelin.mit.edu/yaxpo

-Mike

On Wednesday, April 9, 2003, at 12:54  PM, brogoff@speakeasy.net wrote:

> On Wed, 9 Apr 2003, Neel Krishnaswami wrote:
>> Yang Shouxun writes:
>>> On Wednesday 09 April 2003 16:14, Markus Mottl wrote:
>>>>
>>>> The trick is to use continuation passing style (CPS): you pass a
>>>> function closure (continuation) containing everything that's
>>>> needed in subsequent computations.  Instead of returning a result,
>>>> the sub-function calls the continuation with the result, which
>>>> makes the functions tail-recursive.
>>>
>>> I've learned this style in Scheme. Yet I feel paralyzed when trying
>>> to write in it to build trees. The problems are that unless the next
>>> call returns, the tree is not complete yet and it may have several
>>> calls on itself.
>
> Before going any further, see if you can recompile the program in 
> bytecode, and
> detect the error location using stack tracing.
>
> CPS is not that helpful in cases where you can't find some clever
> defunctionalization of the CPS function into accumulator passing style 
> or
> something else. All those closures that get allocated so that you don't
> hit the stack have a cost, too. Measure before using.
>
> That said, all of this CPS stuff is fascinating, and well worth 
> learning for any
> computing professional with an interest in functional programming. If 
> you know
> Scheme, buy yourself a copy of "Essentials of Programming Languages". 
> There are
> some really great chapters on CPS in that book.
>
> -- Brian
>
>
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives: 
> http://caml.inria.fr
> Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: 
> http://caml.inria.fr/FAQ/
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Parallel CPS? (was Re: [Caml-list] stack overflow)
  2003-04-09 11:34         ` Markus Mottl
@ 2003-04-10  4:12           ` Yang Shouxun
  2003-04-10  4:58             ` Mike Lin
  0 siblings, 1 reply; 17+ messages in thread
From: Yang Shouxun @ 2003-04-10  4:12 UTC (permalink / raw)
  To: caml-list

On Wednesday 09 April 2003 19:34, Markus Mottl wrote:
> Funny, I am currently also applying my tool to NLP (natural language
> processing): because of the isomorphism between context-free grammars and
> algebraic datatyes, it is possible to learn propositions about derivation
> trees (or even more general: learn non-recursive functions). The problem
> there is rather the size of CFG extracted from a large, annotated
> corpus for German (many, many thousands of productions), which really
> looks messy.

I was a linguistics students before switching to NLP. I can understand the 
situation. I guess you need simplify a little bit, just like pruning the 
decision tree. Your grammar need not 100% coverage of the corpus.

> > I've learned this style in Scheme. Yet I feel paralyzed when trying to
> > write in it to build trees. The type declaration may make my point
> > clearer. --8<--
> > type  dtree = Dnode of dnode | Dtree of (dnode * int * dtree list)
> > --8<--
> > The problems are that unless the next call returns, the tree is not
> > complete yet and it may have several calls on itself.
>
> But that's what the closure is for: it abstracts away the subtree that
> still needs to be computed.

What if the continuation is not sequential, but parallel? If the tree is 
uniformly binary branching, I guess it would be easier. 

> Given that you already run into problems for comparatively small sizes,
> I suppose that you are using the byte-code interpreter? Its builtin
> stack space is 256KB, i.e. 64K-words.

Would you consider tens of thousands small size?

> That would be great! - Thanks!

Then I'll send a copy to you in private.

Thanks for other listers as well. The major problem is what if the 
continuation is not singular (sequential) but parallel?

Best!
shouxun

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Parallel CPS? (was Re: [Caml-list] stack overflow)
  2003-04-10  4:12           ` Parallel CPS? (was Re: [Caml-list] stack overflow) Yang Shouxun
@ 2003-04-10  4:58             ` Mike Lin
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Lin @ 2003-04-10  4:58 UTC (permalink / raw)
  To: Yang Shouxun; +Cc: caml-list

>
>>> I've learned this style in Scheme. Yet I feel paralyzed when trying 
>>> to
>>> write in it to build trees. The type declaration may make my point
>>> clearer. --8<--
>>> type  dtree = Dnode of dnode | Dtree of (dnode * int * dtree list)
>>> --8<--
>>> The problems are that unless the next call returns, the tree is not
>>> complete yet and it may have several calls on itself.
>

> What if the continuation is not sequential, but parallel? If the tree 
> is
> uniformly binary branching, I guess it would be easier.

> Thanks for other listers as well. The major problem is what if the
> continuation is not singular (sequential) but parallel?

I'm not exactly sure what you mean by 'parallel' here. The observation 
I'll make is that I'm assuming you're not designing your code to only 
work on SMP or distributed machines, so that there must exist an 
equivalent sequential control structure for it.

I'll describe a problem which I hope is related to yours.

Say we had code structured like:

type  tree =
	  Leaf of int
	| Tree of tree * tree (* left and right branches *)

let build_tree (data:some_type) =
	Tree(build_left_subtree data, build_right_subtree data)

...and we wanted to CPS it. So assume we are given:

build_left_subtree' : some_type -> (Tree->unit) -> unit
build_right_subtree' : some_type -> (Tree->unit) -> unit

So sometimes people get confuzled because they have to issue a 
tail-call to build_left_subtree to be good CPSers, but they still need 
additional information to finish constructing the tree.

The solution is to use an intermediate closure as follows:

let build_tree' (data:some_type) (kont:(Tree->unit)) =
	build_left_subtree data
		(fun (left:tree) ->
			build_right_subtree data
				(fun (right:tree) ->
					kont (left,right)))

Again, I hope I understood your problem correctly.

By the way, if there's anyone reading who understands the workings of 
the compiler better than I do. In the above example, left (the argument 
to the first closure) should get stack-allocated since it is an 
argument. Thereafter, we issue a tail-call with a downward funarg that 
references that argument. On the one hand, you want to blow away the 
activation record and allocate the closure on the heap in order to be a 
constant-stack-space tail call. On the other hand, you want to allocate 
the closure on the stack, since it's a downward funarg and you don't 
want to go digging around for memory in the heap. Which does the 
compiler do? I hope the former, or else there's a big problem with my 
technique...

-Mike


--
Mike Lin
mikelin@mit.edu

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] stack overflow
  2003-04-09  2:10 [Caml-list] stack overflow Yang Shouxun
                   ` (3 preceding siblings ...)
  2003-04-09  6:45 ` David Monniaux
@ 2003-04-13 15:42 ` John Max Skaller
  4 siblings, 0 replies; 17+ messages in thread
From: John Max Skaller @ 2003-04-13 15:42 UTC (permalink / raw)
  To: Yang Shouxun; +Cc: caml-list

Yang Shouxun wrote:

> Dear OCaml users,
> 
> I've written a modified version of C4.5 program in OCaml. However, when the 
> input is big, say over 50000, the program (native code on Debian) died for 
> stack overflow. Otherwise, it runs as expected.
> 
> Can anybody explain possible reasons causing stack overflow in OCaml?


Which version of Ocaml?

For Linux, there is no way a tiny dataset like 50,000 elements
could cause a stack overflow .. unless you're running an accounting
package which limits the stack/memory of a process you get ALL
of virtual memory for your stack.

Yet I had this problem, and it turned out to be
a code generation bug in Ocaml 3.03/4. That problem
has been fixed in 3.05.

The diagnostic I got, by the way, was not 'stack overflow'
but 'out of memory' when the stack really did overflow:
in my case non-tail recursive lexing was easy to make
blow the stack with a 7 Meg file: it took minutes to
dump, and the disk thrashed a lot .. the mouse froze ..
and i couldn't kill the process :(

If you are not getting these symptoms, it isn't a stack
overflow. [I am, of course, talking about the i86 native
code compiler]

-- 
John Max Skaller, mailto:skaller@ozemail.com.au
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Stack_overflow
  2006-03-31 20:44 Stack_overflow mulhern
  2006-03-30 23:03 ` [Caml-list] Stack_overflow Jon Harrop
@ 2006-03-31 21:38 ` Eric Cooper
  1 sibling, 0 replies; 17+ messages in thread
From: Eric Cooper @ 2006-03-31 21:38 UTC (permalink / raw)
  To: caml-list

On Fri, Mar 31, 2006 at 02:44:26PM -0600, mulhern wrote:
> I'm trying to compile an automatically generated list definition with
> approximately 12,000 elements.
> [...]
> Unfortunately, I get a Stack_overflow exception.

Try using the natively-compiled compilers, ocamlc.opt or ocamlopt.opt.

-- 
Eric Cooper             e c c @ c m u . e d u


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Stack_overflow
  2006-03-31 20:44 Stack_overflow mulhern
@ 2006-03-30 23:03 ` Jon Harrop
  2006-03-31 21:38 ` Eric Cooper
  1 sibling, 0 replies; 17+ messages in thread
From: Jon Harrop @ 2006-03-30 23:03 UTC (permalink / raw)
  To: caml-list

On Friday 31 March 2006 21:44, mulhern wrote:
> Unfortunately, I get a Stack_overflow exception. If I roughly half the
> number of elements in the list the Stack_overflow exception goes away.

Try increasing the stack size using:

  export CAMLRUNPARAM='...'

where you get "..." from the manual. Something like 'l=100M', IIRC.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2006-03-31 23:03 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-04-09  2:10 [Caml-list] stack overflow Yang Shouxun
2003-04-09  2:19 ` brogoff
2003-04-09  2:45   ` Yang Shouxun
2003-04-09  8:14     ` Markus Mottl
2003-04-09  9:23       ` Yang Shouxun
2003-04-09 11:34         ` Markus Mottl
2003-04-10  4:12           ` Parallel CPS? (was Re: [Caml-list] stack overflow) Yang Shouxun
2003-04-10  4:58             ` Mike Lin
2003-04-09 14:14         ` CPS folds " Neel Krishnaswami
2003-04-09 16:54           ` brogoff
2003-04-09 17:23             ` Mike Lin
2003-04-09  2:43 ` [Caml-list] stack overflow David Brown
     [not found] ` <200304091034.45256.yangsx@fltrp.com>
     [not found]   ` <16019.34434.468479.586884@barrow.artisan.com>
2003-04-09  2:53     ` Yang Shouxun
2003-04-09  6:45 ` David Monniaux
2003-04-13 15:42 ` John Max Skaller
2006-03-31 20:44 Stack_overflow mulhern
2006-03-30 23:03 ` [Caml-list] Stack_overflow Jon Harrop
2006-03-31 21:38 ` Eric Cooper

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).