caml-list - the Caml user's mailing list
* OCaml program crashes after computing fine for 2 days during grep on multiMB output file
From: Andries Hekstra @ 2006-03-01 11:03 UTC (permalink / raw)
  To: caml-list


Dear OCaml-list,

I use OCaml under 64-bit Linux to do signal processing simulations of next 
generation optical storage devices. So far, I have really enjoyed 
programming in OCaml, e.g. as program texts are considerably shorter than 
in C++ for computations that involve many arrays. My computations run for 
many days if not a week, and produce output files of ca. 20 MB. I run them 
in a job queue. 

Recently I have been plagued by programs that crash when I do a "grep" on 
the output file (opened with open_out). E.g. the program has been running 
successfully for a few days. I do a "grep @ *.out" in the directory to 
monitor progress as important lines in the output file start with a "@". A 
few minutes later I receive mails from the queuing system saying that 
everything crashed.

What is the cause of these crashes? Can somebody give me a clue?

Thanx,

Andries

------------------------------------------------------------
# LSBATCH: User input
qtb -par Exp107.txt > Exp107.txt.log -codes 
gallager_10b_1023l_1048576w.txt
------------------------------------------------------------

Exited with exit code 2.

Resource usage summary:

    CPU time   : 163606.88 sec.
    Max Memory :      3014 MB
    Max Swap   :      3044 MB

    Max Processes  :         3

The output (if any) follows:

Fatal error: exception Sys_error("Stale NFS file handle")




------------------------------------------------------------------------
Dr. Ir. Andries P. Hekstra
Philips Research 
High Tech Campus 27  (WL-1-4.15)
5656 AG Eindhoven
Tel./Fax/Secr. +31 40 27 42048/42566/44051 
   *  Good open source break software for computer users : 
http://www.workrave.org 


* Re: [Caml-list] OCaml program crashes after computing fine for 2 days during grep on multiMB output file
From: Richard Jones @ 2006-03-01 12:41 UTC (permalink / raw)
  To: Andries Hekstra; +Cc: caml-list

On Wed, Mar 01, 2006 at 12:03:34PM +0100, Andries Hekstra wrote:
> Fatal error: exception Sys_error("Stale NFS file handle")

Good old NFS :-)

How is your NFS partition mounted? (look in /etc/fstab)

Does your program write to the logfiles infrequently?

Does the NFS server get rebooted occasionally?

Have you deleted any ".nfsXXXX" files in a directory thinking that
they are unimportant?

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com



* Re: [Caml-list] OCaml program crashes after computing fine for 2 days during grep on multiMB output file
From: Gerd Stolpmann @ 2006-03-01 13:45 UTC (permalink / raw)
  To: Andries Hekstra; +Cc: caml-list

On Wednesday, 01.03.2006, at 12:03 +0100, Andries Hekstra wrote:
> 
> Dear OCaml-list, 
> 
> I use OCaml under 64-bit Linux to do signal processing simulations of
> next generation optical storage devices. So far, I have really enjoyed
> programming in OCaml, e.g. as program texts are considerably shorter
> than in C++ for computations that involve many arrays. My computations
> run for many days if not a week, and produce output files of ca. 20
> MB. I run them in a job queue.  
> 
> Recently I have been plagued by programs that crash when I do a "grep"
> on the output file (opened with open_out). E.g. the program has been
> running successfully for a few days. I do a "grep @ *.out" in the
> directory to monitor progress as important lines in the output file
> start with a "@". A few minutes later I receive mails from the queuing
> system saying that everything crashed. 
> 
> What is the cause of these crashes? Can somebody give me a clue? 

A stale NFS file handle normally means that the file disappeared on the
NFS server. (The server does not keep files open while clients have them
open, which it would need to do to support proper POSIX semantics; it
just re-opens them whenever clients access the files.) As you are
grepping the file, it evidently still exists, so this cannot be the case
here.

Stale handles may also result if the NFS server is rebooted and
something goes wrong. Normally, the server keeps file handles across
reboots, but there are many reports that this does not work for some
users. Maybe these NFS servers are just buggy. (For example, some
operating systems do not guarantee stable device numbers, so every time
the system boots the disks get new numbers, and all file handles become
stale.)
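
Whatever the cause on the server side, the error reaches the OCaml
program as a Sys_error exception raised by the output functions, and an
uncaught exception is what ends the job. The following is only a minimal
sketch of one way to keep a failed log write from killing a multi-day
computation; log_line is a hypothetical helper, not something from the
original program. It re-opens the file in append mode for every line
instead of holding one channel open for days, and returns an Error
instead of raising. It does not make NFS reliable; it only turns the
fatal exception into something the caller can handle or retry.

```ocaml
(* Hypothetical helper (not from the original program): append one line
   to [path], opening and closing the file on every call so that no
   long-lived file handle is held. Returns [Error msg] instead of
   raising when the open, write, or close fails with Sys_error. *)
let log_line (path : string) (line : string) : (unit, string) result =
  match open_out_gen [Open_wronly; Open_append; Open_creat] 0o644 path with
  | exception Sys_error msg -> Error msg
  | oc ->
    (match output_string oc (line ^ "\n"); close_out oc with
     | () -> Ok ()
     | exception Sys_error msg ->
       (* Release the descriptor without raising a second time. *)
       close_out_noerr oc;
       Error msg)
```

A caller could then match on the result, for example printing a warning
to stderr on Error and carrying on with the simulation.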

You should also ensure that you are hard-mounting (option "-o hard" in
the mount command). Use NFS version 3 if possible.

In general, I would advise not to use NFS for long-running processes.
Write the file to /var/tmp and move it to its final location when it is
fully written.
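
A minimal sketch of that approach, under stated assumptions: the names
copy_file and with_local_output, the "sim" prefix and the ".part" suffix
are all hypothetical and chosen only for illustration. Because /var/tmp
and the NFS directory are different file systems, Sys.rename cannot move
the file directly, so the sketch copies it to a temporary name next to
the final location and renames it within that directory.

```ocaml
(* Copy [src] to [dst] in fixed-size chunks. A plain Sys.rename would
   fail here because local disk and NFS are different file systems. *)
let copy_file (src : string) (dst : string) : unit =
  let ic = open_in_bin src in
  let oc =
    try open_out_bin dst
    with e -> close_in_noerr ic; raise e in
  let buf = Bytes.create 65536 in
  let rec loop () =
    let n = input ic buf 0 (Bytes.length buf) in
    if n > 0 then (output oc buf 0 n; loop ())
  in
  (try loop ()
   with e -> close_in_noerr ic; close_out_noerr oc; raise e);
  close_in ic;
  close_out oc

(* Run [compute] with a channel on a local scratch file, then publish
   the result: copy it to "<final>.part" and rename that into place, so
   the name [final] only ever refers to a fully written file. *)
let with_local_output ?(scratch_dir = "/var/tmp") ~(final : string)
    (compute : out_channel -> unit) : unit =
  let local = Filename.temp_file ~temp_dir:scratch_dir "sim" ".out" in
  let oc = open_out local in
  (try compute oc with e -> close_out_noerr oc; raise e);
  close_out oc;
  let staged = final ^ ".part" in
  copy_file local staged;
  Sys.rename staged final;   (* same directory, so rename works *)
  Sys.remove local
```

With this layout the NFS server is touched only once, at the very end of
the run; progress would have to be monitored by grepping the scratch
file on the machine that runs the job.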

Gerd

-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------

