caml-list - the Caml user's mailing list
From: Gerd Stolpmann <info@gerd-stolpmann.de>
To: Andries Hekstra <andries.hekstra@philips.com>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] OCaml program crashes after computing fine for 2 days during grep on multiMB output file
Date: Wed, 01 Mar 2006 14:45:09 +0100
Message-ID: <1141220710.10329.80.camel@localhost.localdomain>
In-Reply-To: <OF159CADE2.4EE4262D-ONC1257124.003C262F-C1257124.003CDDBD@philips.com>

On Wednesday, 01.03.2006, at 12:03 +0100, Andries Hekstra wrote:
> 
> Dear OCaml-list, 
> 
> I use OCaml under 64-bit Linux to do signal processing simulations of
> next generation optical storage devices. So far, I have really enjoyed
> programming in OCaml, e.g. as program texts are considerably shorter
> than in C++ for computations that involve many arrays. My computations
> run for many days if not a week, and produce output files of ca. 20
> MB. I run them in a job queue.  
> 
> Recently I have been plagued by programs that crash when I do a "grep"
> on the output file (opened with open_out). E.g. the program has been
> running successfully for a few days. I do a "grep @ *.out" in the
> directory to monitor progress as important lines in the output file
> start with a "@". A few minutes later I receive mails from the queuing
> system saying that everything crashed. 
> 
> What is the cause of these crashes? Can somebody give me a clue? 

A stale NFS file handle normally means that the file has disappeared on
the NFS server. (The server does not keep a file open while clients have
it open, as proper POSIX semantics would require; it simply re-opens the
file whenever a client accesses it.) As you are able to grep the file,
this cannot be the case here.

Stale handles may also result if the NFS server is rebooted and
something goes wrong. Normally, the server keeps file handles valid
across reboots, but there are many reports that this does not work for
some users. Maybe these NFS servers are just buggy. (For example, some
operating systems do not guarantee stable device numbers, so every time
the system boots the disks get new numbers, and all file handles become
stale.)

You should also ensure that you are hard-mounting (option "-o hard" in
the mount command). Use NFS version 3 if possible.

In general, I would advise not to use NFS for long-running processes.
Write the file to /var/tmp and move it to its final location when it is
fully written.
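
A minimal sketch of this write-locally-then-move approach in OCaml
follows. The function names (copy_file, move_file, with_local_output)
are made up for illustration, and I assume OCaml 4.02 or later for the
"exception" match case. Note that a plain rename cannot cross file
systems, so moving from /var/tmp to an NFS directory has to fall back
to copy-and-delete:

```ocaml
(* Sketch only: all function names here are hypothetical helpers. *)

(* Copy [src] to [dst] in fixed-size chunks. *)
let copy_file src dst =
  let ic = open_in_bin src in
  let oc =
    try open_out_bin dst
    with e -> close_in_noerr ic; raise e
  in
  let buf = Bytes.create 65536 in
  let rec loop () =
    let n = input ic buf 0 (Bytes.length buf) in
    if n > 0 then begin
      output oc buf 0 n;
      loop ()
    end
  in
  match loop (); close_out oc with
  | () -> close_in ic
  | exception e ->
    close_out_noerr oc;
    close_in_noerr ic;
    raise e

(* Try a cheap rename first; if it fails (for instance because [src]
   and [dst] are on different file systems), copy and then delete. *)
let move_file src dst =
  try Sys.rename src dst
  with Sys_error _ ->
    copy_file src dst;
    Sys.remove src

(* Run [f] on a channel writing to the local [scratch] file, and move
   the file to [final] only after [f] has finished and the channel
   has been closed. *)
let with_local_output ~scratch ~final f =
  let oc = open_out scratch in
  (match f oc with
   | () -> close_out oc
   | exception e -> close_out_noerr oc; raise e);
  move_file scratch final
```

A call might look like
with_local_output ~scratch:"/var/tmp/Exp107.out" ~final:"Exp107.out"
(fun oc -> output_string oc "@ progress line\n"), where both paths are
only examples. During the run the NFS directory is not touched at all;
to monitor progress you would grep the scratch file on the machine that
runs the job.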

Gerd

> ------------------------------------------------------------
> # LSBATCH: User input
> qtb -par Exp107.txt > Exp107.txt.log -codes
> gallager_10b_1023l_1048576w.txt
> ------------------------------------------------------------
> 
> Exited with exit code 2.
> 
> Resource usage summary:
> 
>    CPU time   : 163606.88 sec.
>    Max Memory :      3014 MB
>    Max Swap   :      3044 MB
> 
>    Max Processes  :         3
> 
> The output (if any) follows:
> 
> Fatal error: exception Sys_error("Stale NFS file handle")
> 
> 
> 
> 
> ------------------------------------------------------------------------
> Dr. Ir. Andries P. Hekstra
> Philips Research 
> High Tech Campus 27  (WL-1-4.15)
> 5656 AG Eindhoven
> Tel./Fax/Secr. +31 40 27 42048/42566/44051 
>   *  Good open source break software for computer users :
> http://www.workrave.org   
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------
