caml-list - the Caml user's mailing list
From: Taylor Venable <taylor@metasyntax.net>
To: caml-list <caml-list@inria.fr>
Subject: [Caml-list] Problem with un-flushed output getting mangled
Date: Thu, 29 Sep 2011 20:00:47 -0400
Message-ID: <CAAmKFxeFBaqpbW8-rt3GOwvhNKaf2MjZdP20ZydpptksQcTVZA@mail.gmail.com>

Hi there, I'm having a problem with some code I've written. The gist
of my program (whose code I unfortunately cannot share at this time,
I'll have to get approval first) is this:

1. Read elements from an XML file and turn them into objects.
2. Place these objects into hash tables.
3. Do some stuff with those objects.
4. Print them out to various files (depending on the type of the object).

My trouble is this: if I fail to call flush on the output channels in
step #4, I get mangled output. By mangled I mean that in the middle of
one line, suddenly the data from another line appears. The other line
exists elsewhere in the output. Sometimes lines are simply duplicated.
I found this highly strange and thought the problem was in my code at
first. But I couldn't find anything, so I decided to make a call to
flush after every line I wrote to the output. Suddenly my problem
disappeared! My understanding, however, is that flush shouldn't be
required to do this correctly. After all, I simply open the output
channel, write to it a bunch, and then finally close it.
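
In outline, the output step looks something like the sketch below. The
function name write_lines, the flush_each switch, and the tab-separated
row layout are made up for illustration; this is not the real code, only
the same four channel functions plus the flush workaround.

```ocaml
(* Sketch of the write pattern described above. [write_lines] and the
   tab-separated row layout are hypothetical, not the real program. *)
let write_lines ~flush_each path rows =
  let oc = open_out path in
  List.iter
    (fun fields ->
      output_string oc (String.concat "\t" fields);
      output_char oc '\n';
      (* Workaround: push the channel buffer out after every line. *)
      if flush_each then flush oc)
    rows;
  (* close_out flushes whatever is still buffered before closing. *)
  close_out oc
```

With flush_each set to false this is the plain open, write, close
sequence that I would expect to work without any explicit flush.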

I haven't yet been able to come up with a simple case that exhibits
the problem. I can't share the code with you yet, and I can't share
the data either, so I'll try to give as much information as I can.

A. I only call open_out, output_string, output_char, and close_out.
B. Although I link to Batteries (version 1.4.1), I don't use its IO
layer. I just call the functions that I need directly (e.g.
BatString.join).
C. There are two files that exhibit the problem.
D. The problems in the output file occur in exactly the same position
every time, even if the data itself changes!
D1. In one file, it's position 2883585. At that location, it
duplicates text from position 794139.

venatc01	01	Clinton	William	clinwj01@some.domain	1234567		J				1600
Pennsylvania Ave		Washington DC	12345	US						1	Y

This is a sanitized example of what the output looks like. It's
supposed to be the information for user venatc01, but suddenly in the
middle of the line the information for a certain Bill Clinton is
injected. The row describing Bill Clinton appears earlier in the file.
This particular file is quite long, and there are several duplicate
lines: position 2610356 is duplicated at 2883693, position 2435496 is
duplicated at 2883819.

D2. In the other file, it's position 20481. At that location, it
duplicates text from either position 6434 or 10494. (You can't tell
because it's the same data in both spots.)

line 232: 11667.201210	venatc01	S	Y	Y
line 378: 14900.201210	venatc01	S	Y	Y
line 737: 1241210	venatc01	S	Y	Y

Everything after the 124 above is copied either from line 232 or 378.

E. Flushing the output of one file after every line printed fixes that
one file, but does not affect the position of the problem in the other
file, which remains the same.
F. Adjusting the heap size using OCAMLRUNPARAM=s=4M,i=32M,o=150 had no effect.
G. The problem exists both with byte compilation and native compilation.
H. I'm using OCaml 3.12.1 on Linux x86_64.
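
Since the bad positions in item D never move, one thing that may be
worth checking is where they fall relative to a fixed block size. The
sketch below assumes a 4096-byte block, which is only a guess (a common
buffer and page size) and not something confirmed for this program; the
name describe is likewise hypothetical.

```ocaml
(* Hypothetical check: how far past a block boundary is each offset?
   The 4096 block size is an assumption, not taken from the program. *)
let block_size = 4096

(* Express [pos] as quotient * block_size + remainder. *)
let describe pos =
  Printf.sprintf "%d = %d * %d + %d"
    pos (pos / block_size) block_size (pos mod block_size)

let () =
  (* The two positions where the corruption starts (items D1 and D2). *)
  List.iter (fun p -> print_endline (describe p)) [2883585; 20481]
(* prints:
   2883585 = 704 * 4096 + 1
   20481 = 5 * 4096 + 1 *)
```

Both positions come out exactly one past a multiple of 4096. If the
positions are counted from 1, that would put the corruption right at a
block boundary, but whether that means anything here is speculation.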

I've assumed that you don't need to call flush periodically to avoid
problems like this, but maybe that's not the case? Should one expect
any problems or difficulties if one doesn't explicitly flush every so
often?

If anybody has any ideas on how to debug this, I will be greatly
appreciative. I don't know that much about OCaml internals and how to
debug things like this. If I can provide some more information, let me
know. If it will help to have the code, I'll speak with my boss. In
the meantime, I'll keep trying to reproduce with a much simpler
program. Thanks for any thoughts.
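
A stand-alone reproduction attempt could look roughly like the sketch
below: write a large number of lines with no explicit flush, read the
file back, and report any line that differs from what was written. The
names line_of and roundtrip, the row format, and the line count are all
invented for this sketch and have nothing to do with the real data.

```ocaml
(* Sketch of a stand-alone reproduction attempt. Everything here
   ([line_of], [roundtrip], the row format) is hypothetical. *)
let line_of i = Printf.sprintf "%d.201210\tuser%05d\tS\tY\tY" i i

(* Write [n] lines without any explicit flush, then read the file back
   and return the 1-based numbers of the lines that differ from what
   was written. *)
let roundtrip path n =
  let oc = open_out path in
  for i = 1 to n do
    output_string oc (line_of i);
    output_char oc '\n'
  done;
  close_out oc;
  let ic = open_in path in
  let bad = ref [] in
  for i = 1 to n do
    (* A missing line is treated as a mismatch. *)
    let got = try input_line ic with End_of_file -> "" in
    if got <> line_of i then bad := i :: !bad
  done;
  close_in ic;
  List.rev !bad

let () =
  let path = Filename.temp_file "repro" ".txt" in
  let bad = roundtrip path 200_000 in
  Sys.remove path;
  Printf.printf "%d mismatched lines\n" (List.length bad)
```

If a program this simple reports no mismatches, the cause is more
likely something specific to the real program than the channel
functions themselves.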

-- 
Taylor C. Venable
http://metasyntax.net/


