caml-list - the Caml user's mailing list
* [Caml-list] Problem with un-flushed output getting mangled
@ 2011-09-30  0:00 Taylor Venable
From: Taylor Venable @ 2011-09-30  0:00 UTC
  To: caml-list

Hi there, I'm having a problem with some code I've written. The gist
of my program (whose code I unfortunately cannot share at this time;
I'll have to get approval first) is this:

1. Read elements from an XML file and turn them into objects.
2. Place these objects into hash tables.
3. Do some stuff with those objects.
4. Print them out to various files (depending on the type of the object).
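A minimal sketch of step 4 is below. It assumes two hypothetical object kinds (`User`, `Enrollment`) and a caller-supplied `path_of_kind` function, and none of these names come from the original program. Each object is written as one tab-separated line to the file for its kind, using only open_out, output_string, output_char and close_out, as in point A below. Caching one channel per file in a Hashtbl is a design choice of this sketch, not something the message states.

```ocaml
(* Hypothetical object kinds; the real program builds its objects from XML. *)
type kind = User | Enrollment

type obj = { kind : kind; fields : string list }

(* Write each object as one tab-separated line to the file for its kind.
   Channels are cached in a Hashtbl so each file is opened exactly once. *)
let write_all (path_of_kind : kind -> string) (objs : obj list) : unit =
  let chans : (string, out_channel) Hashtbl.t = Hashtbl.create 8 in
  let chan_for k =
    let path = path_of_kind k in
    try Hashtbl.find chans path
    with Not_found ->
      let oc = open_out path in
      Hashtbl.add chans path oc;
      oc
  in
  List.iter
    (fun o ->
      let oc = chan_for o.kind in
      output_string oc (String.concat "\t" o.fields);
      output_char oc '\n')
    objs;
  (* close_out flushes any data still buffered before closing the file. *)
  Hashtbl.iter (fun _ oc -> close_out oc) chans
```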

My trouble is this: if I don't call flush on the output channels in
step #4, I get mangled output. By mangled I mean that partway through
one line, data from another line suddenly appears; that other line
also exists elsewhere in the output. Sometimes lines are simply
duplicated. I found this very strange and at first assumed the
problem was in my code, but I couldn't find anything wrong, so I tried
calling flush after every line I wrote. Suddenly the problem
disappeared! My understanding, however, is that flush shouldn't be
needed for correct output: I simply open the output channel, write to
it a bunch, and then close it.
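The two ways of writing described above can be sketched as follows. `write_line` and `write_file` are hypothetical helpers, not code from the original program. With `~flush_each:true` every line is handed to the OS immediately (the workaround). Without it, output accumulates in the channel's buffer until the buffer fills or close_out flushes the remainder. For a single channel used this way, both variants are expected to produce byte-identical files.

```ocaml
(* Write one line; optionally flush right away (the per-line workaround).
   Without the flush, the data stays in the channel buffer until the
   buffer fills, flush is called, or close_out runs. *)
let write_line ?(flush_each = false) oc line =
  output_string oc line;
  output_char oc '\n';
  if flush_each then flush oc

(* Open, write a bunch, close: close_out flushes whatever is still buffered. *)
let write_file ?flush_each path lines =
  let oc = open_out path in
  List.iter (write_line ?flush_each oc) lines;
  close_out oc
```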

I haven't yet been able to come up with a simple case that exhibits
the problem. I can't share the code with you yet, and I can't share
the data either, so I'll try to give as much information as I can.

A. I only call open_out, output_string, output_char, and close_out.
B. Although I link to Batteries (version 1.4.1), I don't use its IO
layer. I just call the functions that I need directly (e.g.,
BatString.join).
C. There are two files that exhibit the problem.
D. In each affected file, the problem occurs at exactly the same
position every time, even if the data itself changes!
D1. In one file, it's position 2883585. At that location, it
duplicates text from position 794139.

venatc01	01	Clinton	William	clinwj01@some.domain	1234567		J				1600 Pennsylvania Ave		Washington DC	12345	US						1	Y

This is a sanitized example of what the output looks like. It's
supposed to be the information for user venatc01, but suddenly in the
middle of the line the information for a certain Bill Clinton is
injected. The row describing Bill Clinton appears earlier in the file.
This particular file is quite long, and there are several more
duplications: position 2610356 is duplicated at 2883693, and position
2435496 is duplicated at 2883819.

D2. In the other file, it's position 20481. At that location, it
duplicates text from either position 6434 or 10494. (You can't tell
because it's the same data in both spots.)

line 232: 11667.201210	venatc01	S	Y	Y
line 378: 14900.201210	venatc01	S	Y	Y
line 737: 1241210	venatc01	S	Y	Y

Everything after the "124" on line 737 is copied from either line 232
or line 378.

E. Flushing one file after every line printed fixes that file, but
the problem in the other file persists at the same position.
F. Adjusting the heap size using OCAMLRUNPARAM=s=4M,i=32M,o=150 had no effect.
G. The problem exists both with byte compilation and native compilation.
H. I'm using OCaml 3.12.1 on Linux x86_64.
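For reference, the OCAMLRUNPARAM setting in F can also be applied from inside the program through the standard Gc module. This is a minimal sketch that assumes the documented OCAMLRUNPARAM conventions: the M suffix multiplies by 2^20, s (minor heap size) and i (major heap increment) are in words, and o (space overhead) is a percentage. The name `configure_gc` is hypothetical.

```ocaml
(* Same settings as OCAMLRUNPARAM=s=4M,i=32M,o=150, applied at startup.
   The M suffix means 1024 * 1024; s and i are in words, o is a percentage. *)
let mega = 1024 * 1024

let configure_gc () =
  Gc.set
    { (Gc.get ()) with
      Gc.minor_heap_size = 4 * mega;
      Gc.major_heap_increment = 32 * mega;
      Gc.space_overhead = 150 }

let () = configure_gc ()
```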

I've assumed that you don't need to call flush periodically to avoid
problems like this, but maybe that's not the case? Should one expect
any problems or difficulties if one doesn't explicitly flush every so
often?

If anybody has any ideas on how to debug this, I would greatly
appreciate them. I don't know much about OCaml internals or how to
debug things like this. If I can provide more information, let me
know. If it will help to have the code, I'll speak with my boss. In
the meantime, I'll keep trying to reproduce the problem with a much
simpler program. Thanks for any thoughts.
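One way to pin down positions like those in D is to write the same data once with and once without per-line flushing, then locate the first byte where the two outputs diverge. A hypothetical helper for that is sketched below. It reports a 0-based byte offset and a 1-based line number; the message does not say which convention its own positions use.

```ocaml
(* Hypothetical debugging helper: compare two files byte by byte and
   return Some (offset, line) for the first difference, where offset is
   0-based and line is 1-based, or None if the files are identical. *)
let first_difference path_a path_b =
  let ia = open_in_bin path_a and ib = open_in_bin path_b in
  let read ic = try Some (input_char ic) with End_of_file -> None in
  let rec loop pos line =
    match read ia, read ib with
    | None, None -> None
    | Some x, Some y when x = y ->
        loop (pos + 1) (if x = '\n' then line + 1 else line)
    | _ -> Some (pos, line)   (* bytes differ, or one file ended early *)
  in
  let result = loop 0 1 in
  close_in ia;
  close_in ib;
  result
```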

-- 
Taylor C. Venable
http://metasyntax.net/

Thread overview: 8+ messages
2011-09-30  0:00 [Caml-list] Problem with un-flushed output getting mangled Taylor Venable
2011-09-30  9:45 ` Jerome Vouillon
2011-10-01  1:55   ` Taylor Venable
2011-10-01  7:11     ` Török Edwin
2011-10-03 10:51       ` Taylor Venable
2011-10-02 11:41 ` Tiphaine Turpin
     [not found] ` <CAGyUfm24stnWFHwWGt4p0gZWA72FM97aXUZTe7wo1i9WDj7nFA@mail.gmail.com>
2011-10-03 19:00   ` Pierre Chopin
2011-10-04  1:39     ` Taylor Venable
