caml-list - the Caml user's mailing list
* [Caml-list] Problem with un-flushed output getting mangled
@ 2011-09-30  0:00 Taylor Venable
  2011-09-30  9:45 ` Jerome Vouillon
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Taylor Venable @ 2011-09-30  0:00 UTC
  To: caml-list

Hi there, I'm having a problem with some code I've written. The gist
of my program (whose code I unfortunately cannot share at this time,
I'll have to get approval first) is this:

1. Read elements from an XML file and turn them into objects.
2. Place these objects into hash tables.
3. Do some stuff with those objects.
4. Print them out to various files (depending on the type of the object).

My trouble is this: if I fail to call flush on the output channels in
step #4, I get mangled output. By mangled I mean that in the middle of
one line, suddenly the data from another line appears. The other line
exists elsewhere in the output. Sometimes lines are simply duplicated.
I found this highly strange and thought the problem was in my code at
first. But I couldn't find anything, so I decided to make a call to
flush after every line I wrote to the output. Suddenly my problem
disappeared! My understanding, however, is that flush shouldn't be
required to do this correctly. After all, I simply open the output
channel, write to it a bunch, and then finally close it.
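
For reference, the pattern described above can be sketched as follows. This is only an illustration, not the actual program: the temporary file, the function names, and the line contents are all made up. It relies on the documented behaviour that close_out flushes whatever is still buffered, so with a single channel per file no explicit flush should be needed.

```ocaml
(* Hypothetical sketch: one channel, many writes, no explicit flush. *)
let write_lines path n =
  let oc = open_out path in
  for i = 1 to n do
    output_string oc (string_of_int i);
    output_char oc '\n'
  done;
  (* close_out flushes the channel buffer before closing the file. *)
  close_out oc

(* Count the lines that actually reached the file. *)
let count_lines path =
  let ic = open_in path in
  let rec loop acc =
    match input_line ic with
    | _ -> loop (acc + 1)
    | exception End_of_file -> acc
  in
  let n = loop 0 in
  close_in ic;
  n

let () =
  let path = Filename.temp_file "flush_demo" ".txt" in
  write_lines path 100_000;
  Printf.printf "%d lines written\n" (count_lines path);
  Sys.remove path
```

If a program shaped like this still produces mangled output, the cause is likely something else touching the same file.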

I haven't yet been able to come up with a simple case that exhibits
the problem. I can't share the code with you yet, and I can't share
the data either, so I'll try to give as much information as I can.

A. I only call open_out, output_string, output_char, and close_out.
B. Although I link to Batteries (version 1.4.1), I don't use its IO
layer. I just call the functions that I need directly (e.g.
BatString.join).
C. There are two files that exhibit the problem.
D. The problems in the output file occur in exactly the same position
every time, even if the data itself changes!
D1. In one file, it's position 2883585. At that location, it
duplicates text from position 794139.

venatc01	01	Clinton	William	clinwj01@some.domain	1234567		J				1600
Pennsylvania Ave		Washington DC	12345	US						1	Y

This is a sanitized example of what the output looks like. It's
supposed to be the information for user venatc01, but suddenly in the
middle of the line the information for a certain Bill Clinton is
injected. The row describing Bill Clinton appears earlier in the file.
This particular file is quite long, and there are several duplicate
lines: position 2610356 is duplicated at 2883693, position 2435496 is
duplicated at 2883819.

D2. In the other file, it's position 20481. At that location, it
duplicates text from either position 6434 or 10494. (You can't tell
because it's the same data in both spots.)

line 232: 11667.201210	venatc01	S	Y	Y
line 378: 14900.201210	venatc01	S	Y	Y
line 737: 1241210	venatc01	S	Y	Y

Everything after the 124 above is copied either from line 232 or 378.

E. Flushing the output of one file after every line printed fixes that
one file, but does not affect the position of the problem in the other
file, which remains the same.
F. Adjusting the heap size using OCAMLRUNPARAM=s=4M,i=32M,o=150 had no effect.
G. The problem exists both with byte compilation and native compilation.
H. I'm using OCaml 3.12.1 on Linux x86_64.

I've assumed that you don't need to call flush periodically to avoid
problems like this, but maybe that's not the case? Should one expect
any problems or difficulties if one doesn't explicitly flush every so
often?

If anybody has any ideas on how to debug this, I will be greatly
appreciative. I don't know that much about OCaml internals and how to
debug things like this. If I can provide some more information, let me
know. If it will help to have the code, I'll speak with my boss. In
the meantime, I'll keep trying to reproduce with a much simpler
program. Thanks for any thoughts.

-- 
Taylor C. Venable
http://metasyntax.net/


* Re: [Caml-list] Problem with un-flushed output getting mangled
  2011-09-30  0:00 [Caml-list] Problem with un-flushed output getting mangled Taylor Venable
@ 2011-09-30  9:45 ` Jerome Vouillon
  2011-10-01  1:55   ` Taylor Venable
  2011-10-02 11:41 ` Tiphaine Turpin
       [not found] ` <CAGyUfm24stnWFHwWGt4p0gZWA72FM97aXUZTe7wo1i9WDj7nFA@mail.gmail.com>
  2 siblings, 1 reply; 8+ messages in thread
From: Jerome Vouillon @ 2011-09-30  9:45 UTC
  To: Taylor Venable; +Cc: caml-list

Hello,

On Thu, Sep 29, 2011 at 08:00:47PM -0400, Taylor Venable wrote:
> My trouble is this: if I fail to call flush on the output channels in
> step #4, I get mangled output. By mangled I mean that in the middle of
> one line, suddenly the data from another line appears. The other line
> exists elsewhere in the output. Sometimes lines are simply duplicated.

Are you using Unix.fork? When you fork, the buffers are duplicated and
can thus end up being flushed several times. In particular, the 'exit'
function from the Pervasives module flushes all open output channels.
I don't have a good workaround. You can call 'flush_all' before
forking, but any write error is silently ignored by this function.
Or use 'Unix.execv "/bin/true" [||]' rather than 'exit' to terminate
subprocesses.

-- Jerome


* Re: [Caml-list] Problem with un-flushed output getting mangled
  2011-09-30  9:45 ` Jerome Vouillon
@ 2011-10-01  1:55   ` Taylor Venable
  2011-10-01  7:11     ` Török Edwin
  0 siblings, 1 reply; 8+ messages in thread
From: Taylor Venable @ 2011-10-01  1:55 UTC
  To: Jerome Vouillon; +Cc: caml-list

On Fri, Sep 30, 2011 at 05:45, Jerome Vouillon <vouillon@pps.jussieu.fr> wrote:
> On Thu, Sep 29, 2011 at 08:00:47PM -0400, Taylor Venable wrote:
>> My trouble is this: if I fail to call flush on the output channels in
>> step #4, I get mangled output. By mangled I mean that in the middle of
>> one line, suddenly the data from another line appears. The other line
>> exists elsewhere in the output. Sometimes lines are simply duplicated.
>
> Are you using Unix.fork? When you fork, the buffers are duplicated and
> can thus end up being flushed several times. In particular, the 'exit'
> function from the Pervasives module flushes all open output channels.
> I don't have a good workaround. You can call 'flush_all' before
> forking, but any write error is silently ignored by this function.
> Or use 'Unix.execv "/bin/true" [||]' rather than 'exit' to terminate
> subprocesses.

Nope, I'm not forking. I also tried putting an explicit exit at the
end of my program, but I already close_out my file before then so it
doesn't make a difference. Thanks for the ideas, though.

-- 
Taylor C. Venable
http://metasyntax.net/


* Re: [Caml-list] Problem with un-flushed output getting mangled
  2011-10-01  1:55   ` Taylor Venable
@ 2011-10-01  7:11     ` Török Edwin
  2011-10-03 10:51       ` Taylor Venable
  0 siblings, 1 reply; 8+ messages in thread
From: Török Edwin @ 2011-10-01  7:11 UTC
  To: caml-list

On 10/01/2011 04:55 AM, Taylor Venable wrote:
> On Fri, Sep 30, 2011 at 05:45, Jerome Vouillon <vouillon@pps.jussieu.fr> wrote:
>> On Thu, Sep 29, 2011 at 08:00:47PM -0400, Taylor Venable wrote:
>>> My trouble is this: if I fail to call flush on the output channels in
>>> step #4, I get mangled output. By mangled I mean that in the middle of
>>> one line, suddenly the data from another line appears. The other line
>>> exists elsewhere in the output. Sometimes lines are simply duplicated.
>>
>> Are you using Unix.fork? When you fork, the buffers are duplicated and
>> can thus end up being flushed several times. In particular, the 'exit'
>> function from the Pervasives module flushes all open output channels.
>> I don't have a good workaround. You can call 'flush_all' before
>> forking, but any write error is silently ignored by this function.
>> Or use 'Unix.execv "/bin/true" [||]' rather than 'exit' to terminate
>> subprocesses.
> 
> Nope, I'm not forking. I also tried putting an explicit exit at the
> end of my program, but I already close_out my file before then so it
> doesn't make a difference. Thanks for the ideas, though.
> 

Do you use threads?

--Edwin


* Re: [Caml-list] Problem with un-flushed output getting mangled
  2011-09-30  0:00 [Caml-list] Problem with un-flushed output getting mangled Taylor Venable
  2011-09-30  9:45 ` Jerome Vouillon
@ 2011-10-02 11:41 ` Tiphaine Turpin
       [not found] ` <CAGyUfm24stnWFHwWGt4p0gZWA72FM97aXUZTe7wo1i9WDj7nFA@mail.gmail.com>
  2 siblings, 0 replies; 8+ messages in thread
From: Tiphaine Turpin @ 2011-10-02 11:41 UTC
  To: caml-list

On 30/09/2011 02:00, Taylor Venable wrote:
> B. Although I link to Batteries (version 1.4.1) I don't use its IO
> layer.
If so, then my experience is probably irrelevant. But sometimes with
Batteries there are "implicit" things. A long time ago I had a problem
which looked similar (output seemed interleaved in some strange and very
sparse way). It turned out that the [open_out] function from Batteries
was not using the right flags; in particular, the "truncate" flag was not
set, so I was only overwriting the beginning of the file, which caused a
problem when a previous and longer version existed. This should have
been corrected by now, but as Batteries IO is a completely different
implementation, I would check the use of Batteries carefully.
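
The effect of a missing "truncate" flag can be reproduced with the standard library alone. The sketch below is an illustration under assumptions, not the Batteries code in question: the helper names and file contents are made up, and it assumes a Unix-like system. It uses open_out_gen so that Open_trunc can be left out on purpose.

```ocaml
(* Read a whole file into a string. *)
let read_all path =
  let ic = open_in_bin path in
  let s = really_input_string ic (in_channel_length ic) in
  close_in ic;
  s

(* Write [text] to [path] using the given open flags. *)
let write_with flags path text =
  let oc = open_out_gen flags 0o644 path in
  output_string oc text;
  close_out oc

(* Hypothetical demo: a longer old version followed by a shorter new one. *)
let stale_tail_demo () =
  let path = Filename.temp_file "trunc_demo" ".txt" in
  write_with [Open_wronly; Open_creat; Open_trunc] path
    "a long first version\n";
  (* Open_trunc deliberately omitted: only the start is overwritten. *)
  write_with [Open_wronly; Open_creat] path "short\n";
  let contents = read_all path in
  Sys.remove path;
  contents

let () = print_string (stale_tail_demo ())
```

The second write replaces only the first six bytes, so the tail of the older, longer version is still in the file afterwards.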

Tiphaine Turpin



* Re: [Caml-list] Problem with un-flushed output getting mangled
  2011-10-01  7:11     ` Török Edwin
@ 2011-10-03 10:51       ` Taylor Venable
  0 siblings, 0 replies; 8+ messages in thread
From: Taylor Venable @ 2011-10-03 10:51 UTC
  To: Török Edwin; +Cc: caml-list

2011/10/1 Török Edwin <edwintorok@gmail.com>:
> On 10/01/2011 04:55 AM, Taylor Venable wrote:
>> On Fri, Sep 30, 2011 at 05:45, Jerome Vouillon <vouillon@pps.jussieu.fr> wrote:
>>> On Thu, Sep 29, 2011 at 08:00:47PM -0400, Taylor Venable wrote:
>>>> My trouble is this: if I fail to call flush on the output channels in
>>>> step #4, I get mangled output. By mangled I mean that in the middle of
>>>> one line, suddenly the data from another line appears. The other line
>>>> exists elsewhere in the output. Sometimes lines are simply duplicated.
>>>
>>> Are you using Unix.fork? When you fork, the buffers are duplicated and
>>> can thus end up being flushed several times. In particular, the 'exit'
>>> function from the Pervasives module flushes all open output channels.
>>> I don't have a good workaround. You can call 'flush_all' before
>>> forking, but any write error is silently ignored by this function.
>>> Or use 'Unix.execv "/bin/true" [||]' rather than 'exit' to terminate
>>> subprocesses.
>>
>> Nope, I'm not forking. I also tried putting an explicit exit at the
>> end of my program, but I already close_out my file before then so it
>> doesn't make a difference. Thanks for the ideas, though.
>>
>
> Do you use threads?

No, I'm not using threads.

-- 
Taylor C. Venable
http://metasyntax.net/



* Re: [Caml-list] Problem with un-flushed output getting mangled
       [not found] ` <CAGyUfm24stnWFHwWGt4p0gZWA72FM97aXUZTe7wo1i9WDj7nFA@mail.gmail.com>
@ 2011-10-03 19:00   ` Pierre Chopin
  2011-10-04  1:39     ` Taylor Venable
  0 siblings, 1 reply; 8+ messages in thread
From: Pierre Chopin @ 2011-10-03 19:00 UTC
  To: caml-list


Hi

2011/9/29 Taylor Venable <taylor@metasyntax.net>

> Hi there, I'm having a problem with some code I've written. The gist
> of my program (whose code I unfortunately cannot share at this time,
> I'll have to get approval first) is this:
>
> 1. Read elements from an XML file and turn them into objects.
> 2. Place these objects into hash tables.
> 3. Do some stuff with those objects.
> 4. Print them out to various files (depending on the type of the object).
>
> My trouble is this: if I fail to call flush on the output channels in
> step #4, I get mangled output.



Do you have more than one output channel open on the same file at the
same time?

-- 
Pierre Chopin,
Chief Technology Officer and co-founder
punchup LLC
pierre@punchup.com



* Re: [Caml-list] Problem with un-flushed output getting mangled
  2011-10-03 19:00   ` Pierre Chopin
@ 2011-10-04  1:39     ` Taylor Venable
  0 siblings, 0 replies; 8+ messages in thread
From: Taylor Venable @ 2011-10-04  1:39 UTC
  To: Pierre Chopin; +Cc: caml-list

On Mon, Oct 3, 2011 at 15:00, Pierre Chopin <pierre@punchup.com> wrote:
> 2011/9/29 Taylor Venable <taylor@metasyntax.net>
>>
>> Hi there, I'm having a problem with some code I've written. The gist
>> of my program (whose code I unfortunately cannot share at this time,
>> I'll have to get approval first) is this:
>>
>> 1. Read elements from an XML file and turn them into objects.
>> 2. Place these objects into hash tables.
>> 3. Do some stuff with those objects.
>> 4. Print them out to various files (depending on the type of the object).
>>
>> My trouble is this: if I fail to call flush on the output channels in
>> step #4, I get mangled output.
>
> Do you have more than one output channel on the same file open at the same
> time ?

I looked, and sure enough there was some old code that was writing to
the same file earlier in the program's execution, and it never closed
the channel when it was done. Removing that other code fixed the problem.
So even though the writing was never interleaved, I presume the
buffering of the two channels interfered with each other and caused
the problems I was seeing in the output. Thank you!
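
A minimal sketch of how this kind of interference can arise, assuming a Unix-like system where each open_out call gets its own file descriptor and file offset. The file name, strings, and function names are made up for illustration; this is not the original program.

```ocaml
(* Read a whole file into a string. *)
let read_all path =
  let ic = open_in_bin path in
  let s = really_input_string ic (in_channel_length ic) in
  close_in ic;
  s

let two_channel_demo () =
  let path = Filename.temp_file "mangle_demo" ".txt" in
  (* "Old code": writes a little and leaves its channel open, so the
     text stays in the channel's buffer instead of reaching the file. *)
  let stale = open_out path in
  output_string stale "OLD-OLD-OLD\n";
  (* "New code": reopens the same file, writes, and closes properly. *)
  let fresh = open_out path in
  for i = 1 to 3 do
    Printf.fprintf fresh "new line %d\n" i
  done;
  close_out fresh;
  (* At program exit the runtime flushes channels that are still open;
     flushing by hand here has the same effect. The stale buffer is
     written at the stale channel's own offset, over the new data. *)
  flush stale;
  close_out stale;
  let contents = read_all path in
  Sys.remove path;
  contents

let () = print_string (two_channel_demo ())
```

The stale text lands in the middle of otherwise correct output even though the two writers never ran interleaved. It may also be worth noting that both reported positions, 2883585 and 20481, are one past a multiple of 4096, which would be consistent with a leftover partial buffer being flushed at a buffer-aligned offset, assuming a 4096-byte channel buffer in that OCaml version.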

-- 
Taylor C. Venable
http://metasyntax.net/

