mailing list of musl libc
 help / color / mirror / code / Atom feed
* stdio fixes & internals documentation
@ 2015-05-29  1:39 Rich Felker
  2015-05-29  3:05 ` Rich Felker
  2015-05-29  4:32 ` Rich Felker
  0 siblings, 2 replies; 3+ messages in thread
From: Rich Felker @ 2015-05-29  1:39 UTC (permalink / raw)
  To: musl

I'm writing this up because I need to fix a bug in ungetc -- it's
failing when the target FILE is at eof rather than clearing the eof
status and succeeding, despite having logic in the source that was
intended to make this case work right. So I'm going to go ahead and
make some tentative documentation of stdio internals for my own use
and for others who want to review these changes. This stuff can
eventually be cleaned up to go into the musl manual... :-)

So far, just for stuff used in reading:

f->read() assumes reading is currently a valid operation (so __toread
was called successfully without a subsequent __towrite) and the read
buffer is empty (but the values of the pointers don't matter). Right
now it's responsible for setting EOF and error flags for the file.
This is probably a bad idea because the logic, which is stdio-global
and not specific to the FILE type, is repeated in different underlying
FILE type implementations' read functions.

__toread() assumes the FILE is in a valid state to begin reading, but
does not misbehave horribly even if it was in the middle of writing.
If the FILE's fopen mode does not permit reading, it sets error status
and returns EOF. If the file is at EOF, it also returns EOF. This
means the caller (such as ungetc) cannot distinguish EOF from "error
going into reading mode" without temporarily clearing and restoring
the error flag (which would be ugly and inefficient). On failure
(either eof status or error switching to read mode) the buffer
pointers are left unchanged; on success, they're all set to the
beginning of the buffer.

The first thing I want to do is fix the known bug in ungetc, and I
think the easiest way to do that is to make __toread set valid read
buffer pointers when it fails due to eof status. Then, instead of
ungetc checking the return value of __toread, it can instead call
__toread and then just check rpos. That is, instead of:

	if ((!f->rend && __toread(f)) || f->rpos <= f->buf - UNGET) {
		// error

it can instead do:

	if (!f->rend) __toread(f);
	if (f->rpos <= f->buf - UNGET) {
		// error

or perhaps:

	if (!f->rpos) __toread(f);
	if (!f->rpos || f->rpos <= f->buf - UNGET) {
		// error

I like the second version better because it does not assume that a
null pointer compares <= any valid pointer, which could be wrong if
pointer <= is implemented as a signed comparison. I changed the
conditional from !f->rend to !f->rpos to simultaneously optimize two
code paths:

1. If f->rpos is initially non-null, the compiler can skip re-checking
   that it's non-null in the second if.

2. If f->rpos is initially null, then after __toread it needs to be
   re-read, but only one field (rpos) needs to be loaded from memory
   instead of two (rend and rpos).

Note that if the file is not open in a mode that permits writing, then
f->rpos is initially a null pointer and __toread will never set it to
something non-null, nor will f->read (since it's not permitted to call
f->read without a successful __toread).

Once these fixes are taken care of I'd like to look at the EOF logic
in f->read() and moving it out to the callers (only __uflow and fread)
where we won't have to worry about bugs (which I think exist) in the
FILE-type-specific read functions.

Rich


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: stdio fixes & internals documentation
  2015-05-29  1:39 stdio fixes & internals documentation Rich Felker
@ 2015-05-29  3:05 ` Rich Felker
  2015-05-29  4:32 ` Rich Felker
  1 sibling, 0 replies; 3+ messages in thread
From: Rich Felker @ 2015-05-29  3:05 UTC (permalink / raw)
  To: musl

On Thu, May 28, 2015 at 09:39:37PM -0400, Rich Felker wrote:
> The first thing I want to do is fix the known bug in ungetc, and I
> think the easiest way to do that is to make __toread set valid read
> buffer pointers when it fails due to eof status. Then, instead of
> ungetc checking the return value of __toread, it can instead call
> __toread and then just check rpos. That is, instead of:
> 
> 	if ((!f->rend && __toread(f)) || f->rpos <= f->buf - UNGET) {
> 		// error
> 
> it can instead do:
> 
> 	if (!f->rend) __toread(f);
> 	if (f->rpos <= f->buf - UNGET) {
> 		// error
> 
> or perhaps:
> 
> 	if (!f->rpos) __toread(f);
> 	if (!f->rpos || f->rpos <= f->buf - UNGET) {
> 		// error
> 
> I like the second version better because it does not assume that a
> null pointer compares <= any valid pointer, which could be wrong if
> pointer <= is implemented as a signed comparison. [...]

I have a fix ready to commit here, but sometime between now and
release I think we need some tests for it. The immportant things to
test are that ungetc and ungetwc work correctly on newly-opened files,
files in the middle of reading from the buffer, and files at eof; that
they fail for files not opened in a mode compatible with reading; and
that the new __toread behavior doesn't allow getc or fread to read
when the eof flag is set for the file but the underlying fd has more
data available (in the real world this only happens on terminals or
growing files; the latter would be easier to test I think).

Rich


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: stdio fixes & internals documentation
  2015-05-29  1:39 stdio fixes & internals documentation Rich Felker
  2015-05-29  3:05 ` Rich Felker
@ 2015-05-29  4:32 ` Rich Felker
  1 sibling, 0 replies; 3+ messages in thread
From: Rich Felker @ 2015-05-29  4:32 UTC (permalink / raw)
  To: musl

On Thu, May 28, 2015 at 09:39:37PM -0400, Rich Felker wrote:
> Once these fixes are taken care of I'd like to look at the EOF logic
> in f->read() and moving it out to the callers (only __uflow and fread)
> where we won't have to worry about bugs (which I think exist) in the
> FILE-type-specific read functions.

Actually there don't seem to be such bugs, so I'm not in a big hurry
to make changes here. I did notice one issue though with __toread and
__towrite: they set the error status for the FILE when an illegal
operation (per the fopen mode) is attempted, but they don't set errno.
glibc sets errno to EBADF in this case, which I'm not sure is a
permitted behavior. POSIX says:

[EBADF]
    [CX] [Option Start] The file descriptor underlying stream is not a
    valid file descriptor open for reading. [Option End]

So if the underlying fd allows reading but the fopen mode doesn't, I
think this would be reusing a specified errno value with a meaning
distinct from the specified meaning, and thus non-conforming. I don't
know what would be an appropriate errno value, though.

On the other hand, the general text for EBADF reads:

[EBADF]
    Bad file descriptor. A file descriptor argument is out of range,
    refers to no open file, or a read (write) request is made to a
    file that is only open for writing (reading).

which makes it sound more permissible here.

Rich


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-05-29  4:32 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-29  1:39 stdio fixes & internals documentation Rich Felker
2015-05-29  3:05 ` Rich Felker
2015-05-29  4:32 ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).