mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: string-backed FILEs mess
Date: Fri, 14 Sep 2018 11:52:27 -0400	[thread overview]
Message-ID: <20180914155227.GL1878@brightrain.aerifal.cx> (raw)
In-Reply-To: <20180912171824.GZ1878@brightrain.aerifal.cx>

On Wed, Sep 12, 2018 at 01:18:24PM -0400, Rich Felker wrote:
> On Wed, Sep 12, 2018 at 12:33:45PM -0400, Rich Felker wrote:
> > OK, I've been properly initializing the FILE rather than leaving it
> > uninitialized except for the important fields like the old code did.
> > Changing that, it's 1.44s with step 8, 1.60s with step 24. I also
> > confirmed that this version of the code is almost as fast as the
> > existing code with the memchr removed (just assuming it can read
> > ahead).
> 
> Uhg, the source of the "almost" here makes me even more convinced the
> current code must go. Part of the reason it's not as fast was that I
> was still setting f.read=__string_read, which requires (this is on
> i386, 32-bit) setting up the GOT pointer.
> 
> What was the old code doing? f.read was uninitialized. But the new
> code crashes in that case when hitting the end of the string. Why
> doesn't the old code crash? Because f.rend is set way past the end of
> the string and never reached. If it were reached:
> 
> 1. The shgetc macro calls the __shgetc function.
> 2. The __shgetc function calls __uflow.
> 3. __uflow calls __toread.
> 4. __toread inspects uninitialized f->wpos/f->wbase fields and,
>    depending on the values it sees, calls f->write, which is also
>    uninitialized.
> 5. If it gets past that, next __uflow calls the uninitialized f->read.
> 
> The fact that any of this works at all is a fragile shit-show, and
> completely depends on __intscan/__floatscan just using (via shgetc) a
> tiny undocumented subset of the FILE structure and a tiny undocumented
> subset of the stdio interfaces on it.
> 
> Really the existing code is just a poor substitute for having an
> abstraction for windowed string iteration, using the stdio FILE
> structure in a way that also works with real FILEs. It's clever, but
> this kind of clever is not a good thing.
> 
> I'm still not sure what the right way forward is, though.

OK, a small breakthrough that makes this mess a lot simpler:

The __intscan and __floatscan backends do not (and are not allowed to,
but this should be documented and isn't) call any stdio functions on
the fake FILE* passed to them except for the shgetc and shunget
macros, defined as:

#define shgetc(f) (((f)->rpos < (f)->shend) ? *(f)->rpos++ : __shgetc(f))
#define shunget(f) ((f)->shend ? (void)(f)->rpos-- : (void)0)

If the < is merely replaced by !=, which is functionally equivalent,
then shend can be any type of sentinel pointer we want (e.g. (void*)-1
or even just &localvar) to use the buffer as a string with no known
length, and we have a guarantee that __shgetc is never called.

I think this -1+2-byte change is an acceptable resolution to the issue
for now.

I did however perform some casual performance testing to determine
that strtol on short numbers would be 30% faster if it operated
directly on strings only rather than going through the fake FILE and
having to check end pointer and including code to conditionally call
__shgetc. So it might make sense in the longer term to redesign the
interface to operate only on strings, and have scanf be responsible
for transforming the input to a bounded-size string that will produce
the same result. (Note: vfscanf could shortcut out of having to do
this if it could detect that its underlying FILE is a vsscanf string
buffer.) Such a redesign would also reduce the static linking cost for
programs that use strto* but not stdio input functions.

Rich


  reply	other threads:[~2018-09-14 15:52 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-12 14:02 Rich Felker
2018-09-12 15:09 ` Markus Wichmann
2018-09-12 15:43   ` Rich Felker
2018-09-12 16:33     ` Rich Felker
2018-09-12 17:18       ` Rich Felker
2018-09-14 15:52         ` Rich Felker [this message]
2018-09-14 16:24           ` Rich Felker
2018-09-14 20:39             ` Rich Felker
2018-09-12 17:41     ` Markus Wichmann
2018-09-12 18:03       ` Rich Felker
2018-09-12 18:48       ` A. Wilcox
2018-09-12 19:30         ` Markus Wichmann
2018-09-12 19:46           ` Rich Felker
2018-09-12 15:55   ` A. Wilcox
2018-09-12 16:35     ` Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180914155227.GL1878@brightrain.aerifal.cx \
    --to=dalias@libc.org \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).