mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Markus Wichmann <nullplan@gmx.net>
To: musl@lists.openwall.com
Subject: Re: string-backed FILEs mess
Date: Wed, 12 Sep 2018 17:09:41 +0200	[thread overview]
Message-ID: <20180912150941.GB13976@voyager> (raw)
In-Reply-To: <20180912140239.GV1878@brightrain.aerifal.cx>

On Wed, Sep 12, 2018 at 10:02:39AM -0400, Rich Felker wrote:
> While working on the headers/declarations/hidden stuff, I've run
> across some issues that should be documented as needing fix. One big
> one is this:
> 
> strto* is still setting up a fake FILE with buffer end pointer as
> (void*)-1 as a hack to read directly from a string of unknown length.
> This is of course nonsensical UB and might lead to actual problems
> (signed overflow) in places where rend-rpos is computed. strtol.c
> handles that aspect already by "if ((size_t)s > (size_t)-1/2)" but
> it's still horribly wrong. There's a __string_read function sscanf
> uses that avoids the whole problem by incrementally measuring string
> length and copying it to a real stdio buffer, which also makes ungetc
> safe, and this is the obvious *right* thing to do, but it might make
> strto* unacceptably slower. I haven't done any measurement.
> 

Well, first of all, I might set my foot wrong here very badly, but I
generally don't care about C standard UB as long as the behavior is
defined elsewhere. In this case, the behavior is undefined because the C
standard only defines a partial mapping from pointer to integer and back
(namely null pointer <-> 0 and non-null pointer <-> nonzero, but nothing
else), and that is why most operations on pointers are undefined.
However, Linux defines that it only works on a linear address space, and
so completes the mapping from pointer to integer, and so avoids the UB
by defining it.

We have done similar things in the past, like assuming long to be large
enough for a pointer, as Linux depends on that as well.

As for the overflow: That would be fixed if you set rend to rpos +
PTRDIFF_MAX unless that overflows. Technically still UB, but at least
the calculations work now.

The __string_read approach shouldn't be all that much slower, though. It
only adds overhead when the read buffer is empty, which is going to be
on the first call and then every 256 characters. And the overhead is
mostly a memchr()... on second thought, that might take a while.

> The other "mostly right" (modulo ungetc not being available then)
> approach would be getting rid of the whole current buffer design with
> start/end pointers and using indices instead. This would solve a lot
> of stupid gratuitous UB in stdio, like (T*)0-(T*)0 being undefined.
> It's not clear to me whether it would be more or less efficient. It
> would "break" glibc ABI-compat for stdio -- the original reason I used
> the pointer-based design -- but that could be fixed by putting
> "must-be-null" fields in place of the buffer pointers so that any
> glibc code using getc_unlocked/putc_unlocked macros would hit the
> "no buffer space" code path and call an actual function. In many ways
> that's desirable anyway.
> 

I don't know how important the glibc-compat still is for people. I don't
need it, but then, I have glibc available if necessary. And with Steam
moving to Linux, closed-source software is certainly going to be on the
rise for Linux/PC (i.e. Linux/i386 and Linux/amd64). Which is going to
be a problem for musl-based distributions, if they want to support that.

> Probably the right next step here is measuring whether just using
> __string_read would make anything measurably slower.
> 
> Rich

Probably. Although the memchr() in that function might access past the
string, but only if the string isn't terminated, so we can invoke the
GIGO principle, right?

Ciao,
Markus


  reply	other threads:[~2018-09-12 15:09 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-12 14:02 Rich Felker
2018-09-12 15:09 ` Markus Wichmann [this message]
2018-09-12 15:43   ` Rich Felker
2018-09-12 16:33     ` Rich Felker
2018-09-12 17:18       ` Rich Felker
2018-09-14 15:52         ` Rich Felker
2018-09-14 16:24           ` Rich Felker
2018-09-14 20:39             ` Rich Felker
2018-09-12 17:41     ` Markus Wichmann
2018-09-12 18:03       ` Rich Felker
2018-09-12 18:48       ` A. Wilcox
2018-09-12 19:30         ` Markus Wichmann
2018-09-12 19:46           ` Rich Felker
2018-09-12 15:55   ` A. Wilcox
2018-09-12 16:35     ` Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180912150941.GB13976@voyager \
    --to=nullplan@gmx.net \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).