From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: stdio review
Date: Tue, 13 Feb 2018 23:33:04 -0500 [thread overview]
Message-ID: <20180214043304.GP1627@brightrain.aerifal.cx> (raw)
In-Reply-To: <CAE5zrZm-nVZVRs5nNOOEiw=247+AFXtJ6u20vj2mxnrv4B4bEw@mail.gmail.com>
On Tue, Feb 13, 2018 at 10:49:04PM -0500, Dale Weiler wrote:
> >> fgetpos.c:
> >> fgetpos: [bug]
> >> using *(off_t*) to write _Int64 data into fpos_t structure when
> >> memcpy should be used, this breaks strict aliasing. Maybe add
> >> an off_t to the fpos_t union?
>
> > My leaning would be to add the off_t, but the type might not be
> > exposed and thus we would need to find a matching type that is
> > exposed. memcpy would be the nicest solution, but only if we had a way
> > of allowing the compiler to use builtin memcpy; otherwise it's a
> > gratuitous call.
>
> Seeing as the type _is_ exposed, adding the off_t to the union is likewise
It's not exposed in plain C profile, only with POSIX or higher feature
test macros. We could just use long long though since the actual type
doesn't need to match; it only needs to be an integer type capable of
holding the range of off_t.
> > Compound literal table to reference whence as a lookup table
> > as a single expression.
>
> > I thought this was cute.
>
> It's definitely cute, but it does depend on the seek argument being
> one of the macro definitions in the [0, 2] range which I had to check,
> obviously those have no reason to ever be anything but those values;
> ABI and all but it's just additional mental load to ensure they weren't
> hence why I brought it up.
The check right above demonstrates the assumption of those
definitions. I suppose we could use designated initializers to make it
so the order is clear without changing the generated code...
> >> fwrite.c:
> >> fwrite: [question]
> >> Should there be a check for size*nmemb overflow?
>
> > This is actually a complicated topic. Formally, I think the C standard
> > reads such that it would be valid for size*nmemb to exceed the size of
> > the data object to be written if you somehow know you'll hit a write
> > error before that happens. However real world implementations don't
> > work like that. In particular, the kernel will error out with EFAULT
> > if the buffer length extends past the valid userspace address range,
> > even if the writes would never happen; the only way to avoid this
> > would be to break longer-than-page writes down into separate
> > page-sized writes. So I think for practical purposes, we have to
> > interpret the standard as requiring that size*nmemb actually reflect
> > the size of the object passed in, and in that case, the multiplication
> > necessarily does not overflow. If there's an interpretation from WG14
> > contrary to this, we'll have to revisit it.
>
> > See also https://sourceware.org/bugzilla/show_bug.cgi?id=19165
>
> That is an interesting and somewhat odd edge case. Maybe for the time
> being a comment within here w.r.t it maybe needing to be revisited
> wouldn't hurt. In either case it doesn't appear to be harming anything.
Yes, a short comment here (and in fread.c) might be nice.
> >> gets.c:
> >> gets: [optimize]
> >> The length of the string is calculated twice to strip the
> >> newline character off it. Why not rewrite it as:
> >> if (ret) { size_t i = strlen(s)-1; if (s[i] == '\n') s[i] = 0; }
>
> > Seriously, this is gets. It's always unsafe, deprecated, removed from
> > the current C standard. If it's gratuitously slow too, great. :-)
>
> Yes it's gets, but fixing it for O(n) instead of O(n*2) does make the musl
> static set slightly smaller, also makes programs using it crash twice as
> fast ;-)
:-)
> >> stdio_impl.h: [style]
> >> FUNLOCK macro has a trailing else which prompted me to look at every
> >> single instance of FUNLOCK to make sure all of them contained a
> >> semicolon. This is just dangerous, why not use the more common
> >> idiom of do { } while (0).
>
> > Indeed that should be fine.
>
> I think it's better understood by most folks as well, glad we're on the same
> page w.r.t this one. At least then you cannot fail to forget the semicolon.
Yes, exactly.
> >> intscan.h: [style]
> >> It isn't apparent for why <stdio.h> needs to be included. Should
> >> just forward declare struct FILE; here instead.
>
> > That would not work, because it's *not* struct FILE, it's FILE, which
> > happens to be defined as "struct _IO_FILE", but that's an
> > implementation detail. Including <stdio.h> is the clean way to have
> > that.
>
> I don't understand why you couldn't replicate that behavior. It's what
> stdio.h already _does_ and seeing as the associated translation unit
> already includes stdio.h it seems gratuitously excessive. It's just an
> opaque pointer type being passed, how is a forward declaration
> incorrect. Does C distinguish between "opaque T" and "opaque T"
> with different underlying struct? If so I have many of code that needs
> to be changed on my end.
See 6.2.7 Compatible type and composite type ¶1:
"Moreover, two structure, union, or enumerated types declared
in separate translation units are compatible if their tags and
members satisfy the following requirements: If one is declared
with a tag, the other shall be declared with the same tag. If
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
both are completed anywhere within their respective
translation units, then the following additional requirements
apply: there shall be a one-to-one correspondence between
their members such that each pair of corresponding members are
declared with compatible types; if one member of the pair is
declared with an alignment specifier, the other is declared
with an equivalent alignment specifier; and if one member of
the pair is declared with a name, the other is declared with
the same name. For two structures, corresponding members shall
be declared in the same order. For two structures or unions,
corresponding bit-fields shall have the same widths. For two
enumerations, corresponding members shall have the same
values."
Without fancy linker-level checks for UB or LTO, you might say the UB
from declarations with incompatible types is purely theoretical, but
the bigger issue is that, when you did include both intscan.h with
"struct FILE *" and stdio_impl.h in the implementation file intscan.c,
the conflicting definitions would be visible and you'd get an error.
Yes of course we could declare it explicitly using "struct _IO_FILE *"
in intscan.h, but that would violate DRY and would break if the tag
were ever changed (e.g. in a future ABI not using glibc compat cruft).
Including a header that defines FILE really is the right way to do
this.
Rich
prev parent reply other threads:[~2018-02-14 4:33 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-13 16:19 Dale Weiler
2018-02-13 16:59 ` Rich Felker
2018-02-13 19:21 ` Rich Felker
2018-02-14 3:49 ` Dale Weiler
2018-02-14 4:33 ` Rich Felker [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180214043304.GP1627@brightrain.aerifal.cx \
--to=dalias@libc.org \
--cc=musl@lists.openwall.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/musl/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).