mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: stdio review
Date: Tue, 13 Feb 2018 11:59:53 -0500	[thread overview]
Message-ID: <20180213165953.GN1627@brightrain.aerifal.cx> (raw)
In-Reply-To: <CAE5zrZmBJw0iq2D-fFVAso+3KB4JTg2P8E_XajZVU7tCPw5HvA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 657 bytes --]

On Tue, Feb 13, 2018 at 11:19:26AM -0500, Dale Weiler wrote:
> The following has been a several work week code review of musl's stdio
> code and supplementary internal code for stdio. While I would supply
> the review as is inline with this email I'm concerned about the email
> RFC line width for plain text emails and gmail's handling of plain
> text so instead I've hosted it on gist.github.com and can be seen
> here.
> 
> https://gist.githubusercontent.com/graphitemaster/927316c85587d4ad7b6870857ff681d9/raw/0ec1a90f788d3188ae14b40347af69eeca745a23/musl%2520stdio%2520review.txt

I'm moving it to email so it's available for comment/discussion. Here:

[-- Attachment #2: musl_stdio_review.txt --]
[-- Type: text/plain, Size: 28404 bytes --]

musl stdio review

All work presented in here was based off the last effective stdio
related commit b64539ae06aa91a407359238f4e909adb9bfab3d

Unless otherwise stated, the approach taken in the review was to view
each stdio related file in alphabeticalized order and setup test
harnesses for each with no pre-conditions or invariants until a unit
made explicit reference to it, for instance if foo.c had function foo
taking pointer to FILE*, and did not check for FILE* to be NULL, but
references to foo did, then those were noted and the harness was
updated with that invariant. This approach, while tedious ensures that
all references maintain the same invariants throughout.

The review results are layed out in a tree like structure where the
first most column line is the file and each indentation level within
it is a function within that translation unit proceeded with a status
that is right justified on the 80th column, of either [nothing],
indicating that nothing of merit was discovered, [question] for
something I'm unsure about, [style] for tricky statements or expressions
that made it difficult to understand what was happening, [bug] for
things that I precieved as potential bugs, [comment] for things that
should be commented and [optimize] for things that can be optimized.

Similarly, if a function has multiple tags like [bug] and [comment] then
sections pertaining to each are in separate sections delimited by a new
line. Functions that are aliases, or wrappers for existing functions are
treated together as two cases with an implied fallthrough much like
cases in a switch statement.

There is two parts to this review. The initial stdio infastructure that
is in src/stdio which encompasses the first part of the review and which
is the longest. Then there is the supplemental part for the undocumented
named "scan helper" interface and stdio implementation details which
can be found in src/internal.

========================================================================
    src/stdio
========================================================================

__fclose_ca.c:
    __fclose_ca:                                               [nothing]

__fdopen.c:
    __fdopen:                                                  [nothing]
    fdopen:                                                    [nothing]

__fmodeflags.c:
    __fmodeflags:                                              [nothing]

__fopen_rb_ca.c:
    __fopen_rb_ca:
                                                              [question]
        Is it possible for this function to be called with a length less
        than UNGET?
                                                               [comment]
        I was unsure what the ca meant until finding a reference in a
        deeply nested internal header that it was "caller allocated",
        maybe emphasis that in the source file to make review easier.

__lockfile.c:
    __lockfile:                                                [nothing]
    __unlockfile:                                              [nothing]

__overflow.c:
    __overflow:                                               [question]
        The interface contract of this function is it takes int and
        returns int, yet the passed in character is truncated through
        narrowing conversion to unsigned char for the purposes of
        stdio, should that same narrowed conversion be returned or
        should the unnarrowed result be returned?

__stdio_close.c:
    __dummy:                                                   [nothing]
    __aio_close:                                               [nothing]
    __stdio_close:                                             [nothing]

__stdio_exit.c:
    close_file:                                               [question]
        Function acquires lock on FILE but never unlocks it, this looks
        intended since we're exiting stdio? If that's the case this
        should probably be commented.
    __stdio_exit:                                             [question]
    __stdio_exit_needed:                                      [question]
                                                               [comment]
        Function acquires lock via __ofl_lock but never calls
        __ofl_unlock, this looks intended since we're exiting stdio?
        If that's the case this should probably be commented.

__stdio_read.c:
    __stdio_read:                                                [style]
        Tricky expression involving F_EOF/F_ERR and xor masks.

__stdio_seek.c:
    __stdio_seek:                                              [nothing]

__stdio_write.c:
    __stdio_write:                                            [question]
        Is it possible for len to be less than f->wpos-f->wbase?

__stdout_write.c:
    __stdout_write:                                           [question]
        Is it possible for len to be less than f->wpos-f->wbase?

__string_read.c:
    __string_read:                                             [nothing]

__toread.c:
    __toread:                                                  [comment]
        It's sister function __towrite comments its' intent, maybe
        this should as well for consistency.

__towrite.c:
    __towrite:                                                   [style]
        What is the purpose of parenthesis around F_NOWR? does it imply
        a missing flag in the list, looks suspicious.

__uflow.c:
    __uflow:                                                   [nothing]

asprintf.c:
    asprintf:                                                  [nothing]

clearerr.c:
    clearerr:                                                  [nothing]
    clearerr_unlocked:                                        [question]
        An alias for clearerr but clearerr locks, is this correct?
        or should clearerr_unlocked just remove EOF/ERR flags, and
        clearerr hold a lock calling clearerr_unlocked?

dprintf.c:
    dprintf:                                                   [nothing]

ext.c:
    __flushlbf:                                                [nothing]
    __fsetlocking:                                             [nothing]
    __fwriting:                                                [nothing]
    __freading:                                                [nothing]
    __freadable:                                               [nothing]
    __fwritable:                                               [nothing]
    __flbf:                                                    [nothing]
    __fbufsize:                                                [nothing]
    __fpending:                                                [nothing]
    __fpurge:                                                      [bug]
    fpurge:                                                        [bug]
        The interface contract for this function is that it returns void
        and not int, whereas fpurge returns int, the latter does not
        even appear to be part of Linux yet musl aliases them to the
        same. Should musl's return void?

ext2.c:
    __freadahead:                                              [nothing]
    __freadptr:                                                [nothing]
    __freadptrinc:                                             [nothing]
    __fseterr:                                                 [nothing]

fclose.c:
    fclose:                                                      [style]
        Checking for null pointer before calling free for f->getln_buf

feof.c:                                                          [style]
    There's two instances of feof involved, one provided by stdio_impl
    that just checks flags without a lock held and this one which means
    this file has to #undef. That just seems tricky to me, you could see
    an feof somewhere that is using the macro definition where others
    could be using a function (depending on includes and include order.)

    feof:                                                      [nothing]
    feof_unlocked:                                            [question]
    _IO_feof_unlocked:                                        [question]
        Similar to clearerr_unlocked, is this correct? The unlocked
        variants are actually locked.

ferror.c:                                                        [style]
    There's two instances of ferror involved, one provided by stdio_impl
    that just checks flags without a lock held and this one which means
    this file has to #undef. That just seems tricky to me, you could see
    an ferror somewhere that is using the macro definition where others
    could be using a function (depending on includes and include oder.)

    ferror:                                                    [nothing]
    ferror_unlocked:                                          [question]
    _IO_ferror_unlocked:                                      [question]
        Similar to clearerr_unlocked and feof_unlocked, is this correct?
        The unlocked variants are actually locked.

fflush.c:
    fflush:                                                    [nothing]

fgetc.c:
    fgetc:                                                     [nothing]

fgetln.c:
    fgetln:                                                    [nothing]

fgetpos.c:
    fgetpos:                                                       [bug]
        using *(off_t*) to write _Int64 data into fpos_t structure when
        memcpy should be used, this breaks strict aliasing. Maybe add
        an off_t to the fpos_t union?

                                                               [comment]
        POSIX states this function shall fail if the data cannot be
        stored in the fpos_t structure but the structure is 16-byte in
        size and this is a narrowing conversion. Maybe just comment that
        it can never fail. This came up in the test harness several
        times.

fgets.c:
    fgets:                                                     [nothing]

fgetwc.c:
    __fgetwc_unlocked_internal:                               [question]
        I don't understand the conditional if (l+1 >= 1), is that not
        the same as just if (l) since the only time when l+1 >=1 is
        if l==0, even for overflow. In which case the l += !l part
        in the body is even more concerning because !l will always
        be false?
    __fgetwc_unlocked:                                         [nothing]
    fgetwc:                                                    [nothing]
    fgetwc_unlocked:                                           [nothing]
    getwc_unlocked:                                            [nothing]

fgetws.c:
    fgetws:                                                    [nothing]
    __fgetws_unlocked:                                        [question]
        Similar to clearerr_unlocked, feof_unlocked and ferror_unlocked
        is this correct? The unlocked variants are actually locked.

fileno.c:
    fileno:                                                    [nothing]
    fileno_unlocked:                                          [question]
        Similar to clearerr_unlocked, feof_unlocked, ferror_unlocked
        and __fetws_unlocked is this correct? The unlocked variants are
        actually locked.

flockfile.c:
    flockfile:                                                 [nothing]

fmemopen.c:
    mseek:                                                       [style]
        It does goto upwards.

        Compound literal table to reference whence as a lookup table
        as a single expression.

        The function would be less tricky if just rewritten.
    mread:                                                     [nothing]
    mwrite:                                                    [nothing]
    fmemopen:                                                      [bug]
        The size check for fmemopen cookie to report ENOMEM is incorrect
        with the calloc below.

                                                                 [style]
        The cookie allocation handling can be greatly simplified with a
        nother structure that embeds everything and using the sizeof
        that structure instead.

fopen.c:
    fopen:                                                     [nothing]

fopencookie.c:
    fopencookie:                                                   [bug]
        Several other functions which utilize __ofl_add do the following
        if (!libc.threaded) f->lock = -1; However this one does not.

fprintf.c:
    fprintf:                                                   [nothing]

fput.c:
    fput:                                                      [nothing]

fputs.c:
    fputs:                                                     [nothing]
    fputs_unlocked:                                           [question]
        See all other instances of this question.

fputwc.c:
    __fputwc_unlocked:                                         [nothing]
    fputwc:                                                    [nothing]
    fputwc_unlocked:                                           [nothing]
    putwc_unlocked:                                            [nothing]

fputws.c:
    fputws:                                                    [nothing]
    fputws_unlocked:                                          [question]
        See all other instances of this question.

fread.c:
    fread:                                                     [nothing]
    fread_unlocked:                                           [question]
        See all other instances of this question.

freopen.c:
    freopen:                                                       [bug]
        In the case of no filename and the F_SETFL system call failing,
        and also __dup3 failing, the file is still locked.

fscanf.c:
    fscanf:                                                    [nothing]
    __isoc99_fscanf:                                           [nothing]

fseek.c:
    __fseeko_unlocked:                                         [nothing]
    __fseeko:                                                  [nothing]
    fseek:                                                     [nothing]
    fseeko:                                                    [nothing]

fsetpos.c:
    fsetpos:                                                       [bug]
        off_t is written into here incorrectly and also read incorrectly
        as this breaks strict aliasing. Maybe consider putting off_t in
        the fpos_t union and referencing it that way to avoid memcpy?

ftell.c:
    __ftello_unlocked:                                           [style]
        The whole thing is formatted in a confusing to read manner.
    __ftello:                                                  [nothing]
    ftell:                                                     [nothing]
    ftello:                                                    [nothing]

ftrylockfile.c:
    __do_orphaned_stdio_locks:                                 [nothing]
    __unlocked_locked_file:                                    [nothing]
    ftrylockfile:                                              [nothing]

funlockfile.c:
    funlockfile:                                               [nothing]

fwide.c:
    fwide:                                                     [nothing]

fwprintf.c:
    fwprintf:                                                  [nothing]

fwrite.c:
    fwrite:                                                   [question]
        Should there be a check for size*nmemb overflow?
    fwrite_unlocked:                                          [question]
        See all other instances of this question.

fwscanf.c:
    fwscanf:                                                   [nothing]
    __isoc99_fwscanf:                                          [nothing]

getc.c:
    getc:                                                      [nothing]
    _IO_getc:                                                  [nothing]

getc_unlocked.c:
    getc_unlocked:                                             [nothing]
    fgetc_unlocked:                                            [nothing]
    _IO_getc_unlocked:                                         [nothing]

getchar.c:
    getchar:                                                   [nothing]

getchar_unlocked.c:
    getchar_unlocked:                                          [nothing]

getdelim.c:                                                      [style]
    The MIN macro is never used in this translation unit.

    getdelim:                                                  [nothing]
    __getdelim:                                                [nothing]

getline.c:
    getline:                                                   [nothing]

gets.c:
    gets:                                                     [optimize]
        The length of the string is calculated twice to strip the
        newline character off it. Why not rewrite it as:
        if (ret) { size_t i = strlen(s)-1; if (s[i] == '\n') s[i] = 0; }

getw.c:
    getw:                                                      [nothing]

getwc.c:
    getwc:                                                     [nothing]

getwchar.c:
    getwchar:                                                  [nothing]
    getwchar_unlocked:                                        [question]
        This isn't actually unlocked.

ofl.c:
    __ofl_lock:                                                [nothing]
    __ofl_unlock:                                              [nothing]

ofl_add.c:
    __ofl_add:                                                 [nothing]

open_memstream.c:
    ms_seek:                                                     [style]
        It does goto upwards.

        Compound literal table to reference whence as a lookup table
        as a single expression.

        The function would be less tricky if just rewritten.
    ms_write:                                                  [nothing]
    ms_close:                                                  [nothing]
    open_memstream:                                           [question]
        Why does this allocate a 1 byte buffer? That seems small.

                                                                 [style]
        This is also doing the weird arithmetic as well for the cookie
        This function can be rewritten much nicer.

open_wmemstream.c:
    This has all the same problems as openmem_stream.c

pclose.c:
    pclose:                                                    [nothing]

perror.c:
    perror:                                                    [nothing]

popen.c:
    popen:                                                     [nothing]

printf.c:
    printf:                                                    [nothing]

putc.c:
    putc:                                                      [nothing]
    _IO_putc:                                                  [nothing]

putc_unlocked.c:
    putc_unlocked:                                             [nothing]
    fputc_unlocked:                                            [nothing]
    _IO_putc_unlocked:                                         [nothing]

putchar.c
    putchar:                                                   [nothing]

putchar_unlocked.c
    putchar_unlocked:                                          [nothing]

puts.c:
    puts:                                                      [nothing]

putw.c:
    putw:                                                      [nothing]

putwc.c:
    putwc:                                                     [nothing]

putwchar.c:
    putwchar:                                                  [nothing]
    putwchar_unlocked:                                        [question]
        This isn't actually unlocked.

remove.c:
    remove:                                                    [nothing]

rename.c:
    rename:                                                    [nothing]

rewind.c:
    rewind:                                                    [nothing]

scanf.c:
    scanf:                                                     [nothing]

setbuf.c:
    setbuf:                                                    [nothing]

setbuffer.c:
    setbuffer:                                                 [nothing]

setlinebuf.c:
    setlinebuf:                                                [nothing]

setvbuf.c:
    setvbuf:                                                       [bug]
        No locks are ever held on the file when changing fields of it.

snprintf.c:
    snprintf:                                                  [nothing]

sprintf.c:
    sprintf:                                                   [nothing]

sscanf.c:
    sscanf:                                                    [nothing]

stdderr.c:                                                     [nothing]

stdin.c:                                                       [nothing]

stdout.c:                                                      [nothing]

swprintf.c:
    swprintf:                                                  [nothing]

swscanf.c:
    swscanf:                                                   [nothing]
    __isoc99_swscanf:                                          [nothing]

tempnam.c:
    tempnam:                                                   [nothing]

tmpfile.c:
    tmpfile:                                                   [nothing]

tmpnam.c;
    tmpnam:                                                    [nothing]

ungetc.c:
    ungetc:                                                    [nothing]

ungetwc.c:
    ungetwc:                                                     [style]
        Ignoring the overly long if statement, the non-ascii route below
        does memcpy(f->rpos -= l, mbc, l) and that is just odd since I
        read that as f->rpos - l initially which almost looks correct
        but failing to update rpos which is when I noticed it actually
        does -=.

vasprintf.c:
    vasprintf:                                                 [nothing]

vdprintf.c:
    wrap_write:                                                [nothing]
    vdprintf:                                                  [nothing]

vfprintf.c:
    out:                                                       [nothing]
    pad:                                                       [nothing]
    fmt_x:                                                     [nothing]
    fmt_o:                                                     [nothing]
    fmt_u:                                                     [nothing]
    fmt_fp:                                                   [question]
    getint:                                                        [bug]
        This function overflows on INT_MIN for the else case.
    printf_core:                                                 [style]
        if (0) cutting into switch statements specified in a very smart
        order to make for really tiny code. This whole function is
        impossible to audit. I tried, I really did. Things like this:
        *(a=z-(p=1))=arg.i; are really tricky.
    vfprintf:                                                  [nothing]

vfscanf.c:
    store_int:                                                 [nothing]
    arg_n:                                                     [nothing]
    vfscanf:                                                       [bug]
    __isoc99_vfscanf:
        Calculating width using the same 10*n+*c-'0' trick can overflow
        on INT_MIN.
                                                                 [style]
        This function gets really difficult to audit further down. It's
        just very dense and should be split up somehow.

vfwprintf.c:
    Has all the same problems as vfprintf.c

vfwscanf.c:
    Has all the same problems as vfscanf.c

vprintf.c:
    vprintf:                                                   [nothing]

vscanf.c:
    vscanf:                                                    [nothing]
    __isoc99_vscanf:                                           [nothing]

vsnprintf.c:
    sn_write:                                                  [nothing]
    vsnprintf:                                                 [nothing]

vsprintf.c:
    vsprintf:                                                  [nothing]

vsscanf.c:
    do_read:                                                   [nothing]
    vsscanf:                                                   [nothing]
    __isoc99_vsscanf:                                          [nothing]

vswprintf.c:
    sw_write:                                                  [nothing]
    vswprintf:                                                   [style]
        sizeof(FILE) should be sizeof f
        compound initializer should be used for file instead of what
        ever is done here, every other function that creates local FILE
        struct does it as FILE f = { .member = value, ... }, is this
        different because of the memset? is the memset even needed?

vswscanf.c:
    wstring_read:                                              [nothing]
    vswscanf:                                                  [nothing]
    __isoc_vswscanf:                                           [nothing]

vwprintf.c:
    vwprintf:                                                  [nothing]

wvscanf.c:
    vwscanf:                                                   [nothing]
    __isoc99_vwscanf:                                          [nothing]

wprintf.c:
    wprintf:                                                   [nothing]

wscanf.c:
    wscanf:                                                    [nothing]

========================================================================
    src/internal
========================================================================

stdio_impl.h:                                                    [style]
    FUNLOCK macro has a trailing else which prompted me to look at every
    single instance of FUNLOCK to make sure all of them contained a
    semicolon. This is just dangerous, why not use the more common
    idiom of do { } while (0).

                                                               [comment]
    The _IO_FILE is ABI compat with glibc but this isn't documented any
    where in the header which makes members like mustbezero_{1,2} really
    confusion and the layout of members within the structure as well,
    since the size of a FILE can be made much smaller just by sorting
    the members which is something someone may want to do.

                                                                 [style]
    feof/ferror macros which are unlocked, this requires that code
    inside src/stdio actually #undef them when implementing and anything
    inside there that requires these _must_ avoid including stdio.h over
    this to ensure it uses the right ones. I would just call these
    something else like __feof_ui/__ferror_ui (unlocked internal) then
    no weird #undef is needed and the intent is clear.

shgetc.h:                                                      [comment]
    Maybe document this is the "scan helper" interface because it was
    not obvious what the naming was.

shgetc.c:
    __shlim:                                                   [nothing]
    __shgetc:                                                  [nothing]

intscan.h:                                                       [style]
    It isn't apparent for why <stdio.h> needs to be included. Should
    just forward declare struct FILE; here instead.

intscan.c:
    __intscan:                                                 [nothing]

floatscan.h:                                                     [style]
    It isn't apparent for why <stdio.h> needs to be included. Should
    just forward declare struct FILE; here instead.

floatscan.c:
    scanexp:                                                   [nothing]
    decfloat:                                                  [nothing]
    hexfloat:                                                  [nothing]
    __floatscan:                                               [nothing]

  reply	other threads:[~2018-02-13 16:59 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-13 16:19 Dale Weiler
2018-02-13 16:59 ` Rich Felker [this message]
2018-02-13 19:21   ` Rich Felker
2018-02-14  3:49     ` Dale Weiler
2018-02-14  4:33       ` Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180213165953.GN1627@brightrain.aerifal.cx \
    --to=dalias@libc.org \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).