mailing list of musl libc
From: Bartosz Brachaczek <b.brachaczek@gmail.com>
To: musl@lists.openwall.com
Subject: Re: [PATCH] handle whitespace before %% in scanf
Date: Mon, 10 Jul 2017 10:22:37 +0200
Message-ID: <c2c7d1ce-1e0d-a504-f8be-313fe7385240@gmail.com>
In-Reply-To: <20170710020047.GL1627@brightrain.aerifal.cx>

Hello,

On 7/10/2017 4:00 AM, Rich Felker wrote:
> On Sun, Jul 09, 2017 at 11:00:18PM +0200, Bartosz Brachaczek wrote:
>> this is mandated by C and POSIX standards and is in accordance with
>    ^^^^
>> glibc behavior.
> 
> Can you explain exactly what "this" refers to?

Ah, poor wording choice on my part. Yes, I meant that %% consumes
whitespace. Shall I resend the patch with a restated commit message if
you think it's otherwise good?

> It looks like you're claiming %% consumes space, which I can't find
> any support for in the C standard. Has this topic been discussed
> somewhere I should see?

Sorry, I didn't think this would be controversial. No prior discussion. 
Let me present my reasoning below.

The following paragraph in the description of the fscanf function in the 
C11 standard, §7.21.6.2, establishes that '%%' is a "conversion 
specification", where '%' is the "conversion specifier":

> The format shall be a multibyte character sequence, beginning and
> ending in its initial shift state. The format is composed of zero or
> more directives: one or more white-space characters, an ordinary
> multibyte character (neither '%' nor a white-space character), or a
> conversion specification. Each conversion specification is introduced
> by the character '%'. After the '%', the following appear in sequence:
> 
> -- . . .
> 
> -- A "conversion specifier" character that specifies the type of
>    conversion to be applied.

That '%' is a valid conversion specifier is established a few paragraphs 
below:

> The conversion specifiers and their meanings are:
> 
> . . .
> 
> '%'     Matches a single '%' character; no conversion or assignment
>         occurs. The complete conversion specification shall be '%%'.

Between the above paragraphs, there is a definition of how a conversion 
specification is executed:

> A directive that is a conversion specification defines a set of matching
> input sequences, as described below for each specifier. A conversion
> specification is executed in the following steps:
> 
> Input white-space characters (as specified by the 'isspace' function)
> are skipped, unless the specification includes a '[', 'c', or 'n'
> specifier.
> 
> . . .

From the above I conclude that all conversion specifications, except
'%[', '%c', and '%n', consume whitespace. This includes the '%%'
conversion specification.

The above can be applied just as well to C99. However, C11 added a new 
example (still in §7.21.6.2) that seems to confirm my reading of the 
normative text:

> EXAMPLE 5 The call:
> 
>     #include <stdio.h>
>     /* ... */
>     int n, i;
>     n = sscanf("foo % bar 42", "foo%%bar%d", &i);
> 
> will assign to 'n' the value 1 and to 'i' the value 42 because input
> white-space characters are skipped for both the '%' and 'd' conversion
> specifiers.

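Now, the code in the example is clearly broken: after '%%' has matched,
the ordinary character 'b' in the format cannot match the space that
follows '%' in the input. Either the format string should be
"foo%% bar%d" or the input string should be "foo %bar 42". Still, the
explanation does imply that '%%' consumes whitespace.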

Bartosz

