From: Bartosz Brachaczek
Newsgroups: gmane.linux.lib.musl.general
To: musl@lists.openwall.com
Subject: Re: [PATCH] handle whitespace before %% in scanf
Date: Mon, 10 Jul 2017 10:22:37 +0200
References: <20170709210018.16369-1-b.brachaczek@gmail.com> <20170710020047.GL1627@brightrain.aerifal.cx>

Hello,

On 7/10/2017 4:00 AM, Rich Felker wrote:
> On Sun, Jul 09, 2017 at 11:00:18PM +0200, Bartosz Brachaczek wrote:
>> this is mandated by C and POSIX standards and is in accordance with
>  ^^^^
>> glibc behavior.
>
> Can you explain exactly what "this" refers to?

Ah, poor wording choice on my part. Yes, I meant that %% consumes whitespace. Shall I resend the patch with a restated commit message if you think it's otherwise good?

> It looks like you're claiming %% consumes space, which I can't find
> any support for in the C standard. Has this topic been discussed
> somewhere I should see?

Sorry, I didn't think this would be controversial. No prior discussion. Let me present my reasoning below.
The following paragraph in the description of the fscanf function in the C11 standard, §7.21.6.2, establishes that '%%' is a "conversion specification", where '%' is the "conversion specifier":

> The format shall be a multibyte character sequence, beginning and
> ending in its initial shift state. The format is composed of zero or
> more directives: one or more white-space characters, an ordinary
> multibyte character (neither '%' nor a white-space character), or a
> conversion specification. Each conversion specification is introduced
> by the character '%'. After the '%', the following appear in sequence:
>
> -- . . .
>
> -- A "conversion specifier" character that specifies the type of
> conversion to be applied.

That '%' is a valid conversion specifier is established a few paragraphs below:

> The conversion specifiers and their meanings are:
>
> . . .
>
> '%' Matches a single '%' character; no conversion or assignment
> occurs. The complete conversion specification shall be '%%'.

Between the above paragraphs, there is a definition of how a conversion specification is executed:

> A directive that is a conversion specification defines a set of matching
> input sequences, as described below for each specifier. A conversion
> specification is executed in the following steps:
>
> Input white-space characters (as specified by the 'isspace' function)
> are skipped, unless the specification includes a '[', 'c', or 'n'
> specifier.
>
> . . .

From the above I conclude that all conversion specifications except '%[', '%c', and '%n' skip leading white-space in the input. This includes the '%%' conversion specification.

The same reasoning applies equally to C99. However, C11 added a new example (still in §7.21.6.2) that seems to confirm my reading of the normative text:

> EXAMPLE 5 The call:
>
>     #include <stdio.h>
>     /* ... */
>     int n, i;
>     n = sscanf("foo % bar 42", "foo%%bar%d", &i);
>
> will assign to 'n' the value 1 and to 'i' the value 42 because input
> white-space characters are skipped for both the '%' and 'd' conversion
> specifiers.

Now, the code in the example is clearly broken. As written, once '%%' has matched the '%', the "bar" in the format consists of ordinary-character directives, which do not skip white-space, so 'b' is compared against the space that follows '%' in the input and the call returns 0. Either the format string should be "foo%% bar%d" or the input string should be "foo %bar 42". Still, the explanation does imply that '%%' consumes whitespace.

Bartosz