mailing list of musl libc
 help / color / mirror / code / Atom feed
From: "A. Wilcox" <awilfox@adelielinux.org>
To: Jeff King <peff@peff.net>, Kevin Daudt <me@ikke.info>
Cc: git@vger.kernel.org, musl@lists.openwall.com
Subject: Re: Git 2.14.1: t6500: error during test on musl libc
Date: Fri, 15 Sep 2017 23:58:41 -0500	[thread overview]
Message-ID: <59BCAF81.3090206@adelielinux.org> (raw)
In-Reply-To: <20170915113011.emko6q5utb7x4bvu@sigill.intra.peff.net>


[-- Attachment #1.1: Type: text/plain, Size: 5341 bytes --]

On 15/09/17 06:30, Jeff King wrote:
> On Fri, Sep 15, 2017 at 08:37:40AM +0200, Kevin Daudt wrote:
> 
>> On Thu, Sep 14, 2017 at 09:43:12PM -0500, A. Wilcox wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA256
>>>
>>> Hi there,
>>>
>>> While bumping Git's version for our Linux distribution to 2.14.1, I've
>>> run in to a new test failure in t6500-gc.sh.  This is the output of
>>> the failing test with debug=t verbose=t:
>>
>> This is a new test introduced by c45af94db 
>> (gc: run pre-detach operations under lock, 2017-07-11) which was
>> included in v2.14.0.
>>
>> So it might be that this was already a problem for a longer time, only
>> just recently uncovered.
> 
> The code change there is not all that big. Mostly we're just checking
> that the lock is actually respected. The lock code doesn't exercise libc
> all that much. It does use fscanf, which I guess is a little exotic for
> us. It's also possible that hostname() doesn't behave quite as we
> expect.
> 
> If you instrument gc like the patch below, what does it report when you
> run:
> 
>   GIT_TRACE=1 ./t6500-gc.sh --verbose-only=8
> 
> I get:
> 
>   [...]
>   trace: built-in: git 'gc' '--auto'
>   Auto packing the repository in background for optimum performance.
>   See "git help gc" for manual housekeeping.
>   debug: gc lock already held by $my_hostname
>   [...]
> 
> If you get "acquired gc lock", then the problem is in
> lock_repo_for_gc(), and I'd suspect some problem with fscanf or
> hostname.
> 
> -Peff


Hey there Peff,

What a corner-y corner case we have here.  I believe the actual error is
in the POSIX standard itself[1], as it is not clear what happens when
there are not enough characters to 'fill' the width specified with %c in
fscanf:

> c
>    Matches a sequence of bytes of the number specified by the field
> width (1 if no field width is present in the conversion
> specification).

I've tested a number of machines:

* OpenBSD 5.7/amd64
* NetBSD 7.0/i386
* FreeBSD 12/PowerPC
* glibc/arm
* Windows 7 with Microsoft Visual C++ 2013

All of them will allow a so-called "short read" and give you as many
characters as they can, treating the phrase "a sequence of bytes of the
number specified" as meaning "*up to* the number".

The musl libc treats this phrase as meaning "*exactly* the number", and
fails if it cannot give you exactly the number you ask.

IBM z/OS explicitly states in their documentation[2]:

> Sequence of one or more characters as specified by field width

And Microsoft similarly states[3]:

> The width field is a positive decimal integer controlling the maximum
> number of characters to be read for that field. No more than width
> characters are converted and stored at the corresponding argument.
> Fewer than width characters may be read if a whitespace character
> (space, tab, or newline) or a character that cannot be converted
> according to the given format occurs before width is reached.

While musl's reading is correct from an English grammar point of view,
it does not seem to be how any other implementation has read the standard.


However!  It gets better.

The ISO C standard, committee draft version April 12, 2011, states[4]:

> c    Matches a sequence of characters of exactly the number specified
> by the field width (1 if no field width is present in the directive).


Since "[t]his volume of POSIX.1-2008 defers to the ISO C standard", it
stands to reason that this is the intended meaning and behaviour.  Thus
meaning that literally every implementation, with the exception of the
musl libc, is breaking the ISO C standard.


Since Git is specifically attempting to read in a host name, there may
be a solution: while 'c' guarantees that any byte will be read, and 's'
will skip whitespace, RFCs 952 and 1123 §2.1[5] specify that a network
host name must never contain whitespace.  IDNA2008 §2.3.2.1[6] (and
IDNA2003 before it) specifically removes ASCII whitespace characters
from the valid set of Unicode codepoints for an IDNA host name[7].
Additionally, the buffer `locking_host` is already defined as an array
of char of size HOST_NAME_MAX + 1, and the width specifier in fscanf is
specified as HOST_NAME_MAX.  Therefore, it should be safe to change git
to use the 's' type character.  Additionally, I can confirm that this
change (patch attached) allows the Git test suite to pass on musl.


I hope this message is informative.  This was an exhausting, but
necessary, exercise in trying to ensure code correctness.

I am cc'ing the musl list so that this information may live there as
well, in case someone in the future has issues with the 'c' type
specifier with fscanf on musl.


All the best,
--arw


[1]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/fscanf.html
[2]:
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.bpxbd00/fscanf.htm
[3]: https://msdn.microsoft.com/en-us/library/xdb9w69d.aspx
[4]: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
[5]: https://tools.ietf.org/html/rfc1123#section-2.1
[6]: https://tools.ietf.org/html/rfc5890#section-2.3.2.1
[7]: http://unicode.org/faq/idn.html#33
-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
http://adelielinux.org

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: 0001-gc-use-s-type-character-for-fscanf.patch --]
[-- Type: text/x-patch; name="0001-gc-use-s-type-character-for-fscanf.patch", Size: 1149 bytes --]

From afceb0f7755a87d0dd2194e95f26c9dc8f4bc688 Mon Sep 17 00:00:00 2001
From: "A. Wilcox" <AWilcox@Wilcox-Tech.com>
Date: Fri, 15 Sep 2017 23:55:57 -0500
Subject: [PATCH] gc: use 's' type character for fscanf

The ISO C standard states that using a field width together with the 'c'
type character will read the exact amount specified; if that amount of
bytes is not available, a match error occurs.

This patch allows the t6500 test to pass on the musl libc, and `git gc`
to behave correctly on syystems utilising musl.

Signed-off-by: A. Wilcox <AWilcox@Wilcox-Tech.com>
---
 builtin/gc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 3c78fcb..bb2d6c1 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -258,7 +258,7 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
 		int should_exit;
 
 		if (!scan_fmt)
-			scan_fmt = xstrfmt("%s %%%dc", "%"SCNuMAX, HOST_NAME_MAX);
+			scan_fmt = xstrfmt("%s %%%ds", "%"SCNuMAX, HOST_NAME_MAX);
 		fp = fopen(pidfile_path, "r");
 		memset(locking_host, 0, sizeof(locking_host));
 		should_exit =
-- 
2.10.0


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

       reply	other threads:[~2017-09-16  4:58 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <59BB3E40.7020804@adelielinux.org>
     [not found] ` <20170915063740.GB21499@alpha.vpn.ikke.info>
     [not found]   ` <20170915113011.emko6q5utb7x4bvu@sigill.intra.peff.net>
2017-09-16  4:58     ` A. Wilcox [this message]
2017-09-16 16:13       ` Rich Felker
2017-09-17  0:36       ` Junio C Hamano
2017-09-17  1:17         ` Szabolcs Nagy
2017-09-17  1:58         ` A. Wilcox
2017-09-17  3:16           ` Junio C Hamano
2017-09-17  3:38             ` A. Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=59BCAF81.3090206@adelielinux.org \
    --to=awilfox@adelielinux.org \
    --cc=git@vger.kernel.org \
    --cc=me@ikke.info \
    --cc=musl@lists.openwall.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).