Date: Mon, 29 May 2023 11:59:29 -0400
From: Rich Felker
To: Jₑₙₛ Gustedt
Cc: musl@lists.openwall.com
Subject: Re: [musl] changes for scanf in C23
Message-ID: <20230529155929.GV4163@brightrain.aerifal.cx>
In-Reply-To: <20230529123202.63f09fc2@inria.fr>

On Mon, May 29, 2023 at 12:32:02PM +0200, Jₑₙₛ Gustedt wrote:
> Hi,
> we already discussed this but it doesn't seem that we have come to a
> conclusion.
>
> The problem is that for C23 semantics of several string to integer
> conversion functions change: a 'b' or 'B' that previously was the stop
> condition for integer parsing may become part of the integer
> string. This concerns all `scanf` and `strto` derivatives.
>
> This is probably not a problem for most applications that parse
> strings to integers, but it could be in some situations, and in
> particular it could open vulnerabilities.
> E.g network addresses that
> are read with base `0` (musl does this at some point to allow to have
> decimal or hex strings) could be open to attacks, once people start
> using binary encodings for integers more often. Another scenario where
> this could lead to harm is automatically produced output that is
> automatically scanned, and where nobody previously took care of proper
> word boundaries.
>
> My current idea is to have two sets of these functions, one that has
> the old semantics and one that has the new.

This was rejected already in the first proposal (thread here):

Message-ID: <20230503000045.GU4163@brightrain.aerifal.cx>
https://www.openwall.com/lists/musl/2023/05/03/1

"There are not going to be different versions of scanf/strto* because
there's just no way to do that in a conforming way..."

There are other reasons for this too that basically amount to not
repeating glibc mistakes.

At some point I proposed a way that we could do C-version-specific
behavior via branching on an extern defined by linking in c23+ mode,
if this is really necessary. This probably needs more thought to flesh
out a design that's robust and has the right properties and make sure
we don't do anything that locks us into future trouble.

However, as I've said before, C users have survived multiple repeated
incompatible changes of this form, including the same thing happening
with hex floats. Moreover, strto* already are permitted to accept
arbitrary additional implementation-defined sequences except in the C
locale, so there's only any change at all in the C locale. My leaning,
if the committee is going to make these kinds of incompatible changes,
is to say that applications just have to be prepared to deal with them
and do any additional validation they deem necessary to their usage
cases.

Rich
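
The application-side validation suggested above might look like the
following minimal sketch. It assumes a caller that wants the pre-C23
result for base 0 input regardless of which C version the libc
implements. `parse_long_no_binary` is a hypothetical helper made up for
illustration, not a musl or ISO C interface.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not part of musl or ISO C): parse an integer as
 * strtol(s, &e, 0) did before C23, i.e. treat a "0b"/"0B" prefix as the
 * digit 0 followed by unparsed input. Returns false if no conversion
 * was possible or the value is out of range for long. */
static bool parse_long_no_binary(const char *s, long *out, const char **end)
{
    const char *p = s;
    char *e;

    assert(s && out && end);

    /* Skip the leading whitespace and optional sign that strtol accepts. */
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '+' || *p == '-')
        p++;

    /* "0b"/"0B": consume only the "0", leaving the 'b' unparsed. */
    if (p[0] == '0' && (p[1] == 'b' || p[1] == 'B')) {
        *out = 0;
        *end = p + 1;
        return true;
    }

    /* Anything else parses the same under old and new semantics. */
    errno = 0;
    long v = strtol(s, &e, 0);
    if (e == s || errno == ERANGE)
        return false;
    *out = v;
    *end = e;
    return true;
}
```

Given "0b101", the helper stores 0 and leaves *end pointing at the 'b',
whether or not the underlying strtol accepts binary prefixes. A caller
that would rather reject such input outright can inspect *end itself.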