From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=MAILING_LIST_MULTI, RCVD_IN_MSPIKE_H2 autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 31483 invoked from network); 6 Jan 2023 11:17:41 -0000 Received: from second.openwall.net (193.110.157.125) by inbox.vuxu.org with ESMTPUTF8; 6 Jan 2023 11:17:41 -0000 Received: (qmail 15549 invoked by uid 550); 6 Jan 2023 11:17:38 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 15508 invoked from network); 6 Jan 2023 11:17:37 -0000 Date: Fri, 6 Jan 2023 06:17:24 -0500 From: Rich Felker To: Markus Wichmann Cc: musl@lists.openwall.com Message-ID: <20230106111724.GA4163@brightrain.aerifal.cx> References: <20230104150719.252185-1-gabravier@gmail.com> <20230106092010.GA2032@voyager> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230106092010.GA2032@voyager> User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: [musl] [PATCH] fix return value of wcs{,n}cmp for near-limits signed wchar_t values On Fri, Jan 06, 2023 at 10:20:10AM +0100, Markus Wichmann wrote: > On Wed, Jan 04, 2023 at 04:07:19PM +0100, Gabriel Ravier wrote: > > int wcscmp(const wchar_t *l, const wchar_t *r) > > { > > for (; *l==*r && *l && *r; l++, r++); > > I just noticed this line. Isn't the "&& *r" part superfluous? If r is a > prefix of l, then *r and *l will be unequal, and the loop will terminate > because of the first condition alone. (If l is a prefix of r, we need > the second condition to terminate the loop.) Yes, but I would assume the compiler would make the same optimization anyway. The original motivation here may have just been writing it symmetrically in l and r. > > - return *l - *r; > > + return *l < *r ? -1 : *l > *r; > > } > > Ah, that old bug. The problem is that the difference between two 32-bit > values takes up 33 bits to save. I wonder if it would be worth it to > just cast the values to 64 bits, then dividing the result by two. You > know, like > > return ((int64_t)*l - *r) >> 1; > > Although that does presuppose that wchar_t is smaller than 64 bits, > which the proposed change does not require. As you noted later this doesn't work but I think the core flaw here is not the same as the classic bug. Rather, I probably had a wrong initial assumption that the function was intended only to work for valid wide character values of wchar_t not arbitrary integers that fit in wchar_t arrays. In that case they would be 21-bit and the difference would never overflow. Rich