From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/10555 Path: news.gmane.org!.POSTED!not-for-mail From: Jeff King Newsgroups: gmane.comp.version-control.git,gmane.linux.lib.musl.general Subject: Re: Regression: git no longer works with musl libc's regex impl Date: Tue, 4 Oct 2016 11:27:22 -0400 Message-ID: <20161004152722.ex2nox43oj5ak4yi@sigill.intra.peff.net> References: <20161004150848.GA7949@brightrain.aerifal.cx> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 X-Trace: blaine.gmane.org 1475594854 4204 195.159.176.226 (4 Oct 2016 15:27:34 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Tue, 4 Oct 2016 15:27:34 +0000 (UTC) Cc: git@vger.kernel.org, musl@lists.openwall.com To: Rich Felker Original-X-From: git-owner@vger.kernel.org Tue Oct 04 17:27:28 2016 Return-path: Envelope-to: gcvg-git-2@m.gmane.org Original-Received: from vger.kernel.org ([209.132.180.67]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1brRcu-0000Jc-CD for gcvg-git-2@m.gmane.org; Tue, 04 Oct 2016 17:27:28 +0200 Original-Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751969AbcJDP11 (ORCPT ); Tue, 4 Oct 2016 11:27:27 -0400 Original-Received: from cloud.peff.net ([104.130.231.41]:52055 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751393AbcJDP10 (ORCPT ); Tue, 4 Oct 2016 11:27:26 -0400 Original-Received: (qmail 28976 invoked by uid 109); 4 Oct 2016 15:27:25 -0000 Original-Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Tue, 04 Oct 2016 15:27:25 +0000 Original-Received: (qmail 5998 invoked by uid 111); 4 Oct 2016 15:27:42 -0000 Original-Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Tue, 04 Oct 2016 11:27:42 -0400 Original-Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Tue, 04 Oct 2016 11:27:22 -0400 Content-Disposition: inline In-Reply-To: <20161004150848.GA7949@brightrain.aerifal.cx> Original-Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Xref: news.gmane.org gmane.comp.version-control.git:306074 gmane.linux.lib.musl.general:10555 Archived-At: On Tue, Oct 04, 2016 at 11:08:48AM -0400, Rich Felker wrote: > This commit broke support for using git with musl libc: > > https://github.com/git/git/commit/2f8952250a84313b74f96abb7b035874854cf202 Yep. The idea is that you would compile git with NO_REGEX=1, and it would use the included compat routines. Is there something in particular you want to get out of using musl's regex that is not supported in the compat library? > Rather than depending on non-portable GNU regex extensions, there is a > simple portable fix for the issue this code was added to work around: > When a text file is being mmapped for use with string functions which > depend on null termination, if the file size: > > 1. is nonzero mod page size, it just works; the remainder of the last > page reads as zero bytes when mmapped. Is that a portable assumption? > 2. if an exact multiple of the page size, then instead of directly > mmapping the file, first mmap a mapping 1 byte (thus 1 page) larger > with MAP_ANON, then use MAP_FIXED to map the file over top of all > but the last page. Now the mmapped buffer can safely be used as a C > string. I'm not sure whether all of our compat layers for mmap would be happy with that (e.g., see compat/win32mmap.c). So it seems like any mmap-related solutions would have to be conditional, too. And then regexec_buf() would have to become something like: int regexec_buf(...) { #if defined(REG_STARTEND) ... set up match ... return regexec(..., REG_STARTEND); #elif defined(MMAP_ALWAYS_HAS_NUL) /* * We assume that every buffer we see is always NUL-terminated * eventually, either because it comes from xmallocz() or our * mmap layer always ensures an extra NUL. */ return regexec(...); #else #error "Nope, you need either NO_REGEX or USE_MMAP_NUL" #endif } The assumption in the middle case feels pretty hacky, though. It fails if we get a buffer from somewhere besides those two sources. It fails if somebody calls regexec_buf() on a subset of a string. It also doesn't handle matching past embedded NULs in the string. That's not something we're relying on yet, but it would be nice to support consistently in the long run. If there's a compelling reason, it might be worth making that tradeoff. But I am not sure what the compelling reason is to use musl's regex (aside from the obvious of "less code in the resulting executable"). -Peff