From: Johannes Schindelin
To: Rich Felker
Cc: Jeff King, git@vger.kernel.org, musl@lists.openwall.com
Date: Wed, 5 Oct 2016 13:17:49 +0200 (CEST)
Subject: Re: Re: Regression: git no longer works with musl libc's regex impl
Newsgroups: gmane.linux.lib.musl.general, gmane.comp.version-control.git
In-Reply-To: <20161004173926.GA19318@brightrain.aerifal.cx>

Hi Rich,

On Tue, 4 Oct 2016, Rich Felker wrote:

> On Tue, Oct 04, 2016 at 06:08:33PM +0200, Johannes Schindelin wrote:
>
> > And lastly, the best alternative would be to teach musl about
> > REG_STARTEND, as it is rather useful a feature.
>
> Maybe, but it seems fundamentally costly to support -- it's extra
> state in the inner loops that imposes costly spill/reload on archs
> with too few registers (x86).

True, it could. I had a brief look at the source code (you use
backtracking... hopefully nobody uses musl to parse regular expressions
from untrusted, or inexperienced, sources [*1*]), and it seems that the
regex code may already spill unnecessarily: for example, the reg_notbol,
reg_noteol and reg_newline flags each take up a full int (and thus
potentially a register of their own) instead of a single bit of a
shared one.

Specifically, the match_end_ofs pointer parameter of both regexec
backends always points to eo, which is currently left uninitialized.
You could initialize eo to -1 and set it to pmatch[0].rm_eo when the
REG_STARTEND flag is set. The GET_NEXT_WCHAR() macro would then need a
test along these lines, guarded so that the -1 "no bound" value never
triggers it:

	if (*match_end_ofs >= 0 && str_byte >= string + *match_end_ofs) {
		ret = REG_NOMATCH;
		goto error_exit;
	}

This does not handle a non-zero pmatch[0].rm_so, though. I would
probably pass another input parameter for that, but I have not yet
verified that "^" would be handled properly (if pmatch[0].rm_so > 0 and
REG_STARTEND is set, "^" should *not* match).

> I'll look at doing this when we overhaul/replace the regex
> implementation, and I'm happy to do some performance-regression tests
> for adding it now if someone has a simple patch (as was mentioned on the
> musl list).

I'd be interested in being kept in the loop, if you do not mind Cc:ing
me.

Ciao,
Johannes

Footnote *1*: http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016
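To make the REG_STARTEND end bound discussed above concrete, here is a minimal, self-contained sketch. It is not musl code: next_byte() and find_literal() are hypothetical helpers standing in for GET_NEXT_WCHAR() and a real matcher, offsets are plain longs rather than regoff_t, and an end offset of -1 means "no bound", mirroring the idea of initializing eo to -1. In this sketch, reaching the bound is treated like reaching the terminating NUL, so the subject does not need a NUL at rm_eo. It does not attempt to model "^" for a non-zero start offset.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the proposed GET_NEXT_WCHAR() check (not musl
 * code). Returns the next byte of the subject, or -1 at end of input.
 * end_ofs == -1 means "no REG_STARTEND bound": only the NUL ends the input.
 */
static int next_byte(const char *string, const char **cursor, long end_ofs)
{
	/* Test the sentinel first so that string + (-1) is never formed. */
	if (end_ofs >= 0 && *cursor >= string + end_ofs)
		return -1;	/* bound reached: treat like the terminating NUL */
	if (**cursor == '\0')
		return -1;	/* ordinary end of a NUL-terminated string */
	return (unsigned char)*(*cursor)++;
}

/*
 * Toy literal search (hypothetical) that only ever reads bytes in
 * [start_ofs, end_ofs). Returns the offset of the first match relative to
 * string, or -1 where a real regexec() would report REG_NOMATCH.
 */
static long find_literal(const char *string, const char *lit,
			 long start_ofs, long end_ofs)
{
	size_t n = strlen(lit);

	for (const char *p = string + start_ofs; ; p++) {
		const char *cur = p;
		size_t i = 0;

		/* Compare lit against the visible input starting at p. */
		while (i < n) {
			int c = next_byte(string, &cur, end_ofs);
			if (c < 0 || c != (unsigned char)lit[i])
				break;
			i++;
		}
		if (i == n)
			return (long)(p - string);

		/* Give up once p itself sits at the end of the visible input. */
		const char *probe = p;
		if (next_byte(string, &probe, end_ofs) < 0)
			return -1;
	}
}
```

For example, with buf = "foo bar baz", find_literal(buf, "baz", 0, -1) returns 8, while find_literal(buf, "baz", 0, 9) returns -1 because "baz" straddles the end bound, and find_literal(buf, "bar", 4, 7) returns 4. Because the bound is checked before dereferencing, a buffer such as {'a', 'b', 'c', 'X', 'Y', 'Z'} with no NUL at all can be searched safely with an end offset of 3.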