From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: regcomp regression?
Date: Wed, 15 Oct 2014 21:03:06 -0400
Message-ID: <20141016010306.GU32028@brightrain.aerifal.cx>
References: <543EE0A5.2000905@sholland.net>
In-Reply-To: <543EE0A5.2000905@sholland.net>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Wed, Oct 15, 2014 at 04:01:25PM -0500, Samuel Holland wrote:
> Hello,
>
> I've been rebuilding packages after the 1.1.5 release, and it's caused
> some (apparent) regressions. file no longer compiles as it is unable to
> parse one of its magic files. The offending regex is (windows, line 163)
>
> \\`(\r\n|;|[[]|\xFF\xFE)
>
> It's testing for the BOM at the beginning of an INI/INF file.
> I understand the regex rewrite removed[1] the ability to match
> arbitrary bytes (even with the C locale) because it was broken; is
> this something you plan to add back? Or is the application wrong? If
> so, what workaround do you suggest?

It was not supported before either; it was just silently misprocessed
as if the regex were:

\\`(\r\n|;|[[]|)

Obviously this was undesirable. The fixes made to the parser caught
this bug. I think there's a patch for file upstream already, but it
does not really fix the bug; it just makes the symptom go away again.
The problem is that they're attempting to use regex to process binary
data, which is not a valid usage.

If we add the controversial byte-based C locale that's been discussed,
this could be made to work, but whether that will be done is still an
open question. It would add a good deal of ugliness and code
duplication to the codebase.

Rich
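
To illustrate the point above about not using regex on binary data,
here is a minimal sketch of testing for the same leading bytes with
plain byte comparison instead. It assumes the quoted pattern is meant
to match only at the very start of the buffer, as the description in
the quoted message suggests. The helper names are hypothetical and are
not taken from file or musl.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the buffer starts with the bytes
 * FF FE (the BOM from the quoted pattern), else 0. Raw bytes are
 * compared directly, so no regex and no locale are involved. */
static int starts_with_utf16le_bom(const unsigned char *buf, size_t len)
{
    static const unsigned char bom[] = { 0xFF, 0xFE };
    return len >= sizeof bom && memcmp(buf, bom, sizeof bom) == 0;
}

/* Hypothetical helper covering the alternatives of the quoted pattern:
 * "\r\n", ";", "[" or the BOM, each tested at offset 0 only. */
static int looks_like_ini_start(const unsigned char *buf, size_t len)
{
    if (starts_with_utf16le_bom(buf, len))
        return 1;
    if (len >= 2 && buf[0] == '\r' && buf[1] == '\n')
        return 1;
    return len >= 1 && (buf[0] == ';' || buf[0] == '[');
}
```

Because this compares bytes rather than characters, the result does not
depend on how the regex implementation or the current locale treats
bytes that are not valid characters.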