From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/7899
Path: news.gmane.org!not-for-mail
From: Rich Felker <dalias@libc.org>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH] Byte-based C locale, draft 1
Date: Sat, 6 Jun 2015 21:17:38 -0400
Message-ID: <20150607011738.GB17573@brightrain.aerifal.cx>
References: <20150606214007.GA17398@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="dCSxeJc5W8HZXZrD"
X-Trace: ger.gmane.org 1433639877 4866 80.91.229.3 (7 Jun 2015 01:17:57 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sun, 7 Jun 2015 01:17:57 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-7912-gllmg-musl=m.gmane.org@lists.openwall.com Sun Jun 07 03:17:56 2015
Return-path: <musl-return-7912-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-7912-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1Z1PDm-0003Fo-QJ
	for gllmg-musl@m.gmane.org; Sun, 07 Jun 2015 03:17:54 +0200
Original-Received: (qmail 12130 invoked by uid 550); 7 Jun 2015 01:17:53 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 12112 invoked from network); 7 Jun 2015 01:17:52 -0000
Content-Disposition: inline
In-Reply-To: <20150606214007.GA17398@brightrain.aerifal.cx>
User-Agent: Mutt/1.5.21 (2010-09-15)
Original-Sender: Rich Felker <dalias@aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:7899
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/7899>


--dCSxeJc5W8HZXZrD
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Sat, Jun 06, 2015 at 05:40:07PM -0400, Rich Felker wrote:
> Before applying this I should probably overhaul fnmatch.c again. I
> believe it has some hard-coded UTF-8 processing code in it for the
> useless "check the tail before middle" step that I've been wanting to
> eliminate. Alternatively I could just apply a quick fix to make it
> work right without any invasive changes.
> 
> Other than possible weird cases with fnmatch (which are largely
> harmless but might inhibit matching high bytes in non-UTF-8 mode),
> this code should be ready for testing. I'd appreciate some feedback
> from anyone interested in the feature.

On further review, the special last-component handling that fnmatch
does is not wrong, just wrongly ordered. It should take place after
the "sea of stars" component is processed, rather than before, to
avoid an O(n) operation (essentially strlen) in cases where an early
failure could be detected. But since only the ordering is wrong, I
think fixing it is orthogonal to the bytelocale work, and a
single-line patch adding a case for MB_CUR_MAX==1 should just be
folded into this proposed patch (see attached).
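
To illustrate what the MB_CUR_MAX==1 case changes, here is a minimal
standalone sketch of the tail-backup step. It is not the musl code:
back_up_chars and its byte_mode parameter are hypothetical names, with
byte_mode standing in for the MB_CUR_MAX==1 test. In byte mode every
byte counts as one character; otherwise UTF-8 continuation bytes are
skipped to find the start of each character.

```c
#include <stddef.h>

/* Hypothetical helper (not part of musl): step back `count` characters
 * from `end` toward `str`. If byte_mode is nonzero, each byte is one
 * character. Otherwise the input is assumed to be UTF-8, and
 * continuation bytes (0x80..0xBF) are skipped to reach the lead byte.
 * Returns NULL if fewer than `count` characters are available. */
static const char *back_up_chars(const char *str, const char *end,
                                 size_t count, int byte_mode)
{
	const char *s = end;
	for (; s > str && count; count--) {
		if ((unsigned char)s[-1] < 128U || byte_mode) s--;
		else while ((unsigned char)*--s - 0x80U < 0x40 && s > str);
	}
	return count ? NULL : s;
}
```

For the four bytes 'a', 0xC3, 0xA9, 'b', backing up two characters
lands on the 0xC3 lead byte in UTF-8 mode but on the 0xA9 byte in byte
mode, which is the distinction the one-line change is meant to respect.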

Rich

--dCSxeJc5W8HZXZrD
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="bytelocale_v1_fnmatch.diff"

diff --git a/src/regex/fnmatch.c b/src/regex/fnmatch.c
index 7f6b65f..978fff8 100644
--- a/src/regex/fnmatch.c
+++ b/src/regex/fnmatch.c
@@ -18,6 +18,7 @@
 #include <stdlib.h>
 #include <wchar.h>
 #include <wctype.h>
+#include "locale_impl.h"
 
 #define END 0
 #define UNMATCHABLE -2
@@ -229,7 +230,7 @@ static int fnmatch_internal(const char *pat, size_t m, const char *str, size_t n
 	 * On illegal sequences we may get it wrong, but in that case
 	 * we necessarily have a matching failure anyway. */
 	for (s=endstr; s>str && tailcnt; tailcnt--) {
-		if (s[-1] < 128U) s--;
+		if (s[-1] < 128U || MB_CUR_MAX==1) s--;
 		else while ((unsigned char)*--s-0x80U<0x40 && s>str);
 	}
 	if (tailcnt) return FNM_NOMATCH;

--dCSxeJc5W8HZXZrD--