From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/7899
Path: news.gmane.org!not-for-mail
From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH] Byte-based C locale, draft 1
Date: Sat, 6 Jun 2015 21:17:38 -0400
Message-ID: <20150607011738.GB17573@brightrain.aerifal.cx>
References: <20150606214007.GA17398@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="dCSxeJc5W8HZXZrD"
X-Trace: ger.gmane.org 1433639877 4866 80.91.229.3 (7 Jun 2015 01:17:57 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sun, 7 Jun 2015 01:17:57 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-7912-gllmg-musl=m.gmane.org@lists.openwall.com Sun Jun 07 03:17:56 2015
Return-path:
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1Z1PDm-0003Fo-QJ for gllmg-musl@m.gmane.org; Sun, 07 Jun 2015 03:17:54 +0200
Original-Received: (qmail 12130 invoked by uid 550); 7 Jun 2015 01:17:53 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
Original-Received: (qmail 12112 invoked from network); 7 Jun 2015 01:17:52 -0000
Content-Disposition: inline
In-Reply-To: <20150606214007.GA17398@brightrain.aerifal.cx>
User-Agent: Mutt/1.5.21 (2010-09-15)
Original-Sender: Rich Felker
Xref: news.gmane.org gmane.linux.lib.musl.general:7899
Archived-At:

--dCSxeJc5W8HZXZrD
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Sat, Jun 06, 2015 at 05:40:07PM -0400, Rich Felker wrote:
> Before applying this I should probably overhaul fnmatch.c again. I
> believe it has some hard-coded UTF-8 processing code in it for the
> useless "check the tail before middle" step that I've been wanting to
> eliminate.
> Alternatively I could just apply a quick fix to make it
> work right without any invasive changes.
>
> Other than possible weird cases with fnmatch (which are largely
> harmless but might inhibit matching high bytes in non-UTF-8 mode),
> this code should be ready for testing. I'd appreciate some feedback
> from anyone interested in the feature.

On further review, the special last-component handling fnmatch does is
not wrong, just wrongly ordered. It should take place after the "sea
of stars" component is processed, rather than before, to avoid an O(n)
operation (essentially strlen) when an early failure could be
detected. But since only the ordering is wrong, I think fixing it is
orthogonal to the bytelocale work, and a single-line patch to add a
case for MB_CUR_MAX==1 should just be added to this proposed patch
(see attached).

Rich

--dCSxeJc5W8HZXZrD
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="bytelocale_v1_fnmatch.diff"

diff --git a/src/regex/fnmatch.c b/src/regex/fnmatch.c
index 7f6b65f..978fff8 100644
--- a/src/regex/fnmatch.c
+++ b/src/regex/fnmatch.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include "locale_impl.h"
 
 #define END 0
 #define UNMATCHABLE -2
@@ -229,7 +230,7 @@ static int fnmatch_internal(const char *pat, size_t m, const char *str, size_t n
 	 * On illegal sequences we may get it wrong, but in that case
 	 * we necessarily have a matching failure anyway. */
 	for (s=endstr; s>str && tailcnt; tailcnt--) {
-		if (s[-1] < 128U) s--;
+		if (s[-1] < 128U || MB_CUR_MAX==1) s--;
 		else while ((unsigned char)*--s-0x80U<0x40 && s>str);
 	}
 	if (tailcnt) return FNM_NOMATCH;
--dCSxeJc5W8HZXZrD--
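
For illustration only, here is a minimal standalone sketch of the backward
scan the patch touches. It is not musl code: the helper name tail_start
and the byte_mode flag (standing in for the MB_CUR_MAX==1 test) are
hypothetical. It shows why the extra case matters: in a byte-based
locale every byte, including one >= 0x80, counts as one character, so
the scan must not treat high bytes as parts of a UTF-8 sequence.

```c
#include <stddef.h>

/* Hypothetical helper, not part of musl: step back `count` characters
 * from `end` toward `str` and return where the tail begins, or NULL if
 * the string holds fewer than `count` characters.
 * byte_mode is an assumed stand-in for the MB_CUR_MAX==1 check. */
static const char *tail_start(const char *str, const char *end,
                              size_t count, int byte_mode)
{
	const char *s = end;
	while (s > str && count) {
		if (byte_mode || (unsigned char)s[-1] < 0x80) {
			/* ASCII byte, or any byte at all in byte mode */
			s--;
		} else {
			/* UTF-8 mode: skip continuation bytes (0x80..0xBF)
			 * until a lead byte or the string start is reached */
			while ((unsigned char)*--s - 0x80U < 0x40 && s > str);
		}
		count--;
	}
	return count ? NULL : s;
}
```

With the three-byte string "a\xc3\xa9", stepping back one character
lands on the 0xc3 lead byte in UTF-8 mode but on the final 0xa9 byte in
byte mode, and asking for three characters fails only in UTF-8 mode.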