From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: REG_STARTEND (regex)
Date: Tue, 15 Jan 2013 08:42:44 -0500
Message-ID: <20130115134244.GW20323@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Tue, Jan 15, 2013 at 11:34:59AM +0100, Daniel Cegiełka wrote:
> Hi,
> Is there a chance that musl will support REG_STARTEND? It is used
> quite often in *BSD.
>
> http://www.sourceware.org/ml/libc-alpha/2004-03/msg00038.html

Probably not, at least not in the immediate future.

The original TRE code actually worked with strings as base+length internally rather than as null-terminated strings, which meant a lot of things were a lot more expensive than they should be; if I remember correctly, even searches for text guaranteed to be found near the beginning of the string required strlen for the whole string, i.e. the whole operation was needlessly O(n). In one of the cleanup rounds, I changed it to use null termination, which simplified a lot of the tests; many checks collapsed away since \0 was automatically not in the set being checked against and thus no second check was required.

If/when we overhaul regex again, I'll certainly consider this request and see if the design can be made such that it's not expensive. But I don't see any easy way to do it right now short of making a temp copy of the string.

That _would_ be possible; \0 could be replaced with \xff, and \xff replaced with \xfe, and special logic added to allow \xff (which is otherwise an invalid byte and never matchable) while still rejecting \xfe and other invalid bytes. This would require no changes to the internals, but it would have the property of requiring an O(n) malloc/memcpy, which is certainly not very appealing.

Rich
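For illustration only, the temp-copy idea above could be sketched roughly as follows. This is not musl code: `startend_copy` and `regexec_range` are hypothetical names, and passing the range as explicit `start`/`end` arguments (instead of through `pmatch[0]`, as the BSD flag does) is an assumption made to keep the sketch short. It shows only the O(n) copy with the byte remapping and the offset fix-up; the matcher-side logic that would let \xff match is not shown, so with an unmodified regexec the remapped bytes are treated however the current locale treats them.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy s[start..end) into a fresh NUL-terminated
 * buffer, remapping \0 -> \xff and \xff -> \xfe as described above.
 * Returns NULL on allocation failure; the caller frees the result. */
static char *startend_copy(const char *s, size_t start, size_t end)
{
    assert(start <= end);
    size_t n = end - start;
    char *buf = malloc(n + 1);
    if (!buf)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[start + i];
        if (c == 0x00)
            c = 0xff;   /* embedded NUL becomes the stand-in byte */
        else if (c == 0xff)
            c = 0xfe;   /* original \xff stays an invalid byte */
        buf[i] = (char)c;
    }
    buf[n] = '\0';      /* the terminator the internals rely on */
    return buf;
}

/* Hypothetical wrapper: match only within s[start..end) and report
 * offsets relative to s. Assumes re was not compiled with REG_NOSUB
 * whenever nmatch > 0, since pmatch is read after a successful match. */
static int regexec_range(const regex_t *re, const char *s,
                         size_t start, size_t end,
                         size_t nmatch, regmatch_t pmatch[], int eflags)
{
    if (end < start)
        return REG_NOMATCH;
    char *buf = startend_copy(s, start, end);   /* the O(n) cost */
    if (!buf)
        return REG_ESPACE;
    int r = regexec(re, buf, nmatch, pmatch, eflags);
    free(buf);
    if (r == 0) {
        /* Shift offsets from copy-relative to string-relative. */
        for (size_t i = 0; i < nmatch; i++) {
            if (pmatch[i].rm_so != -1) {
                pmatch[i].rm_so += (regoff_t)start;
                pmatch[i].rm_eo += (regoff_t)start;
            }
        }
    }
    return r;
}
```

A caller would compile the pattern with regcomp as usual and then call `regexec_range(&re, str, start, end, nmatch, pmatch, 0)`. Every call pays for the malloc and the copy, which is exactly the cost noted above.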