From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/3665 Path: news.gmane.org!not-for-mail From: Isaac Newsgroups: gmane.linux.lib.musl.general Subject: Re: regex libs (was Re: [musl] embedded newbies site.) Date: Tue, 16 Jul 2013 22:38:11 -0700 Message-ID: <20130717053811.GA1719@newbook> References: <20130716153220.GP29800@brightrain.aerifal.cx> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: ger.gmane.org 1374039509 5925 80.91.229.3 (17 Jul 2013 05:38:29 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Wed, 17 Jul 2013 05:38:29 +0000 (UTC) To: musl@lists.openwall.com Original-X-From: musl-return-3669-gllmg-musl=m.gmane.org@lists.openwall.com Wed Jul 17 07:38:29 2013 Return-path: Envelope-to: gllmg-musl@plane.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1UzKRY-00052C-R0 for gllmg-musl@plane.gmane.org; Wed, 17 Jul 2013 07:38:28 +0200 Original-Received: (qmail 6079 invoked by uid 550); 17 Jul 2013 05:38:28 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: Original-Received: (qmail 6062 invoked from network); 17 Jul 2013 05:38:27 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=lavabit; d=lavabit.com; b=uryQyYPr6o9AkYJBu0ZBFYTCqz2c6wFB/9kmZiuLj9muf6AAcxThF4cC2kACBk8UMND68ZzHhQVqAEl/dr29/l12w4twa9Z+xCuUp/rvUeRj9dBWdtF1ZZOIFpo51bMKdJstyaGZWq0hCW1hZGp3x222HPn9iLdgx3fhNvhJyZ4=; h=Date:From:To:Subject:Message-ID:References:MIME-Version:Content-Type:Content-Disposition:In-Reply-To:User-Agent; Content-Disposition: inline In-Reply-To: <20130716153220.GP29800@brightrain.aerifal.cx> User-Agent: Mutt/1.5.20 (2009-06-14) Xref: news.gmane.org gmane.linux.lib.musl.general:3665 Archived-At: On Tue, Jul 16, 2013 at 11:32:20AM -0400, Rich Felker wrote: > On Tue, Jul 16, 2013 at 11:10:34AM -0400, LM wrote: > > On Tue, Jul 16, 2013 at 10:00 AM, Rich Felker wrote: > > > > > The whole concept of regular expressions is that they're regular, > > > meaning they're matchable in O(n) time with O(1) space. PCRE (the > > > implementation) uses backtracking for everything, giving it > > > exponentially-bad performance (JIT cannot fix this), and PCRE (the > > > language) has a lot of features that are fundamentally not regular and > > > thus can't be implemented efficiently. Also, the behavior of some of > > > the features (e.g. greedy vs non-greedy matching) were not designed > > > intentionally but just arose out of the backtracking implementation, > > > and thus don't make a lot of sense unless you think from the > > > standpoint of such an implementation. > > > > > > > Went back and rechecked the documentation ( > > http://www.pcre.org/readme.txt). You're both right, PCRE is offering > > the Perl regular expressions > > implementation even when one uses the pcreposix interface. Would have been > > nice if they offered actual regular expressions handling if you only want > > to use the POSIX compatible part of the interface. The pcreposix stuff is only for those who want perl regexes with an existing POSIX codebase. > > So what are some good regex library solutions? I'm also wondering if there > > POSIX systems provide the regcomp and regexec api as part of libc. On > musl, glibc, uclibc, and (afaik) all the bsd libcs, this gives you a > good implementation. However musl's is the only one I know of with no > conformance bugs, but the conformance bugs in the others are isolated > to obscure subexpression match issues. > > > are some good cross-platform portable library solutions (or if PCRE is the > > best pattern matching solution from a portability standpoint even if it's > > not strictly regex compatible). > > TRE, on which musl's is based, is the closest to being good, in my > opinion. It has optimal big-O but fairly suboptimal constant. However > it's unmaintained and has at least two bugs that aren't fixed > upstream, one of which may be exploitable if an untrusted party can > supply arbitrary regular expressions, and the other which just causes > some POSIX BRE expressions (the legacy style used by sed and non-e > grep) to be misinterpreted. Have these been mentioned to NetBSD? I ask because they use TRE for their libc (the relicense happened at their request). Thanks, Isaac Dunham