From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/3656 Path: news.gmane.org!not-for-mail From: Rich Felker Newsgroups: gmane.linux.lib.musl.general Subject: Re: regex libs (was Re: [musl] embedded newbies site.) Date: Tue, 16 Jul 2013 11:32:20 -0400 Message-ID: <20130716153220.GP29800@brightrain.aerifal.cx> References: Reply-To: musl@lists.openwall.com NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: ger.gmane.org 1373988754 4161 80.91.229.3 (16 Jul 2013 15:32:34 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Tue, 16 Jul 2013 15:32:34 +0000 (UTC) To: musl@lists.openwall.com Original-X-From: musl-return-3660-gllmg-musl=m.gmane.org@lists.openwall.com Tue Jul 16 17:32:36 2013 Return-path: Envelope-to: gllmg-musl@plane.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1Uz7Ev-0007r2-EP for gllmg-musl@plane.gmane.org; Tue, 16 Jul 2013 17:32:33 +0200 Original-Received: (qmail 11308 invoked by uid 550); 16 Jul 2013 15:32:33 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: Original-Received: (qmail 11300 invoked from network); 16 Jul 2013 15:32:32 -0000 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Xref: news.gmane.org gmane.linux.lib.musl.general:3656 Archived-At: On Tue, Jul 16, 2013 at 11:10:34AM -0400, LM wrote: > On Tue, Jul 16, 2013 at 10:00 AM, Rich Felker wrote: > > > The whole concept of regular expressions is that they're regular, > > meaning they're matchable in O(n) time with O(1) space. PCRE (the > > implementation) uses backtracking for everything, giving it > > exponentially-bad performance (JIT cannot fix this), and PCRE (the > > language) has a lot of features that are fundamentally not regular and > > thus can't be implemented efficiently. Also, the behavior of some of > > the features (e.g. greedy vs non-greedy matching) were not designed > > intentionally but just arose out of the backtracking implementation, > > and thus don't make a lot of sense unless you think from the > > standpoint of such an implementation. > > > > Went back and rechecked the documentation ( > http://www.pcre.org/readme.txt). You're both right, PCRE is offering > the Perl regular expressions > implementation even when one uses the pcreposix interface. Would have been > nice if they offered actual regular expressions handling if you only want > to use the POSIX compatible part of the interface. > > So what are some good regex library solutions? I'm also wondering if there POSIX systems provide the regcomp and regexec api as part of libc. On musl, glibc, uclibc, and (afaik) all the bsd libcs, this gives you a good implementation. However musl's is the only one I know of with no conformance bugs, but the conformance bugs in the others are isolated to obscure subexpression match issues. > are some good cross-platform portable library solutions (or if PCRE is the > best pattern matching solution from a portability standpoint even if it's > not strictly regex compatible). TRE, on which musl's is based, is the closest to being good, in my opinion. It has optimal big-O but fairly suboptimal constant. However it's unmaintained and has at least two bugs that aren't fixed upstream, one of which may be exploitable if an untrusted party can supply arbitrary regular expressions, and the other which just causes some POSIX BRE expressions (the legacy style used by sed and non-e grep) to be misinterpreted. Rich