mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Isaac <idunham@lavabit.com>
To: musl@lists.openwall.com
Subject: Re: regex libs (was Re: [musl] embedded newbies site.)
Date: Tue, 16 Jul 2013 22:38:11 -0700	[thread overview]
Message-ID: <20130717053811.GA1719@newbook> (raw)
In-Reply-To: <20130716153220.GP29800@brightrain.aerifal.cx>

On Tue, Jul 16, 2013 at 11:32:20AM -0400, Rich Felker wrote:
> On Tue, Jul 16, 2013 at 11:10:34AM -0400, LM wrote:
> > On Tue, Jul 16, 2013 at 10:00 AM, Rich Felker <dalias@aerifal.cx> wrote:
> > 
> > > The whole concept of regular expressions is that they're regular,
> > > meaning they're matchable in O(n) time with O(1) space. PCRE (the
> > > implementation) uses backtracking for everything, giving it
> > > exponentially-bad performance (JIT cannot fix this), and PCRE (the
> > > language) has a lot of features that are fundamentally not regular and
> > > thus can't be implemented efficiently. Also, the behavior of some of
> > > the features (e.g. greedy vs non-greedy matching) were not designed
> > > intentionally but just arose out of the backtracking implementation,
> > > and thus don't make a lot of sense unless you think from the
> > > standpoint of such an implementation.
> > >
> > 
> > Went back and rechecked the documentation (
> > http://www.pcre.org/readme.txt).  You're both right, PCRE is offering
> > the Perl regular expressions
> > implementation even when one uses the pcreposix interface.  Would have been
> > nice if they offered actual regular expressions handling if you only want
> > to use the POSIX compatible part of the interface.

The pcreposix stuff is only for those who want perl regexes with an existing
POSIX codebase.

> > So what are some good regex library solutions?  I'm also wondering if there
> 
> POSIX systems provide the regcomp and regexec api as part of libc. On
> musl, glibc, uclibc, and (afaik) all the bsd libcs, this gives you a
> good implementation. However musl's is the only one I know of with no
> conformance bugs, but the conformance bugs in the others are isolated
> to obscure subexpression match issues.
> 
> > are some good cross-platform portable library solutions (or if PCRE is the
> > best pattern matching solution from a portability standpoint even if it's
> > not strictly regex compatible).
> 
> TRE, on which musl's is based, is the closest to being good, in my
> opinion. It has optimal big-O but fairly suboptimal constant. However
> it's unmaintained and has at least two bugs that aren't fixed
> upstream, one of which may be exploitable if an untrusted party can
> supply arbitrary regular expressions, and the other which just causes
> some POSIX BRE expressions (the legacy style used by sed and non-e
> grep) to be misinterpreted.

Have these been mentioned to NetBSD? I ask because they use TRE for their libc
(the relicense happened at their request).

Thanks,
Isaac Dunham



  reply	other threads:[~2013-07-17  5:38 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-16 15:10 LM
2013-07-16 15:32 ` Rich Felker
2013-07-17  5:38   ` Isaac [this message]
2013-07-16 15:41 ` Justin Cormack
2013-07-16 16:55   ` Szabolcs Nagy
2013-07-16 17:13 ` Strake
2013-07-16 17:14 ` Kurt H Maier
2013-07-16 17:38   ` Szabolcs Nagy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130717053811.GA1719@newbook \
    --to=idunham@lavabit.com \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).