On Tue, Jul 16, 2013 at 10:00 AM, Rich Felker <dalias@aerifal.cx> wrote:
> The whole concept of regular expressions is that they're regular,
> meaning they're matchable in O(n) time with O(1) space. PCRE (the
> implementation) uses backtracking for everything, giving it
> exponentially-bad performance (JIT cannot fix this), and PCRE (the
> language) has a lot of features that are fundamentally not regular and
> thus can't be implemented efficiently. Also, the behavior of some of
> the features (e.g. greedy vs non-greedy matching) were not designed
> intentionally but just arose out of the backtracking implementation,
> and thus don't make a lot of sense unless you think from the
> standpoint of such an implementation.

I went back and rechecked the documentation ( http://www.pcre.org/readme.txt ). You're both right: PCRE provides the Perl regular expression implementation even when one uses the pcreposix interface. It would have been nice if it offered actual regular expression handling for those who only want to use the POSIX-compatible part of the interface.
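For anyone following along, here is a rough sketch of what I mean by the POSIX part of the interface. It assumes the system's own <regex.h> (regcomp, regexec, regfree) and POSIX extended syntax. The helper name posix_ere_match is just something I made up for illustration; it is not part of PCRE, musl, or any other library.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Returns 1 if `text` matches the POSIX extended regular expression
 * `pattern`, 0 if it does not, and -1 if the pattern fails to compile. */
int posix_ere_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_EXTENDED selects ERE syntax; REG_NOSUB says we only want a
     * yes/no answer and no submatch offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* regexec returns 0 on a match; any nonzero result (REG_NOMATCH or
     * an error) is reported here as "no match". */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

Calling posix_ere_match("^[a-z]+[0-9]*$", "abc123") should report a match. As I understand the discussion above, building the same calls against pcreposix would keep the function names but give Perl-style matching underneath.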

So what are some good regex libraries? I'm also wondering whether there are good cross-platform options, or whether PCRE is the best pattern-matching choice from a portability standpoint even though it isn't strictly a regular expression implementation.

Here is what I've found so far:

- RE2 ( http://code.google.com/p/re2/ ). I've read about some issues with its performance in a few web articles, and I didn't have much luck porting it to non-Linux platforms.
- The glibc solution: http://sourceforge.net/p/mingw/regex/ci/master/tree/
- TRE ( https://github.com/laurikari/tre/ ), which some BSD systems want to use to create their grep ( https://wiki.freebsd.org/BSDgrep ).
- The Oniguruma library ( http://www.geocities.jp/kosako3/oniguruma/ ).
- Henry Spencer's regex ( http://www.arglist.com/regex ). That looks promising for portability.
- http://re2c.org/
- http://tiny-rex.sourceforge.net/ , which turned up in further searching and may have the same issues as PCRE.
- ICU's regex code ( http://userguide.icu-project.org/strings/regexp ), which probably has the same issue as PCRE. (Just my opinion, but ICU seems to do a lot of stuff for just one library.)
- Boost's regex ( http://www.boost.org/doc/libs/1_54_0/libs/regex/doc/ ), which a lot of web sites recommend, but I've just never been a fan of the Boost libraries.
- The Heirloom Project's regex code ( http://heirloom.sourceforge.net/ ).

Have I missed any other interesting solutions?

It sounds like I need to distinguish more clearly between true regex libraries and more general pattern-matching libraries on the musl wiki's alternative libraries page. If you recommend any of the above libraries, or others, and think certain implementations would be useful to other people, let me know and I'll add the links to the wiki. So far I haven't added anything for pattern-matching libraries beyond benchmark information.

Thanks.

Sincerely,
Laura
http://www.distasis.com