mailing list of musl libc
From: LM <lmemsm@gmail.com>
To: musl@lists.openwall.com
Subject: regex libs (was Re: [musl] embedded newbies site.)
Date: Tue, 16 Jul 2013 11:10:34 -0400
Message-ID: <CAFipMOGkZSz+9w-6a528fbzE1=J4jnnqZvMQmTbkg7iQWeW5hA@mail.gmail.com>

On Tue, Jul 16, 2013 at 10:00 AM, Rich Felker <dalias@aerifal.cx> wrote:

> The whole concept of regular expressions is that they're regular,
> meaning they're matchable in O(n) time with O(1) space. PCRE (the
> implementation) uses backtracking for everything, giving it
> exponentially-bad performance (JIT cannot fix this), and PCRE (the
> language) has a lot of features that are fundamentally not regular and
> thus can't be implemented efficiently. Also, the behavior of some of
> the features (e.g. greedy vs non-greedy matching) were not designed
> intentionally but just arose out of the backtracking implementation,
> and thus don't make a lot of sense unless you think from the
> standpoint of such an implementation.
>
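
The complexity point in the quote can be illustrated with a small, self-contained C sketch (written for this page, not taken from musl or any library mentioned here). The first part matches the extended regex ^(a|b)*abb$ with a hand-built DFA: one pass over the input and one integer of state, so O(n) time and O(1) space. The second part is a deliberately naive backtracking matcher for a hypothetical mini-language (literal characters, each optionally followed by ?), run on a classic pathological case: the pattern a? repeated n times followed by a repeated n times, matched against a string of n a's. The names dfa_match_abb, bt, backtrack_steps and MAX_N are invented for this example.

```c
#include <assert.h>
#include <string.h>

/* Part 1: hand-built DFA for the ERE ^(a|b)*abb$.
   State 0 is the start state, state 3 the only accepting state. */
static const int dfa_next[4][2] = {
    /*          on 'a'  on 'b' */
    /* 0 */ {   1,      0 },
    /* 1 */ {   1,      2 },
    /* 2 */ {   1,      3 },
    /* 3 */ {   1,      0 },
};

/* Single left-to-right pass, one int of state: O(n) time, O(1) space. */
int dfa_match_abb(const char *s)
{
    int state = 0;

    for (; *s != '\0'; s++) {
        if (*s == 'a')
            state = dfa_next[state][0];
        else if (*s == 'b')
            state = dfa_next[state][1];
        else
            return 0;  /* any other byte: implicit dead state, reject */
    }
    return state == 3;
}

/* Part 2: naive backtracking matcher for a hypothetical mini-language:
   literal characters, each optionally followed by '?'. The whole subject
   must match. bt_steps counts recursive calls. */
static unsigned long bt_steps;

static int bt(const char *p, const char *s)
{
    bt_steps++;
    if (*p == '\0')
        return *s == '\0';
    if (p[1] == '?') {
        /* Greedy: try consuming one character, then try skipping it. */
        if (*s == *p && bt(p + 2, s + 1))
            return 1;
        return bt(p + 2, s);
    }
    return *s == *p && bt(p + 1, s + 1);
}

/* Small bound so the call count stays within a 32-bit unsigned long. */
#define MAX_N 24

/* Match the pattern ("a?" x n)("a" x n) against "a" x n.
   Stores the match result in *matched and returns the number of calls. */
unsigned long backtrack_steps(int n, int *matched)
{
    char pat[3 * MAX_N + 1], subj[MAX_N + 1];
    size_t k = 0;
    int i;

    assert(n >= 0 && n <= MAX_N);
    for (i = 0; i < n; i++) {
        pat[k++] = 'a';
        pat[k++] = '?';
    }
    for (i = 0; i < n; i++)
        pat[k++] = 'a';
    pat[k] = '\0';
    memset(subj, 'a', (size_t)n);
    subj[n] = '\0';

    bt_steps = 0;
    *matched = bt(pat, subj);
    return bt_steps;
}
```

With this particular matcher, every branch of the search tree is visited before the all-skip path finally succeeds, so backtrack_steps(n, &m) makes 2^(n-1) * (n+4) - 1 calls: 7167 for n = 10 and 12,582,911 for n = 20. Each extra a? roughly doubles the work, while the language it describes (between n and 2n a's) is regular, and a DFA for it would still make a single pass over the input.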

I went back and rechecked the documentation (
http://www.pcre.org/readme.txt).  You're both right: PCRE provides its
Perl-compatible regular expression implementation even when one uses the
pcreposix interface.  It would have been nice if it offered true regular
expression handling for those who only want to use the POSIX-compatible
part of the interface.
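
For reference, the POSIX interface under discussion is the regcomp()/regexec()/regfree() family from <regex.h>. The sketch below shows a typical call sequence; the helper name ere_match is made up for illustration. As I understand it, PCRE's pcreposix wrapper exposes functions with these same names, so switching implementations mostly means changing the header and link flags rather than the calls. Nothing at this level reveals whether a backtracking engine or an automaton-based one is doing the matching, which is why the difference is easy to overlook.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the POSIX extended regular expression
   `pattern` matches anywhere in `subject`, 0 if it does not, and -1 if
   the pattern fails to compile or regexec() reports an error. */
int ere_match(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && subject != NULL);

    /* REG_NOSUB: only a yes/no answer is needed, no submatch offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

For example, ere_match("^(a|b)*abb$", "aababb") should return 1, while a pattern with an unterminated bracket such as "[a" makes regcomp() fail, so the helper returns -1.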

So what are some good regex libraries?  I'm also wondering whether there
are good portable, cross-platform options (or whether PCRE is the best
pattern-matching solution from a portability standpoint, even though it
isn't strictly a regular expression engine).

There's RE2 ( http://code.google.com/p/re2/ ), but I've read about
performance problems with it in a few web articles, and I didn't have much
luck porting it to non-Linux platforms.  There's the glibc implementation:
http://sourceforge.net/p/mingw/regex/ci/master/tree/  There's TRE (
https://github.com/laurikari/tre/), which some BSD systems want to use for
their grep ( https://wiki.freebsd.org/BSDgrep ).  There's the Oniguruma
library ( http://www.geocities.jp/kosako3/oniguruma/ ).  There's Henry
Spencer's regex ( http://www.arglist.com/regex ), which looks promising for
portability.  There's re2c ( http://re2c.org/ ), a regex-based lexer
generator.  Further searching also turns up http://tiny-rex.sourceforge.net/
, which may have the same issues as PCRE.  ICU offers regex code (
http://userguide.icu-project.org/strings/regexp ), probably with the same
issue as PCRE.  (Just my opinion, but ICU seems to try to do a lot for a
single library.)  There's Boost's regex (
http://www.boost.org/doc/libs/1_54_0/libs/regex/doc/ ), which a lot of
websites recommend, but I've never been a fan of the Boost libraries.  The
Heirloom Project also has regex code ( http://heirloom.sourceforge.net/ ).
Have I missed any other interesting solutions?

It sounds like I need to distinguish more clearly between regular
expression libraries and more general pattern-matching libraries on the
musl wiki's alternative libraries page.  If you recommend any of the above
libraries (or others) and think certain implementations would be useful to
others, let me know and I'll add links to them on the wiki.  So far I
haven't added anything for pattern-matching libraries beyond benchmark
information.

Thanks.

Sincerely,
Laura
http://www.distasis.com

Thread overview: 8+ messages
2013-07-16 15:10 LM [this message]
2013-07-16 15:32 ` Rich Felker
2013-07-17  5:38   ` Isaac
2013-07-16 15:41 ` Justin Cormack
2013-07-16 16:55   ` Szabolcs Nagy
2013-07-16 17:13 ` Strake
2013-07-16 17:14 ` Kurt H Maier
2013-07-16 17:38   ` Szabolcs Nagy
