mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: Julien Ramseier <j.ramseier@gmail.com>
Cc: musl@lists.openwall.com, Johannes.Schindelin@gmx.de
Subject: Re: [PATCH] regex: REG_STARTEND support
Date: Wed, 5 Oct 2016 12:23:05 -0400
Message-ID: <20161005162305.GI19318@brightrain.aerifal.cx>
In-Reply-To: <B6143A4F-1AB7-46E9-9476-305FF1A2CD49@gmail.com>

On Wed, Oct 05, 2016 at 02:19:35PM +0200, Julien Ramseier wrote:
>    Here's my REG_STARTEND patch, mostly copied from the original tre[1]
>    implementation.
>    It's only lightly tested.
> [...]
> diff --git a/src/regex/regexec.c b/src/regex/regexec.c
> index 16c5d0a..ae65726 100644
> --- a/src/regex/regexec.c
> +++ b/src/regex/regexec.c
> @@ -29,6 +29,7 @@
>  
>  */
>  
> +#include <sys/types.h>
>  #include <stdlib.h>
>  #include <string.h>
>  #include <wchar.h>
> @@ -51,11 +52,15 @@ tre_fill_pmatch(size_t nmatch, regmatch_t pmatch[], int cflags,
>  
>  #define GET_NEXT_WCHAR() do {                                                 \
>      prev_c = next_c; pos += pos_add_next;                                     \
> -    if ((pos_add_next = mbtowc(&next_c, str_byte, MB_LEN_MAX)) <= 0) {        \
> -        if (pos_add_next < 0) { ret = REG_NOMATCH; goto error_exit; }         \
> -        else pos_add_next++;                                                  \
> +    if (len >= 0 && pos >= len)                                               \
> +        next_c = L'\0';                                                       \

As was caught while discussing this on #musl yesterday, pos here has
the wrong type, int, which is a big problem. I'm going to work on a
test case to show it and confirm that changing the type fixes it.
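
The message does not spell out the failure mode. One plausible reading,
stated here as an assumption, is that a subject string can be longer
than INT_MAX bytes, so an int position counter cannot represent every
offset that gets compared against len. A minimal sketch of that range
problem follows; offset_fits_int and pos_from_offset are hypothetical
names for illustration and do not appear in the patch or in musl.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical check: nonzero if a byte offset can be stored in an
 * int position counter without overflowing it. */
static int offset_fits_int(size_t off)
{
    return off <= (size_t)INT_MAX;
}

/* Hypothetical narrowing helper: only valid for offsets that fit.
 * The assert documents the precondition an int counter silently
 * relies on. */
static int pos_from_offset(size_t off)
{
    assert(offset_fits_int(off));
    return (int)off;
}
```

An offset of (size_t)INT_MAX + 1 fails the check, which is the range
where an int counter stops being usable; a wider signed type for pos
would not hit that limit on 64-bit targets.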

> +    else {                                                                    \
> +        if ((pos_add_next = mbtowc(&next_c, str_byte, MB_LEN_MAX)) <= 0) {    \
> +            if (pos_add_next < 0) { ret = REG_NOMATCH; goto error_exit; }     \
> +            else pos_add_next++;                                              \
> +        }                                                                     \
> +        str_byte += pos_add_next;                                             \
>      }                                                                         \
> -    str_byte += pos_add_next;                                                 \

There also seems to be a bug, which I think was also present in the
original TRE, whereby a read past len can happen if the buffer up to
len ends with a partial multibyte character, since mbtowc is still
passed MB_LEN_MAX as its limit. Avoiding this seems rather costly.
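
As a rough illustration of what avoiding the over-read could look
like, the sketch below decodes one character while passing the number
of bytes left before len as the limit, using the standard mbrtowc
instead of mbtowc. This is not the patch's approach, and
next_wchar_bounded is a hypothetical helper name. It treats a
truncated trailing sequence as an error, which is itself an
assumption about the desired behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode one character from s, inspecting at
 * most `remaining` bytes. Returns the number of bytes consumed (1 for
 * a null character), 0 if no bytes remain, or -1 if the sequence is
 * invalid or cut off by the limit. */
static int next_wchar_bounded(wchar_t *wc, const char *s, size_t remaining)
{
    mbstate_t st;
    size_t n;

    assert(wc != NULL && s != NULL);
    if (remaining == 0) {
        *wc = L'\0';          /* same end-of-input value the patch uses */
        return 0;
    }
    memset(&st, 0, sizeof st); /* start from the initial shift state */
    n = mbrtowc(wc, s, remaining, &st);
    if (n == (size_t)-1 || n == (size_t)-2)
        return -1;             /* invalid, or incomplete within limit */
    return n ? (int)n : 1;     /* mbrtowc returns 0 for a null byte */
}
```

The per-character state reset and the extra bounds bookkeeping are the
kind of overhead that makes this costly in a matcher's inner loop.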

Otherwise this doesn't look too bad. I'll see if we can get some
figures for how it affects performance.

Rich

