From: Julien Ramseier
Newsgroups: gmane.linux.lib.musl.general
Subject: [PATCH] regex: support non-greedy quantifiers
Date: Sun, 13 Mar 2016 12:06:39 +0100
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

Here's a tiny patch to enable non-greedy regex quantifiers.
This is not specified by POSIX, but I think it's a useful extension,
and all the code for supporting it is already present.

I tested this against the TRE and AT&T test suites (from NetBSD)
and didn't find any regressions. However, I don't know all the ins and
outs of the implementation, and I may have missed something obvious.

- Julien

diff --git a/src/regex/regcomp.c b/src/regex/regcomp.c
index 5fad98b..cc7d633 100644
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
@@ -979,6 +979,7 @@ static reg_errcode_t tre_parse(tre_parse_ctx_t *ctx)
 	parse_iter:
 		for (;;) {
 			int min, max;
+			int minimal = 0;
 
 			if (*s!='\\' && *s!='*') {
 				if (!ere)
@@ -1014,11 +1015,16 @@ static reg_errcode_t tre_parse(tre_parse_ctx_t *ctx)
 				if (*s == '?')
 					max = 1;
 				s++;
+				/* Non-greedy */
+				if (ere && *s == '?') {
+					minimal = 1;
+					s++;
+				}
 			}
 			if (max == 0)
 				ctx->n = tre_ast_new_literal(ctx->mem, EMPTY, -1, -1);
 			else
-				ctx->n = tre_ast_new_iter(ctx->mem, ctx->n, min, max, 0);
+				ctx->n = tre_ast_new_iter(ctx->mem, ctx->n, min, max, minimal);
 			if (!ctx->n)
 				return REG_ESPACE;
 		}
-- 
2.7.2