mailing list of musl libc
* regcomp regression?
@ 2014-10-15 21:01 Samuel Holland
  2014-10-16  1:03 ` Rich Felker
  2014-10-16  1:12 ` Szabolcs Nagy
  0 siblings, 2 replies; 4+ messages in thread
From: Samuel Holland @ 2014-10-15 21:01 UTC
  To: musl

Hello,

I've been rebuilding packages after the 1.1.5 release, and it's caused
some (apparent) regressions. file no longer compiles as it is unable to
parse one of its magic files. The offending regex is (windows, line 163)

   \\`(\r\n|;|[[]|\xFF\xFE)

It's testing for the BOM at the beginning of an INI/INF file. I
understand the regex rewrite removed[1] the ability to match arbitrary
bytes (even with the C locale) because it was broken; is this something
you plan to add back? Or is the application wrong? If so, what
workaround do you suggest?

The m4 testsuite also now fails tests 109 and 121; this seems to be
caused by the same change.

[1] http://git.musl-libc.org/cgit/musl/commit?id=ec1aed0a144b3e00e16eeb142c9d13362d6048e7

-- 
Regards,
Samuel Holland <samuel@sholland.net>



* Re: regcomp regression?
  2014-10-15 21:01 regcomp regression? Samuel Holland
@ 2014-10-16  1:03 ` Rich Felker
  2014-10-16  1:12 ` Szabolcs Nagy
  1 sibling, 0 replies; 4+ messages in thread
From: Rich Felker @ 2014-10-16  1:03 UTC
  To: musl

On Wed, Oct 15, 2014 at 04:01:25PM -0500, Samuel Holland wrote:
> Hello,
> 
> I've been rebuilding packages after the 1.1.5 release, and it's caused
> some (apparent) regressions. file no longer compiles as it is unable to
> parse one of its magic files. The offending regex is (windows, line 163)
> 
>   \\`(\r\n|;|[[]|\xFF\xFE)
> 
> It's testing for the BOM at the beginning of an INI/INF file. I
> understand the regex rewrite removed[1] the ability to match arbitrary
> bytes (even with the C locale) because it was broken; is this something
> you plan to add back? Or is the application wrong? If so, what
> workaround do you suggest?

It was not supported before either; it was just silently misprocessed
as if the regex were:

   \\`(\r\n|;|[[]|)

Obviously this was undesirable. The fixes made to the parser caught
this bug. I think there's a patch for file upstream already, but it
does not really fix the bug; it just makes the symptom go away again.
The problem is that they're attempting to use regex to process binary
data, which is not a valid usage.
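
One way an application can avoid handing binary data to regcomp is to
test the raw bytes directly. The following is only a minimal sketch of
that idea, not the fix applied in file; the helper name
starts_with_utf16le_bom is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not taken from file or musl): returns 1 if the
 * buffer begins with the two bytes FF FE (the UTF-16LE byte order mark),
 * and 0 otherwise. No locale or regex machinery is involved, so the
 * bytes do not need to form valid characters. */
int starts_with_utf16le_bom(const unsigned char *buf, size_t len)
{
    static const unsigned char bom[2] = { 0xFF, 0xFE };

    return len >= sizeof bom && memcmp(buf, bom, sizeof bom) == 0;
}
```

The remaining alternatives in the pattern (\r\n, ; and [) are plain
ASCII, so they could still be handled by a regex or by similar byte
comparisons.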

If we add the controversial byte-based C locale that's been discussed,
this could be made to work, but whether that will be done is still an
open question. It adds a good deal of ugliness and code duplication
to the codebase.

Rich



* Re: regcomp regression?
  2014-10-15 21:01 regcomp regression? Samuel Holland
  2014-10-16  1:03 ` Rich Felker
@ 2014-10-16  1:12 ` Szabolcs Nagy
  2014-10-16  1:43   ` Samuel Holland
  1 sibling, 1 reply; 4+ messages in thread
From: Szabolcs Nagy @ 2014-10-16  1:12 UTC
  To: musl

* Samuel Holland <samuel@sholland.net> [2014-10-15 16:01:25 -0500]:
> I've been rebuilding packages after the 1.1.5 release, and it's caused
> some (apparent) regressions. file no longer compiles as it is unable to
> parse one of its magic files. The offending regex is (windows, line 163)
> 
>   \\`(\r\n|;|[[]|\xFF\xFE)
> 
> It's testing for the BOM at the beginning of an INI/INF file. I
> understand the regex rewrite removed[1] the ability to match arbitrary
> bytes (even with the C locale) because it was broken; is this something
> you plan to add back? Or is the application wrong? If so, what
> workaround do you suggest?

this was a bug in file (in theory we could provide such extension, but
it's non-trivial and applications should not rely on it: posix re is not
usable for binary data)

there is upstream fix:
http://bugs.gw.com/view.php?id=383

> The m4 testsuite also now fails tests 109 and 121; this seems to be
> caused by the same change.
> 
> [1] http://git.musl-libc.org/cgit/musl/commit?id=ec1aed0a144b3e00e16eeb142c9d13362d6048e7
> 

this commit only made the bug more visible (fail at regex parse time
instead of building a nonsense state machine in case of invalid
characters)
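
a failure at parse time is visible to the caller through the return
value of regcomp. a minimal sketch of checking it with the standard
POSIX calls follows; the helper name try_compile is hypothetical, and
which patterns are rejected depends on the libc and locale in use.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compile pattern as a POSIX extended regex and
 * report any failure on stderr. Returns the regcomp() result, which is
 * 0 on success and a nonzero REG_* error code otherwise. */
int try_compile(const char *pattern)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);

    if (rc != 0) {
        char msg[128];

        /* regerror() turns the error code into a readable message. */
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "regcomp failed: %s\n", msg);
        return rc;
    }

    regfree(&re);
    return 0;
}
```

per this thread, a pattern containing the bytes FF FE is rejected at
this step with musl 1.1.5; the sketch makes no claim about other libcs
or versions.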

i didn't know about m4 issues, are you talking about
http://git.savannah.gnu.org/gitweb/?p=m4.git;a=blob;f=tests/testsuite.at
?



* Re: regcomp regression?
  2014-10-16  1:12 ` Szabolcs Nagy
@ 2014-10-16  1:43   ` Samuel Holland
  0 siblings, 0 replies; 4+ messages in thread
From: Samuel Holland @ 2014-10-16  1:43 UTC
  To: musl

On 10/15/2014 08:12 PM, Szabolcs Nagy wrote:
> this was a bug in file (in theory we could provide such extension,
> but it's non-trivial and applications should not rely on it: posix
> re is not usable for binary data)
>
> there is upstream fix: http://bugs.gw.com/view.php?id=383

I updated from 5.19 to 5.20 and it compiles now, thanks.

> i didn't know about m4 issues, are you talking about
> http://git.savannah.gnu.org/gitweb/?p=m4.git;a=blob;f=tests/testsuite.at

No, when you run `make check' it appears to auto-generate test cases from
the texinfo documentation[1]. The first one is at line 4257, and the
second is at line 4536. You can ignore my report; I misremembered. They
fail on 1.1.4 too. The problem is that they use Latin-1 characters that
are invalid UTF-8. So that's one more "the C locale is not binary-safe"
compatibility issue.
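
To illustrate why a Latin-1 byte is rejected, here is a minimal sketch
of a UTF-8 well-formedness check. It is my own illustration, not code
from musl or m4, and the function name is_valid_utf8 is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if s[0..n) is well-formed UTF-8 and
 * 0 otherwise. Rejects stray or missing continuation bytes, overlong
 * encodings, surrogates, and code points above U+10FFFF. */
int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        size_t need;        /* continuation bytes expected */
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest code point for this length */

        if (c < 0x80) { i++; continue; }   /* ASCII */
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;      /* stray continuation or invalid lead byte */

        if (n - i - 1 < need)
            return 0;       /* sequence is cut off by end of input */

        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;

        i += need + 1;
    }
    return 1;
}
```

For example, Latin-1 encodes e-acute as the single byte E9. In UTF-8
that byte is the lead byte of a three-byte sequence, so it is invalid
unless two continuation bytes follow it.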

[1] http://git.savannah.gnu.org/gitweb/?p=m4.git;a=blob;f=doc/m4.texi;h=81dd255e4b9a7ee8fdc73cc8c30e448d5a7718ee;hb=d1bce954ab5f164041541d128fa491c68f2bc1a6

-- 
Regards,
Samuel Holland <samuel@sholland.net>


