* [ruby-core:105972] [Ruby master Bug#18294] error when parsing regexp comment
From: thyresias (Thierry Lambert) @ 2021-11-08 18:33 UTC (permalink / raw)
To: ruby-core
Issue #18294 has been reported by thyresias (Thierry Lambert).
----------------------------------------
Bug #18294: error when parsing regexp comment
https://bugs.ruby-lang.org/issues/18294
* Author: thyresias (Thierry Lambert)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [i386-mingw32]
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
The following code generates the error "too short escaped multibyte character"
``` ruby
_re = /
foo # \M-ca
/x
```
Removing the \ or doubling it makes the error disappear.
Since this is in comment text, I would expect to be able to type anything there: am I missing something?
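A minimal sketch for trying the report and its two workarounds at runtime. It assumes `Regexp.new` with `Regexp::EXTENDED` is an acceptable stand-in for the `/x` literal, so that a failure is raised as a rescuable `RegexpError` rather than a `SyntaxError`. The helper name `try_compile` is hypothetical and not part of the report. The result for the reported pattern may differ between Ruby versions, so no output is shown for it.

```ruby
# Hypothetical helper (not from the report): compile a pattern at runtime and
# return either [:ok, regexp] or [:error, message].
def try_compile(source, options = Regexp::EXTENDED)
  [:ok, Regexp.new(source, options)]
rescue RegexpError => e
  [:error, e.message]
end

reported = "foo # \\M-ca\n"    # pattern text: foo # \M-ca
no_slash = "foo # M-ca\n"      # workaround 1: backslash removed
doubled  = "foo # \\\\M-ca\n"  # workaround 2: backslash doubled

p try_compile(reported)  # outcome depends on the Ruby version in use
p try_compile(no_slash)
p try_compile(doubled)
```

Both workaround patterns should compile to a regexp that matches `foo`, since the rest of the line is comment text in extended mode.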
--
https://bugs.ruby-lang.org/
* [ruby-core:106065] [Ruby master Bug#18294] error when parsing regexp comment
From: duerst @ 2021-11-15 6:30 UTC (permalink / raw)
To: ruby-core
Issue #18294 has been updated by duerst (Martin Dürst).
thyresias (Thierry Lambert) wrote:
> The following code generates the error "too short escaped multibyte character"
> ``` ruby
> _re = /
> foo # \M-ca
> /x
> ```
> Removing the \ or doubling it makes the error disappear.
> Since this is in comment text, I would expect to be able to type anything there: am I missing something?
I guess yes. It's somewhat counter-intuitive, but I guess the implementation is handling escapes while it reads the regexp up to the /x, and only then it knows that some parts of it are comments. It would be possible to change the implementation, but I don't know if it's worth it for such an edge case.
----------------------------------------
Bug #18294: error when parsing regexp comment
https://bugs.ruby-lang.org/issues/18294#change-94656
* [ruby-core:106070] [Ruby master Bug#18294] error when parsing regexp comment
From: thyresias (Thierry Lambert) @ 2021-11-15 9:59 UTC (permalink / raw)
To: ruby-core
Issue #18294 has been updated by thyresias (Thierry Lambert).
duerst (Martin Dürst) wrote in #note-1:
> I guess yes. It's somewhat counter-intuitive, but I guess the implementation is handling escapes while it reads the regexp up to the /x, and only then it knows that some parts of it are comments. It would be possible to change the implementation, but I don't know if it's worth it for such an edge case.
You have the same issue with this code, where it knows from the start that this is an extended regexp, so I guess your explanation does not hold:
```ruby
_re = /(?x)
foo # \M-ca
/
```
----------------------------------------
Bug #18294: error when parsing regexp comment
https://bugs.ruby-lang.org/issues/18294#change-94659
* [ruby-core:106911] [Ruby master Bug#18294] error when parsing regexp comment
From: janosch-x @ 2021-12-29 22:26 UTC (permalink / raw)
To: ruby-core
Issue #18294 has been updated by janosch-x (Janosch Müller).
this affects:
- all String escapes that can be invalid (`\x`, `\u`, `\u{...}`, `\M`, `\C`, `\c`)
- only invalid escapes (e.g. `\x7F` is fine)
- no Regexp-specific escapes such as `\p{...}`, `\g<...>`, `\k<...>`
- Regexp literals (`SyntaxError`) and `Regexp::new` (`RegexpError`)
- end-of-line comments as well as comment groups (these don't require x-mode)
- all Ruby versions
to give an example that is maybe a bit less edge-casy:
```ruby
/ C:\\[a-z]{5} # e.g. C:\users /x
# => ^
# => invalid Unicode escape (SyntaxError)
```
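a rough probe for the list above, assuming `Regexp.new` is a fair stand-in for a literal (the list says both are affected). the sample set and the names `COMMENT_SAMPLES` and `probe_comment` are made up for illustration. apart from the valid `\x7F` case, the outcome of each sample may depend on the Ruby version, so none is stated here.

```ruby
# Sample comment texts (illustrative only); each one is placed after "#"
# in an x-mode pattern. Single quotes keep the backslashes literal.
COMMENT_SAMPLES = {
  'valid \x escape'     => '\x7F',
  'truncated \x escape' => '\x',
  'invalid \u escape'   => 'C:\users',
  'meta escape'         => '\M-ca',
  'property escape'     => '\p{Alpha}',
}.freeze

# Returns :ok if the pattern compiles, otherwise the RegexpError message.
def probe_comment(comment)
  Regexp.new("foo # #{comment}\n", Regexp::EXTENDED)
  :ok
rescue RegexpError => e
  e.message
end

results = COMMENT_SAMPLES.transform_values { |text| probe_comment(text) }
results.each { |label, outcome| puts "#{label}: #{outcome}" }
```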
the comment handling in `regparse.c` could probably be changed fairly easily, it only happens [here]( https://github.com/ruby/ruby/blob/efa0c31ce518bb26aca80392cce7fc5471ca9fef/regparse.c#L3884 ) and [here]( https://github.com/ruby/ruby/blob/efa0c31ce518bb26aca80392cce7fc5471ca9fef/regparse.c#L4025 ). i could take this on with a few pointers.
i'm just wondering if the flags [here]( https://github.com/ruby/ruby/blob/efa0c31ce518bb26aca80392cce7fc5471ca9fef/parse.y#L6496 ) mean that escape sequences in Regexp literals are actually pre-processed by Ruby's main parser? it seems like this would make a fix much more complicated.
----------------------------------------
Bug #18294: error when parsing regexp comment
https://bugs.ruby-lang.org/issues/18294#change-95731
* [ruby-core:107380] [Ruby master Bug#18294] error when parsing regexp comment
From: matz (Yukihiro Matsumoto) @ 2022-01-31 2:28 UTC (permalink / raw)
To: ruby-core
Issue #18294 has been updated by matz (Yukihiro Matsumoto).
I admit this is a bug and it should be fixed. But implementation-wise, it's difficult to fix. Considering the (small) impact of the bug, its priority is low.
We will fix it but it could take a long time.
Matz.
----------------------------------------
Bug #18294: error when parsing regexp comment
https://bugs.ruby-lang.org/issues/18294#change-96282
* [ruby-core:107387] [Ruby master Bug#18294] error when parsing regexp comment
From: thyresias (Thierry Lambert) @ 2022-01-31 9:05 UTC (permalink / raw)
To: ruby-core
Issue #18294 has been updated by thyresias (Thierry Lambert).
I understand it is not easy to fix, and I can certainly live with it.
Thank you, Matz-san ^_^
----------------------------------------
Bug #18294: error when parsing regexp comment
https://bugs.ruby-lang.org/issues/18294#change-96289
* [ruby-core:108082] [Ruby master Bug#18294] error when parsing regexp comment
From: jeremyevans0 (Jeremy Evans) @ 2022-03-26 2:26 UTC (permalink / raw)
To: ruby-core
Issue #18294 has been updated by jeremyevans0 (Jeremy Evans).
I've submitted a pull request to fix this: https://github.com/ruby/ruby/pull/5721
The basic approach is to skip the parse.y checks for regexps, because the regexp code does the same checks. The patch modifies the regexp code to pass the regexp options to a couple of internal functions and, in the function that handles the unescaping, to recognize `#` and ignore characters until the end of the line. This becomes complicated because `#` is allowed as a regular, non-comment character inside a character class `[]`, so the patch attempts to handle that.
Additionally, I found that this issue is not limited to extended regexps; it affects all regexps using `(?#` comments. So the patch tries to handle those comments as well by ignoring the content inside them.
I'm not sure the pull request handles all cases, and I'm also not sure it doesn't introduce regressions. It would be great if a more knowledgeable committer could review it.
The patch is kind of a hack. A better fix would be to integrate the unescaping code with the regexp parsing code, instead of trying to unescape in a first pass, before parsing in a later pass. This would allow the unescaping to be aware of the regexp parser state, making it simple to avoid unescaping when inside a regexp comment.
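To illustrate the comment-skipping idea described above, here is a toy scanner in Ruby. It is only a sketch under stated assumptions and is not the code from the pull request: the name `regexp_comments` is hypothetical, it ignores edge cases such as a `]` placed first in a character class, and it ends a `(?#` comment at the first `)` even if that `)` is preceded by a backslash.

```ruby
# Toy scanner (illustration only, not the patch): returns the comment
# substrings found in a regexp source string.
def regexp_comments(source, extended: false)
  comments = []
  class_depth = 0   # > 0 while inside a [...] character class
  i = 0
  while i < source.length
    ch = source[i]
    if ch == "\\"
      i += 2                                # skip the escaped character
    elsif class_depth > 0
      class_depth += 1 if ch == "["         # nested class such as [a[b]]
      class_depth -= 1 if ch == "]"
      i += 1
    elsif ch == "["
      class_depth = 1
      i += 1
    elsif source[i, 3] == "(?#"
      close = source.index(")", i) || source.length
      comments << source[(i + 3)...close]   # text inside the comment group
      i = close + 1
    elsif extended && ch == "#"
      eol = source.index("\n", i) || source.length
      comments << source[(i + 1)...eol]     # text up to the end of the line
      i = eol + 1
    else
      i += 1
    end
  end
  comments
end

p regexp_comments("foo # \\M-ca\n", extended: true)
p regexp_comments("[#]a", extended: true)
p regexp_comments("a(?#\\u)b")
```

An unescaping pass could use spans like these to leave comment text untouched, which is roughly what tracking parser state would give for free in the integrated design mentioned above.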
----------------------------------------
Bug #18294: error when parsing regexp comment
https://bugs.ruby-lang.org/issues/18294#change-97040