* [ruby-core:123894] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape
@ 2025-11-24 15:15 thyresias (Thierry Lambert) via ruby-core
2025-11-24 16:54 ` [ruby-core:123895] " jeremyevans0 (Jeremy Evans) via ruby-core
` (12 more replies)
0 siblings, 13 replies; 14+ messages in thread
From: thyresias (Thierry Lambert) via ruby-core @ 2025-11-24 15:15 UTC (permalink / raw)
To: ruby-core; +Cc: thyresias (Thierry Lambert)
Issue #21709 has been reported by thyresias (Thierry Lambert).
----------------------------------------
Bug #21709: Inconsistent encoding by Regexp.escape
https://bugs.ruby-lang.org/issues/21709
* Author: thyresias (Thierry Lambert)
* Status: Open
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
```ruby
%w(foo être).each do |s|
  puts "string: #{s.inspect} -> #{s.encoding}"
  puts "escaped: #{Regexp.escape(s).inspect} -> #{Regexp.escape(s).encoding}"
end
```
Output:
```
string: "foo" -> UTF-8
escaped: "foo" -> US-ASCII
string: "être" -> UTF-8
escaped: "être" -> UTF-8
```
The result should always match the encoding of the argument.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 14+ messages in thread
* [ruby-core:123895] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape
2025-11-24 15:15 [ruby-core:123894] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape thyresias (Thierry Lambert) via ruby-core
@ 2025-11-24 16:54 ` jeremyevans0 (Jeremy Evans) via ruby-core
2025-11-24 18:01 ` [ruby-core:123896] " thyresias (Thierry Lambert) via ruby-core
` (11 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2025-11-24 16:54 UTC (permalink / raw)
To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)
Issue #21709 has been updated by jeremyevans0 (Jeremy Evans).
Status changed from Open to Feedback
This is not a bug, it is deliberate behavior for ASCII-only strings in `rb_reg_quote` (internal function called by `Regexp.escape`):
```c
if (ascii_only) {
    rb_enc_associate(tmp, rb_usascii_encoding());
}
```
`US-ASCII` strings will be automatically converted to UTF-8 if necessary:
```ruby
("foo".encode("US-ASCII") + "\u1234").encoding
# => #<Encoding:UTF-8>
```
Does this behavior cause any problems in your application?
----------------------------------------
Bug #21709: Inconsistent encoding by Regexp.escape
https://bugs.ruby-lang.org/issues/21709#change-115299
* Author: thyresias (Thierry Lambert)
* Status: Feedback
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
* [ruby-core:123896] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape
2025-11-24 15:15 [ruby-core:123894] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape thyresias (Thierry Lambert) via ruby-core
2025-11-24 16:54 ` [ruby-core:123895] " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2025-11-24 18:01 ` thyresias (Thierry Lambert) via ruby-core
2025-11-24 18:52 ` [ruby-core:123897] " jeremyevans0 (Jeremy Evans) via ruby-core
` (10 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: thyresias (Thierry Lambert) via ruby-core @ 2025-11-24 18:01 UTC (permalink / raw)
To: ruby-core; +Cc: thyresias (Thierry Lambert)
Issue #21709 has been updated by thyresias (Thierry Lambert).
> Does this behavior cause any problems in your application?
Yes:
```ruby
search_text = "foo"
s_search = Regexp.escape(search_text)
re_prefix = /\p{In_Arabic}.+ /
s_search.prepend re_prefix.source
_re = /^#{s_search}|(?<=– |: )#{s_search}/ #=> encoding mismatch in dynamic regexp : US-ASCII and UTF-8 (RegexpError)
```
----------------------------------------
Bug #21709: Inconsistent encoding by Regexp.escape
https://bugs.ruby-lang.org/issues/21709#change-115300
* Author: thyresias (Thierry Lambert)
* Status: Feedback
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
* [ruby-core:123897] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape
2025-11-24 15:15 [ruby-core:123894] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape thyresias (Thierry Lambert) via ruby-core
2025-11-24 16:54 ` [ruby-core:123895] " jeremyevans0 (Jeremy Evans) via ruby-core
2025-11-24 18:01 ` [ruby-core:123896] " thyresias (Thierry Lambert) via ruby-core
@ 2025-11-24 18:52 ` jeremyevans0 (Jeremy Evans) via ruby-core
2025-11-24 20:36 ` [ruby-core:123898] " thyresias (Thierry Lambert) via ruby-core
` (9 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2025-11-24 18:52 UTC (permalink / raw)
To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)
Issue #21709 has been updated by jeremyevans0 (Jeremy Evans).
Status changed from Feedback to Open
thyresias (Thierry Lambert) wrote in #note-2:
> > Does this behavior cause any problems in your application?
>
> Yes:
> ```ruby
> search_text = "foo"
> s_search = Regexp.escape(search_text)
> re_prefix = /\p{In_Arabic}.+ /
> s_search.prepend re_prefix.source
> _re = /^#{s_search}|(?<=– |: )#{s_search}/ #=> encoding mismatch in dynamic regexp : US-ASCII and UTF-8 (RegexpError)
> ```
Thank you for providing an example. This seems more like an issue with the literal Regexp support in general than with `Regexp.escape`. You can trigger the issue without `Regexp.escape`:
```ruby
re = /#{"\\p{In_Arabic}".encode("US-ASCII")}\u1234/
# encoding mismatch in dynamic regexp : US-ASCII and UTF-8
```
Triggering it seems to require that you specify Unicode properties inside an interpolated string that isn't in UTF-8.
You get a different error without that Unicode character at the end:
```ruby
re = /#{"\\p{In_Arabic}".encode("US-ASCII")}/
# invalid character property name {In_Arabic}: /\p{In_Arabic}/
```
Using `Regexp.new` instead of a literal Regexp may work around the issue:
```ruby
search_text = "foo"
s_search = Regexp.escape(search_text)
re_prefix = /\p{In_Arabic}.+ /
s_search.prepend re_prefix.source
_re = Regexp.new("^#{s_search}|(?<=– |: )#{s_search}")
```
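Another possible workaround, sketched here only as an illustration, is to relabel the escaped string with the encoding of the input. The helper name `escape_keeping_encoding` is hypothetical and not part of Ruby; it relies only on `Regexp.escape` and `String#force_encoding`. It does not address the underlying interpolation behavior, and whether it avoids the error in a given literal Regexp is an assumption to verify on your Ruby version.

```ruby
# Hypothetical helper (not part of Ruby): escape a string for use in a
# Regexp, then relabel the result with the encoding of the input.
# Per the thread, Regexp.escape only switches to US-ASCII for ASCII-only
# input, so the bytes are equally valid under the original encoding.
def escape_keeping_encoding(str)
  Regexp.escape(str).force_encoding(str.encoding)
end

source  = "foo.bar".encode(Encoding::UTF_8)
escaped = escape_keeping_encoding(source)
puts "#{escaped.inspect} -> #{escaped.encoding}"
```

`force_encoding` only changes the encoding label and leaves the bytes untouched, which is why it is used here instead of `encode`.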
----------------------------------------
Bug #21709: Inconsistent encoding by Regexp.escape
https://bugs.ruby-lang.org/issues/21709#change-115301
* Author: thyresias (Thierry Lambert)
* Status: Open
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
* [ruby-core:123898] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape
2025-11-24 15:15 [ruby-core:123894] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape thyresias (Thierry Lambert) via ruby-core
` (2 preceding siblings ...)
2025-11-24 18:52 ` [ruby-core:123897] " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2025-11-24 20:36 ` thyresias (Thierry Lambert) via ruby-core
2025-11-24 21:50 ` [ruby-core:123899] " jeremyevans0 (Jeremy Evans) via ruby-core
` (8 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: thyresias (Thierry Lambert) via ruby-core @ 2025-11-24 20:36 UTC (permalink / raw)
To: ruby-core; +Cc: thyresias (Thierry Lambert)
Issue #21709 has been updated by thyresias (Thierry Lambert).
Ok for the workaround, but don't you think all this is inconsistent?
For me, it's a bug, not a feature. ^_^
----------------------------------------
Bug #21709: Inconsistent encoding by Regexp.escape
https://bugs.ruby-lang.org/issues/21709#change-115302
* Author: thyresias (Thierry Lambert)
* Status: Open
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
* [ruby-core:123899] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape
2025-11-24 15:15 [ruby-core:123894] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape thyresias (Thierry Lambert) via ruby-core
` (3 preceding siblings ...)
2025-11-24 20:36 ` [ruby-core:123898] " thyresias (Thierry Lambert) via ruby-core
@ 2025-11-24 21:50 ` jeremyevans0 (Jeremy Evans) via ruby-core
2025-11-25 11:21 ` [ruby-core:123903] " thyresias (Thierry Lambert) via ruby-core
` (7 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2025-11-24 21:50 UTC (permalink / raw)
To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)
Issue #21709 has been updated by jeremyevans0 (Jeremy Evans).
thyresias (Thierry Lambert) wrote in #note-4:
> Ok for the workaround, but don't you think all this is inconsistent?
> For me, it's a bug, not a feature. ^_^
I agree this represents a bug, which is why I changed the status back to Open. However, I think the bug is in the literal Regexp support, not in `Regexp.escape`.
In general, US-ASCII strings are implicitly convertible to UTF-8 strings, so having `Regexp.escape` return a US-ASCII string for data that is solely US-ASCII is reasonable. This implicit use of US-ASCII happens in other cases:
```
# Literal Symbol
$ ruby -e "p :a.encoding"
#<Encoding:US-ASCII>
# Array#join
$ ruby -e "p [].join.encoding"
#<Encoding:US-ASCII>
# Literal Regexp
$ ruby -e "p //.encoding"
#<Encoding:US-ASCII>
```
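The implicit convertibility described above can be checked with `Encoding.compatible?`. This is a small sketch; the variable names are made up for illustration.

```ruby
ascii = "foo".encode(Encoding::US_ASCII)
utf8  = "\u00eatre" # "être", written with an escape so the literal is UTF-8

# Encoding.compatible? returns the encoding the combination would have,
# or nil when the two strings cannot be combined.
p Encoding.compatible?(ascii, utf8)
p((ascii + utf8).encoding)

# Two strings that both contain non-ASCII bytes in different encodings
# are not compatible.
binary = "\xFF".b
p Encoding.compatible?(binary, utf8)
```

The first two calls report UTF-8, while the last one returns `nil`. String concatenation and interpolation follow these rules, which is the behavior the literal Regexp case in this issue departs from.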
----------------------------------------
Bug #21709: Inconsistent encoding by Regexp.escape
https://bugs.ruby-lang.org/issues/21709#change-115303
* Author: thyresias (Thierry Lambert)
* Status: Open
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
* [ruby-core:123903] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape
2025-11-24 15:15 [ruby-core:123894] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape thyresias (Thierry Lambert) via ruby-core
` (4 preceding siblings ...)
2025-11-24 21:50 ` [ruby-core:123899] " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2025-11-25 11:21 ` thyresias (Thierry Lambert) via ruby-core
2025-11-25 16:02 ` [ruby-core:123909] " jeremyevans0 (Jeremy Evans) via ruby-core
` (6 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: thyresias (Thierry Lambert) via ruby-core @ 2025-11-25 11:21 UTC (permalink / raw)
To: ruby-core; +Cc: thyresias (Thierry Lambert)
Issue #21709 has been updated by thyresias (Thierry Lambert).
jeremyevans0 (Jeremy Evans) wrote in #note-5:
> I agree this represents a bug, which is why I changed the status back to Open. However, I think the bug is in the literal Regexp support, not in `Regexp.escape`.
Thank you. I agree with your analysis of the bug origin: should I edit this to re-qualify it as "inconsistent Regexp interpolation behavior", and update the example code using your examples?
----------------------------------------
Bug #21709: Inconsistent encoding by Regexp.escape
https://bugs.ruby-lang.org/issues/21709#change-115306
* Author: thyresias (Thierry Lambert)
* Status: Open
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
* [ruby-core:123909] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape
2025-11-24 15:15 [ruby-core:123894] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape thyresias (Thierry Lambert) via ruby-core
` (5 preceding siblings ...)
2025-11-25 11:21 ` [ruby-core:123903] " thyresias (Thierry Lambert) via ruby-core
@ 2025-11-25 16:02 ` jeremyevans0 (Jeremy Evans) via ruby-core
2025-11-28 10:32 ` [ruby-core:123931] [Ruby Bug#21709] Regexp interpolation is inconsistent with String interpolation thyresias (Thierry Lambert) via ruby-core
` (5 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2025-11-25 16:02 UTC (permalink / raw)
To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)
Issue #21709 has been updated by jeremyevans0 (Jeremy Evans).
thyresias (Thierry Lambert) wrote in #note-6:
> Thank you. I agree with your analysis of the bug origin: should I edit this to re-qualify it as "inconsistent Regexp interpolation behavior", and update the example code using your examples?
Sure, that sounds like a good idea.
----------------------------------------
Bug #21709: Inconsistent encoding by Regexp.escape
https://bugs.ruby-lang.org/issues/21709#change-115313
* Author: thyresias (Thierry Lambert)
* Status: Open
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
* [ruby-core:123931] [Ruby Bug#21709] Regexp interpolation is inconsistent with String interpolation
2025-11-24 15:15 [ruby-core:123894] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape thyresias (Thierry Lambert) via ruby-core
` (6 preceding siblings ...)
2025-11-25 16:02 ` [ruby-core:123909] " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2025-11-28 10:32 ` thyresias (Thierry Lambert) via ruby-core
2025-11-28 17:24 ` [ruby-core:123944] " jeremyevans0 (Jeremy Evans) via ruby-core
` (4 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: thyresias (Thierry Lambert) via ruby-core @ 2025-11-28 10:32 UTC (permalink / raw)
To: ruby-core; +Cc: thyresias (Thierry Lambert)
Issue #21709 has been updated by thyresias (Thierry Lambert).
Subject changed from Inconsistent encoding by Regexp.escape to Regexp interpolation is inconsistent with String interpolation
jeremyevans0 (Jeremy Evans) wrote in #note-7:
> Sure, that sounds like a good idea.
It seems I cannot change the description, just the title.
Should I open a new bug report?
As an aside, you said about the encoding of the result of `Regexp.escape`:
> This is not a bug, it is deliberate behavior for ASCII-only strings in `rb_reg_quote` (internal function called by `Regexp.escape`):
What is the logic in this? It's surprising that the encoding of the output does not match the encoding of the input, and I read somewhere that Matz follows the principle of least surprise...
----------------------------------------
Bug #21709: Regexp interpolation is inconsistent with String interpolation
https://bugs.ruby-lang.org/issues/21709#change-115330
* Author: thyresias (Thierry Lambert)
* Status: Open
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
* [ruby-core:123944] [Ruby Bug#21709] Regexp interpolation is inconsistent with String interpolation
2025-11-24 15:15 [ruby-core:123894] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape thyresias (Thierry Lambert) via ruby-core
` (7 preceding siblings ...)
2025-11-28 10:32 ` [ruby-core:123931] [Ruby Bug#21709] Regexp interpolation is inconsistent with String interpolation thyresias (Thierry Lambert) via ruby-core
@ 2025-11-28 17:24 ` jeremyevans0 (Jeremy Evans) via ruby-core
2025-11-28 18:13 ` [ruby-core:123945] " thyresias (Thierry Lambert) via ruby-core
` (3 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2025-11-28 17:24 UTC (permalink / raw)
To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)
Issue #21709 has been updated by jeremyevans0 (Jeremy Evans).
thyresias (Thierry Lambert) wrote in #note-8:
> jeremyevans0 (Jeremy Evans) wrote in #note-7:
> > Sure, that sounds like a good idea.
>
> It seems I cannot change the description, just the title.
> Should I open a new bug report?
Updating just the title is fine. I don't think you need to open a new bug report.
> As an aside, you said about the encoding of the result of `Regexp.escape`:
>
> > This is not a bug, it is deliberate behavior for ASCII-only strings in `rb_reg_quote` (internal function called by `Regexp.escape`):
>
> What is the logic in this? It's surprising that the encoding of the output does not match the encoding of the input, and I read somewhere that Matz follows the principle of least surprise...
The related line was last changed in commit:0f4199fb56ec12dae32a6fa099f15aaa7e55d10f. However, that appears to be a bug fix, and even before that, the function was designed to return US-ASCII for ASCII-only strings. Looks like the actual change was made in commit:b2e60b2ce7a7cbcb8a67ac78606a18d3c2591d81. The reasoning given:
```
(rb_reg_quote): return ascii-8bit string if the argument is
ascii-only to generate encoding generic regexp if possible.
```
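As a rough illustration of what "encoding generic regexp" appears to mean in that commit message, the sketch below builds a Regexp from an ASCII-only escaped string and checks that it is not tied to one encoding. The variable names are made up for illustration, and the reading of the commit message is an assumption.

```ruby
# A Regexp whose source is ASCII-only is not fixed to a single encoding.
generic = Regexp.new(Regexp.escape("a.b"))
p generic.encoding
p generic.fixed_encoding?

# It can therefore match strings in different ASCII-compatible encodings,
# even when those strings contain non-ASCII characters.
utf8   = "xa.b\u00e9"
latin1 = "xa.b\xE9".b.force_encoding(Encoding::ISO_8859_1)
p generic.match?(utf8)
p generic.match?(latin1)

# By contrast, a Regexp with non-ASCII source is fixed to its encoding.
fixed = Regexp.new("\u00e9")
p fixed.fixed_encoding?
```

Matching strings in non-ASCII-compatible encodings such as UTF-16 is deliberately left out of this sketch.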
----------------------------------------
Bug #21709: Regexp interpolation is inconsistent with String interpolation
https://bugs.ruby-lang.org/issues/21709#change-115345
* Author: thyresias (Thierry Lambert)
* Status: Open
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
* [ruby-core:123945] [Ruby Bug#21709] Regexp interpolation is inconsistent with String interpolation
2025-11-24 15:15 [ruby-core:123894] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape thyresias (Thierry Lambert) via ruby-core
` (8 preceding siblings ...)
2025-11-28 17:24 ` [ruby-core:123944] " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2025-11-28 18:13 ` thyresias (Thierry Lambert) via ruby-core
2025-12-11 9:48 ` [ruby-core:124136] " naruse (Yui NARUSE) via ruby-core
` (2 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: thyresias (Thierry Lambert) via ruby-core @ 2025-11-28 18:13 UTC (permalink / raw)
To: ruby-core; +Cc: thyresias (Thierry Lambert)
Issue #21709 has been updated by thyresias (Thierry Lambert).
Ok.
Here is code that shows the inconsistency between Regexp and String interpolation, based on your examples:
```ruby
# inconsistent Regexp/String interpolation behavior
prefix = '\p{In_Arabic}'
suffix = '\p{In_Arabic}'.encode('US-ASCII')
begin
  re = /#{prefix}#{suffix}/
rescue => ex
  puts "fail: #{ex.message} (#{ex.class})"
  # fail: encoding mismatch in dynamic regexp : US-ASCII and UTF-8 (RegexpError)
end
s = "#{prefix}#{suffix}"
re = /#{s}/
puts "ok: #{s.inspect} (#{s.encoding}) -> #{re.inspect} (#{re.encoding})"
# ok: "\\p{In_Arabic}\\p{In_Arabic}" (UTF-8) -> /\p{In_Arabic}\p{In_Arabic}/ (UTF-8)
begin
  re = /#{suffix}/
rescue => ex
  puts "fail: #{ex.message} (#{ex.class})"
  # fail: invalid character property name {In_Arabic}: /\p{In_Arabic}/ (RegexpError)
end
s = "#{suffix}"
re = /#{s}/
puts "ok: #{s.inspect} (#{s.encoding}) -> #{re.inspect} (#{re.encoding})"
# ok: "\\p{In_Arabic}" (UTF-8) -> /\p{In_Arabic}/ (UTF-8)
```
----------------------------------------
Bug #21709: Regexp interpolation is inconsistent with String interpolation
https://bugs.ruby-lang.org/issues/21709#change-115347
* Author: thyresias (Thierry Lambert)
* Status: Open
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
* [ruby-core:124136] [Ruby Bug#21709] Regexp interpolation is inconsistent with String interpolation
2025-11-24 15:15 [ruby-core:123894] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape thyresias (Thierry Lambert) via ruby-core
` (9 preceding siblings ...)
2025-11-28 18:13 ` [ruby-core:123945] " thyresias (Thierry Lambert) via ruby-core
@ 2025-12-11 9:48 ` naruse (Yui NARUSE) via ruby-core
2025-12-15 11:49 ` [ruby-core:124210] " Eregon (Benoit Daloze) via ruby-core
2026-02-26 8:47 ` [ruby-core:124884] " augustingbpe (Augustin Gottlieb) via ruby-core
12 siblings, 0 replies; 14+ messages in thread
From: naruse (Yui NARUSE) via ruby-core @ 2025-12-11 9:48 UTC (permalink / raw)
To: ruby-core; +Cc: naruse (Yui NARUSE)
Issue #21709 has been updated by naruse (Yui NARUSE).
```ruby
re = /#{"\\p{In_Arabic}".encode("US-ASCII")}\u1234/
# encoding mismatch in dynamic regexp : US-ASCII and UTF-8
```
This behavior looks like a bug.
----------------------------------------
Bug #21709: Regexp interpolation is inconsistent with String interpolation
https://bugs.ruby-lang.org/issues/21709#change-115590
* Author: thyresias (Thierry Lambert)
* Status: Open
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
* [ruby-core:124210] [Ruby Bug#21709] Regexp interpolation is inconsistent with String interpolation
2025-11-24 15:15 [ruby-core:123894] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape thyresias (Thierry Lambert) via ruby-core
` (10 preceding siblings ...)
2025-12-11 9:48 ` [ruby-core:124136] " naruse (Yui NARUSE) via ruby-core
@ 2025-12-15 11:49 ` Eregon (Benoit Daloze) via ruby-core
2026-02-26 8:47 ` [ruby-core:124884] " augustingbpe (Augustin Gottlieb) via ruby-core
12 siblings, 0 replies; 14+ messages in thread
From: Eregon (Benoit Daloze) via ruby-core @ 2025-12-15 11:49 UTC (permalink / raw)
To: ruby-core; +Cc: Eregon (Benoit Daloze)
Issue #21709 has been updated by Eregon (Benoit Daloze).
Right, I think Regexp interpolation should be closer to String interpolation; currently it's its own separate thing with rather weird rules.
It reminds me of some other issues related to Regexp interpolation like #20407 and linked issues.
----------------------------------------
Bug #21709: Regexp interpolation is inconsistent with String interpolation
https://bugs.ruby-lang.org/issues/21709#change-115693
* Author: thyresias (Thierry Lambert)
* Status: Open
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
* [ruby-core:124884] [Ruby Bug#21709] Regexp interpolation is inconsistent with String interpolation
2025-11-24 15:15 [ruby-core:123894] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape thyresias (Thierry Lambert) via ruby-core
` (11 preceding siblings ...)
2025-12-15 11:49 ` [ruby-core:124210] " Eregon (Benoit Daloze) via ruby-core
@ 2026-02-26 8:47 ` augustingbpe (Augustin Gottlieb) via ruby-core
12 siblings, 0 replies; 14+ messages in thread
From: augustingbpe (Augustin Gottlieb) via ruby-core @ 2026-02-26 8:47 UTC (permalink / raw)
To: ruby-core; +Cc: augustingbpe (Augustin Gottlieb)
Issue #21709 has been updated by augustingbpe (Augustin Gottlieb).
Hi everyone, I made an attempt at fixing this issue in the PR below. I hope it helps, and that it also helps to dig deeper into the issue. All the tests are passing.
https://github.com/ruby/ruby/pull/16224
----------------------------------------
Bug #21709: Regexp interpolation is inconsistent with String interpolation
https://bugs.ruby-lang.org/issues/21709#change-116546
* Author: thyresias (Thierry Lambert)
* Status: Open
* ruby -v: ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x64-mingw-ucrt]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
Thread overview: 14+ messages
2025-11-24 15:15 [ruby-core:123894] [Ruby Bug#21709] Inconsistent encoding by Regexp.escape thyresias (Thierry Lambert) via ruby-core
2025-11-24 16:54 ` [ruby-core:123895] " jeremyevans0 (Jeremy Evans) via ruby-core
2025-11-24 18:01 ` [ruby-core:123896] " thyresias (Thierry Lambert) via ruby-core
2025-11-24 18:52 ` [ruby-core:123897] " jeremyevans0 (Jeremy Evans) via ruby-core
2025-11-24 20:36 ` [ruby-core:123898] " thyresias (Thierry Lambert) via ruby-core
2025-11-24 21:50 ` [ruby-core:123899] " jeremyevans0 (Jeremy Evans) via ruby-core
2025-11-25 11:21 ` [ruby-core:123903] " thyresias (Thierry Lambert) via ruby-core
2025-11-25 16:02 ` [ruby-core:123909] " jeremyevans0 (Jeremy Evans) via ruby-core
2025-11-28 10:32 ` [ruby-core:123931] [Ruby Bug#21709] Regexp interpolation is inconsistent with String interpolation thyresias (Thierry Lambert) via ruby-core
2025-11-28 17:24 ` [ruby-core:123944] " jeremyevans0 (Jeremy Evans) via ruby-core
2025-11-28 18:13 ` [ruby-core:123945] " thyresias (Thierry Lambert) via ruby-core
2025-12-11 9:48 ` [ruby-core:124136] " naruse (Yui NARUSE) via ruby-core
2025-12-15 11:49 ` [ruby-core:124210] " Eregon (Benoit Daloze) via ruby-core
2026-02-26 8:47 ` [ruby-core:124884] " augustingbpe (Augustin Gottlieb) via ruby-core