ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes
@ 2026-02-08  6:41 jneen (Jeanine Adkisson) via ruby-core
  2026-02-08 19:00 ` [ruby-core:124718] " tompng (tomoya ishida) via ruby-core
                   ` (15 more replies)
  0 siblings, 16 replies; 17+ messages in thread
From: jneen (Jeanine Adkisson) via ruby-core @ 2026-02-08  6:41 UTC (permalink / raw)
  To: ruby-core; +Cc: jneen (Jeanine Adkisson)

Issue #21870 has been reported by jneen (Jeanine Adkisson).

----------------------------------------
Bug #21870: Regexp: Warnings when using multiple non-overlapping \p{...} classes
https://bugs.ruby-lang.org/issues/21870

* Author: jneen (Jeanine Adkisson)
* Status: Open
* ruby -v: 4.0.0, 4.0.1, earlier versions to a lesser extent
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------
```ruby
$VERBOSE = true
# warning: character class has duplicated range: /[\p{Word}\p{S}]/
regex = /[\p{Word}\p{S}]/
```

As far as I can tell this is a perfectly valid and non-overlapping set of unicode properties, but I am still being spammed with warnings. Using `/(\p{Word}|\p{S})/` is kind of a workaround, but it is slower.



-- 
https://bugs.ruby-lang.org/
______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/


* [ruby-core:124718] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes
  2026-02-08  6:41 [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes jneen (Jeanine Adkisson) via ruby-core
@ 2026-02-08 19:00 ` tompng (tomoya ishida) via ruby-core
  2026-02-08 19:03 ` [ruby-core:124719] " jneen (Jeanine Adkisson) via ruby-core
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: tompng (tomoya ishida) via ruby-core @ 2026-02-08 19:00 UTC (permalink / raw)
  To: ruby-core; +Cc: tompng (tomoya ishida)

Issue #21870 has been updated by tompng (tomoya ishida).


I found 130 characters (5 sets of 26 letters) that match both `\p{S}` and `\p{Word}`.
Visually, they look like alphabet-ish symbol characters:
~~~ruby
(0..0x10ffff).select{(s=''<<it; s=~/\p{Word}/&&s=~/\p{S}/) rescue false}.map{''<<it}.join
# ⒶⒷⒸⒹⒺⒻⒼⒽⒾⒿⓀⓁⓂⓃⓄⓅⓆⓇⓈⓉⓊⓋⓌⓍⓎⓏ
# ⓐⓑⓒⓓⓔⓕⓖⓗⓘⓙⓚⓛⓜⓝⓞⓟⓠⓡⓢⓣⓤⓥⓦⓧⓨⓩ
# 🄰🄱🄲🄳🄴🄵🄶🄷🄸🄹🄺🄻🄼🄽🄾🄿🅀🅁🅂🅃🅄🅅🅆🅇🅈🅉
# 🅐🅑🅒🅓🅔🅕🅖🅗🅘🅙🅚🅛🅜🅝🅞🅟🅠🅡🅢🅣🅤🅥🅦🅧🅨🅩
# 🅰🅱🅲🅳🅴🅵🅶🅷🅸🅹🅺🅻🅼🅽🅾🅿🆀🆁🆂🆃🆄🆅🆆🆇🆈🆉
~~~
I'm not sure how to read Unicode properties, but it looks like these characters are Alphabetic:Yes and also in the Other_Symbol category: https://util.unicode.org/UnicodeJsps/character.jsp?a=%E2%92%B6
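For readability, the one-liner above can be unpacked into a small helper. This is a sketch: `overlapping_chars` and `SURROGATES` are hypothetical names, and instead of `rescue false` it skips the surrogate block (U+D800..U+DFFF), which cannot be encoded as UTF-8, before calling `Integer#chr(Encoding::UTF_8)`. Passing a narrower range keeps the scan fast; 0x2400..0x24FF covers the circled letters from the list above.

```ruby
# Surrogate code points have no UTF-8 encoding, so they are skipped up front.
SURROGATES = 0xD800..0xDFFF

# Collect every character in +range+ that matches both regexps +a+ and +b+.
def overlapping_chars(a, b, range = 0..0x10FFFF)
  range.each_with_object([]) do |cp, found|
    next if SURROGATES.cover?(cp)

    ch = cp.chr(Encoding::UTF_8)
    found << ch if a.match?(ch) && b.match?(ch)
  end
end

# Restrict the scan to the Enclosed Alphanumerics area around U+24B6.
circled = overlapping_chars(/\p{Word}/, /\p{S}/, 0x2400..0x24FF)
puts circled.join
```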


----------------------------------------
Bug #21870: Regexp: Warnings when using multiple non-overlapping \p{...} classes
https://bugs.ruby-lang.org/issues/21870#change-116315

* Author: jneen (Jeanine Adkisson)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------

* [ruby-core:124719] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes
  2026-02-08  6:41 [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes jneen (Jeanine Adkisson) via ruby-core
  2026-02-08 19:00 ` [ruby-core:124718] " tompng (tomoya ishida) via ruby-core
@ 2026-02-08 19:03 ` jneen (Jeanine Adkisson) via ruby-core
  2026-02-09  5:50 ` [ruby-core:124724] [Ruby Bug#21870] Regexp: Warnings when using slightly overlapping " jneen (Jeanine Adkisson) via ruby-core
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: jneen (Jeanine Adkisson) via ruby-core @ 2026-02-08 19:03 UTC (permalink / raw)
  To: ruby-core; +Cc: jneen (Jeanine Adkisson)

Issue #21870 has been updated by jneen (Jeanine Adkisson).


I see! So they do have some overlap. Is it really correct to warn here though? "Fixing" the warning would require falling back to manual unicode ranges.

----------------------------------------
Bug #21870: Regexp: Warnings when using multiple non-overlapping \p{...} classes
https://bugs.ruby-lang.org/issues/21870#change-116316

* Author: jneen (Jeanine Adkisson)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------

* [ruby-core:124724] [Ruby Bug#21870] Regexp: Warnings when using slightly overlapping \p{...} classes
  2026-02-08  6:41 [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes jneen (Jeanine Adkisson) via ruby-core
  2026-02-08 19:00 ` [ruby-core:124718] " tompng (tomoya ishida) via ruby-core
  2026-02-08 19:03 ` [ruby-core:124719] " jneen (Jeanine Adkisson) via ruby-core
@ 2026-02-09  5:50 ` jneen (Jeanine Adkisson) via ruby-core
  2026-02-09  5:54 ` [ruby-core:124725] " jneen (Jeanine Adkisson) via ruby-core
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: jneen (Jeanine Adkisson) via ruby-core @ 2026-02-09  5:50 UTC (permalink / raw)
  To: ruby-core; +Cc: jneen (Jeanine Adkisson)

Issue #21870 has been updated by jneen (Jeanine Adkisson).


Another example of this is `/[\p{Word}\p{Cf}]/`, where the two properties seem to overlap precisely on ZWNJ (U+200C) and ZWJ (U+200D).

```ruby
[1] pry(main)> (0..0x10ffff).select{(s=[it].pack('U'); s=~/\p{Word}/&&s=~/\p{Cf}/) rescue false}.map{it.to_s 16 }
=> ["200c", "200d"]
[2] pry(main)> /[\p{Word}\p{Cf}]/
(pry):5: warning: character class has duplicated range: /[\p{Word}\p{Cf}]/
=> /[\p{Word}\p{Cf}]/
[3] pry(main)> 
```



----------------------------------------
Bug #21870: Regexp: Warnings when using slightly overlapping \p{...} classes
https://bugs.ruby-lang.org/issues/21870#change-116324

* Author: jneen (Jeanine Adkisson)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------
```ruby
$VERBOSE = true
# warning: character class has duplicated range: /[\p{Word}\p{S}]/
regex = /[\p{Word}\p{S}]/
```

As far as I can tell this is a perfectly valid ~~and non-overlapping~~ set of unicode properties, but I am still being spammed with warnings. Using `/(\p{Word}|\p{S})/` is kind of a workaround, but it is slower.

Edit: They do overlap somewhat, but I think the deeper issue is there is not a convenient way to express this without falling back to raw unicode ranges.

* [ruby-core:124725] [Ruby Bug#21870] Regexp: Warnings when using slightly overlapping \p{...} classes
  2026-02-08  6:41 [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes jneen (Jeanine Adkisson) via ruby-core
                   ` (2 preceding siblings ...)
  2026-02-09  5:50 ` [ruby-core:124724] [Ruby Bug#21870] Regexp: Warnings when using slightly overlapping " jneen (Jeanine Adkisson) via ruby-core
@ 2026-02-09  5:54 ` jneen (Jeanine Adkisson) via ruby-core
  2026-02-09  8:10 ` [ruby-core:124728] " mame (Yusuke Endoh) via ruby-core
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: jneen (Jeanine Adkisson) via ruby-core @ 2026-02-09  5:54 UTC (permalink / raw)
  To: ruby-core; +Cc: jneen (Jeanine Adkisson)

Issue #21870 has been updated by jneen (Jeanine Adkisson).

Description updated

That specific case also appears to have changed between versions. For example, on 3.4.1 the two properties do not overlap at all:

```ruby
[2] pry(main)> (0..0x10ffff).select{(s=[it].pack('U'); s=~/\p{Word}/&&s=~/\p{Cf}/) rescue false}.map{it.to_s 16}
=> []
```

Maybe for preset classes like `\p{...}` and `[[:alpha:]]` we should only warn if one range completely subsumes another?

----------------------------------------
Bug #21870: Regexp: Warnings when using slightly overlapping \p{...} classes
https://bugs.ruby-lang.org/issues/21870#change-116325

* Author: jneen (Jeanine Adkisson)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------
```ruby
$VERBOSE = true
# warning: character class has duplicated range: /[\p{Word}\p{S}]/
regex = /[\p{Word}\p{S}]/
```

As far as I can tell this is a perfectly valid ~~and non-overlapping~~ set of unicode properties, but I am still being spammed with warnings. Using `/(\p{Word}|\p{S})/` is kind of a workaround, but it is slower.

Edit: They do overlap somewhat, but I think the deeper issue is there is not a convenient way to express this without falling back to raw unicode ranges. Perhaps 

* [ruby-core:124728] [Ruby Bug#21870] Regexp: Warnings when using slightly overlapping \p{...} classes
  2026-02-08  6:41 [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes jneen (Jeanine Adkisson) via ruby-core
                   ` (3 preceding siblings ...)
  2026-02-09  5:54 ` [ruby-core:124725] " jneen (Jeanine Adkisson) via ruby-core
@ 2026-02-09  8:10 ` mame (Yusuke Endoh) via ruby-core
  2026-02-09 15:44 ` [ruby-core:124736] " trinistr (Alexander Bulancov) via ruby-core
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: mame (Yusuke Endoh) via ruby-core @ 2026-02-09  8:10 UTC (permalink / raw)
  To: ruby-core; +Cc: mame (Yusuke Endoh)

Issue #21870 has been updated by mame (Yusuke Endoh).


jneen (Jeanine Adkisson) wrote in #note-7:
> That specific case also appears to have changed, e.g. on 3.4.1:

It is an intentional bug fix. See #21503.

While I understand your trouble, this warning is functioning exactly as intended. How do you suggest resolving it?


----------------------------------------
Bug #21870: Regexp: Warnings when using slightly overlapping \p{...} classes
https://bugs.ruby-lang.org/issues/21870#change-116328

* Author: jneen (Jeanine Adkisson)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------

* [ruby-core:124736] [Ruby Bug#21870] Regexp: Warnings when using slightly overlapping \p{...} classes
  2026-02-08  6:41 [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes jneen (Jeanine Adkisson) via ruby-core
                   ` (4 preceding siblings ...)
  2026-02-09  8:10 ` [ruby-core:124728] " mame (Yusuke Endoh) via ruby-core
@ 2026-02-09 15:44 ` trinistr (Alexander Bulancov) via ruby-core
  2026-02-09 15:49 ` [ruby-core:124737] " kddnewton (Kevin Newton) via ruby-core
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: trinistr (Alexander Bulancov) via ruby-core @ 2026-02-09 15:44 UTC (permalink / raw)
  To: ruby-core; +Cc: trinistr (Alexander Bulancov)

Issue #21870 has been updated by trinistr (Alexander Bulancov).


> Using `/(\p{Word}|\p{S})/` is kind of a workaround, but it is slower.

Have you tried a non-capturing group? `/(?:\p{Word}|\p{S})/` should have better performance.
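Whichever form turns out faster, it is worth confirming that the non-capturing alternation accepts the same characters as the bracket expression. A minimal sketch; the sample characters are arbitrary picks (letters, a digit, `_`, math and currency symbols, two of the overlapping letters, punctuation, and ZWNJ), not an exhaustive proof:

```ruby
# Anchored so each pattern must match exactly one whole character.
CHAR_CLASS  = /\A[\p{Word}\p{S}]\z/    # warns when $VERBOSE is true
ALTERNATION = /\A(?:\p{Word}|\p{S})\z/ # same set, no duplicated-range warning

samples = ["a", "Z", "_", "7", "+", "\u20AC", "\u24B6", "\u{1F170}", " ", ",", "\u200C"]

# Any character on which the two patterns disagree ends up here.
disagreements = samples.reject do |ch|
  CHAR_CLASS.match?(ch) == ALTERNATION.match?(ch)
end

p disagreements
```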

----------------------------------------
Bug #21870: Regexp: Warnings when using slightly overlapping \p{...} classes
https://bugs.ruby-lang.org/issues/21870#change-116337

* Author: jneen (Jeanine Adkisson)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------

* [ruby-core:124737] [Ruby Bug#21870] Regexp: Warnings when using slightly overlapping \p{...} classes
  2026-02-08  6:41 [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes jneen (Jeanine Adkisson) via ruby-core
                   ` (5 preceding siblings ...)
  2026-02-09 15:44 ` [ruby-core:124736] " trinistr (Alexander Bulancov) via ruby-core
@ 2026-02-09 15:49 ` kddnewton (Kevin Newton) via ruby-core
  2026-02-09 17:42 ` [ruby-core:124739] " jneen (Jeanine Adkisson) via ruby-core
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: kddnewton (Kevin Newton) via ruby-core @ 2026-02-09 15:49 UTC (permalink / raw)
  To: ruby-core; +Cc: kddnewton (Kevin Newton)

Issue #21870 has been updated by kddnewton (Kevin Newton).


This might be a good opportunity to add the `||` operator from the Unicode spec (https://www.unicode.org/reports/tr18/#Subtraction_and_Intersection). We could make that one not warn, because the union is explicitly requested. As in:

```ruby
$VERBOSE = true
regex = /[\p{Word}\p{S}]/ # warning
regex = /[\p{Word}||\p{S}]/ # no warning
```
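For comparison, Onigmo already supports the intersection operator `&&` inside a bracket expression, while the `||` union above is only a proposal and not Ruby syntax today. As a sketch, intersection gives a direct way to see which characters two properties share:

```ruby
# Matches only characters that are both \p{Word} and \p{S}; Onigmo's `&&`
# intersects the two sets inside one bracket expression.
SHARED = /[\p{Word}&&\p{S}]/

# "a", "+", "€" and "z" each belong to at most one of the two properties;
# Ⓐ (U+24B6) and 🅰 (U+1F170) belong to both.
text = "a+\u24B6\u20AC\u{1F170}z"

p text.scan(SHARED)
```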

----------------------------------------
Bug #21870: Regexp: Warnings when using slightly overlapping \p{...} classes
https://bugs.ruby-lang.org/issues/21870#change-116338

* Author: jneen (Jeanine Adkisson)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------

* [ruby-core:124739] [Ruby Bug#21870] Regexp: Warnings when using slightly overlapping \p{...} classes
  2026-02-08  6:41 [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes jneen (Jeanine Adkisson) via ruby-core
                   ` (6 preceding siblings ...)
  2026-02-09 15:49 ` [ruby-core:124737] " kddnewton (Kevin Newton) via ruby-core
@ 2026-02-09 17:42 ` jneen (Jeanine Adkisson) via ruby-core
  2026-02-10  4:58 ` [ruby-core:124750] " maxfelsher (Max Felsher) via ruby-core
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: jneen (Jeanine Adkisson) via ruby-core @ 2026-02-09 17:42 UTC (permalink / raw)
  To: ruby-core; +Cc: jneen (Jeanine Adkisson)

Issue #21870 has been updated by jneen (Jeanine Adkisson).


trinistr (Alexander Bulancov) wrote in #note-11:
> > Using `/(\p{Word}|\p{S})/` is kind of a workaround, but it is slower.
> 
> Have you tried a non-capturing group? `/(?:\p{Word}|\p{S})/` should have better performance.

This is what I actually tested. Still much slower.

mame (Yusuke Endoh) wrote in #note-9:
> jneen (Jeanine Adkisson) wrote in #note-7:
> > That specific case also appears to have changed, e.g. on 3.4.1:
> 
> It is an intentional bug fix. See #21503.
> 
> While I understand your trouble, this warning is functioning exactly as intended. How do you suggest resolving it?


I suppose the question is - what is the purpose of a warning here? What fix are you asking the code author to implement? If my downstream users are running with warnings on and Ruby prints 1000 lines of warnings loading my library, what exactly am I being warned about?

Is there a specific danger to using overlapping character classes? Or should this kind of thing live in a linter like RuboCop?
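Until the behavior changes, one stopgap (not something proposed in this thread) is to filter this single message out in `Warning.warn`, which Ruby calls for its warnings and which can be customized by extending the `Warning` module. A minimal sketch; `QuietDuplicatedRange` is a hypothetical name. Two caveats: because regexp literals are checked when a file is compiled, the filter must be installed before the file containing the regexp is loaded, and it is process-wide, so it also hides the warning for any other code in the process.

```ruby
# Drop only the "duplicated range" warning; everything else is passed on
# unchanged to the next `warn` in the chain (normally Warning's default).
module QuietDuplicatedRange
  MESSAGE = "character class has duplicated range"

  def warn(message, *args, **kwargs)
    return nil if message.include?(MESSAGE)

    super(message, *args, **kwargs)
  end
end

Warning.extend(QuietDuplicatedRange)
```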

----------------------------------------
Bug #21870: Regexp: Warnings when using slightly overlapping \p{...} classes
https://bugs.ruby-lang.org/issues/21870#change-116340

* Author: jneen (Jeanine Adkisson)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------

* [ruby-core:124750] [Ruby Bug#21870] Regexp: Warnings when using slightly overlapping \p{...} classes
  2026-02-08  6:41 [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes jneen (Jeanine Adkisson) via ruby-core
                   ` (7 preceding siblings ...)
  2026-02-09 17:42 ` [ruby-core:124739] " jneen (Jeanine Adkisson) via ruby-core
@ 2026-02-10  4:58 ` maxfelsher (Max Felsher) via ruby-core
  2026-02-10 13:15 ` [ruby-core:124761] " jneen (Jeanine Adkisson) via ruby-core
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: maxfelsher (Max Felsher) via ruby-core @ 2026-02-10  4:58 UTC (permalink / raw)
  To: ruby-core; +Cc: maxfelsher (Max Felsher)

Issue #21870 has been updated by maxfelsher (Max Felsher).


If I'm reading the history right, the warning was added in #1831 in order to catch mistakes like a regexp defined as `/[:lower:]/` (as opposed to `/[[:lower:]]/`, I assume). I can see the value in that, but it does seem like there should be a way to list overlapping character classes without a warning (and without turning warnings off completely).

----------------------------------------
Bug #21870: Regexp: Warnings when using slightly overlapping \p{...} classes
https://bugs.ruby-lang.org/issues/21870#change-116352

* Author: jneen (Jeanine Adkisson)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------

* [ruby-core:124761] [Ruby Bug#21870] Regexp: Warnings when using slightly overlapping \p{...} classes
  2026-02-08  6:41 [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes jneen (Jeanine Adkisson) via ruby-core
                   ` (8 preceding siblings ...)
  2026-02-10  4:58 ` [ruby-core:124750] " maxfelsher (Max Felsher) via ruby-core
@ 2026-02-10 13:15 ` jneen (Jeanine Adkisson) via ruby-core
  2026-02-10 15:32 ` [ruby-core:124764] " jneen (Jeanine Adkisson) via ruby-core
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: jneen (Jeanine Adkisson) via ruby-core @ 2026-02-10 13:15 UTC (permalink / raw)
  To: ruby-core; +Cc: jneen (Jeanine Adkisson)

Issue #21870 has been updated by jneen (Jeanine Adkisson).


That's a very interesting find!

I do think it makes sense to warn if an explicitly written character repeats in a character class, or if the class begins and ends with a colon. But for overlapping unicode properties, there doesn't seem to be any danger in including both in a character class.

That said, there's still an argument that all of this is a job for a linter. RuboCop didn't exist yet when #1831 was opened.
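To illustrate how narrow a #1831-style check could be, here is a rough pure-Ruby sketch (hypothetical helper names, not Onigmo's parser) that flags a top-level bracket expression whose body both starts and ends with `:`, as in `[:lower:]`, while leaving real POSIX brackets like `[[:lower:]]` and property classes like `[\p{Word}\p{S}]` alone. It skips corner cases such as a literal `]` placed right after `[`.

```ruby
# Return the raw bodies of top-level character classes in +source+.
# Simplified: backslash escapes are skipped, nested [...] are tracked by
# depth, and a literal "]" right after "[" is not special-cased.
def top_level_class_bodies(source)
  bodies = []
  depth = 0
  start = nil
  i = 0
  while i < source.length
    case source[i]
    when "\\"
      i += 1 # skip the escaped character as well
    when "["
      start = i + 1 if depth.zero?
      depth += 1
    when "]"
      if depth.positive?
        depth -= 1
        bodies << source[start...i] if depth.zero?
      end
    end
    i += 1
  end
  bodies
end

# Flag classes like [:lower:] that were probably meant as [[:lower:]].
def suspicious_posix_class?(regexp)
  top_level_class_bodies(regexp.source).any? do |body|
    body.length >= 2 && body.start_with?(":") && body.end_with?(":")
  end
end

p suspicious_posix_class?(/[:lower:]/)
p suspicious_posix_class?(/[[:lower:]]/)
p suspicious_posix_class?(/[\p{Word}\p{S}]/)
```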

----------------------------------------
Bug #21870: Regexp: Warnings when using slightly overlapping \p{...} classes
https://bugs.ruby-lang.org/issues/21870#change-116368

* Author: jneen (Jeanine Adkisson)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------

* [ruby-core:124764] [Ruby Bug#21870] Regexp: Warnings when using slightly overlapping \p{...} classes
  2026-02-08  6:41 [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes jneen (Jeanine Adkisson) via ruby-core
                   ` (9 preceding siblings ...)
  2026-02-10 13:15 ` [ruby-core:124761] " jneen (Jeanine Adkisson) via ruby-core
@ 2026-02-10 15:32 ` jneen (Jeanine Adkisson) via ruby-core
  2026-02-17 20:27 ` [ruby-core:124846] " jneen (Jeanine Adkisson) via ruby-core
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: jneen (Jeanine Adkisson) via ruby-core @ 2026-02-10 15:32 UTC (permalink / raw)
  To: ruby-core; +Cc: jneen (Jeanine Adkisson)

Issue #21870 has been updated by jneen (Jeanine Adkisson).


Some benchmarks:

```console
$ ruby --version
ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [arm64-darwin25]
```

```ruby
require 'benchmark'

LENGTH = 1000000
REPEAT = 100
TEST_STR = 'a' * LENGTH

Benchmark.bm do |bm|
  bm.report "char class:" do
    REPEAT.times { /[\p{Word}\p{S}]*/o.match?(TEST_STR) }
  end

  bm.report "alternation:" do
    REPEAT.times { /(?:\p{Word}|\p{S})*/o.match?(TEST_STR) }
  end
end
```

output:
```
                  user     system      total        real
char class:   0.634908   0.302112   0.937020 (  0.937089)
alternation:  0.983069   0.449849   1.432918 (  1.433005)
```

The alternation syntax is understandably a bit slower, as it would be two nodes in the state machine rather than one unified range test. I expect this effect would be worse when more unicode properties are piled on (as they tend to be in practice), resulting in extra nodes.

Either way, `/[\p{Word}\p{S}]/` is a perfectly valid regular expression that as far as I know doesn't have any practical issues, so I don't think it is helpful to warn.

----------------------------------------
Bug #21870: Regexp: Warnings when using slightly overlapping \p{...} classes
https://bugs.ruby-lang.org/issues/21870#change-116371

* Author: jneen (Jeanine Adkisson)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------

* [ruby-core:124846] [Ruby Bug#21870] Regexp: Warnings when using slightly overlapping \p{...} classes
  2026-02-08  6:41 [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes jneen (Jeanine Adkisson) via ruby-core
                   ` (10 preceding siblings ...)
  2026-02-10 15:32 ` [ruby-core:124764] " jneen (Jeanine Adkisson) via ruby-core
@ 2026-02-17 20:27 ` jneen (Jeanine Adkisson) via ruby-core
  2026-02-24  5:41 ` [ruby-core:124875] " jneen (Jeanine Adkisson) via ruby-core
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: jneen (Jeanine Adkisson) via ruby-core @ 2026-02-17 20:27 UTC (permalink / raw)
  To: ruby-core; +Cc: jneen (Jeanine Adkisson)

Issue #21870 has been updated by jneen (Jeanine Adkisson).


This isn't even possible to work around by targeting RUBY_VERSION, as Ruby warns even in unreachable cases:

```ruby
regex = if RUBY_VERSION < '4'
  /[\p{Word}\p{Cf}]/
else
  /[\p{Word}]/
end
```

This still warns on Ruby 4+, even though the first branch is never reached there: regexp literals are compiled, and checked, when the file is parsed, not when the branch runs.
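A possible workaround, sketched here rather than taken from the thread: build the version-specific pattern from a string with `Regexp.new`. The string is an ordinary string literal, so nothing is compiled as a regexp until `Regexp.new` actually runs, and a Ruby that never takes that branch never sees the pattern. `WORD_OR_FORMAT` is a hypothetical name; the `RUBY_VERSION < "4"` string comparison mirrors the example above and would need revisiting for a two-digit major version.

```ruby
# Regexp.new compiles the pattern only when this branch executes, so a
# Ruby that takes the else branch never parses the \p{Word}\p{Cf} class.
WORD_OR_FORMAT =
  if RUBY_VERSION < "4"
    Regexp.new('[\p{Word}\p{Cf}]')
  else
    /\p{Word}/
  end

p WORD_OR_FORMAT.match?("\u200D") # ZWJ (U+200D)
```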

----------------------------------------
Bug #21870: Regexp: Warnings when using slightly overlapping \p{...} classes
https://bugs.ruby-lang.org/issues/21870#change-116499

* Author: jneen (Jeanine Adkisson)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------
```ruby
$VERBOSE = true
# warning: character class has duplicated range: /[\p{Word}\p{S}]/
regex = /[\p{Word}\p{S}]/
```

As far as I can tell this is a perfectly valid and non-redundant set of unicode properties, but I am still being spammed with warnings. Using `/(?:\p{Word}|\p{S})/` is kind of a workaround, but it is slower (see benchmarks below), and also less clear.

They do overlap somewhat, but I think the deeper issue is there is not a convenient way to express this without falling back to raw unicode ranges.

For a similar example, consider `/[\p{Word}\p{Cf}]/`, where the two properties overlap precisely on ZWJ and ZWNJ. Even with this very small overlap, Ruby issues a warning, despite neither class being removable without changing the meaning of the regexp. The regexp is valid and as far as I can tell has no practical issues: Onigmo seems perfectly capable of combining overlapping codepoint ranges.

This warning was introduced back in 2009 with #1831, to help surface instances of things like `/[:lower:]/` instead of `/[[:lower:]]/`, but even then the reporter suggested only warning if the class both begins and ends with `:`.

Is it appropriate to warn here? Is this a job best left to a static linter like RuboCop, which didn't exist at the time #1831 was opened? Or would it perhaps be better to warn only in the very specific case that #1831 was opened to address?

* [ruby-core:124875] [Ruby Bug#21870] Regexp: Warnings when using slightly overlapping \p{...} classes
  2026-02-08  6:41 [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes jneen (Jeanine Adkisson) via ruby-core
                   ` (11 preceding siblings ...)
  2026-02-17 20:27 ` [ruby-core:124846] " jneen (Jeanine Adkisson) via ruby-core
@ 2026-02-24  5:41 ` jneen (Jeanine Adkisson) via ruby-core
  2026-02-24  5:52 ` [ruby-core:124876] " jneen (Jeanine Adkisson) via ruby-core
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: jneen (Jeanine Adkisson) via ruby-core @ 2026-02-24  5:41 UTC (permalink / raw)
  To: ruby-core; +Cc: jneen (Jeanine Adkisson)

Issue #21870 has been updated by jneen (Jeanine Adkisson).


Having looked through the Onigmo code a bit now, I can think of a few ways forward.

**a) Simply don't warn on overlapping ctype classes.**

I believe this would only involve removing the check on line 1860 of regparse.c. It would preserve the warning for `/[:foo:]/`, as in #1831, as well as for possibly rarer situations like `/[a-fb-g]/`. It would *not* warn on cases like `/[a-z\p{Word}]/` or `/[\p{Alnum}\p{Word}]/`. I'm not entirely sure whether this is a common enough mistake to warrant a warning. I will also check the performance characteristics of these, in case overlapping ranges are a performance issue (which I doubt, but I think it is best to check).
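You can capture the warning programmatically to see which patterns trigger it, for example to compare behaviour before and after a patch like (a). This is a minimal sketch. `duplicated_range_warning?` is a hypothetical helper, and it assumes the default `Warning.warn`, which writes to `$stderr`.

```ruby
require 'stringio'

# Hypothetical helper: compile +source+ with $VERBOSE = true and report
# whether the "duplicated range" warning was emitted. Assumes warnings go
# through the default Warning.warn, which writes to $stderr.
def duplicated_range_warning?(source)
  captured = StringIO.new
  old_stderr, old_verbose = $stderr, $VERBOSE
  begin
    $stderr, $VERBOSE = captured, true
    Regexp.new(source)
  ensure
    # Always restore the real stderr and verbosity level.
    $stderr, $VERBOSE = old_stderr, old_verbose
  end
  captured.string.include?("character class has duplicated range")
end

p duplicated_range_warning?("[a-fb-g]") # overlapping literal ranges
p duplicated_range_warning?("[a-z]")    # a single range
```

On the versions reported in this issue, `duplicated_range_warning?('[\p{Word}\p{S}]')` would be expected to return true. Under strategy (a) it should become false, while `[a-fb-g]` keeps warning.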

**b) Find a way to check if a character class or range completely subsumes another.**

I honestly am not sure how I would go about implementing this, as it is a much deeper check that would require a greater understanding of Onigmo internals than I have so far. The idea would be to warn on `/[a-z\p{Word}]/` but *not* on e.g. `/[_-z\p{Word}]/`, since the range `_-z` contains a character (U+0060 GRAVE ACCENT) not matched by `\p{Word}`. This would also catch `/[\p{Alnum}\p{Word}]/`.
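At its core, strategy (b) is a set-containment test. Here is a minimal sketch, assuming each member of a class can be expanded into a list of codepoint ranges. The helper names are hypothetical, and so are the ASCII-only stand-ins for `\p{Word}` and `\p{Alnum}`; the real properties span all of Unicode. The sketch also does not reflect how Onigmo stores character classes internally.

```ruby
# Sort ranges and merge any that overlap or touch, giving a disjoint list.
def normalize(ranges)
  ranges.sort_by(&:begin).each_with_object([]) do |r, merged|
    if !merged.empty? && r.begin <= merged.last.end + 1
      merged[-1] = merged.last.begin..[merged.last.end, r.end].max
    else
      merged << r
    end
  end
end

# True if every codepoint in +inner+ also appears in +outer+.
def subsumes?(outer, inner)
  outer = normalize(outer)
  # After merging, a contiguous covered range must fit inside one outer range.
  normalize(inner).all? { |r| outer.any? { |o| o.begin <= r.begin && r.end <= o.end } }
end

# Indices of class members that add nothing beyond the union of the others.
def redundant_members(members)
  members.each_index.select do |i|
    others = members.each_with_index.reject { |_, j| j == i }.flat_map(&:first)
    subsumes?(others, members[i])
  end
end

# Hypothetical ASCII-only stand-ins for the real Unicode properties.
WORD_ASCII  = [48..57, 65..90, 95..95, 97..122] # 0-9 A-Z _ a-z
ALNUM_ASCII = [48..57, 65..90, 97..122]         # 0-9 A-Z a-z

p redundant_members([[97..122], WORD_ASCII])   # like [a-z\p{Word}]      -> [0]
p redundant_members([[95..122], WORD_ASCII])   # like [_-z\p{Word}]      -> []
p redundant_members([ALNUM_ASCII, WORD_ASCII]) # like [\p{Alnum}\p{Word}] -> [0]
```

The check depends on merging adjacent ranges first. Once they are merged, any contiguous range that is fully covered must lie inside a single merged range. As sketched, this rule would not warn on `/[a-fb-g]/`, since neither range covers the other.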

**c) Rethink the overlapping-character warning entirely, and (maybe) target things like `/[:x:]/` more specifically.**

This would involve warning only if the first and last characters of a character class are a literal `:`. As with (a), it may turn out that repeated characters in classes are not a performance or correctness issue worth warning about at all. But this is a judgment I leave to the team.
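A check along the lines of (c) only needs to look at where a bracket expression starts and ends. The following is a rough sketch that scans the regexp *source*. The method name is hypothetical, and the scanner ignores details such as extended-mode comments or a leading literal `]`. It illustrates the rule only; it is not how Onigmo would implement it.

```ruby
# Flag top-level bracket expressions whose body (ignoring a leading '^')
# starts and ends with ':', e.g. "[:lower:]", which was most likely meant
# to be "[[:lower:]]". Only backslash escapes and bracket nesting are tracked.
def suspicious_posix_classes(source)
  found = []
  depth = 0
  start = nil
  i = 0
  while i < source.length
    c = source[i]
    if c == "\\"
      i += 2 # skip the backslash and the character it escapes
      next
    end
    if c == "["
      start = i if depth.zero?
      depth += 1
    elsif c == "]" && depth > 0
      depth -= 1
      if depth.zero?
        body = source[(start + 1)...i].delete_prefix("^")
        # Require two or more characters so "[:]" (just a colon) is not flagged.
        found << source[start..i] if body.length >= 2 && body.start_with?(":") && body.end_with?(":")
      end
    end
    i += 1
  end
  found
end

p suspicious_posix_classes("[:lower:]")       # ["[:lower:]"]
p suspicious_posix_classes("[[:lower:]]")     # []
p suspicious_posix_classes('[\p{Word}\p{S}]') # []
```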

----------------------------------------
Bug #21870: Regexp: Warnings when using slightly overlapping \p{...} classes
https://bugs.ruby-lang.org/issues/21870#change-116534

* Author: jneen (Jeanine Adkisson)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [ruby-core:124876] [Ruby Bug#21870] Regexp: Warnings when using slightly overlapping \p{...} classes
  2026-02-08  6:41 [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes jneen (Jeanine Adkisson) via ruby-core
                   ` (12 preceding siblings ...)
  2026-02-24  5:41 ` [ruby-core:124875] " jneen (Jeanine Adkisson) via ruby-core
@ 2026-02-24  5:52 ` jneen (Jeanine Adkisson) via ruby-core
  2026-02-25  8:23 ` [ruby-core:124881] " duerst via ruby-core
  2026-03-05  4:03 ` [ruby-core:124927] " jneen (Jeanine Adkisson) via ruby-core
  15 siblings, 0 replies; 17+ messages in thread
From: jneen (Jeanine Adkisson) via ruby-core @ 2026-02-24  5:52 UTC (permalink / raw)
  To: ruby-core; +Cc: jneen (Jeanine Adkisson)

Issue #21870 has been updated by jneen (Jeanine Adkisson).


A quick benchmark shows that matching performance is within the margin of error whether or not the character class contains repeated ranges:

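```ruby
#!/usr/bin/env ruby

require 'benchmark'

# The same one-range character class, built once with "a-z" and once with
# "a-z" repeated 100,000 times (every copy after the first is a duplicate).
NON_REPEAT = Regexp.new("[" + ("a-z" * 1) + "]")
YES_REPEAT = Regexp.new("[" + ("a-z" * 100_000) + "]")

# Match a single character one million times against each regexp.
Benchmark.bm do |bm|
  bm.report('non-repeat') { 1_000_000.times { NON_REPEAT.match?('a') } }
  bm.report('yes-repeat') { 1_000_000.times { YES_REPEAT.match?('a') } }
end
```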

Output:
```
; ruby /tmp/regex-test 
                user     system      total        real
non-repeat  0.105758   0.000233   0.105991 (  0.106004)
yes-repeat  0.103658   0.000223   0.103881 (  0.103881)
```
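The report also says that the alternation workaround is slower than the character class. The sketch below times both spellings on the same inputs so they can be compared directly. It is not the benchmark the report refers to, and no numbers are claimed here because results depend on the Ruby build and machine.

```ruby
require 'benchmark'

# The two spellings from the report: class form vs. alternation workaround.
WORD_OR_SYMBOL_CLASS = /[\p{Word}\p{S}]/
WORD_OR_SYMBOL_ALT   = /(?:\p{Word}|\p{S})/

# A word character, a symbol, and a character that matches neither.
INPUTS = ["a", "+", " "].freeze
N = 200_000

Benchmark.bm(12) do |bm|
  bm.report('char class')  { N.times { INPUTS.each { |s| WORD_OR_SYMBOL_CLASS.match?(s) } } }
  bm.report('alternation') { N.times { INPUTS.each { |s| WORD_OR_SYMBOL_ALT.match?(s) } } }
end
```

The inputs include a non-matching character because a failed match has to try both branches of the alternation. That case may show a different cost from a successful one.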

----------------------------------------
Bug #21870: Regexp: Warnings when using slightly overlapping \p{...} classes
https://bugs.ruby-lang.org/issues/21870#change-116535

* Author: jneen (Jeanine Adkisson)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [ruby-core:124881] [Ruby Bug#21870] Regexp: Warnings when using slightly overlapping \p{...} classes
  2026-02-08  6:41 [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes jneen (Jeanine Adkisson) via ruby-core
                   ` (13 preceding siblings ...)
  2026-02-24  5:52 ` [ruby-core:124876] " jneen (Jeanine Adkisson) via ruby-core
@ 2026-02-25  8:23 ` duerst via ruby-core
  2026-03-05  4:03 ` [ruby-core:124927] " jneen (Jeanine Adkisson) via ruby-core
  15 siblings, 0 replies; 17+ messages in thread
From: duerst via ruby-core @ 2026-02-25  8:23 UTC (permalink / raw)
  To: ruby-core; +Cc: duerst

Issue #21870 has been updated by duerst (Martin Dürst).


Using two or more overlapping Unicode properties may not be very common, but in most cases it isn't a mistake. If a user writes `/[\p{Word}\p{S}]/`, that expression should just match all word characters and all symbol characters, because that's most probably what the user wanted. The fact that some characters are both word characters and symbol characters is irrelevant for that query and should not produce a warning. Many Unicode properties overlap, because they identify different aspects of characters (e.g. script, block, age, numeric properties, ...). If we want to continue to warn about `/[:lower:]/`, that's fine, but we should warn about that specific case, not about overlapping properties in general.

----------------------------------------
Bug #21870: Regexp: Warnings when using slightly overlapping \p{...} classes
https://bugs.ruby-lang.org/issues/21870#change-116543

* Author: jneen (Jeanine Adkisson)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [ruby-core:124927] [Ruby Bug#21870] Regexp: Warnings when using slightly overlapping \p{...} classes
  2026-02-08  6:41 [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes jneen (Jeanine Adkisson) via ruby-core
                   ` (14 preceding siblings ...)
  2026-02-25  8:23 ` [ruby-core:124881] " duerst via ruby-core
@ 2026-03-05  4:03 ` jneen (Jeanine Adkisson) via ruby-core
  15 siblings, 0 replies; 17+ messages in thread
From: jneen (Jeanine Adkisson) via ruby-core @ 2026-03-05  4:03 UTC (permalink / raw)
  To: ruby-core; +Cc: jneen (Jeanine Adkisson)

Issue #21870 has been updated by jneen (Jeanine Adkisson).


If there are no objections, I'll submit a patch with strategy (a) next week. It's straightforward to implement and stays as close as possible to the current behaviour while fixing the issue.

----------------------------------------
Bug #21870: Regexp: Warnings when using slightly overlapping \p{...} classes
https://bugs.ruby-lang.org/issues/21870#change-116587

* Author: jneen (Jeanine Adkisson)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2026-03-05  4:04 UTC | newest]

Thread overview: 17+ messages
2026-02-08  6:41 [ruby-core:124714] [Ruby Bug#21870] Regexp: Warnings when using multiple non-overlapping \p{...} classes jneen (Jeanine Adkisson) via ruby-core
2026-02-08 19:00 ` [ruby-core:124718] " tompng (tomoya ishida) via ruby-core
2026-02-08 19:03 ` [ruby-core:124719] " jneen (Jeanine Adkisson) via ruby-core
2026-02-09  5:50 ` [ruby-core:124724] [Ruby Bug#21870] Regexp: Warnings when using slightly overlapping " jneen (Jeanine Adkisson) via ruby-core
2026-02-09  5:54 ` [ruby-core:124725] " jneen (Jeanine Adkisson) via ruby-core
2026-02-09  8:10 ` [ruby-core:124728] " mame (Yusuke Endoh) via ruby-core
2026-02-09 15:44 ` [ruby-core:124736] " trinistr (Alexander Bulancov) via ruby-core
2026-02-09 15:49 ` [ruby-core:124737] " kddnewton (Kevin Newton) via ruby-core
2026-02-09 17:42 ` [ruby-core:124739] " jneen (Jeanine Adkisson) via ruby-core
2026-02-10  4:58 ` [ruby-core:124750] " maxfelsher (Max Felsher) via ruby-core
2026-02-10 13:15 ` [ruby-core:124761] " jneen (Jeanine Adkisson) via ruby-core
2026-02-10 15:32 ` [ruby-core:124764] " jneen (Jeanine Adkisson) via ruby-core
2026-02-17 20:27 ` [ruby-core:124846] " jneen (Jeanine Adkisson) via ruby-core
2026-02-24  5:41 ` [ruby-core:124875] " jneen (Jeanine Adkisson) via ruby-core
2026-02-24  5:52 ` [ruby-core:124876] " jneen (Jeanine Adkisson) via ruby-core
2026-02-25  8:23 ` [ruby-core:124881] " duerst via ruby-core
2026-03-05  4:03 ` [ruby-core:124927] " jneen (Jeanine Adkisson) via ruby-core

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).