* [ruby-core:122665] [Ruby Bug#21503] \p{Word} does not match on \p{Join_Control} while docs say it does
@ 2025-07-07 18:02 procmarco (Marco Concetto Rudilosso) via ruby-core
2025-07-07 18:39 ` [ruby-core:122666] " procmarco (Marco Concetto Rudilosso) via ruby-core
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: procmarco (Marco Concetto Rudilosso) via ruby-core @ 2025-07-07 18:02 UTC (permalink / raw)
To: ruby-core; +Cc: procmarco (Marco Concetto Rudilosso)
Issue #21503 has been reported by procmarco (Marco Concetto Rudilosso).
----------------------------------------
Bug #21503: \p{Word} does not match on \p{Join_Control} while docs say it does
https://bugs.ruby-lang.org/issues/21503
* Author: procmarco (Marco Concetto Rudilosso)
* Status: Open
* ruby -v: 3.4.4
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
The [docs](https://ruby-doc.org/3.4.1/Regexp.html#:~:text=/%5Cp%7B-,Word,-%7D/%3A%20A%20member) state that `\p{Word}` is equivalent to `[\p{M}\p{Nd}\p{Pc}\p{Alpha}\p{Join_Control}]`, which is also how the [Unicode spec](https://unicode.org/reports/tr18/#word) defines it.
However, this does not seem to be the case:
```
irb(main):018> REGEX = /\p{Word}/u
=> /\p{Word}/
irb(main):019> "\u200D".gsub(REGEX, "-")
=> ""
irb(main):020> REGEX2 = /\p{Join_Control}/u
=> /\p{Join_Control}/
irb(main):021> "\u200D".gsub(REGEX2, "-")
=> "-"
```
There are two possible solutions here: change either the docs or the code.
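As a rough cross-check of the report, the sketch below compares the built-in `\p{Word}` with the union quoted from the docs for the two `Join_Control` characters (U+200C and U+200D). The constant and method names are illustrative only and are not part of Ruby or of the original report. The `\p{Word}` column depends on whether the running Ruby already contains the fix, so no particular output is assumed.

```ruby
# Character class spelled out as quoted from the docs.
DOCUMENTED_WORD = /[\p{M}\p{Nd}\p{Pc}\p{Alpha}\p{Join_Control}]/u
# Built-in property under test.
BUILTIN_WORD = /\p{Word}/u

# The two code points that carry the Join_Control property.
JOIN_CONTROLS = { "ZWNJ" => "\u200C", "ZWJ" => "\u200D" }

# Hypothetical helper: for each character, record whether each pattern matches it.
def word_report(chars)
  chars.to_h do |name, ch|
    [name, { builtin: BUILTIN_WORD.match?(ch), documented: DOCUMENTED_WORD.match?(ch) }]
  end
end

REPORT = word_report(JOIN_CONTROLS)
REPORT.each do |name, r|
  puts "#{name}: \\p{Word}=#{r[:builtin]} documented union=#{r[:documented]}"
end
```

If the two columns differ for either character, the implementation and the documented definition disagree, which is the situation described in this report.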
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
* [ruby-core:122666] [Ruby Bug#21503] \p{Word} does not match on \p{Join_Control} while docs say it does
2025-07-07 18:02 [ruby-core:122665] [Ruby Bug#21503] \p{Word} does not match on \p{Join_Control} while docs say it does procmarco (Marco Concetto Rudilosso) via ruby-core
@ 2025-07-07 18:39 ` procmarco (Marco Concetto Rudilosso) via ruby-core
2025-07-08 2:12 ` [ruby-core:122669] " mame (Yusuke Endoh) via ruby-core
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: procmarco (Marco Concetto Rudilosso) via ruby-core @ 2025-07-07 18:39 UTC (permalink / raw)
To: ruby-core; +Cc: procmarco (Marco Concetto Rudilosso)
Issue #21503 has been updated by procmarco (Marco Concetto Rudilosso).
To clarify: the current implementation of `\p{Word}` does not seem to match the characters in `\p{Join_Control}`, even though it should, and the docs say that it does.
----------------------------------------
Bug #21503: \p{Word} does not match on \p{Join_Control} while docs say it does
https://bugs.ruby-lang.org/issues/21503#change-113944
* Author: procmarco (Marco Concetto Rudilosso)
* Status: Open
* ruby -v: 3.4.4
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
* [ruby-core:122669] [Ruby Bug#21503] \p{Word} does not match on \p{Join_Control} while docs say it does
2025-07-07 18:02 [ruby-core:122665] [Ruby Bug#21503] \p{Word} does not match on \p{Join_Control} while docs say it does procmarco (Marco Concetto Rudilosso) via ruby-core
2025-07-07 18:39 ` [ruby-core:122666] " procmarco (Marco Concetto Rudilosso) via ruby-core
@ 2025-07-08 2:12 ` mame (Yusuke Endoh) via ruby-core
2025-07-10 9:39 ` [ruby-core:122718] " naruse (Yui NARUSE) via ruby-core
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: mame (Yusuke Endoh) via ruby-core @ 2025-07-08 2:12 UTC (permalink / raw)
To: ruby-core; +Cc: mame (Yusuke Endoh)
Issue #21503 has been updated by mame (Yusuke Endoh).
There is already a PR for that: https://github.com/ruby/ruby/pull/7711
Can you take a look? @duerst @naruse
----------------------------------------
Bug #21503: \p{Word} does not match on \p{Join_Control} while docs say it does
https://bugs.ruby-lang.org/issues/21503#change-113949
* Author: procmarco (Marco Concetto Rudilosso)
* Status: Open
* ruby -v: 3.4.4
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
* [ruby-core:122718] [Ruby Bug#21503] \p{Word} does not match on \p{Join_Control} while docs say it does
2025-07-07 18:02 [ruby-core:122665] [Ruby Bug#21503] \p{Word} does not match on \p{Join_Control} while docs say it does procmarco (Marco Concetto Rudilosso) via ruby-core
2025-07-07 18:39 ` [ruby-core:122666] " procmarco (Marco Concetto Rudilosso) via ruby-core
2025-07-08 2:12 ` [ruby-core:122669] " mame (Yusuke Endoh) via ruby-core
@ 2025-07-10 9:39 ` naruse (Yui NARUSE) via ruby-core
2025-07-11 0:26 ` [ruby-core:122720] " hsbt (Hiroshi SHIBATA) via ruby-core
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: naruse (Yui NARUSE) via ruby-core @ 2025-07-10 9:39 UTC (permalink / raw)
To: ruby-core; +Cc: naruse (Yui NARUSE)
Issue #21503 has been updated by naruse (Yui NARUSE).
It looks like `\p{Word}` was updated in TR#18 Version 15.
https://www.unicode.org/reports/tr18/tr18-15.html
The fix looks good.
----------------------------------------
Bug #21503: \p{Word} does not match on \p{Join_Control} while docs say it does
https://bugs.ruby-lang.org/issues/21503#change-113995
* Author: procmarco (Marco Concetto Rudilosso)
* Status: Open
* ruby -v: 3.4.4
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
* [ruby-core:122720] [Ruby Bug#21503] \p{Word} does not match on \p{Join_Control} while docs say it does
2025-07-07 18:02 [ruby-core:122665] [Ruby Bug#21503] \p{Word} does not match on \p{Join_Control} while docs say it does procmarco (Marco Concetto Rudilosso) via ruby-core
` (2 preceding siblings ...)
2025-07-10 9:39 ` [ruby-core:122718] " naruse (Yui NARUSE) via ruby-core
@ 2025-07-11 0:26 ` hsbt (Hiroshi SHIBATA) via ruby-core
2025-07-14 21:57 ` [ruby-core:122772] " k0kubun (Takashi Kokubun) via ruby-core
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: hsbt (Hiroshi SHIBATA) via ruby-core @ 2025-07-11 0:26 UTC (permalink / raw)
To: ruby-core; +Cc: hsbt (Hiroshi SHIBATA)
Issue #21503 has been updated by hsbt (Hiroshi SHIBATA).
Status changed from Open to Closed
Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN to 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: REQUIRED
https://github.com/ruby/ruby/pull/7711 has been merged
----------------------------------------
Bug #21503: \p{Word} does not match on \p{Join_Control} while docs say it does
https://bugs.ruby-lang.org/issues/21503#change-113999
* Author: procmarco (Marco Concetto Rudilosso)
* Status: Closed
* ruby -v: 3.4.4
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: REQUIRED
----------------------------------------
* [ruby-core:122772] [Ruby Bug#21503] \p{Word} does not match on \p{Join_Control} while docs say it does
2025-07-07 18:02 [ruby-core:122665] [Ruby Bug#21503] \p{Word} does not match on \p{Join_Control} while docs say it does procmarco (Marco Concetto Rudilosso) via ruby-core
` (3 preceding siblings ...)
2025-07-11 0:26 ` [ruby-core:122720] " hsbt (Hiroshi SHIBATA) via ruby-core
@ 2025-07-14 21:57 ` k0kubun (Takashi Kokubun) via ruby-core
2025-08-21 9:43 ` [ruby-core:123015] " naruse (Yui NARUSE) via ruby-core
2025-08-27 22:29 ` [ruby-core:123096] " alanwu (Alan Wu) via ruby-core
6 siblings, 0 replies; 8+ messages in thread
From: k0kubun (Takashi Kokubun) via ruby-core @ 2025-07-14 21:57 UTC (permalink / raw)
To: ruby-core; +Cc: k0kubun (Takashi Kokubun)
Issue #21503 has been updated by k0kubun (Takashi Kokubun).
The patch to master modified `CR_Word` in `enc/unicode/16.0.0/name2ctype.h`, but Ruby 3.4 uses `enc/unicode/15.0.0/name2ctype.h`, whose `CR_Word` has different content. I'm not sure how to backport this properly. Could @procmarco or anybody else have a look at making a backport PR to the ruby_3_4 branch on GitHub?
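For anyone checking a build before or after a backport, a minimal sketch like the following reports which Unicode data version the running interpreter was built with and whether `\p{Word}` already covers the `Join_Control` characters. The helper and constant names are hypothetical, and `RbConfig::CONFIG["UNICODE_VERSION"]` is assumed to be available (it may be `nil` on some Ruby implementations).

```ruby
require "rbconfig"

# Unicode data version reported by the build configuration (nil if not provided).
UNICODE_DATA_VERSION = RbConfig::CONFIG["UNICODE_VERSION"]

# Hypothetical helper: true when \p{Word} matches both Join_Control
# characters (U+200C ZWNJ and U+200D ZWJ).
def word_includes_join_control?
  ["\u200C", "\u200D"].all? { |ch| /\p{Word}/u.match?(ch) }
end

FIX_PRESENT = word_includes_join_control?

puts "ruby #{RUBY_VERSION}, Unicode data #{UNICODE_DATA_VERSION.inspect}, " \
     "\\p{Word} includes Join_Control: #{FIX_PRESENT}"
```

Running this on both master and a ruby_3_4 build gives a quick way to confirm that a backport changed the behavior, independent of which `name2ctype.h` directory the branch uses.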
----------------------------------------
Bug #21503: \p{Word} does not match on \p{Join_Control} while docs say it does
https://bugs.ruby-lang.org/issues/21503#change-114051
* Author: procmarco (Marco Concetto Rudilosso)
* Status: Closed
* ruby -v: 3.4.4
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: REQUIRED
----------------------------------------
* [ruby-core:123015] [Ruby Bug#21503] \p{Word} does not match on \p{Join_Control} while docs say it does
2025-07-07 18:02 [ruby-core:122665] [Ruby Bug#21503] \p{Word} does not match on \p{Join_Control} while docs say it does procmarco (Marco Concetto Rudilosso) via ruby-core
` (4 preceding siblings ...)
2025-07-14 21:57 ` [ruby-core:122772] " k0kubun (Takashi Kokubun) via ruby-core
@ 2025-08-21 9:43 ` naruse (Yui NARUSE) via ruby-core
2025-08-27 22:29 ` [ruby-core:123096] " alanwu (Alan Wu) via ruby-core
6 siblings, 0 replies; 8+ messages in thread
From: naruse (Yui NARUSE) via ruby-core @ 2025-08-21 9:43 UTC (permalink / raw)
To: ruby-core; +Cc: naruse (Yui NARUSE)
Issue #21503 has been updated by naruse (Yui NARUSE).
@k0kubun `name2ctype.h` is generated by `tool/enc-unicode.rb`. You can run it on ruby_3_4.
----------------------------------------
Bug #21503: \p{Word} does not match on \p{Join_Control} while docs say it does
https://bugs.ruby-lang.org/issues/21503#change-114319
* Author: procmarco (Marco Concetto Rudilosso)
* Status: Closed
* ruby -v: 3.4.4
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: REQUIRED
----------------------------------------
* [ruby-core:123096] [Ruby Bug#21503] \p{Word} does not match on \p{Join_Control} while docs say it does
2025-07-07 18:02 [ruby-core:122665] [Ruby Bug#21503] \p{Word} does not match on \p{Join_Control} while docs say it does procmarco (Marco Concetto Rudilosso) via ruby-core
` (5 preceding siblings ...)
2025-08-21 9:43 ` [ruby-core:123015] " naruse (Yui NARUSE) via ruby-core
@ 2025-08-27 22:29 ` alanwu (Alan Wu) via ruby-core
6 siblings, 0 replies; 8+ messages in thread
From: alanwu (Alan Wu) via ruby-core @ 2025-08-27 22:29 UTC (permalink / raw)
To: ruby-core; +Cc: alanwu (Alan Wu)
Issue #21503 has been updated by alanwu (Alan Wu).
Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: REQUIRED to 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: DONE
Backport for 3.4 done in commit:5a42d267bfabc86f86cae2e83de24b1b86bc316a
----------------------------------------
Bug #21503: \p{Word} does not match on \p{Join_Control} while docs say it does
https://bugs.ruby-lang.org/issues/21503#change-114410
* Author: procmarco (Marco Concetto Rudilosso)
* Status: Closed
* ruby -v: 3.4.4
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: DONE
----------------------------------------
Thread overview: 8+ messages
2025-07-07 18:02 [ruby-core:122665] [Ruby Bug#21503] \p{Word} does not match on \p{Join_Control} while docs say it does procmarco (Marco Concetto Rudilosso) via ruby-core
2025-07-07 18:39 ` [ruby-core:122666] " procmarco (Marco Concetto Rudilosso) via ruby-core
2025-07-08 2:12 ` [ruby-core:122669] " mame (Yusuke Endoh) via ruby-core
2025-07-10 9:39 ` [ruby-core:122718] " naruse (Yui NARUSE) via ruby-core
2025-07-11 0:26 ` [ruby-core:122720] " hsbt (Hiroshi SHIBATA) via ruby-core
2025-07-14 21:57 ` [ruby-core:122772] " k0kubun (Takashi Kokubun) via ruby-core
2025-08-21 9:43 ` [ruby-core:123015] " naruse (Yui NARUSE) via ruby-core
2025-08-27 22:29 ` [ruby-core:123096] " alanwu (Alan Wu) via ruby-core