* [ruby-core:123146] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible
@ 2025-08-31 12:52 tompng (tomoya ishida) via ruby-core
2025-08-31 21:07 ` [ruby-core:123147] " nobu (Nobuyoshi Nakada) via ruby-core
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: tompng (tomoya ishida) via ruby-core @ 2025-08-31 12:52 UTC (permalink / raw)
To: ruby-core; +Cc: tompng (tomoya ishida)
Issue #21559 has been reported by tompng (tomoya ishida).
----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559
* Author: tompng (tomoya ishida)
* Status: Open
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
I expect `nfd(nfc(str)) == nfd(str)` to hold for every string, but found strings for which it doesn't.
~~~ruby
# Ruby 3.1 - 3.5
str = "s\u{11930}\u{323}\u{11930}\u{307}"
p str.unicode_normalize(:nfd) == str.unicode_normalize(:nfc).unicode_normalize(:nfd)
#=> false
~~~
~~~ruby
# ruby 3.5.0dev
str = "s\u{1611e}\u{323}\u{1611e}\u{307}\u{1611f}"
p str.unicode_normalize(:nfd) == str.unicode_normalize(:nfc).unicode_normalize(:nfd)
#=> false
~~~
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
* [ruby-core:123147] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible
2025-08-31 12:52 [ruby-core:123146] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible tompng (tomoya ishida) via ruby-core
@ 2025-08-31 21:07 ` nobu (Nobuyoshi Nakada) via ruby-core
2025-09-01 0:50 ` [ruby-core:123148] " ima1zumi (Mari Imaizumi) via ruby-core
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2025-08-31 21:07 UTC (permalink / raw)
To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)
Issue #21559 has been updated by nobu (Nobuyoshi Nakada).
```ruby
"s\u{11930 323 11930 307}".unicode_normalize(:nfc).dump #=> "\u1E69\u{11930}\u{11930}"
"s\u{323 307}".unicode_normalize(:nfc).dump #=> "\u1E69"
```
Are U+0323 and U+0307 being composed with `s` by jumping over U+11930?
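For reference, here is a minimal sketch of the blocking rule that canonical composition is supposed to apply (Unicode Standard definition D115), assuming U+11930 has canonical combining class 0. The `CCC` table and the `blocked?` helper are hypothetical and hand-written for this illustration; they are not part of Ruby's `unicode_normalize` implementation.

```ruby
# Hand-written canonical combining classes for the marks in this report.
# Assumption: any code point not listed here (including U+11930) has ccc 0.
CCC = { 0x0323 => 220, 0x0307 => 230 }.freeze

# Hypothetical helper: true when the mark at mark_index is blocked from the
# starter at starter_index, i.e. some code point between them has ccc 0 or
# a ccc greater than or equal to the mark's own ccc.
def blocked?(codepoints, starter_index, mark_index)
  mark_ccc = CCC.fetch(codepoints[mark_index], 0)
  codepoints[(starter_index + 1)...mark_index].any? do |cp|
    ccc = CCC.fetch(cp, 0)
    ccc.zero? || ccc >= mark_ccc
  end
end

reported = "s\u{11930 323 11930 307}".codepoints
plain    = "s\u{323 307}".codepoints

p blocked?(reported, 0, 2) #=> true  (U+11930 sits between s and U+0323)
p blocked?(plain, 0, 2)    #=> false (only U+0323, ccc 220 < 230, in between)
```

If the ccc assumption holds, neither mark in the reported string may combine with `s`, so NFC would be expected to leave that sequence unchanged.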
----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-114479
* Author: tompng (tomoya ishida)
* Status: Open
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
* [ruby-core:123148] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible
2025-08-31 12:52 [ruby-core:123146] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible tompng (tomoya ishida) via ruby-core
2025-08-31 21:07 ` [ruby-core:123147] " nobu (Nobuyoshi Nakada) via ruby-core
@ 2025-09-01 0:50 ` ima1zumi (Mari Imaizumi) via ruby-core
2025-09-01 4:09 ` [ruby-core:123154] " duerst via ruby-core
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: ima1zumi (Mari Imaizumi) via ruby-core @ 2025-09-01 0:50 UTC (permalink / raw)
To: ruby-core; +Cc: ima1zumi (Mari Imaizumi)
Issue #21559 has been updated by ima1zumi (Mari Imaizumi).
Assignee set to ima1zumi (Mari Imaizumi)
This looks like a bug. Per Unicode TR15, the identity toNFD(x) == toNFD(toNFC(x)) must be maintained. https://unicode.org/reports/tr15/#Design_Goals
It seems the NFC process is combining characters across U+11930, even though its CCC is 0.
CC: @duerst
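The TR15 identity cited above can be checked over a handful of strings with a small script. This is only a sketch: `nfd_roundtrip?` and `roundtrip_failures` are hypothetical helper names, and the sample list is illustrative rather than exhaustive.

```ruby
# Hypothetical helper: true when str satisfies toNFD(x) == toNFD(toNFC(x)).
def nfd_roundtrip?(str)
  str.unicode_normalize(:nfd) == str.unicode_normalize(:nfc).unicode_normalize(:nfd)
end

# Hypothetical helper: returns the samples that violate the identity.
def roundtrip_failures(samples)
  samples.reject { |s| nfd_roundtrip?(s) }
end

samples = [
  "s\u{323}\u{307}",                   # marks directly after the base letter
  "s\u{11930}\u{323}\u{11930}\u{307}", # string from this report
]

# Prints the failing samples in escaped form; the list is empty on a build
# that does not have this bug.
p roundtrip_failures(samples).map(&:dump)
```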
----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-114480
* Author: tompng (tomoya ishida)
* Status: Open
* Assignee: ima1zumi (Mari Imaizumi)
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
* [ruby-core:123154] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible
2025-08-31 12:52 [ruby-core:123146] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible tompng (tomoya ishida) via ruby-core
2025-08-31 21:07 ` [ruby-core:123147] " nobu (Nobuyoshi Nakada) via ruby-core
2025-09-01 0:50 ` [ruby-core:123148] " ima1zumi (Mari Imaizumi) via ruby-core
@ 2025-09-01 4:09 ` duerst via ruby-core
2025-09-01 23:50 ` [ruby-core:123160] " ima1zumi (Mari Imaizumi) via ruby-core
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: duerst via ruby-core @ 2025-09-01 4:09 UTC (permalink / raw)
To: ruby-core; +Cc: duerst
Issue #21559 has been updated by duerst (Martin Dürst).
Assignee changed from ima1zumi (Mari Imaizumi) to duerst (Martin Dürst)
@ima1zumi I'm not sure this is even allowed, but I'm sure I'm responsible for this behavior and want to fix it myself, so I'm changing the Assignee to myself.
----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-114486
* Author: tompng (tomoya ishida)
* Status: Open
* Assignee: duerst (Martin Dürst)
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
* [ruby-core:123160] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible
2025-08-31 12:52 [ruby-core:123146] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible tompng (tomoya ishida) via ruby-core
` (2 preceding siblings ...)
2025-09-01 4:09 ` [ruby-core:123154] " duerst via ruby-core
@ 2025-09-01 23:50 ` ima1zumi (Mari Imaizumi) via ruby-core
2025-11-02 2:06 ` [ruby-core:123639] " duerst via ruby-core
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: ima1zumi (Mari Imaizumi) via ruby-core @ 2025-09-01 23:50 UTC (permalink / raw)
To: ruby-core; +Cc: ima1zumi (Mari Imaizumi)
Issue #21559 has been updated by ima1zumi (Mari Imaizumi).
@duerst Thank you, I appreciate you taking care of it.
----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-114496
* Author: tompng (tomoya ishida)
* Status: Open
* Assignee: duerst (Martin Dürst)
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
* [ruby-core:123639] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible
2025-08-31 12:52 [ruby-core:123146] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible tompng (tomoya ishida) via ruby-core
` (3 preceding siblings ...)
2025-09-01 23:50 ` [ruby-core:123160] " ima1zumi (Mari Imaizumi) via ruby-core
@ 2025-11-02 2:06 ` duerst via ruby-core
2025-11-02 2:10 ` [ruby-core:123640] " duerst via ruby-core
2025-11-03 0:44 ` [ruby-core:123656] " duerst via ruby-core
6 siblings, 0 replies; 8+ messages in thread
From: duerst via ruby-core @ 2025-11-02 2:06 UTC (permalink / raw)
To: ruby-core; +Cc: duerst
Issue #21559 has been updated by duerst (Martin Dürst).
Status changed from Open to Closed
Added regression test at https://github.com/ruby/ruby/commit/a122d7a58e91ed6cd531e906cb398688d7cc8b17
and fix at https://github.com/ruby/ruby/commit/e4c8e3544237b8c0efba6b945173dc66552d641c.
Many thanks to Tomoya Ishida for finding this bug.
----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-115023
* Author: tompng (tomoya ishida)
* Status: Closed
* Assignee: duerst (Martin Dürst)
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
* [ruby-core:123640] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible
2025-08-31 12:52 [ruby-core:123146] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible tompng (tomoya ishida) via ruby-core
` (4 preceding siblings ...)
2025-11-02 2:06 ` [ruby-core:123639] " duerst via ruby-core
@ 2025-11-02 2:10 ` duerst via ruby-core
2025-11-03 0:44 ` [ruby-core:123656] " duerst via ruby-core
6 siblings, 0 replies; 8+ messages in thread
From: duerst via ruby-core @ 2025-11-02 2:10 UTC (permalink / raw)
To: ruby-core; +Cc: duerst
Issue #21559 has been updated by duerst (Martin Dürst).
Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN to 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONTNEED
Backport would only be needed if the upgrade to Unicode 16.0.0 (see https://bugs.ruby-lang.org/issues/20724) is backported.
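To see which Unicode version a given Ruby build uses, something like the following sketch may help. It assumes that `RbConfig::CONFIG["UNICODE_VERSION"]` reports the Unicode data version of the build; the 16.0.0 threshold is taken from the note above.

```ruby
require "rbconfig"

# Unicode version this Ruby build was compiled against, as a dotted string.
unicode_version = RbConfig::CONFIG["UNICODE_VERSION"]
puts "Ruby #{RUBY_VERSION} uses Unicode #{unicode_version}"

# Compare numerically, component by component, rather than as strings
# (as strings, "9.0.0" would sort after "16.0.0").
parts = unicode_version.split(".").map(&:to_i)
at_least_16 = (parts <=> [16, 0, 0]) >= 0
puts "Unicode 16.0.0 or later: #{at_least_16}"
```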
----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-115024
* Author: tompng (tomoya ishida)
* Status: Closed
* Assignee: duerst (Martin Dürst)
* Backport: 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONTNEED
----------------------------------------
* [ruby-core:123656] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible
2025-08-31 12:52 [ruby-core:123146] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible tompng (tomoya ishida) via ruby-core
` (5 preceding siblings ...)
2025-11-02 2:10 ` [ruby-core:123640] " duerst via ruby-core
@ 2025-11-03 0:44 ` duerst via ruby-core
6 siblings, 0 replies; 8+ messages in thread
From: duerst via ruby-core @ 2025-11-03 0:44 UTC (permalink / raw)
To: ruby-core; +Cc: duerst
Issue #21559 has been updated by duerst (Martin Dürst).
Note to potential backporters: https://github.com/ruby/ruby/commit/bd51b20c50 should also be backported.
----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-115038
* Author: tompng (tomoya ishida)
* Status: Closed
* Assignee: duerst (Martin Dürst)
* Backport: 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONTNEED
----------------------------------------