ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:123146] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible
@ 2025-08-31 12:52 tompng (tomoya ishida) via ruby-core
  2025-08-31 21:07 ` [ruby-core:123147] " nobu (Nobuyoshi Nakada) via ruby-core
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: tompng (tomoya ishida) via ruby-core @ 2025-08-31 12:52 UTC (permalink / raw)
  To: ruby-core; +Cc: tompng (tomoya ishida)

Issue #21559 has been reported by tompng (tomoya ishida).

----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559

* Author: tompng (tomoya ishida)
* Status: Open
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
I expect `nfd(nfc(str)) == nfd(str)` to hold for every string, but found a string for which it does not.

~~~ruby
# Ruby 3.1 - 3.5
str = "s\u{11930}\u{323}\u{11930}\u{307}"
p str.unicode_normalize(:nfd) == str.unicode_normalize(:nfc).unicode_normalize(:nfd)
#=> false
~~~

~~~ruby
# ruby 3.5.0dev
str = "s\u{1611e}\u{323}\u{1611e}\u{307}\u{1611f}"
p str.unicode_normalize(:nfd) == str.unicode_normalize(:nfc).unicode_normalize(:nfd)
#=> false
~~~
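
To see where the two sides of the comparison diverge, it can help to print each normalization form as code points. The following is only a sketch: `codepoints_of` and `nfd_roundtrip_stable?` are hypothetical helper names, not part of the report or of Ruby itself, and the output depends on the Ruby version it runs on.

```ruby
# Hypothetical helper (not from the report): format a string as U+XXXX code points.
def codepoints_of(str)
  str.each_codepoint.map { |cp| format("U+%04X", cp) }.join(" ")
end

# Hypothetical helper: true when nfd(nfc(str)) == nfd(str) on the running Ruby.
def nfd_roundtrip_stable?(str)
  nfd = str.unicode_normalize(:nfd)
  nfd == str.unicode_normalize(:nfc).unicode_normalize(:nfd)
end

str = "s\u{11930}\u{323}\u{11930}\u{307}"
nfc = str.unicode_normalize(:nfc)

puts "input:    #{codepoints_of(str)}"
puts "nfd:      #{codepoints_of(str.unicode_normalize(:nfd))}"
puts "nfc:      #{codepoints_of(nfc)}"
puts "nfc->nfd: #{codepoints_of(nfc.unicode_normalize(:nfd))}"
puts "stable?   #{nfd_roundtrip_stable?(str)}"
```

On an affected build the `nfd` and `nfc->nfd` lines should differ; on a fixed build they should match.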




-- 
https://bugs.ruby-lang.org/
______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/


* [ruby-core:123147] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible
  2025-08-31 12:52 [ruby-core:123146] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible tompng (tomoya ishida) via ruby-core
@ 2025-08-31 21:07 ` nobu (Nobuyoshi Nakada) via ruby-core
  2025-09-01  0:50 ` [ruby-core:123148] " ima1zumi (Mari Imaizumi) via ruby-core
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2025-08-31 21:07 UTC (permalink / raw)
  To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)

Issue #21559 has been updated by nobu (Nobuyoshi Nakada).


```ruby
"s\u{11930 323 11930 307}".unicode_normalize(:nfc).dump #=> "\u1E69\u{11930}\u{11930}"
"s\u{323 307}".unicode_normalize(:nfc).dump  #=> "\u1E69"
```

Are U+0323 and U+0307 being composed with `s` by jumping over U+11930?

----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-114479

* Author: tompng (tomoya ishida)
* Status: Open
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------

* [ruby-core:123148] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible
  2025-08-31 12:52 [ruby-core:123146] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible tompng (tomoya ishida) via ruby-core
  2025-08-31 21:07 ` [ruby-core:123147] " nobu (Nobuyoshi Nakada) via ruby-core
@ 2025-09-01  0:50 ` ima1zumi (Mari Imaizumi) via ruby-core
  2025-09-01  4:09 ` [ruby-core:123154] " duerst via ruby-core
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: ima1zumi (Mari Imaizumi) via ruby-core @ 2025-09-01  0:50 UTC (permalink / raw)
  To: ruby-core; +Cc: ima1zumi (Mari Imaizumi)

Issue #21559 has been updated by ima1zumi (Mari Imaizumi).

Assignee set to ima1zumi (Mari Imaizumi)

This looks like a bug. Per Unicode TR15, the identity toNFD(x) == toNFD(toNFC(x)) must be maintained. https://unicode.org/reports/tr15/#Design_Goals
It seems the NFC process is combining characters across U+11930, even though its canonical combining class (CCC) is 0, which should block composition with the preceding `s`.
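
The blocking rule referred to here can be sketched as follows. This is a toy illustration of the "blocked" definition in UAX #15 (TR15), not the code in Ruby's normalizer: `TOY_CCC`, `toy_ccc`, and `blocked?` are hypothetical names, and the table only covers the code points that appear in this thread.

```ruby
# Toy table of canonical combining classes for the code points in this thread only.
# Assumption of this sketch: anything not listed is treated as ccc 0 (a starter).
TOY_CCC = {
  0x0323  => 220, # COMBINING DOT BELOW
  0x0307  => 230, # COMBINING DOT ABOVE
  0x11930 => 0    # code point from the report; ccc 0 as stated in the thread
}.freeze

def toy_ccc(codepoint)
  TOY_CCC.fetch(codepoint, 0)
end

# Blocking rule: C (at c_index) is blocked from A (at a_index) when A is a starter
# and some B strictly between them has ccc(B) == 0 or ccc(B) >= ccc(C).
def blocked?(codepoints, a_index, c_index)
  return false unless toy_ccc(codepoints[a_index]).zero?

  c_ccc = toy_ccc(codepoints[c_index])
  codepoints[(a_index + 1)...c_index].any? do |b|
    b_ccc = toy_ccc(b)
    b_ccc.zero? || b_ccc >= c_ccc
  end
end

with_starter = "s\u{11930}\u{323}".codepoints
without_starter = "s\u{323}\u{307}".codepoints

p blocked?(with_starter, 0, 2)    #=> true
p blocked?(without_starter, 0, 2) #=> false
```

Under this rule U+0323 is blocked from `s` once U+11930 sits between them, so a conforming NFC should leave that part of the reported string uncomposed.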

CC: @duerst 

----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-114480

* Author: tompng (tomoya ishida)
* Status: Open
* Assignee: ima1zumi (Mari Imaizumi)
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------

* [ruby-core:123154] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible
  2025-08-31 12:52 [ruby-core:123146] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible tompng (tomoya ishida) via ruby-core
  2025-08-31 21:07 ` [ruby-core:123147] " nobu (Nobuyoshi Nakada) via ruby-core
  2025-09-01  0:50 ` [ruby-core:123148] " ima1zumi (Mari Imaizumi) via ruby-core
@ 2025-09-01  4:09 ` duerst via ruby-core
  2025-09-01 23:50 ` [ruby-core:123160] " ima1zumi (Mari Imaizumi) via ruby-core
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: duerst via ruby-core @ 2025-09-01  4:09 UTC (permalink / raw)
  To: ruby-core; +Cc: duerst

Issue #21559 has been updated by duerst (Martin Dürst).

Assignee changed from ima1zumi (Mari Imaizumi) to duerst (Martin Dürst)

@ima1zumi I am not sure this is even allowed, but I am sure I am responsible for this behavior and want to fix it myself, so I am changing the assignee to myself.

----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-114486

* Author: tompng (tomoya ishida)
* Status: Open
* Assignee: duerst (Martin Dürst)
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------

* [ruby-core:123160] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible
  2025-08-31 12:52 [ruby-core:123146] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible tompng (tomoya ishida) via ruby-core
                   ` (2 preceding siblings ...)
  2025-09-01  4:09 ` [ruby-core:123154] " duerst via ruby-core
@ 2025-09-01 23:50 ` ima1zumi (Mari Imaizumi) via ruby-core
  2025-11-02  2:06 ` [ruby-core:123639] " duerst via ruby-core
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: ima1zumi (Mari Imaizumi) via ruby-core @ 2025-09-01 23:50 UTC (permalink / raw)
  To: ruby-core; +Cc: ima1zumi (Mari Imaizumi)

Issue #21559 has been updated by ima1zumi (Mari Imaizumi).


@duerst Thank you, I appreciate you taking care of it.

----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-114496

* Author: tompng (tomoya ishida)
* Status: Open
* Assignee: duerst (Martin Dürst)
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------

* [ruby-core:123639] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible
  2025-08-31 12:52 [ruby-core:123146] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible tompng (tomoya ishida) via ruby-core
                   ` (3 preceding siblings ...)
  2025-09-01 23:50 ` [ruby-core:123160] " ima1zumi (Mari Imaizumi) via ruby-core
@ 2025-11-02  2:06 ` duerst via ruby-core
  2025-11-02  2:10 ` [ruby-core:123640] " duerst via ruby-core
  2025-11-03  0:44 ` [ruby-core:123656] " duerst via ruby-core
  6 siblings, 0 replies; 8+ messages in thread
From: duerst via ruby-core @ 2025-11-02  2:06 UTC (permalink / raw)
  To: ruby-core; +Cc: duerst

Issue #21559 has been updated by duerst (Martin Dürst).

Status changed from Open to Closed

Added regression test at https://github.com/ruby/ruby/commit/a122d7a58e91ed6cd531e906cb398688d7cc8b17
and fix at https://github.com/ruby/ruby/commit/e4c8e3544237b8c0efba6b945173dc66552d641c.
Many thanks to Tomoya Ishida for finding this bug.

----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-115023

* Author: tompng (tomoya ishida)
* Status: Closed
* Assignee: duerst (Martin Dürst)
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------

* [ruby-core:123640] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible
  2025-08-31 12:52 [ruby-core:123146] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible tompng (tomoya ishida) via ruby-core
                   ` (4 preceding siblings ...)
  2025-11-02  2:06 ` [ruby-core:123639] " duerst via ruby-core
@ 2025-11-02  2:10 ` duerst via ruby-core
  2025-11-03  0:44 ` [ruby-core:123656] " duerst via ruby-core
  6 siblings, 0 replies; 8+ messages in thread
From: duerst via ruby-core @ 2025-11-02  2:10 UTC (permalink / raw)
  To: ruby-core; +Cc: duerst

Issue #21559 has been updated by duerst (Martin Dürst).

Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN to 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONTNEED

Backport would only be needed if the upgrade to Unicode 16.0.0 (see https://bugs.ruby-lang.org/issues/20724) is backported.
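
For anyone deciding whether this applies to a particular build, the Unicode data version a Ruby was built with can be read from `RbConfig`. This is a minimal sketch: `parse_version` is a hypothetical helper, and the version check alone does not prove a build is or is not affected, so the reproduction script from the report is still the real test.

```ruby
require "rbconfig"

# Hypothetical helper: parse "16.0.0" into [16, 0, 0] for comparison.
def parse_version(text)
  text.split(".").map(&:to_i)
end

# May be nil on implementations that do not set it, hence the to_s.
unicode_version = RbConfig::CONFIG["UNICODE_VERSION"].to_s
puts "Ruby #{RUBY_VERSION} uses Unicode data version #{unicode_version}"

if (parse_version(unicode_version) <=> [16, 0, 0]) >= 0
  puts "Built against Unicode 16.0.0 or later"
else
  puts "Built against Unicode data older than 16.0.0"
end
```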

----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-115024

* Author: tompng (tomoya ishida)
* Status: Closed
* Assignee: duerst (Martin Dürst)
* Backport: 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONTNEED
----------------------------------------

* [ruby-core:123656] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible
  2025-08-31 12:52 [ruby-core:123146] [Ruby Bug#21559] Unicode normalization nfd -> nfc -> nfd is not reversible tompng (tomoya ishida) via ruby-core
                   ` (5 preceding siblings ...)
  2025-11-02  2:10 ` [ruby-core:123640] " duerst via ruby-core
@ 2025-11-03  0:44 ` duerst via ruby-core
  6 siblings, 0 replies; 8+ messages in thread
From: duerst via ruby-core @ 2025-11-03  0:44 UTC (permalink / raw)
  To: ruby-core; +Cc: duerst

Issue #21559 has been updated by duerst (Martin Dürst).


Note to potential backporters: https://github.com/ruby/ruby/commit/bd51b20c50 should also be backported.

----------------------------------------
Bug #21559: Unicode normalization nfd -> nfc -> nfd is not reversible
https://bugs.ruby-lang.org/issues/21559#change-115038

* Author: tompng (tomoya ishida)
* Status: Closed
* Assignee: duerst (Martin Dürst)
* Backport: 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONTNEED
----------------------------------------
