* [ruby-core:114936] [Ruby master Feature#19908] Update to Unicode 15.1
@ 2023-10-02 6:55 nobu (Nobuyoshi Nakada) via ruby-core
2023-10-02 14:06 ` [ruby-core:114939] " Игорь Пятчиц via ruby-core
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2023-10-02 6:55 UTC (permalink / raw)
To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)
Issue #19908 has been reported by nobu (Nobuyoshi Nakada).
----------------------------------------
Feature #19908: Update to Unicode 15.1
https://bugs.ruby-lang.org/issues/19908
* Author: nobu (Nobuyoshi Nakada)
* Status: Assigned
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* Target version: 3.3
----------------------------------------
Unicode 15.1 has been released.
The current enc-unicode.rb seems to fail because of the `Indic_Conjunct_Break` properties, which carry values.
I'm not sure how these properties should best be handled.
Should it be `/\p{InCB_Linker}/`, or `/\p{InCB=Linker}/` as in the comments in that file?
https://github.com/nobu/ruby/tree/unicode-15.1 uses the former.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/
* [ruby-core:114939] Re: [Ruby master Feature#19908] Update to Unicode 15.1
From: Игорь Пятчиц via ruby-core @ 2023-10-02 14:06 UTC (permalink / raw)
To: Ruby developers
Cc: Игорь Пятчиц
🤘👍
* [ruby-core:115899] [Ruby master Feature#19908] Update to Unicode 15.1
From: duerst via ruby-core @ 2023-12-26 6:52 UTC (permalink / raw)
To: ruby-core; +Cc: duerst
Issue #19908 has been updated by duerst (Martin Dürst).
There is a more serious issue than whether to use '_' or '=' in the property name: Unicode 15.1 makes significant changes to grapheme clusters.
Our implementation (the function 'node_extended_grapheme_cluster' in regparse.c) is based on Unicode 11.0, in particular https://www.unicode.org/reports/tr29/tr29-33.html#Grapheme_Cluster_Boundaries. This differs considerably from the current version at https://www.unicode.org/reports/tr29/tr29-43.html#Grapheme_Cluster_Boundaries. One major difference is that Unicode 11.0 provided a regular expression for grapheme clusters, which I implemented directly in the above function. Unicode 15.1 only says that a regular expression can be used, without giving one.
From reading through https://www.unicode.org/versions/Unicode15.1.0/#Migration, this appears to be the main issue affecting Ruby.
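As a rough illustration of what the grapheme cluster change means in practice, the following sketch counts the clusters that `\X` finds in a Devanagari conjunct. The sample string and the helper name `cluster_count` are assumptions chosen for illustration. The count for the conjunct is deliberately not asserted here, since it depends on the Unicode version the running Ruby implements.

```ruby
require "rbconfig"

# Unicode version this Ruby was built against (a string such as "15.0.0").
UNICODE_VERSION = RbConfig::CONFIG["UNICODE_VERSION"]

# Number of extended grapheme clusters that \X finds in +text+.
# (Hypothetical helper, not part of Ruby itself.)
def cluster_count(text)
  text.scan(/\X/).size
end

# Devanagari KA + VIRAMA + SSA, a sample conjunct picked for illustration.
CONJUNCT = "\u0915\u094D\u0937"

puts "Unicode #{UNICODE_VERSION}: " \
     "#{cluster_count(CONJUNCT)} cluster(s) in #{CONJUNCT.inspect}"
```

Under the Unicode 11.0 rules the virama only extends the preceding consonant, so the string would be expected to split before the second consonant. Under the 15.1 rules the whole conjunct should form one cluster.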
----------------------------------------
Feature #19908: Update to Unicode 15.1
https://bugs.ruby-lang.org/issues/19908#change-105854
* Author: nobu (Nobuyoshi Nakada)
* Status: Assigned
* Priority: Normal
* Assignee: duerst (Martin Dürst)
----------------------------------------
* [ruby-core:115906] [Ruby master Feature#19908] Update to Unicode 15.1
From: duerst via ruby-core @ 2023-12-26 11:42 UTC (permalink / raw)
To: ruby-core; +Cc: duerst
Issue #19908 has been updated by duerst (Martin Dürst).
@nobu:
We already have `Grapheme_Cluster_Break=...`, so I think '=' may be appropriate. But `Grapheme_Cluster_Break=...` uses the long, explicit property name. So shouldn't it be `Indic_Conjunct_Break=...`, not just `InCB=...`?
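One way to compare the candidate spellings is to probe which property names a given Ruby build accepts. This is only a sketch: `property_supported?` is a hypothetical helper, and which of the listed names are accepted depends on the Ruby version, so no particular result is claimed.

```ruby
# Returns true if this Ruby's regexp engine accepts \p{name}.
# An unknown property name makes Regexp.new raise RegexpError.
def property_supported?(name)
  Regexp.new("\\p{#{name}}")
  true
rescue RegexpError
  false
end

# Candidate spellings discussed in this thread, plus an existing one.
%w[
  Grapheme_Cluster_Break=Extend
  Indic_Conjunct_Break=Linker
  InCB=Linker
  InCB_Linker
].each do |name|
  puts format("%-32s %s", name, property_supported?(name))
end
```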
----------------------------------------
Feature #19908: Update to Unicode 15.1
https://bugs.ruby-lang.org/issues/19908#change-105861
* Author: nobu (Nobuyoshi Nakada)
* Status: Assigned
* Priority: Normal
* Assignee: duerst (Martin Dürst)
----------------------------------------
* [ruby-core:116056] [Ruby master Feature#19908] Update to Unicode 15.1
From: janosch-x via ruby-core @ 2024-01-06 21:28 UTC (permalink / raw)
To: ruby-core; +Cc: janosch-x
Issue #19908 has been updated by janosch-x (Janosch Müller).
Isn't [this](https://www.unicode.org/reports/tr29/tr29-43.html#Regex_Definitions) the updated regular expression?
```diff
ccs-base := [\p{L}\p{N}\p{P}\p{S}\p{Zs}]
ccs-extend := [\p{M}\p{Join_Control}]
extended_base := ccs-base
| hangul-syllable
-crlf := CR LF
+crlf := CR LF | CR | LF
legacy-core := hangul-syllable
| ri-sequence
| xpicto-sequence
legacy-postcore := [Extend ZWJ]
core := hangul-syllable
| ri-sequence
| xpicto-sequence
+| conjunctCluster
| [^Control CR LF]
postcore := [Extend ZWJ SpacingMark]
precore := Prepend
hangul-syllable := L* (V+ | LV V* | LVT) T*
| L+
| T+
xpicto-sequence := \p{Extended_Pictographic} (Extend* ZWJ \p{Extended_Pictographic})*
+conjunctCluster := \p{InCB=Consonant} ([\p{InCB=Extend} \p{InCB=Linker}]* \p{InCB=Linker} [\p{InCB=Extend} \p{InCB=Linker}]* \p{InCB=Consonant})+
```
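The added `conjunctCluster` rule can be tried out in Ruby even without `\p{InCB=...}` support by substituting small hand-written character sets. In the sketch below, `CONSONANT`, `LINKER` and `EXTEND` are stand-ins that cover only a few Devanagari characters and are assumed to carry the corresponding InCB values. They are not the real property data, and `conjunct_clusters` is a hypothetical helper.

```ruby
# Stand-ins for the InCB property values (assumption: Devanagari only,
# hand-picked; the real sets come from the Unicode data files).
CONSONANT = "\u0915-\u0939"   # KA..HA, used as a character-class range
LINKER    = "\u094D"          # VIRAMA
EXTEND    = "\u093C\u200D"    # NUKTA, ZERO WIDTH JOINER

# Same shape as the conjunctCluster rule:
# Consonant ([Extend Linker]* Linker [Extend Linker]* Consonant)+
CONJUNCT_CLUSTER = Regexp.new(
  "[#{CONSONANT}]" \
  "(?:[#{EXTEND}#{LINKER}]*[#{LINKER}][#{EXTEND}#{LINKER}]*[#{CONSONANT}])+"
)

# All conjunct clusters found in +text+, as an array of strings.
def conjunct_clusters(text)
  text.scan(CONJUNCT_CLUSTER)
end

p conjunct_clusters("\u0915\u094D\u0937")  # KA VIRAMA SSA
p conjunct_clusters("\u0915\u0937")        # KA SSA, no linker
```

Two consonants with no linker between them do not form a conjunct cluster, which is why the second call finds nothing.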
----------------------------------------
Feature #19908: Update to Unicode 15.1
https://bugs.ruby-lang.org/issues/19908#change-106054
* Author: nobu (Nobuyoshi Nakada)
* Status: Assigned
* Priority: Normal
* Assignee: duerst (Martin Dürst)
----------------------------------------
* [ruby-core:116099] [Ruby master Feature#19908] Update to Unicode 15.1
From: duerst via ruby-core @ 2024-01-09 1:25 UTC (permalink / raw)
To: ruby-core; +Cc: duerst
Issue #19908 has been updated by duerst (Martin Dürst).
@janosch-x You are correct, thanks! I noticed it a few days ago, but hadn't yet gotten around to writing about it here. You beat me to it!
----------------------------------------
Feature #19908: Update to Unicode 15.1
https://bugs.ruby-lang.org/issues/19908#change-106096
* Author: nobu (Nobuyoshi Nakada)
* Status: Assigned
* Priority: Normal
* Assignee: duerst (Martin Dürst)
----------------------------------------
* [ruby-core:119128] [Ruby master Feature#19908] Update to Unicode 15.1
From: hsbt (Hiroshi SHIBATA) via ruby-core @ 2024-09-12 1:56 UTC (permalink / raw)
To: ruby-core; +Cc: hsbt (Hiroshi SHIBATA)
Issue #19908 has been updated by hsbt (Hiroshi SHIBATA).
Unicode 16.0 has been released.
https://www.unicode.org/versions/Unicode16.0.0/
Should we target this instead of 15.1?
----------------------------------------
Feature #19908: Update to Unicode 15.1
https://bugs.ruby-lang.org/issues/19908#change-109722
* Author: nobu (Nobuyoshi Nakada)
* Status: Assigned
* Assignee: duerst (Martin Dürst)
----------------------------------------
* [ruby-core:119130] [Ruby master Feature#19908] Update to Unicode 15.1
From: duerst via ruby-core @ 2024-09-12 3:21 UTC (permalink / raw)
To: ruby-core; +Cc: duerst
Issue #19908 has been updated by duerst (Martin Dürst).
hsbt (Hiroshi SHIBATA) wrote in #note-8:
> Unicode 16.0 has been released.
> Should we target this instead of 15.1?
I think it's more prudent to do 15.1 first, then 16.0. I hope to be able to work on this soon. I created a separate issue for 16.0.
----------------------------------------
Feature #19908: Update to Unicode 15.1
https://bugs.ruby-lang.org/issues/19908#change-109725
* Author: nobu (Nobuyoshi Nakada)
* Status: Assigned
* Assignee: duerst (Martin Dürst)
----------------------------------------
* [ruby-core:119131] [Ruby master Feature#19908] Update to Unicode 15.1
From: hsbt (Hiroshi SHIBATA) via ruby-core @ 2024-09-12 3:53 UTC (permalink / raw)
To: ruby-core; +Cc: hsbt (Hiroshi SHIBATA)
Issue #19908 has been updated by hsbt (Hiroshi SHIBATA).
> I think it's more prudent to do 15.1 first, then 16.0.
Agreed, thanks!
----------------------------------------
Feature #19908: Update to Unicode 15.1
https://bugs.ruby-lang.org/issues/19908#change-109726
* Author: nobu (Nobuyoshi Nakada)
* Status: Assigned
* Assignee: duerst (Martin Dürst)
----------------------------------------
* [ruby-core:120460] [Ruby master Feature#19908] Update to Unicode 15.1
From: ima1zumi (Mari Imaizumi) via ruby-core @ 2025-01-01 15:06 UTC (permalink / raw)
To: ruby-core; +Cc: ima1zumi (Mari Imaizumi)
Issue #19908 has been updated by ima1zumi (Mari Imaizumi).
@duerst
I'm interested in working on this issue. Are you planning to start it? If not, I'd like to try.
----------------------------------------
Feature #19908: Update to Unicode 15.1
https://bugs.ruby-lang.org/issues/19908#change-111243
* Author: nobu (Nobuyoshi Nakada)
* Status: Assigned
* Assignee: duerst (Martin Dürst)
----------------------------------------