* [ruby-core:119380] [Ruby master Misc#20774] Remove remaining locale dependent code from Windows port
@ 2024-10-01 9:57 larskanis (Lars Kanis) via ruby-core
From: larskanis (Lars Kanis) via ruby-core @ 2024-10-01 9:57 UTC (permalink / raw)
To: ruby-core; +Cc: larskanis (Lars Kanis)
Issue #20774 has been reported by larskanis (Lars Kanis).
----------------------------------------
Misc #20774: Remove remaining locale dependent code from Windows port
https://bugs.ruby-lang.org/issues/20774
* Author: larskanis (Lars Kanis)
* Status: Open
----------------------------------------
The external_encoding of files, file names and ENV on Windows was changed from the locale codepage to UTF-8 in ruby-3.0.
But there are still several places where the locale encoding is used although there is no need to do so.
The Windows port is already fully UTF-16/UTF-8 based, and the locale encoding is only used for historical, not technical, reasons.
My proposal is to remove (most of) the locale-dependent conversions from the ruby code for Windows.
Before I open pull requests in this regard, I would like to confirm this direction with the ruby core team.
Let me show what I mean:
```
# täst-locale-enc.rb
def pr(*strs)
  strs.each do |str|
    p [str, IO===str ? str.external_encoding&.name : str.encoding.name]
  end
end

if $0==__FILE__
  pr STDIN    # => [#<IO:<STDIN>>, "CP850"]
  pr $0       # => ["ruby/t\x84st-locale-enc.rb", "CP850"]
  pr __FILE__ # => ["ruby/t\x84st-locale-enc.rb", "CP850"]
  pr __dir__  # => ["C:/Users/kanis/ruby", "CP850"]
  pr 'ä'      # => ["ä", "UTF-8"]
  pr '€'      # => ["€", "UTF-8"]
  pr $:.first # => ["C:/Users/kanis/t\xE2\x82\xACst", "ASCII-8BIT"]
  pr $:.last  # => ["C:/Ruby33-x64/lib/ruby/3.3.0/x64-mingw-ucrt", "CP850"]
  require "win32/registry"
  pr Win32::Registry::HKEY_CURRENT_USER.open("Environment")['TMP']
  # => ["C:\\Users\\kanis\\AppData\\Local\\Temp", "UTF-8"]
  pr Win32::Registry::HKEY_CURRENT_USER.open("\\").each_key{ break _1 }
  # => ["AppEvents", "CP850"]
end
# execute with: ruby -It€st ruby\täst-locale-enc.rb
```
I wrote the results on `ruby-3.3 x64-mingw-ucrt` right into the code.
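The values above arrive in three different shapes: transcoded to the locale codepage (`$0`, `__FILE__`), UTF-8 bytes tagged as ASCII-8BIT (`$:.first`), and real UTF-8. A minimal sketch of how calling code could bring all of them to UTF-8 today is shown below. The helper name `to_utf8` is hypothetical and not part of ruby, and it assumes that ASCII-8BIT strings really contain UTF-8 bytes, which holds for the `$:.first` result above but not for binary data in general.

```ruby
# Hypothetical helper (not part of ruby): return a UTF-8 copy of +str+,
# whichever of the encodings observed above it arrived in.
def to_utf8(str)
  if str.encoding == Encoding::BINARY
    # ASCII-8BIT: assume the bytes already are UTF-8 and only the tag is wrong.
    retagged = str.dup.force_encoding(Encoding::UTF_8)
    raise ArgumentError, "not valid UTF-8: #{str.inspect}" unless retagged.valid_encoding?
    retagged
  else
    # Locale codepage (CP850 in this report) or US-ASCII: transcode the bytes.
    str.encode(Encoding::UTF_8)
  end
end

# Same bytes and encoding tags as the $0 and $:.first results above.
script   = "ruby/t\x84st-locale-enc.rb".dup.force_encoding("CP850")
load_dir = "C:/Users/kanis/t\xE2\x82\xACst".b

p to_utf8(script).encoding    # => #<Encoding:UTF-8>
p to_utf8(load_dir).encoding  # => #<Encoding:UTF-8>
```

Both calls return strings that compare equal to the UTF-8 literals `"ruby/täst-locale-enc.rb"` and `"C:/Users/kanis/t€st"`. Having to guess like this in user code is exactly what returning UTF-8 everywhere would make unnecessary.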
The situation is even worse when the code is passed as a `-e` script:
```
$ ruby -It€st -r .\ruby\täst-locale-enc.rb -e "pr STDIN, $0, __FILE__, __dir__, 'ä', '€', $:.first, $:.last"
[#<IO:<STDIN>>, "CP850"]
["-e", "CP850"]
["-e", "UTF-8"]
[".", "US-ASCII"]
["\x84", "CP850"]
["?", "CP850"]
["C:/Users/kanis/t\xE2\x82\xACst", "ASCII-8BIT"]
["C:/Ruby33-x64/lib/ruby/3.3.0/x64-mingw-ucrt", "CP850"]
```
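The `"\x84"` and `"?"` results follow from CP850 itself: it contains `ä` at byte 0x84 but has no euro sign at all. The sketch below reproduces this with ruby's own transcoder. It is only an approximation, since the real conversion of the command line is done on the Windows side, and the helper names `to_codepage` and `representable?` are hypothetical.

```ruby
# Hypothetical helpers (not part of ruby) that emulate a lossy conversion
# of UTF-8 text to a locale codepage such as CP850.

# Convert +str+, replacing characters the codepage lacks with "?".
def to_codepage(str, codepage = "CP850")
  str.encode(codepage, undef: :replace, replace: "?")
end

# True if every character of +str+ exists in the codepage.
def representable?(str, codepage = "CP850")
  str.encode(codepage)
  true
rescue Encoding::UndefinedConversionError
  false
end

a_umlaut = "\u00E4" # U+00E4, present in CP850
euro     = "\u20AC" # U+20AC, absent from CP850

p to_codepage(a_umlaut).bytes # => [132], i.e. 0x84
p to_codepage(euro)           # => "?"

p representable?("t\u00E4st-locale-enc.rb") # => true
p representable?("t\u20ACst-locale-enc.rb") # => false
```

Whether a given script name or `-e` argument survives therefore depends on the codepage of the machine it runs on, not on ruby or on the file system.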
There are also some inconsistencies: it is possible to `require` a script whose name contains characters outside of the codepage, but executing such a script directly or loading it with `require_relative` fails:
```
$ ruby -r .\t€st-locale-enc.rb -e "pr STDIN"
[#<IO:<STDIN>>, "CP850"]
$ ruby .\t€st-locale-enc.rb
ruby: Invalid argument -- ./t?st-locale-enc.rb (LoadError)
```
There may be more places that work with the locale codepage; these are only the ones I remember.
I would like to change all the above results to be UTF-8 encoded, as is the case on Ubuntu.
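One practical effect of mixing encodings in a single process can be sketched as follows: two strings with the same text but different encodings compare as unequal and cannot be concatenated once they contain non-ASCII characters, while ASCII-only strings behave as if nothing were wrong. The helper `same_text?` is hypothetical and assumes both strings carry a correct encoding tag.

```ruby
# Hypothetical helper (not part of ruby): compare two strings by their text,
# regardless of encoding. Both strings must be correctly tagged.
def same_text?(a, b)
  a.encode(Encoding::UTF_8) == b.encode(Encoding::UTF_8)
end

utf8_name   = "t\u00E4st"               # UTF-8
locale_name = utf8_name.encode("CP850") # same text in the locale codepage

p utf8_name == locale_name           # => false
p same_text?(utf8_name, locale_name) # => true
p "-e".encode("CP850") == "-e"       # => true, ASCII-only strings compare equal

begin
  utf8_name + locale_name
rescue Encoding::CompatibilityError => e
  puts "#{e.class}: #{e.message}"
end
```

With UTF-8 returned everywhere, the plain `==` and `+` would be sufficient.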
Compatibility
-------------
Changing the encoding of returned strings is of course an API change.
IMHO it is still something we should change in a minor release of ruby.
The reason is that I don't remember even a single issue caused by the change to UTF-8 in ruby-3.0 in the company I work for.
On the contrary, many issues are caused by using the locale codepage, where some non-ASCII characters work and others don't.
Most issues with ruby-3.0 were caused by the keyword argument changes.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
* [ruby-core:119456] [Ruby master Misc#20774] Remove remaining locale dependent code from Windows port
@ 2024-10-04 15:43 ` larskanis (Lars Kanis) via ruby-core
From: larskanis (Lars Kanis) via ruby-core @ 2024-10-04 15:43 UTC (permalink / raw)
To: ruby-core; +Cc: larskanis (Lars Kanis)
Issue #20774 has been updated by larskanis (Lars Kanis).
I opened https://github.com/ruby/ruby/pull/11799 to implement this issue.