* [ruby-core:118182] [Ruby master Bug#20526] File.open(encoding: "bom|utf-8") converts "\r\n" to "\n" on Windows
@ 2024-06-05 8:42 kou (Kouhei Sutou) via ruby-core
2024-06-05 9:18 ` [ruby-core:118183] " nobu (Nobuyoshi Nakada) via ruby-core
2024-12-26 10:37 ` [ruby-core:120417] " YO4 (Yoshinao Muramatsu) via ruby-core
0 siblings, 2 replies; 3+ messages in thread
From: kou (Kouhei Sutou) via ruby-core @ 2024-06-05 8:42 UTC (permalink / raw)
To: ruby-core; +Cc: kou (Kouhei Sutou)
Issue #20526 has been reported by kou (Kouhei Sutou).
----------------------------------------
Bug #20526: File.open(encoding: "bom|utf-8") converts "\r\n" to "\n" on Windows
https://bugs.ruby-lang.org/issues/20526
* Author: kou (Kouhei Sutou)
* Status: Open
* Target version: 3.2
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x64-mingw-ucrt]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
I'm not sure whether this is intentional, but it seems that `encoding: "utf-8"` doesn't change newline conversion while `encoding: "bom|utf-8"` does:
```ruby
File.write("a.txt", "a\r\n")
File.read("a.txt").bytes # => [97, 13, 10]
File.open("a.txt", encoding: "utf-8") {|f| f.read.bytes} # => [97, 10, 10]
File.open("a.txt", encoding: "bom|utf-8") {|f| f.read.bytes} # => [97, 10] XXX: \r\n -> \n
File.open("a.txt", encoding: "bom|utf-8", universal_newline: false) {|f| f.read.bytes} # => [97, 13, 10]
```
Note the `XXX:` line in the code above. Is this intentional behavior?
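For anyone reproducing this, one way to sidestep newline conversion while still getting BOM detection is to open the file in binary mode. The sketch below is not from the report: the helper names and the temporary file are illustrative assumptions, and it only shows that `"rb:bom|utf-8"` drops a leading BOM and leaves `\r\n` untouched.

```ruby
require "tmpdir"

# Hypothetical helper (not part of Ruby): read a UTF-8 file, dropping a
# leading BOM if present; the "b" flag disables newline conversion.
def read_without_newline_conversion(path)
  File.open(path, "rb:bom|utf-8") { |f| f.read }
end

# Hypothetical demo: write known bytes, then read them back.
def bom_demo_bytes
  Dir.mktmpdir do |dir|
    path = File.join(dir, "a.txt")
    # binwrite stores exactly these bytes: UTF-8 BOM, "a", CR, LF.
    File.binwrite(path, "\xEF\xBB\xBFa\r\n".b)
    read_without_newline_conversion(path).bytes
  end
end

p bom_demo_bytes # => [97, 13, 10]
```

This does not explain the behavior reported above; it only gives a baseline in which no newline conversion is involved.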
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/
* [ruby-core:118183] [Ruby master Bug#20526] File.open(encoding: "bom|utf-8") converts "\r\n" to "\n" on Windows
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2024-06-05 9:18 UTC (permalink / raw)
To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)
Issue #20526 has been updated by nobu (Nobuyoshi Nakada).
Probably a bug in the push-back after the BOM look-ahead.
BTW, on Windows, `File.write` and `File.read` are in text mode by default.
That file would be 4 bytes, "a\r\r\n" in binary.
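To see the bytes that are actually on disk, the file can be written and read in binary mode. A minimal sketch, assuming a temporary directory and a hypothetical `raw_round_trip` helper (neither appears in this thread); `File.binwrite` and `File.binread` bypass text-mode newline conversion:

```ruby
require "tmpdir"

# Hypothetical helper: write content in binary mode and return the raw
# bytes read back in binary mode, so no newline conversion is applied.
def raw_round_trip(content)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "a.txt")
    File.binwrite(path, content)
    File.binread(path).bytes
  end
end

p raw_round_trip("a\r\n") # => [97, 13, 10]
```

With `File.binwrite` the file holds exactly the three bytes passed in; the four-byte `"a\r\r\n"` described above comes from text-mode `File.write` on Windows.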
----------------------------------------
Bug #20526: File.open(encoding: "bom|utf-8") converts "\r\n" to "\n" on Windows
https://bugs.ruby-lang.org/issues/20526#change-108634
* Author: kou (Kouhei Sutou)
* Status: Open
* Target version: 3.2
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x64-mingw-ucrt]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
* [ruby-core:120417] [Ruby master Bug#20526] File.open(encoding: "bom|utf-8") converts "\r\n" to "\n" on Windows
From: YO4 (Yoshinao Muramatsu) via ruby-core @ 2024-12-26 10:37 UTC (permalink / raw)
To: ruby-core; +Cc: YO4 (Yoshinao Muramatsu)
Issue #20526 has been updated by YO4 (Yoshinao Muramatsu).
There is similar strangeness around encoding specifiers.
Preparation:
```ruby
RUBY_VERSION # => "3.3.5"
File.write("a.txt", "a\r\n")
File.binread("a.txt").bytes # => [97, 13, 13, 10]
```
Experiments:
```ruby
File.open("a.txt") {|f| f.read.bytes} # => [97, 13, 10] # expected(msvcrt[_*] newline)
File.open("a.txt", "r:utf-8") {|f| f.read.bytes} # => [97, 13, 10] # expected
File.open("a.txt", "r", encoding: "utf-8") {|f| f.read.bytes} # => [97, 13, 10] # expected
File.open("a.txt", encoding: "utf-8") {|f| f.read.bytes} # => [97, 10, 10] # XXX: universal newline enabled?
```
Omitting the mode parameter seems to enable universal newline conversion.
```ruby
File.open("a.txt", "rt:utf-8") {|f| f.read.bytes} # => [97, 10, 10] # expected(universal newline)
File.open("a.txt", "rt:bom|utf-8") {|f| f.read.bytes} # => [97, 10] # XXX
File.open("a.txt", "rt", encoding: "utf-8") {|f| f.read.bytes} # => [97, 10, 10] # expected(universal newline)
File.open("a.txt", "rt", encoding: "bom|utf-8") {|f| f.read.bytes} # => [97, 10] # XXX
```
XXX: This is odd because universal newline conversion and msvcrt newline conversion both appear to be applied, one after the other.
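The three byte sequences seen above can be reproduced on any platform by simulating the two conversions on the raw file content. This is only a model under stated assumptions, not Ruby's actual IO code path: `crlf_to_lf` and `universal_newline` are hypothetical helpers written with `String#gsub`.

```ruby
# Hypothetical stand-in for Windows text mode on read: CRLF becomes LF.
def crlf_to_lf(str)
  str.gsub("\r\n", "\n")
end

# Hypothetical stand-in for universal newline: CRLF and lone CR become LF.
def universal_newline(str)
  str.gsub(/\r\n|\r/, "\n")
end

raw = "a\r\r\n" # bytes on disk after text-mode File.write on Windows

p crlf_to_lf(raw).bytes                    # => [97, 13, 10]
p universal_newline(raw).bytes             # => [97, 10, 10]
p universal_newline(crlf_to_lf(raw)).bytes # => [97, 10]
```

In this model, `[97, 10]` only appears when both conversions run in sequence, which matches the `bom|utf-8` results marked XXX.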
----------------------------------------
Bug #20526: File.open(encoding: "bom|utf-8") converts "\r\n" to "\n" on Windows
https://bugs.ruby-lang.org/issues/20526#change-111198
* Author: kou (Kouhei Sutou)
* Status: Open
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x64-mingw-ucrt]
* Backport: 3.1: REQUIRED, 3.2: REQUIRED, 3.3: REQUIRED