ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:119633] [Ruby master Bug#20819] IO#readline does not process newlines correctly for non-ASCII compatible encodings
From: javanthropus (Jeremy Bopp) via ruby-core @ 2024-10-28 14:08 UTC (permalink / raw)
  To: ruby-core; +Cc: javanthropus (Jeremy Bopp)

Issue #20819 has been reported by javanthropus (Jeremy Bopp).

----------------------------------------
Bug #20819: IO#readline does not process newlines correctly for non-ASCII compatible encodings
https://bugs.ruby-lang.org/issues/20819

* Author: javanthropus (Jeremy Bopp)
* Status: Open
* ruby -v: ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
When no character conversion is performed, IO#readline in paragraph mode treats newlines as single ASCII bytes, even when the stream's encoding is not ASCII-compatible.  However, when character conversion is involved, even between two ASCII-incompatible encodings, newline handling is correct.

```ruby
require "tempfile"

Tempfile.open(binmode: true) do |f|
  f.set_encoding("utf-16le")
  f.write("\n\n\n\nhello\n\nworld")
  f.rewind

  # No character conversion case.
  # Expecting "hello\n\n".encode(Encoding::UTF_16LE)
  f.readline("")   # => "\0".force_encoding(Encoding::UTF_16LE) + "\n\n\nhello\n\nworld".encode(Encoding::UTF_16LE)

  f.set_encoding("utf-16le:utf-32le")
  f.rewind

  # Character conversion case.
  f.readline("")   # => "hello\n\n".encode(Encoding::UTF_32LE)
end
```

In the failing case, the first byte of the input is 0x0A, the low byte of the UTF-16LE newline character.  That byte is discarded per the normal paragraph-mode behavior of skipping leading newlines, but the null byte that follows it is not consumed, even though it is the second half of the same UTF-16LE newline.  This leaves a leading, invalid null byte in the output of IO#readline.  Furthermore, the newlines between "hello" and "world" are not recognized as the pair of newline characters that ends the first paragraph, because the null byte between them means they are not adjacent ASCII newlines.
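
The byte layout behind this can be checked with plain String operations, without touching IO.  The following is only a sketch: it assumes, based on the observed output rather than on the IO source code, that the no-conversion path skips leading 0x0A bytes and searches for the ASCII pair "\n\n".  All variable names are illustrative.

```ruby
# Illustrative only: plain String operations, no IO involved.
text  = "\n\n\n\nhello\n\nworld"
utf16 = text.encode(Encoding::UTF_16LE)
bytes = utf16.b   # binary copy for byte-wise inspection

# Each UTF-16LE newline is the byte pair 0x0A 0x00.
newline_bytes = "\n".encode(Encoding::UTF_16LE).bytes   # => [10, 0]

# Assumption: a byte-oriented paragraph reader looks for the ASCII pair "\n\n".
# That pair never occurs here, because a null byte follows every 0x0A.
ascii_separator_at = bytes.index("\n\n".b)   # => nil

# Searching for the separator encoded as UTF-16LE succeeds immediately.
utf16_separator_at = bytes.index("\n\n".encode(Encoding::UTF_16LE).b)   # => 0

# Assumption: leading newlines are skipped one 0x0A byte at a time.
# Only the first byte qualifies; the 0x00 that follows stops the skip.
leading   = bytes.bytes.take_while { |b| b == 0x0A }.size   # => 1
remainder = bytes.byteslice(leading..)

# The remainder matches the value reported for the no-conversion case.
reported = "\0".b + "\n\n\nhello\n\nworld".encode(Encoding::UTF_16LE).b
matches_report = (remainder == reported)   # => true
```

With no ASCII "\n\n" pair anywhere in the data, a byte-oriented reader would run to the end of the input, which is consistent with "world" being included in the returned string.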
