ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:119741] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking
@ 2024-11-05 16:07 javanthropus (Jeremy Bopp) via ruby-core
  2024-11-05 17:52 ` [ruby-core:119748] " byroot (Jean Boussier) via ruby-core
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: javanthropus (Jeremy Bopp) via ruby-core @ 2024-11-05 16:07 UTC (permalink / raw)
  To: ruby-core; +Cc: javanthropus (Jeremy Bopp)

Issue #20869 has been reported by javanthropus (Jeremy Bopp).

----------------------------------------
Bug #20869: IO buffer handling is inconsistent when seeking
https://bugs.ruby-lang.org/issues/20869

* Author: javanthropus (Jeremy Bopp)
* Status: Open
* ruby -v: ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
When performing any of the seek-based operations on IO (IO#seek, IO#pos=, or IO#rewind), the read buffer is inconsistently cleared:

```ruby
require 'tempfile'

Tempfile.open do |f|
  f.write('0123456789')
  f.rewind

  # Calling #ungetbyte as the first read buffer
  # operation uses a buffer that is preserved during
  # seek operations
  f.ungetbyte(97)
  # Byte buffer will not be cleared
  f.seek(2, :SET)

  f.getbyte       # => 97
end

Tempfile.open do |f|
  f.write('0123456789')
  f.rewind

  # Calling #getbyte before #ungetbyte uses a
  # buffer that is not preserved when seeking
  f.getbyte
  f.ungetbyte(97)
  # Byte buffer will be cleared
  f.seek(2, :SET)

  f.getbyte       # => 50
end
```

Similar behavior happens when reading characters:
```ruby
require 'tempfile'

Tempfile.open do |f|
  f.write('0123456789')
  f.rewind

  # Calling #ungetc as the first read buffer
  # operation uses a buffer that is preserved during
  # seek operations
  f.ungetc('a')
  # Character buffer will not be cleared
  f.seek(2, :SET)

  f.getc       # => 'a'
end

Tempfile.open do |f|
  f.write('0123456789')
  f.rewind

  # Calling #getc before #ungetc uses a
  # buffer that is not preserved when seeking
  f.getc
  f.ungetc('a')
  # Character buffer will be cleared
  f.seek(2, :SET)

  f.getc       # => '2'
end
```

When transcoding, however, the character buffer is never cleared when seeking:
```ruby
require 'tempfile'

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  f.ungetc('a'.encode('utf-16le'))
  # Character buffer will not be cleared
  f.seek(2, :SET)

  f.getc       # => 'a'.encode('utf-16le')
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  f.getc
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer will not be cleared
  f.seek(2, :SET)

  f.getc       # => 'a'.encode('utf-16le')
end
```

I would expect the buffers to be cleared in all cases, except possibly when the seek operation doesn't actually move the file pointer, such as when calling IO#pos or IO#seek(0, :CUR).  The inconsistent behavior demonstrated here is a problem regardless, though.
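
To compare the byte-oriented cases above across Ruby versions, they can be folded into one small probe. This is only a sketch: `byte_after_seek` is a hypothetical helper name, and the result depends on the interpreter being run, so no particular output is claimed here.

```ruby
require 'tempfile'

# Hypothetical helper (not part of Ruby): returns the byte read after
# ungetbyte followed by seek(2, :SET).
# 97 means the pushed-back byte survived the seek;
# 50 (the byte "2") means the buffer was dropped and the file was read.
def byte_after_seek(read_first:)
  Tempfile.open do |f|
    f.write('0123456789')
    f.rewind
    f.getbyte if read_first   # optionally fill the read buffer first
    f.ungetbyte(97)
    f.seek(2, :SET)
    f.getbyte
  end
end

results = {
  unget_first: byte_after_seek(read_first: false),
  read_then_unget: byte_after_seek(read_first: true)
}
puts results
```

If both values are the same, the two code paths agree on that interpreter; if they differ, the inconsistency described in this report is present.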



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:119748] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking
  2024-11-05 16:07 [ruby-core:119741] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking javanthropus (Jeremy Bopp) via ruby-core
@ 2024-11-05 17:52 ` byroot (Jean Boussier) via ruby-core
  2024-11-05 18:06 ` [ruby-core:119749] " byroot (Jean Boussier) via ruby-core
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-11-05 17:52 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20869 has been updated by byroot (Jean Boussier).


I just looked into this a bit. I'm not familiar enough with the code to really propose a fix, but I see what is happening:

`ungetbyte` just shifts the buffer offset, but the FD offset is unchanged.

```c
static void
io_ungetbyte(VALUE str, rb_io_t *fptr)
{
    // snip...
    // ungetbyte just shifts the buffer offset, but the FD offset is unchanged
    fptr->rbuf.off-=(int)len;
    fptr->rbuf.len+=(int)len;
    MEMMOVE(fptr->rbuf.ptr+fptr->rbuf.off, RSTRING_PTR(str), char, len);
}
```

Here `fptr->rbuf.len == 1`, but the real FD offset is 0,
so we end up doing `lseek(-1)`, which fails with `EINVAL`:

```c
static void
io_unread(rb_io_t *fptr)
{
    rb_off_t r;
    rb_io_check_closed(fptr);
    if (fptr->rbuf.len == 0 || fptr->mode & FMODE_DUPLEX)
        return;
    /* xxx: target position may be negative if buffer is filled by ungetc */
    errno = 0;
    // fptr->rbuf.len == 1, but the real FD offset is 0,
    // so we're doing lseek(-1), which fails with EINVAL
    r = lseek(fptr->fd, -fptr->rbuf.len, SEEK_CUR);
    if (r < 0 && errno) {
        if (errno == ESPIPE)
            fptr->mode |= FMODE_DUPLEX;
        return;
    }
    fptr->rbuf.off = 0;
    fptr->rbuf.len = 0;
    return;
}
```

So I suppose some more tracking info is needed to know that the real FD position and the buffer offset are desynced.
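
The mechanism described above can be sketched as a toy model in plain Ruby. `ModelIO` and its methods are hypothetical names used only for illustration, and the model assumes the whole file is pulled into the buffer on the first read. It is not how `io.c` is structured, just the same bookkeeping in miniature.

```ruby
# Toy model of the read-buffer bookkeeping; not Ruby's real implementation.
class ModelIO
  attr_reader :fd_pos, :rbuf

  def initialize(data)
    @data = data.b   # bytes of the underlying "file"
    @fd_pos = 0      # stands in for the kernel's file offset
    @rbuf = []       # bytes waiting in the read buffer
  end

  # Fill the buffer with everything left in the file, then pop one byte.
  def getbyte
    if @rbuf.empty?
      chunk = @data.byteslice(@fd_pos..) || ''
      @fd_pos += chunk.bytesize
      @rbuf = chunk.bytes
    end
    @rbuf.shift
  end

  # Push a byte back: the buffer grows, the file offset does not move.
  def ungetbyte(byte)
    @rbuf.unshift(byte)
  end

  # Mirrors io_unread: move the file offset back by the buffered length
  # and drop the buffer only if that relative seek would succeed.
  def unread
    return if @rbuf.empty?
    target = @fd_pos - @rbuf.size
    return if target < 0   # lseek would fail with EINVAL; buffer is kept
    @fd_pos = target
    @rbuf.clear
  end

  def seek_set(offset)
    unread
    @fd_pos = offset
  end
end

a = ModelIO.new('0123456789')
a.ungetbyte(97)
a.seek_set(2)
a.getbyte   # => 97 (buffer survived, as in the first reported case)

b = ModelIO.new('0123456789')
b.getbyte
b.ungetbyte(97)
b.seek_set(2)
b.getbyte   # => 50 (buffer dropped, as in the second reported case)
```

In the model, the only difference between the two cases is whether the relative seek inside `unread` would land on a negative offset.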

----------------------------------------
Bug #20869: IO buffer handling is inconsistent when seeking
https://bugs.ruby-lang.org/issues/20869#change-110411


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:119749] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking
  2024-11-05 16:07 [ruby-core:119741] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking javanthropus (Jeremy Bopp) via ruby-core
  2024-11-05 17:52 ` [ruby-core:119748] " byroot (Jean Boussier) via ruby-core
@ 2024-11-05 18:06 ` byroot (Jean Boussier) via ruby-core
  2024-11-06 15:08 ` [ruby-core:119773] " javanthropus (Jeremy Bopp) via ruby-core
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-11-05 18:06 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20869 has been updated by byroot (Jean Boussier).


Just a quick proof of concept that fixes the first case: https://github.com/ruby/ruby/commit/7481a12fef3df934ab0d9db7f8f2d36341a1562e

But I think many more code paths would need to consider and update that new offset for the entire class of bugs to be fixed.

----------------------------------------
Bug #20869: IO buffer handling is inconsistent when seeking
https://bugs.ruby-lang.org/issues/20869#change-110412


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:119773] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking
  2024-11-05 16:07 [ruby-core:119741] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking javanthropus (Jeremy Bopp) via ruby-core
  2024-11-05 17:52 ` [ruby-core:119748] " byroot (Jean Boussier) via ruby-core
  2024-11-05 18:06 ` [ruby-core:119749] " byroot (Jean Boussier) via ruby-core
@ 2024-11-06 15:08 ` javanthropus (Jeremy Bopp) via ruby-core
  2024-11-06 15:53 ` [ruby-core:119774] " byroot (Jean Boussier) via ruby-core
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: javanthropus (Jeremy Bopp) via ruby-core @ 2024-11-06 15:08 UTC (permalink / raw)
  To: ruby-core; +Cc: javanthropus (Jeremy Bopp)

Issue #20869 has been updated by javanthropus (Jeremy Bopp).


I think your code change highlights another bug caused by the current behavior where `IO#pos` can report negative values.  Oddly, `IO#seek(0, :CUR)` still returns 0:

```ruby
require 'tempfile'

Tempfile.open do |f|
  f.write('0123456789')
  f.rewind

  f.ungetbyte(97)
  f.pos             # => -1
  f.seek(0, :CUR)   # => 0
end
```

Note that `IO#pos` works correctly when used with `IO#ungetc` while transcoding, since that case causes an entirely different buffer to be used, which is currently ignored by the seeking functions.  As demonstrated in the issue description, though, that buffer isn't ever cleared when using the seeking functions.

Conceptually, it makes sense to me that the seeking functions should only care about bytes from the underlying stream since that's what they operate on.  They should ignore read buffer manipulation by `IO#ungetbyte` and `IO#ungetc` since the data pushed by those methods has no relationship to the bytes of the stream.  What I don't see anywhere I've looked is a statement regarding how `IO#ungetbyte` and `IO#ungetc` *should* interact with seeking operations.  The existing specs and docs don't seem to cover those cases.

It would be great to get clarification here before working on solutions.  While I think the best solution would be to disregard the bytes added by `IO#ungetbyte` and `IO#ungetc` and to clear the relevant buffers when seeking, I can imagine others may prefer to preserve the buffers.  Maybe the solution is to leave the behavior deliberately undefined and just warn people against mixing these methods via documentation.
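
One way to read the `-1` above is as plain arithmetic. The sketch below assumes the reported position is the descriptor offset minus the number of unread bytes in the read buffer. That is an assumption about the implementation, consistent with the `io_unread` excerpt earlier in the thread, and `reported_pos` is a hypothetical helper, not a Ruby API.

```ruby
# Hypothetical helper: the position a reader would be told, assuming it is
# the descriptor offset minus the unread bytes sitting in the read buffer.
def reported_pos(fd_offset, unread_buffered)
  fd_offset - unread_buffered
end

# After rewind + ungetbyte: nothing has been read from the descriptor,
# but one byte is buffered.
reported_pos(0, 1)    # => -1

# After reading all 10 bytes into the buffer, consuming one, and pushing
# one back: descriptor offset 10, ten unread bytes buffered.
reported_pos(10, 10)  # => 0
```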

----------------------------------------
Bug #20869: IO buffer handling is inconsistent when seeking
https://bugs.ruby-lang.org/issues/20869#change-110439


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:119774] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking
  2024-11-05 16:07 [ruby-core:119741] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking javanthropus (Jeremy Bopp) via ruby-core
                   ` (2 preceding siblings ...)
  2024-11-06 15:08 ` [ruby-core:119773] " javanthropus (Jeremy Bopp) via ruby-core
@ 2024-11-06 15:53 ` byroot (Jean Boussier) via ruby-core
  2024-11-06 16:58 ` [ruby-core:119777] " javanthropus (Jeremy Bopp) via ruby-core
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-11-06 15:53 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20869 has been updated by byroot (Jean Boussier).


I'll add this to the next developer meeting.

----------------------------------------
Bug #20869: IO buffer handling is inconsistent when seeking
https://bugs.ruby-lang.org/issues/20869#change-110441


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:119777] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking
  2024-11-05 16:07 [ruby-core:119741] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking javanthropus (Jeremy Bopp) via ruby-core
                   ` (3 preceding siblings ...)
  2024-11-06 15:53 ` [ruby-core:119774] " byroot (Jean Boussier) via ruby-core
@ 2024-11-06 16:58 ` javanthropus (Jeremy Bopp) via ruby-core
  2024-11-07 10:46 ` [ruby-core:119807] " nobu (Nobuyoshi Nakada) via ruby-core
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: javanthropus (Jeremy Bopp) via ruby-core @ 2024-11-06 16:58 UTC (permalink / raw)
  To: ruby-core; +Cc: javanthropus (Jeremy Bopp)

Issue #20869 has been updated by javanthropus (Jeremy Bopp).


Considering a parallel, the [manpage for the fseek(3) function](https://linux.die.net/man/3/fseek) clearly states that the effects of ungetc(3) are undone on successful calls to fseek(3).

----------------------------------------
Bug #20869: IO buffer handling is inconsistent when seeking
https://bugs.ruby-lang.org/issues/20869#change-110444


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:119807] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking
  2024-11-05 16:07 [ruby-core:119741] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking javanthropus (Jeremy Bopp) via ruby-core
                   ` (4 preceding siblings ...)
  2024-11-06 16:58 ` [ruby-core:119777] " javanthropus (Jeremy Bopp) via ruby-core
@ 2024-11-07 10:46 ` nobu (Nobuyoshi Nakada) via ruby-core
  2024-11-07 12:16 ` [ruby-core:119809] " byroot (Jean Boussier) via ruby-core
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2024-11-07 10:46 UTC (permalink / raw)
  To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)

Issue #20869 has been updated by nobu (Nobuyoshi Nakada).


The buffers and `Encoding::Converter`s should be discarded at positioning, we think.

----------------------------------------
Bug #20869: IO buffer handling is inconsistent when seeking
https://bugs.ruby-lang.org/issues/20869#change-110493


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:119809] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking
  2024-11-05 16:07 [ruby-core:119741] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking javanthropus (Jeremy Bopp) via ruby-core
                   ` (5 preceding siblings ...)
  2024-11-07 10:46 ` [ruby-core:119807] " nobu (Nobuyoshi Nakada) via ruby-core
@ 2024-11-07 12:16 ` byroot (Jean Boussier) via ruby-core
  2024-11-07 13:01 ` [ruby-core:119810] " javanthropus (Jeremy Bopp) via ruby-core
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-11-07 12:16 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20869 has been updated by byroot (Jean Boussier).


Right, but just discarding the buffers isn't enough, because in the case of `SEEK_CUR`, you need to know exactly how far ahead of the expected cursor you really are.

The only solution I see is to keep a count of how many bytes were added through `ungetc` etc., which is what I outlined in https://github.com/ruby/ruby/commit/7481a12fef3df934ab0d9db7f8f2d36341a1562e
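
A minimal sketch of the counting idea, in plain Ruby rather than C. It is not the code from the linked commit: `PushbackAwareBuffer` and its method names are hypothetical, and it assumes that pushed-back bytes should not affect the logical position at all.

```ruby
# Hypothetical model: tracks pushed-back bytes separately from bytes that
# were really read from the descriptor.
class PushbackAwareBuffer
  def initialize
    @fd_pos = 0    # descriptor offset, as lseek(fd, 0, SEEK_CUR) would report
    @from_fd = 0   # unread buffered bytes that came from the descriptor
    @pushed = 0    # unread buffered bytes added by ungetbyte/ungetc
  end

  # n bytes were read from the descriptor into the buffer.
  def fill(n)
    @fd_pos += n
    @from_fd += n
  end

  # n bytes were pushed back by the caller.
  def unget(n = 1)
    @pushed += n
  end

  # The caller consumed n bytes; pushed-back bytes are handed out first.
  def consume(n = 1)
    from_pushed = [n, @pushed].min
    @pushed -= from_pushed
    @from_fd -= [n - from_pushed, @from_fd].min
  end

  # Logical position: only bytes that came from the descriptor count.
  def pos
    @fd_pos - @from_fd
  end

  # Relative seek: compute the target from the logical position, then
  # discard everything buffered, pushed-back bytes included.
  def seek_cur(offset)
    target = pos + offset
    raise Errno::EINVAL if target < 0
    @from_fd = 0
    @pushed = 0
    @fd_pos = target
  end
end

buf = PushbackAwareBuffer.new
buf.unget        # pushed back before anything was read
buf.pos          # => 0
buf.seek_cur(2)  # => 2
```

Keeping the two counters apart is what lets the relative seek ignore pushed-back data. With a single buffered length, the model could not tell how much of the buffer corresponds to real file bytes.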

----------------------------------------
Bug #20869: IO buffer handling is inconsistent when seeking
https://bugs.ruby-lang.org/issues/20869#change-110496


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:119810] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking
  2024-11-05 16:07 [ruby-core:119741] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking javanthropus (Jeremy Bopp) via ruby-core
                   ` (6 preceding siblings ...)
  2024-11-07 12:16 ` [ruby-core:119809] " byroot (Jean Boussier) via ruby-core
@ 2024-11-07 13:01 ` javanthropus (Jeremy Bopp) via ruby-core
  2024-11-08  4:28 ` [ruby-core:119832] " nobu (Nobuyoshi Nakada) via ruby-core
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: javanthropus (Jeremy Bopp) via ruby-core @ 2024-11-07 13:01 UTC (permalink / raw)
  To: ruby-core; +Cc: javanthropus (Jeremy Bopp)

Issue #20869 has been updated by javanthropus (Jeremy Bopp).


The documentation that was added says that `IO#tell` and `IO#pos` would clear the buffers, but they appear to have a special code path now to avoid it.  `IO#seek(0, :CUR)` doesn't share this behavior, and that's a curious inconsistency.
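
A minimal probe for comparing the two calls, in the style of the examples from the original report. The helper name `byte_after` is hypothetical, and the output is deliberately not annotated because it depends on the Ruby build being tested:

```ruby
require 'tempfile'

# Hypothetical helper: pushes one byte back into the read buffer,
# runs the given position-related operation, and returns the byte
# produced by the next read.  97 means the pushed-back byte survived;
# 48 ('0') means the buffer was cleared.
def byte_after(operation)
  Tempfile.open do |f|
    f.write('0123456789')
    f.rewind

    f.ungetbyte(97)
    operation.call(f)

    f.getbyte
  end
end

puts "after pos:           #{byte_after(->(io) { io.pos })}"
puts "after seek(0, :CUR): #{byte_after(->(io) { io.seek(0, :CUR) })}"
```

If the two lines print different values, the build under test treats `IO#pos` and `IO#seek(0, :CUR)` differently with respect to the read buffer.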

----------------------------------------
Bug #20869: IO buffer handling is inconsistent when seeking
https://bugs.ruby-lang.org/issues/20869#change-110498

* Author: javanthropus (Jeremy Bopp)
* Status: Closed
* ruby -v: ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:119832] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking
  2024-11-05 16:07 [ruby-core:119741] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking javanthropus (Jeremy Bopp) via ruby-core
                   ` (7 preceding siblings ...)
  2024-11-07 13:01 ` [ruby-core:119810] " javanthropus (Jeremy Bopp) via ruby-core
@ 2024-11-08  4:28 ` nobu (Nobuyoshi Nakada) via ruby-core
  2024-11-08 13:16 ` [ruby-core:119843] " javanthropus (Jeremy Bopp) via ruby-core
  2024-11-12 14:33 ` [ruby-core:119896] " javanthropus (Jeremy Bopp) via ruby-core
  10 siblings, 0 replies; 12+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2024-11-08  4:28 UTC (permalink / raw)
  To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)

Issue #20869 has been updated by nobu (Nobuyoshi Nakada).


The documentation is outdated, I'll fix it.

Since `io.pos` (not assignment) looks like a mere attribute, it is differentiated from `seek`.
Doesn't this make sense?

----------------------------------------
Bug #20869: IO buffer handling is inconsistent when seeking
https://bugs.ruby-lang.org/issues/20869#change-110520

* Author: javanthropus (Jeremy Bopp)
* Status: Closed
* ruby -v: ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:119843] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking
  2024-11-05 16:07 [ruby-core:119741] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking javanthropus (Jeremy Bopp) via ruby-core
                   ` (8 preceding siblings ...)
  2024-11-08  4:28 ` [ruby-core:119832] " nobu (Nobuyoshi Nakada) via ruby-core
@ 2024-11-08 13:16 ` javanthropus (Jeremy Bopp) via ruby-core
  2024-11-12 14:33 ` [ruby-core:119896] " javanthropus (Jeremy Bopp) via ruby-core
  10 siblings, 0 replies; 12+ messages in thread
From: javanthropus (Jeremy Bopp) via ruby-core @ 2024-11-08 13:16 UTC (permalink / raw)
  To: ruby-core; +Cc: javanthropus (Jeremy Bopp)

Issue #20869 has been updated by javanthropus (Jeremy Bopp).


> Since `io.pos` (not assignment) looks like a mere attribute, it is differentiated from `seek`.

If not for the fact that `IO#seek` always returns 0 regardless of its arguments (something I've never understood), `IO#pos` could be implemented as `IO#seek(0, :CUR)`.  Why not avoid busting the buffer in that case?  On the other hand, why not simplify the implementation and bust the buffer in all cases?  Maybe I'm too hung up on implementation and am unable to see `IO#pos` as merely an attribute.
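
For reference, a small sketch of the return-value point above: `IO#seek` returns 0 whatever offset it is given, so the resulting position has to be read back separately with `IO#pos`. The helper name `seek_result_and_position` is hypothetical, and no read buffering is involved here:

```ruby
require 'tempfile'

# Hypothetical helper: performs an absolute seek and returns both the
# value returned by IO#seek and the position reported afterwards.
def seek_result_and_position(io, offset)
  result = io.seek(offset, IO::SEEK_SET)
  [result, io.pos]
end

Tempfile.open do |f|
  f.write('0123456789')
  f.rewind

  p seek_result_and_position(f, 2)   # => [0, 2]
  p seek_result_and_position(f, 7)   # => [0, 7]
end
```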

I also just installed the latest master branch build to check the changes, and there are still a couple of issues:
1. It's still possible for `IO#pos` to return negative values:
    ```ruby
    require 'tempfile'

    Tempfile.open do |f|
      f.write('0123456789')
      f.rewind

      f.ungetbyte(97)
      f.pos             # => -1
    end
    ```
2. The character buffer isn't cleared when transcoding and seeking without first calling `IO#getc`:
    ```ruby
    require 'tempfile'

    Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
      f.write('0123456789')
      f.rewind

      f.ungetc('a'.encode('utf-16le'))
      # Character buffer will not be cleared
      f.seek(2, :SET)

      f.getc       # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
    end
    ```

----------------------------------------
Bug #20869: IO buffer handling is inconsistent when seeking
https://bugs.ruby-lang.org/issues/20869#change-110533

* Author: javanthropus (Jeremy Bopp)
* Status: Closed
* ruby -v: ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [ruby-core:119896] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking
  2024-11-05 16:07 [ruby-core:119741] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking javanthropus (Jeremy Bopp) via ruby-core
                   ` (9 preceding siblings ...)
  2024-11-08 13:16 ` [ruby-core:119843] " javanthropus (Jeremy Bopp) via ruby-core
@ 2024-11-12 14:33 ` javanthropus (Jeremy Bopp) via ruby-core
  10 siblings, 0 replies; 12+ messages in thread
From: javanthropus (Jeremy Bopp) via ruby-core @ 2024-11-12 14:33 UTC (permalink / raw)
  To: ruby-core; +Cc: javanthropus (Jeremy Bopp)

Issue #20869 has been updated by javanthropus (Jeremy Bopp).


I found another issue while looking for more sharp edges on this, and I've opened #20889 as a result.

----------------------------------------
Bug #20869: IO buffer handling is inconsistent when seeking
https://bugs.ruby-lang.org/issues/20869#change-110599

* Author: javanthropus (Jeremy Bopp)
* Status: Closed
* ruby -v: ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-11-12 14:34 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-05 16:07 [ruby-core:119741] [Ruby master Bug#20869] IO buffer handling is inconsistent when seeking javanthropus (Jeremy Bopp) via ruby-core
2024-11-05 17:52 ` [ruby-core:119748] " byroot (Jean Boussier) via ruby-core
2024-11-05 18:06 ` [ruby-core:119749] " byroot (Jean Boussier) via ruby-core
2024-11-06 15:08 ` [ruby-core:119773] " javanthropus (Jeremy Bopp) via ruby-core
2024-11-06 15:53 ` [ruby-core:119774] " byroot (Jean Boussier) via ruby-core
2024-11-06 16:58 ` [ruby-core:119777] " javanthropus (Jeremy Bopp) via ruby-core
2024-11-07 10:46 ` [ruby-core:119807] " nobu (Nobuyoshi Nakada) via ruby-core
2024-11-07 12:16 ` [ruby-core:119809] " byroot (Jean Boussier) via ruby-core
2024-11-07 13:01 ` [ruby-core:119810] " javanthropus (Jeremy Bopp) via ruby-core
2024-11-08  4:28 ` [ruby-core:119832] " nobu (Nobuyoshi Nakada) via ruby-core
2024-11-08 13:16 ` [ruby-core:119843] " javanthropus (Jeremy Bopp) via ruby-core
2024-11-12 14:33 ` [ruby-core:119896] " javanthropus (Jeremy Bopp) via ruby-core
