ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:124312] [Ruby Feature#21796] unpack variant that returns the final offset
@ 2025-12-19  3:56 nobu (Nobuyoshi Nakada) via ruby-core
  2025-12-19  8:10 ` [ruby-core:124314] " byroot (Jean Boussier) via ruby-core
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2025-12-19  3:56 UTC (permalink / raw)
  To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)

Issue #21796 has been reported by nobu (Nobuyoshi Nakada).

----------------------------------------
Feature #21796: unpack variant that returns the final offset
https://bugs.ruby-lang.org/issues/21796

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
----------------------------------------
mame (Yusuke Endoh) wrote in [#note-4](https://bugs.ruby-lang.org/issues/21785#note-4):
> It's a shame `unpack` doesn't tell you how many bytes it read. You'd probably want an `unpack` variant that returns the final offset too, or a specifier that returns the current offset (like `o`?).
> 
> ```ruby
> bytes = "\x01\x02\x03"
> offset = 0
> leb128_value1, offset = bytes.unpack("Ro", offset: offset) #=> 1
> leb128_value2, offset = bytes.unpack("Ro", offset: offset) #=> 2
> leb128_value3, offset = bytes.unpack("Ro", offset: offset) #=> 3
> ```

mame (Yusuke Endoh) wrote in [#note-6](https://bugs.ruby-lang.org/issues/21785#note-6):
> > You could tell how many bytes you read based on the size of the leb128_value returned.
> 
> That approach is unreliable because LEB128 is redundant. For example, both `"\x03"` and `"\x83\x00"` are valid LEB128 encodings of the value 3.
> See the note in the section Values - Integers of the Wasm spec.
> https://webassembly.github.io/spec/core/binary/values.html#integers
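
To make the redundancy point concrete, here is a minimal sketch of an unsigned LEB128 decoder written in plain Ruby. The helper name `decode_uleb128` is hypothetical and is not part of the proposal or of Ruby itself; it only illustrates why the decoded value alone cannot tell you how many bytes were consumed.

```ruby
# Hypothetical helper, for illustration only (not part of Ruby or the proposal).
# Decodes one unsigned LEB128 integer starting at +offset+ and
# returns [value, next_offset].
def decode_uleb128(bytes, offset = 0)
  value = 0
  shift = 0
  loop do
    byte = bytes.getbyte(offset)
    raise ArgumentError, "truncated LEB128 input" if byte.nil?

    offset += 1
    value |= (byte & 0x7f) << shift          # low 7 bits carry the payload
    return [value, offset] if (byte & 0x80).zero? # high bit clear: last byte
    shift += 7
  end
end

short  = decode_uleb128("\x03".b)     # minimal encoding of 3
padded = decode_uleb128("\x83\x00".b) # padded encoding of 3

p short  # [3, 1]
p padded # [3, 2]
```

Both inputs decode to 3, but the next offset differs (1 versus 2), so only the decoder itself knows where the following field starts.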





-- 
https://bugs.ruby-lang.org/
______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/


* [ruby-core:124314] [Ruby Feature#21796] unpack variant that returns the final offset
  2025-12-19  3:56 [ruby-core:124312] [Ruby Feature#21796] unpack variant that returns the final offset nobu (Nobuyoshi Nakada) via ruby-core
@ 2025-12-19  8:10 ` byroot (Jean Boussier) via ruby-core
  2025-12-19 17:22 ` [ruby-core:124325] " tenderlovemaking (Aaron Patterson) via ruby-core
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2025-12-19  8:10 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #21796 has been updated by byroot (Jean Boussier).


It would be useful indeed, but I'm not sure a new method is the best way?

I think the simplest would be a new keyword parameter:

```ruby
offset, *values = bytes.unpack("Ro", offset: offset, return_offset: true)
```

Another possibility would be to add an `unpack`-like method to `StringScanner`, for the case where you want to iteratively deserialize a binary string.

----------------------------------------
Feature #21796: unpack variant that returns the final offset
https://bugs.ruby-lang.org/issues/21796#change-115816


* [ruby-core:124325] [Ruby Feature#21796] unpack variant that returns the final offset
  2025-12-19  3:56 [ruby-core:124312] [Ruby Feature#21796] unpack variant that returns the final offset nobu (Nobuyoshi Nakada) via ruby-core
  2025-12-19  8:10 ` [ruby-core:124314] " byroot (Jean Boussier) via ruby-core
@ 2025-12-19 17:22 ` tenderlovemaking (Aaron Patterson) via ruby-core
  2025-12-19 19:57 ` [ruby-core:124328] " byroot (Jean Boussier) via ruby-core
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: tenderlovemaking (Aaron Patterson) via ruby-core @ 2025-12-19 17:22 UTC (permalink / raw)
  To: ruby-core; +Cc: tenderlovemaking (Aaron Patterson)

Issue #21796 has been updated by tenderlovemaking (Aaron Patterson).


I really like this idea.  @jhawthorn suggested `^` instead of `o` though, and I really like it.

```ruby
bytes = "\x01\x02\x03"
offset = 0
leb128_value1, offset = bytes.unpack("R^", offset: offset) #=> 1
leb128_value2, offset = bytes.unpack("R^", offset: offset) #=> 2
leb128_value3, offset = bytes.unpack("R^", offset: offset) #=> 3
```

> I think the simplest would be a new keyword parameter

Why a new parameter?  You might be interested in more than one location.  We already have [pack directives for skipping bytes](https://github.com/ruby/ruby/blob/master/doc/language/packed_data.rdoc#additional-directives-for-unpacking) (`@`, `X`, and `x`). It seems natural to add a directive to return the current offset.
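
For readers less familiar with the existing positional directives, a small sketch of how `x`, `X`, and `@` already move the read position during `unpack`. The variable names are only illustrative.

```ruby
data = "\x01\x02\x03\x04".b

# x skips one byte forward: read byte 0, skip byte 1, read byte 2.
skip_forward = data.unpack("C x C")

# X steps one byte back: read byte 0, move back, read byte 0 again.
step_back = data.unpack("C X C")

# @2 jumps to byte position 2 before reading.
absolute = data.unpack("@2 C")

p skip_forward # [1, 3]
p step_back    # [1, 1]
p absolute     # [3]
```

These directives move the cursor without producing a value. The directive proposed in this thread would do the reverse: leave the cursor alone and report where it is.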

> Another possibility would be to add an unpack like method to StringScanner, for the case where you want to iteratively deserialize a binary string.

I think this would be very useful in general, but maybe it belongs in a separate Redmine ticket?

----------------------------------------
Feature #21796: unpack variant that returns the final offset
https://bugs.ruby-lang.org/issues/21796#change-115830


* [ruby-core:124328] [Ruby Feature#21796] unpack variant that returns the final offset
  2025-12-19  3:56 [ruby-core:124312] [Ruby Feature#21796] unpack variant that returns the final offset nobu (Nobuyoshi Nakada) via ruby-core
  2025-12-19  8:10 ` [ruby-core:124314] " byroot (Jean Boussier) via ruby-core
  2025-12-19 17:22 ` [ruby-core:124325] " tenderlovemaking (Aaron Patterson) via ruby-core
@ 2025-12-19 19:57 ` byroot (Jean Boussier) via ruby-core
  2025-12-23  2:31 ` [ruby-core:124347] " matz (Yukihiro Matsumoto) via ruby-core
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2025-12-19 19:57 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #21796 has been updated by byroot (Jean Boussier).


> Why a new parameter?

Because I misread the ticket; I didn't notice the `o`.

I do think `^` for offset is pure genius though.

----------------------------------------
Feature #21796: unpack variant that returns the final offset
https://bugs.ruby-lang.org/issues/21796#change-115833


* [ruby-core:124347] [Ruby Feature#21796] unpack variant that returns the final offset
  2025-12-19  3:56 [ruby-core:124312] [Ruby Feature#21796] unpack variant that returns the final offset nobu (Nobuyoshi Nakada) via ruby-core
                   ` (2 preceding siblings ...)
  2025-12-19 19:57 ` [ruby-core:124328] " byroot (Jean Boussier) via ruby-core
@ 2025-12-23  2:31 ` matz (Yukihiro Matsumoto) via ruby-core
  2025-12-30  8:48 ` [ruby-core:124389] " nobu (Nobuyoshi Nakada) via ruby-core
  2026-02-12  6:38 ` [ruby-core:124777] " matz (Yukihiro Matsumoto) via ruby-core
  5 siblings, 0 replies; 7+ messages in thread
From: matz (Yukihiro Matsumoto) via ruby-core @ 2025-12-23  2:31 UTC (permalink / raw)
  To: ruby-core; +Cc: matz (Yukihiro Matsumoto)

Issue #21796 has been updated by matz (Yukihiro Matsumoto).


I like the `^` specifier too.

Matz.


----------------------------------------
Feature #21796: unpack variant that returns the final offset
https://bugs.ruby-lang.org/issues/21796#change-115856


* [ruby-core:124389] [Ruby Feature#21796] unpack variant that returns the final offset
  2025-12-19  3:56 [ruby-core:124312] [Ruby Feature#21796] unpack variant that returns the final offset nobu (Nobuyoshi Nakada) via ruby-core
                   ` (3 preceding siblings ...)
  2025-12-23  2:31 ` [ruby-core:124347] " matz (Yukihiro Matsumoto) via ruby-core
@ 2025-12-30  8:48 ` nobu (Nobuyoshi Nakada) via ruby-core
  2026-02-12  6:38 ` [ruby-core:124777] " matz (Yukihiro Matsumoto) via ruby-core
  5 siblings, 0 replies; 7+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2025-12-30  8:48 UTC (permalink / raw)
  To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)

Issue #21796 has been updated by nobu (Nobuyoshi Nakada).


This might be useful for `A`, `a`, and `Z` as well.
Updated the PR to use `^` with the tests.
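
As a rough illustration of the `Z` case, here is how consecutive NUL-terminated strings can be walked today with the existing `offset:` keyword of `unpack1` and manual bookkeeping. The sample data and variable names are made up for this sketch.

```ruby
# Three NUL-terminated names; the last one has no terminator.
data = "foo\0barbaz\0qux".b

offset = 0
names = []
while offset < data.bytesize
  name = data.unpack1("Z*", offset: offset) # reads up to the next NUL
  names << name
  # The returned string excludes the terminator, so the caller has to
  # add 1 by hand. That guess is off by one when the input ends without
  # a NUL, as it does for the last entry here.
  offset += name.bytesize + 1
end

p names  # ["foo", "barbaz", "qux"]
p offset # 15, one past data.bytesize (14)
```

The loop still terminates, but the final offset overshoots the input by one byte. An offset reported by `unpack` itself would presumably avoid this kind of guesswork.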

----------------------------------------
Feature #21796: unpack variant that returns the final offset
https://bugs.ruby-lang.org/issues/21796#change-115898


* [ruby-core:124777] [Ruby Feature#21796] unpack variant that returns the final offset
  2025-12-19  3:56 [ruby-core:124312] [Ruby Feature#21796] unpack variant that returns the final offset nobu (Nobuyoshi Nakada) via ruby-core
                   ` (4 preceding siblings ...)
  2025-12-30  8:48 ` [ruby-core:124389] " nobu (Nobuyoshi Nakada) via ruby-core
@ 2026-02-12  6:38 ` matz (Yukihiro Matsumoto) via ruby-core
  5 siblings, 0 replies; 7+ messages in thread
From: matz (Yukihiro Matsumoto) via ruby-core @ 2026-02-12  6:38 UTC (permalink / raw)
  To: ruby-core; +Cc: matz (Yukihiro Matsumoto)

Issue #21796 has been updated by matz (Yukihiro Matsumoto).


Go ahead.

Matz.


----------------------------------------
Feature #21796: unpack variant that returns the final offset
https://bugs.ruby-lang.org/issues/21796#change-116388

