* [ruby-core:124312] [Ruby Feature#21796] unpack variant that returns the final offset
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2025-12-19 3:56 UTC (permalink / raw)
To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)
Issue #21796 has been reported by nobu (Nobuyoshi Nakada).
----------------------------------------
Feature #21796: unpack variant that returns the final offset
https://bugs.ruby-lang.org/issues/21796
* Author: nobu (Nobuyoshi Nakada)
* Status: Open
----------------------------------------
mame (Yusuke Endoh) wrote in [#note-4](https://bugs.ruby-lang.org/issues/21785#note-4):
> It's a shame `unpack` doesn't tell you how many bytes it read. You'd probably want an `unpack` variant that returns the final offset too, or a specifier that returns the current offset (like `o`?).
>
> ```ruby
> bytes = "\x01\x02\x03"
> offset = 0
> leb128_value1, offset = bytes.unpack("Ro", offset: offset) #=> 1
> leb128_value2, offset = bytes.unpack("Ro", offset: offset) #=> 2
> leb128_value3, offset = bytes.unpack("Ro", offset: offset) #=> 3
> ```
mame (Yusuke Endoh) wrote in [#note-6](https://bugs.ruby-lang.org/issues/21785#note-6):
> > You could tell how many bytes you read based on the size of the leb128_value returned.
>
> That approach is unreliable because LEB128 is redundant. For example, both `"\x03"` and `"\x83\x00"` are valid LEB128 encodings of the value 3.
> See the note in the "Values - Integers" section of the Wasm spec.
> https://webassembly.github.io/spec/core/binary/values.html#integers
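The redundancy described above can be checked with a small pure-Ruby decoder. This is only a sketch for illustration: `decode_uleb128` is a hypothetical helper, not the implementation behind the `R` directive discussed in the quoted notes, and it handles unsigned values only. It returns the decoded value together with the offset just past the last byte it read, which is the piece of information `unpack` does not currently expose.

```ruby
# Hypothetical helper for illustration only; not part of Ruby.
# Decodes one unsigned LEB128 value starting at `offset` and returns
# [value, offset_after_the_value].
def decode_uleb128(bytes, offset = 0)
  value = 0
  shift = 0
  loop do
    byte = bytes.getbyte(offset)
    raise ArgumentError, "truncated LEB128 value" if byte.nil?

    offset += 1
    value |= (byte & 0x7f) << shift           # low 7 bits carry the payload
    return [value, offset] if (byte & 0x80).zero? # high bit clear: last byte
    shift += 7
  end
end

short_form  = decode_uleb128("\x03".b)      # => [3, 1]
padded_form = decode_uleb128("\x83\x00".b)  # => [3, 2]
```

Both calls decode to 3, but they consume one and two bytes respectively, so the number of bytes read cannot be recovered from the value alone.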
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
* [ruby-core:124314] [Ruby Feature#21796] unpack variant that returns the final offset
From: byroot (Jean Boussier) via ruby-core @ 2025-12-19 8:10 UTC (permalink / raw)
To: ruby-core; +Cc: byroot (Jean Boussier)
Issue #21796 has been updated by byroot (Jean Boussier).
It would indeed be useful, but I'm not sure a new method is the best way.
I think the simplest would be a new keyword parameter:
```ruby
offset, *values = bytes.unpack("Ro", offset: offset, return_offset: true)
```
Another possibility would be to add an `unpack`-like method to `StringScanner`, for the case where you want to iteratively deserialize a binary string.
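To make the iterative-deserialization idea concrete, here is a rough sketch of a cursor object. Everything in it is an assumption made for illustration: `BinaryCursor`, its `read` method, and the `WIDTHS` table are hypothetical and are not part of `StringScanner` or of any proposal in this thread. It only supports a few fixed-width directives, because without an offset-reporting directive the width has to be known in advance. It relies on the existing `offset:` keyword of `String#unpack1` (Ruby 3.1 and later).

```ruby
# Hypothetical cursor for illustration only; not an existing Ruby API.
class BinaryCursor
  # Byte widths of the few fixed-size directives this sketch supports.
  WIDTHS = { "C" => 1, "n" => 2, "N" => 4 }.freeze

  attr_reader :pos

  def initialize(bytes)
    @bytes = bytes.b
    @pos = 0
  end

  # Reads one value at the current position and advances past it.
  def read(directive)
    width = WIDTHS.fetch(directive)
    raise EOFError, "not enough bytes left" if @pos + width > @bytes.bytesize

    value = @bytes.unpack1(directive, offset: @pos)
    @pos += width
    value
  end
end

cursor = BinaryCursor.new("\x01\x00\x02\x00\x00\x00\x03")
first  = cursor.read("C") # => 1
second = cursor.read("n") # => 2
third  = cursor.read("N") # => 3
```

After the three reads `cursor.pos` is 7. A variable-length directive such as LEB128 cannot be handled this way, which is the gap the ticket is about.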
----------------------------------------
Feature #21796: unpack variant that returns the final offset
https://bugs.ruby-lang.org/issues/21796#change-115816
* [ruby-core:124325] [Ruby Feature#21796] unpack variant that returns the final offset
From: tenderlovemaking (Aaron Patterson) via ruby-core @ 2025-12-19 17:22 UTC (permalink / raw)
To: ruby-core; +Cc: tenderlovemaking (Aaron Patterson)
Issue #21796 has been updated by tenderlovemaking (Aaron Patterson).
I really like this idea. @jhawthorn suggested `^` instead of `o` though, and I really like it.
```ruby
bytes = "\x01\x02\x03"
offset = 0
leb128_value1, offset = bytes.unpack("R^", offset: offset) #=> 1
leb128_value2, offset = bytes.unpack("R^", offset: offset) #=> 2
leb128_value3, offset = bytes.unpack("R^", offset: offset) #=> 3
```
> I think the simplest would be a new keyword parameter
Why a new parameter? You might be interested in more than one location. We already have [pack directives for skipping bytes](https://github.com/ruby/ruby/blob/master/doc/language/packed_data.rdoc#additional-directives-for-unpacking) (`@`, `X`, and `x`). It seems natural to add a directive to return the current offset.
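For reference, the existing positioning directives mentioned above behave as follows when unpacking. This sketch uses only directives that already exist (`x`, `X`, `@`) plus the `offset:` keyword (Ruby 3.1 and later); it does not assume the proposed `^` directive is available.

```ruby
bytes = "\x01\x02\x03\x04".b

# x skips forward one byte: read byte 0, skip byte 1, read byte 2.
skip_forward = bytes.unpack("CxC")   # => [1, 3]

# X steps back one byte: the second byte is read twice.
skip_backward = bytes.unpack("CCXC") # => [1, 2, 2]

# @3 jumps to byte offset 3 before the next read.
absolute = bytes.unpack("C@3C")      # => [1, 4]

# offset: sets where unpacking starts.
from_offset = bytes.unpack("C", offset: 2) # => [3]
```

All of these move the read position, but none of them reports it back to the caller.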
> Another possibility would be to add an `unpack`-like method to `StringScanner`, for the case where you want to iteratively deserialize a binary string.
I think this would be very useful in general, but it probably belongs in a separate Redmine ticket.
----------------------------------------
Feature #21796: unpack variant that returns the final offset
https://bugs.ruby-lang.org/issues/21796#change-115830
* [ruby-core:124328] [Ruby Feature#21796] unpack variant that returns the final offset
From: byroot (Jean Boussier) via ruby-core @ 2025-12-19 19:57 UTC (permalink / raw)
To: ruby-core; +Cc: byroot (Jean Boussier)
Issue #21796 has been updated by byroot (Jean Boussier).
> Why a new parameter?
Because I misread the ticket; I didn't notice the `o`.
I do think `^` for offset is pure genius though.
----------------------------------------
Feature #21796: unpack variant that returns the final offset
https://bugs.ruby-lang.org/issues/21796#change-115833
* [ruby-core:124347] [Ruby Feature#21796] unpack variant that returns the final offset
From: matz (Yukihiro Matsumoto) via ruby-core @ 2025-12-23 2:31 UTC (permalink / raw)
To: ruby-core; +Cc: matz (Yukihiro Matsumoto)
Issue #21796 has been updated by matz (Yukihiro Matsumoto).
I like the `^` specifier too.
Matz.
----------------------------------------
Feature #21796: unpack variant that returns the final offset
https://bugs.ruby-lang.org/issues/21796#change-115856
* [ruby-core:124389] [Ruby Feature#21796] unpack variant that returns the final offset
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2025-12-30 8:48 UTC (permalink / raw)
To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)
Issue #21796 has been updated by nobu (Nobuyoshi Nakada).
This might be useful for `A`, `a`, and `Z` as well.
Updated the PR to use `^`, including the tests.
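One way to read the remark about `A`, `a`, and `Z`: with the string directives, the returned value does not always reveal how many bytes were consumed. The sketch below uses only existing behaviour of `String#unpack`; the sample strings are made up for illustration.

```ruby
terminated   = "abc\x00rest".b
unterminated = "abc".b

# Z* returns the same string in both cases, but consumes 4 bytes
# (including the NUL) in the first case and 3 in the second.
z_terminated   = terminated.unpack1("Z*")   # => "abc"
z_unterminated = unterminated.unpack1("Z*") # => "abc"

# A following a* shows that the NUL terminator was skipped.
z_then_rest = terminated.unpack("Z*a*")     # => ["abc", "rest"]

# A4 consumes 4 bytes but strips the trailing spaces from the result.
a_then_rest = "ab  cd".b.unpack("A4a*")     # => ["ab", "cd"]
```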
----------------------------------------
Feature #21796: unpack variant that returns the final offset
https://bugs.ruby-lang.org/issues/21796#change-115898
* [ruby-core:124777] [Ruby Feature#21796] unpack variant that returns the final offset
From: matz (Yukihiro Matsumoto) via ruby-core @ 2026-02-12 6:38 UTC (permalink / raw)
To: ruby-core; +Cc: matz (Yukihiro Matsumoto)
Issue #21796 has been updated by matz (Yukihiro Matsumoto).
Go ahead.
Matz.
----------------------------------------
Feature #21796: unpack variant that returns the final offset
https://bugs.ruby-lang.org/issues/21796#change-116388