ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:124579] [Ruby Bug#21842] Encoding of rb_interned_str
@ 2026-01-16 16:49 herwin (Herwin W) via ruby-core
  2026-01-16 19:02 ` [ruby-core:124580] " byroot (Jean Boussier) via ruby-core
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: herwin (Herwin W) via ruby-core @ 2026-01-16 16:49 UTC (permalink / raw)
  To: ruby-core; +Cc: herwin (Herwin W)

Issue #21842 has been reported by herwin (Herwin W).

----------------------------------------
Bug #21842: Encoding of rb_interned_str
https://bugs.ruby-lang.org/issues/21842

* Author: herwin (Herwin W)
* Status: Open
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------
This is one of the API methods to get an fstring. The documentation in the source says the following:
```c
/**
 * Identical to rb_str_new(), except it returns an infamous "f"string.  What is
 * a  fstring?  Well  it is  a special  subkind of  strings that  is immutable,
 * deduped globally, and managed by our GC.   It is much like a Symbol (in fact
 * Symbols  are dynamic  these days  and are  backended using  fstrings).  This
 * concept has been  silently introduced at some point in  2.x era.  Since then
 * it  gained  wider acceptance  in  the  core.   Starting from  3.x  extension
 * libraries can also generate ones.
 *
 * @param[in]  ptr           A memory region of `len` bytes length.
 * @param[in]  len           Length  of  `ptr`,  in bytes,  not  including  the
 *                           terminating NUL character.
 * @exception  rb_eArgError  `len` is negative.
 * @return     A  found or  created instance  of ::rb_cString,  of `len`  bytes
 *             length, of  "binary" encoding,  whose contents are  identical to
 *             that of `ptr`.
 * @pre        At  least  `len` bytes  of  continuous  memory region  shall  be
 *             accessible via `ptr`.
 */
VALUE rb_interned_str(const char *ptr, long len);
```
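
For readers more familiar with Ruby than with the C API, the "immutable, deduped globally" property described in the comment above can be observed from Ruby through `String#-@`, which returns a frozen, deduplicated string. This is only an analogy in plain Ruby, not the code path of `rb_interned_str` itself:

```ruby
# Build two equal strings at runtime so they start out as distinct objects.
a = -("foo" + "bar")   # String#-@ returns a frozen, deduplicated string
b = -("foo" + "bar")

puts a.frozen?         # the interned string cannot be mutated
puts a.equal?(b)       # both names refer to the very same object
```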

I tried to create some specs for them (https://github.com/ruby/spec/pull/1327), but instead of binary encoding, the string is actually encoded as US-ASCII. This may result in some weird behaviour if the input contains bytes that are not valid in US-ASCII (the following is more an observation of the current behaviour):
```ruby
it "support binary strings that are invalid in ASCII encoding" do
  str = "foo\x81bar\x82baz".b
  result = @s.rb_interned_str(str, str.bytesize)
  result.encoding.should == Encoding::US_ASCII
  result.should == str.dup.force_encoding(Encoding::US_ASCII)
  result.should_not.valid_encoding?
end
```
So it seems to me that either the implementation or the documentation is incorrect.

(`rb_interned_str_cstr` has the same behaviour; it is pretty much the same thing, except that it uses a NUL terminator instead of an explicit length argument.)
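
The difference between the two encodings can be reproduced without a C extension. The following is a minimal sketch in plain Ruby that mimics the observed result with `force_encoding`; it does not call `rb_interned_str`, and the variable names are only illustrative:

```ruby
# Same bytes as in the spec above, tagged as BINARY (ASCII-8BIT).
bytes = "foo\x81bar\x82baz".b

# What the current implementation was observed to return: same bytes, US-ASCII tag.
as_ascii = bytes.dup.force_encoding(Encoding::US_ASCII)

puts bytes.valid_encoding?          # every byte value is valid in BINARY
puts as_ascii.valid_encoding?       # 0x81 and 0x82 are outside 7-bit ASCII
puts bytes.bytes == as_ascii.bytes  # only the encoding tag differs
```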



-- 
https://bugs.ruby-lang.org/
______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/


* [ruby-core:124580] [Ruby Bug#21842] Encoding of rb_interned_str
  2026-01-16 16:49 [ruby-core:124579] [Ruby Bug#21842] Encoding of rb_interned_str herwin (Herwin W) via ruby-core
@ 2026-01-16 19:02 ` byroot (Jean Boussier) via ruby-core
  2026-01-16 20:02 ` [ruby-core:124581] " Eregon (Benoit Daloze) via ruby-core
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2026-01-16 19:02 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #21842 has been updated by byroot (Jean Boussier).


Hum, good find. So the function was exposed as a result of [Feature #13381]; before that, it was internal.

In that ticket we didn't discuss the default encoding, but it might be fair to assume it should have been BINARY (aka ASCII-8BIT) like `rb_str_new*`.

The function was later documented in https://github.com/ruby/ruby/commit/091faca99ca and assumed to default to ASCII-8BIT.

At first glance I'd say it makes sense to treat this as a bug and change the default encoding.

On the other hand, one could argue that interned binary strings don't make that much sense.

I don't have a strong opinion either way.

----------------------------------------
Bug #21842: Encoding of rb_interned_str
https://bugs.ruby-lang.org/issues/21842#change-116158

* Author: herwin (Herwin W)
* Status: Open
* ruby -v: ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [x86_64-linux], but seen on 3.0 - 4.1-dev
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------

* [ruby-core:124581] [Ruby Bug#21842] Encoding of rb_interned_str
  2026-01-16 16:49 [ruby-core:124579] [Ruby Bug#21842] Encoding of rb_interned_str herwin (Herwin W) via ruby-core
  2026-01-16 19:02 ` [ruby-core:124580] " byroot (Jean Boussier) via ruby-core
@ 2026-01-16 20:02 ` Eregon (Benoit Daloze) via ruby-core
  2026-01-16 20:17 ` [ruby-core:124582] " byroot (Jean Boussier) via ruby-core
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Eregon (Benoit Daloze) via ruby-core @ 2026-01-16 20:02 UTC (permalink / raw)
  To: ruby-core; +Cc: Eregon (Benoit Daloze)

Issue #21842 has been updated by Eregon (Benoit Daloze).


From https://github.com/truffleruby/truffleruby/issues/4018#issuecomment-3549329873, it seems everyone's expectation is that it returns a BINARY String, like `rb_str_new()`.
@byroot Could you make a PR to fix it?

----------------------------------------
Bug #21842: Encoding of rb_interned_str
https://bugs.ruby-lang.org/issues/21842#change-116159

* Author: herwin (Herwin W)
* Status: Open
* ruby -v: ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [x86_64-linux], but seen on 3.0 - 4.1-dev
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------

* [ruby-core:124582] [Ruby Bug#21842] Encoding of rb_interned_str
  2026-01-16 16:49 [ruby-core:124579] [Ruby Bug#21842] Encoding of rb_interned_str herwin (Herwin W) via ruby-core
  2026-01-16 19:02 ` [ruby-core:124580] " byroot (Jean Boussier) via ruby-core
  2026-01-16 20:02 ` [ruby-core:124581] " Eregon (Benoit Daloze) via ruby-core
@ 2026-01-16 20:17 ` byroot (Jean Boussier) via ruby-core
  2026-01-16 22:17 ` [ruby-core:124584] " byroot (Jean Boussier) via ruby-core
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2026-01-16 20:17 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #21842 has been updated by byroot (Jean Boussier).


Sure: https://github.com/ruby/ruby/pull/15888

----------------------------------------
Bug #21842: Encoding of rb_interned_str
https://bugs.ruby-lang.org/issues/21842#change-116160

* Author: herwin (Herwin W)
* Status: Open
* ruby -v: ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [x86_64-linux], but seen on 3.0 - 4.1-dev
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------

* [ruby-core:124584] [Ruby Bug#21842] Encoding of rb_interned_str
  2026-01-16 16:49 [ruby-core:124579] [Ruby Bug#21842] Encoding of rb_interned_str herwin (Herwin W) via ruby-core
                   ` (2 preceding siblings ...)
  2026-01-16 20:17 ` [ruby-core:124582] " byroot (Jean Boussier) via ruby-core
@ 2026-01-16 22:17 ` byroot (Jean Boussier) via ruby-core
  2026-01-17  0:02 ` [ruby-core:124585] " nobu (Nobuyoshi Nakada) via ruby-core
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2026-01-16 22:17 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #21842 has been updated by byroot (Jean Boussier).


Fix merged (Redmine seems to be lagging behind, but it will probably pick it up).

Backport PRs:

  - 4.0: https://github.com/ruby/ruby/pull/15890
  - 3.4: https://github.com/ruby/ruby/pull/15891
  - 3.3: https://github.com/ruby/ruby/pull/15892

----------------------------------------
Bug #21842: Encoding of rb_interned_str
https://bugs.ruby-lang.org/issues/21842#change-116163

* Author: herwin (Herwin W)
* Status: Open
* ruby -v: ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [x86_64-linux], but seen on 3.0 - 4.1-dev
* Backport: 3.2: WONTFIX, 3.3: REQUIRED, 3.4: REQUIRED, 4.0: REQUIRED
----------------------------------------

* [ruby-core:124585] [Ruby Bug#21842] Encoding of rb_interned_str
  2026-01-16 16:49 [ruby-core:124579] [Ruby Bug#21842] Encoding of rb_interned_str herwin (Herwin W) via ruby-core
                   ` (3 preceding siblings ...)
  2026-01-16 22:17 ` [ruby-core:124584] " byroot (Jean Boussier) via ruby-core
@ 2026-01-17  0:02 ` nobu (Nobuyoshi Nakada) via ruby-core
  2026-01-18  8:53 ` [ruby-core:124588] " herwin (Herwin W) via ruby-core
  2026-02-09 21:44 ` [ruby-core:124745] " k0kubun (Takashi Kokubun) via ruby-core
  6 siblings, 0 replies; 8+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2026-01-17  0:02 UTC (permalink / raw)
  To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)

Issue #21842 has been updated by nobu (Nobuyoshi Nakada).


I think it should be US-ASCII for 7-bit-only strings, as is the case for `Symbol`s.
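
For reference, this is how Symbols pick their encoding in plain Ruby, followed by a sketch of the rule suggested above. The helper `proposed_interned_encoding` is hypothetical and exists only for illustration; it is not part of Ruby or of the merged fix:

```ruby
# Symbols made of 7-bit characters are US-ASCII, whatever the source string's encoding.
puts :foo.encoding
puts "foo".b.to_sym.encoding

# Symbols with non-ASCII content keep the encoding of the string.
puts "\u00e9".to_sym.encoding

# Hypothetical helper modelling the suggestion: US-ASCII when all bytes
# are 7-bit, BINARY otherwise.
def proposed_interned_encoding(bytes)
  bytes.b.ascii_only? ? Encoding::US_ASCII : Encoding::BINARY
end

puts proposed_interned_encoding("foo")
puts proposed_interned_encoding("foo\x81".b)
```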

----------------------------------------
Bug #21842: Encoding of rb_interned_str
https://bugs.ruby-lang.org/issues/21842#change-116164

* Author: herwin (Herwin W)
* Status: Open
* ruby -v: ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [x86_64-linux], but seen on 3.0 - 4.1-dev
* Backport: 3.2: WONTFIX, 3.3: REQUIRED, 3.4: REQUIRED, 4.0: REQUIRED
----------------------------------------

* [ruby-core:124588] [Ruby Bug#21842] Encoding of rb_interned_str
  2026-01-16 16:49 [ruby-core:124579] [Ruby Bug#21842] Encoding of rb_interned_str herwin (Herwin W) via ruby-core
                   ` (4 preceding siblings ...)
  2026-01-17  0:02 ` [ruby-core:124585] " nobu (Nobuyoshi Nakada) via ruby-core
@ 2026-01-18  8:53 ` herwin (Herwin W) via ruby-core
  2026-02-09 21:44 ` [ruby-core:124745] " k0kubun (Takashi Kokubun) via ruby-core
  6 siblings, 0 replies; 8+ messages in thread
From: herwin (Herwin W) via ruby-core @ 2026-01-18  8:53 UTC (permalink / raw)
  To: ruby-core; +Cc: herwin (Herwin W)

Issue #21842 has been updated by herwin (Herwin W).


I've made a short update of the documentation in https://github.com/ruby/ruby/pull/15897, mostly to explain what information is used to determine the encoding of the result.

I've tried to keep the line width usage similar to the original, which meant doubling some random spaces until it lined up. I would not mind dropping this dependency, since it makes updating these texts a whole lot easier.

----------------------------------------
Bug #21842: Encoding of rb_interned_str
https://bugs.ruby-lang.org/issues/21842#change-116169

* Author: herwin (Herwin W)
* Status: Open
* ruby -v: ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [x86_64-linux], but seen on 3.0 - 4.1-dev
* Backport: 3.2: WONTFIX, 3.3: REQUIRED, 3.4: REQUIRED, 4.0: REQUIRED
----------------------------------------

* [ruby-core:124745] [Ruby Bug#21842] Encoding of rb_interned_str
  2026-01-16 16:49 [ruby-core:124579] [Ruby Bug#21842] Encoding of rb_interned_str herwin (Herwin W) via ruby-core
                   ` (5 preceding siblings ...)
  2026-01-18  8:53 ` [ruby-core:124588] " herwin (Herwin W) via ruby-core
@ 2026-02-09 21:44 ` k0kubun (Takashi Kokubun) via ruby-core
  6 siblings, 0 replies; 8+ messages in thread
From: k0kubun (Takashi Kokubun) via ruby-core @ 2026-02-09 21:44 UTC (permalink / raw)
  To: ruby-core; +Cc: k0kubun (Takashi Kokubun)

Issue #21842 has been updated by k0kubun (Takashi Kokubun).

Backport changed from 3.2: WONTFIX, 3.3: REQUIRED, 3.4: REQUIRED, 4.0: REQUIRED to 3.2: WONTFIX, 3.3: REQUIRED, 3.4: REQUIRED, 4.0: DONE

ruby_4_0 commit:306930ae1ac62fb3b7f96581f4a6e9ab4c083e84 merged revision(s) commit:78b7646bdb91285873ac26bca060591e06c45afe, commit:b4a62a1ca949d93332ad8bce0fcc273581160cc5.

----------------------------------------
Bug #21842: Encoding of rb_interned_str
https://bugs.ruby-lang.org/issues/21842#change-116347

* Author: herwin (Herwin W)
* Status: Closed
* ruby -v: ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [x86_64-linux], but seen on 3.0 - 4.1-dev
* Backport: 3.2: WONTFIX, 3.3: REQUIRED, 3.4: REQUIRED, 4.0: DONE
----------------------------------------

Thread overview: 8+ messages
2026-01-16 16:49 [ruby-core:124579] [Ruby Bug#21842] Encoding of rb_interned_str herwin (Herwin W) via ruby-core
2026-01-16 19:02 ` [ruby-core:124580] " byroot (Jean Boussier) via ruby-core
2026-01-16 20:02 ` [ruby-core:124581] " Eregon (Benoit Daloze) via ruby-core
2026-01-16 20:17 ` [ruby-core:124582] " byroot (Jean Boussier) via ruby-core
2026-01-16 22:17 ` [ruby-core:124584] " byroot (Jean Boussier) via ruby-core
2026-01-17  0:02 ` [ruby-core:124585] " nobu (Nobuyoshi Nakada) via ruby-core
2026-01-18  8:53 ` [ruby-core:124588] " herwin (Herwin W) via ruby-core
2026-02-09 21:44 ` [ruby-core:124745] " k0kubun (Takashi Kokubun) via ruby-core
