ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
@ 2024-11-07 10:31 byroot (Jean Boussier) via ruby-core
  2024-11-07 14:54 ` [ruby-core:119813] " Eregon (Benoit Daloze) via ruby-core
                   ` (30 more replies)
  0 siblings, 31 replies; 32+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-11-07 10:31 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20878 has been reported by byroot (Jean Boussier).

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------
### Context

A common use case when writing C extensions is to generate text or bytes into a buffer and return it
wrapped in a Ruby String. Examples include `JSON.generate(obj) -> String` and other format serializers,
compression libraries such as `Zlib.deflate`, and methods such as `Time#strftime`.

### Current Solution

#### Work in a buffer and copy the result

The most common solution is to work in a natively allocated buffer managed by the extension,
and once the generation is done, call `rb_str_new*` to copy the result into memory managed by Ruby.

It works, but isn't very efficient because it causes an extra copy and an extra `free()`.

On `ruby/json` macro-benchmarks, this represents around 5% of the time spent in `JSON.generate`.

```c
static void fbuffer_free(FBuffer *fb)
{
    if (fb->ptr && fb->type == FBUFFER_HEAP_ALLOCATED) {
        ruby_xfree(fb->ptr);
    }
}

static VALUE fbuffer_to_s(FBuffer *fb)
{
    VALUE result = rb_utf8_str_new(FBUFFER_PTR(fb), FBUFFER_LEN(fb));
    fbuffer_free(fb);
    return result;
}
```

#### Work inside RString allocated memory

Another approach currently in use is to allocate an `RString` using `rb_str_buf_new`,
and write into it with functions such as `rb_str_catf`,
or write past `RString.len` through `RSTRING_PTR` and then resize it with `rb_str_set_len`.

The downside of this approach is that it carries a lot of inefficiencies, as `rb_str_set_len` will perform
numerous safety checks, compute coderange, and write the string terminator on every invocation.

Another major inefficiency is that this API makes it hard to control buffer
growth, so it can result in many more `realloc()` calls than managing the buffer manually.

This approach is used by `Kernel#sprintf`, `Time#strftime`, etc. When I attempted to improve `Time#strftime`
performance, this problem showed up as the biggest bottleneck:

  - https://github.com/ruby/ruby/pull/11547
  - https://github.com/ruby/ruby/pull/11544
  - https://github.com/ruby/ruby/pull/11542
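
To illustrate why control over buffer growth matters, here is a minimal sketch in plain C that does not use Ruby at all. It counts how many `realloc()` calls two growth policies need for the same sequence of appends. The names `demo_buf`, `grow_exact`, `grow_doubling`, and `demo_count_reallocs` are hypothetical and exist only for this example; neither policy is claimed to be what `rb_str_*` functions actually do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer, used only for this illustration. */
typedef struct {
    char *ptr;
    size_t len;
    size_t capa;
    unsigned reallocs;   /* number of times the storage was grown */
} demo_buf;

/* A growth policy returns the new capacity for a buffer needing `needed` bytes. */
typedef size_t (*grow_policy)(size_t capa, size_t needed);

/* Grow to exactly the size required: every append that overflows reallocates. */
static size_t grow_exact(size_t capa, size_t needed)
{
    (void)capa;
    return needed;
}

/* Grow geometrically, starting at 16 bytes and doubling until large enough. */
static size_t grow_doubling(size_t capa, size_t needed)
{
    size_t next = capa ? capa : 16;
    while (next < needed) next *= 2;
    return next;
}

/* Append `n` bytes; returns 0 on success, -1 if realloc() fails. */
static int demo_buf_append(demo_buf *b, const char *src, size_t n, grow_policy grow)
{
    size_t needed = b->len + n;
    if (needed > b->capa) {
        size_t capa = grow(b->capa, needed);
        char *p = realloc(b->ptr, capa);
        if (!p) return -1;
        b->ptr = p;
        b->capa = capa;
        b->reallocs++;
    }
    memcpy(b->ptr + b->len, src, n);
    b->len += n;
    return 0;
}

/* Append `count` single bytes and report how many reallocations happened. */
static unsigned demo_count_reallocs(size_t count, grow_policy grow)
{
    demo_buf b = {NULL, 0, 0, 0};
    unsigned result;
    for (size_t i = 0; i < count; i++) {
        if (demo_buf_append(&b, "x", 1, grow) != 0) break;
    }
    result = b.reallocs;
    free(b.ptr);
    return result;
}
```

With these two policies, appending 1000 single bytes takes 1000 reallocations under `grow_exact` and 7 under `grow_doubling`. A serializer that owns its buffer can pick whichever policy suits its output sizes.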

### Proposed API

I think a more efficient way to do this would be to work with a native buffer, and then build an `RString`
that "adopts" the memory region.

Technically, you can currently do this by reaching directly into `RString` members, but I don't think it's clean,
and a dedicated API would be preferable:

```c
/**
 * Similar to rb_str_new(), but it adopts the pointer instead of copying.
 *
 * @param[in]  ptr             A memory region of `capa` bytes length. MUST have been allocated with `ruby_xmalloc`
 * @param[in]  len             Length  of the string,  in bytes,  not including  the
 *                             terminating NUL character, not including extra capacity.
 * @param[in]  capa            The usable length of `ptr`, in bytes,  including  the
 *                             terminating NUL character.
 * @param[in]  enc             Encoding of `ptr`.
 * @exception  rb_eArgError    `len` is negative.
 * @return     An instance  of ::rb_cString,  of `len`  bytes length, `capa - 1` bytes capacity,
 *             and of `enc` encoding.
 * @pre        At  least  `capa` bytes  of  continuous  memory region  shall  be
 *             accessible via `ptr`.
 * @pre        `ptr` MUST have been allocated with `ruby_xmalloc`.
 * @pre        `ptr` MUST NOT be manually freed after `rb_enc_str_adopt` has been called.
 * @note       `enc` can be a  null pointer.  It can also be  seen as a routine
 *             identical to rb_usascii_str_new() then.
 */
VALUE rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc);
```

An alternative to the term `adopt` could be `move`.
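
The `len` / `capa` contract and the ownership transfer described above can be modeled in plain C. This is only a sketch under stated assumptions: `toy_str`, `toy_str_new`, `toy_str_adopt`, and `toy_str_free` are hypothetical names, the struct is not Ruby's `RString`, plain `malloc`/`free` stand in for `ruby_xmalloc`/`ruby_xfree`, and invalid arguments return `NULL` instead of raising. Unlike the proposed signature, the model takes a non-const `char *` because it writes the terminating NUL itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a heap-backed string object; not Ruby's RString. */
typedef struct {
    char *ptr;   /* owned storage, released by toy_str_free() */
    long len;    /* bytes in use, excluding the terminating NUL */
    long capa;   /* usable bytes, excluding the terminating NUL */
} toy_str;

/* Copying constructor: the caller keeps ownership of `src`. */
static toy_str *toy_str_new(const char *src, long len)
{
    toy_str *s;
    if (len < 0) return NULL;
    s = malloc(sizeof(*s));
    if (!s) return NULL;
    s->ptr = malloc((size_t)len + 1);
    if (!s->ptr) { free(s); return NULL; }
    memcpy(s->ptr, src, (size_t)len);
    s->ptr[len] = '\0';
    s->len = len;
    s->capa = len;
    return s;
}

/* Adopting constructor: on success the string owns `ptr` and no bytes are
 * copied. `capa` counts the terminating NUL, mirroring the proposal, so the
 * resulting capacity is `capa - 1`. On failure NULL is returned and the
 * caller still owns `ptr`. */
static toy_str *toy_str_adopt(char *ptr, long len, long capa)
{
    toy_str *s;
    if (!ptr || len < 0 || capa <= len) return NULL;
    s = malloc(sizeof(*s));
    if (!s) return NULL;
    ptr[len] = '\0';
    s->ptr = ptr;
    s->len = len;
    s->capa = capa - 1;
    return s;
}

/* Releases the string and the storage it owns. */
static void toy_str_free(toy_str *s)
{
    if (!s) return;
    free(s->ptr);
    free(s);
}
```

After a successful `toy_str_adopt(buf, 5, 16)`, the string's `ptr` is `buf` itself, its length is 5, and its capacity is 15. The caller must no longer free `buf`. In the failure case the model leaves ownership with the caller, which is one possible convention rather than something the proposal specifies.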




-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/


* [ruby-core:119813] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
@ 2024-11-07 14:54 ` Eregon (Benoit Daloze) via ruby-core
  2024-11-07 15:38 ` [ruby-core:119815] " byroot (Jean Boussier) via ruby-core
                   ` (29 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: Eregon (Benoit Daloze) via ruby-core @ 2024-11-07 14:54 UTC (permalink / raw)
  To: ruby-core; +Cc: Eregon (Benoit Daloze)

Issue #20878 has been updated by Eregon (Benoit Daloze).


LGTM, +1.
Maybe simply `rb_str_adopt()` for the name?
That way it's closer to `rb_str_new()`, and these days all String C APIs taking a C string should also take an encoding anyway, so we don't need `enc_` and enc-less variants.

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110501


* [ruby-core:119815] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
  2024-11-07 14:54 ` [ruby-core:119813] " Eregon (Benoit Daloze) via ruby-core
@ 2024-11-07 15:38 ` byroot (Jean Boussier) via ruby-core
  2024-11-07 16:40 ` [ruby-core:119816] " nobu (Nobuyoshi Nakada) via ruby-core
                   ` (28 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-11-07 15:38 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20878 has been updated by byroot (Jean Boussier).


> Maybe simply rb_str_adopt() for the name?

I don't have a strong opinion here, I just went with the current convention.

On another note:

> `ptr` MUST have been allocated with `ruby_xmalloc`.

I'm actually not sure this really needs to be a MUST. I suppose what is a MUST is that the pointer should be freeable with `ruby_xfree`, but that's it.


----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110503


* [ruby-core:119816] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
  2024-11-07 14:54 ` [ruby-core:119813] " Eregon (Benoit Daloze) via ruby-core
  2024-11-07 15:38 ` [ruby-core:119815] " byroot (Jean Boussier) via ruby-core
@ 2024-11-07 16:40 ` nobu (Nobuyoshi Nakada) via ruby-core
  2024-11-07 17:14 ` [ruby-core:119819] " byroot (Jean Boussier) via ruby-core
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2024-11-07 16:40 UTC (permalink / raw)
  To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)

Issue #20878 has been updated by nobu (Nobuyoshi Nakada).


I think it is unsafe for memory leak, in comparison with "RString allocated memory".



----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110504


* [ruby-core:119819] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (2 preceding siblings ...)
  2024-11-07 16:40 ` [ruby-core:119816] " nobu (Nobuyoshi Nakada) via ruby-core
@ 2024-11-07 17:14 ` byroot (Jean Boussier) via ruby-core
  2024-11-08  0:02 ` [ruby-core:119828] " shyouhei (Shyouhei Urabe) via ruby-core
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-11-07 17:14 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20878 has been updated by byroot (Jean Boussier).


> I think it is unsafe for memory leak, in comparison with "RString allocated memory".

I'm sorry I don't follow, could you expand on what you mean is unsafe? The entire "adopt" idea?

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110507


* [ruby-core:119828] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (3 preceding siblings ...)
  2024-11-07 17:14 ` [ruby-core:119819] " byroot (Jean Boussier) via ruby-core
@ 2024-11-08  0:02 ` shyouhei (Shyouhei Urabe) via ruby-core
  2024-11-08  3:20 ` [ruby-core:119830] " nobu (Nobuyoshi Nakada) via ruby-core
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: shyouhei (Shyouhei Urabe) via ruby-core @ 2024-11-08  0:02 UTC (permalink / raw)
  To: ruby-core; +Cc: shyouhei (Shyouhei Urabe)

Issue #20878 has been updated by shyouhei (Shyouhei Urabe).


byroot (Jean Boussier) wrote in #note-4:
> > I think it is unsafe for memory leak, in comparison with "RString allocated memory".
> 
> I'm sorry I don't follow, could you expand on what you mean is unsafe? The entire "adopt" idea?

There is no reason for us to believe that the `const char *ptr` was allocated by malloc.  It could be done by mmap or dlopen or anything.  Ruby cannot garbage collect the string because it simply doesn't know how.  Memory leak here is kind of inevitable.

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110516

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------
### Context

A common use case when writing C extensions is to generate text or bytes into a buffer, and to return it back
wrapped into a Ruby String. Examples are `JSON.generate(obj) -> String`, and all other format serializers,
compression libraries such as `ZLib.deflate`, etc, but also methods such as `Time.strftime`, 

### Current Solution

#### Work in a buffer and copy the result

The most often used solution is to work with a native buffer and to manage a native allocated buffer,
and once the generation is done, call `rb_str_new*` to copy the result inside memory managed by Ruby.

It works, but isn't very efficient because it cause an extra copy and an extra `free()`.

On `ruby/json` macro-benchmarks, this represent around 5% of the time spent in `JSON.generate`.

```c
static void fbuffer_free(FBuffer *fb)
{
    if (fb->ptr && fb->type == FBUFFER_HEAP_ALLOCATED) {
        ruby_xfree(fb->ptr);
    }
}

static VALUE fbuffer_to_s(FBuffer *fb)
{
    VALUE result = rb_utf8_str_new(FBUFFER_PTR(fb), FBUFFER_LEN(fb));
    fbuffer_free(fb);
    return result;
}
```

#### Work inside RString allocated memory

Another way this is currently done, is to allocate an `RString` using `rb_str_buf_new`,
and write into it with various functions such as `rb_str_catf`,
or writing past `RString.len` through `RSTRING_PTR` and then resize it with `rb_str_set_len`.

The downside with this approach is that it contains a lot of inefficiencies, as `rb_str_set_len` will perform
numerous safety checks, compute coderange, and write the string terminator on every invocation.

Another major inneficiency is that this API make it hard to be in control of the buffer
growth, so it can result in a lot more `realloc()` calls than manually managing the buffer.

This method is used by `Kernel#sprintf`, `Time#strftime` etc, and when I attempted to improve `Time#strftime`
performance, this problem showed up as the biggest bottleneck:

  - https://github.com/ruby/ruby/pull/11547
  - https://github.com/ruby/ruby/pull/11544
  - https://github.com/ruby/ruby/pull/11542

### Proposed API

I think a more effcient way to do this would be to work with a native buffer, and then build a RString
that "adopt" the memory region.

Technically, you can currently do this by reaching directly into `RString` members, but I don't think it's clean,
and a dedicated API would be preferable:

```c
/**
 * Similar to rb_str_new(), but it adopts the pointer instead of copying.
 *
 * @param[in]  ptr             A memory region of `capa` bytes length. MUST have been allocated with `ruby_xmalloc`
 * @param[in]  len             Length  of the string,  in bytes,  not including  the
 *                             terminating NUL character, not including extra capacity.
 * @param[in]  capa            The usable length of `ptr`, in bytes,  including  the
 *                             terminating NUL character.
 * @param[in]  enc             Encoding of `ptr`.
 * @exception  rb_eArgError    `len` is negative.
 * @return     An instance  of ::rb_cString,  of `len`  bytes length, `capa - 1` bytes capacity,
 *             and of `enc` encoding.
 * @pre        At  least  `capa` bytes  of  continuous  memory region  shall  be
 *             accessible via `ptr`.
 * @pre        `ptr` MUST have been allocated with `ruby_xmalloc`.
 * @pre        `ptr` MUST NOT be manually freed after `rb_enc_str_adopt` has been called.
 * @note       `enc` can be a  null pointer.  It can also be  seen as a routine
 *             identical to rb_usascii_str_new() then.
 */
VALUE rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc);
```

An alternative to the term `adopt` could be `move`.
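
The difference between copying and adopting can be sketched with a toy model. This is not the proposed implementation: `toy_str` is a hypothetical stand-in for `RString`, plain `malloc`/`free` stand in for `ruby_xmalloc`/`ruby_xfree`, and encodings are left out entirely. In this version a failed adopt returns `NULL` and leaves ownership with the caller, which is only one possible contract.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for RString; the real struct layout is different. */
typedef struct {
    char *ptr;
    long  len;
    long  capa;   /* usable bytes, excluding the terminating NUL */
} toy_str;

/* Copying constructor: allocates len + 1 bytes and copies `src` into them. */
static toy_str *toy_str_new_copy(const char *src, long len)
{
    assert(src != NULL && len >= 0);
    toy_str *s = malloc(sizeof(*s));
    if (!s) return NULL;
    s->ptr = malloc((size_t)len + 1);
    if (!s->ptr) { free(s); return NULL; }
    memcpy(s->ptr, src, (size_t)len);
    s->ptr[len] = '\0';
    s->len = len;
    s->capa = len;
    return s;
}

/* Adopting constructor: takes ownership of `ptr`, a region of `capa` bytes
 * that must include room for the terminating NUL. No bytes are copied.
 * On failure it returns NULL and the caller keeps ownership of `ptr`. */
static toy_str *toy_str_adopt(char *ptr, long len, long capa)
{
    if (!ptr || len < 0 || capa <= len) return NULL;
    toy_str *s = malloc(sizeof(*s));
    if (!s) return NULL;
    ptr[len] = '\0';
    s->ptr = ptr;
    s->len = len;
    s->capa = capa - 1;
    return s;
}

/* Frees the string header and the buffer it owns. */
static void toy_str_free(toy_str *s)
{
    if (s) { free(s->ptr); free(s); }
}
```

After a successful `toy_str_adopt(buf, 5, 32)`, the string points at `buf` itself and reports a capacity of 31, mirroring the `capa - 1` rule in the doc comment above; the caller must not free `buf` afterwards.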




-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

* [ruby-core:119830] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (4 preceding siblings ...)
  2024-11-08  0:02 ` [ruby-core:119828] " shyouhei (Shyouhei Urabe) via ruby-core
@ 2024-11-08  3:20 ` nobu (Nobuyoshi Nakada) via ruby-core
  2024-11-08  7:53 ` [ruby-core:119834] " byroot (Jean Boussier) via ruby-core
                   ` (24 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2024-11-08  3:20 UTC (permalink / raw)
  To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)

Issue #20878 has been updated by nobu (Nobuyoshi Nakada).


byroot (Jean Boussier) wrote in #note-4:
> > I think it is unsafe for memory leak, in comparison with "RString allocated memory".
> 
> I'm sorry I don't follow, could you expand on what you mean is unsafe? The entire "adopt" idea?

Whenever you allocate a new object, there is a risk of a memory error.
In that case, who will look after the pointer that is about to be "adopted"?

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110518

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------

* [ruby-core:119834] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (5 preceding siblings ...)
  2024-11-08  3:20 ` [ruby-core:119830] " nobu (Nobuyoshi Nakada) via ruby-core
@ 2024-11-08  7:53 ` byroot (Jean Boussier) via ruby-core
  2024-11-08  8:43 ` [ruby-core:119835] " shyouhei (Shyouhei Urabe) via ruby-core
                   ` (23 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-11-08  7:53 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20878 has been updated by byroot (Jean Boussier).


> There is no reason for us to believe that the const char *ptr was allocated by malloc. 

The proposed function documentation states that the pointer MUST have been allocated with `ruby_xmalloc`.

> Whenever you allocate a new object, there is a risk of a memory error. In that case, who will look after the pointer that is about to be "adopted"?

I see. From my understanding, the only possible error is running out of memory. What if `rb_enc_str_adopt` directly called `ruby_xfree` on the pointer in that case? Would that cover your concern?
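
As an illustration of that idea, here is a minimal standalone sketch in which the adopting function releases the buffer itself when allocating the string header fails, so the caller never has to clean up. Everything here is hypothetical: `toy_str`, `toy_alloc_header` and the `toy_fail_next_alloc` hook exist only to simulate the failure, and plain `malloc`/`free` stand in for the Ruby allocator. In Ruby itself the failure would presumably surface as an exception rather than a `NULL` return.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a string header; not the real RString. */
typedef struct {
    char *ptr;
    long  len;
} toy_str;

/* Test hook: when nonzero, the next header allocation fails. */
static int toy_fail_next_alloc = 0;
/* Counts buffers released by toy_str_adopt on failure, for illustration. */
static int toy_adopt_frees = 0;

static void *toy_alloc_header(size_t size)
{
    if (toy_fail_next_alloc) {
        toy_fail_next_alloc = 0;
        return NULL;
    }
    return malloc(size);
}

/* Ownership of `ptr` transfers on entry: if the header cannot be
 * allocated, the buffer is freed here instead of being leaked. */
static toy_str *toy_str_adopt(char *ptr, long len)
{
    assert(ptr != NULL && len >= 0);
    toy_str *s = toy_alloc_header(sizeof(*s));
    if (!s) {
        free(ptr);
        toy_adopt_frees++;
        return NULL;
    }
    s->ptr = ptr;
    s->len = len;
    return s;
}
```

The design choice is that the callee owns the pointer from the moment it is passed in, whether or not the call succeeds, so the caller's error path never has to free the buffer.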


----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110522

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------

* [ruby-core:119835] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (6 preceding siblings ...)
  2024-11-08  7:53 ` [ruby-core:119834] " byroot (Jean Boussier) via ruby-core
@ 2024-11-08  8:43 ` shyouhei (Shyouhei Urabe) via ruby-core
  2024-11-08  8:56 ` [ruby-core:119836] " rhenium (Kazuki Yamaguchi) via ruby-core
                   ` (22 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: shyouhei (Shyouhei Urabe) via ruby-core @ 2024-11-08  8:43 UTC (permalink / raw)
  To: ruby-core; +Cc: shyouhei (Shyouhei Urabe)

Issue #20878 has been updated by shyouhei (Shyouhei Urabe).


byroot (Jean Boussier) wrote in #note-7:
> > There is no reason for us to believe that the const char *ptr was allocated by malloc. 
> 
> The proposed function documentation states that the pointer MUST have been allocated with `ruby_xmalloc`.

If that's okay that's okay.  For instance a return value of asprintf cannot be "adopt"ed then because obviously, that's not allocated by ruby_xmalloc.

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110523

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------

* [ruby-core:119836] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (7 preceding siblings ...)
  2024-11-08  8:43 ` [ruby-core:119835] " shyouhei (Shyouhei Urabe) via ruby-core
@ 2024-11-08  8:56 ` rhenium (Kazuki Yamaguchi) via ruby-core
  2024-11-08 10:08 ` [ruby-core:119840] " byroot (Jean Boussier) via ruby-core
                   ` (21 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: rhenium (Kazuki Yamaguchi) via ruby-core @ 2024-11-08  8:56 UTC (permalink / raw)
  To: ruby-core; +Cc: rhenium (Kazuki Yamaguchi)

Issue #20878 has been updated by rhenium (Kazuki Yamaguchi).


byroot (Jean Boussier) wrote:
> #### Work inside RString allocated memory
> [...]
> The downside with this approach is that it contains a lot of inefficiencies, as `rb_str_set_len` will perform
> numerous safety checks, compute coderange, and write the string terminator on every invocation.

I thought `rb_str_set_len()` was supposed to be the efficient alternative to `rb_str_resize()` meant for such a purpose.

I think an assert on the capacity or filling the terminator is cheap enough that it won't matter. That it computes coderange is news to me. I found it has done so since commit:6b66b5fdedb2c9a9ee48e290d57ca7f8d55e01a2 / [Bug #19902] in 2023. I think correcting coderange after directly modifying the RString-managed buffer is the caller's responsibility. Perhaps it could be reversed?


----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110524

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------

* [ruby-core:119840] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (8 preceding siblings ...)
  2024-11-08  8:56 ` [ruby-core:119836] " rhenium (Kazuki Yamaguchi) via ruby-core
@ 2024-11-08 10:08 ` byroot (Jean Boussier) via ruby-core
  2024-11-08 15:47 ` [ruby-core:119847] " kddnewton (Kevin Newton) via ruby-core
                   ` (20 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-11-08 10:08 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20878 has been updated by byroot (Jean Boussier).


> If that's okay that's okay. For instance a return value of asprintf cannot be "adopt"ed then because obviously, that's not allocated by ruby_xmalloc.

Yes, that's why I'm wondering if this requirement should be relaxed to "MUST be freeable by `ruby_xfree`", which I believe would be true for `asprintf`.

> I think an assert on the capacity or filling the terminator is cheap enough that it won't matter. 

It seemed to matter when I profiled. In some cases like `strftime` the string is written byte by byte, so it basically doubles the cost of appending a byte.
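
A rough way to picture this is to count stores in a toy model. The sketch below is not how `rb_str_set_len` is implemented and is not a benchmark; `toy_buf`, `append_terminated`, `append_raw` and `finish` are hypothetical names, and the `stores` counter only tallies bytes written.

```c
#include <assert.h>
#include <stddef.h>

/* Toy fixed-capacity buffer; `capa` includes the slot for the NUL byte. */
typedef struct {
    char  *ptr;
    size_t len;
    size_t capa;
    size_t stores;   /* bytes written so far, kept for illustration */
} toy_buf;

/* Per-byte style: every append also rewrites the terminator. */
static int append_terminated(toy_buf *b, char c)
{
    assert(b != NULL && b->ptr != NULL);
    if (b->len + 1 >= b->capa) return -1;
    b->ptr[b->len++] = c;
    b->ptr[b->len] = '\0';
    b->stores += 2;
    return 0;
}

/* Batched style: write the payload byte only... */
static int append_raw(toy_buf *b, char c)
{
    assert(b != NULL && b->ptr != NULL);
    if (b->len + 1 >= b->capa) return -1;
    b->ptr[b->len++] = c;
    b->stores += 1;
    return 0;
}

/* ...and terminate once, when the string is complete. */
static void finish(toy_buf *b)
{
    assert(b != NULL && b->ptr != NULL && b->len < b->capa);
    b->ptr[b->len] = '\0';
    b->stores += 1;
}
```

For ten appended bytes the per-byte style performs 20 stores while the batched style performs 11, which is the "roughly double" effect in miniature; the real cost also includes the safety checks mentioned in the issue description.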

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110529

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------

* [ruby-core:119847] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (9 preceding siblings ...)
  2024-11-08 10:08 ` [ruby-core:119840] " byroot (Jean Boussier) via ruby-core
@ 2024-11-08 15:47 ` kddnewton (Kevin Newton) via ruby-core
  2024-11-08 17:30 ` [ruby-core:119848] " mdalessio (Mike Dalessio) via ruby-core
                   ` (19 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: kddnewton (Kevin Newton) via ruby-core @ 2024-11-08 15:47 UTC (permalink / raw)
  To: ruby-core; +Cc: kddnewton (Kevin Newton)

Issue #20878 has been updated by kddnewton (Kevin Newton).


I would use this in Prism as well. There are many cases where we allocate a string in the parser and then when we reify the Ruby AST we have to copy the string over. But the string content was allocated with ruby_xmalloc. So it would be nice to just hand over the string content without having to make a copy.

Personally I would prefer _move_ as a naming convention, just because it mirrors what I would expect from std::move.

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110538

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------
### Context

A common use case when writing C extensions is to generate text or bytes into a buffer and return the result
wrapped in a Ruby String. Examples include `JSON.generate(obj) -> String` and other format serializers,
compression libraries such as `Zlib.deflate`, and methods such as `Time#strftime`.

### Current Solution

#### Work in a buffer and copy the result

The most common solution is to work with a manually managed native buffer,
and once the generation is done, call `rb_str_new*` to copy the result into memory managed by Ruby.

It works, but isn't very efficient because it causes an extra copy and an extra `free()`.

On `ruby/json` macro-benchmarks, this represents around 5% of the time spent in `JSON.generate`.

```c
static void fbuffer_free(FBuffer *fb)
{
    if (fb->ptr && fb->type == FBUFFER_HEAP_ALLOCATED) {
        ruby_xfree(fb->ptr);
    }
}

static VALUE fbuffer_to_s(FBuffer *fb)
{
    VALUE result = rb_utf8_str_new(FBUFFER_PTR(fb), FBUFFER_LEN(fb));
    fbuffer_free(fb);
    return result;
}
```

#### Work inside RString allocated memory

Another way this is currently done is to allocate an `RString` using `rb_str_buf_new`,
and write into it with functions such as `rb_str_catf`,
or by writing past `RString.len` through `RSTRING_PTR` and then resizing it with `rb_str_set_len`.

The downside of this approach is that it is inefficient: `rb_str_set_len` performs
numerous safety checks, computes the coderange, and writes the string terminator on every invocation.

Another major inefficiency is that this API makes it hard to control buffer
growth, so it can result in many more `realloc()` calls than managing the buffer manually.

This method is used by `Kernel#sprintf`, `Time#strftime`, etc. When I attempted to improve `Time#strftime`
performance, this problem showed up as the biggest bottleneck:

  - https://github.com/ruby/ruby/pull/11547
  - https://github.com/ruby/ruby/pull/11544
  - https://github.com/ruby/ruby/pull/11542
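To make the point about buffer growth concrete, here is a minimal standalone sketch of a manually managed buffer where the caller picks the growth policy (doubling here). It does not use the Ruby C API; `GrowBuf` and its functions are hypothetical names for illustration, and the `reallocs` counter exists only to make the number of growth steps visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; all names are illustrative only. */
typedef struct {
    char  *ptr;
    size_t len;       /* bytes written so far */
    size_t capa;      /* bytes allocated */
    size_t reallocs;  /* how many times the buffer had to grow */
} GrowBuf;

static void growbuf_init(GrowBuf *b, size_t initial_capa)
{
    if (initial_capa == 0) initial_capa = 16;
    b->ptr = malloc(initial_capa);
    assert(b->ptr != NULL);
    b->len = 0;
    b->capa = initial_capa;
    b->reallocs = 0;
}

/* Append n bytes, doubling the capacity until the data fits. */
static void growbuf_append(GrowBuf *b, const char *src, size_t n)
{
    if (b->len + n > b->capa) {
        size_t new_capa = b->capa;
        char *p;
        while (new_capa < b->len + n) new_capa *= 2;
        p = realloc(b->ptr, new_capa);
        assert(p != NULL);
        b->ptr = p;
        b->capa = new_capa;
        b->reallocs++;
    }
    memcpy(b->ptr + b->len, src, n);
    b->len += n;
}

static void growbuf_free(GrowBuf *b)
{
    free(b->ptr);
    b->ptr = NULL;
    b->len = b->capa = 0;
}
```

With an initial capacity of 16 bytes and doubling, appending 1000 single bytes grows the buffer only a handful of times. The growth policy is entirely in the extension author's hands, which is the control the text above says is hard to get when writing through an `RString`.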

### Proposed API

I think a more efficient way to do this would be to work with a native buffer, and then build an `RString`
that "adopts" the memory region.

Technically, you can currently do this by reaching directly into `RString` members, but I don't think that's clean,
and a dedicated API would be preferable:

```c
/**
 * Similar to rb_str_new(), but it adopts the pointer instead of copying.
 *
 * @param[in]  ptr             A memory region of `capa` bytes length. MUST have been allocated with `ruby_xmalloc`
 * @param[in]  len             Length  of the string,  in bytes,  not including  the
 *                             terminating NUL character, not including extra capacity.
 * @param[in]  capa            The usable length of `ptr`, in bytes,  including  the
 *                             terminating NUL character.
 * @param[in]  enc             Encoding of `ptr`.
 * @exception  rb_eArgError    `len` is negative.
 * @return     An instance  of ::rb_cString,  of `len`  bytes length, `capa - 1` bytes capacity,
 *             and of `enc` encoding.
 * @pre        At  least  `capa` bytes  of  continuous  memory region  shall  be
 *             accessible via `ptr`.
 * @pre        `ptr` MUST have been allocated with `ruby_xmalloc`.
 * @pre        `ptr` MUST NOT be manually freed after `rb_enc_str_adopt` has been called.
 * @note       `enc` can be a  null pointer.  It can also be  seen as a routine
 *             identical to rb_usascii_str_new() then.
 */
VALUE rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc);
```

An alternative to the term `adopt` could be `move`.
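The difference between copying and adopting can be sketched with a toy model that does not depend on Ruby at all. `ToyStr`, `toy_str_new_copy` and `toy_str_adopt` are hypothetical names, the struct is not Ruby's `RString` layout, and writing the terminating NUL inside the adopting constructor is an assumption of this sketch rather than something the proposal specifies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a string object; NOT Ruby's RString layout. */
typedef struct {
    char *ptr;
    long  len;
    long  capa;  /* usable bytes, excluding the terminating NUL */
} ToyStr;

/* Copying constructor: allocates its own storage and copies len bytes. */
static ToyStr toy_str_new_copy(const char *src, long len)
{
    ToyStr s;
    s.ptr = malloc((size_t)len + 1);
    assert(s.ptr != NULL);
    memcpy(s.ptr, src, (size_t)len);
    s.ptr[len] = '\0';
    s.len = len;
    s.capa = len;
    return s;
}

/* Adopting constructor: takes ownership of ptr, with no allocation and
 * no copy. capa includes the terminating NUL, mirroring the doc comment
 * of the proposed function. */
static ToyStr toy_str_adopt(char *ptr, long len, long capa)
{
    ToyStr s;
    assert(len >= 0 && capa > len);
    ptr[len] = '\0';  /* assumption of this sketch */
    s.ptr = ptr;
    s.len = len;
    s.capa = capa - 1;
    return s;
}

/* After adoption, the string is the only owner allowed to free ptr. */
static void toy_str_free(ToyStr *s)
{
    free(s->ptr);
    s->ptr = NULL;
}
```

In the copying case the caller still owns its original buffer and must free it, which is the extra copy and extra `free()` described earlier. In the adopting case the caller must not touch the pointer again.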




-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [ruby-core:119848] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (10 preceding siblings ...)
  2024-11-08 15:47 ` [ruby-core:119847] " kddnewton (Kevin Newton) via ruby-core
@ 2024-11-08 17:30 ` mdalessio (Mike Dalessio) via ruby-core
  2024-11-21 17:40 ` [ruby-core:119982] " byroot (Jean Boussier) via ruby-core
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: mdalessio (Mike Dalessio) via ruby-core @ 2024-11-08 17:30 UTC (permalink / raw)
  To: ruby-core; +Cc: mdalessio (Mike Dalessio)

Issue #20878 has been updated by mdalessio (Mike Dalessio).


This would likely be useful in Nokogiri as well. The two key places I have in mind are

1. returning a large serialization string generated within libxml2 (which is configured to use `ruby_xmalloc` by default)
2. assembling an HTML5-compliant serialization within the extension (which currently uses `rb_enc_str_buf_cat`)


----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110539

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [ruby-core:119982] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (11 preceding siblings ...)
  2024-11-08 17:30 ` [ruby-core:119848] " mdalessio (Mike Dalessio) via ruby-core
@ 2024-11-21 17:40 ` byroot (Jean Boussier) via ruby-core
  2024-11-22  8:49 ` [ruby-core:119989] " nobu (Nobuyoshi Nakada) via ruby-core
                   ` (17 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-11-21 17:40 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20878 has been updated by byroot (Jean Boussier).


Proposed implementation: https://github.com/ruby/ruby/pull/12143

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110722

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [ruby-core:119989] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (12 preceding siblings ...)
  2024-11-21 17:40 ` [ruby-core:119982] " byroot (Jean Boussier) via ruby-core
@ 2024-11-22  8:49 ` nobu (Nobuyoshi Nakada) via ruby-core
  2024-11-22  8:50 ` [ruby-core:119990] " byroot (Jean Boussier) via ruby-core
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2024-11-22  8:49 UTC (permalink / raw)
  To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)

Issue #20878 has been updated by nobu (Nobuyoshi Nakada).


Rather, I want to propose the opposite:

```C
char *rb_str_new_buffer(volatile VALUE *new_string, long size, rb_encoding *enc);
```


----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110729

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [ruby-core:119990] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (13 preceding siblings ...)
  2024-11-22  8:49 ` [ruby-core:119989] " nobu (Nobuyoshi Nakada) via ruby-core
@ 2024-11-22  8:50 ` byroot (Jean Boussier) via ruby-core
  2024-12-10  4:57 ` [ruby-core:120148] " nobu (Nobuyoshi Nakada) via ruby-core
                   ` (15 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-11-22  8:50 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20878 has been updated by byroot (Jean Boussier).


How would that work? e.g. when you need to resize it?

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110730

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [ruby-core:120148] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (14 preceding siblings ...)
  2024-11-22  8:50 ` [ruby-core:119990] " byroot (Jean Boussier) via ruby-core
@ 2024-12-10  4:57 ` nobu (Nobuyoshi Nakada) via ruby-core
  2024-12-10  9:09 ` [ruby-core:120152] " byroot (Jean Boussier) via ruby-core
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2024-12-10  4:57 UTC (permalink / raw)
  To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)

Issue #20878 has been updated by nobu (Nobuyoshi Nakada).


byroot (Jean Boussier) wrote in #note-15:
> How would that work? e.g. when you need to resize it?

```C
VALUE string;
char *buffer = rb_str_new_buffer(&string, size, enc);
memcpy(buffer, somestring, length);
// ...
rb_str_modify_expand(string, 10); // expand 10 bytes
buffer = RSTRING_PTR(string);     // re-get the pointer
```
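The reason the pointer has to be fetched again after expanding can be shown with a standalone toy model that does not use the Ruby API. `ToyBuf`, `toy_new_buffer` and `toy_expand` are hypothetical names that only mimic the shape of the calls above; the point is that growing the storage may move it, so any previously cached pointer must be treated as stale.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a string object; NOT Ruby's RString. */
typedef struct {
    char  *ptr;
    size_t len;
    size_t capa;
} ToyBuf;

/* Mimics the shape of the call above: fills the object through an
 * out-parameter and returns a pointer to its writable storage. */
static char *toy_new_buffer(ToyBuf *out, size_t size)
{
    out->ptr = malloc(size);
    assert(out->ptr != NULL);
    out->len = 0;
    out->capa = size;
    return out->ptr;
}

/* Grow the capacity by `extra` bytes. realloc() may move the block,
 * so callers must re-read b->ptr afterwards. */
static void toy_expand(ToyBuf *b, size_t extra)
{
    char *p = realloc(b->ptr, b->capa + extra);
    assert(p != NULL);
    b->ptr = p;
    b->capa += extra;
}

static void toy_free(ToyBuf *b)
{
    free(b->ptr);
    b->ptr = NULL;
}
```

A caller would write through the returned pointer, call `toy_expand`, and then reload the pointer from the object before writing again, just as the snippet above re-reads `RSTRING_PTR(string)`.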

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110901

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [ruby-core:120152] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (15 preceding siblings ...)
  2024-12-10  4:57 ` [ruby-core:120148] " nobu (Nobuyoshi Nakada) via ruby-core
@ 2024-12-10  9:09 ` byroot (Jean Boussier) via ruby-core
  2024-12-11  1:43 ` [ruby-core:120170] " nobu (Nobuyoshi Nakada) via ruby-core
                   ` (13 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-12-10  9:09 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20878 has been updated by byroot (Jean Boussier).


Right, so that's not really different from https://bugs.ruby-lang.org/issues/20878#Work-inside-RString-allocated-memory. It's something that's already done; that new function would just be a shortcut for:

```c
VALUE str = rb_str_buf_new(capa);
char *buffer = RSTRING_PTR(str);
```


----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110904

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------

* [ruby-core:120170] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (16 preceding siblings ...)
  2024-12-10  9:09 ` [ruby-core:120152] " byroot (Jean Boussier) via ruby-core
@ 2024-12-11  1:43 ` nobu (Nobuyoshi Nakada) via ruby-core
  2024-12-11 10:08 ` [ruby-core:120175] " byroot (Jean Boussier) via ruby-core
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2024-12-11  1:43 UTC (permalink / raw)
  To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)

Issue #20878 has been updated by nobu (Nobuyoshi Nakada).


Yes, and `enc`.
In the end you want to allocate a String for a String-manageable pointer, so why not allocate a managing String from the beginning?

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110928

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------

* [ruby-core:120175] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (17 preceding siblings ...)
  2024-12-11  1:43 ` [ruby-core:120170] " nobu (Nobuyoshi Nakada) via ruby-core
@ 2024-12-11 10:08 ` byroot (Jean Boussier) via ruby-core
  2024-12-12  7:15 ` [ruby-core:120197] " nobu (Nobuyoshi Nakada) via ruby-core
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-12-11 10:08 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20878 has been updated by byroot (Jean Boussier).

File Capture d’écran 2024-12-11 à 11.03.08.png added

> why not allocate a managing String from the beginning?

I explained it in the issue body. If you want to append one character to an RString, you need something like:

```c
void
buf_append_c(VALUE buf, char c)
{
  long capa = rb_str_capacity(buf);
  if (RSTRING_LEN(buf) + 1 > capa) {
    rb_str_modify_expand(buf, capa); // double capa
  }
  char *ptr;
  long len;
  RSTRING_GETMEM(buf, ptr, len);
  ptr[len] = c;
  // Length must be set right away in case GC
  // triggers and tries to re-embed the buffer.
  rb_str_set_len(buf, len + 1);
}
```

First, that's a lot more complicated than just working with a raw malloced buffer: you need some pretty good knowledge of Ruby's inner workings not to make a mistake. For example, you could save some metadata like `capacity`, but any time GC triggers, it's potentially no longer valid.

Second, all the `rb_str_*` functions do a lot of costly sanity checking.

If I profile a simple script that's calling `Time#strftime`, which is internally using the APIs you suggest:

```ruby
time = Time.now

i = 10_000_000

while i > 0
  i -= 1
  time.strftime("%FT%T.%6N")
end
```

It looks like this: https://share.firefox.dev/3ZNdAfg

![](Capture%20d%E2%80%99e%CC%81cran%202024-12-11%20a%CC%80%2011.03.08.png)

A ton of time is spent in:
  - `rb_str_set_len` (`9.8%`)
  - `rb_str_resize` (`6.8%`)
  - `RB_FL_TEST_RAW` (to get `RSTRING_PTR` etc) (`5.9%`)

Altogether, that's more than the time spent doing the actual formatting work in `BSD_vfprintf`; this seems like a major overhead to me.

If at least the API was easier to work with, I wouldn't mind so much, but in my opinion it's actually harder to work with.

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110933

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------

---Files--------------------------------
Capture d’écran 2024-12-11 à 11.03.08.png (250 KB)



* [ruby-core:120197] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (18 preceding siblings ...)
  2024-12-11 10:08 ` [ruby-core:120175] " byroot (Jean Boussier) via ruby-core
@ 2024-12-12  7:15 ` nobu (Nobuyoshi Nakada) via ruby-core
  2024-12-12  8:18 ` [ruby-core:120202] " byroot (Jean Boussier) via ruby-core
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2024-12-12  7:15 UTC (permalink / raw)
  To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)

Issue #20878 has been updated by nobu (Nobuyoshi Nakada).


byroot (Jean Boussier) wrote in #note-19:
> > why not allocate a managing String from the beginning?
> 
> I explained it in the issue body. If you want to append one character to an RString, you need something like:

It is the same as `rb_str_cat(buf, &c, 1)`.

> First, that's a lot more complicated than just working with a raw malloced buffer: you need some pretty good knowledge of Ruby's inner workings not to make a mistake. For example, you could save some metadata like `capacity`, but any time GC triggers, it's potentially no longer valid.

I can't get your point here.
Your proposal **does** need that knowledge even more, I think.

> A ton of time is spent in:
>   - `rb_str_set_len` (`9.8%`)
>   - `rb_str_resize` (`6.8%`)
>   - `RB_FL_TEST_RAW` (to get `RSTRING_PTR` etc) (`5.9%`)
> 
> Altogether, that's more than the time spent doing the actual formatting work in `BSD_vfprintf`; this seems like a major overhead to me.

The recursive formatting in `Time#strftime` may have room for improvement.




----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110956

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------

* [ruby-core:120202] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (19 preceding siblings ...)
  2024-12-12  7:15 ` [ruby-core:120197] " nobu (Nobuyoshi Nakada) via ruby-core
@ 2024-12-12  8:18 ` byroot (Jean Boussier) via ruby-core
  2024-12-12 10:45 ` [ruby-core:120206] " mame (Yusuke Endoh) via ruby-core
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-12-12  8:18 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20878 has been updated by byroot (Jean Boussier).


> It is the same as rb_str_cat(buf, &c, 1).

Yes and:

  - You can't always use `rb_str_cat`; sometimes you have to pass a pointer to an existing API.
  - `rb_str_cat` does all the checks I mentioned, and even more.

> I can't get your point here.

I'm proposing a way to build strings that is both more *convenient* and more *efficient*.

The typical use case is [`ruby/json` `fbuffer.h`](https://github.com/ruby/json/blob/e1f6456499d497f33f69ae4c1afdaf9b2b9c50b3/ext/json/ext/fbuffer/fbuffer.h) and similar buffers in other gems such as `msgpack`.
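
For readers unfamiliar with that pattern, here is a minimal sketch of a manually grown native buffer. It is not the `fbuffer.h` code: the `NBuf` type and the `nbuf_*` helpers are hypothetical names, and plain `realloc`/`free` stand in for whichever allocator an extension would actually use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; not the ruby/json FBuffer. */
typedef struct {
    char  *ptr;   /* heap storage, NULL until the first append */
    size_t len;   /* bytes written so far */
    size_t capa;  /* bytes allocated */
} NBuf;

/* Ensure room for `extra` more bytes, doubling the capacity as needed.
 * Returns 0 on success, -1 on allocation failure. */
static int nbuf_reserve(NBuf *b, size_t extra)
{
    assert(b != NULL);
    size_t needed = b->len + extra;
    if (needed <= b->capa) return 0;

    size_t capa = b->capa ? b->capa : 16;
    while (capa < needed) capa *= 2;

    char *p = realloc(b->ptr, capa);
    if (!p) return -1;
    b->ptr = p;
    b->capa = capa;
    return 0;
}

/* Append one byte: a capacity check and a store in the common case. */
static int nbuf_append_c(NBuf *b, char c)
{
    if (nbuf_reserve(b, 1) != 0) return -1;
    b->ptr[b->len++] = c;
    return 0;
}

/* Append `n` bytes copied from `src`. */
static int nbuf_append(NBuf *b, const char *src, size_t n)
{
    if (nbuf_reserve(b, n) != 0) return -1;
    memcpy(b->ptr + b->len, src, n);
    b->len += n;
    return 0;
}

/* Release the storage and reset the buffer to its empty state. */
static void nbuf_free(NBuf *b)
{
    free(b->ptr);
    b->ptr = NULL;
    b->len = b->capa = 0;
}
```

Once generation is done, a buffer like this is what gets copied with `rb_utf8_str_new` and then freed today, and what the proposed API would hand over to the String instead.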

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110961

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------

* [ruby-core:120206] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (20 preceding siblings ...)
  2024-12-12  8:18 ` [ruby-core:120202] " byroot (Jean Boussier) via ruby-core
@ 2024-12-12 10:45 ` mame (Yusuke Endoh) via ruby-core
  2024-12-12 10:47 ` [ruby-core:120208] " byroot (Jean Boussier) via ruby-core
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: mame (Yusuke Endoh) via ruby-core @ 2024-12-12 10:45 UTC (permalink / raw)
  To: ruby-core; +Cc: mame (Yusuke Endoh)

Issue #20878 has been updated by mame (Yusuke Endoh).


Discussed at the dev meeting.

> Yes, that's why I'm wondering if this requirement should be relaxed to “MUST be freeable by ruby_xfree”, which I believe would be true for asprintf.

No, depending on the environment and configuration, `ruby_xfree` is not a simple delegator to the system `free`.

https://github.com/ruby/ruby/blob/197a3efc751f43956fc9ad30d688b4bfa3f7fbdb/gc/default/default.c#L8180

However, in many environments (at the moment), it only delegates to the system `free`, so it would be very hard to notice if code inadvertently depended on that implementation detail. This proposed API risks promoting such misuse.

So, it must be proved that this API is really unavoidable. We would like you to try to use String objects as buffers instead of raw memory pointers. If that brings performance problems, let's consider this again.
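
To illustrate the `ruby_xfree` concern above, here is a toy sketch of an allocator whose free function is not interchangeable with the system `free`. It assumes a design that prepends a bookkeeping header to every block, which is only one way such a mismatch could arise. The `toy_*` names are hypothetical and this is not Ruby's implementation; see the linked source for what Ruby actually does in each configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping header placed in front of every allocation.
 * The extra union members only force a conservative alignment. */
typedef union {
    size_t      size;
    long double align_ld;
    long long   align_ll;
    void       *align_p;
} toy_header;

/* Allocate `size` usable bytes and remember the size in the header. */
static void *toy_xmalloc(size_t size)
{
    toy_header *h = malloc(sizeof(toy_header) + size);
    if (!h) return NULL;
    h->size = size;
    return h + 1;   /* the caller sees the memory after the header */
}

/* Size recorded at allocation time. Only valid for toy_xmalloc pointers. */
static size_t toy_xsize(const void *ptr)
{
    assert(ptr != NULL);
    return ((const toy_header *)ptr - 1)->size;
}

/* Free a toy_xmalloc pointer by stepping back to the real block start.
 * Passing a pointer from plain malloc() here would hand free() an
 * address that malloc() never returned, which is undefined behavior. */
static void toy_xfree(void *ptr)
{
    if (!ptr) return;
    free((toy_header *)ptr - 1);
}
```

With an allocator like this, a buffer obtained from `malloc` or `asprintf` cannot safely be passed to `toy_xfree`, even though the same code appears to work wherever the free function is a plain wrapper around `free`.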

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110969

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------
### Context

A common use case when writing C extensions is to generate text or bytes into a buffer, and to return it
wrapped in a Ruby String. Examples are `JSON.generate(obj) -> String` and all other format serializers,
compression libraries such as `Zlib.deflate`, and also methods such as `Time#strftime`.

### Current Solution

#### Work in a buffer and copy the result

The most commonly used solution is to work with a natively allocated buffer that the extension manages itself,
and once the generation is done, call `rb_str_new*` to copy the result into memory managed by Ruby.

It works, but isn't very efficient because it causes an extra copy and an extra `free()`.

On `ruby/json` macro-benchmarks, this represents around 5% of the time spent in `JSON.generate`.

```c
static void fbuffer_free(FBuffer *fb)
{
    if (fb->ptr && fb->type == FBUFFER_HEAP_ALLOCATED) {
        ruby_xfree(fb->ptr);
    }
}

static VALUE fbuffer_to_s(FBuffer *fb)
{
    VALUE result = rb_utf8_str_new(FBUFFER_PTR(fb), FBUFFER_LEN(fb));
    fbuffer_free(fb);
    return result;
}
```

#### Work inside RString allocated memory

Another way this is currently done is to allocate an `RString` using `rb_str_buf_new`
and write into it with functions such as `rb_str_catf`,
or to write past `RString.len` through `RSTRING_PTR` and then resize it with `rb_str_set_len`.

The downside of this approach is that it carries a lot of inefficiencies, as `rb_str_set_len` will perform
numerous safety checks, compute coderange, and write the string terminator on every invocation.

Another major inefficiency is that this API makes it hard to control buffer
growth, so it can result in many more `realloc()` calls than managing the buffer manually.
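For comparison, this is roughly what manual control over buffer growth looks like. It is a minimal sketch in plain C, not code from `ruby/json`: `demo_buf`, `demo_buf_append`, the initial capacity of 16 bytes and the doubling policy are all assumptions chosen for illustration, and size overflow is ignored for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; zero-initialize before first use. */
typedef struct {
    char *ptr;
    size_t len;        /* bytes written so far */
    size_t capa;       /* bytes allocated */
    unsigned reallocs; /* counts realloc() calls, for demonstration only */
} demo_buf;

/* Appends `n` bytes from `src`. Returns 0 on success, -1 if allocation
 * fails (the buffer is left unchanged in that case). */
int demo_buf_append(demo_buf *b, const char *src, size_t n)
{
    assert(b != NULL);
    if (b->len + n > b->capa) {
        /* Grow geometrically so that the number of realloc() calls stays
         * logarithmic in the final size. */
        size_t capa = b->capa ? b->capa : 16;
        while (capa < b->len + n) capa *= 2;
        char *grown = realloc(b->ptr, capa);
        if (grown == NULL) return -1;
        b->ptr = grown;
        b->capa = capa;
        b->reallocs++;
    }
    memcpy(b->ptr + b->len, src, n);
    b->len += n;
    return 0;
}
```

Because the caller owns the growth policy, appending 1000 single bytes to an empty `demo_buf` triggers only a handful of `realloc()` calls, and no per-append length checks or terminator writes are needed until the result is handed over.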

This method is used by `Kernel#sprintf`, `Time#strftime` etc, and when I attempted to improve `Time#strftime`
performance, this problem showed up as the biggest bottleneck:

  - https://github.com/ruby/ruby/pull/11547
  - https://github.com/ruby/ruby/pull/11544
  - https://github.com/ruby/ruby/pull/11542

### Proposed API

I think a more efficient way to do this would be to work with a native buffer, and then build an `RString`
that "adopts" the memory region.

Technically, you can currently do this by reaching directly into `RString` members, but I don't think it's clean,
and a dedicated API would be preferable:

```c
/**
 * Similar to rb_str_new(), but it adopts the pointer instead of copying.
 *
 * @param[in]  ptr             A memory region of `capa` bytes length. MUST have been allocated with `ruby_xmalloc`
 * @param[in]  len             Length  of the string,  in bytes,  not including  the
 *                             terminating NUL character, not including extra capacity.
 * @param[in]  capa            The usable length of `ptr`, in bytes,  including  the
 *                             terminating NUL character.
 * @param[in]  enc             Encoding of `ptr`.
 * @exception  rb_eArgError    `len` is negative.
 * @return     An instance  of ::rb_cString,  of `len`  bytes length, `capa - 1` bytes capacity,
 *             and of `enc` encoding.
 * @pre        At  least  `capa` bytes  of  continuous  memory region  shall  be
 *             accessible via `ptr`.
 * @pre        `ptr` MUST have been allocated with `ruby_xmalloc`.
 * @pre        `ptr` MUST not be manually freed after `rb_enc_str_adopt` has been called.
 * @note       `enc` can be a  null pointer.  It can also be  seen as a routine
 *             identical to rb_usascii_str_new() then.
 */
VALUE rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc);
```

An alternative to the term `adopt` could be `move`.


---Files--------------------------------
Capture d’écran 2024-12-11 à 11.03.08.png (250 KB)


-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [ruby-core:120208] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (21 preceding siblings ...)
  2024-12-12 10:45 ` [ruby-core:120206] " mame (Yusuke Endoh) via ruby-core
@ 2024-12-12 10:47 ` byroot (Jean Boussier) via ruby-core
  2024-12-12 11:46 ` [ruby-core:120216] " byroot (Jean Boussier) via ruby-core
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-12-12 10:47 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20878 has been updated by byroot (Jean Boussier).


>  We would like you to try to use String objects as buffers instead of memory pointer.

Yes, I was planning to try to convert JSON's `fbuffer` to use RString to show the impact. I'll update here once I have a working implementation.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [ruby-core:120216] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (22 preceding siblings ...)
  2024-12-12 10:47 ` [ruby-core:120208] " byroot (Jean Boussier) via ruby-core
@ 2024-12-12 11:46 ` byroot (Jean Boussier) via ruby-core
  2024-12-12 17:07 ` [ruby-core:120220] " byroot (Jean Boussier) via ruby-core
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-12-12 11:46 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20878 has been updated by byroot (Jean Boussier).


Done: https://github.com/byroot/json/pull/1

It's basically twice as slow on almost all benchmarks. My implementation is rather naive, and I'm sure some performance can be reclaimed with various tricks, but that is risky because you have to be careful about when GC triggers, and it somewhat defeats the purpose of using a higher-level API.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [ruby-core:120220] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (23 preceding siblings ...)
  2024-12-12 11:46 ` [ruby-core:120216] " byroot (Jean Boussier) via ruby-core
@ 2024-12-12 17:07 ` byroot (Jean Boussier) via ruby-core
  2024-12-13 11:01 ` [ruby-core:120228] " rhenium (Kazuki Yamaguchi) via ruby-core
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-12-12 17:07 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20878 has been updated by byroot (Jean Boussier).


> No, ruby_xfree is not a simple delegator to system free, depending on environment and configuration.

I see, I didn't know about that compilation flag, thanks for letting me know that.

One solution I see (but that I'm not very fond of) is to pass the `free` function that must be used to `adopt`, e.g.

```c
str = rb_enc_str_adopt(buf->ptr, buf->len, buf->capa, rb_utf8_encoding(), ruby_xfree);
// or
str = rb_enc_str_adopt(buf->ptr, buf->len, buf->capa, rb_utf8_encoding(), free);
```

This way `rb_enc_str_adopt` can check whether adopting the pointer is legal, and if it isn't, deoptimize into a copy?
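A rough model of that fallback, in plain C without the Ruby API so it can be compiled on its own. Everything here is hypothetical: `demo_str`, `demo_str_adopt` and `demo_runtime_free` stand in for `RString`, `rb_enc_str_adopt` and `ruby_xfree` respectively, and the real function would have to decide at build time whether the system `free` is also acceptable.

```c
#include <stdlib.h>
#include <string.h>

typedef void (*demo_free_fn)(void *);

/* Hypothetical stand-in for a string object that owns its buffer. */
typedef struct {
    char *ptr;
    size_t len;
    int adopted; /* 1 if `ptr` is the caller's buffer, 0 if it is a copy */
} demo_str;

/* Stand-in for the deallocator that the runtime itself uses. */
void demo_runtime_free(void *p)
{
    free(p);
}

/* Takes ownership of `ptr` in every case. If `free_fn` is the runtime's
 * own deallocator the buffer is adopted as-is; otherwise the bytes are
 * copied into runtime-owned memory and the original is released with the
 * deallocator the caller named. */
demo_str demo_str_adopt(char *ptr, size_t len, demo_free_fn free_fn)
{
    demo_str s = { ptr, len, 1 };
    if (free_fn != demo_runtime_free) {
        char *copy = malloc(len + 1);
        if (copy == NULL) abort();
        memcpy(copy, ptr, len);
        copy[len] = '\0';
        free_fn(ptr); /* the caller gave up ownership, so release it here */
        s.ptr = copy;
        s.adopted = 0;
    }
    return s;
}
```

The caller's contract is the same on both paths (the pointer must not be touched after the call), so the copy stays an invisible deoptimization rather than an error.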



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [ruby-core:120228] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (24 preceding siblings ...)
  2024-12-12 17:07 ` [ruby-core:120220] " byroot (Jean Boussier) via ruby-core
@ 2024-12-13 11:01 ` rhenium (Kazuki Yamaguchi) via ruby-core
  2024-12-13 11:14 ` [ruby-core:120229] " byroot (Jean Boussier) via ruby-core
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: rhenium (Kazuki Yamaguchi) via ruby-core @ 2024-12-13 11:01 UTC (permalink / raw)
  To: ruby-core; +Cc: rhenium (Kazuki Yamaguchi)

Issue #20878 has been updated by rhenium (Kazuki Yamaguchi).


byroot (Jean Boussier) wrote in #note-19:
> First that a lot more complicated than just working with a raw malloced buffer, you need some pretty good knowledge of Ruby inner workings not to make a mistake. For example, you could save some metadata like `capacity`, but any time GC triggers, it's potentially no longer valid.

I think I understood what you meant. From what I remember, compaction should exclude objects pinned by `rb_gc_mark()` or referenced from the machine stack. In the `Time#strftime` example, much fewer `rb_str_set_len()` calls should be necessary since the String would be always on the stack.

Perhaps a more explicit way to prevent the re-embedding should be provided (for example by having the compaction code check the `rb_str_locktmp()` status).

The proposed API seems tricky to me in that it requires the user to allocate a buffer including the terminator length. The NUL terminator is an implementation detail left out from the public API so far, and I believe it's something we've wanted to eventually get rid of for the `SHARABLE_MIDDLE_SUBSTRING` optimization. I'm not sure if exposing it is a good idea.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [ruby-core:120229] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (25 preceding siblings ...)
  2024-12-13 11:01 ` [ruby-core:120228] " rhenium (Kazuki Yamaguchi) via ruby-core
@ 2024-12-13 11:14 ` byroot (Jean Boussier) via ruby-core
  2024-12-18  7:50 ` [ruby-core:120292] " shyouhei (Shyouhei Urabe) via ruby-core
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-12-13 11:14 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20878 has been updated by byroot (Jean Boussier).


> I think I understood what you meant. From what I remember, compaction should exclude objects pinned by rb_gc_mark() or referenced from the machine stack.

You are correct.

> In the Time#strftime example, much fewer rb_str_set_len() calls should be necessary since the String would be always on the stack.

I don't think many of these could really be elided, because you need to call it before every other call to a `rb_str_` function, e.g.:

```c
// from strftime.c

rb_str_set_len(ftime, s-start); // must set the length before calling rb_str_append
rb_str_append(ftime, tmp); 
RSTRING_GETMEM(ftime, s, len); // rb_str_append has changed the length and potentially the pointer.
```

> The NUL terminator is an implementation detail left out from the public API so far [...] I'm not sure if exposing it is a good idea.

That's a very good point. I think it could be rephrased to say the NUL terminator is optional. Given that the pointer is semantically adopted as soon as the function is called, the function could do a `realloc` if needed to add the terminator, making it an implementation detail.
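As a sketch of what "`realloc` if needed" could mean, assuming `capa` then simply counts the usable bytes of the region (the helper name `demo_ensure_terminated` is hypothetical, and plain `realloc` stands in for whatever allocator the real function would use):

```c
#include <assert.h>
#include <stdlib.h>

/* Makes sure there is room for, and writes, a NUL byte after the first
 * `len` bytes of an owned buffer. Returns the (possibly moved) pointer and
 * updates `*capa`, or returns NULL if growing fails, in which case the
 * original buffer is still valid and unchanged. */
char *demo_ensure_terminated(char *ptr, size_t len, size_t *capa)
{
    assert(ptr != NULL && capa != NULL);
    if (*capa <= len) {
        /* No spare byte for the terminator: grow by exactly one byte. */
        char *grown = realloc(ptr, len + 1);
        if (grown == NULL) return NULL;
        ptr = grown;
        *capa = len + 1;
    }
    ptr[len] = '\0';
    return ptr;
}
```

Callers that already left a spare byte pay nothing extra, and callers that did not never need to know the terminator exists.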

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110995

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------


* [ruby-core:120292] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (26 preceding siblings ...)
  2024-12-13 11:14 ` [ruby-core:120229] " byroot (Jean Boussier) via ruby-core
@ 2024-12-18  7:50 ` shyouhei (Shyouhei Urabe) via ruby-core
  2025-01-07  8:23 ` [ruby-core:120516] " mame (Yusuke Endoh) via ruby-core
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: shyouhei (Shyouhei Urabe) via ruby-core @ 2024-12-18  7:50 UTC (permalink / raw)
  To: ruby-core; +Cc: shyouhei (Shyouhei Urabe)

Issue #20878 has been updated by shyouhei (Shyouhei Urabe).


One thing pointed out at the last developer meeting was that a future MMTk integration might want to break the assumption that "asprintf return values can be reclaimed using ruby_xfree", by choosing a different memory management scheme at process startup.

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-111057

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------


* [ruby-core:120516] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (27 preceding siblings ...)
  2024-12-18  7:50 ` [ruby-core:120292] " shyouhei (Shyouhei Urabe) via ruby-core
@ 2025-01-07  8:23 ` mame (Yusuke Endoh) via ruby-core
  2025-01-07  9:44 ` [ruby-core:120518] " byroot (Jean Boussier) via ruby-core
  2025-01-07 14:25 ` [ruby-core:120522] " mdalessio (Mike Dalessio) via ruby-core
  30 siblings, 0 replies; 32+ messages in thread
From: mame (Yusuke Endoh) via ruby-core @ 2025-01-07  8:23 UTC (permalink / raw)
  To: ruby-core; +Cc: mame (Yusuke Endoh)

Issue #20878 has been updated by mame (Yusuke Endoh).


Thanks for the benchmark. I briefly talked about this with @nobu, @akr, and @ko1.

@nobu said that the approach of passing a pointer to the `free` function looks a bit over the top.
@akr said that it would need not only a `free` function but also a `realloc` function.

I think it would be easy to just strictly keep the prerequisite “`ptr` MUST have been allocated with `ruby_xmalloc`.” as originally proposed.
Is there a real-world use case to make a String with a pointer allocated outside of `xmalloc`?

@ko1 suggested introducing this API as a hidden API only for the json gem, instead of introducing it as an official one.


----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-111321

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------


* [ruby-core:120518] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (28 preceding siblings ...)
  2025-01-07  8:23 ` [ruby-core:120516] " mame (Yusuke Endoh) via ruby-core
@ 2025-01-07  9:44 ` byroot (Jean Boussier) via ruby-core
  2025-01-07 14:25 ` [ruby-core:120522] " mdalessio (Mike Dalessio) via ruby-core
  30 siblings, 0 replies; 32+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2025-01-07  9:44 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20878 has been updated by byroot (Jean Boussier).


> It need not only free function but also realloc function

Maybe I wasn't clear, but my suggestion is to use the `free` function only to detect whether it is compatible with `ruby_xfree`, which we can presumably know at compile time, so something like:

```c
VALUE rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc, void (*freefunc)(void *))
{
  if (freefunc != ruby_xfree && (CALC_EXACT_MALLOC_SIZE || freefunc != free)) {
    // allocator not compatible with ruby_xfree: copy and return
  }
  else {
    // allocator compatible with ruby_xfree: adopt the pointer
  }
}
```

We could also use the `malloc` function instead for the same effect.
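
As a rough, self-contained illustration of that dispatch (not the CRuby implementation), the sketch below models the string as a plain struct. All names are hypothetical stand-ins: `toy_str` for `RString`, `toy_str_adopt` for `rb_enc_str_adopt`, `toy_xfree` for `ruby_xfree`, and `TOY_EXACT_MALLOC_SIZE` for `CALC_EXACT_MALLOC_SIZE`. It also assumes that, in the copy branch, the original buffer is released with the caller-supplied function; the proposal above does not spell that out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CALC_EXACT_MALLOC_SIZE: when non-zero, the
 * runtime allocator is assumed NOT to be interchangeable with malloc/free. */
#define TOY_EXACT_MALLOC_SIZE 0

/* Hypothetical stand-in for ruby_xfree(). */
static void toy_xfree(void *ptr) { free(ptr); }

/* A free function belonging to some other allocator; treated as
 * incompatible because it is neither toy_xfree nor free. */
static void custom_free(void *ptr) { free(ptr); }

typedef void (*toy_free_func)(void *);

/* Hypothetical stand-in for a string object that owns its buffer. */
typedef struct {
    char *ptr;
    long len;
    long capa;
    int adopted; /* 1 if ptr is the caller's buffer, 0 if it is a copy */
} toy_str;

static toy_str toy_str_adopt(char *ptr, long len, long capa, toy_free_func freefunc)
{
    toy_str str;
    assert(ptr != NULL && freefunc != NULL && len >= 0 && capa > len);
    str.len = len;

    if (freefunc != toy_xfree && (TOY_EXACT_MALLOC_SIZE || freefunc != free)) {
        /* Incompatible allocator: copy the bytes, then release the
         * original buffer with the function the caller named
         * (an assumption of this sketch). */
        str.ptr = malloc((size_t)len + 1);
        if (str.ptr == NULL) abort();
        memcpy(str.ptr, ptr, (size_t)len);
        str.ptr[len] = '\0';
        str.capa = len + 1;
        str.adopted = 0;
        freefunc(ptr);
    }
    else {
        /* Compatible allocator: take ownership without copying. */
        str.ptr = ptr;
        str.capa = capa;
        str.adopted = 1;
    }
    return str;
}
```

Either way the caller gives up ownership of `ptr`: after the call, only the buffer referenced by the returned struct may be used or freed.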

> Is there a real-world use case to make a String with a pointer allocated outside of xmalloc?

I don't personally have one. `snprintf` was mentioned, and it seems realistic? But `snprintf` buffers generally aren't that big, so maybe it doesn't matter as much?

Also perhaps @mdalessio would have some in nokogiri?


> suggested to introduce this API as a hidden API only for json gem, instead of introducing it as an official one.

I don't think it would be a good precedent, given `json` is only a default gem. Also both nokogiri and prism maintainers expressed their interest.


> I think it would be easy to just strictly keep the prerequisite “ptr MUST have been allocated with ruby_xmalloc.” as originally proposed.

I'm also OK with that.


----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-111323

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------


* [ruby-core:120522] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
  2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
                   ` (29 preceding siblings ...)
  2025-01-07  9:44 ` [ruby-core:120518] " byroot (Jean Boussier) via ruby-core
@ 2025-01-07 14:25 ` mdalessio (Mike Dalessio) via ruby-core
  30 siblings, 0 replies; 32+ messages in thread
From: mdalessio (Mike Dalessio) via ruby-core @ 2025-01-07 14:25 UTC (permalink / raw)
  To: ruby-core; +Cc: mdalessio (Mike Dalessio)

Issue #20878 has been updated by mdalessio (Mike Dalessio).


> > Is there a real-world use case to make a String with a pointer allocated outside of xmalloc?

> I don't personally have one..
> Also perhaps @mdalessio (Mike Dalessio) would have some in nokogiri?

Yes, it would be easiest for Nokogiri if non-xmalloc string pointers were supported, but if it was decided to not support this, I could work around it.

Nokogiri actively configures libxml2's memory management functions. On Windows, libxml2 is configured to use `malloc` because of bugs in some versions of libxml2. [^1] On other platforms, Nokogiri configures libxml2 to use `ruby_xmalloc` by default, but users can opt into using `malloc`, for example if they want to optimize performance and don't mind having a larger max heap size. [^2]

But! If anyone is opting into using `malloc`, it is likely for performance reasons. If the performance improvement from pointer adoption is great enough, and `malloc` strings are not supported, then I would consider removing the feature.

On Windows, the libxml2 bugs have been fixed for three years (fixed 2022-02 in v2.9.13 [^3]) and most Windows developers are using the precompiled native gem anyway, so if I have to, I would be comfortable changing the default to `ruby_xmalloc` on Windows or working around the limitation in pointer adoption.

[^1]: https://github.com/sparklemotion/nokogiri/issues/2241o
[^2]: https://github.com/sparklemotion/nokogiri/blob/main/adr/2023-04-libxml-memory-management.md
[^3]: https://gitlab.gnome.org/GNOME/libxml2/-/commit/a7b9f3eb
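
To make the consequence for pointer adoption concrete, here is a minimal sketch of the policy described above. It is not Nokogiri's code: `alloc_family`, `choose_alloc_family`, `buffers_adoptable`, and the `"malloc"` opt-in value are hypothetical names chosen for illustration. It assumes the strict rule that only `ruby_xmalloc` buffers may be adopted.

```c
#include <stdbool.h>
#include <string.h>

/* Which allocator family the wrapped library was configured with. */
typedef enum {
    ALLOC_SYSTEM_MALLOC, /* plain malloc/free */
    ALLOC_RUBY_XMALLOC   /* ruby_xmalloc/ruby_xfree */
} alloc_family;

/* Hypothetical policy mirroring the description: malloc on Windows,
 * ruby_xmalloc elsewhere unless the user opts into malloc. */
static alloc_family choose_alloc_family(bool on_windows, const char *opt_in)
{
    if (on_windows) {
        return ALLOC_SYSTEM_MALLOC;
    }
    if (opt_in != NULL && strcmp(opt_in, "malloc") == 0) {
        return ALLOC_SYSTEM_MALLOC;
    }
    return ALLOC_RUBY_XMALLOC;
}

/* Under the strict "ptr MUST have been allocated with ruby_xmalloc" rule,
 * only buffers from the ruby_xmalloc family can be adopted; all others
 * would have to be copied as they are today. */
static bool buffers_adoptable(alloc_family family)
{
    return family == ALLOC_RUBY_XMALLOC;
}
```

With such a check, an extension could keep a single code path and fall back to copying whenever the configured allocator is not the `ruby_xmalloc` family.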


----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-111327

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------


end of thread, other threads:[~2025-01-07 14:25 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-07 10:31 [ruby-core:119801] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)` byroot (Jean Boussier) via ruby-core
2024-11-07 14:54 ` [ruby-core:119813] " Eregon (Benoit Daloze) via ruby-core
2024-11-07 15:38 ` [ruby-core:119815] " byroot (Jean Boussier) via ruby-core
2024-11-07 16:40 ` [ruby-core:119816] " nobu (Nobuyoshi Nakada) via ruby-core
2024-11-07 17:14 ` [ruby-core:119819] " byroot (Jean Boussier) via ruby-core
2024-11-08  0:02 ` [ruby-core:119828] " shyouhei (Shyouhei Urabe) via ruby-core
2024-11-08  3:20 ` [ruby-core:119830] " nobu (Nobuyoshi Nakada) via ruby-core
2024-11-08  7:53 ` [ruby-core:119834] " byroot (Jean Boussier) via ruby-core
2024-11-08  8:43 ` [ruby-core:119835] " shyouhei (Shyouhei Urabe) via ruby-core
2024-11-08  8:56 ` [ruby-core:119836] " rhenium (Kazuki Yamaguchi) via ruby-core
2024-11-08 10:08 ` [ruby-core:119840] " byroot (Jean Boussier) via ruby-core
2024-11-08 15:47 ` [ruby-core:119847] " kddnewton (Kevin Newton) via ruby-core
2024-11-08 17:30 ` [ruby-core:119848] " mdalessio (Mike Dalessio) via ruby-core
2024-11-21 17:40 ` [ruby-core:119982] " byroot (Jean Boussier) via ruby-core
2024-11-22  8:49 ` [ruby-core:119989] " nobu (Nobuyoshi Nakada) via ruby-core
2024-11-22  8:50 ` [ruby-core:119990] " byroot (Jean Boussier) via ruby-core
2024-12-10  4:57 ` [ruby-core:120148] " nobu (Nobuyoshi Nakada) via ruby-core
2024-12-10  9:09 ` [ruby-core:120152] " byroot (Jean Boussier) via ruby-core
2024-12-11  1:43 ` [ruby-core:120170] " nobu (Nobuyoshi Nakada) via ruby-core
2024-12-11 10:08 ` [ruby-core:120175] " byroot (Jean Boussier) via ruby-core
2024-12-12  7:15 ` [ruby-core:120197] " nobu (Nobuyoshi Nakada) via ruby-core
2024-12-12  8:18 ` [ruby-core:120202] " byroot (Jean Boussier) via ruby-core
2024-12-12 10:45 ` [ruby-core:120206] " mame (Yusuke Endoh) via ruby-core
2024-12-12 10:47 ` [ruby-core:120208] " byroot (Jean Boussier) via ruby-core
2024-12-12 11:46 ` [ruby-core:120216] " byroot (Jean Boussier) via ruby-core
2024-12-12 17:07 ` [ruby-core:120220] " byroot (Jean Boussier) via ruby-core
2024-12-13 11:01 ` [ruby-core:120228] " rhenium (Kazuki Yamaguchi) via ruby-core
2024-12-13 11:14 ` [ruby-core:120229] " byroot (Jean Boussier) via ruby-core
2024-12-18  7:50 ` [ruby-core:120292] " shyouhei (Shyouhei Urabe) via ruby-core
2025-01-07  8:23 ` [ruby-core:120516] " mame (Yusuke Endoh) via ruby-core
2025-01-07  9:44 ` [ruby-core:120518] " byroot (Jean Boussier) via ruby-core
2025-01-07 14:25 ` [ruby-core:120522] " mdalessio (Mike Dalessio) via ruby-core
