ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:118345] [Ruby master Bug#20585] Size of memory allocated by String.new(:capacity) is different from the specified value
@ 2024-06-19  8:44 os (Shigeki OHARA) via ruby-core
  2024-06-19 11:03 ` [ruby-core:118349] " byroot (Jean Boussier) via ruby-core
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: os (Shigeki OHARA) via ruby-core @ 2024-06-19  8:44 UTC (permalink / raw)
  To: ruby-core; +Cc: os (Shigeki OHARA)

Issue #20585 has been reported by os (Shigeki OHARA).

----------------------------------------
Bug #20585: Size of memory allocated by String.new(:capacity) is different from the specified value
https://bugs.ruby-lang.org/issues/20585

* Author: os (Shigeki OHARA)
* Status: Open
* ruby -v: ruby 3.3.2 (2024-05-30 revision e5a195edf6) [x86_64-freebsd14.0]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
IMHO, if :capacity is specified in String.new, capa should be its value.

In fact, Ruby 3.2 seems to allocate the size as specified.

```
% cat string_capacity.rb
unless /\A3\.[23]\./ =~ RUBY_VERSION
  raise NotImplementedError, 'Not Supported Ruby Version'
end

require 'inline'

class String
  def super_inspect
    self.class.superclass.instance_method(:inspect).bind(self).call
  end
  inline do |builder|
    builder.include '<stdio.h>'
    builder.add_compile_flags '-Wall'
    builder.c_raw <<~CODE
      VALUE capacity(int argc, VALUE *argv, VALUE self) {
        struct RString *rstring = RSTRING(self);

        if (! (RBASIC(self)->flags & RSTRING_NOEMBED)) {
          return rb_to_symbol(rb_str_new_cstr("EMBED"));
        } else {
          if (RBASIC(self)->flags & ELTS_SHARED) {
            return rb_to_symbol(rb_str_new_cstr("SHARED"));
          } else {
            return LONG2NUM(rstring->as.heap.aux.capa);
          }
        }
        return Qnil; /* NOTREACHED */
      }
    CODE
  end
end
```

```
% irb -I. -rstring_capacity
irb(main):001:0> [RUBY_PLATFORM, RUBY_VERSION]
=> ["x86_64-freebsd14.0", "3.2.4"]
irb(main):002:0> String.new('', capacity: 1024).capacity
=> 1024
irb(main):003:0> String.new('*'*1024, capacity: 1024).capacity
=> 1024
irb(main):004:0>
```

This is what I expect.

However, Ruby 3.3 seems to behave differently.

```
% irb -I. -rstring_capacity
irb(main):001> [RUBY_PLATFORM, RUBY_VERSION]
=> ["x86_64-freebsd14.0", "3.3.2"]
irb(main):002> String.new('', capacity: 1024).capacity
=> 1023
irb(main):003> String.new('*'*1024, capacity: 1024).capacity
=> 2047
irb(main):004>
```

* If only :capacity is specified, one byte less is allocated.
* If the initial string and its bytesize are specified, about twice the size is allocated.

Is this intentional?
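
For readers without RubyInline, a rough equivalent of the check above can be sketched with the standard `objspace` library. This is only a sketch: `reported_capacity` is a hypothetical helper name, and it assumes CRuby, where `ObjectSpace.dump` emits a `"capacity"` field for heap-allocated, non-shared strings whose capacity differs from their length. In every other case the helper returns nil.

```ruby
require 'objspace'
require 'json'

# Hypothetical helper (not part of the original report): read the capacity
# that CRuby reports for a string through ObjectSpace.dump.
# Returns nil when the dump has no "capacity" field, e.g. for embedded or
# shared strings, or when capacity equals the string length.
def reported_capacity(str)
  JSON.parse(ObjectSpace.dump(str))['capacity']
end

puts RUBY_VERSION
# The printed values depend on the Ruby version, as described in the report.
p reported_capacity(String.new('', capacity: 1024))
p reported_capacity(String.new('*' * 1024, capacity: 1024))
```

The numbers printed are whatever the running interpreter reports, so comparing the output across Ruby versions is the point of the exercise.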




-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:118349] [Ruby master Bug#20585] Size of memory allocated by String.new(:capacity) is different from the specified value
  2024-06-19  8:44 [ruby-core:118345] [Ruby master Bug#20585] Size of memory allocated by String.new(:capacity) is different from the specified value os (Shigeki OHARA) via ruby-core
@ 2024-06-19 11:03 ` byroot (Jean Boussier) via ruby-core
  2024-06-19 12:16 ` [ruby-core:118351] " byroot (Jean Boussier) via ruby-core
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-06-19 11:03 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20585 has been updated by byroot (Jean Boussier).


Most of this comes from: https://github.com/ruby/ruby/pull/8825

Long story short, `capacity` is a bit confusing: since Ruby strings are null-terminated, at least one extra byte is always needed. So it's debatable whether the terminating byte is accounted for in the capacity.

I see how when using `String.new(capacity:)`, the goal is to avoid reallocation, so if you precomputed the final string size, losing one byte to the terminator might defeat the purpose. The other side of the coin, though, is that if you use sizes like `4096` hoping to fit in a specific size in memory, the extra terminator byte makes it not behave as you'd hoped.
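
The trade-off between the two readings of `capacity` can be put into numbers. The sketch below is plain arithmetic, not Ruby's internal code: `usable_bytes`, `allocated_bytes`, and `TERMINATOR_BYTES` are hypothetical names, and a one-byte terminator is assumed (wider encodings may need more).

```ruby
# Assumption: a single terminating byte, as for binary or UTF-8 strings.
TERMINATOR_BYTES = 1

# Bytes available for string content under each interpretation.
def usable_bytes(capacity, terminator_included:)
  terminator_included ? capacity - TERMINATOR_BYTES : capacity
end

# Bytes the buffer occupies under each interpretation.
def allocated_bytes(capacity, terminator_included:)
  terminator_included ? capacity : capacity + TERMINATOR_BYTES
end

[1024, 4096].each do |capacity|
  [true, false].each do |included|
    puts format('capacity=%d terminator_included=%s usable=%d allocated=%d',
                capacity, included,
                usable_bytes(capacity, terminator_included: included),
                allocated_bytes(capacity, terminator_included: included))
  end
end
```

Counting the terminator inside the capacity keeps a `4096` request within 4096 bytes but leaves only 4095 for content; counting it outside keeps all 4096 bytes usable but makes the buffer 4097 bytes long.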

> If the initial string and its bytesize are specified, about twice the size is allocated.

I need to dig more to answer this one.

----------------------------------------
Bug #20585: Size of memory allocated by String.new(:capacity) is different from the specified value
https://bugs.ruby-lang.org/issues/20585#change-108854

* Author: os (Shigeki OHARA)
* Status: Open
* ruby -v: ruby 3.3.2 (2024-05-30 revision e5a195edf6) [x86_64-freebsd14.0]
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:118351] [Ruby master Bug#20585] Size of memory allocated by String.new(:capacity) is different from the specified value
  2024-06-19  8:44 [ruby-core:118345] [Ruby master Bug#20585] Size of memory allocated by String.new(:capacity) is different from the specified value os (Shigeki OHARA) via ruby-core
  2024-06-19 11:03 ` [ruby-core:118349] " byroot (Jean Boussier) via ruby-core
@ 2024-06-19 12:16 ` byroot (Jean Boussier) via ruby-core
  2024-06-19 13:24 ` [ruby-core:118353] " Dan0042 (Daniel DeLorme) via ruby-core
  2024-07-08 22:54 ` [ruby-core:118500] " k0kubun (Takashi Kokubun) via ruby-core
  3 siblings, 0 replies; 5+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-06-19 12:16 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20585 has been updated by byroot (Jean Boussier).

Backport changed from 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN to 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED

> If the initial string and its bytesize are specified, about twice the size is allocated.

Alright, this was just fallout from the other change. The smaller buffer would cause the string to grow when the original string was copied into it, hence the doubling.
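
The doubling can be illustrated with a simplified model. This is a sketch, not Ruby's actual C implementation: `grow_capacity` is a hypothetical name, and the growth rule (double the capacity plus one terminator byte until the content fits) is an assumption chosen because it reproduces the figures in the report.

```ruby
# Assumption: one terminating byte per string buffer.
TERMLEN = 1

# Simplified model of buffer growth: while the content does not fit,
# double the buffer (capacity + terminator) and subtract the terminator.
def grow_capacity(capa, needed)
  capa = 2 * capa + TERMLEN while needed > capa
  capa
end

# A buffer one byte short of the 1024-byte source has to grow once.
p grow_capacity(1023, 1024) # => 2047
# A buffer that already fits the source is left alone.
p grow_capacity(1024, 1024) # => 1024
```

Under this model the one-byte shortfall alone accounts for the 2047 observed on Ruby 3.3.2.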

I opened: https://github.com/ruby/ruby/pull/11018

----------------------------------------
Bug #20585: Size of memory allocated by String.new(:capacity) is different from the specified value
https://bugs.ruby-lang.org/issues/20585#change-108856

* Author: os (Shigeki OHARA)
* Status: Open
* ruby -v: ruby 3.3.2 (2024-05-30 revision e5a195edf6) [x86_64-freebsd14.0]
* Backport: 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED
----------------------------------------

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:118353] [Ruby master Bug#20585] Size of memory allocated by String.new(:capacity) is different from the specified value
  2024-06-19  8:44 [ruby-core:118345] [Ruby master Bug#20585] Size of memory allocated by String.new(:capacity) is different from the specified value os (Shigeki OHARA) via ruby-core
  2024-06-19 11:03 ` [ruby-core:118349] " byroot (Jean Boussier) via ruby-core
  2024-06-19 12:16 ` [ruby-core:118351] " byroot (Jean Boussier) via ruby-core
@ 2024-06-19 13:24 ` Dan0042 (Daniel DeLorme) via ruby-core
  2024-07-08 22:54 ` [ruby-core:118500] " k0kubun (Takashi Kokubun) via ruby-core
  3 siblings, 0 replies; 5+ messages in thread
From: Dan0042 (Daniel DeLorme) via ruby-core @ 2024-06-19 13:24 UTC (permalink / raw)
  To: ruby-core; +Cc: Dan0042 (Daniel DeLorme)

Issue #20585 has been updated by Dan0042 (Daniel DeLorme).


What about allocating capacity+1 unless capacity is a power of two?
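
As a sketch of this suggestion only (the helper names `power_of_two?` and `bytes_to_allocate` are hypothetical, and nothing in this thread indicates that the heuristic was adopted):

```ruby
# True when n is a positive power of two (1, 2, 4, 8, ...).
def power_of_two?(n)
  n > 0 && (n & (n - 1)).zero?
end

# Proposed heuristic: add a byte for the terminator, except when the
# requested capacity is a power of two and probably targets an exact size.
def bytes_to_allocate(capacity)
  power_of_two?(capacity) ? capacity : capacity + 1
end

p bytes_to_allocate(4096) # => 4096
p bytes_to_allocate(1000) # => 1001
```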

----------------------------------------
Bug #20585: Size of memory allocated by String.new(:capacity) is different from the specified value
https://bugs.ruby-lang.org/issues/20585#change-108859

* Author: os (Shigeki OHARA)
* Status: Closed
* ruby -v: ruby 3.3.2 (2024-05-30 revision e5a195edf6) [x86_64-freebsd14.0]
* Backport: 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED
----------------------------------------

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:118500] [Ruby master Bug#20585] Size of memory allocated by String.new(:capacity) is different from the specified value
  2024-06-19  8:44 [ruby-core:118345] [Ruby master Bug#20585] Size of memory allocated by String.new(:capacity) is different from the specified value os (Shigeki OHARA) via ruby-core
                   ` (2 preceding siblings ...)
  2024-06-19 13:24 ` [ruby-core:118353] " Dan0042 (Daniel DeLorme) via ruby-core
@ 2024-07-08 22:54 ` k0kubun (Takashi Kokubun) via ruby-core
  3 siblings, 0 replies; 5+ messages in thread
From: k0kubun (Takashi Kokubun) via ruby-core @ 2024-07-08 22:54 UTC (permalink / raw)
  To: ruby-core; +Cc: k0kubun (Takashi Kokubun)

Issue #20585 has been updated by k0kubun (Takashi Kokubun).

Backport changed from 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED to 3.1: DONTNEED, 3.2: DONTNEED, 3.3: DONE

ruby_3_3 commit:d1ffd5ecfa62a049b7c508f30b6912a890de1b32.

----------------------------------------
Bug #20585: Size of memory allocated by String.new(:capacity) is different from the specified value
https://bugs.ruby-lang.org/issues/20585#change-109022

* Author: os (Shigeki OHARA)
* Status: Closed
* ruby -v: ruby 3.3.2 (2024-05-30 revision e5a195edf6) [x86_64-freebsd14.0]
* Backport: 3.1: DONTNEED, 3.2: DONTNEED, 3.3: DONE
----------------------------------------

^ permalink raw reply	[flat|nested] 5+ messages in thread
