* [ruby-core:121565] [Ruby Feature#21254] Inlining Class#new
@ 2025-04-07 23:03 tenderlovemaking (Aaron Patterson) via ruby-core
2025-04-08 0:45 ` [ruby-core:121567] " ko1 (Koichi Sasada) via ruby-core
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: tenderlovemaking (Aaron Patterson) via ruby-core @ 2025-04-07 23:03 UTC (permalink / raw)
To: ruby-core; +Cc: tenderlovemaking (Aaron Patterson)
Issue #21254 has been reported by tenderlovemaking (Aaron Patterson).
----------------------------------------
Feature #21254: Inlining Class#new
https://bugs.ruby-lang.org/issues/21254
* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
----------------------------------------
We would like to propose inlining YARV bytecode for speeding up object allocations, specifically inlining the `Class#new` method. In order to support inlining this method, we would like to introduce a new YARV instruction `opt_new`. This instruction will allocate an object if the default allocator is not overridden, otherwise it will jump to a “slow path” for calling a method.
`Class#new` especially benefits from inlining for two reasons:
1. Calling `initialize` directly means we don't need to allocate a temporary hash for keyword arguments
2. We are able to use an inline cache when calling the `initialize` method
The patch can be found [here](https://github.com/ruby/ruby/pull/13080), but please find implementation details below.
## Implementation Details
This patch modifies the compiler to emit special instructions when it sees a callsite that uses “new”. Before this patch, calling `Object.new` would result in bytecode like this:
```
ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```
With this patch, the bytecode looks like this:
```
./ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump 14
0011 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```
The new `opt_new` instruction checks whether or not the `new` implementation is the default “allocator” implementation. If it is the default allocator, then the instruction will allocate the object and call initialize passing parameters to initialize but not to `new`. If the method is not the default allocator implementation, it will jump to the normal method dispatch instructions.
Performance Improvements
This patch improves performance of all allocations that use the normal “new” method for allocation. Here are two examples (all of these benchmarks compare Ruby 3.4.2 against Ruby master with inlining patch):
A simple `Object.new` in a hot loop improves by about 24%:
```
hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 436.6 ms ± 3.3 ms [User: 432.3 ms, System: 3.8 ms]
Range (min … max): 430.5 ms … 442.6 ms 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 351.1 ms ± 3.6 ms [User: 347.4 ms, System: 3.3 ms]
Range (min … max): 343.9 ms … 357.4 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end' ran
1.24 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
```
Using a single keyword argument is improved by about 72%:
```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 1.082 s ± 0.007 s [User: 1.074 s, System: 0.008 s]
Range (min … max): 1.071 s … 1.091 s 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 627.6 ms ± 4.8 ms [User: 622.6 ms, System: 4.5 ms]
Range (min … max): 622.1 ms … 637.2 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
1.72 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```
The performance increase depends on the number and type of parameters passed to `initialize`. For example, an `initialize` method that takes 3 parameters can see a speed improvement of ~3x:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
i = 0
while i < 10_000_000
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
i += 1
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> hyperfine "ruby --disable-gems test.rb" "./ruby --disable-gems test.rb"
Benchmark 1: ruby --disable-gems test.rb
Time (mean ± σ): 3.700 s ± 0.033 s [User: 3.681 s, System: 0.018 s]
Range (min … max): 3.636 s … 3.751 s 10 runs
Benchmark 2: ./ruby --disable-gems test.rb
Time (mean ± σ): 1.182 s ± 0.013 s [User: 1.173 s, System: 0.008 s]
Range (min … max): 1.165 s … 1.203 s 10 runs
Summary
./ruby --disable-gems test.rb ran
3.13 ± 0.04 times faster than ruby --disable-gems test.rb
```
One factor in the performance increase for keyword arguments is that inlining is able to eliminate the hash allocation when calling “through” the C implementation of `Class#new`:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
def allocs
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def test; allocs { Foo.new(a: 1, b: 2, c: 3) }; end
test
p test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
2
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
1
```
## Memory Increase
Of course this patch is not “free”. Inlining the method call adds extra YARV instructions. We estimate this patch increases `new` call sites by about 122 bytes:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
end
end
def test
Foo.new
end
puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:test)))
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
656
```
We’ve tested this in Shopify’s monolith, comparing Ruby 3.4.2 and Ruby 3.5+inlining, and it seems to increase total ISEQ memesize by about 3.8mb (roughly 0.5% increase in ISEQ size):
```
irb(main):001> 737191972 - 733354388
=> 3837584
```
However, Ruby 3.5 has more overall ISEQ objects than Ruby 3.4.2:
```
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-inline.txt
789545 sizes-inline.txt
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-3.4.txt
789479 sizes-3.4.txt
```
We see total heap size as reported by `memsize` to only increase by about 1MB:
```
irb(main):001> 3981075617 - 3979926505
=> 1149112
```
## Changes to `caller`
This patch changes `caller` reporting in the `initialize` method:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
puts caller
end
end
def test
Foo.new
end
test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:10:in 'Class#new'
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
```
As you can see in the above output, the `Class#new` frame is eliminated. I'm not sure if anyone really cares about this frame. We've tested this patch in Shopify's CI, and didn't find any code that depends on this callstack. However, this patch did require [changes to ERB for emitting warnings](https://github.com/ruby/ruby/pull/13080/files#diff-7624f95f521b3333de8c687d70c2574aa31616cebf9504d8bcf673865fbf6ecdR475-R486).
That said, eliminating the frame also has the side-effect of making some of our allocation tracing tools a little more useful:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def test
Object.new
end
end
ObjectSpace.trace_object_allocations do
obj = Foo.new.test
puts ObjectSpace.allocation_class_path(obj)
puts ObjectSpace.allocation_method_id(obj)
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Class
new
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-07T19:40:59Z inline-new 2cf0efa18e) +PRISM [arm64-darwin24]
Foo
test
```
Before inlining, `ObjectSpace` would report the allocation class path and method id as `Class#new` which isn't very helpful. With the inlining patch, we can see that the object is allocated in `Foo#test`.
## Summary
I think the overall memory increase is modest, and the change to `caller` is acceptable especially given the performance increase this patch provides.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ruby-core:121567] [Ruby Feature#21254] Inlining Class#new
2025-04-07 23:03 [ruby-core:121565] [Ruby Feature#21254] Inlining Class#new tenderlovemaking (Aaron Patterson) via ruby-core
@ 2025-04-08 0:45 ` ko1 (Koichi Sasada) via ruby-core
2025-04-08 1:37 ` [ruby-core:121568] " tenderlovemaking (Aaron Patterson) via ruby-core
` (9 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: ko1 (Koichi Sasada) via ruby-core @ 2025-04-08 0:45 UTC (permalink / raw)
To: ruby-core; +Cc: ko1 (Koichi Sasada)
Issue #21254 has been updated by ko1 (Koichi Sasada).
`swap` is remained?
----------------------------------------
Feature #21254: Inlining Class#new
https://bugs.ruby-lang.org/issues/21254#change-112622
* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
----------------------------------------
We would like to propose inlining YARV bytecode for speeding up object allocations, specifically inlining the `Class#new` method. In order to support inlining this method, we would like to introduce a new YARV instruction `opt_new`. This instruction will allocate an object if the default allocator is not overridden, otherwise it will jump to a “slow path” for calling a method.
`Class#new` especially benefits from inlining for two reasons:
1. Calling `initialize` directly means we don't need to allocate a temporary hash for keyword arguments
2. We are able to use an inline cache when calling the `initialize` method
The patch can be found [here](https://github.com/ruby/ruby/pull/13080), but please find implementation details below.
## Implementation Details
This patch modifies the compiler to emit special instructions when it sees a callsite that uses “new”. Before this patch, calling `Object.new` would result in bytecode like this:
```
ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```
With this patch, the bytecode looks like this:
```
./ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump 14
0011 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```
The new `opt_new` instruction checks whether or not the `new` implementation is the default “allocator” implementation. If it is the default allocator, then the instruction will allocate the object and call initialize passing parameters to initialize but not to `new`. If the method is not the default allocator implementation, it will jump to the normal method dispatch instructions.
Performance Improvements
This patch improves performance of all allocations that use the normal “new” method for allocation. Here are two examples (all of these benchmarks compare Ruby 3.4.2 against Ruby master with inlining patch):
A simple `Object.new` in a hot loop improves by about 24%:
```
hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 436.6 ms ± 3.3 ms [User: 432.3 ms, System: 3.8 ms]
Range (min … max): 430.5 ms … 442.6 ms 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 351.1 ms ± 3.6 ms [User: 347.4 ms, System: 3.3 ms]
Range (min … max): 343.9 ms … 357.4 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end' ran
1.24 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
```
Using a single keyword argument is improved by about 72%:
```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 1.082 s ± 0.007 s [User: 1.074 s, System: 0.008 s]
Range (min … max): 1.071 s … 1.091 s 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 627.6 ms ± 4.8 ms [User: 622.6 ms, System: 4.5 ms]
Range (min … max): 622.1 ms … 637.2 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
1.72 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```
The performance increase depends on the number and type of parameters passed to `initialize`. For example, an `initialize` method that takes 3 parameters can see a speed improvement of ~3x:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
i = 0
while i < 10_000_000
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
i += 1
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> hyperfine "ruby --disable-gems test.rb" "./ruby --disable-gems test.rb"
Benchmark 1: ruby --disable-gems test.rb
Time (mean ± σ): 3.700 s ± 0.033 s [User: 3.681 s, System: 0.018 s]
Range (min … max): 3.636 s … 3.751 s 10 runs
Benchmark 2: ./ruby --disable-gems test.rb
Time (mean ± σ): 1.182 s ± 0.013 s [User: 1.173 s, System: 0.008 s]
Range (min … max): 1.165 s … 1.203 s 10 runs
Summary
./ruby --disable-gems test.rb ran
3.13 ± 0.04 times faster than ruby --disable-gems test.rb
```
One factor in the performance increase for keyword arguments is that inlining is able to eliminate the hash allocation when calling “through” the C implementation of `Class#new`:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
def allocs
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def test; allocs { Foo.new(a: 1, b: 2, c: 3) }; end
test
p test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
2
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
1
```
## Memory Increase
Of course this patch is not “free”. Inlining the method call adds extra YARV instructions. We estimate this patch increases `new` call sites by about 122 bytes:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
end
end
def test
Foo.new
end
puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:test)))
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
656
```
We’ve tested this in Shopify’s monolith, comparing Ruby 3.4.2 and Ruby 3.5+inlining, and it seems to increase total ISEQ memesize by about 3.8mb (roughly 0.5% increase in ISEQ size):
```
irb(main):001> 737191972 - 733354388
=> 3837584
```
However, Ruby 3.5 has more overall ISEQ objects than Ruby 3.4.2:
```
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-inline.txt
789545 sizes-inline.txt
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-3.4.txt
789479 sizes-3.4.txt
```
We see total heap size as reported by `memsize` to only increase by about 1MB:
```
irb(main):001> 3981075617 - 3979926505
=> 1149112
```
## Changes to `caller`
This patch changes `caller` reporting in the `initialize` method:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
puts caller
end
end
def test
Foo.new
end
test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:10:in 'Class#new'
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
```
As you can see in the above output, the `Class#new` frame is eliminated. I'm not sure if anyone really cares about this frame. We've tested this patch in Shopify's CI, and didn't find any code that depends on this callstack. However, this patch did require [changes to ERB for emitting warnings](https://github.com/ruby/ruby/pull/13080/files#diff-7624f95f521b3333de8c687d70c2574aa31616cebf9504d8bcf673865fbf6ecdR475-R486).
That said, eliminating the frame also has the side-effect of making some of our allocation tracing tools a little more useful:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def test
Object.new
end
end
ObjectSpace.trace_object_allocations do
obj = Foo.new.test
puts ObjectSpace.allocation_class_path(obj)
puts ObjectSpace.allocation_method_id(obj)
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Class
new
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-07T19:40:59Z inline-new 2cf0efa18e) +PRISM [arm64-darwin24]
Foo
test
```
Before inlining, `ObjectSpace` would report the allocation class path and method id as `Class#new` which isn't very helpful. With the inlining patch, we can see that the object is allocated in `Foo#test`.
## Summary
I think the overall memory increase is modest, and the change to `caller` is acceptable especially given the performance increase this patch provides.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ruby-core:121568] [Ruby Feature#21254] Inlining Class#new
2025-04-07 23:03 [ruby-core:121565] [Ruby Feature#21254] Inlining Class#new tenderlovemaking (Aaron Patterson) via ruby-core
2025-04-08 0:45 ` [ruby-core:121567] " ko1 (Koichi Sasada) via ruby-core
@ 2025-04-08 1:37 ` tenderlovemaking (Aaron Patterson) via ruby-core
2025-04-08 6:58 ` [ruby-core:121572] " Earlopain (Earlopain _) via ruby-core
` (8 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: tenderlovemaking (Aaron Patterson) via ruby-core @ 2025-04-08 1:37 UTC (permalink / raw)
To: ruby-core; +Cc: tenderlovemaking (Aaron Patterson)
Issue #21254 has been updated by tenderlovemaking (Aaron Patterson).
ko1 (Koichi Sasada) wrote in #note-1:
> `swap` is remained?
I [made a patch to remove `swap`](https://github.com/ruby/ruby/commit/04de973311231ca635c802e992ca1f48366f2e4c) but it makes Coverage tests break. I think we can eliminate the instruction but it will take a little more time.
----------------------------------------
Feature #21254: Inlining Class#new
https://bugs.ruby-lang.org/issues/21254#change-112623
* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
----------------------------------------
We would like to propose inlining YARV bytecode for speeding up object allocations, specifically inlining the `Class#new` method. In order to support inlining this method, we would like to introduce a new YARV instruction `opt_new`. This instruction will allocate an object if the default allocator is not overridden, otherwise it will jump to a “slow path” for calling a method.
`Class#new` especially benefits from inlining for two reasons:
1. Calling `initialize` directly means we don't need to allocate a temporary hash for keyword arguments
2. We are able to use an inline cache when calling the `initialize` method
The patch can be found [here](https://github.com/ruby/ruby/pull/13080), but please find implementation details below.
## Implementation Details
This patch modifies the compiler to emit special instructions when it sees a callsite that uses “new”. Before this patch, calling `Object.new` would result in bytecode like this:
```
ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```
With this patch, the bytecode looks like this:
```
./ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump 14
0011 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```
The new `opt_new` instruction checks whether or not the `new` implementation is the default “allocator” implementation. If it is the default allocator, then the instruction will allocate the object and call initialize passing parameters to initialize but not to `new`. If the method is not the default allocator implementation, it will jump to the normal method dispatch instructions.
Performance Improvements
This patch improves performance of all allocations that use the normal “new” method for allocation. Here are two examples (all of these benchmarks compare Ruby 3.4.2 against Ruby master with inlining patch):
A simple `Object.new` in a hot loop improves by about 24%:
```
hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 436.6 ms ± 3.3 ms [User: 432.3 ms, System: 3.8 ms]
Range (min … max): 430.5 ms … 442.6 ms 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 351.1 ms ± 3.6 ms [User: 347.4 ms, System: 3.3 ms]
Range (min … max): 343.9 ms … 357.4 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end' ran
1.24 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
```
Using a single keyword argument is improved by about 72%:
```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 1.082 s ± 0.007 s [User: 1.074 s, System: 0.008 s]
Range (min … max): 1.071 s … 1.091 s 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 627.6 ms ± 4.8 ms [User: 622.6 ms, System: 4.5 ms]
Range (min … max): 622.1 ms … 637.2 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
1.72 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```
The performance increase depends on the number and type of parameters passed to `initialize`. For example, an `initialize` method that takes 3 parameters can see a speed improvement of ~3x:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
i = 0
while i < 10_000_000
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
i += 1
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> hyperfine "ruby --disable-gems test.rb" "./ruby --disable-gems test.rb"
Benchmark 1: ruby --disable-gems test.rb
Time (mean ± σ): 3.700 s ± 0.033 s [User: 3.681 s, System: 0.018 s]
Range (min … max): 3.636 s … 3.751 s 10 runs
Benchmark 2: ./ruby --disable-gems test.rb
Time (mean ± σ): 1.182 s ± 0.013 s [User: 1.173 s, System: 0.008 s]
Range (min … max): 1.165 s … 1.203 s 10 runs
Summary
./ruby --disable-gems test.rb ran
3.13 ± 0.04 times faster than ruby --disable-gems test.rb
```
One factor in the performance increase for keyword arguments is that inlining is able to eliminate the hash allocation when calling “through” the C implementation of `Class#new`:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
def allocs
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def test; allocs { Foo.new(a: 1, b: 2, c: 3) }; end
test
p test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
2
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
1
```
## Memory Increase
Of course this patch is not “free”. Inlining the method call adds extra YARV instructions. We estimate this patch increases `new` call sites by about 122 bytes:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
end
end
def test
Foo.new
end
puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:test)))
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
656
```
We’ve tested this in Shopify’s monolith, comparing Ruby 3.4.2 and Ruby 3.5+inlining, and it seems to increase total ISEQ memesize by about 3.8mb (roughly 0.5% increase in ISEQ size):
```
irb(main):001> 737191972 - 733354388
=> 3837584
```
However, Ruby 3.5 has more overall ISEQ objects than Ruby 3.4.2:
```
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-inline.txt
789545 sizes-inline.txt
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-3.4.txt
789479 sizes-3.4.txt
```
We see total heap size as reported by `memsize` to only increase by about 1MB:
```
irb(main):001> 3981075617 - 3979926505
=> 1149112
```
## Changes to `caller`
This patch changes `caller` reporting in the `initialize` method:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
puts caller
end
end
def test
Foo.new
end
test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:10:in 'Class#new'
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
```
As you can see in the above output, the `Class#new` frame is eliminated. I'm not sure if anyone really cares about this frame. We've tested this patch in Shopify's CI, and didn't find any code that depends on this callstack. However, this patch did require [changes to ERB for emitting warnings](https://github.com/ruby/ruby/pull/13080/files#diff-7624f95f521b3333de8c687d70c2574aa31616cebf9504d8bcf673865fbf6ecdR475-R486).
That said, eliminating the frame also has the side-effect of making some of our allocation tracing tools a little more useful:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def test
Object.new
end
end
ObjectSpace.trace_object_allocations do
obj = Foo.new.test
puts ObjectSpace.allocation_class_path(obj)
puts ObjectSpace.allocation_method_id(obj)
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Class
new
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-07T19:40:59Z inline-new 2cf0efa18e) +PRISM [arm64-darwin24]
Foo
test
```
Before inlining, `ObjectSpace` would report the allocation class path and method id as `Class#new` which isn't very helpful. With the inlining patch, we can see that the object is allocated in `Foo#test`.
## Summary
I think the overall memory increase is modest, and the change to `caller` is acceptable especially given the performance increase this patch provides.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ruby-core:121572] [Ruby Feature#21254] Inlining Class#new
2025-04-07 23:03 [ruby-core:121565] [Ruby Feature#21254] Inlining Class#new tenderlovemaking (Aaron Patterson) via ruby-core
2025-04-08 0:45 ` [ruby-core:121567] " ko1 (Koichi Sasada) via ruby-core
2025-04-08 1:37 ` [ruby-core:121568] " tenderlovemaking (Aaron Patterson) via ruby-core
@ 2025-04-08 6:58 ` Earlopain (Earlopain _) via ruby-core
2025-04-08 16:02 ` [ruby-core:121590] " tenderlovemaking (Aaron Patterson) via ruby-core
` (7 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Earlopain (Earlopain _) via ruby-core @ 2025-04-08 6:58 UTC (permalink / raw)
To: ruby-core; +Cc: Earlopain (Earlopain _)
Issue #21254 has been updated by Earlopain (Earlopain _).
> As you can see in the above output, the Class#new frame is eliminated. I'm not sure if anyone really cares about this frame
Sorry if this is a dumb question, but wouldn't this also affect warn in general, similar to what you did for erb?
```rb
class Foo
def initialize
warn "don't call me like this!", uplevel: 1
end
end
def bar
Foo.new
end
bar
```
It currently points to `Foo.new`, would this change this? I wanted to try this out on the playground link myself but it seems broken ):
> <internal:gem_prelude>:2:in 'Kernel#require': Not supported @ rb_check_realpath_internal - /usr/local/lib/ruby/3.5.0+0/rubygems.rb (Errno::ENOTSUP)
> from <internal:gem_prelude>:2:in '<internal:gem_prelude>'
----------------------------------------
Feature #21254: Inlining Class#new
https://bugs.ruby-lang.org/issues/21254#change-112629
* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
----------------------------------------
We would like to propose inlining YARV bytecode for speeding up object allocations, specifically inlining the `Class#new` method. In order to support inlining this method, we would like to introduce a new YARV instruction `opt_new`. This instruction will allocate an object if the default allocator is not overridden, otherwise it will jump to a “slow path” for calling a method.
`Class#new` especially benefits from inlining for two reasons:
1. Calling `initialize` directly means we don't need to allocate a temporary hash for keyword arguments
2. We are able to use an inline cache when calling the `initialize` method
The patch can be found [here](https://github.com/ruby/ruby/pull/13080), but please find implementation details below.
## Implementation Details
This patch modifies the compiler to emit special instructions when it sees a callsite that uses “new”. Before this patch, calling `Object.new` would result in bytecode like this:
```
ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```
With this patch, the bytecode looks like this:
```
./ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump 14
0011 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```
The new `opt_new` instruction checks whether or not the `new` implementation is the default “allocator” implementation. If it is the default allocator, then the instruction will allocate the object and call initialize passing parameters to initialize but not to `new`. If the method is not the default allocator implementation, it will jump to the normal method dispatch instructions.
Performance Improvements
This patch improves performance of all allocations that use the normal “new” method for allocation. Here are two examples (all of these benchmarks compare Ruby 3.4.2 against Ruby master with inlining patch):
A simple `Object.new` in a hot loop improves by about 24%:
```
hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 436.6 ms ± 3.3 ms [User: 432.3 ms, System: 3.8 ms]
Range (min … max): 430.5 ms … 442.6 ms 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 351.1 ms ± 3.6 ms [User: 347.4 ms, System: 3.3 ms]
Range (min … max): 343.9 ms … 357.4 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end' ran
1.24 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
```
Using a single keyword argument is improved by about 72%:
```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 1.082 s ± 0.007 s [User: 1.074 s, System: 0.008 s]
Range (min … max): 1.071 s … 1.091 s 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 627.6 ms ± 4.8 ms [User: 622.6 ms, System: 4.5 ms]
Range (min … max): 622.1 ms … 637.2 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
1.72 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```
The performance increase depends on the number and type of parameters passed to `initialize`. For example, an `initialize` method that takes 3 parameters can see a speed improvement of ~3x:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
i = 0
while i < 10_000_000
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
i += 1
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> hyperfine "ruby --disable-gems test.rb" "./ruby --disable-gems test.rb"
Benchmark 1: ruby --disable-gems test.rb
Time (mean ± σ): 3.700 s ± 0.033 s [User: 3.681 s, System: 0.018 s]
Range (min … max): 3.636 s … 3.751 s 10 runs
Benchmark 2: ./ruby --disable-gems test.rb
Time (mean ± σ): 1.182 s ± 0.013 s [User: 1.173 s, System: 0.008 s]
Range (min … max): 1.165 s … 1.203 s 10 runs
Summary
./ruby --disable-gems test.rb ran
3.13 ± 0.04 times faster than ruby --disable-gems test.rb
```
One factor in the performance increase for keyword arguments is that inlining is able to eliminate the hash allocation when calling “through” the C implementation of `Class#new`:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
def allocs
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def test; allocs { Foo.new(a: 1, b: 2, c: 3) }; end
test
p test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
2
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
1
```
## Memory Increase
Of course this patch is not “free”. Inlining the method call adds extra YARV instructions. We estimate this patch increases `new` call sites by about 122 bytes:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
end
end
def test
Foo.new
end
puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:test)))
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
656
```
We’ve tested this in Shopify’s monolith, comparing Ruby 3.4.2 and Ruby 3.5+inlining, and it seems to increase total ISEQ memesize by about 3.8mb (roughly 0.5% increase in ISEQ size):
```
irb(main):001> 737191972 - 733354388
=> 3837584
```
However, Ruby 3.5 has more overall ISEQ objects than Ruby 3.4.2:
```
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-inline.txt
789545 sizes-inline.txt
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-3.4.txt
789479 sizes-3.4.txt
```
We see total heap size as reported by `memsize` to only increase by about 1MB:
```
irb(main):001> 3981075617 - 3979926505
=> 1149112
```
## Changes to `caller`
This patch changes `caller` reporting in the `initialize` method:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
puts caller
end
end
def test
Foo.new
end
test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:10:in 'Class#new'
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
```
As you can see in the above output, the `Class#new` frame is eliminated. I'm not sure if anyone really cares about this frame. We've tested this patch in Shopify's CI, and didn't find any code that depends on this callstack. However, this patch did require [changes to ERB for emitting warnings](https://github.com/ruby/ruby/pull/13080/files#diff-7624f95f521b3333de8c687d70c2574aa31616cebf9504d8bcf673865fbf6ecdR475-R486).
That said, eliminating the frame also has the side-effect of making some of our allocation tracing tools a little more useful:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def test
Object.new
end
end
ObjectSpace.trace_object_allocations do
obj = Foo.new.test
puts ObjectSpace.allocation_class_path(obj)
puts ObjectSpace.allocation_method_id(obj)
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Class
new
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-07T19:40:59Z inline-new 2cf0efa18e) +PRISM [arm64-darwin24]
Foo
test
```
Before inlining, `ObjectSpace` would report the allocation class path and method id as `Class#new` which isn't very helpful. With the inlining patch, we can see that the object is allocated in `Foo#test`.
## Summary
I think the overall memory increase is modest, and the change to `caller` is acceptable especially given the performance increase this patch provides.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ruby-core:121590] [Ruby Feature#21254] Inlining Class#new
2025-04-07 23:03 [ruby-core:121565] [Ruby Feature#21254] Inlining Class#new tenderlovemaking (Aaron Patterson) via ruby-core
` (2 preceding siblings ...)
2025-04-08 6:58 ` [ruby-core:121572] " Earlopain (Earlopain _) via ruby-core
@ 2025-04-08 16:02 ` tenderlovemaking (Aaron Patterson) via ruby-core
2025-04-08 16:18 ` [ruby-core:121591] " tenderlovemaking (Aaron Patterson) via ruby-core
` (6 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: tenderlovemaking (Aaron Patterson) via ruby-core @ 2025-04-08 16:02 UTC (permalink / raw)
To: ruby-core; +Cc: tenderlovemaking (Aaron Patterson)
Issue #21254 has been updated by tenderlovemaking (Aaron Patterson).
Earlopain (Earlopain _) wrote in #note-3:
> > As you can see in the above output, the Class#new frame is eliminated. I'm not sure if anyone really cares about this frame
>
> Sorry if this is a dumb question, but wouldn't this also affect warn in general, similar to what you did for erb?
Not a dumb question. :)
>
> ```rb
> class Foo
> def initialize
> warn "don't call me like this!", uplevel: 1
> end
> end
>
> def bar
> Foo.new
> end
>
> bar
> ```
>
> It currently points to `Foo.new`, would this change this? I wanted to try this out on the playground link myself but it seems broken ):
It doesn't impact this case exactly, but I think it _could_ impact something. The example you gave behaves the same way with or without inlining:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize
warn "don't call me like this!", uplevel: 1
end
end
def bar
Foo.new
end
bar
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:8: warning: don't call me like this!
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./miniruby -v test.rb
ruby 3.5.0dev (2025-04-08T15:56:43Z inline-new a9a45360ce) +PRISM [arm64-darwin24]
test.rb:8: warning: don't call me like this!
```
However, I _think_ the warning uplevel will skip C frames when counting. Since `Class#new` was a C frame, it would have been skipped by the uplevel anyway. I have to double check the implementation, but I think that's what is going on.
----------------------------------------
Feature #21254: Inlining Class#new
https://bugs.ruby-lang.org/issues/21254#change-112644
* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
----------------------------------------
We would like to propose inlining YARV bytecode for speeding up object allocations, specifically inlining the `Class#new` method. In order to support inlining this method, we would like to introduce a new YARV instruction `opt_new`. This instruction will allocate an object if the default allocator is not overridden, otherwise it will jump to a “slow path” for calling a method.
`Class#new` especially benefits from inlining for two reasons:
1. Calling `initialize` directly means we don't need to allocate a temporary hash for keyword arguments
2. We are able to use an inline cache when calling the `initialize` method
The patch can be found [here](https://github.com/ruby/ruby/pull/13080), but please find implementation details below.
## Implementation Details
This patch modifies the compiler to emit special instructions when it sees a callsite that uses “new”. Before this patch, calling `Object.new` would result in bytecode like this:
```
ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```
With this patch, the bytecode looks like this:
```
./ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump 14
0011 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```
The new `opt_new` instruction checks whether or not the `new` implementation is the default “allocator” implementation. If it is the default allocator, then the instruction will allocate the object and call initialize passing parameters to initialize but not to `new`. If the method is not the default allocator implementation, it will jump to the normal method dispatch instructions.
Performance Improvements
This patch improves performance of all allocations that use the normal “new” method for allocation. Here are two examples (all of these benchmarks compare Ruby 3.4.2 against Ruby master with inlining patch):
A simple `Object.new` in a hot loop improves by about 24%:
```
hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 436.6 ms ± 3.3 ms [User: 432.3 ms, System: 3.8 ms]
Range (min … max): 430.5 ms … 442.6 ms 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 351.1 ms ± 3.6 ms [User: 347.4 ms, System: 3.3 ms]
Range (min … max): 343.9 ms … 357.4 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end' ran
1.24 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
```
Using a single keyword argument is improved by about 72%:
```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 1.082 s ± 0.007 s [User: 1.074 s, System: 0.008 s]
Range (min … max): 1.071 s … 1.091 s 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 627.6 ms ± 4.8 ms [User: 622.6 ms, System: 4.5 ms]
Range (min … max): 622.1 ms … 637.2 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
1.72 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```
The performance increase depends on the number and type of parameters passed to `initialize`. For example, an `initialize` method that takes 3 parameters can see a speed improvement of ~3x:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
i = 0
while i < 10_000_000
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
i += 1
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> hyperfine "ruby --disable-gems test.rb" "./ruby --disable-gems test.rb"
Benchmark 1: ruby --disable-gems test.rb
Time (mean ± σ): 3.700 s ± 0.033 s [User: 3.681 s, System: 0.018 s]
Range (min … max): 3.636 s … 3.751 s 10 runs
Benchmark 2: ./ruby --disable-gems test.rb
Time (mean ± σ): 1.182 s ± 0.013 s [User: 1.173 s, System: 0.008 s]
Range (min … max): 1.165 s … 1.203 s 10 runs
Summary
./ruby --disable-gems test.rb ran
3.13 ± 0.04 times faster than ruby --disable-gems test.rb
```
One factor in the performance increase for keyword arguments is that inlining is able to eliminate the hash allocation when calling “through” the C implementation of `Class#new`:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
def allocs
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def test; allocs { Foo.new(a: 1, b: 2, c: 3) }; end
test
p test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
2
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
1
```
## Memory Increase
Of course this patch is not “free”. Inlining the method call adds extra YARV instructions. We estimate this patch increases `new` call sites by about 122 bytes:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
end
end
def test
Foo.new
end
puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:test)))
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
656
```
We’ve tested this in Shopify’s monolith, comparing Ruby 3.4.2 and Ruby 3.5+inlining, and it seems to increase total ISEQ memesize by about 3.8mb (roughly 0.5% increase in ISEQ size):
```
irb(main):001> 737191972 - 733354388
=> 3837584
```
However, Ruby 3.5 has more overall ISEQ objects than Ruby 3.4.2:
```
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-inline.txt
789545 sizes-inline.txt
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-3.4.txt
789479 sizes-3.4.txt
```
We see total heap size as reported by `memsize` to only increase by about 1MB:
```
irb(main):001> 3981075617 - 3979926505
=> 1149112
```
## Changes to `caller`
This patch changes `caller` reporting in the `initialize` method:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
puts caller
end
end
def test
Foo.new
end
test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:10:in 'Class#new'
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
```
As you can see in the above output, the `Class#new` frame is eliminated. I'm not sure if anyone really cares about this frame. We've tested this patch in Shopify's CI, and didn't find any code that depends on this callstack. However, this patch did require [changes to ERB for emitting warnings](https://github.com/ruby/ruby/pull/13080/files#diff-7624f95f521b3333de8c687d70c2574aa31616cebf9504d8bcf673865fbf6ecdR475-R486).
That said, eliminating the frame also has the side-effect of making some of our allocation tracing tools a little more useful:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def test
Object.new
end
end
ObjectSpace.trace_object_allocations do
obj = Foo.new.test
puts ObjectSpace.allocation_class_path(obj)
puts ObjectSpace.allocation_method_id(obj)
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Class
new
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-07T19:40:59Z inline-new 2cf0efa18e) +PRISM [arm64-darwin24]
Foo
test
```
Before inlining, `ObjectSpace` would report the allocation class path and method id as `Class#new` which isn't very helpful. With the inlining patch, we can see that the object is allocated in `Foo#test`.
## Summary
I think the overall memory increase is modest, and the change to `caller` is acceptable especially given the performance increase this patch provides.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ruby-core:121591] [Ruby Feature#21254] Inlining Class#new
2025-04-07 23:03 [ruby-core:121565] [Ruby Feature#21254] Inlining Class#new tenderlovemaking (Aaron Patterson) via ruby-core
` (3 preceding siblings ...)
2025-04-08 16:02 ` [ruby-core:121590] " tenderlovemaking (Aaron Patterson) via ruby-core
@ 2025-04-08 16:18 ` tenderlovemaking (Aaron Patterson) via ruby-core
2025-04-09 20:57 ` [ruby-core:121613] " jez (Jake Zimmerman) via ruby-core
` (5 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: tenderlovemaking (Aaron Patterson) via ruby-core @ 2025-04-08 16:18 UTC (permalink / raw)
To: ruby-core; +Cc: tenderlovemaking (Aaron Patterson)
Issue #21254 has been updated by tenderlovemaking (Aaron Patterson).
Btw, @ko1 came up with this idea, so I want to say thanks to him.
----------------------------------------
Feature #21254: Inlining Class#new
https://bugs.ruby-lang.org/issues/21254#change-112645
* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
----------------------------------------
We would like to propose inlining YARV bytecode for speeding up object allocations, specifically inlining the `Class#new` method. In order to support inlining this method, we would like to introduce a new YARV instruction `opt_new`. This instruction will allocate an object if the default allocator is not overridden, otherwise it will jump to a “slow path” for calling a method.
`Class#new` especially benefits from inlining for two reasons:
1. Calling `initialize` directly means we don't need to allocate a temporary hash for keyword arguments
2. We are able to use an inline cache when calling the `initialize` method
The patch can be found [here](https://github.com/ruby/ruby/pull/13080), but please find implementation details below.
## Implementation Details
This patch modifies the compiler to emit special instructions when it sees a callsite that uses “new”. Before this patch, calling `Object.new` would result in bytecode like this:
```
ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```
With this patch, the bytecode looks like this:
```
./ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump 14
0011 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```
The new `opt_new` instruction checks whether or not the `new` implementation is the default “allocator” implementation. If it is the default allocator, then the instruction will allocate the object and call initialize passing parameters to initialize but not to `new`. If the method is not the default allocator implementation, it will jump to the normal method dispatch instructions.
Performance Improvements
This patch improves performance of all allocations that use the normal “new” method for allocation. Here are two examples (all of these benchmarks compare Ruby 3.4.2 against Ruby master with inlining patch):
A simple `Object.new` in a hot loop improves by about 24%:
```
hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 436.6 ms ± 3.3 ms [User: 432.3 ms, System: 3.8 ms]
Range (min … max): 430.5 ms … 442.6 ms 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 351.1 ms ± 3.6 ms [User: 347.4 ms, System: 3.3 ms]
Range (min … max): 343.9 ms … 357.4 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end' ran
1.24 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
```
Using a single keyword argument is improved by about 72%:
```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 1.082 s ± 0.007 s [User: 1.074 s, System: 0.008 s]
Range (min … max): 1.071 s … 1.091 s 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 627.6 ms ± 4.8 ms [User: 622.6 ms, System: 4.5 ms]
Range (min … max): 622.1 ms … 637.2 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
1.72 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```
The performance increase depends on the number and type of parameters passed to `initialize`. For example, an `initialize` method that takes 3 parameters can see a speed improvement of ~3x:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
i = 0
while i < 10_000_000
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
i += 1
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> hyperfine "ruby --disable-gems test.rb" "./ruby --disable-gems test.rb"
Benchmark 1: ruby --disable-gems test.rb
Time (mean ± σ): 3.700 s ± 0.033 s [User: 3.681 s, System: 0.018 s]
Range (min … max): 3.636 s … 3.751 s 10 runs
Benchmark 2: ./ruby --disable-gems test.rb
Time (mean ± σ): 1.182 s ± 0.013 s [User: 1.173 s, System: 0.008 s]
Range (min … max): 1.165 s … 1.203 s 10 runs
Summary
./ruby --disable-gems test.rb ran
3.13 ± 0.04 times faster than ruby --disable-gems test.rb
```
One factor in the performance increase for keyword arguments is that inlining is able to eliminate the hash allocation when calling “through” the C implementation of `Class#new`:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
def allocs
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def test; allocs { Foo.new(a: 1, b: 2, c: 3) }; end
test
p test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
2
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
1
```
## Memory Increase
Of course this patch is not “free”. Inlining the method call adds extra YARV instructions. We estimate this patch increases `new` call sites by about 122 bytes:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
end
end
def test
Foo.new
end
puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:test)))
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
656
```
We’ve tested this in Shopify’s monolith, comparing Ruby 3.4.2 and Ruby 3.5+inlining, and it seems to increase total ISEQ memesize by about 3.8mb (roughly 0.5% increase in ISEQ size):
```
irb(main):001> 737191972 - 733354388
=> 3837584
```
However, Ruby 3.5 has more overall ISEQ objects than Ruby 3.4.2:
```
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-inline.txt
789545 sizes-inline.txt
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-3.4.txt
789479 sizes-3.4.txt
```
We see total heap size as reported by `memsize` to only increase by about 1MB:
```
irb(main):001> 3981075617 - 3979926505
=> 1149112
```
## Changes to `caller`
This patch changes `caller` reporting in the `initialize` method:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
puts caller
end
end
def test
Foo.new
end
test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:10:in 'Class#new'
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
```
As you can see in the above output, the `Class#new` frame is eliminated. I'm not sure if anyone really cares about this frame. We've tested this patch in Shopify's CI, and didn't find any code that depends on this callstack. However, this patch did require [changes to ERB for emitting warnings](https://github.com/ruby/ruby/pull/13080/files#diff-7624f95f521b3333de8c687d70c2574aa31616cebf9504d8bcf673865fbf6ecdR475-R486).
That said, eliminating the frame also has the side-effect of making some of our allocation tracing tools a little more useful:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def test
Object.new
end
end
ObjectSpace.trace_object_allocations do
obj = Foo.new.test
puts ObjectSpace.allocation_class_path(obj)
puts ObjectSpace.allocation_method_id(obj)
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Class
new
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-07T19:40:59Z inline-new 2cf0efa18e) +PRISM [arm64-darwin24]
Foo
test
```
Before inlining, `ObjectSpace` would report the allocation class path and method id as `Class#new` which isn't very helpful. With the inlining patch, we can see that the object is allocated in `Foo#test`.
## Summary
I think the overall memory increase is modest, and the change to `caller` is acceptable especially given the performance increase this patch provides.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ruby-core:121613] [Ruby Feature#21254] Inlining Class#new
2025-04-07 23:03 [ruby-core:121565] [Ruby Feature#21254] Inlining Class#new tenderlovemaking (Aaron Patterson) via ruby-core
` (4 preceding siblings ...)
2025-04-08 16:18 ` [ruby-core:121591] " tenderlovemaking (Aaron Patterson) via ruby-core
@ 2025-04-09 20:57 ` jez (Jake Zimmerman) via ruby-core
2025-04-09 22:18 ` [ruby-core:121614] " tenderlovemaking (Aaron Patterson) via ruby-core
` (4 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: jez (Jake Zimmerman) via ruby-core @ 2025-04-09 20:57 UTC (permalink / raw)
To: ruby-core; +Cc: jez (Jake Zimmerman)
Issue #21254 has been updated by jez (Jake Zimmerman).
@tenderlovemaking Question about an extension to the current implementation.
We have a fair amount of code that looks like this:
```ruby
class HoldsEvenNumbers
def initialize(even)
@even = even
end
private_class_method :new
def self.make(n)
return nil unless n.even?
self.new(n)
end
end
```
I believe that in this snippet, the call to `self.new(n)` would not be equal to `rb_class_new_instance_pass_kw` when doing the `vm_method_cfunc_is` search, because there will have been another method entry created by the call to `private_class_method :new`.
I'm curious: could we add a second check after [this check](https://github.com/ruby/ruby/compare/master...Shopify:ruby:inline-new#diff-7e987f13e758a51be20d9ce1e38ad46cabae2aebdfe6968720f0d570db61772aR918) such that if the method entry is `VM_METHOD_TYPE_ZSUPER`, we keep searching to find the `super` method that would be called and see if the `super` method is `rb_class_new_instance_pass_kw`? Plus also inline the logic to check whether the private method call is okay in that second condition (before the final deoptimization jump).
----------------------------------------
Feature #21254: Inlining Class#new
https://bugs.ruby-lang.org/issues/21254#change-112665
* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
----------------------------------------
We would like to propose inlining YARV bytecode for speeding up object allocations, specifically inlining the `Class#new` method. In order to support inlining this method, we would like to introduce a new YARV instruction `opt_new`. This instruction will allocate an object if the default allocator is not overridden, otherwise it will jump to a “slow path” for calling a method.
`Class#new` especially benefits from inlining for two reasons:
1. Calling `initialize` directly means we don't need to allocate a temporary hash for keyword arguments
2. We are able to use an inline cache when calling the `initialize` method
The patch can be found [here](https://github.com/ruby/ruby/pull/13080), but please find implementation details below.
## Implementation Details
This patch modifies the compiler to emit special instructions when it sees a callsite that uses “new”. Before this patch, calling `Object.new` would result in bytecode like this:
```
ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```
With this patch, the bytecode looks like this:
```
./ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump 14
0011 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```
The new `opt_new` instruction checks whether or not the `new` implementation is the default “allocator” implementation. If it is the default allocator, then the instruction will allocate the object and call initialize passing parameters to initialize but not to `new`. If the method is not the default allocator implementation, it will jump to the normal method dispatch instructions.
Performance Improvements
This patch improves performance of all allocations that use the normal “new” method for allocation. Here are two examples (all of these benchmarks compare Ruby 3.4.2 against Ruby master with inlining patch):
A simple `Object.new` in a hot loop improves by about 24%:
```
hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 436.6 ms ± 3.3 ms [User: 432.3 ms, System: 3.8 ms]
Range (min … max): 430.5 ms … 442.6 ms 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 351.1 ms ± 3.6 ms [User: 347.4 ms, System: 3.3 ms]
Range (min … max): 343.9 ms … 357.4 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end' ran
1.24 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
```
Using a single keyword argument is improved by about 72%:
```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 1.082 s ± 0.007 s [User: 1.074 s, System: 0.008 s]
Range (min … max): 1.071 s … 1.091 s 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 627.6 ms ± 4.8 ms [User: 622.6 ms, System: 4.5 ms]
Range (min … max): 622.1 ms … 637.2 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
1.72 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```
The performance increase depends on the number and type of parameters passed to `initialize`. For example, an `initialize` method that takes 3 parameters can see a speed improvement of ~3x:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
i = 0
while i < 10_000_000
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
i += 1
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> hyperfine "ruby --disable-gems test.rb" "./ruby --disable-gems test.rb"
Benchmark 1: ruby --disable-gems test.rb
Time (mean ± σ): 3.700 s ± 0.033 s [User: 3.681 s, System: 0.018 s]
Range (min … max): 3.636 s … 3.751 s 10 runs
Benchmark 2: ./ruby --disable-gems test.rb
Time (mean ± σ): 1.182 s ± 0.013 s [User: 1.173 s, System: 0.008 s]
Range (min … max): 1.165 s … 1.203 s 10 runs
Summary
./ruby --disable-gems test.rb ran
3.13 ± 0.04 times faster than ruby --disable-gems test.rb
```
One factor in the performance increase for keyword arguments is that inlining is able to eliminate the hash allocation when calling “through” the C implementation of `Class#new`:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
def allocs
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def test; allocs { Foo.new(a: 1, b: 2, c: 3) }; end
test
p test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
2
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
1
```
## Memory Increase
Of course this patch is not “free”. Inlining the method call adds extra YARV instructions. We estimate this patch increases `new` call sites by about 122 bytes:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
end
end
def test
Foo.new
end
puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:test)))
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
656
```
We’ve tested this in Shopify’s monolith, comparing Ruby 3.4.2 and Ruby 3.5+inlining, and it seems to increase total ISEQ memesize by about 3.8mb (roughly 0.5% increase in ISEQ size):
```
irb(main):001> 737191972 - 733354388
=> 3837584
```
However, Ruby 3.5 has more overall ISEQ objects than Ruby 3.4.2:
```
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-inline.txt
789545 sizes-inline.txt
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-3.4.txt
789479 sizes-3.4.txt
```
We see total heap size as reported by `memsize` to only increase by about 1MB:
```
irb(main):001> 3981075617 - 3979926505
=> 1149112
```
## Changes to `caller`
This patch changes `caller` reporting in the `initialize` method:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
puts caller
end
end
def test
Foo.new
end
test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:10:in 'Class#new'
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
```
As you can see in the above output, the `Class#new` frame is eliminated. I'm not sure if anyone really cares about this frame. We've tested this patch in Shopify's CI, and didn't find any code that depends on this callstack. However, this patch did require [changes to ERB for emitting warnings](https://github.com/ruby/ruby/pull/13080/files#diff-7624f95f521b3333de8c687d70c2574aa31616cebf9504d8bcf673865fbf6ecdR475-R486).
That said, eliminating the frame also has the side-effect of making some of our allocation tracing tools a little more useful:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def test
Object.new
end
end
ObjectSpace.trace_object_allocations do
obj = Foo.new.test
puts ObjectSpace.allocation_class_path(obj)
puts ObjectSpace.allocation_method_id(obj)
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Class
new
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-07T19:40:59Z inline-new 2cf0efa18e) +PRISM [arm64-darwin24]
Foo
test
```
Before inlining, `ObjectSpace` would report the allocation class path and method id as `Class#new` which isn't very helpful. With the inlining patch, we can see that the object is allocated in `Foo#test`.
## Summary
I think the overall memory increase is modest, and the change to `caller` is acceptable especially given the performance increase this patch provides.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ruby-core:121614] [Ruby Feature#21254] Inlining Class#new
2025-04-07 23:03 [ruby-core:121565] [Ruby Feature#21254] Inlining Class#new tenderlovemaking (Aaron Patterson) via ruby-core
` (5 preceding siblings ...)
2025-04-09 20:57 ` [ruby-core:121613] " jez (Jake Zimmerman) via ruby-core
@ 2025-04-09 22:18 ` tenderlovemaking (Aaron Patterson) via ruby-core
2025-04-09 23:15 ` [ruby-core:121615] " tenderlovemaking (Aaron Patterson) via ruby-core
` (3 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: tenderlovemaking (Aaron Patterson) via ruby-core @ 2025-04-09 22:18 UTC (permalink / raw)
To: ruby-core; +Cc: tenderlovemaking (Aaron Patterson)
Issue #21254 has been updated by tenderlovemaking (Aaron Patterson).
jez (Jake Zimmerman) wrote in #note-6:
> I'm curious: could we add a second check after [this check](https://github.com/ruby/ruby/compare/master...Shopify:ruby:inline-new#diff-7e987f13e758a51be20d9ce1e38ad46cabae2aebdfe6968720f0d570db61772aR918) such that if the method entry is `VM_METHOD_TYPE_ZSUPER`, we keep searching to find the `super` method that would be called and see if the `super` method is `rb_class_new_instance_pass_kw`? Plus also inline the logic to check whether the private method call is okay in that second condition (before the final deoptimization jump).
Yes, I think we can add a special check for ZSUPER methods, but only for call sites where the receiver is `self`:
```ruby
class A
private_class_method :new
def self.make
new # fast path
end
end
class B
private_class_method :new
def self.make
self.new # fast path
end
end
class C
def self.make m
m.new # fast path depends on m
end
end
C.make(Object) # Fast Path
C.make(A) # Slow Path (exception)
C.make(B) # Slow Path (also exception)
```
I made a patch for it [here](https://github.com/ruby/ruby/commit/b8e37fd5cc588d05576b24c13f94c54409b2a9db), but I haven't tested it in CI yet.
----------------------------------------
Feature #21254: Inlining Class#new
https://bugs.ruby-lang.org/issues/21254#change-112666
* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
----------------------------------------
We would like to propose inlining YARV bytecode for speeding up object allocations, specifically inlining the `Class#new` method. In order to support inlining this method, we would like to introduce a new YARV instruction `opt_new`. This instruction will allocate an object if the default allocator is not overridden, otherwise it will jump to a “slow path” for calling a method.
`Class#new` especially benefits from inlining for two reasons:
1. Calling `initialize` directly means we don't need to allocate a temporary hash for keyword arguments
2. We are able to use an inline cache when calling the `initialize` method
The patch can be found [here](https://github.com/ruby/ruby/pull/13080), but please find implementation details below.
## Implementation Details
This patch modifies the compiler to emit special instructions when it sees a callsite that uses “new”. Before this patch, calling `Object.new` would result in bytecode like this:
```
ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```
With this patch, the bytecode looks like this:
```
./ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump 14
0011 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```
The new `opt_new` instruction checks whether or not the `new` implementation is the default “allocator” implementation. If it is the default allocator, then the instruction will allocate the object and call initialize passing parameters to initialize but not to `new`. If the method is not the default allocator implementation, it will jump to the normal method dispatch instructions.
Performance Improvements
This patch improves performance of all allocations that use the normal “new” method for allocation. Here are two examples (all of these benchmarks compare Ruby 3.4.2 against Ruby master with inlining patch):
A simple `Object.new` in a hot loop improves by about 24%:
```
hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 436.6 ms ± 3.3 ms [User: 432.3 ms, System: 3.8 ms]
Range (min … max): 430.5 ms … 442.6 ms 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 351.1 ms ± 3.6 ms [User: 347.4 ms, System: 3.3 ms]
Range (min … max): 343.9 ms … 357.4 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end' ran
1.24 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
```
Using a single keyword argument is improved by about 72%:
```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 1.082 s ± 0.007 s [User: 1.074 s, System: 0.008 s]
Range (min … max): 1.071 s … 1.091 s 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 627.6 ms ± 4.8 ms [User: 622.6 ms, System: 4.5 ms]
Range (min … max): 622.1 ms … 637.2 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
1.72 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```
The performance increase depends on the number and type of parameters passed to `initialize`. For example, an `initialize` method that takes 3 parameters can see a speed improvement of ~3x:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
i = 0
while i < 10_000_000
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
i += 1
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> hyperfine "ruby --disable-gems test.rb" "./ruby --disable-gems test.rb"
Benchmark 1: ruby --disable-gems test.rb
Time (mean ± σ): 3.700 s ± 0.033 s [User: 3.681 s, System: 0.018 s]
Range (min … max): 3.636 s … 3.751 s 10 runs
Benchmark 2: ./ruby --disable-gems test.rb
Time (mean ± σ): 1.182 s ± 0.013 s [User: 1.173 s, System: 0.008 s]
Range (min … max): 1.165 s … 1.203 s 10 runs
Summary
./ruby --disable-gems test.rb ran
3.13 ± 0.04 times faster than ruby --disable-gems test.rb
```
One factor in the performance increase for keyword arguments is that inlining is able to eliminate the hash allocation when calling “through” the C implementation of `Class#new`:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
def allocs
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def test; allocs { Foo.new(a: 1, b: 2, c: 3) }; end
test
p test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
2
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
1
```
## Memory Increase
Of course this patch is not “free”. Inlining the method call adds extra YARV instructions. We estimate this patch increases `new` call sites by about 122 bytes:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
end
end
def test
Foo.new
end
puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:test)))
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
656
```
We’ve tested this in Shopify’s monolith, comparing Ruby 3.4.2 and Ruby 3.5+inlining, and it seems to increase total ISEQ memesize by about 3.8mb (roughly 0.5% increase in ISEQ size):
```
irb(main):001> 737191972 - 733354388
=> 3837584
```
However, Ruby 3.5 has more overall ISEQ objects than Ruby 3.4.2:
```
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-inline.txt
789545 sizes-inline.txt
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-3.4.txt
789479 sizes-3.4.txt
```
We see total heap size as reported by `memsize` to only increase by about 1MB:
```
irb(main):001> 3981075617 - 3979926505
=> 1149112
```
## Changes to `caller`
This patch changes `caller` reporting in the `initialize` method:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
puts caller
end
end
def test
Foo.new
end
test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:10:in 'Class#new'
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
```
As you can see in the above output, the `Class#new` frame is eliminated. I'm not sure if anyone really cares about this frame. We've tested this patch in Shopify's CI, and didn't find any code that depends on this callstack. However, this patch did require [changes to ERB for emitting warnings](https://github.com/ruby/ruby/pull/13080/files#diff-7624f95f521b3333de8c687d70c2574aa31616cebf9504d8bcf673865fbf6ecdR475-R486).
That said, eliminating the frame also has the side-effect of making some of our allocation tracing tools a little more useful:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def test
Object.new
end
end
ObjectSpace.trace_object_allocations do
obj = Foo.new.test
puts ObjectSpace.allocation_class_path(obj)
puts ObjectSpace.allocation_method_id(obj)
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Class
new
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-07T19:40:59Z inline-new 2cf0efa18e) +PRISM [arm64-darwin24]
Foo
test
```
Before inlining, `ObjectSpace` would report the allocation class path and method id as `Class#new` which isn't very helpful. With the inlining patch, we can see that the object is allocated in `Foo#test`.
## Summary
I think the overall memory increase is modest, and the change to `caller` is acceptable especially given the performance increase this patch provides.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ruby-core:121615] [Ruby Feature#21254] Inlining Class#new
2025-04-07 23:03 [ruby-core:121565] [Ruby Feature#21254] Inlining Class#new tenderlovemaking (Aaron Patterson) via ruby-core
` (6 preceding siblings ...)
2025-04-09 22:18 ` [ruby-core:121614] " tenderlovemaking (Aaron Patterson) via ruby-core
@ 2025-04-09 23:15 ` tenderlovemaking (Aaron Patterson) via ruby-core
2025-05-07 6:36 ` [ruby-core:121873] " mame (Yusuke Endoh) via ruby-core
` (2 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: tenderlovemaking (Aaron Patterson) via ruby-core @ 2025-04-09 23:15 UTC (permalink / raw)
To: ruby-core; +Cc: tenderlovemaking (Aaron Patterson)
Issue #21254 has been updated by tenderlovemaking (Aaron Patterson).
tenderlovemaking (Aaron Patterson) wrote in #note-7:
> I made a patch for it [here](https://github.com/ruby/ruby/commit/b8e37fd5cc588d05576b24c13f94c54409b2a9db), but I haven't tested it in CI yet.
@jhawthorn pointed out a problem to me with this patch that I didn't think about.
If we consider this code:
```ruby
class A
private_class_method :new
def self.make
new # fast path
end
end
A.make
A.make
```
When we try to [look up the `new` method](https://github.com/ruby/ruby/commit/b8e37fd5cc588d05576b24c13f94c54409b2a9db#diff-f8c174347e6ea8889b5036064a1ff4fe5e7c53a821befa9bdc5ccbf17800a649R2352) it will fill out the inline cache with the ZSUPER entry. But since the ZSUPER entry [won't pass the `rb_class_new_instance_pass_kw` check](https://github.com/ruby/ruby/commit/b8e37fd5cc588d05576b24c13f94c54409b2a9db#diff-f8c174347e6ea8889b5036064a1ff4fe5e7c53a821befa9bdc5ccbf17800a649R2359), we'll end up [looking up the `new` method in from the superclass](https://github.com/ruby/ruby/commit/b8e37fd5cc588d05576b24c13f94c54409b2a9db#diff-f8c174347e6ea8889b5036064a1ff4fe5e7c53a821befa9bdc5ccbf17800a649R2364), which will fill out the inline cache with the method from the superclass.
On the next call to `A.make`, it will miss cache again because the receiver is `A`, basically repeating the above steps. In this case the inline cache will keep ping-ponging between the ZSUPER method and the superclasses method; never hitting.
Maybe we can figure out a way to do this in the future, but I'm not sure if it's a good idea right now. This particular case might actually be better handled by the JIT than the interpreter.
----------------------------------------
Feature #21254: Inlining Class#new
https://bugs.ruby-lang.org/issues/21254#change-112667
* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
----------------------------------------
We would like to propose inlining YARV bytecode for speeding up object allocations, specifically inlining the `Class#new` method. In order to support inlining this method, we would like to introduce a new YARV instruction `opt_new`. This instruction will allocate an object if the default allocator is not overridden, otherwise it will jump to a “slow path” for calling a method.
`Class#new` especially benefits from inlining for two reasons:
1. Calling `initialize` directly means we don't need to allocate a temporary hash for keyword arguments
2. We are able to use an inline cache when calling the `initialize` method
The patch can be found [here](https://github.com/ruby/ruby/pull/13080), but please find implementation details below.
## Implementation Details
This patch modifies the compiler to emit special instructions when it sees a callsite that uses “new”. Before this patch, calling `Object.new` would result in bytecode like this:
```
ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```
With this patch, the bytecode looks like this:
```
./ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump 14
0011 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```
The new `opt_new` instruction checks whether or not the `new` implementation is the default “allocator” implementation. If it is the default allocator, then the instruction will allocate the object and call initialize passing parameters to initialize but not to `new`. If the method is not the default allocator implementation, it will jump to the normal method dispatch instructions.
Performance Improvements
This patch improves performance of all allocations that use the normal “new” method for allocation. Here are two examples (all of these benchmarks compare Ruby 3.4.2 against Ruby master with inlining patch):
A simple `Object.new` in a hot loop improves by about 24%:
```
hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 436.6 ms ± 3.3 ms [User: 432.3 ms, System: 3.8 ms]
Range (min … max): 430.5 ms … 442.6 ms 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 351.1 ms ± 3.6 ms [User: 347.4 ms, System: 3.3 ms]
Range (min … max): 343.9 ms … 357.4 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end' ran
1.24 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
```
Using a single keyword argument is improved by about 72%:
```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 1.082 s ± 0.007 s [User: 1.074 s, System: 0.008 s]
Range (min … max): 1.071 s … 1.091 s 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 627.6 ms ± 4.8 ms [User: 622.6 ms, System: 4.5 ms]
Range (min … max): 622.1 ms … 637.2 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
1.72 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```
The performance increase depends on the number and type of parameters passed to `initialize`. For example, an `initialize` method that takes 3 parameters can see a speed improvement of ~3x:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
i = 0
while i < 10_000_000
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
i += 1
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> hyperfine "ruby --disable-gems test.rb" "./ruby --disable-gems test.rb"
Benchmark 1: ruby --disable-gems test.rb
Time (mean ± σ): 3.700 s ± 0.033 s [User: 3.681 s, System: 0.018 s]
Range (min … max): 3.636 s … 3.751 s 10 runs
Benchmark 2: ./ruby --disable-gems test.rb
Time (mean ± σ): 1.182 s ± 0.013 s [User: 1.173 s, System: 0.008 s]
Range (min … max): 1.165 s … 1.203 s 10 runs
Summary
./ruby --disable-gems test.rb ran
3.13 ± 0.04 times faster than ruby --disable-gems test.rb
```
One factor in the performance increase for keyword arguments is that inlining is able to eliminate the hash allocation when calling “through” the C implementation of `Class#new`:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
def allocs
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def test; allocs { Foo.new(a: 1, b: 2, c: 3) }; end
test
p test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
2
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
1
```
## Memory Increase
Of course this patch is not “free”. Inlining the method call adds extra YARV instructions. We estimate this patch increases `new` call sites by about 122 bytes:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
end
end
def test
Foo.new
end
puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:test)))
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
656
```
We’ve tested this in Shopify’s monolith, comparing Ruby 3.4.2 and Ruby 3.5+inlining, and it seems to increase total ISEQ memesize by about 3.8mb (roughly 0.5% increase in ISEQ size):
```
irb(main):001> 737191972 - 733354388
=> 3837584
```
However, Ruby 3.5 has more overall ISEQ objects than Ruby 3.4.2:
```
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-inline.txt
789545 sizes-inline.txt
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-3.4.txt
789479 sizes-3.4.txt
```
We see total heap size as reported by `memsize` to only increase by about 1MB:
```
irb(main):001> 3981075617 - 3979926505
=> 1149112
```
## Changes to `caller`
This patch changes `caller` reporting in the `initialize` method:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
puts caller
end
end
def test
Foo.new
end
test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:10:in 'Class#new'
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
```
As you can see in the above output, the `Class#new` frame is eliminated. I'm not sure if anyone really cares about this frame. We've tested this patch in Shopify's CI, and didn't find any code that depends on this callstack. However, this patch did require [changes to ERB for emitting warnings](https://github.com/ruby/ruby/pull/13080/files#diff-7624f95f521b3333de8c687d70c2574aa31616cebf9504d8bcf673865fbf6ecdR475-R486).
That said, eliminating the frame also has the side-effect of making some of our allocation tracing tools a little more useful:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def test
Object.new
end
end
ObjectSpace.trace_object_allocations do
obj = Foo.new.test
puts ObjectSpace.allocation_class_path(obj)
puts ObjectSpace.allocation_method_id(obj)
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Class
new
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-07T19:40:59Z inline-new 2cf0efa18e) +PRISM [arm64-darwin24]
Foo
test
```
Before inlining, `ObjectSpace` would report the allocation class path and method id as `Class#new` which isn't very helpful. With the inlining patch, we can see that the object is allocated in `Foo#test`.
## Summary
I think the overall memory increase is modest, and the change to `caller` is acceptable especially given the performance increase this patch provides.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ruby-core:121873] [Ruby Feature#21254] Inlining Class#new
2025-04-07 23:03 [ruby-core:121565] [Ruby Feature#21254] Inlining Class#new tenderlovemaking (Aaron Patterson) via ruby-core
` (7 preceding siblings ...)
2025-04-09 23:15 ` [ruby-core:121615] " tenderlovemaking (Aaron Patterson) via ruby-core
@ 2025-05-07 6:36 ` mame (Yusuke Endoh) via ruby-core
2025-05-07 6:37 ` [ruby-core:121874] " hsbt (Hiroshi SHIBATA) via ruby-core
2025-05-07 17:09 ` [ruby-core:121884] " tenderlovemaking (Aaron Patterson) via ruby-core
10 siblings, 0 replies; 12+ messages in thread
From: mame (Yusuke Endoh) via ruby-core @ 2025-05-07 6:36 UTC (permalink / raw)
To: ruby-core; +Cc: mame (Yusuke Endoh)
Issue #21254 has been updated by mame (Yusuke Endoh).
@tenderlovemaking Can we close this?
----------------------------------------
Feature #21254: Inlining Class#new
https://bugs.ruby-lang.org/issues/21254#change-112940
* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
----------------------------------------
We would like to propose inlining YARV bytecode for speeding up object allocations, specifically inlining the `Class#new` method. In order to support inlining this method, we would like to introduce a new YARV instruction `opt_new`. This instruction will allocate an object if the default allocator is not overridden, otherwise it will jump to a “slow path” for calling a method.
`Class#new` especially benefits from inlining for two reasons:
1. Calling `initialize` directly means we don't need to allocate a temporary hash for keyword arguments
2. We are able to use an inline cache when calling the `initialize` method
The patch can be found [here](https://github.com/ruby/ruby/pull/13080), but please find implementation details below.
## Implementation Details
This patch modifies the compiler to emit special instructions when it sees a callsite that uses “new”. Before this patch, calling `Object.new` would result in bytecode like this:
```
ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```
With this patch, the bytecode looks like this:
```
./ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump 14
0011 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```
The new `opt_new` instruction checks whether or not the `new` implementation is the default “allocator” implementation. If it is the default allocator, then the instruction will allocate the object and call initialize passing parameters to initialize but not to `new`. If the method is not the default allocator implementation, it will jump to the normal method dispatch instructions.
Performance Improvements
This patch improves performance of all allocations that use the normal “new” method for allocation. Here are two examples (all of these benchmarks compare Ruby 3.4.2 against Ruby master with inlining patch):
A simple `Object.new` in a hot loop improves by about 24%:
```
hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 436.6 ms ± 3.3 ms [User: 432.3 ms, System: 3.8 ms]
Range (min … max): 430.5 ms … 442.6 ms 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 351.1 ms ± 3.6 ms [User: 347.4 ms, System: 3.3 ms]
Range (min … max): 343.9 ms … 357.4 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end' ran
1.24 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
```
Using a single keyword argument is improved by about 72%:
```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 1.082 s ± 0.007 s [User: 1.074 s, System: 0.008 s]
Range (min … max): 1.071 s … 1.091 s 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 627.6 ms ± 4.8 ms [User: 622.6 ms, System: 4.5 ms]
Range (min … max): 622.1 ms … 637.2 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
1.72 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```
The performance increase depends on the number and type of parameters passed to `initialize`. For example, an `initialize` method that takes 3 parameters can see a speed improvement of ~3x:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
i = 0
while i < 10_000_000
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
i += 1
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> hyperfine "ruby --disable-gems test.rb" "./ruby --disable-gems test.rb"
Benchmark 1: ruby --disable-gems test.rb
Time (mean ± σ): 3.700 s ± 0.033 s [User: 3.681 s, System: 0.018 s]
Range (min … max): 3.636 s … 3.751 s 10 runs
Benchmark 2: ./ruby --disable-gems test.rb
Time (mean ± σ): 1.182 s ± 0.013 s [User: 1.173 s, System: 0.008 s]
Range (min … max): 1.165 s … 1.203 s 10 runs
Summary
./ruby --disable-gems test.rb ran
3.13 ± 0.04 times faster than ruby --disable-gems test.rb
```
One factor in the performance increase for keyword arguments is that inlining is able to eliminate the hash allocation when calling “through” the C implementation of `Class#new`:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
def allocs
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def test; allocs { Foo.new(a: 1, b: 2, c: 3) }; end
test
p test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
2
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
1
```
## Memory Increase
Of course this patch is not “free”. Inlining the method call adds extra YARV instructions. We estimate this patch increases `new` call sites by about 122 bytes:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
end
end
def test
Foo.new
end
puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:test)))
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
656
```
We’ve tested this in Shopify’s monolith, comparing Ruby 3.4.2 and Ruby 3.5+inlining, and it seems to increase total ISEQ memesize by about 3.8mb (roughly 0.5% increase in ISEQ size):
```
irb(main):001> 737191972 - 733354388
=> 3837584
```
However, Ruby 3.5 has more overall ISEQ objects than Ruby 3.4.2:
```
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-inline.txt
789545 sizes-inline.txt
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-3.4.txt
789479 sizes-3.4.txt
```
We see total heap size as reported by `memsize` to only increase by about 1MB:
```
irb(main):001> 3981075617 - 3979926505
=> 1149112
```
## Changes to `caller`
This patch changes `caller` reporting in the `initialize` method:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
puts caller
end
end
def test
Foo.new
end
test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:10:in 'Class#new'
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
```
As you can see in the above output, the `Class#new` frame is eliminated. I'm not sure if anyone really cares about this frame. We've tested this patch in Shopify's CI, and didn't find any code that depends on this callstack. However, this patch did require [changes to ERB for emitting warnings](https://github.com/ruby/ruby/pull/13080/files#diff-7624f95f521b3333de8c687d70c2574aa31616cebf9504d8bcf673865fbf6ecdR475-R486).
That said, eliminating the frame also has the side-effect of making some of our allocation tracing tools a little more useful:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def test
Object.new
end
end
ObjectSpace.trace_object_allocations do
obj = Foo.new.test
puts ObjectSpace.allocation_class_path(obj)
puts ObjectSpace.allocation_method_id(obj)
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Class
new
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-07T19:40:59Z inline-new 2cf0efa18e) +PRISM [arm64-darwin24]
Foo
test
```
Before inlining, `ObjectSpace` would report the allocation class path and method id as `Class#new` which isn't very helpful. With the inlining patch, we can see that the object is allocated in `Foo#test`.
## Summary
I think the overall memory increase is modest, and the change to `caller` is acceptable especially given the performance increase this patch provides.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ruby-core:121874] [Ruby Feature#21254] Inlining Class#new
2025-04-07 23:03 [ruby-core:121565] [Ruby Feature#21254] Inlining Class#new tenderlovemaking (Aaron Patterson) via ruby-core
` (8 preceding siblings ...)
2025-05-07 6:36 ` [ruby-core:121873] " mame (Yusuke Endoh) via ruby-core
@ 2025-05-07 6:37 ` hsbt (Hiroshi SHIBATA) via ruby-core
2025-05-07 17:09 ` [ruby-core:121884] " tenderlovemaking (Aaron Patterson) via ruby-core
10 siblings, 0 replies; 12+ messages in thread
From: hsbt (Hiroshi SHIBATA) via ruby-core @ 2025-05-07 6:37 UTC (permalink / raw)
To: ruby-core; +Cc: hsbt (Hiroshi SHIBATA)
Issue #21254 has been updated by hsbt (Hiroshi SHIBATA).
Note: https://github.com/ruby/ruby/pull/13080 has been merged.
----------------------------------------
Feature #21254: Inlining Class#new
https://bugs.ruby-lang.org/issues/21254#change-112941
* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
----------------------------------------
We would like to propose inlining YARV bytecode for speeding up object allocations, specifically inlining the `Class#new` method. In order to support inlining this method, we would like to introduce a new YARV instruction `opt_new`. This instruction will allocate an object if the default allocator is not overridden, otherwise it will jump to a “slow path” for calling a method.
`Class#new` especially benefits from inlining for two reasons:
1. Calling `initialize` directly means we don't need to allocate a temporary hash for keyword arguments
2. We are able to use an inline cache when calling the `initialize` method
The patch can be found [here](https://github.com/ruby/ruby/pull/13080), but please find implementation details below.
## Implementation Details
This patch modifies the compiler to emit special instructions when it sees a callsite that uses “new”. Before this patch, calling `Object.new` would result in bytecode like this:
```
ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```
With this patch, the bytecode looks like this:
```
./ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump 14
0011 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```
The new `opt_new` instruction checks whether or not the `new` implementation is the default “allocator” implementation. If it is the default allocator, then the instruction will allocate the object and call initialize passing parameters to initialize but not to `new`. If the method is not the default allocator implementation, it will jump to the normal method dispatch instructions.
Performance Improvements
This patch improves performance of all allocations that use the normal “new” method for allocation. Here are two examples (all of these benchmarks compare Ruby 3.4.2 against Ruby master with inlining patch):
A simple `Object.new` in a hot loop improves by about 24%:
```
hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 436.6 ms ± 3.3 ms [User: 432.3 ms, System: 3.8 ms]
Range (min … max): 430.5 ms … 442.6 ms 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 351.1 ms ± 3.6 ms [User: 347.4 ms, System: 3.3 ms]
Range (min … max): 343.9 ms … 357.4 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end' ran
1.24 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
```
Using a single keyword argument is improved by about 72%:
```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 1.082 s ± 0.007 s [User: 1.074 s, System: 0.008 s]
Range (min … max): 1.071 s … 1.091 s 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 627.6 ms ± 4.8 ms [User: 622.6 ms, System: 4.5 ms]
Range (min … max): 622.1 ms … 637.2 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
1.72 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```
The performance increase depends on the number and type of parameters passed to `initialize`. For example, an `initialize` method that takes 3 parameters can see a speed improvement of ~3x:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
i = 0
while i < 10_000_000
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
i += 1
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> hyperfine "ruby --disable-gems test.rb" "./ruby --disable-gems test.rb"
Benchmark 1: ruby --disable-gems test.rb
Time (mean ± σ): 3.700 s ± 0.033 s [User: 3.681 s, System: 0.018 s]
Range (min … max): 3.636 s … 3.751 s 10 runs
Benchmark 2: ./ruby --disable-gems test.rb
Time (mean ± σ): 1.182 s ± 0.013 s [User: 1.173 s, System: 0.008 s]
Range (min … max): 1.165 s … 1.203 s 10 runs
Summary
./ruby --disable-gems test.rb ran
3.13 ± 0.04 times faster than ruby --disable-gems test.rb
```
One factor in the performance increase for keyword arguments is that inlining is able to eliminate the hash allocation when calling “through” the C implementation of `Class#new`:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
def allocs
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def test; allocs { Foo.new(a: 1, b: 2, c: 3) }; end
test
p test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
2
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
1
```
## Memory Increase
Of course this patch is not “free”. Inlining the method call adds extra YARV instructions. We estimate this patch increases `new` call sites by about 122 bytes:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
end
end
def test
Foo.new
end
puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:test)))
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
656
```
We’ve tested this in Shopify’s monolith, comparing Ruby 3.4.2 and Ruby 3.5+inlining, and it seems to increase total ISEQ memesize by about 3.8mb (roughly 0.5% increase in ISEQ size):
```
irb(main):001> 737191972 - 733354388
=> 3837584
```
However, Ruby 3.5 has more overall ISEQ objects than Ruby 3.4.2:
```
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-inline.txt
789545 sizes-inline.txt
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-3.4.txt
789479 sizes-3.4.txt
```
We see total heap size as reported by `memsize` to only increase by about 1MB:
```
irb(main):001> 3981075617 - 3979926505
=> 1149112
```
## Changes to `caller`
This patch changes `caller` reporting in the `initialize` method:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
puts caller
end
end
def test
Foo.new
end
test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:10:in 'Class#new'
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
```
As you can see in the above output, the `Class#new` frame is eliminated. I'm not sure if anyone really cares about this frame. We've tested this patch in Shopify's CI, and didn't find any code that depends on this callstack. However, this patch did require [changes to ERB for emitting warnings](https://github.com/ruby/ruby/pull/13080/files#diff-7624f95f521b3333de8c687d70c2574aa31616cebf9504d8bcf673865fbf6ecdR475-R486).
That said, eliminating the frame also has the side-effect of making some of our allocation tracing tools a little more useful:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def test
Object.new
end
end
ObjectSpace.trace_object_allocations do
obj = Foo.new.test
puts ObjectSpace.allocation_class_path(obj)
puts ObjectSpace.allocation_method_id(obj)
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Class
new
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-07T19:40:59Z inline-new 2cf0efa18e) +PRISM [arm64-darwin24]
Foo
test
```
Before inlining, `ObjectSpace` would report the allocation class path and method id as `Class#new` which isn't very helpful. With the inlining patch, we can see that the object is allocated in `Foo#test`.
## Summary
I think the overall memory increase is modest, and the change to `caller` is acceptable especially given the performance increase this patch provides.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ruby-core:121884] [Ruby Feature#21254] Inlining Class#new
2025-04-07 23:03 [ruby-core:121565] [Ruby Feature#21254] Inlining Class#new tenderlovemaking (Aaron Patterson) via ruby-core
` (9 preceding siblings ...)
2025-05-07 6:37 ` [ruby-core:121874] " hsbt (Hiroshi SHIBATA) via ruby-core
@ 2025-05-07 17:09 ` tenderlovemaking (Aaron Patterson) via ruby-core
10 siblings, 0 replies; 12+ messages in thread
From: tenderlovemaking (Aaron Patterson) via ruby-core @ 2025-05-07 17:09 UTC (permalink / raw)
To: ruby-core; +Cc: tenderlovemaking (Aaron Patterson)
Issue #21254 has been updated by tenderlovemaking (Aaron Patterson).
Status changed from Open to Closed
mame (Yusuke Endoh) wrote in #note-10:
> @tenderlovemaking Can we close this?
Yes, sorry. I thought I had closed this.
Merged in 8ac8225c504dee57454131e7cde2c47126596fdc
----------------------------------------
Feature #21254: Inlining Class#new
https://bugs.ruby-lang.org/issues/21254#change-112953
* Author: tenderlovemaking (Aaron Patterson)
* Status: Closed
----------------------------------------
We would like to propose inlining YARV bytecode for speeding up object allocations, specifically inlining the `Class#new` method. In order to support inlining this method, we would like to introduce a new YARV instruction `opt_new`. This instruction will allocate an object if the default allocator is not overridden, otherwise it will jump to a “slow path” for calling a method.
`Class#new` especially benefits from inlining for two reasons:
1. Calling `initialize` directly means we don't need to allocate a temporary hash for keyword arguments
2. We are able to use an inline cache when calling the `initialize` method
The patch can be found [here](https://github.com/ruby/ruby/pull/13080), but please find implementation details below.
## Implementation Details
This patch modifies the compiler to emit special instructions when it sees a callsite that uses “new”. Before this patch, calling `Object.new` would result in bytecode like this:
```
ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```
With this patch, the bytecode looks like this:
```
./ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump 14
0011 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```
The new `opt_new` instruction checks whether or not the `new` implementation is the default “allocator” implementation. If it is the default allocator, then the instruction will allocate the object and call initialize passing parameters to initialize but not to `new`. If the method is not the default allocator implementation, it will jump to the normal method dispatch instructions.
Performance Improvements
This patch improves performance of all allocations that use the normal “new” method for allocation. Here are two examples (all of these benchmarks compare Ruby 3.4.2 against Ruby master with inlining patch):
A simple `Object.new` in a hot loop improves by about 24%:
```
hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 436.6 ms ± 3.3 ms [User: 432.3 ms, System: 3.8 ms]
Range (min … max): 430.5 ms … 442.6 ms 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 351.1 ms ± 3.6 ms [User: 347.4 ms, System: 3.3 ms]
Range (min … max): 343.9 ms … 357.4 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end' ran
1.24 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
```
Using a single keyword argument is improved by about 72%:
```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 1.082 s ± 0.007 s [User: 1.074 s, System: 0.008 s]
Range (min … max): 1.071 s … 1.091 s 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 627.6 ms ± 4.8 ms [User: 622.6 ms, System: 4.5 ms]
Range (min … max): 622.1 ms … 637.2 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
1.72 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```
The performance increase depends on the number and type of parameters passed to `initialize`. For example, an `initialize` method that takes 3 parameters can see a speed improvement of ~3x:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
i = 0
while i < 10_000_000
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
i += 1
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> hyperfine "ruby --disable-gems test.rb" "./ruby --disable-gems test.rb"
Benchmark 1: ruby --disable-gems test.rb
Time (mean ± σ): 3.700 s ± 0.033 s [User: 3.681 s, System: 0.018 s]
Range (min … max): 3.636 s … 3.751 s 10 runs
Benchmark 2: ./ruby --disable-gems test.rb
Time (mean ± σ): 1.182 s ± 0.013 s [User: 1.173 s, System: 0.008 s]
Range (min … max): 1.165 s … 1.203 s 10 runs
Summary
./ruby --disable-gems test.rb ran
3.13 ± 0.04 times faster than ruby --disable-gems test.rb
```
One factor in the performance increase for keyword arguments is that inlining is able to eliminate the hash allocation when calling “through” the C implementation of `Class#new`:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
def allocs
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def test; allocs { Foo.new(a: 1, b: 2, c: 3) }; end
test
p test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
2
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
1
```
## Memory Increase
Of course this patch is not “free”. Inlining the method call adds extra YARV instructions. We estimate this patch increases `new` call sites by about 122 bytes:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
end
end
def test
Foo.new
end
puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:test)))
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
656
```
We’ve tested this in Shopify’s monolith, comparing Ruby 3.4.2 and Ruby 3.5+inlining, and it seems to increase total ISEQ memesize by about 3.8mb (roughly 0.5% increase in ISEQ size):
```
irb(main):001> 737191972 - 733354388
=> 3837584
```
However, Ruby 3.5 has more overall ISEQ objects than Ruby 3.4.2:
```
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-inline.txt
789545 sizes-inline.txt
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-3.4.txt
789479 sizes-3.4.txt
```
We see total heap size as reported by `memsize` to only increase by about 1MB:
```
irb(main):001> 3981075617 - 3979926505
=> 1149112
```
## Changes to `caller`
This patch changes `caller` reporting in the `initialize` method:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
puts caller
end
end
def test
Foo.new
end
test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:10:in 'Class#new'
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
```
As you can see in the above output, the `Class#new` frame is eliminated. I'm not sure if anyone really cares about this frame. We've tested this patch in Shopify's CI, and didn't find any code that depends on this callstack. However, this patch did require [changes to ERB for emitting warnings](https://github.com/ruby/ruby/pull/13080/files#diff-7624f95f521b3333de8c687d70c2574aa31616cebf9504d8bcf673865fbf6ecdR475-R486).
That said, eliminating the frame also has the side-effect of making some of our allocation tracing tools a little more useful:
```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def test
Object.new
end
end
ObjectSpace.trace_object_allocations do
obj = Foo.new.test
puts ObjectSpace.allocation_class_path(obj)
puts ObjectSpace.allocation_method_id(obj)
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Class
new
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-07T19:40:59Z inline-new 2cf0efa18e) +PRISM [arm64-darwin24]
Foo
test
```
Before inlining, `ObjectSpace` would report the allocation class path and method id as `Class#new` which isn't very helpful. With the inlining patch, we can see that the object is allocated in `Foo#test`.
## Summary
I think the overall memory increase is modest, and the change to `caller` is acceptable especially given the performance increase this patch provides.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-05-07 17:20 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-07 23:03 [ruby-core:121565] [Ruby Feature#21254] Inlining Class#new tenderlovemaking (Aaron Patterson) via ruby-core
2025-04-08 0:45 ` [ruby-core:121567] " ko1 (Koichi Sasada) via ruby-core
2025-04-08 1:37 ` [ruby-core:121568] " tenderlovemaking (Aaron Patterson) via ruby-core
2025-04-08 6:58 ` [ruby-core:121572] " Earlopain (Earlopain _) via ruby-core
2025-04-08 16:02 ` [ruby-core:121590] " tenderlovemaking (Aaron Patterson) via ruby-core
2025-04-08 16:18 ` [ruby-core:121591] " tenderlovemaking (Aaron Patterson) via ruby-core
2025-04-09 20:57 ` [ruby-core:121613] " jez (Jake Zimmerman) via ruby-core
2025-04-09 22:18 ` [ruby-core:121614] " tenderlovemaking (Aaron Patterson) via ruby-core
2025-04-09 23:15 ` [ruby-core:121615] " tenderlovemaking (Aaron Patterson) via ruby-core
2025-05-07 6:36 ` [ruby-core:121873] " mame (Yusuke Endoh) via ruby-core
2025-05-07 6:37 ` [ruby-core:121874] " hsbt (Hiroshi SHIBATA) via ruby-core
2025-05-07 17:09 ` [ruby-core:121884] " tenderlovemaking (Aaron Patterson) via ruby-core
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).