* [ruby-core:120749] [Ruby master Feature#21084] Declare objects have weak references
@ 2025-01-21 16:47 peterzhu2118 (Peter Zhu) via ruby-core
2025-02-06 12:49 ` [ruby-core:120893] " Eregon (Benoit Daloze) via ruby-core
2025-02-06 15:01 ` [ruby-core:120894] " peterzhu2118 (Peter Zhu) via ruby-core
0 siblings, 2 replies; 3+ messages in thread
From: peterzhu2118 (Peter Zhu) via ruby-core @ 2025-01-21 16:47 UTC (permalink / raw)
To: ruby-core; +Cc: peterzhu2118 (Peter Zhu)
Issue #21084 has been reported by peterzhu2118 (Peter Zhu).
----------------------------------------
Feature #21084: Declare objects have weak references
https://bugs.ruby-lang.org/issues/21084
* Author: peterzhu2118 (Peter Zhu)
* Status: Open
----------------------------------------
# Summary
The current way of marking weak references uses `rb_gc_mark_weak(VALUE *ptr)`. This presents challenges because Ruby's GC is incremental, meaning that if the `ptr` changes (e.g. realloc'd or free'd), then we could have an invalid memory access. This also overwrites `*ptr = Qundef` if `*ptr` is dead, which prevents any cleanup to be run (e.g. freeing memory or deleting entries from hash tables). This ticket proposes `rb_gc_declare_weak_references` which declares that an object has weak references and calls a cleanup function after marking, allowing the object to clean up any memory for dead objects.
# Introduction
In [[Feature #19783]](https://bugs.ruby-lang.org/issues/19783), I introduced an API allowing objects to mark weak references, the function signature looks like this:
```c
void rb_gc_mark_weak(VALUE *ptr);
```
`rb_gc_mark_weak` is called during the marking phase of the GC to specify that the memory at `ptr` holds a pointer to a Ruby object that is weakly referenced. `rb_gc_mark_weak` appends this pointer to a list that is processed after the marking phase of the GC. If the object at `*ptr` is no longer alive, then it overwrites the object reference with a special value (`*ptr = Qundef`).
However, this API resulted in two challenges:
1. Ruby's default GC is incremental, which means that the GC is not ran in one phase, but rather split into chunks of work that interleaves with Ruby execution. The `ptr` passed into `rb_gc_mark_weak` could be on the malloc heap, and that memory could be realloc'd or even free'd. We had to use workarounds such as `rb_gc_remove_weak` to ensure that there were no illegal memory accesses. This made `rb_gc_mark_weak` difficult to use, impacted runtime performance, and increased memory usage.
2. When an object dies, `rb_gc_mark_weak` only overwites the reference with `Qundef`. This means that if we want to do any cleanup (e.g. free a piece of memory or delete a hash table entry), we could not do that and had to defer this process elsewhere (e.g. during marking or runtime).
# Declarative weak references
In this ticket, I'm proposing a new API for weak references. Instead of an object marking its weak references during the marking phase, the object declares that it has weak references using the `rb_gc_declare_weak_references` function. This declaration occurs during runtime (e.g. after the object has been created) rather than during GC.
After an object declares that it has weak references, it will have its callback function called after marking as long as that object is alive. This callback function can then call a special function `rb_gc_handle_weak_references_alive_p` to determine whether its references are alive. This will allow the callback function to do whatever it wants on the object, allowing it to perform any cleanup work it needs.
This significantly simplifies the code for `ObjectSpace::WeakMap` and `ObjectSpace::WeakKeyMap` because it no longer needs to have the workarounds for the limitations of `rb_gc_mark_weak`.
# Performance
The performance results below demonstrate that `ObjectSpace::WeakMap#[]=` is now about 60% faster because the implementation has been simplified and the number of allocations has been reduced. We can see that there is not a significant impact on the performance of `ObjectSpace::WeakMap#[]`.
Base:
```
ObjectSpace::WeakMap#[]=
4.620M (± 6.4%) i/s (216.44 ns/i) - 23.342M in 5.072149s
ObjectSpace::WeakMap#[]
30.967M (± 1.9%) i/s (32.29 ns/i) - 154.998M in 5.007157s
```
Branch:
```
ObjectSpace::WeakMap#[]=
7.336M (± 2.8%) i/s (136.31 ns/i) - 36.755M in 5.013983s
ObjectSpace::WeakMap#[]
30.902M (± 5.4%) i/s (32.36 ns/i) - 155.901M in 5.064060s
```
Code:
```
require "bundler/inline"
gemfile do
source "https://rubygems.org"
gem "benchmark-ips"
end
wmap = ObjectSpace::WeakMap.new
key = Object.new
val = Object.new
wmap[key] = val
Benchmark.ips do |x|
x.report("ObjectSpace::WeakMap#[]=") do |times|
i = 0
while i < times
wmap[Object.new] = Object.new
i += 1
end
end
x.report("ObjectSpace::WeakMap#[]") do |times|
i = 0
while i < times
wmap[key]
wmap[val] # does not exist
i += 1
end
end
end
```
# Alternative designs
Currently, `rb_gc_declare_weak_references` is designed to be an internal-only API. This allows us to assume the object types that call `rb_gc_declare_weak_references`. In the future, if we want to open up this API to third parties, we may want to change this function to something like:
```c
void rb_gc_add_cleaner(VALUE obj, void (*callback)(VALUE obj));
```
This will allow the third party to implement a custom `callback` that gets called after the marking phase of GC to clean up any dead references. I chose not to implement this design because it is less efficient as we would need to store a mapping from `obj` to `callback`, which requires extra memory.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 3+ messages in thread
* [ruby-core:120893] [Ruby master Feature#21084] Declare objects have weak references
2025-01-21 16:47 [ruby-core:120749] [Ruby master Feature#21084] Declare objects have weak references peterzhu2118 (Peter Zhu) via ruby-core
@ 2025-02-06 12:49 ` Eregon (Benoit Daloze) via ruby-core
2025-02-06 15:01 ` [ruby-core:120894] " peterzhu2118 (Peter Zhu) via ruby-core
1 sibling, 0 replies; 3+ messages in thread
From: Eregon (Benoit Daloze) via ruby-core @ 2025-02-06 12:49 UTC (permalink / raw)
To: ruby-core; +Cc: Eregon (Benoit Daloze)
Issue #21084 has been updated by Eregon (Benoit Daloze).
Is there a PR already? I don't see the link in the description.
----------------------------------------
Feature #21084: Declare objects have weak references
https://bugs.ruby-lang.org/issues/21084#change-111771
* Author: peterzhu2118 (Peter Zhu)
* Status: Open
----------------------------------------
# Summary
The current way of marking weak references uses `rb_gc_mark_weak(VALUE *ptr)`. This presents challenges because Ruby's GC is incremental, meaning that if the `ptr` changes (e.g. realloc'd or free'd), then we could have an invalid memory access. This also overwrites `*ptr = Qundef` if `*ptr` is dead, which prevents any cleanup to be run (e.g. freeing memory or deleting entries from hash tables). This ticket proposes `rb_gc_declare_weak_references` which declares that an object has weak references and calls a cleanup function after marking, allowing the object to clean up any memory for dead objects.
# Introduction
In [[Feature #19783]](https://bugs.ruby-lang.org/issues/19783), I introduced an API allowing objects to mark weak references, the function signature looks like this:
```c
void rb_gc_mark_weak(VALUE *ptr);
```
`rb_gc_mark_weak` is called during the marking phase of the GC to specify that the memory at `ptr` holds a pointer to a Ruby object that is weakly referenced. `rb_gc_mark_weak` appends this pointer to a list that is processed after the marking phase of the GC. If the object at `*ptr` is no longer alive, then it overwrites the object reference with a special value (`*ptr = Qundef`).
However, this API resulted in two challenges:
1. Ruby's default GC is incremental, which means that the GC is not ran in one phase, but rather split into chunks of work that interleaves with Ruby execution. The `ptr` passed into `rb_gc_mark_weak` could be on the malloc heap, and that memory could be realloc'd or even free'd. We had to use workarounds such as `rb_gc_remove_weak` to ensure that there were no illegal memory accesses. This made `rb_gc_mark_weak` difficult to use, impacted runtime performance, and increased memory usage.
2. When an object dies, `rb_gc_mark_weak` only overwites the reference with `Qundef`. This means that if we want to do any cleanup (e.g. free a piece of memory or delete a hash table entry), we could not do that and had to defer this process elsewhere (e.g. during marking or runtime).
# Declarative weak references
In this ticket, I'm proposing a new API for weak references. Instead of an object marking its weak references during the marking phase, the object declares that it has weak references using the `rb_gc_declare_weak_references` function. This declaration occurs during runtime (e.g. after the object has been created) rather than during GC.
After an object declares that it has weak references, it will have its callback function called after marking as long as that object is alive. This callback function can then call a special function `rb_gc_handle_weak_references_alive_p` to determine whether its references are alive. This will allow the callback function to do whatever it wants on the object, allowing it to perform any cleanup work it needs.
This significantly simplifies the code for `ObjectSpace::WeakMap` and `ObjectSpace::WeakKeyMap` because it no longer needs to have the workarounds for the limitations of `rb_gc_mark_weak`.
# Performance
The performance results below demonstrate that `ObjectSpace::WeakMap#[]=` is now about 60% faster because the implementation has been simplified and the number of allocations has been reduced. We can see that there is not a significant impact on the performance of `ObjectSpace::WeakMap#[]`.
Base:
```
ObjectSpace::WeakMap#[]=
4.620M (± 6.4%) i/s (216.44 ns/i) - 23.342M in 5.072149s
ObjectSpace::WeakMap#[]
30.967M (± 1.9%) i/s (32.29 ns/i) - 154.998M in 5.007157s
```
Branch:
```
ObjectSpace::WeakMap#[]=
7.336M (± 2.8%) i/s (136.31 ns/i) - 36.755M in 5.013983s
ObjectSpace::WeakMap#[]
30.902M (± 5.4%) i/s (32.36 ns/i) - 155.901M in 5.064060s
```
Code:
```
require "bundler/inline"
gemfile do
source "https://rubygems.org"
gem "benchmark-ips"
end
wmap = ObjectSpace::WeakMap.new
key = Object.new
val = Object.new
wmap[key] = val
Benchmark.ips do |x|
x.report("ObjectSpace::WeakMap#[]=") do |times|
i = 0
while i < times
wmap[Object.new] = Object.new
i += 1
end
end
x.report("ObjectSpace::WeakMap#[]") do |times|
i = 0
while i < times
wmap[key]
wmap[val] # does not exist
i += 1
end
end
end
```
# Alternative designs
Currently, `rb_gc_declare_weak_references` is designed to be an internal-only API. This allows us to assume the object types that call `rb_gc_declare_weak_references`. In the future, if we want to open up this API to third parties, we may want to change this function to something like:
```c
void rb_gc_add_cleaner(VALUE obj, void (*callback)(VALUE obj));
```
This will allow the third party to implement a custom `callback` that gets called after the marking phase of GC to clean up any dead references. I chose not to implement this design because it is less efficient as we would need to store a mapping from `obj` to `callback`, which requires extra memory.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 3+ messages in thread
* [ruby-core:120894] [Ruby master Feature#21084] Declare objects have weak references
2025-01-21 16:47 [ruby-core:120749] [Ruby master Feature#21084] Declare objects have weak references peterzhu2118 (Peter Zhu) via ruby-core
2025-02-06 12:49 ` [ruby-core:120893] " Eregon (Benoit Daloze) via ruby-core
@ 2025-02-06 15:01 ` peterzhu2118 (Peter Zhu) via ruby-core
1 sibling, 0 replies; 3+ messages in thread
From: peterzhu2118 (Peter Zhu) via ruby-core @ 2025-02-06 15:01 UTC (permalink / raw)
To: ruby-core; +Cc: peterzhu2118 (Peter Zhu)
Issue #21084 has been updated by peterzhu2118 (Peter Zhu).
Ah yes, I forgot to link the PR: https://github.com/ruby/ruby/pull/12606
----------------------------------------
Feature #21084: Declare objects have weak references
https://bugs.ruby-lang.org/issues/21084#change-111773
* Author: peterzhu2118 (Peter Zhu)
* Status: Open
----------------------------------------
# Summary
The current way of marking weak references uses `rb_gc_mark_weak(VALUE *ptr)`. This presents challenges because Ruby's GC is incremental, meaning that if the `ptr` changes (e.g. realloc'd or free'd), then we could have an invalid memory access. This also overwrites `*ptr = Qundef` if `*ptr` is dead, which prevents any cleanup to be run (e.g. freeing memory or deleting entries from hash tables). This ticket proposes `rb_gc_declare_weak_references` which declares that an object has weak references and calls a cleanup function after marking, allowing the object to clean up any memory for dead objects.
# Introduction
In [[Feature #19783]](https://bugs.ruby-lang.org/issues/19783), I introduced an API allowing objects to mark weak references, the function signature looks like this:
```c
void rb_gc_mark_weak(VALUE *ptr);
```
`rb_gc_mark_weak` is called during the marking phase of the GC to specify that the memory at `ptr` holds a pointer to a Ruby object that is weakly referenced. `rb_gc_mark_weak` appends this pointer to a list that is processed after the marking phase of the GC. If the object at `*ptr` is no longer alive, then it overwrites the object reference with a special value (`*ptr = Qundef`).
However, this API resulted in two challenges:
1. Ruby's default GC is incremental, which means that the GC is not ran in one phase, but rather split into chunks of work that interleaves with Ruby execution. The `ptr` passed into `rb_gc_mark_weak` could be on the malloc heap, and that memory could be realloc'd or even free'd. We had to use workarounds such as `rb_gc_remove_weak` to ensure that there were no illegal memory accesses. This made `rb_gc_mark_weak` difficult to use, impacted runtime performance, and increased memory usage.
2. When an object dies, `rb_gc_mark_weak` only overwites the reference with `Qundef`. This means that if we want to do any cleanup (e.g. free a piece of memory or delete a hash table entry), we could not do that and had to defer this process elsewhere (e.g. during marking or runtime).
# Declarative weak references
In this ticket, I'm proposing a new API for weak references. Instead of an object marking its weak references during the marking phase, the object declares that it has weak references using the `rb_gc_declare_weak_references` function. This declaration occurs during runtime (e.g. after the object has been created) rather than during GC.
After an object declares that it has weak references, it will have its callback function called after marking as long as that object is alive. This callback function can then call a special function `rb_gc_handle_weak_references_alive_p` to determine whether its references are alive. This will allow the callback function to do whatever it wants on the object, allowing it to perform any cleanup work it needs.
This significantly simplifies the code for `ObjectSpace::WeakMap` and `ObjectSpace::WeakKeyMap` because it no longer needs to have the workarounds for the limitations of `rb_gc_mark_weak`.
# Performance
The performance results below demonstrate that `ObjectSpace::WeakMap#[]=` is now about 60% faster because the implementation has been simplified and the number of allocations has been reduced. We can see that there is not a significant impact on the performance of `ObjectSpace::WeakMap#[]`.
Base:
```
ObjectSpace::WeakMap#[]=
4.620M (± 6.4%) i/s (216.44 ns/i) - 23.342M in 5.072149s
ObjectSpace::WeakMap#[]
30.967M (± 1.9%) i/s (32.29 ns/i) - 154.998M in 5.007157s
```
Branch:
```
ObjectSpace::WeakMap#[]=
7.336M (± 2.8%) i/s (136.31 ns/i) - 36.755M in 5.013983s
ObjectSpace::WeakMap#[]
30.902M (± 5.4%) i/s (32.36 ns/i) - 155.901M in 5.064060s
```
Code:
```
require "bundler/inline"
gemfile do
source "https://rubygems.org"
gem "benchmark-ips"
end
wmap = ObjectSpace::WeakMap.new
key = Object.new
val = Object.new
wmap[key] = val
Benchmark.ips do |x|
x.report("ObjectSpace::WeakMap#[]=") do |times|
i = 0
while i < times
wmap[Object.new] = Object.new
i += 1
end
end
x.report("ObjectSpace::WeakMap#[]") do |times|
i = 0
while i < times
wmap[key]
wmap[val] # does not exist
i += 1
end
end
end
```
# Alternative designs
Currently, `rb_gc_declare_weak_references` is designed to be an internal-only API. This allows us to assume the object types that call `rb_gc_declare_weak_references`. In the future, if we want to open up this API to third parties, we may want to change this function to something like:
```c
void rb_gc_add_cleaner(VALUE obj, void (*callback)(VALUE obj));
```
This will allow the third party to implement a custom `callback` that gets called after the marking phase of GC to clean up any dead references. I chose not to implement this design because it is less efficient as we would need to store a mapping from `obj` to `callback`, which requires extra memory.
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-02-06 15:02 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-21 16:47 [ruby-core:120749] [Ruby master Feature#21084] Declare objects have weak references peterzhu2118 (Peter Zhu) via ruby-core
2025-02-06 12:49 ` [ruby-core:120893] " Eregon (Benoit Daloze) via ruby-core
2025-02-06 15:01 ` [ruby-core:120894] " peterzhu2118 (Peter Zhu) via ruby-core
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).