* [ruby-core:121556] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1]
@ 2025-04-07 17:27 mbcodeandsound (Mike Bourgeous) via ruby-core
2025-04-07 17:33 ` [ruby-core:121557] " mbcodeandsound (Mike Bourgeous) via ruby-core
` (14 more replies)
0 siblings, 15 replies; 16+ messages in thread
From: mbcodeandsound (Mike Bourgeous) via ruby-core @ 2025-04-07 17:27 UTC (permalink / raw)
To: ruby-core; +Cc: mbcodeandsound (Mike Bourgeous)
Issue #21220 has been reported by mbcodeandsound (Mike Bourgeous).
----------------------------------------
Bug #21220: Memory corruption in update_line_coverage() [write at index -1]
https://bugs.ruby-lang.org/issues/21220
* Author: mbcodeandsound (Mike Bourgeous)
* Status: Open
* ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Hello!
I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.x or earlier. My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36
### Summary
`update_line_coverage()` calls `rb_sourceline()`, subtracts one from its return value, and uses this as an index into an Array. Sometimes `rb_sourceline()` returns 0, and when this happens, `update_line_coverage()` will write to index -1 of the array. This corrupts the heap before the Array, resulting in a program crash later during GC.
As I am new to the Ruby codebase I do not know if it's normal for rb_sourceline() to return 0 and update_line_coverage() should handle it, or if something is wrong in the code that ultimately feeds rb_sourceline().
### Symptom
On Linux, affected processes print one of the following errors and exit:
```
munmap_chunk(): invalid pointer
Aborted (core dumped)
```
or, if preloading libc_malloc_debug.so
```
malloc_check_get_size: memory corruption
Aborted (core dumped)
```
### Reproduction
I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening.
The issue only reproduces locally if the `coverage/` directory has a large `.resultset.json`.
- **Repo:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug
- **Example of the bug:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/actions/runs/14289657889/job/40049195631#step:5:176
``` shell
# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in `seq 1 100`; do echo $f; ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break ; done
```
### Research and reasoning
I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.
After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption. I wrote [a GDB script to automate information gathering](https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/master/gdb_ruby_backtrace.gdb) when the GC crash occurred, and Valgrind+vgdb to identify the original write that appeared to cause the corruption.
I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.
Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36
---Files--------------------------------
corruption_c_stack.txt (2.63 KB)
corruption_ruby_stack.txt (948 Bytes)
crash_ruby_stack.txt (4.46 KB)
crash_c_stack.txt (26.2 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 16+ messages in thread
* [ruby-core:121557] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1]
2025-04-07 17:27 [ruby-core:121556] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1] mbcodeandsound (Mike Bourgeous) via ruby-core
@ 2025-04-07 17:33 ` mbcodeandsound (Mike Bourgeous) via ruby-core
2025-04-08 7:00 ` [ruby-core:121573] " byroot (Jean Boussier) via ruby-core
` (13 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: mbcodeandsound (Mike Bourgeous) via ruby-core @ 2025-04-07 17:33 UTC (permalink / raw)
To: ruby-core; +Cc: mbcodeandsound (Mike Bourgeous)
Issue #21220 has been updated by mbcodeandsound (Mike Bourgeous).
Something like this should prevent the memory corruption, but may be hiding a deeper issue:
``` diff
--- thread.c 2025-02-14 14:25:54.000000000 -0700
+++ thread_fix.c 2025-04-07 11:32:53.571115993 -0600
@@ -5675,7 +5675,7 @@
rb_ary_push(lines, LONG2FIX(line + 1));
return;
}
- if (line >= RARRAY_LEN(lines)) { /* no longer tracked */
+ if (line < 0 || line >= RARRAY_LEN(lines)) { /* no longer tracked */
return;
}
num = RARRAY_AREF(lines, line);
```
----------------------------------------
Bug #21220: Memory corruption in update_line_coverage() [write at index -1]
https://bugs.ruby-lang.org/issues/21220#change-112583
* Author: mbcodeandsound (Mike Bourgeous)
* Status: Open
* ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Hello!
I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.x or earlier. My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36
### Summary
`update_line_coverage()` calls `rb_sourceline()`, subtracts one from its return value, and uses this as an index into an Array. Sometimes `rb_sourceline()` returns 0, and when this happens, `update_line_coverage()` will write to index -1 of the array. This corrupts the heap before the Array, resulting in a program crash later during GC.
As I am new to the Ruby codebase I do not know if it's normal for rb_sourceline() to return 0 and update_line_coverage() should handle it, or if something is wrong in the code that ultimately feeds rb_sourceline().
### Symptom
On Linux, affected processes print one of the following errors and exit:
```
munmap_chunk(): invalid pointer
Aborted (core dumped)
```
or, if preloading libc_malloc_debug.so
```
malloc_check_get_size: memory corruption
Aborted (core dumped)
```
### Reproduction
I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening.
The issue only reproduces locally if the `coverage/` directory has a large `.resultset.json`.
- **Repo:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug
- **Example of the bug:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/actions/runs/14289657889/job/40049195631#step:5:176
``` shell
# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in `seq 1 100`; do echo $f; ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break ; done
```
### Research and reasoning
I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.
After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption. I wrote [a GDB script to automate information gathering](https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/master/gdb_ruby_backtrace.gdb) when the GC crash occurred, and Valgrind+vgdb to identify the original write that appeared to cause the corruption.
I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.
Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36
---Files--------------------------------
corruption_c_stack.txt (2.63 KB)
corruption_ruby_stack.txt (948 Bytes)
crash_ruby_stack.txt (4.46 KB)
crash_c_stack.txt (26.2 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 16+ messages in thread
* [ruby-core:121573] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1]
2025-04-07 17:27 [ruby-core:121556] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1] mbcodeandsound (Mike Bourgeous) via ruby-core
2025-04-07 17:33 ` [ruby-core:121557] " mbcodeandsound (Mike Bourgeous) via ruby-core
@ 2025-04-08 7:00 ` byroot (Jean Boussier) via ruby-core
2025-04-08 7:41 ` [ruby-core:121577] " mame (Yusuke Endoh) via ruby-core
` (12 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2025-04-08 7:00 UTC (permalink / raw)
To: ruby-core; +Cc: byroot (Jean Boussier)
Issue #21220 has been updated by byroot (Jean Boussier).
> I do not know if it's normal for rb_sourceline() to return 0
Yes:
```ruby
const char *
rb_sourcefile(void)
{
const rb_execution_context_t *ec = GET_EC();
const rb_control_frame_t *cfp = rb_vm_get_ruby_level_next_cfp(ec, ec->cfp);
if (cfp) {
return RSTRING_PTR(rb_iseq_path(cfp->iseq));
}
else {
return 0;
}
}
```
If there is no Ruby frame yet (only C frames) you get zero.
You patch seems OK, even though I think it would be simpler to return right after the `rb_sourcefile()` call.
----------------------------------------
Bug #21220: Memory corruption in update_line_coverage() [write at index -1]
https://bugs.ruby-lang.org/issues/21220#change-112630
* Author: mbcodeandsound (Mike Bourgeous)
* Status: Open
* ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Hello!
I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.x or earlier. My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36
### Summary
`update_line_coverage()` calls `rb_sourceline()`, subtracts one from its return value, and uses this as an index into an Array. Sometimes `rb_sourceline()` returns 0, and when this happens, `update_line_coverage()` will write to index -1 of the array. This corrupts the heap before the Array, resulting in a program crash later during GC.
As I am new to the Ruby codebase I do not know if it's normal for rb_sourceline() to return 0 and update_line_coverage() should handle it, or if something is wrong in the code that ultimately feeds rb_sourceline().
### Symptom
On Linux, affected processes print one of the following errors and exit:
```
munmap_chunk(): invalid pointer
Aborted (core dumped)
```
or, if preloading libc_malloc_debug.so
```
malloc_check_get_size: memory corruption
Aborted (core dumped)
```
### Reproduction
I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening.
The issue only reproduces locally if the `coverage/` directory has a large `.resultset.json`.
- **Repo:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug
- **Example of the bug:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/actions/runs/14289657889/job/40049195631#step:5:176
``` shell
# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in `seq 1 100`; do echo $f; ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break ; done
```
### Research and reasoning
I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.
After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption. I wrote [a GDB script to automate information gathering](https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/master/gdb_ruby_backtrace.gdb) when the GC crash occurred, and Valgrind+vgdb to identify the original write that appeared to cause the corruption.
I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.
Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36
---Files--------------------------------
corruption_c_stack.txt (2.63 KB)
corruption_ruby_stack.txt (948 Bytes)
crash_ruby_stack.txt (4.46 KB)
crash_c_stack.txt (26.2 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 16+ messages in thread
* [ruby-core:121577] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1]
2025-04-07 17:27 [ruby-core:121556] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1] mbcodeandsound (Mike Bourgeous) via ruby-core
2025-04-07 17:33 ` [ruby-core:121557] " mbcodeandsound (Mike Bourgeous) via ruby-core
2025-04-08 7:00 ` [ruby-core:121573] " byroot (Jean Boussier) via ruby-core
@ 2025-04-08 7:41 ` mame (Yusuke Endoh) via ruby-core
2025-04-08 7:43 ` [ruby-core:121578] " byroot (Jean Boussier) via ruby-core
` (11 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: mame (Yusuke Endoh) via ruby-core @ 2025-04-08 7:41 UTC (permalink / raw)
To: ruby-core; +Cc: mame (Yusuke Endoh)
Issue #21220 has been updated by mame (Yusuke Endoh).
Thanks for the report. The issue reproduces on my machine, but it's very rare. I haven't figured out what happens.
byroot (Jean Boussier) wrote in #note-2:
> If there is no Ruby frame yet (only C frames) you get zero.
That's true, but I was assuming that `RUBY_EVENT_COVERAGE_LINE` would never occur at such a timing.
I am OK to merge the proposed patch because it looks benign, but I am still curious how the issue itself happens.
----------------------------------------
Bug #21220: Memory corruption in update_line_coverage() [write at index -1]
https://bugs.ruby-lang.org/issues/21220#change-112633
* Author: mbcodeandsound (Mike Bourgeous)
* Status: Open
* ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Hello!
I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.x or earlier. My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36
### Summary
`update_line_coverage()` calls `rb_sourceline()`, subtracts one from its return value, and uses this as an index into an Array. Sometimes `rb_sourceline()` returns 0, and when this happens, `update_line_coverage()` will write to index -1 of the array. This corrupts the heap before the Array, resulting in a program crash later during GC.
As I am new to the Ruby codebase I do not know if it's normal for rb_sourceline() to return 0 and update_line_coverage() should handle it, or if something is wrong in the code that ultimately feeds rb_sourceline().
### Symptom
On Linux, affected processes print one of the following errors and exit:
```
munmap_chunk(): invalid pointer
Aborted (core dumped)
```
or, if preloading libc_malloc_debug.so
```
malloc_check_get_size: memory corruption
Aborted (core dumped)
```
### Reproduction
I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening.
The issue only reproduces locally if the `coverage/` directory has a large `.resultset.json`.
- **Repo:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug
- **Example of the bug:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/actions/runs/14289657889/job/40049195631#step:5:176
``` shell
# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in `seq 1 100`; do echo $f; ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break ; done
```
### Research and reasoning
I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.
After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption. I wrote [a GDB script to automate information gathering](https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/master/gdb_ruby_backtrace.gdb) when the GC crash occurred, and Valgrind+vgdb to identify the original write that appeared to cause the corruption.
I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.
Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36
---Files--------------------------------
corruption_c_stack.txt (2.63 KB)
corruption_ruby_stack.txt (948 Bytes)
crash_ruby_stack.txt (4.46 KB)
crash_c_stack.txt (26.2 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 16+ messages in thread
* [ruby-core:121578] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1]
2025-04-07 17:27 [ruby-core:121556] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1] mbcodeandsound (Mike Bourgeous) via ruby-core
` (2 preceding siblings ...)
2025-04-08 7:41 ` [ruby-core:121577] " mame (Yusuke Endoh) via ruby-core
@ 2025-04-08 7:43 ` byroot (Jean Boussier) via ruby-core
2025-04-08 7:50 ` [ruby-core:121579] " byroot (Jean Boussier) via ruby-core
` (10 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2025-04-08 7:43 UTC (permalink / raw)
To: ruby-core; +Cc: byroot (Jean Boussier)
Issue #21220 has been updated by byroot (Jean Boussier).
> The issue reproduces on my machine, but it's very rare.
You can make it happen 100% of the time with the following patch:
```diff
diff --git a/thread.c b/thread.c
index 661ee977a9..247fe7237f 100644
--- a/thread.c
+++ b/thread.c
@@ -5662,6 +5662,9 @@ update_line_coverage(VALUE data, const rb_trace_arg_t *trace_arg)
VALUE lines = RARRAY_AREF(coverage, COVERAGE_INDEX_LINES);
if (lines) {
long line = rb_sourceline() - 1;
+ if (line < 0) {
+ rb_bug("fail");
+ }
long count;
VALUE num;
void rb_iseq_clear_event_flags(const rb_iseq_t *iseq, size_t pos, rb_event_flag_t reset);
```
It's rare because it only crashes if GC trigger which doesn't always happen.
I'm using that patch to try to come up with a reproduction (but you are welcome to dig further too, there's no guarantee I will find the repro).
----------------------------------------
Bug #21220: Memory corruption in update_line_coverage() [write at index -1]
https://bugs.ruby-lang.org/issues/21220#change-112634
* Author: mbcodeandsound (Mike Bourgeous)
* Status: Open
* ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Hello!
I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.x or earlier. My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36
### Summary
`update_line_coverage()` calls `rb_sourceline()`, subtracts one from its return value, and uses this as an index into an Array. Sometimes `rb_sourceline()` returns 0, and when this happens, `update_line_coverage()` will write to index -1 of the array. This corrupts the heap before the Array, resulting in a program crash later during GC.
As I am new to the Ruby codebase I do not know if it's normal for rb_sourceline() to return 0 and update_line_coverage() should handle it, or if something is wrong in the code that ultimately feeds rb_sourceline().
### Symptom
On Linux, affected processes print one of the following errors and exit:
```
munmap_chunk(): invalid pointer
Aborted (core dumped)
```
or, if preloading libc_malloc_debug.so
```
malloc_check_get_size: memory corruption
Aborted (core dumped)
```
### Reproduction
I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening.
The issue only reproduces locally if the `coverage/` directory has a large `.resultset.json`.
- **Repo:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug
- **Example of the bug:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/actions/runs/14289657889/job/40049195631#step:5:176
``` shell
# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in `seq 1 100`; do echo $f; ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break ; done
```
### Research and reasoning
I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.
After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption. I wrote [a GDB script to automate information gathering](https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/master/gdb_ruby_backtrace.gdb) when the GC crash occurred, and Valgrind+vgdb to identify the original write that appeared to cause the corruption.
I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.
Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36
---Files--------------------------------
corruption_c_stack.txt (2.63 KB)
corruption_ruby_stack.txt (948 Bytes)
crash_ruby_stack.txt (4.46 KB)
crash_c_stack.txt (26.2 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 16+ messages in thread
* [ruby-core:121579] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1]
2025-04-07 17:27 [ruby-core:121556] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1] mbcodeandsound (Mike Bourgeous) via ruby-core
` (3 preceding siblings ...)
2025-04-08 7:43 ` [ruby-core:121578] " byroot (Jean Boussier) via ruby-core
@ 2025-04-08 7:50 ` byroot (Jean Boussier) via ruby-core
2025-04-08 8:25 ` [ruby-core:121580] " mame (Yusuke Endoh) via ruby-core
` (9 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2025-04-08 7:50 UTC (permalink / raw)
To: ruby-core; +Cc: byroot (Jean Boussier)
Issue #21220 has been updated by byroot (Jean Boussier).
What I've figured for now it that the corruption is triggered by:
```ruby
# This require line makes sure the original script file is processed by simplecov
require File.expand_path($0, '.')
```
```
frame #4: 0x00000001003c468c ruby`rb_bug(fmt=<unavailable>) at error.c:1117:5
frame #5: 0x00000001003c57dc ruby`update_line_coverage.cold.1 at thread.c:5666:17
frame #6: 0x000000010021ae64 ruby`update_line_coverage(data=<unavailable>, trace_arg=<unavailable>) at thread.c:5666:17
frame #7: 0x0000000100277fe0 ruby`exec_hooks_unprotected [inlined] exec_hooks_body(ec=0x0000000141f05bf0, list=0x0000000141f05970, trace_arg=0x000000016fdfc4c8) at vm_trace.c:352:17
frame #8: 0x0000000100277fc0 ruby`exec_hooks_unprotected(ec=0x0000000141f05bf0, list=0x0000000141f05970, trace_arg=0x000000016fdfc4c8) at vm_trace.c:381:5
frame #9: 0x0000000100277f30 ruby`rb_exec_event_hooks(trace_arg=<unavailable>, hooks=0x0000000141f05970, pop_p=0) at vm_trace.c:427:13
frame #10: 0x000000010026c01c ruby`vm_trace_hook [inlined] rb_exec_event_hook_orig(ec=0x0000000141f05bf0, hooks=0x0000000141f05970, flag=65536, self=4302425520, id=0, called_id=0, klass=0, data=36, pop_p=0) at vm_core.h:2179:5
frame #11: 0x000000010026bfe8 ruby`vm_trace_hook(ec=0x0000000141f05bf0, reg_cfp=0x0000000148127e78, pc=<unavailable>, pc_events=65537, target_event=65536, global_hooks=0x0000000141f05970, local_hooks_ptr=0x000000011b4d4ca8, val=36) at vm_insnhelper.c:7062:9
frame #12: 0x000000010026b9e4 ruby`vm_trace(ec=0x0000000141f05bf0, reg_cfp=0x0000000148127e78) at vm_insnhelper.c:7170:13
frame #13: 0x000000010024b128 ruby`vm_exec_core(ec=<unavailable>) at vm.inc:4972:5
frame #14: 0x0000000100249860 ruby`rb_vm_exec(ec=0x0000000141f05bf0) at vm.c:2597:22
frame #15: 0x000000010025e978 ruby`rb_iseq_eval(iseq=<unavailable>) at vm.c:2852:11 [artificial]
frame #16: 0x0000000100123298 ruby`load_iseq_eval(ec=0x0000000141f05bf0, fname=4753022720) at load.c:789:5
frame #17: 0x0000000100121350 ruby`require_internal(ec=0x0000000141f05bf0, fname=4757156480, exception=1, warn=<unavailable>) at load.c:1297:21
frame #18: 0x0000000100120608 ruby`rb_require_string_internal(fname=4757156480, resurrect=false) at load.c:1403:22
frame #19: 0x00000001001204d8 ruby`rb_f_require [inlined] rb_require_string(fname=4757156480) at load.c:1389:12
frame #20: 0x00000001001204b8 ruby`rb_f_require(obj=<unavailable>, fname=<unavailable>) at load.c:1029:12
frame #21: 0x0000000100268894 ruby`vm_call_cfunc_with_frame_(ec=0x0000000141f05bf0, reg_cfp=0x0000000148127ee8, calling=<unavailable>, argc=1, argv=0x00000001480280c0, stack_bottom=0x00000001480280b8) at vm_insnhelper.c:3794:11
frame #22: 0x0000000100263af8 ruby`vm_call_alias(ec=<unavailable>, cfp=<unavailable>, calling=0x000000016fdfd140) at vm_insnhelper.c:4181:12
frame #23: 0x000000010024cf4c ruby`vm_exec_core [inlined] vm_sendish(ec=0x0000000141f05bf0, reg_cfp=0x0000000148127ee8, cd=0x00006000022b3110, block_handler=0, method_explorer=mexp_search_method) at vm_insnhelper.c:5964:15
frame #24: 0x000000010024ce48 ruby`vm_exec_core(ec=<unavailable>) at insns.def:898:11
frame #25: 0x0000000100249860 ruby`rb_vm_exec(ec=0x0000000141f05bf0) at vm.c:2597:22
frame #26: 0x000000010025e978 ruby`rb_iseq_eval(iseq=<unavailable>) at vm.c:2852:11 [artificial]
frame #27: 0x0000000100123298 ruby`load_iseq_eval(ec=0x0000000141f05bf0, fname=4755544040) at load.c:789:5
frame #28: 0x0000000100121350 ruby`require_internal(ec=0x0000000141f05bf0, fname=4726729080, exception=1, warn=<unavailable>) at load.c:1297:21
```
But I wasn't able to reduce it yet.
----------------------------------------
Bug #21220: Memory corruption in update_line_coverage() [write at index -1]
https://bugs.ruby-lang.org/issues/21220#change-112635
* Author: mbcodeandsound (Mike Bourgeous)
* Status: Open
* ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Hello!
I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.x or earlier. My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36
### Summary
`update_line_coverage()` calls `rb_sourceline()`, subtracts one from its return value, and uses this as an index into an Array. Sometimes `rb_sourceline()` returns 0, and when this happens, `update_line_coverage()` will write to index -1 of the array. This corrupts the heap before the Array, resulting in a program crash later during GC.
As I am new to the Ruby codebase I do not know if it's normal for rb_sourceline() to return 0 and update_line_coverage() should handle it, or if something is wrong in the code that ultimately feeds rb_sourceline().
### Symptom
On Linux, affected processes print one of the following errors and exit:
```
munmap_chunk(): invalid pointer
Aborted (core dumped)
```
or, if preloading libc_malloc_debug.so
```
malloc_check_get_size: memory corruption
Aborted (core dumped)
```
### Reproduction
I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening.
The issue only reproduces locally if the `coverage/` directory has a large `.resultset.json`.
- **Repo:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug
- **Example of the bug:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/actions/runs/14289657889/job/40049195631#step:5:176
``` shell
# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in `seq 1 100`; do echo $f; ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break ; done
```
### Research and reasoning
I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.
After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption. I wrote [a GDB script to automate information gathering](https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/master/gdb_ruby_backtrace.gdb) when the GC crash occurred, and Valgrind+vgdb to identify the original write that appeared to cause the corruption.
I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.
Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36
---Files--------------------------------
corruption_c_stack.txt (2.63 KB)
corruption_ruby_stack.txt (948 Bytes)
crash_ruby_stack.txt (4.46 KB)
crash_c_stack.txt (26.2 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 16+ messages in thread
* [ruby-core:121580] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1]
2025-04-07 17:27 [ruby-core:121556] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1] mbcodeandsound (Mike Bourgeous) via ruby-core
` (4 preceding siblings ...)
2025-04-08 7:50 ` [ruby-core:121579] " byroot (Jean Boussier) via ruby-core
@ 2025-04-08 8:25 ` mame (Yusuke Endoh) via ruby-core
2025-04-08 8:28 ` [ruby-core:121581] " byroot (Jean Boussier) via ruby-core
` (8 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: mame (Yusuke Endoh) via ruby-core @ 2025-04-08 8:25 UTC (permalink / raw)
To: ruby-core; +Cc: mame (Yusuke Endoh)
Issue #21220 has been updated by mame (Yusuke Endoh).
Status changed from Open to Assigned
Assignee set to prism
@byroot Thanks! I think this is a issue of the prism compiler handling the line number around a flip-flop syntax.
test.rb
```
TracePoint.new(:line) do |tp|
p tp
end.enable
load "./target.rb"
```
target.rb
```
if true..true
end
```
```
$ ~/work/ruby/local/bin/ruby --parser=prism test.rb
#<TracePoint:line test.rb:5>
#<TracePoint:line /tmp/reproduce-simplecov-ruby34-bug/target.rb:0> # <=== This lineno should be 1
$ ~/work/ruby/local/bin/ruby --parser=parse.y test.rb
#<TracePoint:line test.rb:5>
#<TracePoint:line /tmp/reproduce-simplecov-ruby34-bug/target.rb:1>
```
----------------------------------------
Bug #21220: Memory corruption in update_line_coverage() [write at index -1]
https://bugs.ruby-lang.org/issues/21220#change-112636
* Author: mbcodeandsound (Mike Bourgeous)
* Status: Assigned
* Assignee: prism
* ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Hello!
I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.x or earlier. My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36
### Summary
`update_line_coverage()` calls `rb_sourceline()`, subtracts one from its return value, and uses this as an index into an Array. Sometimes `rb_sourceline()` returns 0, and when this happens, `update_line_coverage()` will write to index -1 of the array. This corrupts the heap before the Array, resulting in a program crash later during GC.
As I am new to the Ruby codebase I do not know if it's normal for rb_sourceline() to return 0 and update_line_coverage() should handle it, or if something is wrong in the code that ultimately feeds rb_sourceline().
### Symptom
On Linux, affected processes print one of the following errors and exit:
```
munmap_chunk(): invalid pointer
Aborted (core dumped)
```
or, if preloading libc_malloc_debug.so
```
malloc_check_get_size: memory corruption
Aborted (core dumped)
```
### Reproduction
I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening.
The issue only reproduces locally if the `coverage/` directory has a large `.resultset.json`.
- **Repo:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug
- **Example of the bug:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/actions/runs/14289657889/job/40049195631#step:5:176
``` shell
# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in `seq 1 100`; do echo $f; ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break ; done
```
### Research and reasoning
I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.
After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption. I wrote [a GDB script to automate information gathering](https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/master/gdb_ruby_backtrace.gdb) when the GC crash occurred, and Valgrind+vgdb to identify the original write that appeared to cause the corruption.
I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.
Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36
---Files--------------------------------
corruption_c_stack.txt (2.63 KB)
corruption_ruby_stack.txt (948 Bytes)
crash_ruby_stack.txt (4.46 KB)
crash_c_stack.txt (26.2 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 16+ messages in thread
* [ruby-core:121581] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1]
2025-04-07 17:27 [ruby-core:121556] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1] mbcodeandsound (Mike Bourgeous) via ruby-core
` (5 preceding siblings ...)
2025-04-08 8:25 ` [ruby-core:121580] " mame (Yusuke Endoh) via ruby-core
@ 2025-04-08 8:28 ` byroot (Jean Boussier) via ruby-core
2025-04-08 8:32 ` [ruby-core:121582] " mame (Yusuke Endoh) via ruby-core
` (7 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2025-04-08 8:28 UTC (permalink / raw)
To: ruby-core; +Cc: byroot (Jean Boussier)
Issue #21220 has been updated by byroot (Jean Boussier).
Nice find!
Can source line ever legitimately be `0`? Perhaps we are missing some assertions that would have caught this earlier? And perhaps there are other syntax causing the same issue.
----------------------------------------
Bug #21220: Memory corruption in update_line_coverage() [write at index -1]
https://bugs.ruby-lang.org/issues/21220#change-112637
* Author: mbcodeandsound (Mike Bourgeous)
* Status: Assigned
* Assignee: prism
* ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Hello!
I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.x or earlier. My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36
### Summary
`update_line_coverage()` calls `rb_sourceline()`, subtracts one from its return value, and uses this as an index into an Array. Sometimes `rb_sourceline()` returns 0, and when this happens, `update_line_coverage()` will write to index -1 of the array. This corrupts the heap before the Array, resulting in a program crash later during GC.
As I am new to the Ruby codebase I do not know if it's normal for rb_sourceline() to return 0 and update_line_coverage() should handle it, or if something is wrong in the code that ultimately feeds rb_sourceline().
### Symptom
On Linux, affected processes print one of the following errors and exit:
```
munmap_chunk(): invalid pointer
Aborted (core dumped)
```
or, if preloading libc_malloc_debug.so
```
malloc_check_get_size: memory corruption
Aborted (core dumped)
```
### Reproduction
I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening.
The issue only reproduces locally if the `coverage/` directory has a large `.resultset.json`.
- **Repo:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug
- **Example of the bug:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/actions/runs/14289657889/job/40049195631#step:5:176
``` shell
# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in `seq 1 100`; do echo $f; ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break ; done
```
### Research and reasoning
I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.
After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption. I wrote [a GDB script to automate information gathering](https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/master/gdb_ruby_backtrace.gdb) when the GC crash occurred, and Valgrind+vgdb to identify the original write that appeared to cause the corruption.
I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.
Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36
---Files--------------------------------
corruption_c_stack.txt (2.63 KB)
corruption_ruby_stack.txt (948 Bytes)
crash_ruby_stack.txt (4.46 KB)
crash_c_stack.txt (26.2 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 16+ messages in thread
* [ruby-core:121582] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1]
2025-04-07 17:27 [ruby-core:121556] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1] mbcodeandsound (Mike Bourgeous) via ruby-core
` (6 preceding siblings ...)
2025-04-08 8:28 ` [ruby-core:121581] " byroot (Jean Boussier) via ruby-core
@ 2025-04-08 8:32 ` mame (Yusuke Endoh) via ruby-core
2025-04-08 8:47 ` [ruby-core:121583] " mame (Yusuke Endoh) via ruby-core
` (6 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: mame (Yusuke Endoh) via ruby-core @ 2025-04-08 8:32 UTC (permalink / raw)
To: ruby-core; +Cc: mame (Yusuke Endoh)
Issue #21220 has been updated by mame (Yusuke Endoh).
@mbcodeandsound Just FYI, I bet you meant to write `!(1..16).cover?(channel)` in the following line.
https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/d73c3fe80014cb91d8b6c64847581feb8a19d1b6/bin/midi_roll.rb#L42
Thank goodness for us, because it resulted in the discovery of a bug in Ruby :-)
----------------------------------------
Bug #21220: Memory corruption in update_line_coverage() [write at index -1]
https://bugs.ruby-lang.org/issues/21220#change-112638
* Author: mbcodeandsound (Mike Bourgeous)
* Status: Assigned
* Assignee: prism
* ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Hello!
I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.x or earlier. My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36
### Summary
`update_line_coverage()` calls `rb_sourceline()`, subtracts one from its return value, and uses this as an index into an Array. Sometimes `rb_sourceline()` returns 0, and when this happens, `update_line_coverage()` will write to index -1 of the array. This corrupts the heap before the Array, resulting in a program crash later during GC.
As I am new to the Ruby codebase I do not know if it's normal for rb_sourceline() to return 0 and update_line_coverage() should handle it, or if something is wrong in the code that ultimately feeds rb_sourceline().
### Symptom
On Linux, affected processes print one of the following errors and exit:
```
munmap_chunk(): invalid pointer
Aborted (core dumped)
```
or, if preloading libc_malloc_debug.so
```
malloc_check_get_size: memory corruption
Aborted (core dumped)
```
### Reproduction
I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening.
The issue only reproduces locally if the `coverage/` directory has a large `.resultset.json`.
- **Repo:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug
- **Example of the bug:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/actions/runs/14289657889/job/40049195631#step:5:176
``` shell
# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in `seq 1 100`; do echo $f; ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break ; done
```
### Research and reasoning
I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.
After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption. I wrote [a GDB script to automate information gathering](https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/master/gdb_ruby_backtrace.gdb) when the GC crash occurred, and Valgrind+vgdb to identify the original write that appeared to cause the corruption.
I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.
Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36
---Files--------------------------------
corruption_c_stack.txt (2.63 KB)
corruption_ruby_stack.txt (948 Bytes)
crash_ruby_stack.txt (4.46 KB)
crash_c_stack.txt (26.2 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 16+ messages in thread
* [ruby-core:121583] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1]
2025-04-07 17:27 [ruby-core:121556] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1] mbcodeandsound (Mike Bourgeous) via ruby-core
` (7 preceding siblings ...)
2025-04-08 8:32 ` [ruby-core:121582] " mame (Yusuke Endoh) via ruby-core
@ 2025-04-08 8:47 ` mame (Yusuke Endoh) via ruby-core
2025-04-08 9:23 ` [ruby-core:121584] " byroot (Jean Boussier) via ruby-core
` (5 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: mame (Yusuke Endoh) via ruby-core @ 2025-04-08 8:47 UTC (permalink / raw)
To: ruby-core; +Cc: mame (Yusuke Endoh)
Issue #21220 has been updated by mame (Yusuke Endoh).
byroot (Jean Boussier) wrote in #note-7:
> Can source line ever legitimately be `0`?
With `eval`, it can. It can be even negative.
```
$ ruby -e 'eval("p __LINE__", nil, "foo", -10000)'
-10000
```
And when I did this under `Coverage.start(eval: true)`, it does [BUG] :-P
```ruby
require 'coverage'
Coverage.start(eval: true, lines: true)
eval(<<END, nil, "foo", -1)
p
p
END
```
```
$ ~/work/ruby/local/bin/ruby test.rb
foo:-1: [BUG] fail
ruby 3.5.0dev (2025-04-06T03:24:23Z master e25889951f) +PRISM [x86_64-linux]
```
We can prevent `RUBY_EVENT_COVERAGE_LINE` from firing in contexts where lineno is negative or zero, but that doesn't happen very often.
So I guess I should merge the proposed patch as a precaution?
----------------------------------------
Bug #21220: Memory corruption in update_line_coverage() [write at index -1]
https://bugs.ruby-lang.org/issues/21220#change-112639
* Author: mbcodeandsound (Mike Bourgeous)
* Status: Assigned
* Assignee: prism
* ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Hello!
I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.x or earlier. My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36
### Summary
`update_line_coverage()` calls `rb_sourceline()`, subtracts one from its return value, and uses this as an index into an Array. Sometimes `rb_sourceline()` returns 0, and when this happens, `update_line_coverage()` will write to index -1 of the array. This corrupts the heap before the Array, resulting in a program crash later during GC.
As I am new to the Ruby codebase I do not know if it's normal for rb_sourceline() to return 0 and update_line_coverage() should handle it, or if something is wrong in the code that ultimately feeds rb_sourceline().
### Symptom
On Linux, affected processes print one of the following errors and exit:
```
munmap_chunk(): invalid pointer
Aborted (core dumped)
```
or, if preloading libc_malloc_debug.so
```
malloc_check_get_size: memory corruption
Aborted (core dumped)
```
### Reproduction
I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening.
The issue only reproduces locally if the `coverage/` directory has a large `.resultset.json`.
- **Repo:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug
- **Example of the bug:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/actions/runs/14289657889/job/40049195631#step:5:176
``` shell
# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in `seq 1 100`; do echo $f; ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break ; done
```
### Research and reasoning
I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.
After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption. I wrote [a GDB script to automate information gathering](https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/master/gdb_ruby_backtrace.gdb) when the GC crash occurred, and Valgrind+vgdb to identify the original write that appeared to cause the corruption.
I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.
Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36
---Files--------------------------------
corruption_c_stack.txt (2.63 KB)
corruption_ruby_stack.txt (948 Bytes)
crash_ruby_stack.txt (4.46 KB)
crash_c_stack.txt (26.2 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 16+ messages in thread
* [ruby-core:121584] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1]
2025-04-07 17:27 [ruby-core:121556] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1] mbcodeandsound (Mike Bourgeous) via ruby-core
` (8 preceding siblings ...)
2025-04-08 8:47 ` [ruby-core:121583] " mame (Yusuke Endoh) via ruby-core
@ 2025-04-08 9:23 ` byroot (Jean Boussier) via ruby-core
2025-04-09 3:48 ` [ruby-core:121599] " mbcodeandsound (Mike Bourgeous) via ruby-core
` (4 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2025-04-08 9:23 UTC (permalink / raw)
To: ruby-core; +Cc: byroot (Jean Boussier)
Issue #21220 has been updated by byroot (Jean Boussier).
> So I guess I should merge the proposed patch as a precaution?
I think so yes, with that eval code as a test case.
----------------------------------------
Bug #21220: Memory corruption in update_line_coverage() [write at index -1]
https://bugs.ruby-lang.org/issues/21220#change-112640
* Author: mbcodeandsound (Mike Bourgeous)
* Status: Assigned
* Assignee: prism
* ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Hello!
I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.x or earlier. My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36
### Summary
`update_line_coverage()` calls `rb_sourceline()`, subtracts one from its return value, and uses this as an index into an Array. Sometimes `rb_sourceline()` returns 0, and when this happens, `update_line_coverage()` will write to index -1 of the array. This corrupts the heap before the Array, resulting in a program crash later during GC.
As I am new to the Ruby codebase I do not know if it's normal for rb_sourceline() to return 0 and update_line_coverage() should handle it, or if something is wrong in the code that ultimately feeds rb_sourceline().
### Symptom
On Linux, affected processes print one of the following errors and exit:
```
munmap_chunk(): invalid pointer
Aborted (core dumped)
```
or, if preloading libc_malloc_debug.so
```
malloc_check_get_size: memory corruption
Aborted (core dumped)
```
### Reproduction
I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening.
The issue only reproduces locally if the `coverage/` directory has a large `.resultset.json`.
- **Repo:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug
- **Example of the bug:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/actions/runs/14289657889/job/40049195631#step:5:176
``` shell
# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in `seq 1 100`; do echo $f; ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break ; done
```
### Research and reasoning
I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.
After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption. I wrote [a GDB script to automate information gathering](https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/master/gdb_ruby_backtrace.gdb) when the GC crash occurred, and Valgrind+vgdb to identify the original write that appeared to cause the corruption.
I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.
Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36
---Files--------------------------------
corruption_c_stack.txt (2.63 KB)
corruption_ruby_stack.txt (948 Bytes)
crash_ruby_stack.txt (4.46 KB)
crash_c_stack.txt (26.2 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 16+ messages in thread
* [ruby-core:121599] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1]
2025-04-07 17:27 [ruby-core:121556] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1] mbcodeandsound (Mike Bourgeous) via ruby-core
` (9 preceding siblings ...)
2025-04-08 9:23 ` [ruby-core:121584] " byroot (Jean Boussier) via ruby-core
@ 2025-04-09 3:48 ` mbcodeandsound (Mike Bourgeous) via ruby-core
2025-04-09 11:01 ` [ruby-core:121604] " mame (Yusuke Endoh) via ruby-core
` (3 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: mbcodeandsound (Mike Bourgeous) via ruby-core @ 2025-04-09 3:48 UTC (permalink / raw)
To: ruby-core; +Cc: mbcodeandsound (Mike Bourgeous)
Issue #21220 has been updated by mbcodeandsound (Mike Bourgeous).
mame (Yusuke Endoh) wrote in #note-8:
> @mbcodeandsound Just FYI, I bet you meant to write `!(1..16).cover?(channel)` in the following line.
>
>
> https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/d73c3fe80014cb91d8b6c64847581feb8a19d1b6/bin/midi_roll.rb#L42
>
>
> Thank goodness for us, because it resulted in the discovery of a bug in Ruby :-)
Yes I did, thanks! I didn't even know about the flip-flop operator, or its controversial almost removal, until today. I also did not expect to encounter a parser bug on top of a coverage-related bug. I'm glad my program's bug uncovered some bugs in Ruby either way!
Ruby continues to be my favorite language. Thanks for what you do!
----------------------------------------
Bug #21220: Memory corruption in update_line_coverage() [write at index -1]
https://bugs.ruby-lang.org/issues/21220#change-112651
* Author: mbcodeandsound (Mike Bourgeous)
* Status: Assigned
* Assignee: prism
* ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Hello!
I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.x or earlier. My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36
### Summary
`update_line_coverage()` calls `rb_sourceline()`, subtracts one from its return value, and uses this as an index into an Array. Sometimes `rb_sourceline()` returns 0, and when this happens, `update_line_coverage()` will write to index -1 of the array. This corrupts the heap before the Array, resulting in a program crash later during GC.
As I am new to the Ruby codebase I do not know if it's normal for rb_sourceline() to return 0 and update_line_coverage() should handle it, or if something is wrong in the code that ultimately feeds rb_sourceline().
### Symptom
On Linux, affected processes print one of the following errors and exit:
```
munmap_chunk(): invalid pointer
Aborted (core dumped)
```
or, if preloading libc_malloc_debug.so
```
malloc_check_get_size: memory corruption
Aborted (core dumped)
```
### Reproduction
I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening.
The issue only reproduces locally if the `coverage/` directory has a large `.resultset.json`.
- **Repo:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug
- **Example of the bug:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/actions/runs/14289657889/job/40049195631#step:5:176
``` shell
# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in `seq 1 100`; do echo $f; ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break ; done
```
### Research and reasoning
I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.
After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption. I wrote [a GDB script to automate information gathering](https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/master/gdb_ruby_backtrace.gdb) when the GC crash occurred, and Valgrind+vgdb to identify the original write that appeared to cause the corruption.
I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.
Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36
---Files--------------------------------
corruption_c_stack.txt (2.63 KB)
corruption_ruby_stack.txt (948 Bytes)
crash_ruby_stack.txt (4.46 KB)
crash_c_stack.txt (26.2 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 16+ messages in thread
* [ruby-core:121604] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1]
2025-04-07 17:27 [ruby-core:121556] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1] mbcodeandsound (Mike Bourgeous) via ruby-core
` (10 preceding siblings ...)
2025-04-09 3:48 ` [ruby-core:121599] " mbcodeandsound (Mike Bourgeous) via ruby-core
@ 2025-04-09 11:01 ` mame (Yusuke Endoh) via ruby-core
2025-04-09 11:11 ` [ruby-core:121606] " byroot (Jean Boussier) via ruby-core
` (2 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: mame (Yusuke Endoh) via ruby-core @ 2025-04-09 11:01 UTC (permalink / raw)
To: ruby-core; +Cc: mame (Yusuke Endoh)
Issue #21220 has been updated by mame (Yusuke Endoh).
I created a PR.
https://github.com/ruby/ruby/pull/13089
The PR should prevent `RUBY_EVENT_COVERAGE_LINE` from firing on non-positive line numbers. (The fix looked easier than I thought.)
I will create another ticket for the bug of the prism compiler.
----------------------------------------
Bug #21220: Memory corruption in update_line_coverage() [write at index -1]
https://bugs.ruby-lang.org/issues/21220#change-112655
* Author: mbcodeandsound (Mike Bourgeous)
* Status: Assigned
* Assignee: prism
* ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Hello!
I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.x or earlier. My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36
### Summary
`update_line_coverage()` calls `rb_sourceline()`, subtracts one from its return value, and uses this as an index into an Array. Sometimes `rb_sourceline()` returns 0, and when this happens, `update_line_coverage()` will write to index -1 of the array. This corrupts the heap before the Array, resulting in a program crash later during GC.
As I am new to the Ruby codebase I do not know if it's normal for rb_sourceline() to return 0 and update_line_coverage() should handle it, or if something is wrong in the code that ultimately feeds rb_sourceline().
### Symptom
On Linux, affected processes print one of the following errors and exit:
```
munmap_chunk(): invalid pointer
Aborted (core dumped)
```
or, if preloading libc_malloc_debug.so
```
malloc_check_get_size: memory corruption
Aborted (core dumped)
```
### Reproduction
I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening.
The issue only reproduces locally if the `coverage/` directory has a large `.resultset.json`.
- **Repo:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug
- **Example of the bug:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/actions/runs/14289657889/job/40049195631#step:5:176
``` shell
# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in `seq 1 100`; do echo $f; ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break ; done
```
### Research and reasoning
I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.
After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption. I wrote [a GDB script to automate information gathering](https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/master/gdb_ruby_backtrace.gdb) when the GC crash occurred, and Valgrind+vgdb to identify the original write that appeared to cause the corruption.
I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.
Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36
---Files--------------------------------
corruption_c_stack.txt (2.63 KB)
corruption_ruby_stack.txt (948 Bytes)
crash_ruby_stack.txt (4.46 KB)
crash_c_stack.txt (26.2 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 16+ messages in thread
* [ruby-core:121606] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1]
2025-04-07 17:27 [ruby-core:121556] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1] mbcodeandsound (Mike Bourgeous) via ruby-core
` (11 preceding siblings ...)
2025-04-09 11:01 ` [ruby-core:121604] " mame (Yusuke Endoh) via ruby-core
@ 2025-04-09 11:11 ` byroot (Jean Boussier) via ruby-core
2025-04-14 7:26 ` [ruby-core:121652] " k0kubun (Takashi Kokubun) via ruby-core
2025-05-17 6:27 ` [ruby-core:122163] " nagachika (Tomoyuki Chikanaga) via ruby-core
14 siblings, 0 replies; 16+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2025-04-09 11:11 UTC (permalink / raw)
To: ruby-core; +Cc: byroot (Jean Boussier)
Issue #21220 has been updated by byroot (Jean Boussier).
Assignee deleted (prism)
----------------------------------------
Bug #21220: Memory corruption in update_line_coverage() [write at index -1]
https://bugs.ruby-lang.org/issues/21220#change-112659
* Author: mbcodeandsound (Mike Bourgeous)
* Status: Assigned
* ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
* Backport: 3.2: WONTFIX, 3.3: REQUIRED, 3.4: REQUIRED
----------------------------------------
Hello!
I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.x or earlier. My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36
### Summary
`update_line_coverage()` calls `rb_sourceline()`, subtracts one from its return value, and uses this as an index into an Array. Sometimes `rb_sourceline()` returns 0, and when this happens, `update_line_coverage()` will write to index -1 of the array. This corrupts the heap before the Array, resulting in a program crash later during GC.
As I am new to the Ruby codebase I do not know if it's normal for rb_sourceline() to return 0 and update_line_coverage() should handle it, or if something is wrong in the code that ultimately feeds rb_sourceline().
### Symptom
On Linux, affected processes print one of the following errors and exit:
```
munmap_chunk(): invalid pointer
Aborted (core dumped)
```
or, if preloading libc_malloc_debug.so
```
malloc_check_get_size: memory corruption
Aborted (core dumped)
```
### Reproduction
I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening.
The issue only reproduces locally if the `coverage/` directory has a large `.resultset.json`.
- **Repo:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug
- **Example of the bug:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/actions/runs/14289657889/job/40049195631#step:5:176
``` shell
# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in `seq 1 100`; do echo $f; ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break ; done
```
### Research and reasoning
I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.
After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption. I wrote [a GDB script to automate information gathering](https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/master/gdb_ruby_backtrace.gdb) when the GC crash occurred, and Valgrind+vgdb to identify the original write that appeared to cause the corruption.
I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.
Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36
---Files--------------------------------
corruption_c_stack.txt (2.63 KB)
corruption_ruby_stack.txt (948 Bytes)
crash_ruby_stack.txt (4.46 KB)
crash_c_stack.txt (26.2 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 16+ messages in thread
* [ruby-core:121652] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1]
2025-04-07 17:27 [ruby-core:121556] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1] mbcodeandsound (Mike Bourgeous) via ruby-core
` (12 preceding siblings ...)
2025-04-09 11:11 ` [ruby-core:121606] " byroot (Jean Boussier) via ruby-core
@ 2025-04-14 7:26 ` k0kubun (Takashi Kokubun) via ruby-core
2025-05-17 6:27 ` [ruby-core:122163] " nagachika (Tomoyuki Chikanaga) via ruby-core
14 siblings, 0 replies; 16+ messages in thread
From: k0kubun (Takashi Kokubun) via ruby-core @ 2025-04-14 7:26 UTC (permalink / raw)
To: ruby-core; +Cc: k0kubun (Takashi Kokubun)
Issue #21220 has been updated by k0kubun (Takashi Kokubun).
Backport changed from 3.2: WONTFIX, 3.3: REQUIRED, 3.4: REQUIRED to 3.2: WONTFIX, 3.3: REQUIRED, 3.4: DONE
ruby_3_4 commit:1b6f9cd6ea5715ad33d8ba4c78382aa8aafdbcd4 merged revision(s) commit:0d6263bd416338a339651fb97fe4d62701704c4b.
----------------------------------------
Bug #21220: Memory corruption in update_line_coverage() [write at index -1]
https://bugs.ruby-lang.org/issues/21220#change-112706
* Author: mbcodeandsound (Mike Bourgeous)
* Status: Closed
* ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
* Backport: 3.2: WONTFIX, 3.3: REQUIRED, 3.4: DONE
----------------------------------------
Hello!
I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.x or earlier. My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36
### Summary
`update_line_coverage()` calls `rb_sourceline()`, subtracts one from its return value, and uses this as an index into an Array. Sometimes `rb_sourceline()` returns 0, and when this happens, `update_line_coverage()` will write to index -1 of the array. This corrupts the heap before the Array, resulting in a program crash later during GC.
As I am new to the Ruby codebase I do not know if it's normal for rb_sourceline() to return 0 and update_line_coverage() should handle it, or if something is wrong in the code that ultimately feeds rb_sourceline().
### Symptom
On Linux, affected processes print one of the following errors and exit:
```
munmap_chunk(): invalid pointer
Aborted (core dumped)
```
or, if preloading libc_malloc_debug.so
```
malloc_check_get_size: memory corruption
Aborted (core dumped)
```
### Reproduction
I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening.
The issue only reproduces locally if the `coverage/` directory has a large `.resultset.json`.
- **Repo:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug
- **Example of the bug:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/actions/runs/14289657889/job/40049195631#step:5:176
``` shell
# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in `seq 1 100`; do echo $f; ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break ; done
```
### Research and reasoning
I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.
After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption. I wrote [a GDB script to automate information gathering](https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/master/gdb_ruby_backtrace.gdb) when the GC crash occurred, and Valgrind+vgdb to identify the original write that appeared to cause the corruption.
I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.
Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36
---Files--------------------------------
corruption_c_stack.txt (2.63 KB)
corruption_ruby_stack.txt (948 Bytes)
crash_ruby_stack.txt (4.46 KB)
crash_c_stack.txt (26.2 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 16+ messages in thread
* [ruby-core:122163] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1]
2025-04-07 17:27 [ruby-core:121556] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1] mbcodeandsound (Mike Bourgeous) via ruby-core
` (13 preceding siblings ...)
2025-04-14 7:26 ` [ruby-core:121652] " k0kubun (Takashi Kokubun) via ruby-core
@ 2025-05-17 6:27 ` nagachika (Tomoyuki Chikanaga) via ruby-core
14 siblings, 0 replies; 16+ messages in thread
From: nagachika (Tomoyuki Chikanaga) via ruby-core @ 2025-05-17 6:27 UTC (permalink / raw)
To: ruby-core; +Cc: nagachika (Tomoyuki Chikanaga)
Issue #21220 has been updated by nagachika (Tomoyuki Chikanaga).
Backport changed from 3.2: WONTFIX, 3.3: REQUIRED, 3.4: DONE to 3.2: WONTFIX, 3.3: DONE, 3.4: DONE
ruby_3_3 commit:b1b6752fbeb2d23dbea639bd4b331c9e8b56f49c merged revision(s) commit:0d6263bd416338a339651fb97fe4d62701704c4b.
----------------------------------------
Bug #21220: Memory corruption in update_line_coverage() [write at index -1]
https://bugs.ruby-lang.org/issues/21220#change-113313
* Author: mbcodeandsound (Mike Bourgeous)
* Status: Closed
* ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
* Backport: 3.2: WONTFIX, 3.3: DONE, 3.4: DONE
----------------------------------------
Hello!
I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.x or earlier. My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36
### Summary
`update_line_coverage()` calls `rb_sourceline()`, subtracts one from its return value, and uses this as an index into an Array. Sometimes `rb_sourceline()` returns 0, and when this happens, `update_line_coverage()` will write to index -1 of the array. This corrupts the heap before the Array, resulting in a program crash later during GC.
As I am new to the Ruby codebase I do not know if it's normal for rb_sourceline() to return 0 and update_line_coverage() should handle it, or if something is wrong in the code that ultimately feeds rb_sourceline().
### Symptom
On Linux, affected processes print one of the following errors and exit:
```
munmap_chunk(): invalid pointer
Aborted (core dumped)
```
or, if preloading libc_malloc_debug.so
```
malloc_check_get_size: memory corruption
Aborted (core dumped)
```
### Reproduction
I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening.
The issue only reproduces locally if the `coverage/` directory has a large `.resultset.json`.
- **Repo:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug
- **Example of the bug:** https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/actions/runs/14289657889/job/40049195631#step:5:176
``` shell
# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in `seq 1 100`; do echo $f; ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break ; done
```
### Research and reasoning
I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.
After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption. I wrote [a GDB script to automate information gathering](https://github.com/mike-bourgeous/reproduce-simplecov-ruby34-bug/blob/master/gdb_ruby_backtrace.gdb) when the GC crash occurred, and Valgrind+vgdb to identify the original write that appeared to cause the corruption.
I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.
Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36
---Files--------------------------------
corruption_c_stack.txt (2.63 KB)
corruption_ruby_stack.txt (948 Bytes)
crash_ruby_stack.txt (4.46 KB)
crash_c_stack.txt (26.2 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-05-17 6:27 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-07 17:27 [ruby-core:121556] [Ruby Bug#21220] Memory corruption in update_line_coverage() [write at index -1] mbcodeandsound (Mike Bourgeous) via ruby-core
2025-04-07 17:33 ` [ruby-core:121557] " mbcodeandsound (Mike Bourgeous) via ruby-core
2025-04-08 7:00 ` [ruby-core:121573] " byroot (Jean Boussier) via ruby-core
2025-04-08 7:41 ` [ruby-core:121577] " mame (Yusuke Endoh) via ruby-core
2025-04-08 7:43 ` [ruby-core:121578] " byroot (Jean Boussier) via ruby-core
2025-04-08 7:50 ` [ruby-core:121579] " byroot (Jean Boussier) via ruby-core
2025-04-08 8:25 ` [ruby-core:121580] " mame (Yusuke Endoh) via ruby-core
2025-04-08 8:28 ` [ruby-core:121581] " byroot (Jean Boussier) via ruby-core
2025-04-08 8:32 ` [ruby-core:121582] " mame (Yusuke Endoh) via ruby-core
2025-04-08 8:47 ` [ruby-core:121583] " mame (Yusuke Endoh) via ruby-core
2025-04-08 9:23 ` [ruby-core:121584] " byroot (Jean Boussier) via ruby-core
2025-04-09 3:48 ` [ruby-core:121599] " mbcodeandsound (Mike Bourgeous) via ruby-core
2025-04-09 11:01 ` [ruby-core:121604] " mame (Yusuke Endoh) via ruby-core
2025-04-09 11:11 ` [ruby-core:121606] " byroot (Jean Boussier) via ruby-core
2025-04-14 7:26 ` [ruby-core:121652] " k0kubun (Takashi Kokubun) via ruby-core
2025-05-17 6:27 ` [ruby-core:122163] " nagachika (Tomoyuki Chikanaga) via ruby-core
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).