From: "luke-gru (Luke Gruber) via ruby-core" <ruby-core@ml.ruby-lang.org>
To: ruby-core@ml.ruby-lang.org
Cc: "luke-gru (Luke Gruber)" <noreply@ruby-lang.org>
Subject: [ruby-core:120641] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
Date: Mon, 13 Jan 2025 22:37:08 +0000 (UTC) [thread overview]
Message-ID: <redmine.journal-111470.20250113223708.13774@ruby-lang.org> (raw)
In-Reply-To: <redmine.issue-20816.20241025220558.13774@ruby-lang.org>
Issue #20816 has been updated by luke-gru (Luke Gruber).
I think this is due to the refactor for M:N threads. In the 3.2 release, which is fast even though it still uses condition variables in `thread_sched_to_running_common`, the changing of threads was never forced during a blocking operation.
That is, in BLOCKING_REGION (`thread.c`), it goes:
1. `thread_sched_to_waiting(th)` -- this releases per-ractor GVL (`TH_SCHED(th)->running` changes from current ruby thread (`GET_THREAD()`) to NULL). Then, signal the next ruby thread in this ractor's readyq, if any.
2. execute blocking operation, like `File.mtime`
3. `thread_sched_to_running(th)` -- Blocks until another thread signals it if another thread is running and has the GVL, otherwise continues without blocking
Lots of times, since #2 can be so fast, when we get to #3, `sched->running` is still NULL from #1 because the other thread we signalled hasn't woken up yet and taken control of `sched->running`. The OS scheduler hasn't switched, so there are less context switches. I see this in some custom printf logging I made, where 1 thread runs many iterations of the syscall, then another thread runs many, etc.
In the new system,
1. Similar to before, but if there's a next thread ready (in readyq) we actually force it to hold the GVL (we set `sched->running = next_th`), then we signal it's condvar telling it to run
2. Execute blocking operation
3. Put ourselves in the back of the readyq, and wait until we're signalled
So in summary, there is more context switching, which is the main slowdown I believe. There is also more overhead now due to timeslice bookkeeping which I think can be reduced.
The new ractor/thread scheduling system is more complex than the old one, but I think we can fix this issue. I read that Go does some fancy things with fast syscalls where if a known-fast syscall is made, the goroutine scheduler doesn't signal the next goroutine but posts to a monitoring thread with a small timeout value and if this value is exceeded by the syscall then the next goroutine is scheduled, otherwise it never signals the next goroutine at all, takes itself off the monitoring thread and continues. There's some information here about that: https://utcc.utoronto.ca/~cks/space/blog/programming/GoSchedulerAndSyscalls.
We could do something similar to Go or try to incorporate more of how it used to work into the new system. I think doing something similar to Go makes sense with how the new scheduler works.
----------------------------------------
Bug #20816: Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
https://bugs.ruby-lang.org/issues/20816#change-111470
* Author: adrienjarthon (Adrien Jarthon)
* Status: Open
* ruby -v: 3.3.5
* Backport: 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED
----------------------------------------
Ok I'm sorry in advance if this is not really a regression (maybe it's the cost to pay for other improvements) but it still showed as a 30% performance drop in my benchmark so I think it may be worth a look.
For the full story about how I came down this rabbit hole I just wrote a blog post to avoid cluttering the issue: https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit (please read it first)
Here I'm gonna put my last findings and the smallest reproducible example I was able to write.
So basically I noticed that quick syscalls (at least with a good SSD) like `File.exists?` or `File.mtime`, when used in a multi-threaded environement (Sidekiq in my initial testing but I reduced it to a few Thread.new) are significantly slower in 3.3.x (tested with the same results on 3.3.0, 3.3.2 and 3.3.5 for the moment) in comparison with 3.1.2 or 3.2.5: about twice as slow.
These syscalls are heavily used by the Rails Reloader which is how it had such a huge impact (30%) in my testing.
Here is the small reproduction script I wrote:
``` ruby
#!/usr/bin/env ruby
GC.disable # just to make sure it doesn't skew results
THREADS = (ARGV.first || 10).to_i # choose thread count from CLI
N = 10_000_000 # to make the test longer or shorter
puts "Ruby #{RUBY_VERSION} / YJIT: #{RubyVM::YJIT.enabled?} / #{THREADS} Threads"
Array.new(THREADS).map do
Thread.new do
(N/THREADS).times { File.mtime(__FILE__) }
end
end.each(&:join)
puts "#{N} calls completed"
```
The goal is just to run a lot of fast syscalls, in various number of threads, and here are the numbers on my machine:
|Threads\Ruby|3.1.2 |3.3.5 |
|--|--|--|
|1 | 13.7s | ✅ 14.2s |
|2 | 9.7s | ⚠️ 33.4s |
|4 | 23.2s | ⚠️ 37.0s |
|8 | 21.8s | ⚠️ 39.7s |
|16 | 23.1s | ⚠️ 41.5s |
(more numbers in the [blog post](https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit))
I suspected the [new N:M thread scheduler introduced in 3.3](https://github.com/ruby/ruby/pull/8629), even though it's "disabled" by default on the main Ractor, thinking maybe these changes degraded a bit this (quite specific scenario). So I gave a quick try with the feature ENABLED, to see if it would at least change anything and it did ! (though not was I expected):
|Threads\Ruby|3.1.2 |3.3.5 | 3.3.5 with RUBY_MN_THREADS=1 |
|--|--|--|--|
|1 | 13.7s | 14.2s | 14.5s ✅ |
|2 | 9.7s | ⚠️ 33.4s | 12.3s 🎉 |
|4 | 23.2s | ⚠️ 37.0s | 40.5s 🧐 |
|8 | 21.8s | ⚠️ 39.7s | 37.9s 🧐 |
|16 | 23.1s | ⚠️ 41.5s | 36.3s 🧐 |
(I can make graphs for these numbers if it helps)
Now I'm a bit stuck in my investigation, I'm not sure what change in 3.3 caused this (the new thread scheduler is a good lead IMO) or if it even is a regression. Maybe this downside is innevitable for other benefits? I am happy to contribute more time to this issue and perform more tests but I would need some guidance on what could be interesting to test, from people who understand a bit better than me how this works internally.
Thanks ! (and sorry for the long read)
---Files--------------------------------
Capture d’écran 2024-10-26 à 12.10.45.png (133 KB)
Capture d’écran 2024-10-26 à 12.10.55.png (109 KB)
Capture d’écran 2024-10-26 à 12.14.44.png (127 KB)
Capture d’écran 2024-10-26 à 12.14.51.png (139 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
next prev parent reply other threads:[~2025-01-13 22:37 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-25 22:05 [ruby-core:119621] " adrienjarthon (Adrien Jarthon) via ruby-core
2024-10-26 9:28 ` [ruby-core:119626] " nobu (Nobuyoshi Nakada) via ruby-core
2024-10-26 9:50 ` [ruby-core:119628] " adrienjarthon (Adrien Jarthon) via ruby-core
2024-10-26 10:16 ` [ruby-core:119629] " byroot (Jean Boussier) via ruby-core
2025-01-13 22:37 ` luke-gru (Luke Gruber) via ruby-core [this message]
2025-01-14 8:27 ` [ruby-core:120659] " byroot (Jean Boussier) via ruby-core
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=redmine.journal-111470.20250113223708.13774@ruby-lang.org \
--to=ruby-core@ml.ruby-lang.org \
--cc=noreply@ruby-lang.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).