ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:119621] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
@ 2024-10-25 22:05 adrienjarthon (Adrien Jarthon) via ruby-core
  2024-10-26  9:28 ` [ruby-core:119626] " nobu (Nobuyoshi Nakada) via ruby-core
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: adrienjarthon (Adrien Jarthon) via ruby-core @ 2024-10-25 22:05 UTC (permalink / raw)
  To: ruby-core; +Cc: adrienjarthon (Adrien Jarthon)

Issue #20816 has been reported by adrienjarthon (Adrien Jarthon).

----------------------------------------
Bug #20816: Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
https://bugs.ruby-lang.org/issues/20816

* Author: adrienjarthon (Adrien Jarthon)
* Status: Open
* ruby -v: 3.3.5
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Ok I'm sorry in advance if this is not really a regression (maybe it's the cost to pay for other improvements) but it still showed as a 30% performance drop in my benchmark so I think it may be worth a look.

For the full story about how I came down this rabbit hole I just wrote a blog post to avoid cluttering the issue: https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit (please read it first)

Here I'm gonna put my last findings and the smallest reproducible example I was able to write.

So basically I noticed that quick syscalls (at least with a good SSD) like `File.exists?` or `File.mtime`, when used in a multi-threaded environement (Sidekiq in my initial testing but I reduced it to a few Thread.new) are significantly slower in 3.3.x (tested with the same results on 3.3.0, 3.3.2 and 3.3.5 for the moment) in comparison with 3.1.2 or 3.2.5: about twice as slow.

These syscalls are heavily used by the Rails Reloader which is how it had such a huge impact (30%) in my testing.

Here is the small reproduction script I wrote:

``` ruby
#!/usr/bin/env ruby

GC.disable # just to make sure it doesn't skew results
THREADS = (ARGV.first || 10).to_i  # choose thread count from CLI
N = 10_000_000 # to make the test longer or shorter

puts "Ruby #{RUBY_VERSION} / YJIT: #{RubyVM::YJIT.enabled?} / #{THREADS} Threads"

Array.new(THREADS).map do
  Thread.new do
    (N/THREADS).times { File.mtime(__FILE__) }
  end
end.each(&:join)

puts "#{N} calls completed"
```

The goal is just to run a lot of fast syscalls, in various number of threads, and here are the numbers on my machine:
|Threads\Ruby|3.1.2 |3.3.5 |
|--|--|--|
|1 | 13.7s | ✅ 14.2s |
|2 | 9.7s | ⚠️ 33.4s |
|4 | 23.2s | ⚠️ 37.0s |
|8 | 21.8s | ⚠️ 39.7s |
|16 | 23.1s | ⚠️ 41.5s |

(more numbers in the [blog post](https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit))

I suspected the [new N:M thread scheduler introduced in 3.3](https://github.com/ruby/ruby/pull/8629), even though it's "disabled" by default on the main Ractor, thinking maybe these changes degraded a bit this (quite specific scenario). So I gave a quick try with the feature ENABLED, to see if it would at least change anything and it did ! (though not was I expected):

|Threads\Ruby|3.1.2 |3.3.5 | 3.3.5 with RUBY_MN_THREADS=1 |
|--|--|--|--|
|1 | 13.7s | 14.2s | 14.5s ✅ |
|2 | 9.7s | ⚠️ 33.4s | 12.3s 🎉 |
|4 | 23.2s | ⚠️ 37.0s | 40.5s 🧐 |
|8 | 21.8s | ⚠️ 39.7s | 37.9s 🧐 |
|16 | 23.1s | ⚠️ 41.5s | 36.3s 🧐 |


(I can make graphs for these numbers if it helps)

Now I'm a bit stuck in my investigation, I'm not sure what change in 3.3 caused this (the new thread scheduler is a good lead IMO) or if it even is a regression. Maybe this downside is innevitable for other benefits? I am happy to contribute more time to this issue and perform more tests but I would need some guidance on what could be interesting to test, from people who understand a bit better than me how this works internally.

Thanks ! (and sorry for the long read)



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [ruby-core:119626] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
  2024-10-25 22:05 [ruby-core:119621] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading adrienjarthon (Adrien Jarthon) via ruby-core
@ 2024-10-26  9:28 ` nobu (Nobuyoshi Nakada) via ruby-core
  2024-10-26  9:50 ` [ruby-core:119628] " adrienjarthon (Adrien Jarthon) via ruby-core
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2024-10-26  9:28 UTC (permalink / raw)
  To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)

Issue #20816 has been updated by nobu (Nobuyoshi Nakada).


The result on my machine (Apple M2 Pro, 12 processors).
Note that N=5_000_000, a half of your example.

|Threads\Ruby| 3.1.4  | 3.2.5  | 3.3.5  | 3.4.0  |
|-----------:|-------:|-------:|-------:|-------:|
|          1 |  8.818 |  5.628 |  5.943 |  6.066 |
|          2 | 10.389 |  6.920 | 11.252 | 12.315 |
|          4 | 18.643 | 15.347 | 20.285 | 20.184 |
|          8 | 18.766 | 14.994 | 20.155 | 19.923 |
|         16 | 17.530 | 13.459 | 19.988 | 19.705 |



----------------------------------------
Bug #20816: Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
https://bugs.ruby-lang.org/issues/20816#change-110248

* Author: adrienjarthon (Adrien Jarthon)
* Status: Open
* ruby -v: 3.3.5
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Ok I'm sorry in advance if this is not really a regression (maybe it's the cost to pay for other improvements) but it still showed as a 30% performance drop in my benchmark so I think it may be worth a look.

For the full story about how I came down this rabbit hole I just wrote a blog post to avoid cluttering the issue: https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit (please read it first)

Here I'm gonna put my last findings and the smallest reproducible example I was able to write.

So basically I noticed that quick syscalls (at least with a good SSD) like `File.exists?` or `File.mtime`, when used in a multi-threaded environement (Sidekiq in my initial testing but I reduced it to a few Thread.new) are significantly slower in 3.3.x (tested with the same results on 3.3.0, 3.3.2 and 3.3.5 for the moment) in comparison with 3.1.2 or 3.2.5: about twice as slow.

These syscalls are heavily used by the Rails Reloader which is how it had such a huge impact (30%) in my testing.

Here is the small reproduction script I wrote:

``` ruby
#!/usr/bin/env ruby

GC.disable # just to make sure it doesn't skew results
THREADS = (ARGV.first || 10).to_i  # choose thread count from CLI
N = 10_000_000 # to make the test longer or shorter

puts "Ruby #{RUBY_VERSION} / YJIT: #{RubyVM::YJIT.enabled?} / #{THREADS} Threads"

Array.new(THREADS).map do
  Thread.new do
    (N/THREADS).times { File.mtime(__FILE__) }
  end
end.each(&:join)

puts "#{N} calls completed"
```

The goal is just to run a lot of fast syscalls, in various number of threads, and here are the numbers on my machine:
|Threads\Ruby|3.1.2 |3.3.5 |
|--|--|--|
|1 | 13.7s | ✅ 14.2s |
|2 | 9.7s | ⚠️ 33.4s |
|4 | 23.2s | ⚠️ 37.0s |
|8 | 21.8s | ⚠️ 39.7s |
|16 | 23.1s | ⚠️ 41.5s |

(more numbers in the [blog post](https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit))

I suspected the [new N:M thread scheduler introduced in 3.3](https://github.com/ruby/ruby/pull/8629), even though it's "disabled" by default on the main Ractor, thinking maybe these changes degraded a bit this (quite specific scenario). So I gave a quick try with the feature ENABLED, to see if it would at least change anything and it did ! (though not was I expected):

|Threads\Ruby|3.1.2 |3.3.5 | 3.3.5 with RUBY_MN_THREADS=1 |
|--|--|--|--|
|1 | 13.7s | 14.2s | 14.5s ✅ |
|2 | 9.7s | ⚠️ 33.4s | 12.3s 🎉 |
|4 | 23.2s | ⚠️ 37.0s | 40.5s 🧐 |
|8 | 21.8s | ⚠️ 39.7s | 37.9s 🧐 |
|16 | 23.1s | ⚠️ 41.5s | 36.3s 🧐 |


(I can make graphs for these numbers if it helps)

Now I'm a bit stuck in my investigation, I'm not sure what change in 3.3 caused this (the new thread scheduler is a good lead IMO) or if it even is a regression. Maybe this downside is innevitable for other benefits? I am happy to contribute more time to this issue and perform more tests but I would need some guidance on what could be interesting to test, from people who understand a bit better than me how this works internally.

Thanks ! (and sorry for the long read)



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [ruby-core:119628] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
  2024-10-25 22:05 [ruby-core:119621] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading adrienjarthon (Adrien Jarthon) via ruby-core
  2024-10-26  9:28 ` [ruby-core:119626] " nobu (Nobuyoshi Nakada) via ruby-core
@ 2024-10-26  9:50 ` adrienjarthon (Adrien Jarthon) via ruby-core
  2024-10-26 10:16 ` [ruby-core:119629] " byroot (Jean Boussier) via ruby-core
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: adrienjarthon (Adrien Jarthon) via ruby-core @ 2024-10-26  9:50 UTC (permalink / raw)
  To: ruby-core; +Cc: adrienjarthon (Adrien Jarthon)

Issue #20816 has been updated by adrienjarthon (Adrien Jarthon).


Thanks, these numbers are slightly more "expected" than mine (Ubuntu 24.04), especially for 2 threads, but other that that it looks like the "regression" is also visible on MacOS.
For the record I also tested with ruby master (3.4.0 as of 2024-10-25) but the results were similar to 3.3.5 (like for you).


----------------------------------------
Bug #20816: Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
https://bugs.ruby-lang.org/issues/20816#change-110250

* Author: adrienjarthon (Adrien Jarthon)
* Status: Open
* ruby -v: 3.3.5
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Ok I'm sorry in advance if this is not really a regression (maybe it's the cost to pay for other improvements) but it still showed as a 30% performance drop in my benchmark so I think it may be worth a look.

For the full story about how I came down this rabbit hole I just wrote a blog post to avoid cluttering the issue: https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit (please read it first)

Here I'm gonna put my last findings and the smallest reproducible example I was able to write.

So basically I noticed that quick syscalls (at least with a good SSD) like `File.exists?` or `File.mtime`, when used in a multi-threaded environement (Sidekiq in my initial testing but I reduced it to a few Thread.new) are significantly slower in 3.3.x (tested with the same results on 3.3.0, 3.3.2 and 3.3.5 for the moment) in comparison with 3.1.2 or 3.2.5: about twice as slow.

These syscalls are heavily used by the Rails Reloader which is how it had such a huge impact (30%) in my testing.

Here is the small reproduction script I wrote:

``` ruby
#!/usr/bin/env ruby

GC.disable # just to make sure it doesn't skew results
THREADS = (ARGV.first || 10).to_i  # choose thread count from CLI
N = 10_000_000 # to make the test longer or shorter

puts "Ruby #{RUBY_VERSION} / YJIT: #{RubyVM::YJIT.enabled?} / #{THREADS} Threads"

Array.new(THREADS).map do
  Thread.new do
    (N/THREADS).times { File.mtime(__FILE__) }
  end
end.each(&:join)

puts "#{N} calls completed"
```

The goal is just to run a lot of fast syscalls, in various number of threads, and here are the numbers on my machine:
|Threads\Ruby|3.1.2 |3.3.5 |
|--|--|--|
|1 | 13.7s | ✅ 14.2s |
|2 | 9.7s | ⚠️ 33.4s |
|4 | 23.2s | ⚠️ 37.0s |
|8 | 21.8s | ⚠️ 39.7s |
|16 | 23.1s | ⚠️ 41.5s |

(more numbers in the [blog post](https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit))

I suspected the [new N:M thread scheduler introduced in 3.3](https://github.com/ruby/ruby/pull/8629), even though it's "disabled" by default on the main Ractor, thinking maybe these changes degraded a bit this (quite specific scenario). So I gave a quick try with the feature ENABLED, to see if it would at least change anything and it did ! (though not was I expected):

|Threads\Ruby|3.1.2 |3.3.5 | 3.3.5 with RUBY_MN_THREADS=1 |
|--|--|--|--|
|1 | 13.7s | 14.2s | 14.5s ✅ |
|2 | 9.7s | ⚠️ 33.4s | 12.3s 🎉 |
|4 | 23.2s | ⚠️ 37.0s | 40.5s 🧐 |
|8 | 21.8s | ⚠️ 39.7s | 37.9s 🧐 |
|16 | 23.1s | ⚠️ 41.5s | 36.3s 🧐 |


(I can make graphs for these numbers if it helps)

Now I'm a bit stuck in my investigation, I'm not sure what change in 3.3 caused this (the new thread scheduler is a good lead IMO) or if it even is a regression. Maybe this downside is innevitable for other benefits? I am happy to contribute more time to this issue and perform more tests but I would need some guidance on what could be interesting to test, from people who understand a bit better than me how this works internally.

Thanks ! (and sorry for the long read)



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [ruby-core:119629] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
  2024-10-25 22:05 [ruby-core:119621] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading adrienjarthon (Adrien Jarthon) via ruby-core
  2024-10-26  9:28 ` [ruby-core:119626] " nobu (Nobuyoshi Nakada) via ruby-core
  2024-10-26  9:50 ` [ruby-core:119628] " adrienjarthon (Adrien Jarthon) via ruby-core
@ 2024-10-26 10:16 ` byroot (Jean Boussier) via ruby-core
  2025-01-13 22:37 ` [ruby-core:120641] " luke-gru (Luke Gruber) via ruby-core
  2025-01-14  8:27 ` [ruby-core:120659] " byroot (Jean Boussier) via ruby-core
  4 siblings, 0 replies; 6+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-10-26 10:16 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20816 has been updated by byroot (Jean Boussier).

File Capture d’écran 2024-10-26 à 12.10.55.png added
File Capture d’écran 2024-10-26 à 12.10.45.png added
File Capture d’écran 2024-10-26 à 12.14.51.png added
File Capture d’écran 2024-10-26 à 12.14.44.png added
Backport changed from 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN to 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED

I can also repro on macOS. Comparing profiles between 3.2 and 3.3, in both case the overwhelming majority of the time is spent in the thread scheduler.

The one big difference I can see is that 3.2 seem to use `poll` to wait on threads, while 3.3 uses `pthread_cond_wait`.

----------------------------------------
Bug #20816: Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
https://bugs.ruby-lang.org/issues/20816#change-110251

* Author: adrienjarthon (Adrien Jarthon)
* Status: Open
* ruby -v: 3.3.5
* Backport: 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED
----------------------------------------
Ok I'm sorry in advance if this is not really a regression (maybe it's the cost to pay for other improvements) but it still showed as a 30% performance drop in my benchmark so I think it may be worth a look.

For the full story about how I came down this rabbit hole I just wrote a blog post to avoid cluttering the issue: https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit (please read it first)

Here I'm gonna put my last findings and the smallest reproducible example I was able to write.

So basically I noticed that quick syscalls (at least with a good SSD) like `File.exists?` or `File.mtime`, when used in a multi-threaded environement (Sidekiq in my initial testing but I reduced it to a few Thread.new) are significantly slower in 3.3.x (tested with the same results on 3.3.0, 3.3.2 and 3.3.5 for the moment) in comparison with 3.1.2 or 3.2.5: about twice as slow.

These syscalls are heavily used by the Rails Reloader which is how it had such a huge impact (30%) in my testing.

Here is the small reproduction script I wrote:

``` ruby
#!/usr/bin/env ruby

GC.disable # just to make sure it doesn't skew results
THREADS = (ARGV.first || 10).to_i  # choose thread count from CLI
N = 10_000_000 # to make the test longer or shorter

puts "Ruby #{RUBY_VERSION} / YJIT: #{RubyVM::YJIT.enabled?} / #{THREADS} Threads"

Array.new(THREADS).map do
  Thread.new do
    (N/THREADS).times { File.mtime(__FILE__) }
  end
end.each(&:join)

puts "#{N} calls completed"
```

The goal is just to run a lot of fast syscalls, in various number of threads, and here are the numbers on my machine:
|Threads\Ruby|3.1.2 |3.3.5 |
|--|--|--|
|1 | 13.7s | ✅ 14.2s |
|2 | 9.7s | ⚠️ 33.4s |
|4 | 23.2s | ⚠️ 37.0s |
|8 | 21.8s | ⚠️ 39.7s |
|16 | 23.1s | ⚠️ 41.5s |

(more numbers in the [blog post](https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit))

I suspected the [new N:M thread scheduler introduced in 3.3](https://github.com/ruby/ruby/pull/8629), even though it's "disabled" by default on the main Ractor, thinking maybe these changes degraded a bit this (quite specific scenario). So I gave a quick try with the feature ENABLED, to see if it would at least change anything and it did ! (though not was I expected):

|Threads\Ruby|3.1.2 |3.3.5 | 3.3.5 with RUBY_MN_THREADS=1 |
|--|--|--|--|
|1 | 13.7s | 14.2s | 14.5s ✅ |
|2 | 9.7s | ⚠️ 33.4s | 12.3s 🎉 |
|4 | 23.2s | ⚠️ 37.0s | 40.5s 🧐 |
|8 | 21.8s | ⚠️ 39.7s | 37.9s 🧐 |
|16 | 23.1s | ⚠️ 41.5s | 36.3s 🧐 |


(I can make graphs for these numbers if it helps)

Now I'm a bit stuck in my investigation, I'm not sure what change in 3.3 caused this (the new thread scheduler is a good lead IMO) or if it even is a regression. Maybe this downside is innevitable for other benefits? I am happy to contribute more time to this issue and perform more tests but I would need some guidance on what could be interesting to test, from people who understand a bit better than me how this works internally.

Thanks ! (and sorry for the long read)

---Files--------------------------------
Capture d’écran 2024-10-26 à 12.10.45.png (133 KB)
Capture d’écran 2024-10-26 à 12.10.55.png (109 KB)
Capture d’écran 2024-10-26 à 12.14.44.png (127 KB)
Capture d’écran 2024-10-26 à 12.14.51.png (139 KB)


-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [ruby-core:120641] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
  2024-10-25 22:05 [ruby-core:119621] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading adrienjarthon (Adrien Jarthon) via ruby-core
                   ` (2 preceding siblings ...)
  2024-10-26 10:16 ` [ruby-core:119629] " byroot (Jean Boussier) via ruby-core
@ 2025-01-13 22:37 ` luke-gru (Luke Gruber) via ruby-core
  2025-01-14  8:27 ` [ruby-core:120659] " byroot (Jean Boussier) via ruby-core
  4 siblings, 0 replies; 6+ messages in thread
From: luke-gru (Luke Gruber) via ruby-core @ 2025-01-13 22:37 UTC (permalink / raw)
  To: ruby-core; +Cc: luke-gru (Luke Gruber)

Issue #20816 has been updated by luke-gru (Luke Gruber).


I think this is due to the refactor for M:N threads. In the 3.2 release, which is fast even though it still uses condition variables in `thread_sched_to_running_common`, the changing of threads was never forced during a blocking operation.

That is, in BLOCKING_REGION (`thread.c`), it goes:

1. `thread_sched_to_waiting(th)` -- this releases per-ractor GVL (`TH_SCHED(th)->running` changes from current ruby thread (`GET_THREAD()`) to NULL). Then, signal the next ruby thread in this ractor's readyq, if any.
2. execute blocking operation, like `File.mtime`
3. `thread_sched_to_running(th)` -- Blocks until another thread signals it if another thread is running and has the GVL, otherwise continues without blocking

Lots of times, since #2 can be so fast, when we get to #3, `sched->running` is still NULL from #1 because the other thread we signalled hasn't woken up yet and taken control of `sched->running`. The OS scheduler hasn't switched, so there are less context switches. I see this in some custom printf logging I made, where 1 thread runs many iterations of the syscall, then another thread runs many, etc.

In the new system,

1. Similar to before, but if there's a next thread ready (in readyq) we actually force it to hold the GVL (we set `sched->running = next_th`), then we signal it's condvar telling it to run
2. Execute blocking operation
3. Put ourselves in the back of the readyq, and wait until we're signalled

So in summary, there is more context switching, which is the main slowdown I believe. There is also more overhead now due to timeslice bookkeeping which I think can be reduced.

The new ractor/thread scheduling system is more complex than the old one, but I think we can fix this issue. I read that Go does some fancy things with fast syscalls where if a known-fast syscall is made, the goroutine scheduler doesn't signal the next goroutine but posts to a monitoring thread with a small timeout value and if this value is exceeded by the syscall then the next goroutine is scheduled, otherwise it never signals the next goroutine at all, takes itself off the monitoring thread and continues. There's some information here about that: https://utcc.utoronto.ca/~cks/space/blog/programming/GoSchedulerAndSyscalls.

We could do something similar to Go or try to incorporate more of how it used to work into the new system. I think doing something similar to Go makes sense with how the new scheduler works.


----------------------------------------
Bug #20816: Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
https://bugs.ruby-lang.org/issues/20816#change-111470

* Author: adrienjarthon (Adrien Jarthon)
* Status: Open
* ruby -v: 3.3.5
* Backport: 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED
----------------------------------------
Ok I'm sorry in advance if this is not really a regression (maybe it's the cost to pay for other improvements) but it still showed as a 30% performance drop in my benchmark so I think it may be worth a look.

For the full story about how I came down this rabbit hole I just wrote a blog post to avoid cluttering the issue: https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit (please read it first)

Here I'm gonna put my last findings and the smallest reproducible example I was able to write.

So basically I noticed that quick syscalls (at least with a good SSD) like `File.exists?` or `File.mtime`, when used in a multi-threaded environement (Sidekiq in my initial testing but I reduced it to a few Thread.new) are significantly slower in 3.3.x (tested with the same results on 3.3.0, 3.3.2 and 3.3.5 for the moment) in comparison with 3.1.2 or 3.2.5: about twice as slow.

These syscalls are heavily used by the Rails Reloader which is how it had such a huge impact (30%) in my testing.

Here is the small reproduction script I wrote:

``` ruby
#!/usr/bin/env ruby

GC.disable # just to make sure it doesn't skew results
THREADS = (ARGV.first || 10).to_i  # choose thread count from CLI
N = 10_000_000 # to make the test longer or shorter

puts "Ruby #{RUBY_VERSION} / YJIT: #{RubyVM::YJIT.enabled?} / #{THREADS} Threads"

Array.new(THREADS).map do
  Thread.new do
    (N/THREADS).times { File.mtime(__FILE__) }
  end
end.each(&:join)

puts "#{N} calls completed"
```

The goal is just to run a lot of fast syscalls, in various number of threads, and here are the numbers on my machine:
|Threads\Ruby|3.1.2 |3.3.5 |
|--|--|--|
|1 | 13.7s | ✅ 14.2s |
|2 | 9.7s | ⚠️ 33.4s |
|4 | 23.2s | ⚠️ 37.0s |
|8 | 21.8s | ⚠️ 39.7s |
|16 | 23.1s | ⚠️ 41.5s |

(more numbers in the [blog post](https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit))

I suspected the [new N:M thread scheduler introduced in 3.3](https://github.com/ruby/ruby/pull/8629), even though it's "disabled" by default on the main Ractor, thinking maybe these changes degraded a bit this (quite specific scenario). So I gave a quick try with the feature ENABLED, to see if it would at least change anything and it did ! (though not was I expected):

|Threads\Ruby|3.1.2 |3.3.5 | 3.3.5 with RUBY_MN_THREADS=1 |
|--|--|--|--|
|1 | 13.7s | 14.2s | 14.5s ✅ |
|2 | 9.7s | ⚠️ 33.4s | 12.3s 🎉 |
|4 | 23.2s | ⚠️ 37.0s | 40.5s 🧐 |
|8 | 21.8s | ⚠️ 39.7s | 37.9s 🧐 |
|16 | 23.1s | ⚠️ 41.5s | 36.3s 🧐 |


(I can make graphs for these numbers if it helps)

Now I'm a bit stuck in my investigation, I'm not sure what change in 3.3 caused this (the new thread scheduler is a good lead IMO) or if it even is a regression. Maybe this downside is innevitable for other benefits? I am happy to contribute more time to this issue and perform more tests but I would need some guidance on what could be interesting to test, from people who understand a bit better than me how this works internally.

Thanks ! (and sorry for the long read)

---Files--------------------------------
Capture d’écran 2024-10-26 à 12.10.45.png (133 KB)
Capture d’écran 2024-10-26 à 12.10.55.png (109 KB)
Capture d’écran 2024-10-26 à 12.14.44.png (127 KB)
Capture d’écran 2024-10-26 à 12.14.51.png (139 KB)


-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [ruby-core:120659] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
  2024-10-25 22:05 [ruby-core:119621] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading adrienjarthon (Adrien Jarthon) via ruby-core
                   ` (3 preceding siblings ...)
  2025-01-13 22:37 ` [ruby-core:120641] " luke-gru (Luke Gruber) via ruby-core
@ 2025-01-14  8:27 ` byroot (Jean Boussier) via ruby-core
  4 siblings, 0 replies; 6+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2025-01-14  8:27 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20816 has been updated by byroot (Jean Boussier).

Assignee set to ko1 (Koichi Sasada)

@ko1 would input would be welcome on this issue.

----------------------------------------
Bug #20816: Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
https://bugs.ruby-lang.org/issues/20816#change-111485

* Author: adrienjarthon (Adrien Jarthon)
* Status: Open
* Assignee: ko1 (Koichi Sasada)
* ruby -v: 3.3.5
* Backport: 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED
----------------------------------------
Ok I'm sorry in advance if this is not really a regression (maybe it's the cost to pay for other improvements) but it still showed as a 30% performance drop in my benchmark so I think it may be worth a look.

For the full story about how I came down this rabbit hole I just wrote a blog post to avoid cluttering the issue: https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit (please read it first)

Here I'm gonna put my last findings and the smallest reproducible example I was able to write.

So basically I noticed that quick syscalls (at least with a good SSD) like `File.exists?` or `File.mtime`, when used in a multi-threaded environement (Sidekiq in my initial testing but I reduced it to a few Thread.new) are significantly slower in 3.3.x (tested with the same results on 3.3.0, 3.3.2 and 3.3.5 for the moment) in comparison with 3.1.2 or 3.2.5: about twice as slow.

These syscalls are heavily used by the Rails Reloader which is how it had such a huge impact (30%) in my testing.

Here is the small reproduction script I wrote:

``` ruby
#!/usr/bin/env ruby

GC.disable # just to make sure it doesn't skew results
THREADS = (ARGV.first || 10).to_i  # choose thread count from CLI
N = 10_000_000 # to make the test longer or shorter

puts "Ruby #{RUBY_VERSION} / YJIT: #{RubyVM::YJIT.enabled?} / #{THREADS} Threads"

Array.new(THREADS).map do
  Thread.new do
    (N/THREADS).times { File.mtime(__FILE__) }
  end
end.each(&:join)

puts "#{N} calls completed"
```

The goal is just to run a lot of fast syscalls, in various number of threads, and here are the numbers on my machine:
|Threads\Ruby|3.1.2 |3.3.5 |
|--|--|--|
|1 | 13.7s | ✅ 14.2s |
|2 | 9.7s | ⚠️ 33.4s |
|4 | 23.2s | ⚠️ 37.0s |
|8 | 21.8s | ⚠️ 39.7s |
|16 | 23.1s | ⚠️ 41.5s |

(more numbers in the [blog post](https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit))

I suspected the [new N:M thread scheduler introduced in 3.3](https://github.com/ruby/ruby/pull/8629), even though it's "disabled" by default on the main Ractor, thinking maybe these changes degraded a bit this (quite specific scenario). So I gave a quick try with the feature ENABLED, to see if it would at least change anything and it did ! (though not was I expected):

|Threads\Ruby|3.1.2 |3.3.5 | 3.3.5 with RUBY_MN_THREADS=1 |
|--|--|--|--|
|1 | 13.7s | 14.2s | 14.5s ✅ |
|2 | 9.7s | ⚠️ 33.4s | 12.3s 🎉 |
|4 | 23.2s | ⚠️ 37.0s | 40.5s 🧐 |
|8 | 21.8s | ⚠️ 39.7s | 37.9s 🧐 |
|16 | 23.1s | ⚠️ 41.5s | 36.3s 🧐 |


(I can make graphs for these numbers if it helps)

Now I'm a bit stuck in my investigation, I'm not sure what change in 3.3 caused this (the new thread scheduler is a good lead IMO) or if it even is a regression. Maybe this downside is innevitable for other benefits? I am happy to contribute more time to this issue and perform more tests but I would need some guidance on what could be interesting to test, from people who understand a bit better than me how this works internally.

Thanks ! (and sorry for the long read)

---Files--------------------------------
Capture d’écran 2024-10-26 à 12.10.45.png (133 KB)
Capture d’écran 2024-10-26 à 12.10.55.png (109 KB)
Capture d’écran 2024-10-26 à 12.14.44.png (127 KB)
Capture d’écran 2024-10-26 à 12.14.51.png (139 KB)


-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-01-14  8:27 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-25 22:05 [ruby-core:119621] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading adrienjarthon (Adrien Jarthon) via ruby-core
2024-10-26  9:28 ` [ruby-core:119626] " nobu (Nobuyoshi Nakada) via ruby-core
2024-10-26  9:50 ` [ruby-core:119628] " adrienjarthon (Adrien Jarthon) via ruby-core
2024-10-26 10:16 ` [ruby-core:119629] " byroot (Jean Boussier) via ruby-core
2025-01-13 22:37 ` [ruby-core:120641] " luke-gru (Luke Gruber) via ruby-core
2025-01-14  8:27 ` [ruby-core:120659] " byroot (Jean Boussier) via ruby-core

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).