ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:119621] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
@ 2024-10-25 22:05 adrienjarthon (Adrien Jarthon) via ruby-core
  2024-10-26  9:28 ` [ruby-core:119626] " nobu (Nobuyoshi Nakada) via ruby-core
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: adrienjarthon (Adrien Jarthon) via ruby-core @ 2024-10-25 22:05 UTC (permalink / raw)
  To: ruby-core; +Cc: adrienjarthon (Adrien Jarthon)

Issue #20816 has been reported by adrienjarthon (Adrien Jarthon).

----------------------------------------
Bug #20816: Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
https://bugs.ruby-lang.org/issues/20816

* Author: adrienjarthon (Adrien Jarthon)
* Status: Open
* ruby -v: 3.3.5
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Ok I'm sorry in advance if this is not really a regression (maybe it's the cost to pay for other improvements) but it still showed as a 30% performance drop in my benchmark so I think it may be worth a look.

For the full story about how I came down this rabbit hole I just wrote a blog post to avoid cluttering the issue: https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit (please read it first)

Here I'm gonna put my last findings and the smallest reproducible example I was able to write.

So basically I noticed that quick syscalls (at least with a good SSD) like `File.exists?` or `File.mtime`, when used in a multi-threaded environement (Sidekiq in my initial testing but I reduced it to a few Thread.new) are significantly slower in 3.3.x (tested with the same results on 3.3.0, 3.3.2 and 3.3.5 for the moment) in comparison with 3.1.2 or 3.2.5: about twice as slow.

These syscalls are heavily used by the Rails Reloader which is how it had such a huge impact (30%) in my testing.

Here is the small reproduction script I wrote:

``` ruby
#!/usr/bin/env ruby

GC.disable # just to make sure it doesn't skew results
THREADS = (ARGV.first || 10).to_i  # choose thread count from CLI
N = 10_000_000 # to make the test longer or shorter

puts "Ruby #{RUBY_VERSION} / YJIT: #{RubyVM::YJIT.enabled?} / #{THREADS} Threads"

Array.new(THREADS).map do
  Thread.new do
    (N/THREADS).times { File.mtime(__FILE__) }
  end
end.each(&:join)

puts "#{N} calls completed"
```

The goal is just to run a lot of fast syscalls, in various number of threads, and here are the numbers on my machine:
|Threads\Ruby|3.1.2 |3.3.5 |
|--|--|--|
|1 | 13.7s | ✅ 14.2s |
|2 | 9.7s | ⚠️ 33.4s |
|4 | 23.2s | ⚠️ 37.0s |
|8 | 21.8s | ⚠️ 39.7s |
|16 | 23.1s | ⚠️ 41.5s |

(more numbers in the [blog post](https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit))

I suspected the [new N:M thread scheduler introduced in 3.3](https://github.com/ruby/ruby/pull/8629), even though it's "disabled" by default on the main Ractor, thinking maybe these changes degraded a bit this (quite specific scenario). So I gave a quick try with the feature ENABLED, to see if it would at least change anything and it did ! (though not was I expected):

|Threads\Ruby|3.1.2 |3.3.5 | 3.3.5 with RUBY_MN_THREADS=1 |
|--|--|--|--|
|1 | 13.7s | 14.2s | 14.5s ✅ |
|2 | 9.7s | ⚠️ 33.4s | 12.3s 🎉 |
|4 | 23.2s | ⚠️ 37.0s | 40.5s 🧐 |
|8 | 21.8s | ⚠️ 39.7s | 37.9s 🧐 |
|16 | 23.1s | ⚠️ 41.5s | 36.3s 🧐 |


(I can make graphs for these numbers if it helps)

Now I'm a bit stuck in my investigation, I'm not sure what change in 3.3 caused this (the new thread scheduler is a good lead IMO) or if it even is a regression. Maybe this downside is innevitable for other benefits? I am happy to contribute more time to this issue and perform more tests but I would need some guidance on what could be interesting to test, from people who understand a bit better than me how this works internally.

Thanks ! (and sorry for the long read)



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [ruby-core:119626] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
  2024-10-25 22:05 [ruby-core:119621] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading adrienjarthon (Adrien Jarthon) via ruby-core
@ 2024-10-26  9:28 ` nobu (Nobuyoshi Nakada) via ruby-core
  2024-10-26  9:50 ` [ruby-core:119628] " adrienjarthon (Adrien Jarthon) via ruby-core
  2024-10-26 10:16 ` [ruby-core:119629] " byroot (Jean Boussier) via ruby-core
  2 siblings, 0 replies; 4+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2024-10-26  9:28 UTC (permalink / raw)
  To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)

Issue #20816 has been updated by nobu (Nobuyoshi Nakada).


The result on my machine (Apple M2 Pro, 12 processors).
Note that N=5_000_000, a half of your example.

|Threads\Ruby| 3.1.4  | 3.2.5  | 3.3.5  | 3.4.0  |
|-----------:|-------:|-------:|-------:|-------:|
|          1 |  8.818 |  5.628 |  5.943 |  6.066 |
|          2 | 10.389 |  6.920 | 11.252 | 12.315 |
|          4 | 18.643 | 15.347 | 20.285 | 20.184 |
|          8 | 18.766 | 14.994 | 20.155 | 19.923 |
|         16 | 17.530 | 13.459 | 19.988 | 19.705 |



----------------------------------------
Bug #20816: Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
https://bugs.ruby-lang.org/issues/20816#change-110248

* Author: adrienjarthon (Adrien Jarthon)
* Status: Open
* ruby -v: 3.3.5
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Ok I'm sorry in advance if this is not really a regression (maybe it's the cost to pay for other improvements) but it still showed as a 30% performance drop in my benchmark so I think it may be worth a look.

For the full story about how I came down this rabbit hole I just wrote a blog post to avoid cluttering the issue: https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit (please read it first)

Here I'm gonna put my last findings and the smallest reproducible example I was able to write.

So basically I noticed that quick syscalls (at least with a good SSD) like `File.exists?` or `File.mtime`, when used in a multi-threaded environement (Sidekiq in my initial testing but I reduced it to a few Thread.new) are significantly slower in 3.3.x (tested with the same results on 3.3.0, 3.3.2 and 3.3.5 for the moment) in comparison with 3.1.2 or 3.2.5: about twice as slow.

These syscalls are heavily used by the Rails Reloader which is how it had such a huge impact (30%) in my testing.

Here is the small reproduction script I wrote:

``` ruby
#!/usr/bin/env ruby

GC.disable # just to make sure it doesn't skew results
THREADS = (ARGV.first || 10).to_i  # choose thread count from CLI
N = 10_000_000 # to make the test longer or shorter

puts "Ruby #{RUBY_VERSION} / YJIT: #{RubyVM::YJIT.enabled?} / #{THREADS} Threads"

Array.new(THREADS).map do
  Thread.new do
    (N/THREADS).times { File.mtime(__FILE__) }
  end
end.each(&:join)

puts "#{N} calls completed"
```

The goal is just to run a lot of fast syscalls, in various number of threads, and here are the numbers on my machine:
|Threads\Ruby|3.1.2 |3.3.5 |
|--|--|--|
|1 | 13.7s | ✅ 14.2s |
|2 | 9.7s | ⚠️ 33.4s |
|4 | 23.2s | ⚠️ 37.0s |
|8 | 21.8s | ⚠️ 39.7s |
|16 | 23.1s | ⚠️ 41.5s |

(more numbers in the [blog post](https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit))

I suspected the [new N:M thread scheduler introduced in 3.3](https://github.com/ruby/ruby/pull/8629), even though it's "disabled" by default on the main Ractor, thinking maybe these changes degraded a bit this (quite specific scenario). So I gave a quick try with the feature ENABLED, to see if it would at least change anything and it did ! (though not was I expected):

|Threads\Ruby|3.1.2 |3.3.5 | 3.3.5 with RUBY_MN_THREADS=1 |
|--|--|--|--|
|1 | 13.7s | 14.2s | 14.5s ✅ |
|2 | 9.7s | ⚠️ 33.4s | 12.3s 🎉 |
|4 | 23.2s | ⚠️ 37.0s | 40.5s 🧐 |
|8 | 21.8s | ⚠️ 39.7s | 37.9s 🧐 |
|16 | 23.1s | ⚠️ 41.5s | 36.3s 🧐 |


(I can make graphs for these numbers if it helps)

Now I'm a bit stuck in my investigation, I'm not sure what change in 3.3 caused this (the new thread scheduler is a good lead IMO) or if it even is a regression. Maybe this downside is innevitable for other benefits? I am happy to contribute more time to this issue and perform more tests but I would need some guidance on what could be interesting to test, from people who understand a bit better than me how this works internally.

Thanks ! (and sorry for the long read)



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [ruby-core:119628] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
  2024-10-25 22:05 [ruby-core:119621] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading adrienjarthon (Adrien Jarthon) via ruby-core
  2024-10-26  9:28 ` [ruby-core:119626] " nobu (Nobuyoshi Nakada) via ruby-core
@ 2024-10-26  9:50 ` adrienjarthon (Adrien Jarthon) via ruby-core
  2024-10-26 10:16 ` [ruby-core:119629] " byroot (Jean Boussier) via ruby-core
  2 siblings, 0 replies; 4+ messages in thread
From: adrienjarthon (Adrien Jarthon) via ruby-core @ 2024-10-26  9:50 UTC (permalink / raw)
  To: ruby-core; +Cc: adrienjarthon (Adrien Jarthon)

Issue #20816 has been updated by adrienjarthon (Adrien Jarthon).


Thanks, these numbers are slightly more "expected" than mine (Ubuntu 24.04), especially for 2 threads, but other that that it looks like the "regression" is also visible on MacOS.
For the record I also tested with ruby master (3.4.0 as of 2024-10-25) but the results were similar to 3.3.5 (like for you).


----------------------------------------
Bug #20816: Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
https://bugs.ruby-lang.org/issues/20816#change-110250

* Author: adrienjarthon (Adrien Jarthon)
* Status: Open
* ruby -v: 3.3.5
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Ok I'm sorry in advance if this is not really a regression (maybe it's the cost to pay for other improvements) but it still showed as a 30% performance drop in my benchmark so I think it may be worth a look.

For the full story about how I came down this rabbit hole I just wrote a blog post to avoid cluttering the issue: https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit (please read it first)

Here I'm gonna put my last findings and the smallest reproducible example I was able to write.

So basically I noticed that quick syscalls (at least with a good SSD) like `File.exists?` or `File.mtime`, when used in a multi-threaded environement (Sidekiq in my initial testing but I reduced it to a few Thread.new) are significantly slower in 3.3.x (tested with the same results on 3.3.0, 3.3.2 and 3.3.5 for the moment) in comparison with 3.1.2 or 3.2.5: about twice as slow.

These syscalls are heavily used by the Rails Reloader which is how it had such a huge impact (30%) in my testing.

Here is the small reproduction script I wrote:

``` ruby
#!/usr/bin/env ruby

GC.disable # just to make sure it doesn't skew results
THREADS = (ARGV.first || 10).to_i  # choose thread count from CLI
N = 10_000_000 # to make the test longer or shorter

puts "Ruby #{RUBY_VERSION} / YJIT: #{RubyVM::YJIT.enabled?} / #{THREADS} Threads"

Array.new(THREADS).map do
  Thread.new do
    (N/THREADS).times { File.mtime(__FILE__) }
  end
end.each(&:join)

puts "#{N} calls completed"
```

The goal is just to run a lot of fast syscalls, in various number of threads, and here are the numbers on my machine:
|Threads\Ruby|3.1.2 |3.3.5 |
|--|--|--|
|1 | 13.7s | ✅ 14.2s |
|2 | 9.7s | ⚠️ 33.4s |
|4 | 23.2s | ⚠️ 37.0s |
|8 | 21.8s | ⚠️ 39.7s |
|16 | 23.1s | ⚠️ 41.5s |

(more numbers in the [blog post](https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit))

I suspected the [new N:M thread scheduler introduced in 3.3](https://github.com/ruby/ruby/pull/8629), even though it's "disabled" by default on the main Ractor, thinking maybe these changes degraded a bit this (quite specific scenario). So I gave a quick try with the feature ENABLED, to see if it would at least change anything and it did ! (though not was I expected):

|Threads\Ruby|3.1.2 |3.3.5 | 3.3.5 with RUBY_MN_THREADS=1 |
|--|--|--|--|
|1 | 13.7s | 14.2s | 14.5s ✅ |
|2 | 9.7s | ⚠️ 33.4s | 12.3s 🎉 |
|4 | 23.2s | ⚠️ 37.0s | 40.5s 🧐 |
|8 | 21.8s | ⚠️ 39.7s | 37.9s 🧐 |
|16 | 23.1s | ⚠️ 41.5s | 36.3s 🧐 |


(I can make graphs for these numbers if it helps)

Now I'm a bit stuck in my investigation, I'm not sure what change in 3.3 caused this (the new thread scheduler is a good lead IMO) or if it even is a regression. Maybe this downside is innevitable for other benefits? I am happy to contribute more time to this issue and perform more tests but I would need some guidance on what could be interesting to test, from people who understand a bit better than me how this works internally.

Thanks ! (and sorry for the long read)



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [ruby-core:119629] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
  2024-10-25 22:05 [ruby-core:119621] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading adrienjarthon (Adrien Jarthon) via ruby-core
  2024-10-26  9:28 ` [ruby-core:119626] " nobu (Nobuyoshi Nakada) via ruby-core
  2024-10-26  9:50 ` [ruby-core:119628] " adrienjarthon (Adrien Jarthon) via ruby-core
@ 2024-10-26 10:16 ` byroot (Jean Boussier) via ruby-core
  2 siblings, 0 replies; 4+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-10-26 10:16 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20816 has been updated by byroot (Jean Boussier).

File Capture d’écran 2024-10-26 à 12.10.55.png added
File Capture d’écran 2024-10-26 à 12.10.45.png added
File Capture d’écran 2024-10-26 à 12.14.51.png added
File Capture d’écran 2024-10-26 à 12.14.44.png added
Backport changed from 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN to 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED

I can also repro on macOS. Comparing profiles between 3.2 and 3.3, in both case the overwhelming majority of the time is spent in the thread scheduler.

The one big difference I can see is that 3.2 seem to use `poll` to wait on threads, while 3.3 uses `pthread_cond_wait`.

----------------------------------------
Bug #20816: Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading.
https://bugs.ruby-lang.org/issues/20816#change-110251

* Author: adrienjarthon (Adrien Jarthon)
* Status: Open
* ruby -v: 3.3.5
* Backport: 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED
----------------------------------------
Ok I'm sorry in advance if this is not really a regression (maybe it's the cost to pay for other improvements) but it still showed as a 30% performance drop in my benchmark so I think it may be worth a look.

For the full story about how I came down this rabbit hole I just wrote a blog post to avoid cluttering the issue: https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit (please read it first)

Here I'm gonna put my last findings and the smallest reproducible example I was able to write.

So basically I noticed that quick syscalls (at least with a good SSD) like `File.exists?` or `File.mtime`, when used in a multi-threaded environement (Sidekiq in my initial testing but I reduced it to a few Thread.new) are significantly slower in 3.3.x (tested with the same results on 3.3.0, 3.3.2 and 3.3.5 for the moment) in comparison with 3.1.2 or 3.2.5: about twice as slow.

These syscalls are heavily used by the Rails Reloader which is how it had such a huge impact (30%) in my testing.

Here is the small reproduction script I wrote:

``` ruby
#!/usr/bin/env ruby

GC.disable # just to make sure it doesn't skew results
THREADS = (ARGV.first || 10).to_i  # choose thread count from CLI
N = 10_000_000 # to make the test longer or shorter

puts "Ruby #{RUBY_VERSION} / YJIT: #{RubyVM::YJIT.enabled?} / #{THREADS} Threads"

Array.new(THREADS).map do
  Thread.new do
    (N/THREADS).times { File.mtime(__FILE__) }
  end
end.each(&:join)

puts "#{N} calls completed"
```

The goal is just to run a lot of fast syscalls, in various number of threads, and here are the numbers on my machine:
|Threads\Ruby|3.1.2 |3.3.5 |
|--|--|--|
|1 | 13.7s | ✅ 14.2s |
|2 | 9.7s | ⚠️ 33.4s |
|4 | 23.2s | ⚠️ 37.0s |
|8 | 21.8s | ⚠️ 39.7s |
|16 | 23.1s | ⚠️ 41.5s |

(more numbers in the [blog post](https://updown.io/blog/weird-results-comparing-ruby-3-1-3-2-3-3-with-jemalloc-and-yjit))

I suspected the [new N:M thread scheduler introduced in 3.3](https://github.com/ruby/ruby/pull/8629), even though it's "disabled" by default on the main Ractor, thinking maybe these changes degraded a bit this (quite specific scenario). So I gave a quick try with the feature ENABLED, to see if it would at least change anything and it did ! (though not was I expected):

|Threads\Ruby|3.1.2 |3.3.5 | 3.3.5 with RUBY_MN_THREADS=1 |
|--|--|--|--|
|1 | 13.7s | 14.2s | 14.5s ✅ |
|2 | 9.7s | ⚠️ 33.4s | 12.3s 🎉 |
|4 | 23.2s | ⚠️ 37.0s | 40.5s 🧐 |
|8 | 21.8s | ⚠️ 39.7s | 37.9s 🧐 |
|16 | 23.1s | ⚠️ 41.5s | 36.3s 🧐 |


(I can make graphs for these numbers if it helps)

Now I'm a bit stuck in my investigation, I'm not sure what change in 3.3 caused this (the new thread scheduler is a good lead IMO) or if it even is a regression. Maybe this downside is innevitable for other benefits? I am happy to contribute more time to this issue and perform more tests but I would need some guidance on what could be interesting to test, from people who understand a bit better than me how this works internally.

Thanks ! (and sorry for the long read)

---Files--------------------------------
Capture d’écran 2024-10-26 à 12.10.45.png (133 KB)
Capture d’écran 2024-10-26 à 12.10.55.png (109 KB)
Capture d’écran 2024-10-26 à 12.14.44.png (127 KB)
Capture d’écran 2024-10-26 à 12.14.51.png (139 KB)


-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-10-26 10:17 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-25 22:05 [ruby-core:119621] [Ruby master Bug#20816] Potential regression in Ruby 3.3.x (compared with 3.1 and 3.2) regarding fast syscalls and multi-threading adrienjarthon (Adrien Jarthon) via ruby-core
2024-10-26  9:28 ` [ruby-core:119626] " nobu (Nobuyoshi Nakada) via ruby-core
2024-10-26  9:50 ` [ruby-core:119628] " adrienjarthon (Adrien Jarthon) via ruby-core
2024-10-26 10:16 ` [ruby-core:119629] " byroot (Jean Boussier) via ruby-core

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).