ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:118347] [Ruby master Bug#20587] dir.c calls blocking system calls while holding the GVL
@ 2024-06-19 10:16 ivoanjo (Ivo Anjo) via ruby-core
  2024-07-08 18:56 ` [ruby-core:118494] [Ruby master Bug#20587] dir.c calls blocking filesystem APIs/system " jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: ivoanjo (Ivo Anjo) via ruby-core @ 2024-06-19 10:16 UTC (permalink / raw)
  To: ruby-core; +Cc: ivoanjo (Ivo Anjo)

Issue #20587 has been reported by ivoanjo (Ivo Anjo).

----------------------------------------
Bug #20587: dir.c calls blocking system calls while holding the GVL
https://bugs.ruby-lang.org/issues/20587

* Author: ivoanjo (Ivo Anjo)
* Status: Open
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Hey! I work for Datadog on the Ruby profiler part of the [`datadog` (previously `ddtrace`)](https://github.com/datadog/dd-trace-rb) gem.

While I was investigating https://bugs.ruby-lang.org/issues/20586, I spotted that there's a number of cases where, in `dir.c`, blocking system calls are being made (e.g. `readdir()`, `opendir()`, etc) without releasing the GVL.

This means that if they block for a long time (as happens in the gcsfuse example in https://bugs.ruby-lang.org/issues/20586 ), the Ruby VM will just be blocked and not make any progress.

The combination of not releasing the GVL + slow system calls actually makes the issue in https://bugs.ruby-lang.org/issues/20586 more likely to happen with the Datadog profiler, although even if the code releases the GVL the underlying issue could still happen, and this is why I decided to file this bug separately.



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [ruby-core:118494] [Ruby master Bug#20587] dir.c calls blocking filesystem APIs/system calls while holding the GVL
  2024-06-19 10:16 [ruby-core:118347] [Ruby master Bug#20587] dir.c calls blocking system calls while holding the GVL ivoanjo (Ivo Anjo) via ruby-core
@ 2024-07-08 18:56 ` jeremyevans0 (Jeremy Evans) via ruby-core
  2024-07-09 13:21 ` [ruby-core:118521] " ivoanjo (Ivo Anjo) via ruby-core
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2024-07-08 18:56 UTC (permalink / raw)
  To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)

Issue #20587 has been updated by jeremyevans0 (Jeremy Evans).


@ivoanjo Can you be specific about which places this affects?  I examined all `opendir` calls in `dir.c`:

* Called in `nogvl_opendir`
* Called in `opendir_without_gvl` when VM is not initialized
* Called in `nogvl_opendir_at`
* Called in `nogvl_dir_empty_p`

Tracing those function calls, if the VM is initialized, it appears that all calls use the `IO_WITHOUT_GVL` macro to release the GVL around the call.

----------------------------------------
Bug #20587: dir.c calls blocking filesystem APIs/system calls while holding the GVL
https://bugs.ruby-lang.org/issues/20587#change-109017

* Author: ivoanjo (Ivo Anjo)
* Status: Open
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Hey! I work for Datadog on the Ruby profiler part of the [`datadog` (previously `ddtrace`)](https://github.com/datadog/dd-trace-rb) gem.

While I was investigating https://bugs.ruby-lang.org/issues/20586, I spotted that there's a number of cases where, in `dir.c`, blocking system calls are being made (e.g. `readdir()`, `opendir()`, etc) without releasing the GVL.

This means that if they block for a long time (as happens in the gcsfuse example in https://bugs.ruby-lang.org/issues/20586 ), the Ruby VM will just be blocked and not make any progress.

The combination of not releasing the GVL + slow system calls actually makes the issue in https://bugs.ruby-lang.org/issues/20586 more likely to happen with the Datadog profiler, although even if the code releases the GVL the underlying issue could still happen, and this is why I decided to file this bug separately.



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [ruby-core:118521] [Ruby master Bug#20587] dir.c calls blocking filesystem APIs/system calls while holding the GVL
  2024-06-19 10:16 [ruby-core:118347] [Ruby master Bug#20587] dir.c calls blocking system calls while holding the GVL ivoanjo (Ivo Anjo) via ruby-core
  2024-07-08 18:56 ` [ruby-core:118494] [Ruby master Bug#20587] dir.c calls blocking filesystem APIs/system " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2024-07-09 13:21 ` ivoanjo (Ivo Anjo) via ruby-core
  2024-07-11  3:05 ` [ruby-core:118541] " jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: ivoanjo (Ivo Anjo) via ruby-core @ 2024-07-09 13:21 UTC (permalink / raw)
  To: ruby-core; +Cc: ivoanjo (Ivo Anjo)

Issue #20587 has been updated by ivoanjo (Ivo Anjo).


jeremyevans0 (Jeremy Evans) wrote in #note-2:
> @ivoanjo Can you be specific about which places this affects?  I examined all `opendir` calls in `dir.c`:
> 
> * Called in `nogvl_opendir`
> * Called in `opendir_without_gvl` when VM is not initialized
> * Called in `nogvl_opendir_at`
> * Called in `nogvl_dir_empty_p`
> 
> Tracing those function calls, if the VM is initialized, it appears that all calls use the `IO_WITHOUT_GVL` macro to release the GVL around the call.

Hey Jeremy, thanks for taking a look at this!

You are right -- I think I was overeager in including `readdir` on the list. I'll update my original description to correct for this.

Here's my attempt at a more specific report of blocking calls while holding the GVL in `dir.c` (I've omitted cases where the GVL is released):

* `fdopendir` -> called by `dir_s_for_fd` 
* `readdir` -> called by `dir_read`, `dir_each_entry`, `glob_helper`, `glob_getent`
* `chdir` -> called by `dir_chdir0` (Although this one is somewhat weird already so perhaps it being blocking is not the weirdest part?)
* `fchdir` -> called by `dir_fchdir` (same as `chrdir`)
* `chroot` -> called by `dir_s_chroot`
* `lstat` -> called by `do_lstat`
* `stat` -> called by `do_stat`
* `fstatat` -> called by `do_stat`/`do_lstat`
* `fgetattrlist` -> called by `need_normalization`, `is_case_sensitive`
* `getattrlist` -> called by `is_case_sensitive`, `replace_real_basename`, `rb_dir_s_empty_p`, `need_normalization`, `dir_initialize`
* `getpwnam` -> called by `dir_s_home`

----------------------------------------
Bug #20587: dir.c calls blocking filesystem APIs/system calls while holding the GVL
https://bugs.ruby-lang.org/issues/20587#change-109041

* Author: ivoanjo (Ivo Anjo)
* Status: Open
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Hey! I work for Datadog on the Ruby profiler part of the [`datadog` (previously `ddtrace`)](https://github.com/datadog/dd-trace-rb) gem.

While I was investigating https://bugs.ruby-lang.org/issues/20586, I spotted that there's a number of cases where, in `dir.c`, blocking system calls are being made (e.g. `readdir()`, `opendir()`, etc) without releasing the GVL.

This means that if they block for a long time (as happens in the gcsfuse example in https://bugs.ruby-lang.org/issues/20586 ), the Ruby VM will just be blocked and not make any progress.

The combination of not releasing the GVL + slow system calls actually makes the issue in https://bugs.ruby-lang.org/issues/20586 more likely to happen with the Datadog profiler, although even if the code releases the GVL the underlying issue could still happen, and this is why I decided to file this bug separately.



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [ruby-core:118541] [Ruby master Bug#20587] dir.c calls blocking filesystem APIs/system calls while holding the GVL
  2024-06-19 10:16 [ruby-core:118347] [Ruby master Bug#20587] dir.c calls blocking system calls while holding the GVL ivoanjo (Ivo Anjo) via ruby-core
  2024-07-08 18:56 ` [ruby-core:118494] [Ruby master Bug#20587] dir.c calls blocking filesystem APIs/system " jeremyevans0 (Jeremy Evans) via ruby-core
  2024-07-09 13:21 ` [ruby-core:118521] " ivoanjo (Ivo Anjo) via ruby-core
@ 2024-07-11  3:05 ` jeremyevans0 (Jeremy Evans) via ruby-core
  2024-07-12 11:33 ` [ruby-core:118583] " ivoanjo (Ivo Anjo) via ruby-core
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2024-07-11  3:05 UTC (permalink / raw)
  To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)

Issue #20587 has been updated by jeremyevans0 (Jeremy Evans).


@ivoanjo Thank you for your analysis.  I worked on a pull request that handles most of the system calls you listed: https://github.com/ruby/ruby/pull/11147

It doesn't handle `getattrlist` or `fgetattrlist`, because I wasn't able to test with those (they appear to be Mac OS only).

Could you test that pull request and see if it addresses the issues you were seeing?  If so and it fixes the issues, I can work on `getattrlist` and `fgetattrlist` and rely on CI to check that it works on Mac OS.  I can also work on `getpwnam` and other system calls in `process.c` and `ext/etc/etc.c` that are mentioned in the pull request.

----------------------------------------
Bug #20587: dir.c calls blocking filesystem APIs/system calls while holding the GVL
https://bugs.ruby-lang.org/issues/20587#change-109060

* Author: ivoanjo (Ivo Anjo)
* Status: Open
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Hey! I work for Datadog on the Ruby profiler part of the [`datadog` (previously `ddtrace`)](https://github.com/datadog/dd-trace-rb) gem.

While I was investigating https://bugs.ruby-lang.org/issues/20586, I spotted that there's a number of cases where, in `dir.c`, blocking system calls are being made (e.g. `readdir()`, ~~`opendir()`~~, ... see comments for more detailed list...) without releasing the GVL.

This means that if they block for a long time (as happens in the gcsfuse example in https://bugs.ruby-lang.org/issues/20586 ), the Ruby VM will just be blocked and not make any progress.

The combination of not releasing the GVL + slow system calls actually makes the issue in https://bugs.ruby-lang.org/issues/20586 more likely to happen with the Datadog profiler, although even if the code releases the GVL the underlying issue could still happen, and this is why I decided to file this bug separately.



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [ruby-core:118583] [Ruby master Bug#20587] dir.c calls blocking filesystem APIs/system calls while holding the GVL
  2024-06-19 10:16 [ruby-core:118347] [Ruby master Bug#20587] dir.c calls blocking system calls while holding the GVL ivoanjo (Ivo Anjo) via ruby-core
                   ` (2 preceding siblings ...)
  2024-07-11  3:05 ` [ruby-core:118541] " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2024-07-12 11:33 ` ivoanjo (Ivo Anjo) via ruby-core
  2024-07-14 19:07 ` [ruby-core:118591] " jeremyevans0 (Jeremy Evans) via ruby-core
  2024-07-16 17:37 ` [ruby-core:118619] " jeremyevans0 (Jeremy Evans) via ruby-core
  5 siblings, 0 replies; 7+ messages in thread
From: ivoanjo (Ivo Anjo) via ruby-core @ 2024-07-12 11:33 UTC (permalink / raw)
  To: ruby-core; +Cc: ivoanjo (Ivo Anjo)

Issue #20587 has been updated by ivoanjo (Ivo Anjo).

File repro.rb added
File hello.c added

Nice!

I've tried the PR and it actually makes the issue in https://bugs.ruby-lang.org/issues/20586#How-to-reproduce disappear in my tests.

(It does not fully solve the issue, but what made it trigger a lot more often was that the `readdir` call blocked with the GVL held, since the Datadog profiler tries to interrupt only threads holding the GVL. With the GVL released, this can only happen in the remote chance the profiler decided to interrupt the thread just as it was preparing to call `readdir`).

I also decided recover an old experiment I had to try to trigger some of these problems using a custom "adversarial" FUSE filesystem. Here's my script to trigger the syscalls:

```ruby
$liveness = true
$looping = false
liveness_check = Thread.new do
  print "Is Ruby alive? (Should print every 0.1s) "
  while $liveness do
    print '.'
    before_sleep = Time.now
    $looping = true
    sleep 0.1
    delay = Time.now - before_sleep
    puts "\n[HICCUP -- VM was probably blocked for #{delay - 0.1}s!]" if delay > 0.5
  end
end
Thread.pass while !$looping

directory = ARGV.last
result = case ARGV.first.to_sym
when :readdir
  d = Dir.new(directory)
  print "(Opened!)"
  d.children
when :fdopendir
  # This one doesn't seem to work as a reproducer, the opendir for d0 seems to get reused by libc and that one already
  # released the GVL
  raise 'Broken?'
  d0 = Dir.new(directory)
  d1 = Dir.for_fd(d0.fileno)
  d1.to_s
when :chdir
  Dir.chdir(directory) { }
when :fchdir
  d = Dir.new(directory)
  print "(Opened!)"
  d.chdir { }
when :chroot
  begin
    Dir.chroot(directory)
  rescue => e
    e
  end
when :lstat
  Dir.glob("#{directory}/*", File::FNM_DOTMATCH)
when :stat
  Dir.glob("#{directory}/", File::FNM_DOTMATCH)
when :fstatat
  raise 'Did not figure out this one?'
else
  raise "Unknown syscall #{ARGV.first}"
end

$liveness = false
liveness_check.join

puts "\n#{result.inspect}"
```

...and here's how it looks before your PR:

```
$ ./miniruby repro.rb readdir /home/ivo.anjo/tmp-hello-mount-deleteme
Is Ruby alive? (Should print every 0.1s) ........................................(Opened!)
[HICCUP -- VM was probably blocked for 1.9858527449999999s!]

["hello"]
$ ./miniruby repro.rb chdir /home/ivo.anjo/tmp-hello-mount-deleteme
Is Ruby alive? (Should print every 0.1s) .
[HICCUP -- VM was probably blocked for 1.9008266639999998s!]

nil
$ ./miniruby repro.rb fchdir /home/ivo.anjo/tmp-hello-mount-deleteme
Is Ruby alive? (Should print every 0.1s) ........................................(Opened!)
[HICCUP -- VM was probably blocked for 1.987699879s!]

nil
$ ./miniruby repro.rb chroot /home/ivo.anjo/tmp-hello-mount-deleteme
Is Ruby alive? (Should print every 0.1s) .
[HICCUP -- VM was probably blocked for 1.9008783179999997s!]

#<Errno::EPERM: Operation not permitted @ dir_s_chroot - /home/ivo.anjo/tmp-hello-mount-deleteme>
$ ./miniruby repro.rb lstat /home/ivo.anjo/tmp-hello-mount-deleteme
Is Ruby alive? (Should print every 0.1s) ........................................
[HICCUP -- VM was probably blocked for 3.9877458889999997s!]

["/home/ivo.anjo/tmp-hello-mount-deleteme/.", "/home/ivo.anjo/tmp-hello-mount-deleteme/hello"]
$ ./miniruby repro.rb stat /home/ivo.anjo/tmp-hello-mount-deleteme
Is Ruby alive? (Should print every 0.1s) .
[HICCUP -- VM was probably blocked for 1.9009780909999998s!]

["/home/ivo.anjo/tmp-hello-mount-deleteme/"]
```

...here's how it looks after your PR:

```
$ ./miniruby repro.rb readdir /home/ivo.anjo/tmp-hello-mount-deleteme
Is Ruby alive? (Should print every 0.1s) ........................................(Opened!)....................
["hello"]
$ ./miniruby repro.rb chdir /home/ivo.anjo/tmp-hello-mount-deleteme
Is Ruby alive? (Should print every 0.1s) ....................
nil
$ ./miniruby repro.rb fchdir /home/ivo.anjo/tmp-hello-mount-deleteme
Is Ruby alive? (Should print every 0.1s) ........................................(Opened!)....................
nil
$ ./miniruby repro.rb chroot /home/ivo.anjo/tmp-hello-mount-deleteme
Is Ruby alive? (Should print every 0.1s) ....................
#<Errno::EPERM: Operation not permitted @ dir_s_chroot - /home/ivo.anjo/tmp-hello-mount-deleteme>
$ ./miniruby repro.rb lstat /home/ivo.anjo/tmp-hello-mount-deleteme
Is Ruby alive? (Should print every 0.1s) ................................................................................
["/home/ivo.anjo/tmp-hello-mount-deleteme/.", "/home/ivo.anjo/tmp-hello-mount-deleteme/hello"]
$ ./miniruby repro.rb stat /home/ivo.anjo/tmp-hello-mount-deleteme
Is Ruby alive? (Should print every 0.1s) ....................
["/home/ivo.anjo/tmp-hello-mount-deleteme/"]
```

I'm also attaching the "adversarial" FUSE filesystem -- I basically added a bunch of sleeps to https://github.com/libfuse/libfuse/blob/master/example/hello.c (my version attached), built the examples and mounted the filesystem.

Hopefully this helps, and thanks again for picking this one up :))

----------------------------------------
Bug #20587: dir.c calls blocking filesystem APIs/system calls while holding the GVL
https://bugs.ruby-lang.org/issues/20587#change-109106

* Author: ivoanjo (Ivo Anjo)
* Status: Open
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Hey! I work for Datadog on the Ruby profiler part of the [`datadog` (previously `ddtrace`)](https://github.com/datadog/dd-trace-rb) gem.

While I was investigating https://bugs.ruby-lang.org/issues/20586, I spotted that there's a number of cases where, in `dir.c`, blocking system calls are being made (e.g. `readdir()`, ~~`opendir()`~~, ... see comments for more detailed list...) without releasing the GVL.

This means that if they block for a long time (as happens in the gcsfuse example in https://bugs.ruby-lang.org/issues/20586 ), the Ruby VM will just be blocked and not make any progress.

The combination of not releasing the GVL + slow system calls actually makes the issue in https://bugs.ruby-lang.org/issues/20586 more likely to happen with the Datadog profiler, although even if the code releases the GVL the underlying issue could still happen, and this is why I decided to file this bug separately.

---Files--------------------------------
repro.rb (1.18 KB)
hello.c (4.89 KB)


-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [ruby-core:118591] [Ruby master Bug#20587] dir.c calls blocking filesystem APIs/system calls while holding the GVL
  2024-06-19 10:16 [ruby-core:118347] [Ruby master Bug#20587] dir.c calls blocking system calls while holding the GVL ivoanjo (Ivo Anjo) via ruby-core
                   ` (3 preceding siblings ...)
  2024-07-12 11:33 ` [ruby-core:118583] " ivoanjo (Ivo Anjo) via ruby-core
@ 2024-07-14 19:07 ` jeremyevans0 (Jeremy Evans) via ruby-core
  2024-07-16 17:37 ` [ruby-core:118619] " jeremyevans0 (Jeremy Evans) via ruby-core
  5 siblings, 0 replies; 7+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2024-07-14 19:07 UTC (permalink / raw)
  To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)

Issue #20587 has been updated by jeremyevans0 (Jeremy Evans).


I merged https://github.com/ruby/ruby/pull/11147 .  I'll work on `getattrlist` and `fgetattrlist` next, and then I'll work on similar GVL-releasing work in `process.c` and `ext/ext/etc.c`.

----------------------------------------
Bug #20587: dir.c calls blocking filesystem APIs/system calls while holding the GVL
https://bugs.ruby-lang.org/issues/20587#change-109118

* Author: ivoanjo (Ivo Anjo)
* Status: Open
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Hey! I work for Datadog on the Ruby profiler part of the [`datadog` (previously `ddtrace`)](https://github.com/datadog/dd-trace-rb) gem.

While I was investigating https://bugs.ruby-lang.org/issues/20586, I spotted that there's a number of cases where, in `dir.c`, blocking system calls are being made (e.g. `readdir()`, ~~`opendir()`~~, ... see comments for more detailed list...) without releasing the GVL.

This means that if they block for a long time (as happens in the gcsfuse example in https://bugs.ruby-lang.org/issues/20586 ), the Ruby VM will just be blocked and not make any progress.

The combination of not releasing the GVL + slow system calls actually makes the issue in https://bugs.ruby-lang.org/issues/20586 more likely to happen with the Datadog profiler, although even if the code releases the GVL the underlying issue could still happen, and this is why I decided to file this bug separately.

---Files--------------------------------
repro.rb (1.18 KB)
hello.c (4.89 KB)


-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [ruby-core:118619] [Ruby master Bug#20587] dir.c calls blocking filesystem APIs/system calls while holding the GVL
  2024-06-19 10:16 [ruby-core:118347] [Ruby master Bug#20587] dir.c calls blocking system calls while holding the GVL ivoanjo (Ivo Anjo) via ruby-core
                   ` (4 preceding siblings ...)
  2024-07-14 19:07 ` [ruby-core:118591] " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2024-07-16 17:37 ` jeremyevans0 (Jeremy Evans) via ruby-core
  5 siblings, 0 replies; 7+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2024-07-16 17:37 UTC (permalink / raw)
  To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)

Issue #20587 has been updated by jeremyevans0 (Jeremy Evans).


I submitted https://github.com/ruby/ruby/pull/11176 for the `getattrlist` and `fgetattrlist` changes.

----------------------------------------
Bug #20587: dir.c calls blocking filesystem APIs/system calls while holding the GVL
https://bugs.ruby-lang.org/issues/20587#change-109147

* Author: ivoanjo (Ivo Anjo)
* Status: Open
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Hey! I work for Datadog on the Ruby profiler part of the [`datadog` (previously `ddtrace`)](https://github.com/datadog/dd-trace-rb) gem.

While I was investigating https://bugs.ruby-lang.org/issues/20586, I spotted that there's a number of cases where, in `dir.c`, blocking system calls are being made (e.g. `readdir()`, ~~`opendir()`~~, ... see comments for more detailed list...) without releasing the GVL.

This means that if they block for a long time (as happens in the gcsfuse example in https://bugs.ruby-lang.org/issues/20586 ), the Ruby VM will just be blocked and not make any progress.

The combination of not releasing the GVL + slow system calls actually makes the issue in https://bugs.ruby-lang.org/issues/20586 more likely to happen with the Datadog profiler, although even if the code releases the GVL the underlying issue could still happen, and this is why I decided to file this bug separately.

---Files--------------------------------
repro.rb (1.18 KB)
hello.c (4.89 KB)


-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-07-16 17:37 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-19 10:16 [ruby-core:118347] [Ruby master Bug#20587] dir.c calls blocking system calls while holding the GVL ivoanjo (Ivo Anjo) via ruby-core
2024-07-08 18:56 ` [ruby-core:118494] [Ruby master Bug#20587] dir.c calls blocking filesystem APIs/system " jeremyevans0 (Jeremy Evans) via ruby-core
2024-07-09 13:21 ` [ruby-core:118521] " ivoanjo (Ivo Anjo) via ruby-core
2024-07-11  3:05 ` [ruby-core:118541] " jeremyevans0 (Jeremy Evans) via ruby-core
2024-07-12 11:33 ` [ruby-core:118583] " ivoanjo (Ivo Anjo) via ruby-core
2024-07-14 19:07 ` [ruby-core:118591] " jeremyevans0 (Jeremy Evans) via ruby-core
2024-07-16 17:37 ` [ruby-core:118619] " jeremyevans0 (Jeremy Evans) via ruby-core

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).