From: "ivoanjo (Ivo Anjo) via ruby-core" <ruby-core@ml.ruby-lang.org>
To: ruby-core@ml.ruby-lang.org
Cc: "ivoanjo (Ivo Anjo)" <noreply@ruby-lang.org>
Subject: [ruby-core:118919] [Ruby master Bug#20586] Some filesystem calls in dir.c are missing error handling and can return incorrect results if interrupted
Date: Thu, 22 Aug 2024 08:03:17 +0000 (UTC) [thread overview]
Message-ID: <redmine.journal-109489.20240822080317.16600@ruby-lang.org> (raw)
In-Reply-To: <redmine.issue-20586.20240619101003.16600@ruby-lang.org>
Issue #20586 has been updated by ivoanjo (Ivo Anjo).
jeremyevans0 (Jeremy Evans) wrote in #note-11:
>
> I've submitted a pull request to add error checking to `getlogin` to `rb_home_dir_of` and `PTY.spawn`: https://github.com/ruby/ruby/pull/11427
>
> I reviewed `getlogin` usage in `etc` and it already handles errors, falling back to `USER` environment variable:
>
> ```c
> login = getlogin();
> if (!login) login = getenv("USER");
> ```
👍👍👍
The "what's the worst thing that can happen" spidy sense makes me think that automatically falling back on errors that could be transient could mean that such calls have the potential to behave differently once-in-a-blue-moon (e.g. when the fallback kicks in).
On the other hand raising on error can mean failure on non-transient errors (and which ones are transient or not??) as well as rare exceptions being raised once-in-a-blue-moon, which does not seem very nice either, so I think the current version is reasonable.
----------------------------------------
Bug #20586: Some filesystem calls in dir.c are missing error handling and can return incorrect results if interrupted
https://bugs.ruby-lang.org/issues/20586#change-109489
* Author: ivoanjo (Ivo Anjo)
* Status: Open
* Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
#### Background
Hey! I work for Datadog on the Ruby profiler part of the [`datadog` (previously `ddtrace`)](https://github.com/datadog/dd-trace-rb) gem.
A customer reached [out with an issue](https://github.com/DataDog/dd-trace-rb/issues/3450) where enabling the profiler made `Dir.glob` return no files for a given folder:
Without profiler:
```
irb(main):002:0> Dir.glob('/gcsfuse/t*')
=> ["/gcsfuse/test.html", "/gcsfuse/test.txt"]
```
With profiler:
```
irb(main):002:0> Dir.glob('/gcsfuse/t*')
=> []
```
It turns out the issue is related to missing error handling in `dir.c`.
The Datadog Ruby profiler, like stackprof, pf2 and vernier, uses unix signals to interrupt the currently-active thread and take a sample (usually `SIGPROF`). When some system calls get interrupted by a signal, they return an [EINTR error code](https://man7.org/linux/man-pages/man7/signal.7.html#:~:text=Interruption%20of%20system%20calls%20and%20library%20functions%20by%20signal%20handlers) back to the caller.
Consider for instance the implementation of `dir_each_entry` in `dir.c`:
```c
static VALUE
dir_each_entry(VALUE dir, VALUE (*each)(VALUE, VALUE), VALUE arg, int children_only)
{
struct dir_data *dirp;
struct dirent *dp;
IF_NORMALIZE_UTF8PATH(int norm_p);
GetDIR(dir, dirp);
rewinddir(dirp->dir);
IF_NORMALIZE_UTF8PATH(norm_p = need_normalization(dirp->dir, RSTRING_PTR(dirp->path)));
while ((dp = READDIR(dirp->dir, dirp->enc)) != NULL) {
// ... do things
}
return dir;
}
```
If `READDIR` returns `NULL`, then `dir_each_entry` assumes it has iterated the entire directory. But looking [at the man page for `readdir`](https://man7.org/linux/man-pages/man3/readdir.3.html) we see the following sharp edge (emphasis mine):
> It returns NULL on reaching the end of the directory stream **or if an error occurred**.
So what's happening in this situation is: `readdir` gets interrupted, returns `NULL` + sets errno to `EINTR`. But `dir_each_entry` doesn't check `errno`, so rather than raising an exception to flag the issue, it treats it as if the end of the directory has been reached.
#### How to reproduce
Reproducing this is somewhat annoying, because it's dependent on timing: the signal must arrive at the exact time the dir API is getting executed.
I was able to reproduce this every time by using the [google cloud `gcsfuse`](https://cloud.google.com/storage/docs/gcs-fuse) tool. This somewhat makes sense -- a remote filesystem is much slower than a local one, so there's a much bigger window of opportunity for a signal to arrive while the system call is blocked.
Here's an example I included in https://github.com/DataDog/dd-trace-rb/pull/3720:
```
# Not shown: Set up a trial google cloud account, install gcsfuse, create a cloud storage bucket and put in some test files
$ gcsfuse test_fs_dd_trace_rb fuse-testing/
$ ls fuse-testing/
hello.txt test.html test.txt
# Not shown: Add `datadog` gem to `Gemfile`
$ DD_PROFILING_ENABLED=true DD_PROFILING_DIR_INTERRUPTION_WORKAROUND_ENABLED=false bundle exec ddprofrb exec ruby -e "Datadog::Profiling.wait_until_running; pp Dir.children('fuse-testing/')"
[]
```
Let me know if you'd like me to try to create a reproducer that does not depend on the `datadog` gem.
#### Additional notes
I've spent quite some time looking at the `dir.c` sources, and here's the full list of APIs that suffer from issues:
* `dir_each_entry` does not check `errno`; all of its users have interruption bugs
* `dir_tell` will return -1 instead of the correct position (which means that passing -1 to `dir_seek`/`dir_set_pos` will cause it to not list the directory properly)
* `do_opendir` an error in system calls will only sometimes be turned into a raised exception
* Indirect callers that pass in rb_glob_error as errfunc: rb_glob, Dir.[], Dir.glob
* Indirect callers that pass in 0 as errfunc: ruby_glob, ruby_brace_glob
* `glob_opendir` does not check errno; all of its users have interruption bugs
* `glob_getent` does not check errno; all of its users have interruption bugs
* `nogvl_dir_empty_p` does not check errno (of readdir! it actually checks for opendir); all of its users have interruption bugs
Less sure about these:
* `do_stat`/`do_lstat` will turn errors into warnings (unclear if enabled or disabled by default)
* `need_normalization` calls `fgetattrlist` / `getattrlist` and all errors `(ret != 0)` are treated in the same way
* `rb_glob_error` is and `rb_glob_caller` leave exceptions as pending and rely on callers to raise them properly
* Error handling of `rb_home_dir_of` and `rb_default_home_dir` are a bit suspicious
As a workaround in the Datadog Ruby profiler, in https://github.com/DataDog/dd-trace-rb/pull/3720 I've added monkey patches to all of the Ruby-level APIs that use the above functions and mask out `SIGPROF` so these calls are never interrupted.
This solution is does successfully work around the issue, although it prevents the profiler from sampling during these system calls, which will mean less visibility if e.g. these calls are taking a long time. And well, maintaining monkey patches is always problematic for future Ruby compatibility.
---Files--------------------------------
readdir-bug-repro.c (2.11 KB)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/
prev parent reply other threads:[~2024-08-22 8:04 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-19 10:10 [ruby-core:118346] " ivoanjo (Ivo Anjo) via ruby-core
2024-06-19 10:17 ` [ruby-core:118348] " ivoanjo (Ivo Anjo) via ruby-core
2024-06-19 11:11 ` [ruby-core:118350] " mame (Yusuke Endoh) via ruby-core
2024-06-19 14:04 ` [ruby-core:118354] " ivoanjo (Ivo Anjo) via ruby-core
2024-06-19 19:57 ` [ruby-core:118358] " Eregon (Benoit Daloze) via ruby-core
2024-06-20 5:02 ` [ruby-core:118359] " mame (Yusuke Endoh) via ruby-core
2024-06-20 9:53 ` [ruby-core:118364] " ivoanjo (Ivo Anjo) via ruby-core
2024-08-16 21:59 ` [ruby-core:118861] " jeremyevans0 (Jeremy Evans) via ruby-core
2024-08-18 3:46 ` [ruby-core:118874] " Alan D. Salewski via ruby-core
2024-08-19 9:19 ` [ruby-core:118890] " ivoanjo (Ivo Anjo) via ruby-core
2024-08-19 22:32 ` [ruby-core:118898] " jeremyevans0 (Jeremy Evans) via ruby-core
2024-08-21 22:52 ` [ruby-core:118911] " jeremyevans0 (Jeremy Evans) via ruby-core
2024-08-22 8:03 ` ivoanjo (Ivo Anjo) via ruby-core [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=redmine.journal-109489.20240822080317.16600@ruby-lang.org \
--to=ruby-core@ml.ruby-lang.org \
--cc=noreply@ruby-lang.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).