ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:124804] [Ruby Bug#21876] Addrinfo.getaddrinfo(AF_UNSPEC) deadlocks after fork on macOS for IPv4-only hosts
@ 2026-02-13  0:44 nbeyer@gmail.com (Nathan Beyer) via ruby-core
From: nbeyer@gmail.com (Nathan Beyer) via ruby-core @ 2026-02-13  0:44 UTC
  To: ruby-core; +Cc: nbeyer@gmail.com (Nathan Beyer)

Issue #21876 has been reported by nbeyer@gmail.com (Nathan Beyer).

----------------------------------------
Bug #21876: Addrinfo.getaddrinfo(AF_UNSPEC) deadlocks after fork on macOS for IPv4-only hosts
https://bugs.ruby-lang.org/issues/21876

* Author: nbeyer@gmail.com (Nathan Beyer)
* Status: Open
* ruby -v: ruby 3.4.8 (2025-12-17 revision 995b59f666) +PRISM [arm64-darwin25]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------
## Summary

On macOS, `Addrinfo.getaddrinfo(host, service, Socket::AF_UNSPEC, Socket::SOCK_STREAM)` can deadlock in forked child processes when the host has no AAAA (IPv6) DNS records and the parent process previously resolved the same host.

I hit this when an HTTP library acquired an OAuth access token in a Rails initializer, the process was then forked, and the forked process made a separate call to the same host.

## Environment

- macOS (tested on arm64-darwin24 and arm64-darwin25, Apple Silicon)
- Ruby 3.4.7, 3.4.8
- The issue is probabilistic: frequency varies by environment, but it reproduces reliably under sustained DNS activity

## Reproduction

Minimal example:

```ruby
require "socket"
require "timeout"

# Parent resolves an IPv4-only host (no AAAA records)
Addrinfo.getaddrinfo("httpbin.org", "https", Socket::AF_UNSPEC, Socket::SOCK_STREAM)

pid = fork do
  begin
    Timeout.timeout(5) do
      Addrinfo.getaddrinfo("httpbin.org", "https", Socket::AF_UNSPEC, Socket::SOCK_STREAM)
    end
    puts "Child: OK"
  rescue Timeout::Error
    puts "Child: DEADLOCK — getaddrinfo hung for 5s"
  end
end
Process.waitpid(pid)
```

The issue is probabilistic: a single invocation may or may not deadlock. The attached script runs 50 trials each for several variants to demonstrate the pattern. A deadlock may not occur on the first run, but over several runs you should see at least one deadlock in Test 2, and often every trial in Test 1 and Test 2 deadlocks.

See attachment - ruby_getaddrinfo_fork_bug.rb

Typical output:

```
Test 1 (single IPv4-only host):     20/20 deadlocked
Test 2 (multi-host warmup):         20/20 deadlocked
Test 3 (dual-stack host control):   0/20 deadlocked
Test 4 (AF_INET workaround):        0/20 deadlocked
```
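
The attachment is not reproduced in this message. As a rough sketch of what such a trial loop could look like (this is not the attached script), the helper below forks a child, runs a block in it, and has the parent enforce the deadline so that a child stuck in native code is still detected. The names `hung_in_child?` and `count_hangs` are hypothetical and exist only for illustration.

```ruby
# Sketch only: parent-side timeout around work done in a forked child.
# `hung_in_child?` and `count_hangs` are hypothetical helper names.

# Returns true if the child had to be killed because it did not
# finish within timeout_s seconds, false if it exited on its own.
def hung_in_child?(timeout_s)
  pid = fork do
    yield          # run the caller's block inside the child
    exit!(0)       # skip at_exit handlers inherited from the parent
  end

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout_s
  loop do
    # waitpid with WNOHANG returns nil while the child is still running
    return false if Process.waitpid(pid, Process::WNOHANG)
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep 0.05
  end

  Process.kill("KILL", pid)
  Process.waitpid(pid) # reap the killed child
  true
end

# Runs the block in `trials` separate children and counts how many hung.
def count_hangs(trials, timeout_s, &work)
  trials.times.count { hung_in_child?(timeout_s, &work) }
end

# Example use (requires `require "socket"`, performs real DNS lookups):
#   Addrinfo.getaddrinfo("httpbin.org", "https", Socket::AF_UNSPEC, Socket::SOCK_STREAM)
#   hangs = count_hangs(20, 5) do
#     Addrinfo.getaddrinfo("httpbin.org", "https", Socket::AF_UNSPEC, Socket::SOCK_STREAM)
#   end
#   puts "#{hangs}/20 deadlocked"
```

Enforcing the timeout from the parent avoids relying on `Timeout.timeout` being able to interrupt the child, at the cost of a short polling delay.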

## Context

The deadlock occurs when ALL of these conditions hold:

1. **macOS** (not observed on Linux)
2. Parent called `getaddrinfo(host, AF_UNSPEC)` for a host with **no AAAA (IPv6) records**
3. Child calls `getaddrinfo` for the **same host** with `AF_UNSPEC`

**Not affected:**
- Hosts **with** AAAA records (dual-stack) — e.g., `www.google.com`, `rubygems.org`
- Using `Socket::AF_INET` instead of `Socket::AF_UNSPEC`
- Hosts the parent never resolved
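
The `Socket::AF_INET` workaround listed above could be wrapped as follows. This is a minimal sketch; `resolve_ipv4_only` is a hypothetical helper name, not part of Ruby's socket API, and it only covers code paths where you control the `getaddrinfo` call yourself.

```ruby
require "socket"

# Hypothetical helper: ask for IPv4 results only, so no AAAA lookup
# is requested for the host.
def resolve_ipv4_only(host, service)
  Addrinfo.getaddrinfo(host, service, Socket::AF_INET, Socket::SOCK_STREAM)
end

# Example use in a forked child (performs a real DNS lookup):
#   resolve_ipv4_only("httpbin.org", "https").each { |ai| puts ai.ip_address }
```

The trade-off is that IPv6 addresses are never returned, so this is unsuitable for hosts that must be reached over IPv6.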

| Host | AAAA records | Child deadlocks? |
|------|-------------|-----------------|
| httpbin.org | None | **Yes** |
| www.github.com | None | **Yes** |
| api.github.com | None | **Yes** |
| stackoverflow.com | None | **Yes** |
| www.google.com | Yes | No |
| rubygems.org | Yes | No |
| example.com | Yes | No |
| www.cloudflare.com | Yes | No |
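
To check which column a given host falls into without calling `getaddrinfo` in the parent, one option is Ruby's pure-Ruby `Resolv::DNS` resolver, which sends DNS queries itself. The sketch below assumes that approach; `aaaa_addresses` and `no_aaaa_records?` are hypothetical helper names, and the `dns:` parameter exists only so a stand-in resolver can be supplied.

```ruby
require "resolv"

# Hypothetical helper: return the host's AAAA records as strings.
# `dns` is any object responding to getresources(name, type).
def aaaa_addresses(host, dns: Resolv::DNS.new)
  dns.getresources(host, Resolv::DNS::Resource::IN::AAAA)
     .map { |record| record.address.to_s }
end

# True when the lookup returned no AAAA records. Note that an empty
# result can also mean the name does not exist or the query failed.
def no_aaaa_records?(host, dns: Resolv::DNS.new)
  aaaa_addresses(host, dns: dns).empty?
end

# Example use (performs real DNS queries):
#   %w[httpbin.org www.google.com].each do |host|
#     puts "#{host}: AAAA #{no_aaaa_records?(host) ? 'none' : 'present'}"
#   end
```

Results depend on the DNS server in use and can change over time, so the table above reflects the state at the time of the report.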

## Potential Root Cause

As I understand it, on macOS, `getaddrinfo` communicates with the `mDNSResponder` system daemon via Mach IPC ports. When `getaddrinfo(AF_UNSPEC)` queries a host with no AAAA records, the negative AAAA result appears to be cached via Mach port state. After `fork()`, the child process inherits the address space (including references to this cached state) but does **not** inherit the Mach port connections to `mDNSResponder`. When the child calls `getaddrinfo` for the same host, it encounters the stale cache entry and deadlocks trying to communicate over the invalidated Mach port.

Hosts with positive AAAA results are not affected, presumably because their cache entries do not require re-contacting `mDNSResponder` in the same code path.

## Relation to Feature #20590

Ruby 3.4's fork safety improvements (Feature #20590) added a read-write lock around `getaddrinfo` to prevent `fork()` while a `getaddrinfo` call is actively running. However, this does not address the issue reported here — the problem is not about forking *during* a `getaddrinfo` call, but about stale mDNSResponder Mach port state that is inherited by the child process *after* `getaddrinfo` has completed in the parent.

---Files--------------------------------
ruby_getaddrinfo_fork_bug.rb (5.36 KB)

