Date: Thu, 11 Sep 2025 23:39:16 +0000 (UTC)
To: ruby-core@ml.ruby-lang.org
Subject: [ruby-core:123229] [Ruby Bug#21571] Ruby forked process sporadically hanging on exit
From: "dmorner (Daniel Orner) via ruby-core"
Cc: "dmorner (Daniel Orner)"

Issue #21571 has been updated by dmorner (Daniel Orner).

Thanks so much for the prompt response! I learned at least two things in the last 30 seconds I hadn't known before. :) I really appreciate your patience and goodwill. I'll give your suggestions a try!

----------------------------------------
Bug #21571: Ruby forked process sporadically hanging on exit
https://bugs.ruby-lang.org/issues/21571#change-114558

* Author: dmorner (Daniel Orner)
* Status: Rejected
* ruby -v: ruby 3.4.5 (2025-07-16 revision 20cda200d3) +YJIT +PRISM [x86_64-linux]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
This is my first bug report, so please let me know if there's anything I can do to improve it.

We have a production-grade Rails app that's been running for many years. We recently moved to EKS and upgraded it to the latest Ruby and Rails. We have a number of delayed_job processes that fork on every job that comes in so that the OS can reclaim the memory used in executing it (we implemented this a long time ago because Ruby never gives up any memory that it takes, and some jobs use way more memory than others).
In the last couple of weeks, we've noticed a rare occurrence where the delayed job hangs when exiting. The code looks like this:
    Process.fork do
      ActiveRecord::Base.establish_connection
      execute_job
    end
    Process.wait
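Because the parent blocks in `Process.wait` for as long as the child stays alive, a child that hangs on exit also stalls the worker. A minimal sketch of a parent-side deadline follows, assuming it is acceptable to kill a child that overruns. The helper name `run_in_fork_with_deadline` and the `poll` interval are hypothetical and not part of the original code; this only contains the symptom and does not address the cause of the hang.

```ruby
# Hypothetical helper (not from the report): run a block in a forked child,
# poll for its exit until a deadline, and SIGKILL it if it is still alive.
def run_in_fork_with_deadline(seconds, poll: 0.05)
  pid = Process.fork { yield }
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds

  loop do
    # With WNOHANG, wait2 returns nil while the child is still running.
    _, status = Process.wait2(pid, Process::WNOHANG)
    return status.exitstatus if status
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep poll
  end

  Process.kill("KILL", pid) # child overran the deadline; force it down
  Process.wait(pid)         # reap it so no zombie is left behind
  nil                       # nil tells the caller the child was killed
end
```

A caller could treat a `nil` result as "the child had to be killed" and log the job for later inspection.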
When this bug occurs, the forked child process doesn't exit; it's just stuck forever, doing nothing. I don't have a way to reproduce this because it happens maybe once every few thousand jobs, and it happens across all job types.

If I run gdb on the child process, I always see something that looks like this (note: I am a total gdb newbie):
#0  __futex_abstimed_wait_common
    (futex_word=futex_word@entry=0x7fb6af41400c, expected=expected@entry=3, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=, cancel=cancel@entry=false) at ./nptl/futex-internal.c:103
#1  0x00007fb6d5677f68 in __GI___futex_abstimed_wait64
    (futex_word=futex_word@entry=0x7fb6af41400c, expected=expected@entry=3, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=) at ./nptl/futex-internal.c:128
#2  0x00007fb6d568138c in __pthread_rwlock_wrlock_full64 (abstime=0x0, clockid=0, rwlock=0x7fb6af414000) at ./nptl/pthread_rwlock_common.c:730
#3  ___pthread_rwlock_wrlock (rwlock=0x7fb6af414000) at ./nptl/pthread_rwlock_wrlock.c:26
#4  0x00007fb6aee22989 in CRYPTO_THREAD_write_lock () at /lib/x86_64-linux-gnu/libcrypto.so.3
#5  0x00007fb6aee15c6a in  () at /lib/x86_64-linux-gnu/libcrypto.so.3
#6  0x00007fb6aee15fa9 in OPENSSL_thread_stop () at /lib/x86_64-linux-gnu/libcrypto.so.3
#7  0x00007fb6aee153b5 in OPENSSL_cleanup () at /lib/x86_64-linux-gnu/libcrypto.so.3
#8  0x00007fb6d563055d in __run_exit_handlers
    (status=0, listp=0x7fb6d57c5820 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true)
    at ./stdlib/exit.c:116
#9  0x00007fb6d563069a in __GI_exit (status=) at ./stdlib/exit.c:146
#10 0x00007fb6d5ad3a80 in ruby_stop (ex=) at eval.c:290
#11 0x00007fb6d5bc47b4 in rb_f_fork (obj=) at process.c:4388
#12 rb_f_fork (obj=) at process.c:4378
#13 0x00007fb6d5cad5cc in vm_call_cfunc_with_frame_
    (stack_bottom=, argv=, argc=0, calling=, reg_cfp=0x7fb6d4f68280, ec=0x7fb6d4e4d550)
    at /usr/src/ruby/vm_insnhelper.c:3794
#14 vm_call_cfunc_with_frame (ec=0x7fb6d4e4d550, reg_cfp=0x7fb6d4f68280, calling=) at /usr/src/ruby/vm_insnhelper.c:3840
#15 0x00007fb6d5cb3fef in vm_sendish
    (ec=0x7fb6d4e4d550, reg_cfp=0x7fb6d4f68280, cd=0x7fb69fb17650, block_handler=, method_explorer=mexp_search_method)
    at /usr/src/ruby/vm_callinfo.h:415
#16 0x00007fb6d5cc1e59 in vm_exec_core (ec=0x7fb6af41400c, ec@entry=0x7fb6d4e4d550) at /usr/src/ruby/insns.def:851
#17 0x00007fb6d5cc7ba9 in rb_vm_exec (ec=0x7fb6d4e4d550) at vm.c:2595
#18 0x00007fb6b13e73b9 in  ()
#19 0x00007fb6d4f68328 in  ()
...etc, I can paste more if needed
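In the backtrace above, the child is blocked inside `OPENSSL_cleanup`, which is reached from the C library's exit handlers (`__run_exit_handlers`) after `ruby_stop` calls `exit`. One way to avoid running those handlers in a short-lived child is to leave it with `Process.exit!`, which exits immediately without running exit handlers. The sketch below assumes the job needs no `at_exit` hooks or finalizers in the child; `run_job_in_child` is a hypothetical name, and this is not necessarily what was suggested in the thread.

```ruby
# Sketch only: run the job in a forked child and leave the child with
# Process.exit!, so Ruby at_exit hooks and C-level exit handlers are skipped.
# `run_job_in_child` is a hypothetical name, not part of the report.
def run_job_in_child
  pid = Process.fork do
    status = 0
    begin
      yield
    rescue Exception => e # includes SystemExit; any abnormal end maps to 1
      warn "job failed: #{e.class}: #{e.message}"
      status = 1
    ensure
      # exit! does not flush buffered output, so flush explicitly first.
      $stdout.flush
      $stderr.flush
      Process.exit!(status)
    end
  end

  _, result = Process.wait2(pid)
  result.exitstatus
end
```

The trade-off is that anything the child relied on at exit (buffered writes, temp file cleanup, connection teardown) has to happen explicitly before `exit!`.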
I can't seem to get `call rb_backtrace()` working in gdb, it never prints anything.

This seems to indicate that there's some kind of thread lock when OpenSSL is shutting down. The crazy thing is that **there is only one thread** for most of the processes I inspect.

Any help would be greatly appreciated!

--
https://bugs.ruby-lang.org/