public inbox for developer@lists.illumos.org (since 2011-08)
* fmd core dump
@ 2024-07-22 14:01 Gabriele Bulfon
  2024-07-22 14:10 ` [developer] " Toomas Soome
  2024-08-09 16:47 ` Pramod Batni
  0 siblings, 2 replies; 8+ messages in thread
From: Gabriele Bulfon @ 2024-07-22 14:01 UTC (permalink / raw)
  To: illumos-developer


[-- Attachment #1.1: Type: text/plain, Size: 639 bytes --]

Hi, I have a couple of systems, installed in 2012 and updated up to illumos 2019 (will have to update to 2024 later).
They periodically (every 3-4 months, sometimes earlier) create a core dump under /var/fm/fmd.
Looks like fmd core dumped, so no email notice is sent, and we end up filling the rpool.
I found  this link: https://support.oracle.com/knowledge/Sun%20Microsystems/1020519_1.html
So here I attach the pstack of one of the dumps.
 
Any idea?

Gabriele
 
 
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
 



[-- Attachment #2: core.fmd.dump.pstack.txt --]
[-- Type: text/plain, Size: 13959 bytes --]

sonicle@xstreamserver2:/var/fm/fmd$ pfexec pstack core.fmd.25573
core 'core.fmd.25573' of 25573: /usr/lib/fm/fmd/fmd
--------------------- thread# 1 / lwp# 1 ---------------------
 feed6715 __sigsuspend (8047d78, 4, 8047dc8, 807353d, fef4cc40, 0) + 15
 080735a4 main     (8047dcc, fef52308) + 39b
 0805ffd8 _start_crt (1, 8047e2c, fefd0b60, 0, 0, 0) + 97
 0805feaa _start   (1, 8047eec, 0, 8047f00, 8047f19, 8047f2a) + 1a
--------------------- thread# 2 / lwp# 2 ---------------------
 feed6335 __pollsys (fdadeb50, 1, fdadeae8, 0, fdca22e0, 0) + 15
 fee65906 poll     (fdadeb50, 1, 2710) + 66
 fdc89c51 ses_contract_thread (0) + 103
 feed1a3b _thrp_setup (fed80a40) + 88
 feed1bd0 _lwp_start (fed80a40, 0, 0, 0, 0, 0)
--------------------- thread# 3 / lwp# 3 ---------------------
 feed1c29 __lwp_park (feac9818, feac9838, fd94ef18, feed6da6, fefb0000, fd94eeb8) + 19
 feecb958 cond_wait_queue (feac9818, feac9838, fd94ef18) + 6a
 feecbc2c cond_wait_common (feac9818, feac9838, fd94ef18) + 27b
 feecbe9a __cond_timedwait (feac9818, feac9838, fd94ef9c) + 111
 feecbed4 cond_timedwait (feac9818, feac9838, fd94ef9c) + 35
 fea84ab3 umem_update_thread (0) + 1f9
 feed1a3b _thrp_setup (fed81a40) + 88
 feed1bd0 _lwp_start (fed81a40, 0, 0, 0, 0, 0)
--------------------- thread# 4 / lwp# 4 ---------------------
 feed71ee __door_return (fd82edb8, 4, 0, 0, 0, feed720b) + 2e
 fdbd4d1d event_deliver_service (9017988, fd82edfc, 4, 0, 0) + 18b
 feed720b __door_return () + 4b
--------------------- thread# 5 / lwp# 5 ---------------------
 feed1c29 __lwp_park (83cee90, 83ceea0, 0, 0, 0, 0) + 19
 feecb958 cond_wait_queue (83cee90, 83ceea0, 0) + 6a
 feecbfd0 __cond_wait (83cee90, 83ceea0) + 8f
 feecc024 cond_wait (83cee90, 83ceea0) + 2e
 fdbd49f3 subscriber_event_handler (9017988) + 51
 feed1a3b _thrp_setup (fed82a40) + 88
 feed1bd0 _lwp_start (fed82a40, 0, 0, 0, 0, 0)
--------------------- thread# 6 / lwp# 6 ---------------------
 feed68a5 _lwp_kill (6, 6, 3621, fef45000, fef45000, b) + 15
 fee68a7b raise    (6) + 2b
 fee41cde abort    () + 10e
 08079939 fmd_panic (8081400)
 0807994b fmd_panic (8081400) + 12
 08065394 fmd_alloc (50, 1) + 81
 0806f6a5 fmd_event_create (1, 328de12f, 1d365c3, 0) + 18
 08073ae3 fmd_module_timeout (9a71300, 3f7, 328de12f) + 20
 0807bd21 fmd_timerq_exec (9168a00) + 127
 0807b299 fmd_thread_start (82a5fb0) + 5b
 feed1a3b _thrp_setup (fed83240) + 88
 feed1bd0 _lwp_start (fed83240, 0, 0, 0, 0, 0)
--------------------- thread# 7 / lwp# 7 ---------------------
 feed1c29 __lwp_park (91694a0, 9169488, 0, fea8a743, 8f48194, 80c9340) + 19
 feecb958 cond_wait_queue (91694a0, 9169488, 0) + 6a
 feecbfd0 __cond_wait (91694a0, 9169488) + 8f
 feecc024 cond_wait (91694a0, 9169488) + 2e
 feecc06d pthread_cond_wait (91694a0, 9169488) + 24
 0806fcbe fmd_eventq_delete (9169488) + 3f
 080753f5 fmd_module_start (8f48180) + 13e
 0807b299 fmd_thread_start (90c5198) + 5b
 feed1a3b _thrp_setup (fed83a40) + 88
 feed1bd0 _lwp_start (fed83a40, 0, 0, 0, 0, 0)
--------------------- thread# 8 / lwp# 8 ---------------------
 feed1c29 __lwp_park (9169360, 9169348, 0, fea8a743, 8f487d4, 80c9380) + 19
 feecb958 cond_wait_queue (9169360, 9169348, 0) + 6a
 feecbfd0 __cond_wait (9169360, 9169348) + 8f
 feecc024 cond_wait (9169360, 9169348) + 2e
 feecc06d pthread_cond_wait (9169360, 9169348) + 24
 0806fcbe fmd_eventq_delete (9169348) + 3f
 080753f5 fmd_module_start (8f487c0) + 13e
 0807b299 fmd_thread_start (90c5148) + 5b
 feed1a3b _thrp_setup (fed84240) + 88
 feed1bd0 _lwp_start (fed84240, 0, 0, 0, 0, 0)
--------------------- thread# 9 / lwp# 9 ---------------------
 feed71ee __door_return (0, 0, 0, 0, 90c50d0, 0) + 2e
 feebd60c door_xcreate_startf (fd40ec80) + 17b
 0807b299 fmd_thread_start (90c50d0) + 5b
 feed1a3b _thrp_setup (fed84a40) + 88
 feed1bd0 _lwp_start (fed84a40, 0, 0, 0, 0, 0)
-------------------- thread# 10 / lwp# 10 --------------------
 feed71ee __door_return (fd1fed88, 4, 0, 0, 6874, 0) + 2e
 fdbd4d1d event_deliver_service (9168048, fd1fedfc, 4, 0, 0) + 18b
 feed720b __door_return () + 4b
-------------------- thread# 11 / lwp# 11 --------------------
 feed1c29 __lwp_park (fed85a40, 0, 8f788ac, 0, fd0ffc08, 1) + 19
 feecb805 mutex_lock_impl (8f788ac, 0) + 291
 feecc11f mutex_lock (8f788ac) + 19
 08071d80 fmd_log_append (8f787c0, 9a94590, 0) + 46f
 0807e93d fmd_xprt_recv (90ac178, 9902630, fa948298, 1cf5ce0) + 48d
 080628d7 fmd_xprt_post (8f487c0, 90ac178, 9902630, fa948298) + c5
 0807adf4 sysev_legacy (9ad8008) + 100
 fdbd4a1b subscriber_event_handler (9168048) + 79
 0807b299 fmd_thread_start (90c5058) + 5b
 feed1a3b _thrp_setup (fed85a40) + 88
 feed1bd0 _lwp_start (fed85a40, 0, 0, 0, 0, 0)
-------------------- thread# 12 / lwp# 12 --------------------
 feed6335 __pollsys (9294d88, 4, 0, 0, 0, 800) + 15
 fee65906 poll     (9294d88, 4, ffffffff) + 66
 feb1c7ef _svc_run_mt () + 2da
 feb1cbf6 svc_run  () + 48
 0807b299 fmd_thread_start (90c5008) + 5b
 feed1a3b _thrp_setup (fed86240) + 88
 feed1bd0 _lwp_start (fed86240, 0, 0, 0, 0, 0)
-------------------- thread# 13 / lwp# 13 --------------------
 feed1c29 __lwp_park (9280ef8, 9280ee0, 0, fea8a743, 8f48054, 80c94c0) + 19
 feecb958 cond_wait_queue (9280ef8, 9280ee0, 0) + 6a
 feecbfd0 __cond_wait (9280ef8, 9280ee0) + 8f
 feecc024 cond_wait (9280ef8, 9280ee0) + 2e
 feecc06d pthread_cond_wait (9280ef8, 9280ee0) + 24
 0806fcbe fmd_eventq_delete (9280ee0) + 3f
 080753f5 fmd_module_start (8f48040) + 13e
 0807b299 fmd_thread_start (927dad8) + 5b
 feed1a3b _thrp_setup (fed86a40) + 88
 feed1bd0 _lwp_start (fed86a40, 0, 0, 0, 0, 0)
-------------------- thread# 14 / lwp# 14 --------------------
 feed1c29 __lwp_park (9280e08, 9280df0, 0, fea8a743, 8fe9e54, 80c9500) + 19
 feecb958 cond_wait_queue (9280e08, 9280df0, 0) + 6a
 feecbfd0 __cond_wait (9280e08, 9280df0) + 8f
 feecc024 cond_wait (9280e08, 9280df0) + 2e
 feecc06d pthread_cond_wait (9280e08, 9280df0) + 24
 0806fcbe fmd_eventq_delete (9280df0) + 3f
 080753f5 fmd_module_start (8fe9e40) + 13e
 0807b299 fmd_thread_start (927da10) + 5b
 feed1a3b _thrp_setup (fed87240) + 88
 feed1bd0 _lwp_start (fed87240, 0, 0, 0, 0, 0)
-------------------- thread# 15 / lwp# 15 --------------------
 feed1c29 __lwp_park (9280ea8, 9280e90, 0, fea8a743, 8fe9954, 80c9540) + 19
 feecb958 cond_wait_queue (9280ea8, 9280e90, 0) + 6a
 feecbfd0 __cond_wait (9280ea8, 9280e90) + 8f
 feecc024 cond_wait (9280ea8, 9280e90) + 2e
 feecc06d pthread_cond_wait (9280ea8, 9280e90) + 24
 0806fcbe fmd_eventq_delete (9280e90) + 3f
 080753f5 fmd_module_start (8fe9940) + 13e
 0807b299 fmd_thread_start (9473010) + 5b
 feed1a3b _thrp_setup (fed87a40) + 88
 feed1bd0 _lwp_start (fed87a40, 0, 0, 0, 0, 0)
-------------------- thread# 16 / lwp# 16 --------------------
 0807b23e fmd_thread_start(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-------------------- thread# 17 / lwp# 17 --------------------
 feed1c29 __lwp_park (928af00, 928aee8, 0, fea8a743, 8fe96d4, 80c95c0) + 19
 feecb958 cond_wait_queue (928af00, 928aee8, 0) + 6a
 feecbfd0 __cond_wait (928af00, 928aee8) + 8f
 feecc024 cond_wait (928af00, 928aee8) + 2e
 feecc06d pthread_cond_wait (928af00, 928aee8) + 24
 0806fcbe fmd_eventq_delete (928aee8) + 3f
 080753f5 fmd_module_start (8fe96c0) + 13e
 0807b299 fmd_thread_start (94dff40) + 5b
 feed1a3b _thrp_setup (fed88a40) + 88
 feed1bd0 _lwp_start (fed88a40, 0, 0, 0, 0, 0)
-------------------- thread# 18 / lwp# 18 --------------------
 feed1c29 __lwp_park (9280e58, 9280e40, 0, fea8a743, 909b554, 80c9600) + 19
 feecb958 cond_wait_queue (9280e58, 9280e40, 0) + 6a
 feecbfd0 __cond_wait (9280e58, 9280e40) + 8f
 feecc024 cond_wait (9280e58, 9280e40) + 2e
 feecc06d pthread_cond_wait (9280e58, 9280e40) + 24
 0806fcbe fmd_eventq_delete (9280e40) + 3f
 080753f5 fmd_module_start (909b540) + 13e
 0807b299 fmd_thread_start (950cbe0) + 5b
 feed1a3b _thrp_setup (fed89a40) + 88
 feed1bd0 _lwp_start (fed89a40, 0, 0, 0, 0, 0)
-------------------- thread# 19 / lwp# 19 --------------------
 feed1c29 __lwp_park (9280638, 9280620, 0, fea8a743, 909b694, 80c9640) + 19
 feecb958 cond_wait_queue (9280638, 9280620, 0) + 6a
 feecbfd0 __cond_wait (9280638, 9280620) + 8f
 feecc024 cond_wait (9280638, 9280620) + 2e
 feecc06d pthread_cond_wait (9280638, 9280620) + 24
 0806fcbe fmd_eventq_delete (9280620) + 3f
 080753f5 fmd_module_start (909b680) + 13e
 0807b299 fmd_thread_start (950cb40) + 5b
 feed1a3b _thrp_setup (fed8a240) + 88
 feed1bd0 _lwp_start (fed8a240, 0, 0, 0, 0, 0)
-------------------- thread# 20 / lwp# 20 --------------------
 feed1c29 __lwp_park (928aeb0, 928ae98, 0, fea8a743, 909b7d4, 80c9680) + 19
 feecb958 cond_wait_queue (928aeb0, 928ae98, 0) + 6a
 feecbfd0 __cond_wait (928aeb0, 928ae98) + 8f
 feecc024 cond_wait (928aeb0, 928ae98) + 2e
 feecc06d pthread_cond_wait (928aeb0, 928ae98) + 24
 0806fcbe fmd_eventq_delete (928ae98) + 3f
 080753f5 fmd_module_start (909b7c0) + 13e
 0807b299 fmd_thread_start (950ca78) + 5b
 feed1a3b _thrp_setup (fed8aa40) + 88
 feed1bd0 _lwp_start (fed8aa40, 0, 0, 0, 0, 0)
-------------------- thread# 22 / lwp# 22 --------------------
 feed1c29 __lwp_park (928a8c0, 928a8a8, 0, fea8a743, 9094154, 80c9700) + 19
 feecb958 cond_wait_queue (928a8c0, 928a8a8, 0) + 6a
 feecbfd0 __cond_wait (928a8c0, 928a8a8) + 8f
 feecc024 cond_wait (928a8c0, 928a8a8) + 2e
 feecc06d pthread_cond_wait (928a8c0, 928a8a8) + 24
 0806fcbe fmd_eventq_delete (928a8a8) + 3f
 080753f5 fmd_module_start (9094140) + 13e
 0807b299 fmd_thread_start (950ca28) + 5b
 feed1a3b _thrp_setup (fed8b240) + 88
 feed1bd0 _lwp_start (fed8b240, 0, 0, 0, 0, 0)
-------------------- thread# 23 / lwp# 23 --------------------
 feed71ee __door_return (0, 0, 0, 0, 9571760, 0) + 2e
 feebd60c door_xcreate_startf (fc40ec40) + 17b
 0807b299 fmd_thread_start (9571760) + 5b
 feed1a3b _thrp_setup (fed8c240) + 88
 feed1bd0 _lwp_start (fed8c240, 0, 0, 0, 0, 0)
-------------------- thread# 24 / lwp# 24 --------------------
 feed71ee __door_return (0, 0, 0, 0, 9571738, 0) + 2e
 feebd60c door_xcreate_startf (fc40ec40) + 17b
 0807b299 fmd_thread_start (9571738) + 5b
 feed1a3b _thrp_setup (fed8ca40) + 88
 feed1bd0 _lwp_start (fed8ca40, 0, 0, 0, 0, 0)
-------------------- thread# 25 / lwp# 25 --------------------
 feed1c29 __lwp_park (9298418, 9298400, 0, fea8a743, 9094294, 80c97c0) + 19
 feecb958 cond_wait_queue (9298418, 9298400, 0) + 6a
 feecbfd0 __cond_wait (9298418, 9298400) + 8f
 feecc024 cond_wait (9298418, 9298400) + 2e
 feecc06d pthread_cond_wait (9298418, 9298400) + 24
 0806fcbe fmd_eventq_delete (9298400) + 3f
 080753f5 fmd_module_start (9094280) + 13e
 0807b299 fmd_thread_start (95716e8) + 5b
 feed1a3b _thrp_setup (fed8d240) + 88
 feed1bd0 _lwp_start (fed8d240, 0, 0, 0, 0, 0)
-------------------- thread# 27 / lwp# 27 --------------------
 feed1c29 __lwp_park (9298558, 9298540, 0, fea8a743, 90943d4, 80c9840) + 19
 feecb958 cond_wait_queue (9298558, 9298540, 0) + 6a
 feecbfd0 __cond_wait (9298558, 9298540) + 8f
 feecc024 cond_wait (9298558, 9298540) + 2e
 feecc06d pthread_cond_wait (9298558, 9298540) + 24
 0806fcbe fmd_eventq_delete (9298540) + 3f
 080753f5 fmd_module_start (90943c0) + 13e
 0807b299 fmd_thread_start (95716c0) + 5b
 feed1a3b _thrp_setup (fed8da40) + 88
 feed1bd0 _lwp_start (fed8da40, 0, 0, 0, 0, 0)
-------------------- thread# 28 / lwp# 28 --------------------
 feed1c29 __lwp_park (92985f8, 92985e0, 0, fea8a743, 9518814, 80c9880) + 19
 feecb958 cond_wait_queue (92985f8, 92985e0, 0) + 6a
 feecbfd0 __cond_wait (92985f8, 92985e0) + 8f
 feecc024 cond_wait (92985f8, 92985e0) + 2e
 feecc06d pthread_cond_wait (92985f8, 92985e0) + 24
 0806fcbe fmd_eventq_delete (92985e0) + 3f
 080753f5 fmd_module_start (9518800) + 13e
 0807b299 fmd_thread_start (9571620) + 5b
 feed1a3b _thrp_setup (fed8e240) + 88
 feed1bd0 _lwp_start (fed8e240, 0, 0, 0, 0, 0)
-------------------- thread# 29 / lwp# 29 --------------------
 feed1c29 __lwp_park (95a00c0, 95a00a8, 0, fea8a743, 9518954, 80c98c0) + 19
 feecb958 cond_wait_queue (95a00c0, 95a00a8, 0) + 6a
 feecbfd0 __cond_wait (95a00c0, 95a00a8) + 8f
 feecc024 cond_wait (95a00c0, 95a00a8) + 2e
 feecc06d pthread_cond_wait (95a00c0, 95a00a8) + 24
 0806fcbe fmd_eventq_delete (95a00a8) + 3f
 080753f5 fmd_module_start (9518940) + 13e
 0807b299 fmd_thread_start (95715f8) + 5b
 feed1a3b _thrp_setup (fed8ea40) + 88
 feed1bd0 _lwp_start (fed8ea40, 0, 0, 0, 0, 0)
-------------------- thread# 30 / lwp# 30 --------------------
 feed1c29 __lwp_park (96087b8, 96087a0, 0, fea8a743, 9518a94, 80c9900) + 19
 feecb958 cond_wait_queue (96087b8, 96087a0, 0) + 6a
 feecbfd0 __cond_wait (96087b8, 96087a0) + 8f
 feecc024 cond_wait (96087b8, 96087a0) + 2e
 feecc06d pthread_cond_wait (96087b8, 96087a0) + 24
 0806fcbe fmd_eventq_delete (96087a0) + 3f
 080753f5 fmd_module_start (9518a80) + 13e
 0807b299 fmd_thread_start (9609e48) + 5b
 feed1a3b _thrp_setup (fed8f240) + 88
 feed1bd0 _lwp_start (fed8f240, 0, 0, 0, 0, 0)
-------------------- thread# 32 / lwp# 32 --------------------
 feed1c29 __lwp_park (9608218, 9608200, 0, fea8a743, 909b054, 80c9980) + 19
 feecb958 cond_wait_queue (9608218, 9608200, 0) + 6a
 feecbfd0 __cond_wait (9608218, 9608200) + 8f
 feecc024 cond_wait (9608218, 9608200) + 2e
 feecc06d pthread_cond_wait (9608218, 9608200) + 24
 0806fcbe fmd_eventq_delete (9608200) + 3f
 080753f5 fmd_module_start (909b040) + 13e
 0807b299 fmd_thread_start (9609da8) + 5b
 feed1a3b _thrp_setup (fed8fa40) + 88
 feed1bd0 _lwp_start (fed8fa40, 0, 0, 0, 0, 0)
-------------------- thread# 33 / lwp# 33 --------------------
 feed71ee __door_return (0, 0, 0, 0, 1, 0) + 2e
 08060a31 fmd_door_server (0) + 2e
 0807b299 fmd_thread_start (98f1370) + 5b
 feed1a3b _thrp_setup (fed88240) + 88
 feed1bd0 _lwp_start (fed88240, 0, 0, 0, 0, 0)


* Re: [developer] fmd core dump
  2024-07-22 14:01 fmd core dump Gabriele Bulfon
@ 2024-07-22 14:10 ` Toomas Soome
  2024-07-22 14:21   ` Gabriele Bulfon
  2024-08-09 14:43   ` Gabriele Bulfon
  2024-08-09 16:47 ` Pramod Batni
  1 sibling, 2 replies; 8+ messages in thread
From: Toomas Soome @ 2024-07-22 14:10 UTC (permalink / raw)
  To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 1476 bytes --]



> On 22. Jul 2024, at 17:01, Gabriele Bulfon via illumos-developer <developer@lists.illumos.org> wrote:
> 
> Hi, I have a couple of systems, installed in 2012 and updated up to illumos 2019 (will have to update to 2024 later).
> They periodically (every 3-4 months, sometimes earlier) create a core dump under /var/fm/fmd.
> Looks like fmd core dumped, so no email notice is sent, and we end up filling the rpool.
> I found  this link: https://support.oracle.com/knowledge/Sun%20Microsystems/1020519_1.html
> So here I attach the pstack of one of the dumps.
>  
> Any idea?
> 

fmd_alloc() does panic when we are out of memory:

        if (data == NULL)
                fmd_panic("insufficient memory (%u bytes needed)\n", size);

You can try adding some more swap space perhaps?
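
For context, a minimal self-contained sketch of the panic-on-allocation-failure pattern behind the quoted lines (illustrative only, not the actual fmd source; it assumes libumem, so compile with -lumem):

    #include <stdio.h>
    #include <stdlib.h>
    #include <umem.h>

    /*
     * Illustrative stand-in for fmd_panic(): report the failure and abort(),
     * which is the abort() frame visible above fmd_panic() in the pstack.
     */
    static void
    panic_like_fmd(size_t size)
    {
            (void) fprintf(stderr, "insufficient memory (%u bytes needed)\n",
                (unsigned int)size);
            abort();
    }

    /*
     * Illustrative allocation wrapper in the spirit of the quoted snippet:
     * a NULL return from the allocator (no memory, or no swap left to
     * reserve) is treated as fatal rather than returned to the caller.
     */
    void *
    alloc_or_panic(size_t size)
    {
            void *data = umem_alloc(size, UMEM_DEFAULT);

            if (data == NULL)
                    panic_like_fmd(size);

            return (data);
    }

    int
    main(void)
    {
            void *p = alloc_or_panic(80);   /* 0x50 bytes, as in the stack */

            umem_free(p, 80);
            return (0);
    }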

rgds,
toomas

> Gabriele
>  
>  
> Sonicle S.r.l. : http://www.sonicle.com <https://www.sonicle.com/>
> Music: http://www.gabrielebulfon.com <http://www.gabrielebulfon.com/>
> eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
>  
> <core.fmd.dump.pstack.txt>




* Re: [developer] fmd core dump
  2024-07-22 14:10 ` [developer] " Toomas Soome
@ 2024-07-22 14:21   ` Gabriele Bulfon
  2024-08-09 14:43   ` Gabriele Bulfon
  1 sibling, 0 replies; 8+ messages in thread
From: Gabriele Bulfon @ 2024-07-22 14:21 UTC (permalink / raw)
  To: illumos-developer


[-- Attachment #1.1: Type: text/plain, Size: 2557 bytes --]

Here are some outputs.

top:
CPU states: 90.7% idle,  5.7% user,  3.6% kernel,  0.0% iowait,  0.0% swap
Kernel: 17637 ctxsw, 19873 trap, 8926 intr, 216414 syscall, 3 fork, 16781 flt
Memory: 128G phys mem, 16G free mem, 40G total swap, 40G free swap
 
swap -lh:
swapfile             dev    swaplo   blocks     free
/dev/zvol/dsk/rpool/swap 301,2        4K    4.00G    4.00G
/dev/zvol/dsk/rpool/swap2 301,3        4K    4.00G    4.00G
/dev/zvol/dsk/data/swap4 301,4        4K    32.0G    32.0G
 
swap -sh:
total: 30.8G allocated + 10.8G reserved = 41.6G used, 29.4G available
 
prstat -Z:
ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE
     4     1489   29G   22G    17%  15:25:38 5.8% cloudserver
     5      185 3319M 2147M   1.6%   4:05:07 2.1% encoserver
     0       54 1036M 1044M   0.8%  15:17:20 0.8% global
     1       71 1271M  636M   0.5%   0:03:24 0.0% mlp
     3      232 7557M 5834M   4.5%   2:48:54 0.0% wp
 
G.
 
 
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
 

 


Da: Toomas Soome via illumos-developer <developer@lists.illumos.org>
A: illumos-developer <developer@lists.illumos.org>
Data: 22 luglio 2024 16.10.42 CEST
Oggetto: Re: [developer] fmd core dump




On 22. Jul 2024, at 17:01, Gabriele Bulfon via illumos-developer <developer@lists.illumos.org> wrote:
Hi, I have a couple of systems, installed in 2012 and updated up to illumos 2019 (will have to update to 2024 later).
They periodically (every 3-4 months, sometimes earlier) create a core dump under /var/fm/fmd.
Looks like fmd core dumped, so no email notice is sent, and we end up filling the rpool.
I found  this link: https://support.oracle.com/knowledge/Sun%20Microsystems/1020519_1.html
So here I attach the pstack of one of the dumps.
 
Any idea?




 
fmd_alloc() does panic when we are out of memory:
 
        if (data == NULL)
                fmd_panic("insufficient memory (%u bytes needed)\n", size);

You can try adding some more swap space perhaps?

 
rgds,
toomas

Gabriele
 
 
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
 


<core.fmd.dump.pstack.txt>





* Re: [developer] fmd core dump
  2024-07-22 14:10 ` [developer] " Toomas Soome
  2024-07-22 14:21   ` Gabriele Bulfon
@ 2024-08-09 14:43   ` Gabriele Bulfon
  2024-08-09 16:28     ` Peter Tribble
  2024-08-09 16:54     ` Toomas Soome
  1 sibling, 2 replies; 8+ messages in thread
From: Gabriele Bulfon @ 2024-08-09 14:43 UTC (permalink / raw)
  To: illumos-developer


[-- Attachment #1.1: Type: text/plain, Size: 2829 bytes --]

The problem happened again, but this time the rpool was not yet full.
The pstack output shows again the same problem:

 feed68a5 _lwp_kill (5, 6, 22c4, fef45000, fef45000, c) + 15
 fee68a7b raise    (6) + 2b
 fee41cde abort    () + 10e
 08079939 fmd_panic (8081400)
 0807994b fmd_panic (8081400) + 12
 08065394 fmd_alloc (50, 1) + 81
 0806f6a5 fmd_event_create (1, d1da323a, 1bd4e8f, 0) + 18
 08073ae3 fmd_module_timeout (fb8ef100, 2a1, d1da323a) + 20
 0807bd21 fmd_timerq_exec (915db80) + 127
 0807b299 fmd_thread_start (8131030) + 5b
 feed1a3b _thrp_setup (fed82a40) + 88
 feed1bd0 _lwp_start (fed82a40, 0, 0, 0, 0, 0)
 
I can't believe this global zone is out of virtual memory; it's running various zones with a lot of processes and they all run fine.
Only fmd is panicking here.
What I found is an old issue I even forgot about: an infolog_hival file is being produced continuously.
Running a tail -f on it I get a continuous output like:

port_address        w500304801d0a8808LH
PhyIdentifier88 %/pci@0,0/pci8086,2f02@1/pci15d9,808@0((
event_type      port_broadcast_sesTPclass       3resource.sysevent.EC_hba.ESC_sas_hba_port_broadcast  version  __ttl0(__todf▒'|▒,▒▒,^C
 
As I remember, this may go on for some time then it will stop.
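
One way to see which fmd module the event flood is landing on, and how fast that log is growing (fmstat is the standard FMA statistics tool; the log path is assumed from the directory above):

   # Per-module event counts and memory usage, sampled every second.
   pfexec fmstat 1

   # Watch the rolling info log grow.
   ls -lh /var/fm/fmd/infolog_hival*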

Any idea?
G
 
 
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
 

 


Da: Toomas Soome via illumos-developer <developer@lists.illumos.org>
A: illumos-developer <developer@lists.illumos.org>
Data: 22 luglio 2024 16.10.42 CEST
Oggetto: Re: [developer] fmd core dump




On 22. Jul 2024, at 17:01, Gabriele Bulfon via illumos-developer <developer@lists.illumos.org> wrote:
Hi, I have a couple of systems, installed in 2012 and updated up to illumos 2019 (will have to update to 2024 later).
They periodically (every 3-4 months, sometimes earlier) create a core dump under /var/fm/fmd.
Looks like fmd core dumped, so no email notice is sent, and we end up filling the rpool.
I found  this link: https://support.oracle.com/knowledge/Sun%20Microsystems/1020519_1.html
So here I attach the pstack of one of the dumps.
 
Any idea?




 
fmd_alloc() does panic when we are out of memory:
 
        if (data == NULL)
                fmd_panic("insufficient memory (%u bytes needed)\n", size);

You can try adding some more swap space perhaps?

 
rgds,
toomas

Gabriele
 
 
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
 


<core.fmd.dump.pstack.txt>





* Re: [developer] fmd core dump
  2024-08-09 14:43   ` Gabriele Bulfon
@ 2024-08-09 16:28     ` Peter Tribble
  2024-08-09 16:54     ` Toomas Soome
  1 sibling, 0 replies; 8+ messages in thread
From: Peter Tribble @ 2024-08-09 16:28 UTC (permalink / raw)
  To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 3662 bytes --]

On Fri, Aug 9, 2024 at 3:43 PM Gabriele Bulfon via illumos-developer <
developer@lists.illumos.org> wrote:

> The problem happened again, but this time the rpool was not yet full.
> The pstack output shows again the same problem:
>
>  feed68a5 _lwp_kill (5, 6, 22c4, fef45000, fef45000, c) + 15
>  fee68a7b raise    (6) + 2b
>  fee41cde abort    () + 10e
>  08079939 fmd_panic (8081400)
>  0807994b fmd_panic (8081400) + 12
>  08065394 fmd_alloc (50, 1) + 81
>  0806f6a5 fmd_event_create (1, d1da323a, 1bd4e8f, 0) + 18
>  08073ae3 fmd_module_timeout (fb8ef100, 2a1, d1da323a) + 20
>  0807bd21 fmd_timerq_exec (915db80) + 127
>  0807b299 fmd_thread_start (8131030) + 5b
>  feed1a3b _thrp_setup (fed82a40) + 88
>  feed1bd0 _lwp_start (fed82a40, 0, 0, 0, 0, 0)
>
> I can't believe this global zone is out of virtual memory; it's running
> various zones with a lot of processes and they all run fine.
>

One thing that occurs to me - how big is the fmd process? As it's 32-bit,
it can only grow to 4G before it can't grow any further.
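
If it helps, a quick way to check that (pmap, pgrep and ps are the standard proc tools; the core file name is just the one from earlier in this thread):

   # Total address space and RSS of the running fmd (32-bit limit is 4G);
   # the last line of pmap -x is the per-process summary.
   pfexec pmap -x $(pgrep -x fmd) | tail -1

   # Rough equivalent from ps (VSZ/RSS in kilobytes).
   ps -o pid,vsz,rss,args -p $(pgrep -x fmd)

   # The same summary can be read from an already-collected core.
   pfexec pmap -x /var/fm/fmd/core.fmd.25573 | tail -1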


> Only fmd is panicking here.
> What I found is an old issue I even forgot about: an infolog_hival file is
> being produced continuously.
> Running a tail -f on it I get a continuous output like:
>
> port_address        w500304801d0a8808LH
> PhyIdentifier88 %/pci@0,0/pci8086,2f02@1/pci15d9,808@0((
> event_type      port_broadcast_sesTPclass
> 3resource.sysevent.EC_hba.ESC_sas_hba_port_broadcast  version
>  __ttl0(__todf▒'|▒,▒▒,^C
>
> As I remember, this may go on for some time then it will stop.
>
> Any idea?
> G
>
>
> *Sonicle S.r.l. *: http://www.sonicle.com <https://www.sonicle.com/>
> *Music: *http://www.gabrielebulfon.com
> *eXoplanets : *https://gabrielebulfon.bandcamp.com/album/exoplanets
>
>
> ------------------------------
>
>
> *Da:* Toomas Soome via illumos-developer <developer@lists.illumos.org>
> *A:* illumos-developer <developer@lists.illumos.org>
> *Data:* 22 luglio 2024 16.10.42 CEST
> *Oggetto:* Re: [developer] fmd core dump
>
>
>
>
> On 22. Jul 2024, at 17:01, Gabriele Bulfon via illumos-developer <
> developer@lists.illumos.org> wrote:
> Hi, I have a couple of systems, installed in 2012 and updated up to
> illumos 2019 (will have to update to 2024 later).
> They periodically (every 3-4 months, sometimes earlier) create a core dump
> under /var/fm/fmd.
> Looks like fmd core dumped, so no email notice is sent, and we end up
> filling the rpool.
> I found  this link:
> https://support.oracle.com/knowledge/Sun%20Microsystems/1020519_1.html
> So here I attach the pstack of one of the dumps.
>
> Any idea?
>
>
> fmd_alloc() does panic when we are out of memory:
>
>
>         if (data == NULL)
>
>                 fmd_panic("insufficient memory (%u bytes needed)\n",
> size);
>
> You can try adding some more swap space perhaps?
>
>
> rgds,
> toomas
>
> Gabriele
>
>
> *Sonicle S.r.l. *: http://www.sonicle.com <https://www.sonicle.com/>
> *Music: *http://www.gabrielebulfon.com
> *eXoplanets : *https://gabrielebulfon.bandcamp.com/album/exoplanets
>
> <core.fmd.dump.pstack.txt>
>
>
>


-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/



* Re: [developer] fmd core dump
  2024-07-22 14:01 fmd core dump Gabriele Bulfon
  2024-07-22 14:10 ` [developer] " Toomas Soome
@ 2024-08-09 16:47 ` Pramod Batni
  1 sibling, 0 replies; 8+ messages in thread
From: Pramod Batni @ 2024-08-09 16:47 UTC (permalink / raw)
  To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 2450 bytes --]

The stack of one of the threads shows a call to umem_update_thread(),
implying that libumem is being used.

I am not sure if ‘fmd’ uses libumem by default.

If not, were you using libumem to debug a memory-related issue (perhaps a
memory leak or memory corruption)? The XML manifest file for the fmd service
will show how the ‘fmd’ process is launched: in case libumem is being used,
LD_PRELOAD is set to libumem and UMEM_DEBUG is set before the ‘fmd’
executable is invoked.

If so, you might want to check the value of the UMEM_DEBUG environment
variable.

Please keep in mind that using the debug features of libumem has a cost
overhead in terms of memory (virtual address space and physical memory),
depending on the value of the UMEM_DEBUG variable.
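
A couple of ways to check that, as a sketch (pargs, pgrep and svcprop are standard tools; the core file name is the one from this thread and the FMRI is the stock illumos fmd service):

   # Environment of the running fmd: look for LD_PRELOAD*=libumem.so* and
   # UMEM_DEBUG=...  (pargs -e also accepts an existing core file.)
   pfexec pargs -e $(pgrep -x fmd)
   pfexec pargs -e /var/fm/fmd/core.fmd.25573

   # How SMF launches the service (exec method and related properties).
   svcprop -p start svc:/system/fmd:default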

Given that there are core files of the ‘fmd’ process on your system, you
might want to check whether mdb’s ::findleaks dcmd detects any leaks.

http://technopark02.blogspot.com/2016/08/solaris-memory-leak-checking-with.html?m=1

The page above describes how to use mdb’s dcmds to inspect libumem data
structures.
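
A minimal mdb session along those lines, as a sketch (::status, ::umastat and ::findleaks are standard mdb/libumem dcmds; the binary and core paths are the ones from this thread):

   pfexec mdb /usr/lib/fm/fmd/fmd core.fmd.25573
   > ::status
   > ::umastat
   > ::findleaks
   > $q

::status confirms how the process died (the SIGABRT raised from fmd_panic), ::umastat summarizes memory usage by umem cache and vmem arena, and ::findleaks will only print a useful leak report if the crashed process was running with umem auditing (e.g. UMEM_DEBUG=default) enabled.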

Hope this helps,

Pramod





On Mon, 22 Jul 2024 at 19:33, Gabriele Bulfon via illumos-developer <
developer@lists.illumos.org> wrote:

> Hi, I have a couple of systems, installed in 2012 and updated up to
> illumos 2019 (will have to update to 2024 later).
> They periodically (every 3-4 months, sometimes earlier) create a core dump
> under /var/fm/fmd.
> Looks like fmd core dumped, so no email notice is sent, and we end up
> filling the rpool.
> I found  this link:
> https://support.oracle.com/knowledge/Sun%20Microsystems/1020519_1.html
> So here I attach the pstack of one of the dumps.
>
> Any idea?
>
> Gabriele
>
>
> *Sonicle S.r.l. *: http://www.sonicle.com <https://www.sonicle.com/>
> *Music: *http://www.gabrielebulfon.com
> *eXoplanets : *https://gabrielebulfon.bandcamp.com/album/exoplanets
>
>



* Re: [developer] fmd core dump
  2024-08-09 14:43   ` Gabriele Bulfon
  2024-08-09 16:28     ` Peter Tribble
@ 2024-08-09 16:54     ` Toomas Soome
  1 sibling, 0 replies; 8+ messages in thread
From: Toomas Soome @ 2024-08-09 16:54 UTC (permalink / raw)
  To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 3768 bytes --]

Well, fmd_alloc takes two arguments, size and flags, so we are trying to allocate 0x50 (80) bytes there, but failing.

What does pmap -x core tell? Or pmap -S core? It is possible that you are not out of memory, but out of swap (to make swap reservations).
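
For reference, a small sketch of those checks (the core name is the one from this thread; the comments describe what to look for):

   # Per-mapping swap reservations recorded for the process.
   pfexec pmap -S core.fmd.25573

   # System-wide reservation headroom right now; a very small "available"
   # figure means anonymous allocations can fail even while plenty of
   # physical memory is still free.
   swap -sh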

rgds,
toomas

> On 9. Aug 2024, at 17:43, Gabriele Bulfon via illumos-developer <developer@lists.illumos.org> wrote:
> 
> The problem happened again, but this time the rpool was not yet full.
> The pstack output shows again the same problem:
> 
>  feed68a5 _lwp_kill (5, 6, 22c4, fef45000, fef45000, c) + 15
>  fee68a7b raise    (6) + 2b
>  fee41cde abort    () + 10e
>  08079939 fmd_panic (8081400)
>  0807994b fmd_panic (8081400) + 12
>  08065394 fmd_alloc (50, 1) + 81
>  0806f6a5 fmd_event_create (1, d1da323a, 1bd4e8f, 0) + 18
>  08073ae3 fmd_module_timeout (fb8ef100, 2a1, d1da323a) + 20
>  0807bd21 fmd_timerq_exec (915db80) + 127
>  0807b299 fmd_thread_start (8131030) + 5b
>  feed1a3b _thrp_setup (fed82a40) + 88
>  feed1bd0 _lwp_start (fed82a40, 0, 0, 0, 0, 0)
>  
> I can't believe this global zone is out of virtual memory; it's running various zones with a lot of processes and they all run fine.
> Only fmd is panicking here.
> What I found is an old issue I even forgot about: an infolog_hival file is being produced continuously.
> Running a tail -f on it I get a continuous output like:
> 
> port_address        w500304801d0a8808LH
> PhyIdentifier88 %/pci@0,0/pci8086,2f02@1/pci15d9,808@0((
> event_type      port_broadcast_sesTPclass       3resource.sysevent.EC_hba.ESC_sas_hba_port_broadcast  version  __ttl0(__todf▒'|▒,▒▒,^C
>  
> As I remember, this may go on for some time then it will stop.
> 
> Any idea?
> G
>  
>  
> Sonicle S.r.l. : http://www.sonicle.com <https://www.sonicle.com/>
> Music: http://www.gabrielebulfon.com <http://www.gabrielebulfon.com/>
> eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
>  
>  
> 
> 
> Da: Toomas Soome via illumos-developer <developer@lists.illumos.org <mailto:developer@lists.illumos.org>>
> A: illumos-developer <developer@lists.illumos.org <mailto:developer@lists.illumos.org>>
> Data: 22 luglio 2024 16.10.42 CEST
> Oggetto: Re: [developer] fmd core dump
> 
> 
> 
> 
> On 22. Jul 2024, at 17:01, Gabriele Bulfon via illumos-developer <developer@lists.illumos.org> wrote:
> Hi, I have a couple of systems, installed in 2012 and updated up to illumos 2019 (will have to update to 2024 later).
> They periodically (every 3-4 months, sometimes earlier) create a core dump under /var/fm/fmd.
> Looks like fmd core dumped, so no email notice is sent, and we end up filling the rpool.
> I found  this link: https://support.oracle.com/knowledge/Sun%20Microsystems/1020519_1.html
> So here I attach the pstack of one of the dumps.
>  
> Any idea?
> 
>  
> fmd_alloc() does panic when we are out of memory:
>  
>         if (data == NULL)
>                 fmd_panic("insufficient memory (%u bytes needed)\n", size);
> You can try adding some more swap space perhaps?
>  
> rgds,
> toomas
> 
> Gabriele
>  
>  
> Sonicle S.r.l. : http://www.sonicle.com <https://www.sonicle.com/>
> Music: http://www.gabrielebulfon.com <http://www.gabrielebulfon.com/>
> eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
>  
> <core.fmd.dump.pstack.txt>
> 



* fmd core dump
@ 2020-03-20 14:27 Gabriele Bulfon
  0 siblings, 0 replies; 8+ messages in thread
From: Gabriele Bulfon @ 2020-03-20 14:27 UTC (permalink / raw)
  To: illumos-developer


[-- Attachment #1.1: Type: text/plain, Size: 1388 bytes --]

Hi, I have a system (not a very recent illumos kernel, probably around 2012) that recently produced a couple of core.fmd.xxx dumps; here's what mdb says:
 
bash-4.2# mdb core.fmd.1104
Loading modules: [ fmd libumem.so.1 libc.so.1 libnvpair.so.1 libtopo.so.1 libuutil.so.1 libavl.so.1 libsysevent.so.1 eft.so ld.so.1 ]
$C
fd98e988 libc.so.1`_lwp_kill+0x15(4, 6, 120a0, fef58000, fef58000, 4)
fd98e9a8 libc.so.1`raise+0x2b(6, 0, fd98e9c0, feed83e9, 0, 0)
fd98e9f8 libc.so.1`abort+0x10e(3a646d66, 4f424120, 203a5452, 75736e69, 63696666
, 746e6569)
fd98ee18 fmd_panic(8080ec0, fd98ee44, 1, 0)
fd98ee38 fmd_panic+0x12(8080ec0, c, 3e8, ffb3dd87)
fd98ee78 fmd_alloc+0x81(c, 1, 1dca2110, 0, 893c688, 84fd718)
fd98eeb8 fmd_eventq_insert_at_head+0x43(890bb48, 91ec5b8, 0, 92f1ab2d)
fd98eed8 fmd_module_gc+0x66(893c680, 0, 0, fd98eef8)
fd98ef18 fmd_modhash_apply+0x3e(84fd718, 8073d50, 0, 0, 6c275b0e, 30cef3)
fd98ef48 fmd_gc+0x28(80998c0, d, ff19063b, 30ceff, 84f8a48)
fd98efa8 fmd_timerq_exec+0x127(84f8a40, 0, feda22a0, fef58000)
fd98efc8 fmd_thread_start+0x5b(826cfb8, 0, 0, 0)
fd98efe8 libc.so.1`_thrp_setup+0x88(feda2240)
fd98eff8 libc.so.1`_lwp_start(feda2240, 0, 0, 0, 0, 0)
 
Any idea?
 
Gabriele
 
 
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
Quantum Mechanics : http://www.cdbaby.com/cd/gabrielebulfon



end of thread, other threads:[~2024-08-09 16:54 UTC | newest]

Thread overview: 8+ messages
2024-07-22 14:01 fmd core dump Gabriele Bulfon
2024-07-22 14:10 ` [developer] " Toomas Soome
2024-07-22 14:21   ` Gabriele Bulfon
2024-08-09 14:43   ` Gabriele Bulfon
2024-08-09 16:28     ` Peter Tribble
2024-08-09 16:54     ` Toomas Soome
2024-08-09 16:47 ` Pramod Batni
  -- strict thread matches above, loose matches on Subject: below --
2020-03-20 14:27 Gabriele Bulfon
