public inbox for developer@lists.illumos.org (since 2011-08)
From: maurilio.longo@libero.it
To: illumos-developer <developer@lists.illumos.org>
Subject: Re: [developer] panic inside lmrc driver
Date: Thu, 22 Feb 2024 12:44:15 -0500
Message-ID: <17086238550.9b5c.407136@composer.illumos.topicbox.com>
In-Reply-To: <17084195390.7Be2F8Bc.38165@composer.illumos.topicbox-beta.com>

So, after upgrading the controller's firmware, I rebooted the computer and restarted my tests, which ended a few hours later with the disks in a state similar to this:

                            extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0    3.0    0.0    0.1  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2
    0.0    2.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2t3d0
    0.0    1.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2t4d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c3
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c3t001B448B4A7140BFd0
    0.0    0.0    0.0    0.0  0.0 13.0    0.0    0.0   0 400   0   0   0   0 c4
    0.0    0.0    0.0    0.0  0.0  3.0    0.0    0.0   0 100   0   0   0   0 c4t50014EE6B2513C38d0
    0.0    0.0    0.0    0.0  0.0  3.0    0.0    0.0   0 100   0   0   0   0 c4t5000CCA85EE5ECB0d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0 100   0   0   0   0 c4t5000C500AAF9B0C3d0
    0.0    0.0    0.0    0.0  0.0  3.0    0.0    0.0   0 100   0   0   0   0 c4t5000C500AAF9BF2Fd0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c5
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c5tACE42E0005CFF480d0
    0.0    0.0    0.0    0.0 246.0 13.0    0.0    0.0 100 100   0   0   0   0 dati
    0.0    0.0    0.0    0.0  0.2  0.1    0.0    0.0   2   2   0   0   0   0 rpool
                            extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0   15.0    0.0    0.4  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2
    0.0    7.0    0.0    0.2  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2t3d0
    0.0    8.0    0.0    0.2  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2t4d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c3
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c3t001B448B4A7140BFd0
    0.0    0.0    0.0    0.0  0.0 13.0    0.0    0.0   0 400   0   0   0   0 c4
    0.0    0.0    0.0    0.0  0.0  3.0    0.0    0.0   0 100   0   0   0   0 c4t50014EE6B2513C38d0
    0.0    0.0    0.0    0.0  0.0  3.0    0.0    0.0   0 100   0   0   0   0 c4t5000CCA85EE5ECB0d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0 100   0   0   0   0 c4t5000C500AAF9B0C3d0
    0.0    0.0    0.0    0.0  0.0  3.0    0.0    0.0   0 100   0   0   0   0 c4t5000C500AAF9BF2Fd0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c5
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c5tACE42E0005CFF480d0
    0.0    0.0    0.0    0.0 246.0 13.0    0.0    0.0 100 100   0   0   0   0 dati
    0.0    0.0    0.0    0.0  0.3  0.2    0.0    0.0   4   5   0   0   0   0 rpool
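
A quick way to spot the hung devices in output like this: a device looks stuck when it has commands outstanding (actv > 0) and is fully busy (%b >= 100) while completing no reads or writes (r/s and w/s both 0). The following is a minimal Python sketch, assuming the 15-column extended-statistics layout shown above (14 numeric columns followed by the device name); the function name stuck_devices is hypothetical and not part of any illumos tool.

```python
def stuck_devices(iostat_text: str) -> list[str]:
    """Return device names that look hung in extended iostat output.

    Hypothetical helper. Assumes the 15-column layout:
    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device
    """
    stuck = []
    for line in iostat_text.splitlines():
        fields = line.split()
        # Data rows have exactly 14 numbers plus the device name.
        if len(fields) != 15:
            continue
        try:
            nums = [float(f) for f in fields[:14]]
        except ValueError:
            continue  # column-header row ("r/s w/s ...")
        r_s, w_s, actv, busy = nums[0], nums[1], nums[5], nums[9]
        name = fields[14]
        # Outstanding commands, fully busy, but nothing completing.
        if actv > 0 and busy >= 100 and r_s == 0 and w_s == 0:
            if name not in stuck:  # report each device once across samples
                stuck.append(name)
    return stuck
```

Fed the two samples above, it should report c4, its four disks and the dati pool, each once; rpool and the c2 disks are not flagged because they are either still completing writes or not fully busy.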

So I powered off the unit and restarted it, but this time I ran "mdb -K" followed by ":c" on the console, so that the kernel debugger would be loaded and ready for the next lockup.

Instead, the unit spent the next two days running my tests (iozone + zfs recv + zpool scrub) without any problems.

Since I needed a crash dump taken during the lockup, I re-enabled the onboard iLO 5 so that I could generate an NMI, restarted my tests (this time without the mdb -K step), and went on with my other chores.

Well, it took less than three hours for the disks to get stuck again, just like above, but this time I generated an NMI and got a crash dump, which is available here:

https://mega.nz/file/16kShQ6Y#nAv0tLbIvydBy6uaEX87d1VXMD-NIeLkg4GgvDnMatY

The ::msgbuf output ends with these lmrc messages:

NOTICE: lmrc0: Drive 00(e252/Port 1I Box 0 Bay 0) Path 300062b20dde3640  reset (Type 03)
NOTICE: lmrc0: unknown AEN received, seqnum = 19954, timestamp = 761927783, code = 27f, locale = 2, class = 0, argtype = 10
NOTICE: lmrc0: Drive 00(e252/Port 1I Box 0 Bay 0) link speed changed

panic[cpu0]/thread=fffffe00f3e05c20:
NMI received

I hope this sheds some light on the problem, because it seems that running with the kernel debugger active alters something (timings?) just enough to make the system a lot more stable.
In the past two weeks I have only very rarely been able to run my tests for 48 straight hours without issues.

Best regards
Maurilio.

