* panic inside lmrc driver @ 2024-02-07 10:31 maurilio.longo 2024-02-07 11:14 ` [developer] " Peter Tribble 2024-02-07 11:16 ` Hans Rosenfeld 0 siblings, 2 replies; 37+ messages in thread
From: maurilio.longo @ 2024-02-07 10:31 UTC (permalink / raw)
To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 541 bytes --]

Hi,
I'm using OmniOS bloody 20240111 to see if my HPE MR216i-p controller is recognized.

It is; its PCI ID, pciex1000,10e2, is among the ones assigned to lmrc. But when booting I get a panic inside the driver, which can be seen here:

https://imgur.com/a/qhRuYJ2

Sorry for the low quality.

This is on an HPE ML30 Gen9 unit; I can give more info if needed.

I'd also like to thank all involved in the driver's development, because it is a much-needed component for using modern HPE servers.

Best regards.
Maurilio.

[-- Attachment #2: Type: text/html, Size: 821 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-07 10:31 panic inside lmrc driver maurilio.longo @ 2024-02-07 11:14 ` Peter Tribble 2024-02-07 11:16 ` Hans Rosenfeld 1 sibling, 0 replies; 37+ messages in thread
From: Peter Tribble @ 2024-02-07 11:14 UTC (permalink / raw)
To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 1418 bytes --]

On Wed, Feb 7, 2024 at 10:31 AM maurilio.longo via illumos-developer <developer@lists.illumos.org> wrote:

> Hi,
> I'm using omnios bloody 20240111 to see if my HPE MR216i-p controller is
> recognized.

There's a newer version - 20240206 - which I think has at least one lmrc fix.
You may need to scroll to the bottom of the download page to see it:

https://downloads.omnios.org/media/bloody/

> It is, its pci id, pciex1000,10e2, is between the ones assigned to lmrc,
> but when booting I get a panic inside the driver which can be seen here
>
> https://imgur.com/a/qhRuYJ2
>
> Sorry for the low quality.
>
> This is on a HPE ML30 Gen9 unit, I can give more info if needed.
>
> I'd also like to thank all involved in its development, because this
> driver is a much needed component to be able to use modern HPE servers.
>
> Best regards.
> Maurilio.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/

[-- Attachment #2: Type: text/html, Size: 2869 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-07 10:31 panic inside lmrc driver maurilio.longo 2024-02-07 11:14 ` [developer] " Peter Tribble @ 2024-02-07 11:16 ` Hans Rosenfeld 2024-02-07 11:46 ` maurilio.longo 1 sibling, 1 reply; 37+ messages in thread From: Hans Rosenfeld @ 2024-02-07 11:16 UTC (permalink / raw) To: illumos-developer Hi Maurilio, On Wed, Feb 07, 2024 at 05:31:28AM -0500, maurilio.longo via illumos-developer wrote: > I'm using omnios bloody 20240111 to see if my HPE MR216i-p controller > is recognized. > It is, its pci id, pciex1000,10e2, is between the ones assigned to > lmrc, but when booting I get a panic inside the driver which can be > seen here > > https://imgur.com/a/qhRuYJ2 This is a panic stack I haven't seen before. Obviously it's running into the default case at the end of lmrc_process_mpt_pkt(), meaning lmrc got an unknown status from a command that completed and doesn't know how to proceed from there. Can you please send me the panic message? That should be included near the end of the output of ::msgbuf, just before the panic stack. It should look like this: command failed, status = %x, ex_status = %x, cdb[0] = %x (That being said, I realize starting a panic message with a ! is a bug in itself. Sorry.) Hans -- %SYSTEM-F-ANARCHISM, The operating system has been overthrown ^ permalink raw reply [flat|nested] 37+ messages in thread
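[Editorial note: Hans's diagnosis above is that the driver runs into the default case of a status switch in lmrc_process_mpt_pkt() and panics on a status it does not recognize. As a rough illustration of the alternative he later pursues (warn and fail the one command instead of taking the system down), here is a minimal sketch; the names, constants, and structure are hypothetical stand-ins, not the actual lmrc code, and 0x76 is simply the status value seen in this report.]

```c
#include <stdio.h>

/* Illustrative status value only; real MFI status codes live in the driver. */
#define STAT_OK 0x00

enum cmd_result { CMD_OK = 0, CMD_FAILED = -1 };

/*
 * Tolerant completion handling: a known-good status completes the command,
 * while any unrecognized status is logged and mapped to a generic command
 * failure rather than panicking the whole system.
 */
static int
process_status(unsigned int status, unsigned int ex_status, unsigned int cdb0)
{
	switch (status) {
	case STAT_OK:
		return (CMD_OK);
	default:
		/* Unknown firmware status: warn, then fail only this command. */
		fprintf(stderr,
		    "command failed, status = %x, ex_status = %x, cdb[0] = %x\n",
		    status, ex_status, cdb0);
		return (CMD_FAILED);
	}
}
```

The upper layers (here, the SCSI framework and ZFS) already know how to retry or fault a single failed command; an unexpected status does not need to be fatal to the host.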
* Re: [developer] panic inside lmrc driver 2024-02-07 11:16 ` Hans Rosenfeld @ 2024-02-07 11:46 ` maurilio.longo 2024-02-07 12:04 ` maurilio.longo ` (2 more replies) 0 siblings, 3 replies; 37+ messages in thread From: maurilio.longo @ 2024-02-07 11:46 UTC (permalink / raw) To: illumos-developer [-- Attachment #1: Type: text/plain, Size: 61 bytes --] Hi Hans, here it is https://imgur.com/a/ZaSogO7 Maurilio [-- Attachment #2: Type: text/html, Size: 207 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-07 11:46 ` maurilio.longo @ 2024-02-07 12:04 ` maurilio.longo 2024-02-07 12:29 ` Hans Rosenfeld 2024-02-07 13:01 ` maurilio.longo 2 siblings, 0 replies; 37+ messages in thread
From: maurilio.longo @ 2024-02-07 12:04 UTC (permalink / raw)
To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 164 bytes --]

Hi Peter,

> There's a newer version - 20240206 - which I think has at least one lmrc fix.

Same stack trace using the 20240206 ISO image.

Regards.
Maurilio

[-- Attachment #2: Type: text/html, Size: 346 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-07 11:46 ` maurilio.longo 2024-02-07 12:04 ` maurilio.longo @ 2024-02-07 12:29 ` Hans Rosenfeld 2024-02-07 13:01 ` maurilio.longo 2 siblings, 0 replies; 37+ messages in thread
From: Hans Rosenfeld @ 2024-02-07 12:29 UTC (permalink / raw)
To: illumos-developer

On Wed, Feb 07, 2024 at 06:46:58AM -0500, maurilio.longo via illumos-developer wrote:
> Hi Hans,
> here it is
> https://imgur.com/a/ZaSogO7

Thanks. Apparently no one knows what status 0x76 is, but we probably
still shouldn't panic. I've filed a bug for this:
https://www.illumos.org/issues/16241

Do you have the means to build an OmniOS ISO with the patch included to
test it? If not, I can build one for you.

Hans

-- 
%SYSTEM-F-ANARCHISM, The operating system has been overthrown

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-07 11:46 ` maurilio.longo 2024-02-07 12:04 ` maurilio.longo 2024-02-07 12:29 ` Hans Rosenfeld @ 2024-02-07 13:01 ` maurilio.longo 2024-02-07 16:47 ` maurilio.longo 2 siblings, 1 reply; 37+ messages in thread
From: maurilio.longo @ 2024-02-07 13:01 UTC (permalink / raw)
To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 319 bytes --]

Hi Hans,
I'm sorry, I don't know how to build it.

In the meantime I've tried to boot the machine with FreeBSD, and it boots and can see the disks, but I'd say it uses mr_sas as the driver. Could it be that FreeBSD's mr_sas handles this status?

In any case, if you can build me an ISO it would be great.

Thanks.
Maurilio.

[-- Attachment #2: Type: text/html, Size: 470 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-07 13:01 ` maurilio.longo @ 2024-02-07 16:47 ` maurilio.longo 2024-02-07 17:01 ` Hans Rosenfeld 2024-02-07 21:35 ` Hans Rosenfeld 0 siblings, 2 replies; 37+ messages in thread
From: maurilio.longo @ 2024-02-07 16:47 UTC (permalink / raw)
To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 239 bytes --]

Hi Hans,
no need for a full ISO. I can install the latest OmniOS on a SATA disk using the onboard AHCI controller after removing the MR216i-p, upgrade the driver, and then reinstall the HBA and see whether it works.

Regards.
Maurilio

[-- Attachment #2: Type: text/html, Size: 334 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-07 16:47 ` maurilio.longo @ 2024-02-07 17:01 ` Hans Rosenfeld 2024-02-07 21:35 ` Hans Rosenfeld 1 sibling, 0 replies; 37+ messages in thread From: Hans Rosenfeld @ 2024-02-07 17:01 UTC (permalink / raw) To: illumos-developer On Wed, Feb 07, 2024 at 11:47:02AM -0500, maurilio.longo via illumos-developer wrote: > Hi Hans, > no need for a full ISO, I can install latest omniOS on a sata disk using the onboard AHCI controller after the removal of the MR216i-p, upgrade the driver and then reinstall the HBA and see if it works or not. > Regards. > Maurilio Let me know when you have the install ready. I need to know the exact 'uname -v' output to build a matching lmrc module. Or you wait another hour or so until my OmniOS ISO build finishes... Hans -- %SYSTEM-F-ANARCHISM, The operating system has been overthrown ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-07 16:47 ` maurilio.longo 2024-02-07 17:01 ` Hans Rosenfeld @ 2024-02-07 21:35 ` Hans Rosenfeld 2024-02-08 7:59 ` maurilio.longo 1 sibling, 1 reply; 37+ messages in thread
From: Hans Rosenfeld @ 2024-02-07 21:35 UTC (permalink / raw)
To: illumos-developer

Hi Maurilio,

try this iso, please: https://grumpf.hope-2000.org/r151049.iso

Hans

On Wed, Feb 07, 2024 at 11:47:02AM -0500, maurilio.longo via illumos-developer wrote:
> Hi Hans,
> no need for a full ISO, I can install latest omniOS on a sata disk using
> the onboard AHCI controller after the removal of the MR216i-p, upgrade
> the driver and then reinstall the HBA and see if it works or not.
> Regards.
> Maurilio

-- 
%SYSTEM-F-ANARCHISM, The operating system has been overthrown

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-07 21:35 ` Hans Rosenfeld @ 2024-02-08 7:59 ` maurilio.longo 2024-02-08 10:42 ` Hans Rosenfeld 0 siblings, 1 reply; 37+ messages in thread
From: maurilio.longo @ 2024-02-08 7:59 UTC (permalink / raw)
To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 393 bytes --]

Hi Hans,
thanks a lot for the ISO. I've just bootstrapped my PC with it and it does not panic anymore.
Disks are recognized and working.

I'll be making tests in the coming days to be sure everything works as expected.

Is this driver dependent on a particular kernel, or can I use it with a non-bloody build or different distros, like hipster?

Thanks again and best regards.
Maurilio.

[-- Attachment #2: Type: text/html, Size: 562 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-08 7:59 ` maurilio.longo @ 2024-02-08 10:42 ` Hans Rosenfeld 2024-02-08 11:01 ` maurilio.longo 0 siblings, 1 reply; 37+ messages in thread From: Hans Rosenfeld @ 2024-02-08 10:42 UTC (permalink / raw) To: illumos-developer On Thu, Feb 08, 2024 at 02:59:32AM -0500, maurilio.longo via illumos-developer wrote: > thank a lot for the ISO, I've just bootstrapped my PC with it and it does not panic anymore. > Disks are recognized and working. > I'll be making tests in the coming days to be sure everything works as > expected. Thanks! > Is this driver dependant on a particular kernel or can I use it with a > non bloody build or different distros, like hipster? It may work. Or it may fail in interesting ways. I'd recommend waiting for this fix to integrate, which should happen within the next days. Hans -- %SYSTEM-F-ANARCHISM, The operating system has been overthrown ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-08 10:42 ` Hans Rosenfeld @ 2024-02-08 11:01 ` maurilio.longo 2024-02-09 8:06 ` maurilio.longo 0 siblings, 1 reply; 37+ messages in thread
From: maurilio.longo @ 2024-02-08 11:01 UTC (permalink / raw)
To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 260 bytes --]

Hi Hans,
I've moved the controller to a newer PC, an ML30 Gen10 Plus, and here you can see the boot log, if you're interested:

https://pastebin.com/Z9gPsLwD

There are a few status 76 messages without apparent consequences.

Regards.
Maurilio.

[-- Attachment #2: Type: text/html, Size: 460 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-08 11:01 ` maurilio.longo @ 2024-02-09 8:06 ` maurilio.longo 2024-02-09 19:55 ` Hans Rosenfeld 0 siblings, 1 reply; 37+ messages in thread
From: maurilio.longo @ 2024-02-09 8:06 UTC (permalink / raw)
To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 2961 bytes --]

Hi Hans,
I got a new panic this morning (for which I have the dump) and one yesterday, but my dump space was limited and so I've lost that one.

This is the last page of ::msgbuf; I see lmrc in the stack, so I presume it is related.

IP Filter: v4.1.9, running.
WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
NOTICE: lmrc0: Drive 00(e252/Port 1I Box 0 Bay 0) Path 300062b20dde3640 reset (Type 03)
NOTICE: bge0: bge_check_copper: link now up speed 1000 duplex 2
NOTICE: bge0 link up, 1000 Mbps, full duplex

panic[cpu1]/thread=fffffe00f4f1fc20:
BAD TRAP: type=e (#pf Page fault) rp=fffffe00f4f1f7e0 addr=0 occurred in module "scsi" due to a NULL pointer dereference

sched:
#pf Page fault
Bad kernel fault at addr=0x0
pid=0, pc=0xfffffffff3908e2d, sp=0xfffffe00f4f1f8d0, eflags=0x10246
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 3626f8<smap,smep,osxsav,pcide,vmxe,xmme,fxsr,pge,mce,pae,pse,de>
cr2: 0 cr3: 8000000 cr8: 0

        rdi:                0 rsi:                1 rdx: fffffe00f4f1fc20
        rcx:                2  r8:       4c1224e968  r9:                0
        rax:                0 rbx:                0 rbp: fffffe00f4f1f920
        r10:       d5895e41ab r11: fffffe00f4f1fc20 r12:                0
        r13: fffffeb1dab77000 r14:                1 r15: fffffeb1daacc058
        fsb:                0 gsb: fffffeb1d4daf000  ds:               4b
         es:               4b  fs:                0  gs:              1c3
        trp:                e err:                0 rip: fffffffff3908e2d
         cs:               30 rfl:            10246 rsp: fffffe00f4f1f8d0
         ss:               38

fffffe00f4f1f6f0 unix:die+c0 ()
fffffe00f4f1f7d0 unix:trap+999 ()
fffffe00f4f1f7e0 unix:cmntrap+e9 ()
fffffe00f4f1f920 scsi:scsi_tgtmap_beginf+2d ()
fffffe00f4f1f940 scsi:scsi_hba_tgtmap_set_begin+16 ()
fffffe00f4f1fa90 lmrc:lmrc_phys_update_tgtmap+40 ()
fffffe00f4f1fad0 lmrc:lmrc_get_pd_list+5a ()
fffffe00f4f1faf0 lmrc:lmrc_phys_aen_handler+4d ()
fffffe00f4f1fb50 lmrc:lmrc_aen_handler+1eb ()
fffffe00f4f1fc00 genunix:taskq_thread+2a6 ()
fffffe00f4f1fc10 unix:thread_start+b ()

dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel + curproc
NOTICE: ahci0: ahci_tran_reset_dport port 5 reset port
NOTICE: ahci0: ahci_tran_reset_dport port 6 reset port
> > ::quit

Regards
Maurilio.

[-- Attachment #2: Type: text/html, Size: 5193 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
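[Editorial note: the stack above ends in a NULL pointer dereference inside scsi_tgtmap_beginf(), reached from the AEN taskq via lmrc_phys_update_tgtmap(). One plausible shape of such a bug is the asynchronous event handler starting a target-map update before the map exists (or after teardown), and the corresponding defensive check can be sketched as follows. This is a guess at the mechanism with made-up names, not the actual driver code.]

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical stand-ins for the per-controller state and its target map. */
struct tgtmap { int dummy; };
struct softc {
	struct tgtmap *sc_tgtmap;	/* NULL until the iport has attached */
};

/*
 * An AEN-driven update must not assume the target map exists: returning an
 * error when the map pointer is NULL skips the update instead of handing a
 * NULL map to the SCSI framework, which is what the panic stack shows.
 */
static int
update_tgtmap(struct softc *sc)
{
	if (sc->sc_tgtmap == NULL)
		return (ENXIO);	/* not attached (yet); drop this update */
	/* ... begin/observe/end the target-map update here ... */
	return (0);
}
```

With the guard in place, an event that races controller attach or detach degrades to a skipped rescan rather than a system panic.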
* Re: [developer] panic inside lmrc driver 2024-02-09 8:06 ` maurilio.longo @ 2024-02-09 19:55 ` Hans Rosenfeld 2024-02-09 20:36 ` maurilio.longo 0 siblings, 1 reply; 37+ messages in thread From: Hans Rosenfeld @ 2024-02-09 19:55 UTC (permalink / raw) To: illumos-developer On Fri, Feb 09, 2024 at 03:06:50AM -0500, maurilio.longo via illumos-developer wrote: > Hi Hans, > I got a new panic this morning (for which I have the dump) and one > yesterday, but my dump space was limited and so I've lost it. Can you get me the dump, provided you still have one or can get one? Was this system running the OmniOS that I prepared for you earlier this week? Hans -- %SYSTEM-F-ANARCHISM, The operating system has been overthrown ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-09 19:55 ` Hans Rosenfeld @ 2024-02-09 20:36 ` maurilio.longo 2024-02-12 8:11 ` maurilio.longo 2024-02-13 19:23 ` Hans Rosenfeld 0 siblings, 2 replies; 37+ messages in thread From: maurilio.longo @ 2024-02-09 20:36 UTC (permalink / raw) To: illumos-developer [-- Attachment #1: Type: text/plain, Size: 190 bytes --] Hi Hans, yes I'm running your ISO while doing tests and here you can find the dump file https://mega.nz/file/cmkxBATJ#89P3wguYNEBxeeib4AZxwIKVEXR-AmTBl56C4zX8nxE Regards. Maurilio. [-- Attachment #2: Type: text/html, Size: 416 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-09 20:36 ` maurilio.longo @ 2024-02-12 8:11 ` maurilio.longo 2024-02-13 19:32 ` Hans Rosenfeld 2024-02-13 19:23 ` Hans Rosenfeld 1 sibling, 1 reply; 37+ messages in thread
From: maurilio.longo @ 2024-02-12 8:11 UTC (permalink / raw)
To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 6257 bytes --]

Hi Hans,
I'm sorry to report that I got a new problem. I was executing a zfs recv of a few GBs of data while, at the same time, executing a dd if=/dev/zero of=/... onto my test pool, which right now has 4 disks in two mirror vdevs, when the pool stopped responding.

In /var/adm/messages this is what I have:

Feb 9 19:38:54 pg-1 ipmi: [ID 183295 kern.info] SMBIOS type 0x1, addr 0xca2
Feb 9 19:38:54 pg-1 ipmi: [ID 306142 kern.info] device rev. 3, firmware rev. 2.65, version 2.0
Feb 9 19:38:54 pg-1 ipmi: [ID 935091 kern.info] number of channels 2
Feb 9 19:38:54 pg-1 ipmi: [ID 699450 kern.info] watchdog supported
Feb 9 19:38:55 pg-1 scsi: [ID 583861 kern.info] ses0 at lmrc2: target-port w300162b20dde3640 lun 0
Feb 9 19:38:55 pg-1 genunix: [ID 936769 kern.info] ses0 is /pci@0,0/pci8086,43b8@1c/pci1590,32b@0/iport@p0/enclosure@w300162b20dde3>
Feb 9 19:42:05 pg-1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, VER: 1, SEVERITY: Major#012EVENT-TIME: Fri>
Feb 9 19:51:46 pg-1 rootnex: [ID 349649 kern.info] xsvc0 at root: space 0 offset 0
Feb 9 19:51:46 pg-1 genunix: [ID 936769 kern.info] xsvc0 is /xsvc@0,0
Feb 10 08:24:08 pg-1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...
Feb 10 08:24:16 pg-1 lmrc: [ID 998901 kern.warning] WARNING: lmrc0: AEN failed, status = 255
Feb 10 08:24:16 pg-1 lmrc: [ID 831201 kern.warning] WARNING: lmrc0: PD map sync failed, status = 255
Feb 10 08:24:17 pg-1 zfs: [ID 961531 kern.warning] WARNING: Pool 'dati' has encountered an uncorrectable I/O failure and has been su>
Feb 10 08:24:22 pg-1 lmrc: [ID 864919 kern.notice] NOTICE: lmrc0: FW is in fault state!
Feb 10 08:24:22 pg-1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...
Feb 10 08:28:02 pg-1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major#012EVENT-TIME: Sat>
Feb 10 08:28:02 pg-1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major#012EVENT-TIME: Sat>
Feb 10 08:28:03 pg-1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major#012EVENT-TIME: Sat>
Feb 10 08:28:03 pg-1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major#012EVENT-TIME: Sat>
Feb 10 08:28:10 pg-1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...
Feb 10 08:28:18 pg-1 lmrc: [ID 380853 kern.warning] WARNING: lmrc0: LD target map sync failed, status = 255
Feb 10 08:34:49 pg-1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...
Feb 10 08:41:27 pg-1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...
Feb 10 08:48:06 pg-1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...
Feb 10 08:54:45 pg-1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...
Feb 10 09:01:24 pg-1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...
Feb 10 09:08:03 pg-1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...
Feb 10 09:14:42 pg-1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...
Feb 10 09:21:21 pg-1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...

and it was still trying to reset it this morning. iostat -indexC shows this:

   r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
   0.0  123.3    0.0  651.6  0.0  0.0    0.0    0.3   0   4   0   0   0   0 c2
   0.0   61.1    0.0  325.8  0.0  0.0    0.0    0.4   0   3   0   0   0   0 c2t3d0
   0.0   62.1    0.0  325.8  0.0  0.0    0.0    0.2   0   1   0   0   0   0 c2t4d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c3
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c3t001B448B4A7140BFd0
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0 116 116 c4
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0  24  24 c4t50014EE6B2513C38d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0  24  24 c4t5000CCA85EE5ECB0d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0  18  18 c4t5000C500AAF9B0C3d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0  50  50 c4t5000C500AAF9BF2Fd0
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c5
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c5tACE42E0005CFF480d0
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 dati
   0.0  119.3    0.0  651.6  0.4  0.0    3.3    0.3   3   3   0   0   0   0 rpool

Here is the configuration of the pool where the problem occurred:

  pool: dati
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:53:43 with 0 errors on Fri Feb 9 19:03:47 2024
config:

        NAME                       STATE     READ WRITE CKSUM
        dati                       ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c4t5000CCA85EE5ECB0d0  ONLINE       0     0     0
            c4t50014EE6B2513C38d0  ONLINE       0     0     0
          mirror-2                 ONLINE       0     0     0
            c4t5000C500AAF9B0C3d0  ONLINE       0     0     0
            c4t5000C500AAF9BF2Fd0  ONLINE       0     0     0
        logs
          c3t001B448B4A7140BFd0s0  ONLINE       0     0     0

errors: No known data errors

So I forced a reboot with a dump, which you can find here:

https://mega.nz/file/YqczlQbS#XJ7q0-NIDezq3czIu3qyqVdEs8JA7aM2uAicEUEjX0E

Best regards.
Maurilio

[-- Attachment #2: Type: text/html, Size: 21072 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-12 8:11 ` maurilio.longo @ 2024-02-13 19:32 ` Hans Rosenfeld 2024-02-13 20:55 ` maurilio.longo 2024-02-13 21:03 ` maurilio.longo 0 siblings, 2 replies; 37+ messages in thread
From: Hans Rosenfeld @ 2024-02-13 19:32 UTC (permalink / raw)
To: developer

On Mon, Feb 12, 2024 at 03:11:52AM -0500, maurilio.longo via illumos-developer wrote:
> Hi Hans,
> I'm sorry to report that I got a new problem, I was executing a zfs recv
> of a few GBs of data while, at the same time, executing a dd if=/dev/zero
> of=/... onto my test pool, which right now has 4 disks in two mirror
> vdevs, when the pool stopped responding.
>
> In /var/adm/messages this is what I have
>
> Feb 10 08:24:08 pg-1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...
> Feb 10 08:24:16 pg-1 lmrc: [ID 998901 kern.warning] WARNING: lmrc0: AEN failed, status = 255
> Feb 10 08:24:16 pg-1 lmrc: [ID 831201 kern.warning] WARNING: lmrc0: PD map sync failed, status = 255
> Feb 10 08:24:17 pg-1 zfs: [ID 961531 kern.warning] WARNING: Pool 'dati' has encountered an uncorrectable I/O failure and has been su>
> Feb 10 08:24:22 pg-1 lmrc: [ID 864919 kern.notice] NOTICE: lmrc0: FW is in fault state!
> Feb 10 08:28:18 pg-1 lmrc: [ID 380853 kern.warning] WARNING: lmrc0: LD target map sync failed, status = 255
> [... repeated "WARNING: lmrc0: resetting..." lines trimmed; full log in the quoted message ...]

This looks vaguely similar to what we've seen on DELL H755 controllers,
where the controller just drops dead after a while, needing a full power
cycle to come back to life. (See https://www.illumos.org/issues/15935)

If you just reset the system (not power cycling), does it come up
correctly again after this?

Hans

-- 
%SYSTEM-F-ANARCHISM, The operating system has been overthrown

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-13 19:32 ` Hans Rosenfeld @ 2024-02-13 20:55 ` maurilio.longo 0 siblings, 0 replies; 37+ messages in thread
From: maurilio.longo @ 2024-02-13 20:55 UTC (permalink / raw)
To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 895 bytes --]

Hi Hans,

> Thanks! I've filed a bug for this: https://www.illumos.org/issues/16277

Regarding your bug report: I had just plugged in a new disk before expanding my pool from two disks to four, and I did this as soon as the system reached the login: prompt after a reboot. When I inserted the second one, nothing happened.

Thursday I'll be able to try to add a new disk right after a reboot to see if I can cause the problem again.

Btw,

> char [128] evt_descr = [ "Inserted: Drive 02(e252/Port 1I Box 0 Bay 0)" ]

My unit has a single SFF disk cage which can hold 8 disks. I inserted mine into slot 5 or 6, but as you can see it shows Bay 0, which is wrong; the cage has disks numbered from 1 to 8, from left to right.

This is the cage: https://www.servershop24.de/en/hpe-sff-gen9-gen10-cage/a-117573/

Thanks, I'll let you know if I can break it again ;-)

Maurilio.

[-- Attachment #2: Type: text/html, Size: 1459 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-13 19:32 ` Hans Rosenfeld 2024-02-13 20:55 ` maurilio.longo @ 2024-02-13 21:03 ` maurilio.longo 2024-02-15 6:50 ` maurilio.longo 1 sibling, 1 reply; 37+ messages in thread
From: maurilio.longo @ 2024-02-13 21:03 UTC (permalink / raw)
To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 585 bytes --]

Hi Hans,

> This looks vaguely similar to what we've seen on DELL H755 controllers,

I think I just issued a reboot from the remote session, and it rebooted without problems.

Tomorrow I'm away, so the system, which is powered on, will be idle most of the time. Let's see if it happens again, in which case I'll just reset it, meaning [Ctrl][Alt][Del] from the console.

I've also changed a couple of things since the other day: disabled iLO and changed the power profile; see my other thread on ACPI parsing errors.

Best regards and thanks for your help.
Maurilio.

[-- Attachment #2: Type: text/html, Size: 876 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-13 21:03 ` maurilio.longo @ 2024-02-15 6:50 ` maurilio.longo 2024-02-15 7:38 ` Carsten Grzemba 0 siblings, 1 reply; 37+ messages in thread From: maurilio.longo @ 2024-02-15 6:50 UTC (permalink / raw) To: illumos-developer [-- Attachment #1: Type: text/plain, Size: 261 bytes --] Hi Hans, this morning I've found the unit completely frozen, no keyboard input, no network access. I had to power cycle it. Nothing inside /var/adm/messages, apart from a reboot yesterday, but with no new crash dump in /var/crash. Regards. Maurilio [-- Attachment #2: Type: text/html, Size: 412 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-15 6:50 ` maurilio.longo @ 2024-02-15 7:38 ` Carsten Grzemba 2024-02-15 8:11 ` Toomas Soome 0 siblings, 1 reply; 37+ messages in thread
From: Carsten Grzemba @ 2024-02-15 7:38 UTC (permalink / raw)
To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 501 bytes --]

If it is again the case, try to force a crash dump on shutdown, e.g. add to /etc/system:

set pcplusmp:apic_panic_on_nmi = 1

Then, if the system is frozen:

$ ipmitool -Ilanplus -U idrac-user -P password -H idrac-ip power diag

Note that you probably won't have any luck with a crash dump if the dump device is controlled by lmrc (if lmrc is the reason for the frozen system).

Long text:
https://illumos.org/docs/user-guide/debug-systems/#gathering-information-from-a-running-system-using-only-nmi-x86

[-- Attachment #2: Type: text/html, Size: 1227 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-15 7:38 ` Carsten Grzemba @ 2024-02-15 8:11 ` Toomas Soome 2024-02-15 9:44 ` maurilio.longo 0 siblings, 1 reply; 37+ messages in thread
From: Toomas Soome @ 2024-02-15 8:11 UTC (permalink / raw)
To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 1396 bytes --]

> On 15. Feb 2024, at 09:38, Carsten Grzemba via illumos-developer <developer@lists.illumos.org> wrote:
>
> If it is again the case try to force a crash dump on shutdown, e.g.
> add /etc/system:
>
> set pcplusmp:apic_panic_on_nmi = 1
>
> then if the system is frozen
> $ ipmitool -Ilanplus -U idrac-user -P password -H idrac-ip power diag
>
> Note that you probably won't have any luck with a crash dump if the
> dump device is controlled by lmrc (if lmrc is the reason for frozen
> system)
>
> long text:
> https://illumos.org/docs/user-guide/debug-systems/#gathering-information-from-a-running-system-using-only-nmi-x86

In such a case it is a good idea to boot as:

ok set nmi=kmdb
ok boot -k

or

ok boot -kd

This way your NMI will get you to kmdb.

And yes, we should document the 'nmi' property; it can have the values 'ignore', 'panic' and 'kmdb', see
https://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/os/mlsetup.c?r=d32f26ee#159

rgds,
toomas

[-- Attachment #2: Type: text/html, Size: 2963 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-15 8:11 ` Toomas Soome @ 2024-02-15 9:44 ` maurilio.longo 2024-02-16 14:36 ` maurilio.longo 0 siblings, 1 reply; 37+ messages in thread
From: maurilio.longo @ 2024-02-15 9:44 UTC (permalink / raw)
To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 389 bytes --]

Hi Carsten and Toomas,

the

set pcplusmp:apic_panic_on_nmi = 1

is already present in /etc/system.d/_omnios:system:defaults, and I've added a

set snooping=1

there, which should enable the deadman timer.

If this is not enough, I'll re-enable iLO (the iDRAC equivalent on HPE systems) and try with ipmitool as per your suggestion.

Thanks to both, I'll keep you posted.
Maurilio.

[-- Attachment #2: Type: text/html, Size: 648 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-15 9:44 ` maurilio.longo @ 2024-02-16 14:36 ` maurilio.longo 2024-02-16 15:20 ` maurilio.longo 0 siblings, 1 reply; 37+ messages in thread From: maurilio.longo @ 2024-02-16 14:36 UTC (permalink / raw) To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 5175 bytes --]

Hi Hans,
the system just rebooted by itself; no crash dump was created, but now I think I'm in this situation

> This looks vaguely similar to what we've seen on DELL H755 controllers,
> where the controller just drops dead after a while, needing a full power
> cycle to come back to life. (See https://www.illumos.org/issues/15935)

because of this:

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 16 13:38:31 ba948ca4-681c-4d79-981c-6da3ca6e6b05  PCIEX-8000-DJ  Major

Host        : pg-1
Platform    : ProLiant-ML30-Gen10-Plus
Chassis_id  : CZJ2360PLT
Product_sn  :

Fault class : fault.io.pciex.device-noresp 40%
              fault.io.pciex.device-interr 40%
              fault.io.pciex.bus-noresp 20%

Affects     : dev:////pci@0,0/pci8086,43b8@1c/pci1590,32b@0
                  faulted and taken out of service

FRU         : "PCI-E Slot 4" (hc://:product-id=ProLiant-ML30-Gen10-Plus:server-id=pg-1:chassis-id=CZJ2360PLT/motherboard=0/hostbridge=2/pciexrc=2/pciexbus=3/pciexdev=0)
                  faulty

Description : A problem has been detected on one of the specified devices or on
              one of the specified connecting buses.
              Refer to http://illumos.org/msg/PCIEX-8000-DJ for more information.

Response    : One or more device instances may be disabled

Impact      : Loss of services provided by the device instances associated with
              this fault

Action      : If a plug-in card is involved check for badly-seated cards or
              bent pins. Otherwise schedule a repair procedure to replace the
              affected device(s). Use fmadm faulty to identify the devices or
              contact your illumos distribution team for support.
PCI-E Slot 4 is where the controller is located. zpool status shows the pool as online, but it is not:

zpool status
  pool: dati
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://illumos.org/msg/ZFS-8000-HC
  scan: scrub repaired 0 in 0 days 01:07:00 with 0 errors on Fri Feb 16 12:30:01 2024
config:

        NAME                       STATE     READ WRITE CKSUM
        dati                       ONLINE       0    39     0
          mirror-0                 ONLINE       0    45     0
            c4t5000CCA85EE5ECB0d0  ONLINE       0    49     0
            c4t50014EE6B2513C38d0  ONLINE       0    49     0
          mirror-2                 ONLINE       0    73     0
            c4t5000C500AAF9B0C3d0  ONLINE       0    83     0
            c4t5000C500AAF9BF2Fd0  ONLINE       0    83     0
        logs
          c3t001B448B4A7140BFd0s0  ONLINE       0     0     0
        cache
          c3t001B448B4A7140BFd0s1  ONLINE       0     0     0

errors: 20 data errors, use '-v' for a list

zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
dati    928G  80.1G   848G        -         -    12%     8%  1.00x  ONLINE  -
rpool   222G   138G  83.5G        -         -     0%    62%  1.00x  ONLINE  -

With format I don't see the vdevs that dati is built upon.

format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c2t3d0 <SanDisk-SSD PLUS 240GB-UF8704RL-223.57GB>
          /pci@0,0/pci1590,28d@17/disk@3,0
       1. c2t4d0 <WDC- WDS100T1R0A-68A4W0-411010WR-931.51GB>
          /pci@0,0/pci1590,28d@17/disk@4,0
       2. c3t001B448B4A7140BFd0 <WD_BLACK-SN770 500GB-731100WD cyl 38932 alt 0 hd 224 sec 112>
          /pci@0,0/pci8086,43c4@1b,4/pci15b7,5017@0/blkdev@w001B448B4A7140BF,0
       3. c5tACE42E0005CFF480d0 <NVMe-VS000480KXALB-85030G00-447.13GB>
          /pci@0,0/pci8086,43b0@1d/pci1c5c,2f3@0/blkdev@wACE42E0005CFF480,0
Specify disk (enter its number): ^C

zpool status -v dati gives this error, probably because it can't read from the pool:

errors: List of errors unavailable (insufficient privileges)

The last line of dmesg reads

Feb 16 13:38:43 pg-1 genunix: [ID 390243 kern.info] Creating /etc/devices/retire_store

which contains the ID for the controller. I'll power it off to see if upon restart it goes back to a working state.

Regards.
Maurilio [-- Attachment #2: Type: text/html, Size: 22699 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-16 14:36 ` maurilio.longo @ 2024-02-16 15:20 ` maurilio.longo 2024-02-16 15:27 ` maurilio.longo 0 siblings, 1 reply; 37+ messages in thread From: maurilio.longo @ 2024-02-16 15:20 UTC (permalink / raw) To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 4614 bytes --]

The system reboots; I can see all the disks spinning, and on the console appears:

  reading ZFS configuration
  mounting filesystems (n/m)

Then a new "Creating /etc/devices/retire_store" appears in /var/adm/messages, the pool is marked as unavailable, and the disks become invisible to format. I've power-cycled the system several times now, without changes; it seems that the controller cannot be used anymore.

modinfo | grep lmrc
194 fffffffff3fb1000   a388 123   1  lmrc (Broadcom MegaRAID 12G SAS RAID)

During boot the disks seem to be OK:

dmesg | grep lmrc
Feb 16 15:57:20 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:21 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:21 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:21 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:21 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:21 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:21 pg-1 scsi: [ID 583861 kern.info] sd3 at lmrc1: target-port 5000cca85ee5ecb0 lun 0
Feb 16 15:57:23 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:23 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:23 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:23 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:23 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:23 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:24 pg-1 scsi: [ID 583861 kern.info] sd2 at lmrc1: target-port 50014ee6b2513c38 lun 0
Feb 16 15:57:25 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:25 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:25 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:25 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:25 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:26 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:26 pg-1 scsi: [ID 583861 kern.info] sd4 at lmrc1: target-port 5000c500aaf9b0c3 lun 0
Feb 16 15:57:27 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:27 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:27 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:27 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:28 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:28 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
Feb 16 15:57:28 pg-1 scsi: [ID 583861 kern.info] sd5 at lmrc1: target-port 5000c500aaf9bf2f lun 0
Feb 16 15:57:33 pg-1 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,43b8@1c/pci1590,32b@0/iport@p0 (lmrc2) online
Feb 16 15:57:35 pg-1 scsi: [ID 583861 kern.info] ses0 at lmrc2: target-port w300162b20dde3640 lun 0

and

dmesg | grep sd4
Feb 16 15:57:26 pg-1 scsi: [ID 583861 kern.info] sd4 at lmrc1: target-port 5000c500aaf9b0c3 lun 0
Feb 16 15:57:26 pg-1 genunix: [ID 936769 kern.info] sd4 is /pci@0,0/pci8086,43b8@1c/pci1590,32b@0/iport@v0/disk@5000c500aaf9b0c3,0
Feb 16 15:57:27 pg-1 genunix: [ID 408114 kern.info] /pci@0,0/pci8086,43b8@1c/pci1590,32b@0/iport@v0/disk@5000c500aaf9b0c3,0 (sd4) online

Maurilio.

[-- Attachment #2: Type: text/html, Size: 11872 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
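A flood of identical warnings like the one above is easier to scan when collapsed into counts. The sketch below is not from the thread (the helper name is mine); it groups lmrc kernel warnings by the message text after "WARNING: ":

```shell
# Collapse repeated lmrc kernel warnings into "count message" pairs,
# keyed by everything after "WARNING: ". On a live system you would
# feed it /var/adm/messages or dmesg output instead of the sample below.
summarize_lmrc() {
    awk -F 'WARNING: ' '/lmrc/ && NF > 1 { count[$2]++ }
        END { for (msg in count) printf "%d %s\n", count[msg], msg }'
}

# Demo on two lines lifted from the log above:
printf '%s\n' \
  'Feb 16 15:57:20 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b' \
  'Feb 16 15:57:21 pg-1 lmrc: [ID 934365 kern.warning] WARNING: lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b' \
  | summarize_lmrc
# prints: 2 lmrc0: command failed, status = 76, ex_status = 0, cdb[0] = 1b
```

Non-warning lines (the scsi "sd3 at lmrc1" attach messages, for instance) pass through untouched because they never match the field separator.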
* Re: [developer] panic inside lmrc driver 2024-02-16 15:20 ` maurilio.longo @ 2024-02-16 15:27 ` maurilio.longo 2024-02-16 15:42 ` Robert Mustacchi 0 siblings, 1 reply; 37+ messages in thread From: maurilio.longo @ 2024-02-16 15:27 UTC (permalink / raw) To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 135 bytes --]

fmadm faulty -f
fmadm repaired "PCI-E Slot 4"

Fixed the issue with the invisible disks... sorry for the noise.

Regards.
Maurilio

[-- Attachment #2: Type: text/html, Size: 273 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-16 15:27 ` maurilio.longo @ 2024-02-16 15:42 ` Robert Mustacchi 2024-02-16 20:41 ` maurilio.longo 0 siblings, 1 reply; 37+ messages in thread From: Robert Mustacchi @ 2024-02-16 15:42 UTC (permalink / raw) To: illumos-developer

Hi Maurilio,

On 2/16/24 07:27, maurilio.longo via illumos-developer wrote:
> fmadm faulty -f
> fmadm repaired "PCI-E Slot 4"
>
> Fixed the issue with invisible disks... sorry for the noise.

Having this happen suggests that the PCIe controller or its corresponding root port observed AERs (PCIe's error reporting mechanism). If you look at fmdump -e, do you have entries from around the time the controller was originally retired?

Robert

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-16 15:42 ` Robert Mustacchi @ 2024-02-16 20:41 ` maurilio.longo 2024-02-19 21:14 ` maurilio.longo 0 siblings, 1 reply; 37+ messages in thread From: maurilio.longo @ 2024-02-16 20:41 UTC (permalink / raw) To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 4042 bytes --]

Hi Robert,
yes, here it is: fmdump -e at around 01:30pm

Feb 16 13:31:24.4290 ereport.io.pci.fabric
Feb 16 13:31:24.4290 ereport.io.pciex.a-nonfatal
Feb 16 13:31:24.4290 ereport.io.pciex.tl.cto
Feb 16 13:31:24.4290 ereport.io.pciex.rc.ce-msg
Feb 16 13:38:31.5763 ereport.fs.zfs.data
Feb 16 13:38:31.5764 ereport.fs.zfs.data
Feb 16 13:38:31.5764 ereport.fs.zfs.data
Feb 16 13:38:31.5764 ereport.fs.zfs.data
Feb 16 13:38:31.5764 ereport.fs.zfs.data
Feb 16 13:38:31.5765 ereport.fs.zfs.data
Feb 16 13:38:31.5764 ereport.fs.zfs.data
Feb 16 13:38:31.5764 ereport.fs.zfs.data
Feb 16 13:38:31.5765 ereport.fs.zfs.data
Feb 16 13:38:31.5765 ereport.fs.zfs.data
Feb 16 13:38:31.5765 ereport.fs.zfs.data
Feb 16 13:38:31.5765 ereport.fs.zfs.data
Feb 16 13:38:31.5765 ereport.fs.zfs.data
Feb 16 13:38:31.5765 ereport.fs.zfs.data
Feb 16 13:38:31.5766 ereport.fs.zfs.data
Feb 16 13:38:31.5766 ereport.fs.zfs.data
Feb 16 13:38:31.5764 ereport.fs.zfs.data
Feb 16 13:38:31.5766 ereport.fs.zfs.data
Feb 16 13:38:31.5766 ereport.fs.zfs.data
Feb 16 13:38:31.5766 ereport.fs.zfs.data
Feb 16 13:38:31.5764 ereport.fs.zfs.data
Feb 16 13:38:31.5765 ereport.fs.zfs.data
Feb 16 13:38:31.5764 ereport.fs.zfs.data
Feb 16 13:38:31.5764 ereport.fs.zfs.data
Feb 16 13:38:31.5764 ereport.fs.zfs.data
Feb 16 13:38:31.5764 ereport.fs.zfs.data
Feb 16 13:38:31.5765 ereport.fs.zfs.data
Feb 16 13:38:31.5764 ereport.fs.zfs.data
Feb 16 13:38:31.5764 ereport.fs.zfs.data
Feb 16 13:38:31.5763 ereport.fs.zfs.data
Feb 16 13:38:31.5766 ereport.fs.zfs.data
Feb 16 13:38:31.5765 ereport.fs.zfs.data
Feb 16 13:38:31.5765 ereport.fs.zfs.data
Feb 16 13:38:31.5765 ereport.fs.zfs.data
Feb 16 13:38:31.5765 ereport.fs.zfs.data
Feb 16 13:38:31.5766 ereport.fs.zfs.data
Feb 16 13:38:31.5765 ereport.fs.zfs.data
Feb 16 13:38:31.5766 ereport.fs.zfs.data
Feb 16 13:38:31.5766 ereport.fs.zfs.data
Feb 16 13:38:31.5771 ereport.fs.zfs.io_failure
Feb 16 15:11:59.2352 ereport.fs.zfs.io
Feb 16 15:11:59.2352 ereport.fs.zfs.io
Feb 16 15:11:59.2352 ereport.fs.zfs.io
Feb 16 15:11:59.2352 ereport.fs.zfs.io

and fmdump -v

Feb 16 13:38:31.5102 ba948ca4-681c-4d79-981c-6da3ca6e6b05 PCIEX-8000-DJ Diagnosed
  40% fault.io.pciex.device-noresp
        Problem in: hc://:product-id=ProLiant-ML30-Gen10-Plus:server-id=pg-1:chassis-id=CZJ2360PLT/motherboard=0/hostbridge=2/pciexrc=2/pciexbus=3/pciexdev=0/pciexfn=0
        Affects:    dev:////pci@0,0/pci8086,43b8@1c/pci1590,32b@0
        FRU:        hc://:product-id=ProLiant-ML30-Gen10-Plus:server-id=pg-1:chassis-id=CZJ2360PLT/motherboard=0/hostbridge=2/pciexrc=2/pciexbus=3/pciexdev=0
        Location:   PCI-E Slot 4

  40% fault.io.pciex.device-interr
        Problem in: hc://:product-id=ProLiant-ML30-Gen10-Plus:server-id=pg-1:chassis-id=CZJ2360PLT/motherboard=0/hostbridge=2/pciexrc=2/pciexbus=3/pciexdev=0/pciexfn=0
        Affects:    dev:////pci@0,0/pci8086,43b8@1c/pci1590,32b@0
        FRU:        hc://:product-id=ProLiant-ML30-Gen10-Plus:server-id=pg-1:chassis-id=CZJ2360PLT/motherboard=0/hostbridge=2/pciexrc=2/pciexbus=3/pciexdev=0
        Location:   PCI-E Slot 4

  20% fault.io.pciex.bus-noresp
        Problem in: hc://:product-id=ProLiant-ML30-Gen10-Plus:server-id=pg-1:chassis-id=CZJ2360PLT/motherboard=0/hostbridge=2/pciexrc=2/pciexbus=3/pciexdev=0/pciexfn=0
        Affects:    dev:////pci@0,0/pci8086,43b8@1c/pci1590,32b@0
        FRU:        hc://:product-id=ProLiant-ML30-Gen10-Plus:server-id=pg-1:chassis-id=CZJ2360PLT/motherboard=0/hostbridge=2/pciexrc=2/pciexbus=3/pciexdev=0
        Location:   PCI-E Slot 4

Feb 16 13:38:32.3326 447a2843-776e-4e39-a509-25817d74bf2d ZFS-8000-HC Diagnosed
  100% fault.fs.zfs.io_failure_wait
        Problem in: zfs://pool=dati
        Affects:    zfs://pool=dati
        FRU:        -
        Location:   -

Regards.
Maurilio.
[-- Attachment #2: Type: text/html, Size: 22867 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
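The raw ereport stream above can be condensed the same way. A small sketch (the helper name is mine, not an illumos tool) that tallies fmdump -e output by ereport class:

```shell
# Tally "fmdump -e" output by ereport class (the last field of each
# line), most frequent first. Feed it real `fmdump -e` output on a
# live system instead of the inlined demo lines.
summarize_ereports() {
    awk 'NF > 1 { print $NF }' | sort | uniq -c | sort -rn
}

# Demo on a few lines from the listing above:
printf '%s\n' \
  'Feb 16 13:38:31.5763 ereport.fs.zfs.data' \
  'Feb 16 13:38:31.5764 ereport.fs.zfs.data' \
  'Feb 16 13:31:24.4290 ereport.io.pciex.tl.cto' \
  | summarize_ereports
```

Here the interesting signal is the handful of ereport.io.pciex.* entries that precede the ZFS data errors, which is exactly the AER evidence Robert asked about.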
* Re: [developer] panic inside lmrc driver 2024-02-16 20:41 ` maurilio.longo @ 2024-02-19 21:14 ` maurilio.longo 2024-02-20 8:58 ` maurilio.longo 0 siblings, 1 reply; 37+ messages in thread From: maurilio.longo @ 2024-02-19 21:14 UTC (permalink / raw) To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 4015 bytes --]

Hi all,
new problem today, similar to a previous one, with loss of access to the disks inside the 'dati' pool.

Feb 19 15:39:41 pg-1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...
Feb 19 15:39:42 pg-1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...
Feb 19 15:39:42 pg-1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...
Feb 19 15:39:42 pg-1 lmrc: [ID 383856 kern.warning] WARNING: lmrc0: reset failed
Feb 19 15:39:42 pg-1 lmrc: [ID 998901 kern.warning] WARNING: lmrc0: AEN failed, status = 255
Feb 19 15:39:42 pg-1 lmrc: [ID 380853 kern.warning] WARNING: lmrc0: LD target map sync failed, status = 255
Feb 19 15:39:42 pg-1 lmrc: [ID 831201 kern.warning] WARNING: lmrc0: PD map sync failed, status = 255
Feb 19 15:39:42 pg-1 lmrc: [ID 383856 kern.warning] WARNING: lmrc0: reset failed
Feb 19 15:39:43 pg-1 zfs: [ID 961531 kern.warning] WARNING: Pool 'dati' has encountered an uncorrectable I/O failure and has been sus>
Feb 19 15:39:43 pg-1 zfs: [ID 961531 kern.warning] WARNING: Pool 'dati' has encountered an uncorrectable I/O failure and has been sus>
Feb 19 15:39:43 pg-1 zfs: [ID 961531 kern.warning] WARNING: Pool 'dati' has encountered an uncorrectable I/O failure and has been sus>
Feb 19 15:39:43 pg-1 zfs: [ID 961531 kern.warning] WARNING: Pool 'dati' has encountered an uncorrectable I/O failure and has been sus>
Feb 19 15:39:43 pg-1 zfs: [ID 961531 kern.warning] WARNING: Pool 'dati' has encountered an uncorrectable I/O failure and has been sus>
Feb 19 15:39:43 pg-1 zfs: [ID 961531 kern.warning] WARNING: Pool 'dati' has encountered an uncorrectable I/O failure and has been sus>
Feb 19 15:39:43 pg-1 zfs: [ID 961531 kern.warning] WARNING: Pool 'dati' has encountered an uncorrectable I/O failure and has been sus>
Feb 19 15:39:43 pg-1 zfs: [ID 961531 kern.warning] WARNING: Pool 'dati' has encountered an uncorrectable I/O failure and has been sus>
Feb 19 15:40:10 pg-1 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,43b8@1c/pci1590,32b@0/iport@v0/disk@5000cca85ee5ecb0,0 >
Feb 19 15:40:10 pg-1 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,43b8@1c/pci1590,32b@0/iport@v0/disk@50014ee6b2513c38,0 >
Feb 19 15:40:10 pg-1 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,43b8@1c/pci1590,32b@0/iport@v0/disk@5000c500aaf9b0c3,0 >
Feb 19 15:40:10 pg-1 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,43b8@1c/pci1590,32b@0/iport@v0/disk@5000c500aaf9bf2f,0 >
Feb 19 15:40:10 pg-1 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,43b8@1c/pci1590,32b@0/iport@v0/disk@5000cca85ee5ecb0,0 >
Feb 19 15:40:10 pg-1 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,43b8@1c/pci1590,32b@0/iport@v0/disk@50014ee6b2513c38,0 >
Feb 19 15:40:10 pg-1 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,43b8@1c/pci1590,32b@0/iport@v0/disk@5000c500aaf9b0c3,0 >
Feb 19 15:40:10 pg-1 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,43b8@1c/pci1590,32b@0/iport@v0/disk@5000c500aaf9bf2f,0 >
Feb 19 15:40:46 pg-1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major#012EVENT-TIME: Mon >
Feb 19 15:40:46 pg-1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major#012EVENT-TIME: Mon >
Feb 19 15:40:47 pg-1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major#012EVENT-TIME: Mon >
Feb 19 15:40:47 pg-1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major#012EVENT-TIME: Mon >
Feb 19 15:42:44 pg-1 unix: [ID 836849 kern.notice] #012#015panic[cpu0]/thread=fffffeb1f5b1e100:
Feb 19 15:42:44 pg-1 genunix: [ID 156897 kern.notice] forced crash dump initiated at user request
Feb 19 15:42:44 pg-1 unix: [ID 100000 kern.notice] #012

I've forced a kernel dump, which can be seen here:

https://mega.nz/file/ojVHjQyI#63qlDThAL3FvM4pB04QmkjraMh4hoKXmaN5aHyC3-UU

Today, when the problem arose, I had just iozone running "alone".

Regards.
Maurilio.

[-- Attachment #2: Type: text/html, Size: 12578 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-19 21:14 ` maurilio.longo @ 2024-02-20 8:58 ` maurilio.longo 2024-02-22 17:44 ` maurilio.longo 0 siblings, 1 reply; 37+ messages in thread From: maurilio.longo @ 2024-02-20 8:58 UTC (permalink / raw) To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 1692 bytes --]

After googling around about my problems I've made two changes. The first was to upgrade the controller firmware from 52.16.3-3913, dated April 2021, to the latest one, 52.26.3-5250_A, dated December 2023. Then, given several reports of HPE Gen10 units rebooting unexpectedly, all of which suggested changing the workload profile to max performance, I made that change in the BIOS, losing SpeedStep, as appears from /var/adm/messages:

Feb 20 09:20:33 pg-1 unix: [ID 950921 kern.info] cpu1: x86 (chipid 0x0 GenuineIntel A0671 family 6 model 167 step 1 clock 2807 MHz)
Feb 20 09:20:33 pg-1 unix: [ID 950921 kern.info] cpu1: Intel(r) Xeon(r) E-2314 CPU @ 2.80GHz
Feb 20 09:20:33 pg-1 unix: [ID 557947 kern.info] cpu1 initialization complete - online
Feb 20 09:20:33 pg-1 unix: [ID 977644 kern.info] NOTICE: cpu_acpi: _TSS package bad count 1 for CPU 2.
Feb 20 09:20:33 pg-1 unix: [ID 340435 kern.info] NOTICE: Support for CPU throttling is being disabled due to errors parsing ACPI T-state objects exported by BIOS.
Feb 20 09:20:33 pg-1 unix: [ID 950921 kern.info] cpu2: x86 (chipid 0x0 GenuineIntel A0671 family 6 model 167 step 1 clock 2807 MHz)
Feb 20 09:20:33 pg-1 unix: [ID 950921 kern.info] cpu2: Intel(r) Xeon(r) E-2314 CPU @ 2.80GHz
Feb 20 09:20:33 pg-1 unix: [ID 557947 kern.info] cpu2 initialization complete - online
Feb 20 09:20:33 pg-1 unix: [ID 977644 kern.info] NOTICE: cpu_acpi: _TSS package bad count 1 for CPU 3.
Feb 20 09:20:33 pg-1 unix: [ID 340435 kern.info] NOTICE: Support for CPU throttling is being disabled due to errors parsing ACPI T-state objects exported by BIOS.

Restarted iozone; let's see if something improves or not.

Regards.
Maurilio.
[-- Attachment #2: Type: text/html, Size: 4554 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-20 8:58 ` maurilio.longo @ 2024-02-22 17:44 ` maurilio.longo 2024-02-26 8:59 ` maurilio.longo 0 siblings, 1 reply; 37+ messages in thread From: maurilio.longo @ 2024-02-22 17:44 UTC (permalink / raw) To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 5446 bytes --]

So, after upgrading the controller's firmware I rebooted the computer and restarted my tests, which ended a few hours later with the disks in a state similar to this:

                            extended device statistics       ---- errors ---
   r/s    w/s   kr/s   kw/s  wait  actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
   0.0    3.0    0.0    0.1   0.0   0.0    0.0    0.0   0   0   0   0   0   0 c2
   0.0    2.0    0.0    0.0   0.0   0.0    0.0    0.0   0   0   0   0   0   0 c2t3d0
   0.0    1.0    0.0    0.0   0.0   0.0    0.0    0.0   0   0   0   0   0   0 c2t4d0
   0.0    0.0    0.0    0.0   0.0   0.0    0.0    0.0   0   0   0   0   0   0 c3
   0.0    0.0    0.0    0.0   0.0   0.0    0.0    0.0   0   0   0   0   0   0 c3t001B448B4A7140BFd0
   0.0    0.0    0.0    0.0   0.0  13.0    0.0    0.0   0 400   0   0   0   0 c4
   0.0    0.0    0.0    0.0   0.0   3.0    0.0    0.0   0 100   0   0   0   0 c4t50014EE6B2513C38d0
   0.0    0.0    0.0    0.0   0.0   3.0    0.0    0.0   0 100   0   0   0   0 c4t5000CCA85EE5ECB0d0
   0.0    0.0    0.0    0.0   0.0   4.0    0.0    0.0   0 100   0   0   0   0 c4t5000C500AAF9B0C3d0
   0.0    0.0    0.0    0.0   0.0   3.0    0.0    0.0   0 100   0   0   0   0 c4t5000C500AAF9BF2Fd0
   0.0    0.0    0.0    0.0   0.0   0.0    0.0    0.0   0   0   0   0   0   0 c5
   0.0    0.0    0.0    0.0   0.0   0.0    0.0    0.0   0   0   0   0   0   0 c5tACE42E0005CFF480d0
   0.0    0.0    0.0    0.0 246.0  13.0    0.0    0.0 100 100   0   0   0   0 dati
   0.0    0.0    0.0    0.0   0.2   0.1    0.0    0.0   2   2   0   0   0   0 rpool
                            extended device statistics       ---- errors ---
   r/s    w/s   kr/s   kw/s  wait  actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
   0.0   15.0    0.0    0.4   0.0   0.0    0.0    0.0   0   0   0   0   0   0 c2
   0.0    7.0    0.0    0.2   0.0   0.0    0.0    0.0   0   0   0   0   0   0 c2t3d0
   0.0    8.0    0.0    0.2   0.0   0.0    0.0    0.0   0   0   0   0   0   0 c2t4d0
   0.0    0.0    0.0    0.0   0.0   0.0    0.0    0.0   0   0   0   0   0   0 c3
   0.0    0.0    0.0    0.0   0.0   0.0    0.0    0.0   0   0   0   0   0   0 c3t001B448B4A7140BFd0
   0.0    0.0    0.0    0.0   0.0  13.0    0.0    0.0   0 400   0   0   0   0 c4
   0.0    0.0    0.0    0.0   0.0   3.0    0.0    0.0   0 100   0   0   0   0 c4t50014EE6B2513C38d0
   0.0    0.0    0.0    0.0   0.0   3.0    0.0    0.0   0 100   0   0   0   0 c4t5000CCA85EE5ECB0d0
   0.0    0.0    0.0    0.0   0.0   4.0    0.0    0.0   0 100   0   0   0   0 c4t5000C500AAF9B0C3d0
   0.0    0.0    0.0    0.0   0.0   3.0    0.0    0.0   0 100   0   0   0   0 c4t5000C500AAF9BF2Fd0
   0.0    0.0    0.0    0.0   0.0   0.0    0.0    0.0   0   0   0   0   0   0 c5
   0.0    0.0    0.0    0.0   0.0   0.0    0.0    0.0   0   0   0   0   0   0 c5tACE42E0005CFF480d0
   0.0    0.0    0.0    0.0 246.0  13.0    0.0    0.0 100 100   0   0   0   0 dati
   0.0    0.0    0.0    0.0   0.3   0.2    0.0    0.0   4   5   0   0   0   0 rpool

So I powered off the unit and restarted it, but this time I executed the command "mdb -K", followed by ":c", on the console, to have the kernel debugger ready for the next lockup. Instead, my unit spent the next two days running my tests (iozone + zfs recv + zpool scrub) without any problem.

Given that I needed a crash dump during the lockup, I decided to re-enable the onboard ILO5 to be able to generate an NMI, started my tests again (without the mdb -K part) and went on with my other chores. Well, it took less than three hours for the disks to become stuck again, just like above, but this time I generated an NMI and I have a crash dump here:

https://mega.nz/file/16kShQ6Y#nAv0tLbIvydBy6uaEX87d1VXMD-NIeLkg4GgvDnMatY

::msgbuf ends with this lmrc warning/error:

NOTICE: lmrc0: Drive 00(e252/Port 1I Box 0 Bay 0) Path 300062b20dde3640 reset (Type 03)
NOTICE: lmrc0: unknown AEN received, seqnum = 19954, timestamp = 761927783, code = 27f, locale = 2, class = 0, argtype = 10
NOTICE: lmrc0: Drive 00(e252/Port 1I Box 0 Bay 0) link speed changed

panic[cpu0]/thread=fffffe00f3e05c20: NMI received

I hope this can shed some light on the problem, because it seems that running with the kernel debugger active alters something (timings?) just enough to make the system (a lot more) solid. Only very seldom in the past two weeks have I been able to run my tests for 48 straight hours without issues.

Best regards
Maurilio.

[-- Attachment #2: Type: text/html, Size: 16823 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
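The iostat signature above — outstanding commands (actv > 0) on devices that complete zero reads and writes — can be spotted mechanically. A sketch (the helper name is mine) over rows in the 15-column "iostat -xe"-style layout shown in this message:

```shell
# Print devices that look wedged in "iostat -xe"-style output: active
# commands outstanding (actv, field 6) but zero reads/sec and
# writes/sec (fields 1 and 2). Field 15 is the device name. Header
# rows are skipped because their first field is not "0.0".
stuck_disks() {
    awk 'NF == 15 && $1 == "0.0" && $2 == "0.0" && $6 + 0 > 0 { print $15 }'
}

# Demo on two rows from the sample above:
printf '%s\n' \
  '0.0 0.0 0.0 0.0 0.0 3.0 0.0 0.0 0 100 0 0 0 0 c4t5000CCA85EE5ECB0d0' \
  '0.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c2t3d0' \
  | stuck_disks
# prints: c4t5000CCA85EE5ECB0d0
```

Running `iostat -xe 5 | stuck_disks` in a loop would flag exactly the four c4 disks (and the dati pool) the moment the HBA drops dead, which could then trigger the NMI capture described earlier in the thread.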
* Re: [developer] panic inside lmrc driver 2024-02-22 17:44 ` maurilio.longo @ 2024-02-26 8:59 ` maurilio.longo 2024-02-26 9:30 ` Carsten Grzemba 0 siblings, 1 reply; 37+ messages in thread From: maurilio.longo @ 2024-02-26 8:59 UTC (permalink / raw) To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 548 bytes --]

Just for the record: after nearly a month of tests, last Friday I decided I'd had enough and replaced the MR216i-p with an LSI

prtdiag -v | grep -i lsi
4   in use   PCI Exp. Gen 3 x8   PCI-E Slot 4, Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 (mpt_sas)

The system has been working ok since then, with all the tests running: no lockups, no timeouts etc. Sadly, the lmrc driver, while promising, is not ready yet.

Thanks to all who helped me and gave advice, and to Hans and all those who wrote lmrc.

Regards.
Maurilio.

[-- Attachment #2: Type: text/html, Size: 1312 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-26 8:59 ` maurilio.longo @ 2024-02-26 9:30 ` Carsten Grzemba 2024-08-26 12:30 ` manipm 0 siblings, 1 reply; 37+ messages in thread From: Carsten Grzemba @ 2024-02-26 9:30 UTC (permalink / raw) To: illumos-developer [-- Attachment #1: Type: text/plain, Size: 267 bytes --] I hope you're not generally right. We have been using the driver for 3 months (Dell R650xs with PERC H355, Rpool with 2 NVME disks). We only had a problem once. The HBA was changed and the firmware was updated. Since then there has been no more trouble. [-- Attachment #2: Type: text/html, Size: 1083 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-26 9:30 ` Carsten Grzemba @ 2024-08-26 12:30 ` manipm 2024-08-26 15:23 ` Hans Rosenfeld 0 siblings, 1 reply; 37+ messages in thread From: manipm @ 2024-08-26 12:30 UTC (permalink / raw) To: illumos-developer

[-- Attachment #1: Type: text/plain, Size: 1832 bytes --]

I am also facing an lmrc driver issue. I recently purchased a DELL R750xs with a PERC H755 (in passthrough mode) and 5 disks.

I am running the latest DELL BIOS. Can someone help me with this?

Aug 26 05:17:43 server1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...
Aug 26 05:17:43 server1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...
Aug 26 05:17:44 server1 lmrc: [ID 408335 kern.warning] WARNING: lmrc0: resetting...
Aug 26 05:17:44 server1 lmrc: [ID 383856 kern.warning] WARNING: lmrc0: reset failed
Aug 26 05:17:44 server1 lmrc: [ID 998901 kern.warning] WARNING: lmrc0: AEN failed, status = 255
Aug 26 05:17:44 server1 lmrc: [ID 380853 kern.warning] WARNING: lmrc0: LD target map sync failed, status = 255
Aug 26 05:17:44 server1 lmrc: [ID 831201 kern.warning] WARNING: lmrc0: PD map sync failed, status = 255
Aug 26 05:17:44 server1 lmrc: [ID 383856 kern.warning] WARNING: lmrc0: reset failed
Aug 26 05:17:44 server1 zfs: [ID 961531 kern.warning] WARNING: Pool 'dpool' has encountered an uncorrectable I/O failure and has been suspended; `zpool clear` will be required before the pool can be written to.
Aug 26 05:17:59 server1 scsi: [ID 107833 kern.warning] WARNING: /pci@bc,0/pci8086,347c@4/pci1028,1ae1@0/iport@v0/disk@5000c500f8162e4f,0 (sd2):
        drive offline
Aug 26 05:18:00 server1 scsi: [ID 107833 kern.warning] WARNING: /pci@bc,0/pci8086,347c@4/pci1028,1ae1@0/iport@v0/disk@5000c500f815d253,0 (sd6):
        drive offline
Aug 26 05:18:00 server1 scsi: [ID 107833 kern.warning] WARNING: /pci@bc,0/pci8086,347c@4/pci1028,1ae1@0/iport@v0/disk@5000c500f815ba63,0 (sd5):
        drive offline
Aug 26 05:18:00 server1 scsi: [ID 107833 kern.warning] WARNING: /pci@bc,0/pci8086,347c@4/pci1028,1ae1@0/iport@v0/disk@5000c500f815e003,0 (sd3):
        drive offline

[-- Attachment #2: Type: text/html, Size: 2087 bytes --]

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-08-26 12:30 ` manipm @ 2024-08-26 15:23 ` Hans Rosenfeld 0 siblings, 0 replies; 37+ messages in thread From: Hans Rosenfeld @ 2024-08-26 15:23 UTC (permalink / raw) To: developer

On Mon, Aug 26, 2024 at 08:30:47AM -0400, manipm via illumos-developer wrote:
> I am also facing lmrc driver issue. I recently purchased DELL R750xs with PERC H755(in passthrough mode) consisting of 5disk.
>
> I am running latest DELL BIOS. Can someone help me on this.

I'm sorry that you ran into that problem.

https://www.illumos.org/issues/15935

This is a known problem with the DELL PERC H755 which already occurred during the late stages of the driver development. The HBA firmware pretty much drops dead after a while for no apparent reason, whether the HBA is under load or completely idle. After a warm reset, the UEFI firmware complains that it doesn't even see the device on the bus, and a full power cycle is required to get the controller going again.

We've tried to root-cause this with the help from DELL and Broadcom, but they didn't ever tell us what really happened to their firmware, nor what our driver did to cause this. (That is not to say that our driver did anything wrong in particular, nor that anything a driver could possibly do should ever be considered an excuse for the HBA firmware to behave that way.)

That being said, the PERC H355 worked flawlessly with lmrc, and so did HBAs from other vendors such as Intel.

Hans

--
%SYSTEM-F-ANARCHISM, The operating system has been overthrown

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [developer] panic inside lmrc driver 2024-02-09 20:36 ` maurilio.longo 2024-02-12 8:11 ` maurilio.longo @ 2024-02-13 19:23 ` Hans Rosenfeld 1 sibling, 0 replies; 37+ messages in thread From: Hans Rosenfeld @ 2024-02-13 19:23 UTC (permalink / raw) To: illumos-developer

On Fri, Feb 09, 2024 at 03:36:56PM -0500, maurilio.longo via illumos-developer wrote:
> Hi Hans,
> yes I'm running your ISO while doing tests and here you can find the dump file
>
> https://mega.nz/file/cmkxBATJ#89P3wguYNEBxeeib4AZxwIKVEXR-AmTBl56C4zX8nxE

Thanks! I've filed a bug for this: https://www.illumos.org/issues/16277

Can you reproduce this easily? I can build another ISO for testing if you like.

Hans

--
%SYSTEM-F-ANARCHISM, The operating system has been overthrown

^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2024-08-26 15:23 UTC | newest] Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-02-07 10:31 panic inside lmrc driver maurilio.longo 2024-02-07 11:14 ` [developer] " Peter Tribble 2024-02-07 11:16 ` Hans Rosenfeld 2024-02-07 11:46 ` maurilio.longo 2024-02-07 12:04 ` maurilio.longo 2024-02-07 12:29 ` Hans Rosenfeld 2024-02-07 13:01 ` maurilio.longo 2024-02-07 16:47 ` maurilio.longo 2024-02-07 17:01 ` Hans Rosenfeld 2024-02-07 21:35 ` Hans Rosenfeld 2024-02-08 7:59 ` maurilio.longo 2024-02-08 10:42 ` Hans Rosenfeld 2024-02-08 11:01 ` maurilio.longo 2024-02-09 8:06 ` maurilio.longo 2024-02-09 19:55 ` Hans Rosenfeld 2024-02-09 20:36 ` maurilio.longo 2024-02-12 8:11 ` maurilio.longo 2024-02-13 19:32 ` Hans Rosenfeld 2024-02-13 20:55 ` maurilio.longo 2024-02-13 21:03 ` maurilio.longo 2024-02-15 6:50 ` maurilio.longo 2024-02-15 7:38 ` Carsten Grzemba 2024-02-15 8:11 ` Toomas Soome 2024-02-15 9:44 ` maurilio.longo 2024-02-16 14:36 ` maurilio.longo 2024-02-16 15:20 ` maurilio.longo 2024-02-16 15:27 ` maurilio.longo 2024-02-16 15:42 ` Robert Mustacchi 2024-02-16 20:41 ` maurilio.longo 2024-02-19 21:14 ` maurilio.longo 2024-02-20 8:58 ` maurilio.longo 2024-02-22 17:44 ` maurilio.longo 2024-02-26 8:59 ` maurilio.longo 2024-02-26 9:30 ` Carsten Grzemba 2024-08-26 12:30 ` manipm 2024-08-26 15:23 ` Hans Rosenfeld 2024-02-13 19:23 ` Hans Rosenfeld
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).