public inbox for developer@lists.illumos.org (since 2011-08)
* Intel NUC (Gen 11) panic during pcieadm illumos testing
@ 2024-02-22 16:08 Dan McDonald
  2024-02-22 18:16 ` [developer] " Dan McDonald
  2024-02-22 18:47 ` Robert Mustacchi
From: Dan McDonald @ 2024-02-22 16:08 UTC
  To: illumos-developer

TL;DR: I can panic my NUC 11 seemingly at will IF AND ONLY IF I use `pcieadm save-cfgspace` to save to a directory in /tmp.

...

To help out with recent igc(4D) bring-ups, I acquired a NUC 11.

For giggles, I ran the non-ZFS tests on it with this week's SmartOS release (with the addition of igc(4D), of course). I got this crash:

> ::status
debugging crash dump vmcore.0 (64-bit) from nuc
operating system: 5.11 joyent_20240221T162850Z (i86pc)
git branch: igc
git rev: 01394a6ab5cbb41d518fdc5af9ab0948844923d6
image uuid: (not set)
panic message: pcieb-2: PCI(-X) Express Fatal Error. (0x43)
dump content: kernel pages only
> $C
fffffe0079d0cae0 vpanic()
fffffe0079d0cb80 pcieb_intr_handler+0x2aa(fffffe58c8663120, 0)
fffffe0079d0cbd0 apix_dispatch_by_vector+0x8c(20)
fffffe0079d0cc00 apix_dispatch_lowlevel+0x29(20, 0)
fffffe0079cb5a50 switch_sp_and_call+0x15()
fffffe0079cb5ab0 apix_do_interrupt+0xf3(fffffe0079cb5ac0, 0)
fffffe0079cb5ac0 _interrupt+0xc3()
fffffe0079cb5bb0 i86_mwait+0x12()
fffffe0079cb5be0 cpu_idle_mwait+0x14b()
fffffe0079cb5c00 idle+0xa8()
fffffe0079cb5c10 thread_start+0xb()
> 

AND I had `fmadm faulty -v` data:

[root@nuc ~]# fmadm faulty -v
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 22 05:49:07 61a2503a-bc29-4bd4-bc0c-5cf6b915f084  PCIEX-8000-1P  Major      
Host        : nuc
Platform    : NUC11PAHi5 Chassis_id  : G6PA13900DV9
Product_sn  :  
Fault class : fault.io.pciex.device-interr 67%
              fault.io.pciex.device-invreq 33%
Affects     : dev:////pci@0,0/pci8086,a0bc@1c/pci8086,3004@0
              dev:////pci@0,0/pci8086,a0bc@1c
                  faulted and taken out of service
Problem in  : "MB" (hc://:product-id=NUC11PAHi5:server-id=nuc:chassis-id=G6PA13900DV9/motherboard=0/hostbridge=2/pciexrc=2/pciexbus=87/pciexdev=0/pciexfn=0)
              "MB" (hc://:product-id=NUC11PAHi5:server-id=nuc:chassis-id=G6PA13900DV9/motherboard=0/hostbridge=2/pciexrc=2)
                  faulted and taken out of service
FRU         : "MB" (hc://:product-id=NUC11PAHi5:server-id=nuc:chassis-id=G6PA13900DV9/motherboard=0)
                  faulty

Description : Either the transmitting device sent an invalid request or the
              receiving device is reporting an internal fault.
              Refer to http://illumos.org/msg/PCIEX-8000-1P for more
              information.

Response    : One or more device instances may be disabled

Impact      : Loss of services provided by the device instances associated with
              this fault

Action      : Ensure that the latest drivers and patches are installed.
              Otherwise schedule a repair procedure to replace the affected
              device(s).  Use fmadm faulty to identify the devices or contact
              your illumos distribution team for support.

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 22 05:49:07 783364d5-38f0-4fca-9f40-9460ac76a025  SUNOS-8000-J0  Major      
Host        : nuc
Platform    : NUC11PAHi5 Chassis_id  : G6PA13900DV9
Product_sn  :  
Fault class : defect.sunos.eft.unexpected_telemetry 50%
              fault.sunos.eft.unexpected_telemetry 50%
Problem in  : dev:////pci@0,0
                  faulted and taken out of service

Description : The diagnosis engine encountered telemetry from the listed
              devices for which it was unable to perform a diagnosis -
              Refer to http://illumos.org/msg/SUNOS-8000-J0 for more
              information.  Refer to http://illumos.org/msg/SUNOS-8000-J0 for
              more information.

Response    : Error reports have been logged for examination by your illumos
              distribution team.

Impact      : Automated diagnosis and response for these events will not occur.

Action      : Ensure that the latest illumos Kernel and Predictive Self-Healing
              (PSH) updates are installed.


A quick look at `pcieadm show-devs` gave me this device, which matches up with the first fault report:

57/0/0  PCIe Gen 2x1   sdhost0        GL9755 SD Host Controller

I have no SD card in this machine.

BUT the process that was running:

	/usr/lib/pci/pcieadm save-cfgspace -a /tmp/pcieadm-priv.41234

might be important.

I tried it again without clearing FMA, and I induced another panic.

On a third attempt, after clearing FMA and saving to a different directory on ZFS, I couldn't induce the panic. When I pointed it back at the /tmp directory (exactly as above), I did induce the panic.

What I've figured out:

- I can induce it if and only if the directory name in /tmp is sufficiently long (exact threshold TBD).

- The same device faults each time:
	pciexbus=87 aka 0x57 aka
	57/0/0  PCIe Gen 2x1   sdhost0        GL9755 SD Host Controller


And I have multiple crash dumps available for examination.

Thanks,
Dan




Thread overview: 14+ messages
2024-02-22 16:08 Intel NUC (Gen 11) panic during pcieadm illumos testing Dan McDonald
2024-02-22 18:16 ` [developer] " Dan McDonald
2024-02-22 18:47 ` Robert Mustacchi
2024-02-22 18:58   ` Dan McDonald
2024-02-22 19:17     ` Robert Mustacchi
2024-02-22 19:38       ` Dan McDonald
2024-03-07  7:25         ` Dan McDonald
2024-03-07  7:58           ` Pramod Batni
2024-03-07 16:04             ` Dan McDonald
2024-03-08  2:35               ` Pramod Batni
2024-03-08  4:40                 ` Dan McDonald
2024-03-11  6:38                   ` Pramod Batni
2024-03-13 17:21                     ` Dan McDonald
2024-03-29 13:40                       ` Pramod Batni
