public inbox for developer@lists.illumos.org (since 2011-08)
* [developer] Panic during pkg update
@ 2024-10-30 18:57 Gary Mills
  2024-10-30 19:10 ` Peter Tribble
  2024-11-11 18:36 ` Toomas Soome via illumos-developer
  0 siblings, 2 replies; 7+ messages in thread
From: Gary Mills @ 2024-10-30 18:57 UTC (permalink / raw)
  To: developer

I'm not sure if this is a bug or just ZFS being careful, but I got a
panic and reboot while I was doing a "pkg update".  The system
has an AMD 6-core CPU with B550 support hardware.  The next
"pkg update" completed normally, without a panic.  Here's what
I found in /var/adm/messages.  Does it look familiar?

Oct 30 09:14:31 b550 unix: [ID 836849 kern.notice] 
Oct 30 09:14:31 b550 ^Mpanic[cpu4]/thread=fffffe2cc9e88780: 
Oct 30 09:14:31 b550 genunix: [ID 129249 kern.notice] checksum of cached data doesn't match BP err=50 hdr=fffffe3d478f51c0 bp=fffffe0040433988 abd=fffffe3d478f7cc0 buf=fffffe3b5a6f9000
Oct 30 09:14:31 b550 unix: [ID 100000 kern.notice] 
Oct 30 09:14:31 b550 genunix: [ID 655072 kern.notice] fffffe0040433760 zfs:zfs_nfsshare_inited+378b87f0 ()
Oct 30 09:14:31 b550 genunix: [ID 655072 kern.notice] fffffe0040433890 zfs:arc_read+de1 ()
Oct 30 09:14:31 b550 genunix: [ID 655072 kern.notice] fffffe00404338e0 zfs:dbuf_issue_final_prefetch+77 ()
Oct 30 09:14:31 b550 genunix: [ID 655072 kern.notice] fffffe0040433a70 zfs:dbuf_prefetch_impl+502 ()
Oct 30 09:14:31 b550 genunix: [ID 655072 kern.notice] fffffe0040433b20 zfs:dmu_zfetch+2ed ()
Oct 30 09:14:31 b550 genunix: [ID 655072 kern.notice] fffffe0040433bd0 zfs:dmu_buf_hold_array_by_dnode+321 ()
Oct 30 09:14:31 b550 genunix: [ID 655072 kern.notice] fffffe0040433c70 zfs:dmu_read_uio_dnode+54 ()
Oct 30 09:14:31 b550 genunix: [ID 655072 kern.notice] fffffe0040433cc0 zfs:dmu_read_uio_dbuf+51 ()
Oct 30 09:14:31 b550 genunix: [ID 655072 kern.notice] fffffe0040433d60 zfs:zfs_read+19c ()
Oct 30 09:14:31 b550 genunix: [ID 655072 kern.notice] fffffe0040433de0 genunix:fop_read+60 ()
Oct 30 09:14:31 b550 genunix: [ID 655072 kern.notice] fffffe0040433f00 genunix:read+2b5 ()
Oct 30 09:14:31 b550 genunix: [ID 655072 kern.notice] fffffe0040433f10 unix:brand_sys_syscall+1fe ()
Oct 30 09:14:31 b550 unix: [ID 100000 kern.notice] 
Oct 30 09:14:31 b550 genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Oct 30 09:14:31 b550 ahci: [ID 405573 kern.info] NOTICE: ahci0: ahci_tran_reset_dport port 0 reset port
Oct 30 09:14:32 b550 ahci: [ID 405573 kern.info] NOTICE: ahci0: ahci_tran_reset_dport port 1 reset port
Oct 30 09:14:50 b550 genunix: [ID 100000 kern.notice] 
Oct 30 09:14:50 b550 genunix: [ID 665016 kern.notice] ^M100% done: 859875 pages dumped, 
Oct 30 09:14:50 b550 genunix: [ID 851671 kern.notice] dump succeeded
Oct 30 09:15:34 b550 genunix: [ID 107833 kern.notice] ^MOpenIndiana Hipster 2022.10 Version illumos-806838751b 64-bit


-- 
-Gary Mills-            -refurb-                -Winnipeg, Manitoba, Canada-

------------------------------------------
illumos: illumos-developer
Permalink: https://illumos.topicbox.com/groups/developer/Te1153c7aaa3e05c7-M70e6e59ba1570ed5ad3b1843
Delivery options: https://illumos.topicbox.com/groups/developer/subscription


* Re: [developer] Panic during pkg update
  2024-10-30 18:57 [developer] Panic during pkg update Gary Mills
@ 2024-10-30 19:10 ` Peter Tribble
  2024-10-30 20:57   ` Gary Mills
  2024-11-11 18:36 ` Toomas Soome via illumos-developer
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Tribble @ 2024-10-30 19:10 UTC (permalink / raw)
  To: illumos-developer


On Wed, Oct 30, 2024 at 6:57 PM Gary Mills <gary_mills@fastmail.fm> wrote:

> I'm not sure if this is a bug or just ZFS being careful, but I got a
> panic and reboot while I was doing a "pkg update".  The system
> has an AMD 6-core CPU with B550 support hardware.  The next
> "pkg update" completed normally, without a panic.  Here's what
> I found in /var/adm/messages.  Does it look familiar?
>

Looks similar to 12242

https://www.illumos.org/issues/12242




-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/

------------------------------------------
illumos: illumos-developer
Permalink: https://illumos.topicbox.com/groups/developer/Te1153c7aaa3e05c7-Mb3a8e17dbdbd92d15f5db628

* Re: [developer] Panic during pkg update
  2024-10-30 19:10 ` Peter Tribble
@ 2024-10-30 20:57   ` Gary Mills
  2024-11-11 17:19     ` Gergő Doma
  0 siblings, 1 reply; 7+ messages in thread
From: Gary Mills @ 2024-10-30 20:57 UTC (permalink / raw)
  To: illumos-developer

On Wed, Oct 30, 2024 at 07:10:47PM +0000, Peter Tribble wrote:

>    Looks similar to 12242
>    https://www.illumos.org/issues/12242

Thanks.  Yes, it does.  So, it's been fixed.  I was upgrading from
hipster-20230306 to hipster-20241030.  It was the kernel from
2023-03-06 that panicked.


-- 
-Gary Mills-            -refurb-                -Winnipeg, Manitoba, Canada-

------------------------------------------
illumos: illumos-developer
Permalink: https://illumos.topicbox.com/groups/developer/Te1153c7aaa3e05c7-M148f55cde38926233be0b045

* Re: [developer] Panic during pkg update
  2024-10-30 20:57   ` Gary Mills
@ 2024-11-11 17:19     ` Gergő Doma
  0 siblings, 0 replies; 7+ messages in thread
From: Gergő Doma @ 2024-11-11 17:19 UTC (permalink / raw)
  To: illumos-developer


>
> So, it's been fixed.
>
It seems to me that this is still an open issue - not fixed.

------------------------------------------
illumos: illumos-developer
Permalink: https://illumos.topicbox.com/groups/developer/Te1153c7aaa3e05c7-Ma339ab77916f5339647f367a

* Re: [developer] Panic during pkg update
  2024-10-30 18:57 [developer] Panic during pkg update Gary Mills
  2024-10-30 19:10 ` Peter Tribble
@ 2024-11-11 18:36 ` Toomas Soome via illumos-developer
  2024-11-11 21:11   ` gusev.vitaliy via illumos-developer
  1 sibling, 1 reply; 7+ messages in thread
From: Toomas Soome via illumos-developer @ 2024-11-11 18:36 UTC (permalink / raw)
  To: illumos-developer




> On 30. Oct 2024, at 20:57, Gary Mills <gary_mills@fastmail.fm> wrote:
> 
> I'm not sure if this is a bug or just ZFS being careful, but I got a
> panic and reboot while I was doing a "pkg update".  The system
> has an AMD 6-core CPU with B550 support hardware.  The next
> "pkg update" completed normally, without a panic.  Here's what
> I found in /var/adm/messages.  Does it look familiar?
> 

Dan got blown up while running zfs-tests (rsend), and that resulted in me picking up a series of updates from OpenZFS concerning dbuf and dmu. There are still a few XXX notes for myself, but so far both debug and non-debug builds have been behaving nicely (the debug build was used to run zfs-tests). I have also seen a panic from arc myself (an ASSERT fired while running zfs-tests on a debug build; that was before the work mentioned above). Most likely I need to pick up some arc bits as well.

The current WIP branch is https://github.com/tsoome/illumos-gate/tree/rsend if you'd like to test. The problem with those panics is that they seem to be random, or at least not easily repeatable.

rgds,
toomas


------------------------------------------
illumos: illumos-developer
Permalink: https://illumos.topicbox.com/groups/developer/Te1153c7aaa3e05c7-Mb8af562876078fc2e3f7037c

* Re: [developer] Panic during pkg update
  2024-11-11 18:36 ` Toomas Soome via illumos-developer
@ 2024-11-11 21:11   ` gusev.vitaliy via illumos-developer
  2024-11-12  6:18     ` gusev.vitaliy via illumos-developer
  0 siblings, 1 reply; 7+ messages in thread
From: gusev.vitaliy via illumos-developer @ 2024-11-11 21:11 UTC (permalink / raw)
  To: illumos-developer, Toomas Soome via illumos-developer; +Cc: Tom Caputi


Panic comes from this function:

usr/src/uts/common/fs/zfs/arc.c

/*
 * XXX this should be changed to return an error, and callers
 * re-read from disk on failure (on nondebug bits).
 */
static void
arc_hdr_verify_checksum(spa_t *spa, arc_buf_hdr_t *hdr, const blkptr_t *bp)

  …

                err = zio_checksum_error_impl(spa, bp,
                    BP_GET_CHECKSUM(bp), abd, psize, 0, NULL);
                if (err != 0) {
                        /*
                         * Use abd_copy_to_buf() rather than
                         * abd_borrow_buf_copy() so that we are sure to
                         * include the buf in crash dumps.
                         */
                        void *buf = kmem_alloc(psize, KM_SLEEP);
                        abd_copy_to_buf(buf, abd, psize);
                        panic("checksum of cached data doesn't match BP "
                            "err=%u hdr=%p bp=%p abd=%p buf=%p",
                            err, (void *)hdr, (void *)bp, (void *)abd, buf);

OpenZFS, though, doesn't have this piece of code and generally returns an error, as the XXX comment suggests.

Tom, do you think the panic() call should be replaced with returning an error?
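
To illustrate what the XXX comment asks for, here is a minimal, self-contained sketch in plain C of a verify routine that returns an error, with a caller that re-reads from disk instead of panicking. It is a toy model, not illumos or OpenZFS code: toy_block_t, toy_checksum(), toy_verify_cached(), toy_read() and toy_init() are hypothetical names, the checksum is a simple stand-in, and EIO stands in for whatever error the real code would return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	BLOCK_SIZE	16

/* Hypothetical stand-in for a cached block; not an illumos structure. */
typedef struct toy_block {
	uint8_t		disk[BLOCK_SIZE];	/* authoritative "on-disk" copy */
	uint8_t		cached[BLOCK_SIZE];	/* in-memory cached copy */
	uint32_t	expected_cksum;		/* checksum recorded at write time */
} toy_block_t;

/* Simple two-sum checksum, used only for illustration. */
static uint32_t
toy_checksum(const uint8_t *buf, size_t len)
{
	uint32_t a = 0, b = 0;

	for (size_t i = 0; i < len; i++) {
		a += buf[i];
		b += a;
	}
	return ((b << 16) | (a & 0xffff));
}

/* Record the data on "disk", populate the cache, remember the checksum. */
static void
toy_init(toy_block_t *blk, const uint8_t *data)
{
	memcpy(blk->disk, data, BLOCK_SIZE);
	memcpy(blk->cached, data, BLOCK_SIZE);
	blk->expected_cksum = toy_checksum(data, BLOCK_SIZE);
}

/* Return 0 if the cached copy matches, EIO otherwise (no panic). */
static int
toy_verify_cached(const toy_block_t *blk)
{
	if (toy_checksum(blk->cached, BLOCK_SIZE) != blk->expected_cksum)
		return (EIO);
	return (0);
}

/*
 * Read through the cache.  If the cached copy fails verification,
 * refill it from "disk" and verify once more; only report an error
 * when the re-read copy is bad as well.  *reread tells the caller
 * whether the fallback path was taken.
 */
static int
toy_read(toy_block_t *blk, uint8_t *out, int *reread)
{
	assert(blk != NULL && out != NULL && reread != NULL);

	*reread = 0;
	if (toy_verify_cached(blk) != 0) {
		memcpy(blk->cached, blk->disk, BLOCK_SIZE);
		*reread = 1;
		if (toy_verify_cached(blk) != 0)
			return (EIO);
	}
	memcpy(out, blk->cached, BLOCK_SIZE);
	return (0);
}
```

For example, after toy_init() on a 16-byte buffer, flipping one bit in blk.cached makes toy_verify_cached() return EIO, while toy_read() still succeeds and sets *reread to 1. Whether such a silent re-read is acceptable in the real ARC is the open question.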

—
Vitaliy Gusev


------------------------------------------
illumos: illumos-developer
Permalink: https://illumos.topicbox.com/groups/developer/Te1153c7aaa3e05c7-M5484aceef4b12b181bfff6f0

* Re: [developer] Panic during pkg update
  2024-11-11 21:11   ` gusev.vitaliy via illumos-developer
@ 2024-11-12  6:18     ` gusev.vitaliy via illumos-developer
  0 siblings, 0 replies; 7+ messages in thread
From: gusev.vitaliy via illumos-developer @ 2024-11-12  6:18 UTC (permalink / raw)
  To: illumos-developer, Tom Caputi


> On 12 Nov 2024, at 00:11, gusev.vitaliy via illumos-developer <developer@lists.illumos.org> wrote:
> 
> Panic comes from this function:
> 
> usr/src/uts/common/fs/zfs/arc.c
> 
> /*
>  * XXX this should be changed to return an error, and callers
>  * re-read from disk on failure (on nondebug bits).
>  */
> static void
> arc_hdr_verify_checksum(spa_t *spa, arc_buf_hdr_t *hdr, const blkptr_t *bp)
>
>   …
>
>                 err = zio_checksum_error_impl(spa, bp,
>                     BP_GET_CHECKSUM(bp), abd, psize, 0, NULL);
>                 if (err != 0) {
>                         /*
>                          * Use abd_copy_to_buf() rather than
>                          * abd_borrow_buf_copy() so that we are sure to
>                          * include the buf in crash dumps.
>                          */
>                         void *buf = kmem_alloc(psize, KM_SLEEP);
>                         abd_copy_to_buf(buf, abd, psize);
>                         panic("checksum of cached data doesn't match BP "
>                             "err=%u hdr=%p bp=%p abd=%p buf=%p",
>                             err, (void *)hdr, (void *)bp, (void *)abd, buf);
> 
> OpenZFS, though, doesn't have this piece of code and generally returns an error, as the XXX comment suggests.
> 
> Tom, do you think the panic() call should be replaced with returning an error?
> 

I mean Tom Caputi :) since this code was added in "8727 Native data and metadata encryption for zfs".

Tom, could you clarify in which cases zio_checksum_error_impl() can return an error here: arc_read -> arc_hdr_verify_checksum -> zio_checksum_error_impl? Is it so critical that it must panic?
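
For readers following along, the condition tested at the end of that call chain is mechanically simple: the checksum recomputed over the cached buffer differs from the one recorded when the block was written. The sketch below shows this with Fletcher-4-style running sums (fletcher4 is one of the checksum algorithms ZFS offers; the thread does not say which one this pool used). All names here (toy_cksum_t, toy_fletcher4(), toy_cksum_mismatch()) are hypothetical, and the sketch says nothing about what actually changed the cached data on the machine that panicked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 256-bit checksum holder: four 64-bit running sums. */
typedef struct toy_cksum {
	uint64_t	word[4];
} toy_cksum_t;

/*
 * Fletcher-4-style checksum: walk the buffer as 32-bit words in native
 * byte order and keep four cascaded 64-bit sums.  len must be a
 * multiple of 4 bytes.
 */
static void
toy_fletcher4(const void *buf, size_t len, toy_cksum_t *out)
{
	const uint8_t *p = buf;
	uint64_t a = 0, b = 0, c = 0, d = 0;

	assert(len % sizeof (uint32_t) == 0);

	for (size_t off = 0; off < len; off += sizeof (uint32_t)) {
		uint32_t w;

		memcpy(&w, p + off, sizeof (w));
		a += w;
		b += a;
		c += b;
		d += c;
	}
	out->word[0] = a;
	out->word[1] = b;
	out->word[2] = c;
	out->word[3] = d;
}

/*
 * Return 1 if the buffer no longer matches the checksum recorded
 * earlier, 0 if it still matches.
 */
static int
toy_cksum_mismatch(const void *buf, size_t len, const toy_cksum_t *expected)
{
	toy_cksum_t actual;

	toy_fletcher4(buf, len, &actual);
	return (memcmp(&actual, expected, sizeof (actual)) != 0);
}
```

With the two 32-bit words 1 and 2 the four sums come out as 3, 4, 5 and 6; changing a single bit of the buffer afterwards makes toy_cksum_mismatch() return 1.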

—
Vitaliy Gusev



------------------------------------------
illumos: illumos-developer
Permalink: https://illumos.topicbox.com/groups/developer/Te1153c7aaa3e05c7-Mc08c98af25e077799cc2d612