9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] [PATCH] fossil: fix a deadlock in the caching logic
@ 2023-04-04 17:15 noam
  2023-04-04 18:03 ` Steve Simon
  0 siblings, 1 reply; 18+ messages in thread
From: noam @ 2023-04-04 17:15 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 2591 bytes --]

I've sporadically encountered a deadlock in fossil. Naturally, when your root file system crashes, it can be hard to debug. My solution: stop having a root file system. I was able to attach acid using mycroft's tooling from ANTS and get a clean stack trace (https://pixelhero.dev/notebook/fossil/stacks/2023-04-03.1).

After a few hours yesterday (https://pixelhero.dev/notebook/fossil/2023-04-03.html), I eventually tracked down the deadlock. When blockWrite is told to flush a clean block to disk - i.e. one which is already flushed - it removes the block from the cache's free list, locks the block, detects that it's clean, and then... drops the reference. While keeping the block locked. And in the cache.

This leak of the lock, of course, means that the *next* access to the block - which is still in the cache! - hangs indefinitely. This is seen exactly in the stack trace:

_cacheLocal grabs the block from the cache, tries to lock it, and hangs indefinitely. Worse, it does so under a call to fileWalk, which holds a different lock, so the effect spreads, making even more of the file system inaccessible (the fileMetaFlush proc hangs waiting on this file lock).
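
To make the sequence concrete, here's a rough sketch of the broken path - a paraphrase, not the literal cache.c source: the lookup step is collapsed into a stand-in name and the surrounding checks are condensed, but the clean-block test, the goto, and the blockPut are the ones in the hunk below:

	/* sketch only: blockWrite() flushing a dependency bb */
	bb = lookupLocked(c, p);	/* stand-in name: bb comes back locked and pulled off the cache's free list */
	if(bb->iostate != BioDirty){	/* nothing needs writing after all */
		fprint(2, "iostate is %d in blockWrite\n", bb->iostate);
		/* probably BioWriting if it happens? */
		if(bb->iostate == BioClean)
			goto ignblock;	/* bug: skips the blockPut below, so bb stays locked and parked in the cache */
	}

	blockPut(bb);	/* unlocks bb and returns it to the free lists; the patch adds this on the BioClean path too */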

This patch just ensures we call blockPut on the BioClean path as well, thus unlocking the block and re-adding it to the cache's free lists.

The patch is on my branch - https://git.sr.ht/~pixelherodev/plan9/commit/1bf8bd4f44e058261da7e89d87527b12073c9e0f - but I figured I should probably post it here as well.

If anyone has any other patches that weren't in the 9legacy download as of ~2018, please let me know! :)

---
sys/src/cmd/fossil/cache.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sys/src/cmd/fossil/cache.c b/sys/src/cmd/fossil/cache.c
index f473d211e..2fec44949 100644
--- a/sys/src/cmd/fossil/cache.c
+++ b/sys/src/cmd/fossil/cache.c
@@ -1203,8 +1203,10 @@ blockWrite(Block *b, int waitlock)
 			fprint(2, "%s: %d:%x:%d iostate is %d in blockWrite\n",
 				argv0, bb->part, bb->addr, bb->l.type, bb->iostate);
 			/* probably BioWriting if it happens? */
-			if(bb->iostate == BioClean)
+			if(bb->iostate == BioClean){
+				blockPut(bb);
 				goto ignblock;
+			}
 		}
 
 		blockPut(bb);
--

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-Ma4bc3ee3ea80defbb239f382
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

[-- Attachment #2: Type: text/html, Size: 3759 bytes --]


* Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
  2023-04-04 17:15 [9fans] [PATCH] fossil: fix a deadlock in the caching logic noam
@ 2023-04-04 18:03 ` Steve Simon
  2023-04-04 18:34   ` Skip Tavakkolian
  2023-04-04 22:15   ` noam
  0 siblings, 2 replies; 18+ messages in thread
From: Steve Simon @ 2023-04-04 18:03 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/html, Size: 4831 bytes --]


* Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
  2023-04-04 18:03 ` Steve Simon
@ 2023-04-04 18:34   ` Skip Tavakkolian
  2023-04-04 20:44     ` Charles Forsyth
  2023-04-04 22:15   ` noam
  1 sibling, 1 reply; 18+ messages in thread
From: Skip Tavakkolian @ 2023-04-04 18:34 UTC (permalink / raw)
  To: 9fans

it definitely was not me. My bet would be on rsc, geoff, richard,
forsyth, quanstrom or djc.

On Tue, Apr 4, 2023 at 11:05 AM Steve Simon <steve@quintile.net> wrote:
>
>
> was this hard to reproduce?
>
> i have not seen fossil deadlocking and have used it since i installed my first home server in 2004.
>
> there definitely _was_ a problem in the snapshot code which was finally resolved around 2015 (roughly), i think perhaps skip, or forsyth found it - i apologise if i have the attribution wrong.
>
> fossil is also unhelpful if it runs out of space - i don’t believe brucee ever forgave it for that.
> this is less of a problem when it is run with venti of course.
>
> -Steve
>
>
> On 4 Apr 2023, at 6:16 pm, noam@pixelhero.dev wrote:
>
> 
> I've sporadically encountered a deadlock in fossil. Naturally, when your root file system crashes, it can be hard to debug. My solution: stop having a root file system. Was able to attach acid using mycroft's tooling from ANTS, and get a clean stack trace (https://pixelhero.dev/notebook/fossil/stacks/2023-04-03.1).
>
> After a few hours yesterday (https://pixelhero.dev/notebook/fossil/2023-04-03.html), I eventually tracked down the deadlock. When blockWrite is told to flush a clean block to disk - i.e. one which is already flushed - it removes the block from the cache's free list, locks the block, detects that it's clean, and then... drops the reference. While keeping the block locked. And in the cache.
>
> This leak of the lock, of course, means that the *next* access to the block - which is still in the cache! - hangs indefinitely. This is seen exactly in the stack trace:
>
> _cacheLocal grabs the block from the cache, tries to lock it, and hangs indefinitely. Worse, it does so under a call to fileWalk, which holds a different lock, so the effect spreads out and makes even more of the file system inaccessible as well (the fileMetaFlush proc hangs waiting on this file lock).
>
> This patch just ensures we call blockPut on the BioClean path as well, thus unlocking the block and readding it to the cache's free lists.
>
> The patch is on my branch - https://git.sr.ht/~pixelherodev/plan9/commit/1bf8bd4f44e058261da7e89d87527b12073c9e0f - but I figured I should probably post it here as well.
>
> If anyone has any other patches that weren't in the 9legacy download as of ~2018, please let me know! :)
>
> ---
> sys/src/cmd/fossil/cache.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/sys/src/cmd/fossil/cache.c b/sys/src/cmd/fossil/cache.c
> index f473d211e..2fec44949 100644
> --- a/sys/src/cmd/fossil/cache.c
> +++ b/sys/src/cmd/fossil/cache.c
> @@ -1203,8 +1203,10 @@ blockWrite(Block *b, int waitlock)
> fprint(2, "%s: %d:%x:%d iostate is %d in blockWrite\n",
> argv0, bb->part, bb->addr, bb->l.type, bb->iostate);
> /* probably BioWriting if it happens? */
> - if(bb->iostate == BioClean)
> + if(bb->iostate == BioClean){
> + blockPut(bb);
> goto ignblock;
> + }
> }
>
> blockPut(bb);
> --
>
> 9fans / 9fans / see discussions + participants + delivery options Permalink

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-Mc25a40069de1a1f118f53839
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription


* Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
  2023-04-04 18:34   ` Skip Tavakkolian
@ 2023-04-04 20:44     ` Charles Forsyth
  2023-04-04 20:50       ` Charles Forsyth
  2023-04-04 22:07       ` Anthony Martin
  0 siblings, 2 replies; 18+ messages in thread
From: Charles Forsyth @ 2023-04-04 20:44 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 372 bytes --]

It's funny that "it wasn't me" is usually said when breaking things; here
it's fixing them, but I'm fairly sure "it wasn't me" that fixed it.

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-Md75914502384917e733de7a4
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

[-- Attachment #2: Type: text/html, Size: 885 bytes --]


* Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
  2023-04-04 20:44     ` Charles Forsyth
@ 2023-04-04 20:50       ` Charles Forsyth
  2023-04-05  1:59         ` noam
  2023-04-04 22:07       ` Anthony Martin
  1 sibling, 1 reply; 18+ messages in thread
From: Charles Forsyth @ 2023-04-04 20:50 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 716 bytes --]

One of the nice things about several of the papers at iwp9 was the implicit
or explicit reconsideration of secondary storage given various types of modern
technology.
Fossil works hard to do certain things that now we probably wouldn't bother
to do.

On Tue, 4 Apr 2023 at 21:44, Charles Forsyth <charles.forsyth@gmail.com>
wrote:

> It's funny that usually "it wasn't me" is used when breaking things. here
> it's fixing them, but I'm fairly sure "it wasn't me" that fixed it.
>

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-M9c2d5cbd6cdf66a3a9443d64
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

[-- Attachment #2: Type: text/html, Size: 1514 bytes --]


* Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
  2023-04-04 20:44     ` Charles Forsyth
  2023-04-04 20:50       ` Charles Forsyth
@ 2023-04-04 22:07       ` Anthony Martin
  2023-04-05  2:48         ` noam
  1 sibling, 1 reply; 18+ messages in thread
From: Anthony Martin @ 2023-04-04 22:07 UTC (permalink / raw)
  To: 9fans; +Cc: noam

Charles Forsyth <charles.forsyth@gmail.com> once said:
> It's funny that usually "it wasn't me" is used when breaking things. here
> it's fixing them, but I'm fairly sure "it wasn't me" that fixed it.

It was Richard Miller in 2012.

        https://9p.io/sources/patch/applied/fossil-snap-deadlock/
        https://9p.io/sources/patch/applied/fossil-archive-corruption/
        https://9p.io/sources/patch/applied/fossil-superblock-write/

He fixed another issue a year or two ago.

        http://9legacy.org/9legacy/patch/fossil-deadlocks.diff

Noam, can you reproduce your problem with the above patch?

Cheers,
  Anthony

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-Mce8ab1dcb5e0d8d96f33af8e
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription


* Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
  2023-04-04 18:03 ` Steve Simon
  2023-04-04 18:34   ` Skip Tavakkolian
@ 2023-04-04 22:15   ` noam
  1 sibling, 0 replies; 18+ messages in thread
From: noam @ 2023-04-04 22:15 UTC (permalink / raw)
  To: 9fans

Quoth Steve Simon <steve@quintile.net>:
> was this hard to reproduce?

I've seen it sporadically over the last year, and when I went looking for
it yesterday I was able to trigger it deliberately with minimal effort on
the first attempt.

...of course, when I later tried to reproduce it a third time, I wasn't
able to trigger it even on the fs running *without* the patch, so it's a
~50% reproducibility rate so far when I'm actively trying to hit it.

I'm fairly sure the root cause is a race condition between some of the
periodic threads - this is only triggered when we try to flush a clean
block, which isn't a common occurrence - but I wouldn't have put in so
much effort to fix this if it wasn't something I semiregularly ran into.

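# make every file under /sys writable; each chmod is a wstat, so this
# generates a burst of metadata writes through fossil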
for(f in `{walk /sys})
        chmod +w $f

I think this, combined with the periodic flush routines and tight timing,
consistently reproduces it. There's probably a more general way to do so,
but without diving even deeper and seeing how we end up trying to flush
clean blocks, it's hard to say for sure.

fossil is *usually* pretty stable for me these days. My thinkpad often
has an uptime of weeks, and usually resets because I'm hacking on the
system and need to reboot to test it, not because of fossil.

I *have* seen bugs depressingly often, though. Once every month or so,
pretty consistently.

e.g. building any version of Go newer than 1.7 on my thinkpad crashes
fossil with ~50% consistency. The rest of the time, it alternates
between sporadic failures due to bugs in the Go compiler, and actually
working.

Similarly, using kvik's clone tool to move large volumes of data has
been a reliable way to crash fossil for me in the past (100% reliability
- there was an invocation that would cause the system to die literally
every time, but I don't remember exactly which dataset it was, or
what level of parallelism was required).


------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-M4de9844d7316ae1c346391d7
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription


* Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
  2023-04-04 20:50       ` Charles Forsyth
@ 2023-04-05  1:59         ` noam
  2023-04-05 21:25           ` Charles Forsyth
  0 siblings, 1 reply; 18+ messages in thread
From: noam @ 2023-04-05  1:59 UTC (permalink / raw)
  To: 9fans

Quoth Charles Forsyth <charles.forsyth@gmail.com>:
> Fossil works hard to do certain things that now we probably wouldn't bother
> to do.

Such as?


------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-M8a784194389d102772df0a9e
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription


* Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
  2023-04-04 22:07       ` Anthony Martin
@ 2023-04-05  2:48         ` noam
  0 siblings, 0 replies; 18+ messages in thread
From: noam @ 2023-04-05  2:48 UTC (permalink / raw)
  To: 9fans, ality; +Cc: noam

Quoth Anthony Martin <ality@pbrane.org>:
> Noam, can you reproduce your problem with the above patch?

Just from reading the patch: yes, with ~95% confidence.
The logic it's fixing is completely unrelated.

So: thanks! Now I can import that patch and have *another* deadlock fixed!

Are there any other patches I should know about?



------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-M888aed33d63b46d671dc5ba0
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription


* Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
  2023-04-05  1:59         ` noam
@ 2023-04-05 21:25           ` Charles Forsyth
  2023-04-06  3:22             ` noam
  0 siblings, 1 reply; 18+ messages in thread
From: Charles Forsyth @ 2023-04-05 21:25 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 557 bytes --]

fussing about certain things for hard drives that probably don't matter for
SSD let alone nvme

On Tue, 4 Apr 2023 at 23:04, <noam@pixelhero.dev> wrote:

> Quoth Charles Forsyth <charles.forsyth@gmail.com>:
> > Fossil works hard to do certain things that now we probably wouldn't
> bother
> > to do.
> 
> Such as?
> 

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-Md583b0e2164eb70218c630e4
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

[-- Attachment #2: Type: text/html, Size: 1851 bytes --]


* Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
  2023-04-05 21:25           ` Charles Forsyth
@ 2023-04-06  3:22             ` noam
  2023-04-06  3:57               ` Lucio De Re
  0 siblings, 1 reply; 18+ messages in thread
From: noam @ 2023-04-06  3:22 UTC (permalink / raw)
  To: 9fans

Quoth Charles Forsyth <charles.forsyth@gmail.com>:
> fussing about certain things for hard drives that probably don't matter for
> SSD let alone nvme
> certain things

I am once again asking you to be more specific, please :)

I have Plans for improving venti for myself; it'd be great to actually
have a specific list of issues that others have noticed!

Thanks,


- Noam Preil


------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-M9ba84777a354c147540a1e76
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription


* Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
  2023-04-06  3:22             ` noam
@ 2023-04-06  3:57               ` Lucio De Re
  2023-04-08  7:50                 ` hiro
  0 siblings, 1 reply; 18+ messages in thread
From: Lucio De Re @ 2023-04-06  3:57 UTC (permalink / raw)
  To: 9fans

On 4/6/23, noam@pixelhero.dev <noam@pixelhero.dev> wrote:
> Quoth Charles Forsyth <charles.forsyth@gmail.com>:
>> fussing about certain things for hard drives that probably don't matter
>> for
>> SSD let alone nvme
>
> I am once again asking you to be more specific, please :)
>
> I have Plans for improving venti for myself, it'd be great to actually
> have a specific list of issues that others have noticed!
>
I presume that fossil doesn't apply special treatment to SSD and NVME,
which, to my limited understanding, could be a serious downside. I guess
I'm asking whether one should seriously consider ditching the
fossil/venti combination and centralising permanent storage on
something like ZFS instead?

Lucio.

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-M71e442edb5e1099e40508e7c
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription


* Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
  2023-04-06  3:57               ` Lucio De Re
@ 2023-04-08  7:50                 ` hiro
  2023-04-08 14:30                   ` Charles Forsyth
  0 siblings, 1 reply; 18+ messages in thread
From: hiro @ 2023-04-08  7:50 UTC (permalink / raw)
  To: 9fans

fixing another couple deadlocks makes you finally consider ditching fossil?
zfs storage isn't always permanent either, for example if you use
encryption or deduplication.

On 4/6/23, Lucio De Re <lucio.dere@gmail.com> wrote:
> On 4/6/23, noam@pixelhero.dev <noam@pixelhero.dev> wrote:
>> Quoth Charles Forsyth <charles.forsyth@gmail.com>:
>>> fussing about certain things for hard drives that probably don't matter
>>> for
>>> SSD let alone nvme
>>
>> I am once again asking you to be more specific, please :)
>>
>> I have Plans for improving venti for myself, it'd be great to actually
>> have a specific list of issues that others have noticed!
>>
> I presume that fossil doesn't apply special treatment to SSD and NVME
> which to my limited understand could be a serious downside. I guess
> I'm asking whether one should seriously consider ditching the
> fossil/venti combination and consider centralising permanent storage
> on something like ZFS instead?
> 
> Lucio.

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-Md68125af550af6687f52fa58
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription


* Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
  2023-04-08  7:50                 ` hiro
@ 2023-04-08 14:30                   ` Charles Forsyth
  2023-04-08 14:36                     ` Charles Forsyth
  0 siblings, 1 reply; 18+ messages in thread
From: Charles Forsyth @ 2023-04-08 14:30 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 1424 bytes --]

zfs is very big, complicated and the code looks ancient. I did not enjoy
working with it.

On Sat, 8 Apr 2023 at 08:51, hiro <23hiro@gmail.com> wrote:

> fixing another couple deadlocks makes you finally consider ditching fossil?
> zfs storage isn't always permanent either, for example if you use
> encryption or deduplication.
>
> On 4/6/23, Lucio De Re <lucio.dere@gmail.com> wrote:
> > On 4/6/23, noam@pixelhero.dev <noam@pixelhero.dev> wrote:
> >> Quoth Charles Forsyth <charles.forsyth@gmail.com>:
> >>> fussing about certain things for hard drives that probably don't matter
> >>> for
> >>> SSD let alone nvme
> >>
> >> I am once again asking you to be more specific, please :)
> >>
> >> I have Plans for improving venti for myself, it'd be great to actually
> >> have a specific list of issues that others have noticed!
> >>
> > I presume that fossil doesn't apply special treatment to SSD and NVME
> > which to my limited understand could be a serious downside. I guess
> > I'm asking whether one should seriously consider ditching the
> > fossil/venti combination and consider centralising permanent storage
> > on something like ZFS instead?
> >
> > Lucio.

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-M0d0a1ec97cd26b13e9294866
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

[-- Attachment #2: Type: text/html, Size: 3072 bytes --]


* Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
  2023-04-08 14:30                   ` Charles Forsyth
@ 2023-04-08 14:36                     ` Charles Forsyth
  2023-04-08 15:09                       ` Dan Cross
  0 siblings, 1 reply; 18+ messages in thread
From: Charles Forsyth @ 2023-04-08 14:36 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 1669 bytes --]

It was the different characteristics of hard drives, even decent SATA,
compared to SSD and nvme that I had in mind.


On Sat, 8 Apr 2023 at 15:30, Charles Forsyth <charles.forsyth@gmail.com>
wrote:

> zfs is very big, complicated and the code looks ancient. I did not enjoy
> working with it.
>
> On Sat, 8 Apr 2023 at 08:51, hiro <23hiro@gmail.com> wrote:
>
>> fixing another couple deadlocks makes you finally consider ditching
>> fossil?
>> zfs storage isn't always permanent either, for example if you use
>> encryption or deduplication.
>>
>> On 4/6/23, Lucio De Re <lucio.dere@gmail.com> wrote:
>> > On 4/6/23, noam@pixelhero.dev <noam@pixelhero.dev> wrote:
>> >> Quoth Charles Forsyth <charles.forsyth@gmail.com>:
>> >>> fussing about certain things for hard drives that probably don't
>> matter
>> >>> for
>> >>> SSD let alone nvme
>> >>
>> >> I am once again asking you to be more specific, please :)
>> >>
>> >> I have Plans for improving venti for myself, it'd be great to actually
>> >> have a specific list of issues that others have noticed!
>> >>
>> > I presume that fossil doesn't apply special treatment to SSD and NVME
>> > which to my limited understand could be a serious downside. I guess
>> > I'm asking whether one should seriously consider ditching the
>> > fossil/venti combination and consider centralising permanent storage
>> > on something like ZFS instead?
>> >
>> > Lucio.

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-M3a916ec52a7ef1d354e615bd
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

[-- Attachment #2: Type: text/html, Size: 3590 bytes --]


* Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
  2023-04-08 14:36                     ` Charles Forsyth
@ 2023-04-08 15:09                       ` Dan Cross
  2023-04-08 15:27                         ` Steffen Nurpmeso
  2023-04-08 17:12                         ` Bakul Shah
  0 siblings, 2 replies; 18+ messages in thread
From: Dan Cross @ 2023-04-08 15:09 UTC (permalink / raw)
  To: 9fans

On Sat, Apr 8, 2023 at 10:37 AM Charles Forsyth
<charles.forsyth@gmail.com> wrote:
> It was the different characteristics of hard drives, even decent SATA, compared to SSD and nvme that I had in mind.

Since details have been requested about this, I wouldn't presume to
speak for Charles, but some of those differences _may_ include:

1. Optimizing for the rotational latency of spinning media, and its effects, viz.:
  a. the layout of storage structures on the disk,
  b. placement of _data_ on the device.
2. Effects with respect to things that aren't considerations for rotating disks:
  a. Wear-leveling may be the canonical example here.
3. Effects at the controller level:
  a. Caching, and the effect that has on how operations are ordered to
     ensure consistency.
  b. Queuing for related objects written asynchronously and assumptions
     about latency.

In short, when you change storage technologies, assumptions that were
made when, say, a filesystem was initially written may be invalidated.
Consider the BSD FFS for example: UFS was written in an era of VAXen
and slow, 3600 RPM spinning disks like RA81s attached to relatively
unintelligent controllers; it made a number of fundamental design
decisions based on that, trying to optimize placement of data and
metadata near each other (to minimize head travel--this is the whole
cylinder group thing), implementation that explicitly accounted for
platter rotation with respect to scheduling operations for the
underlying storage device, putting multiple copies of the superblock
in multiple locations in the disk to maximize the chances of recovery
in the event of the (all-too-common) head crashes of the era, etc.
They also did very careful ordering of operations for soft-updates in
UFS2 to ensure filesystem consistency when updating metadata in the
face of a system crash (or power failure, or whatever). It turns out
that many of those optimizations become pessimizations (or at least
irrelevant) when you're all of a sudden writing to a solid-state
device, nevermind battery-backed DRAM on a much more advanced
controller.

        - Dan C.

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-M28a486accd8e735904418630
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription


* Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
  2023-04-08 15:09                       ` Dan Cross
@ 2023-04-08 15:27                         ` Steffen Nurpmeso
  2023-04-08 17:12                         ` Bakul Shah
  1 sibling, 0 replies; 18+ messages in thread
From: Steffen Nurpmeso @ 2023-04-08 15:27 UTC (permalink / raw)
  To: 9fans

Dan Cross wrote in
 <CAEoi9W6HFi8J4FOQXWkpd007gQfwVWMYVRqNcMY8VEPUE2M=Rg@mail.gmail.com>:
 |On Sat, Apr 8, 2023 at 10:37 AM Charles Forsyth
 |<charles.forsyth@gmail.com> wrote:
 |> It was the different characteristics of hard drives, even decent \
 |> SATA, compared to SSD and nvme that I had in mind.
 ...
 |In short, when you change storage technologies, assumptions that were
 |made with, say, a filesystem was initially written may be invalidated.
 |Consider the BSD FFS for example: UFS was written in an era of VAXen
 |and slow, 3600 RPM spinning disks like RA81s attached to relatively
 |unintelligent controllers; it made a number of fundamental design
 |decisions based on that, trying to optimize placement of data and
 |metadata near each other (to minimize head travel--this is the whole
 |cylinder group thing), implementation that explicitly accounted for
 |platter rotation with respect to scheduling operations for the
 |underlying storage device, putting multiple copies of the superblock
 |in multiple locations in the disk to maximize the chances of recovery
 |in the event of the (all-too-common) head crashes of the era, etc.
 |They also did very careful ordering of operations for soft-updates in
 |UFS2 to ensure filesystem consistency when updating metadata in the
 |face of a system crash (or power failure, or whatever). It turns out
 |that many of those optimizations become pessimizations (or at least
 |irrelevant) when you're all of a sudden writing to a solid-state
 |device, nevermind battery-backed DRAM on a much more advanced
 |controller.

Funnily enough, Kirk McKusick committed
fe5e6e2cc5d6f2e4121eccdb3a8ceba646aef2c9 on March 29th, saying

    Improvement in UFS/FFS directory placement when doing mkdir(2).

    The algorithm for laying out new directories was devised in the 1980s
    and markedly improved the performance of the filesystem. In those days
    large disks had at most 100 cylinder groups and often as few as 10-20.
    Modern multi-terrabyte disks have thousands of cylinder groups. The
    original algorithm does not handle these large sizes well. This change
    attempts to expand the scope of the original algorithm to work well
    with these much larger disks while still retaining the properties
    of the original algorithm for small disks.
    ...
    This change updates the ffs_dirpref() routine which is responsible
    for selecting the cylinder group into which a new directory should
    be placed. If we are near the root of the filesystem we aim to
    spread them out as much as possible. As we descend deeper from the
    root we cluster them closer together around their parent as we
    expect them to be more closely interactive. Higher-level directories
    like usr/src/sys and usr/src/bin should be separated while the
    directories in these areas are more likely to be accessed together
    so should be closer. And directories within commands or kernel
    subsystems should be closer still.

    We pick a range of cylinder groups around the cylinder group of the
    directory in which we are being created. The size of the range for
    our search is based on our depth from the root of our filesystem.
    We then probe that range based on how many directories are already
    present. The first new directory is at 1/2 (middle) of the range;
    the second is in the first 1/4 of the range, then at 3/4, 1/8, 3/8,

I only took a shallow look as I have no clue, but it caught my eye,
so I remembered it.

--steffen
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-Mc6cf9913394dadc41af96812
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription


* Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
  2023-04-08 15:09                       ` Dan Cross
  2023-04-08 15:27                         ` Steffen Nurpmeso
@ 2023-04-08 17:12                         ` Bakul Shah
  1 sibling, 0 replies; 18+ messages in thread
From: Bakul Shah @ 2023-04-08 17:12 UTC (permalink / raw)
  To: 9fans

Things like wear leveling are done by the FTL (flash translation layer) in the firmware. Other things it does: erase before write, logical to physical mapping, erasing blocks, garbage collection (moving live data around to free up whole blocks), etc. Typically erase blocks are 128KB or larger, but their size seems to be treated as a secret by the SSD companies! NVMe SSDs, at least, provide 64 or more request queues, each of which can hold lots of requests. There is enough buffering to be able to flush all data to the Flash in case of power failure, but I'm not sure that is exposed to the user (apart from a flush command).

Apart from not doing seek-related optimizations and placement, you'd probably want to minimize unnecessary writes, as SSD lifetime is limited by the amount you write (it seems to be at least about 600 times the capacity, so a 1TB disk will have a 600TBW lifetime). That means avoiding metadata updates if you can; deduplication may also help. I have heard that you can never really erase data even if you do a secure erase, so the FS should have an encryption layer. On the flip side, an SSD may *lose* data if left unpowered for a long time (this period goes down fast with increased temperature). JEDEC says 1 year retention at 30°C for consumer and 3 month retention at 40°C for enterprise SSDs. So maybe a FS driver should do a background scrub on reconnect if the device was not powered on for a long time.
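
(A rough worked example of that endurance figure, with a write rate assumed purely for illustration: 1TB x 600 = 600TB of total writes; at a steady 50GB/day that's 600,000GB / 50GB per day = 12,000 days, i.e. roughly 33 years of writing.)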

> On Apr 8, 2023, at 8:12 AM, Dan Cross <crossd@gmail.com> wrote:
> 
> On Sat, Apr 8, 2023 at 10:37 AM Charles Forsyth
> <charles.forsyth@gmail.com> wrote:
>> It was the different characteristics of hard drives, even decent SATA, compared to SSD and nvme that I had in mind.
> 
> Since details have been requested about this. I wouldn't presume to
> speak from Charles, but some of those differences _may_ include:
> 
> 1. Optimizing for the rotational latency of spinning media, and its effects vis:
>  a. the layout of storage structures on the disk,
>  b. placement of _data_ on the device.
> 2. Effects with respect to things that aren't considerations for rotating disks
>  a. Wear-leveling may be the canonical example here
> 3. Effects at the controller level.
>  a. Caching, and the effect that has on how operations are ordered to
> ensure consistency
>  b. Queuing for related objects written asynchronously and
> assumptions about latency
> 
> In short, when you change storage technologies, assumptions that were
> made with, say, a filesystem was initially written may be invalidated.
> Consider the BSD FFS for example: UFS was written in an era of VAXen
> and slow, 3600 RPM spinning disks like RA81s attached to relatively
> unintelligent controllers; it made a number of fundamental design
> decisions based on that, trying to optimize placement of data and
> metadata near each other (to minimize head travel--this is the whole
> cylinder group thing), implementation that explicitly accounted for
> platter rotation with respect to scheduling operations for the
> underlying storage device, putting multiple copies of the superblock
> in multiple locations in the disk to maximize the chances of recovery
> in the event of the (all-too-common) head crashes of the era, etc.
> They also did very careful ordering of operations for soft-updates in
> UFS2 to ensure filesystem consistency when updating metadata in the
> face of a system crash (or power failure, or whatever). It turns out
> that many of those optimizations become pessimizations (or at least
> irrelevant) when you're all of a sudden writing to a solid-state
> device, nevermind battery-backed DRAM on a much more advanced
> controller.
> 
> - Dan C.

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-M30011df66797d8263cd1bf6c
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription


end of thread, other threads:[~2023-04-08 17:14 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-04 17:15 [9fans] [PATCH] fossil: fix a deadlock in the caching logic noam
2023-04-04 18:03 ` Steve Simon
2023-04-04 18:34   ` Skip Tavakkolian
2023-04-04 20:44     ` Charles Forsyth
2023-04-04 20:50       ` Charles Forsyth
2023-04-05  1:59         ` noam
2023-04-05 21:25           ` Charles Forsyth
2023-04-06  3:22             ` noam
2023-04-06  3:57               ` Lucio De Re
2023-04-08  7:50                 ` hiro
2023-04-08 14:30                   ` Charles Forsyth
2023-04-08 14:36                     ` Charles Forsyth
2023-04-08 15:09                       ` Dan Cross
2023-04-08 15:27                         ` Steffen Nurpmeso
2023-04-08 17:12                         ` Bakul Shah
2023-04-04 22:07       ` Anthony Martin
2023-04-05  2:48         ` noam
2023-04-04 22:15   ` noam
