From: Bakul Shah <bakul@iitbombay.org>
To: 9fans <9fans@9fans.net>
Subject: Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
Date: Sat, 8 Apr 2023 10:12:25 -0700 [thread overview]
Message-ID: <043A7033-4868-47E2-B58A-EF6E543FC944@iitbombay.org> (raw)
In-Reply-To: <CAEoi9W6HFi8J4FOQXWkpd007gQfwVWMYVRqNcMY8VEPUE2M=Rg@mail.gmail.com>
Things like wear leveling are done by the FTL (flash translation layer) in the firmware. Other things it does: erase-before-write, logical-to-physical mapping, erasing blocks, garbage collection (moving live data around to free up whole blocks), etc. Typically erase blocks are 128KB or larger, but the exact size seems to be treated as a secret by the SSD companies! NVMe SSDs, at least, provide 64 or more request queues, each of which can hold lots of requests. There is enough buffering to be able to flush all data to the flash on power failure, but I'm not sure whether that is exposed to the user (apart from a flush command).
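
In case it helps to picture what the FTL hides from the host, here is a toy sketch of a logical overwrite in ordinary C. The geometry is made up and there is no wear leveling, caching or power-loss handling, so it is nothing like real firmware; it only shows the shape of the thing: writes always go to a fresh page, the old copy is merely marked stale, and a mostly-stale erase block is reclaimed wholesale when free pages run out.

#include <stdio.h>
#include <string.h>

/* Toy FTL sketch: made-up geometry, no wear leveling, no crash safety. */
enum {
	PAGESZ   = 4096,	/* program unit */
	PPB      = 32,		/* pages per erase block; real erase blocks are 128KB+ */
	NBLOCKS  = 8,
	NPAGES   = PPB*NBLOCKS,
	NLOGICAL = NPAGES/2,	/* expose only half the pages (overprovisioning) */
	FREE = 0, LIVE = 1, STALE = 2,
};

static char flash[NPAGES][PAGESZ];	/* the "NAND": a page is programmed only when FREE */
static int state[NPAGES];		/* FREE, LIVE or STALE */
static int l2p[NLOGICAL];		/* logical page -> physical page; -1 if unmapped */
static int owner[NPAGES];		/* physical page -> logical page, consulted by gc */

static int
freepage(void)
{
	for(int p = 0; p < NPAGES; p++)
		if(state[p] == FREE)
			return p;
	return -1;
}

/* Reclaim the block with the most stale pages: stage its live data,
   erase the whole block, then program the survivors back. */
static void
gc(void)
{
	static char stage[PPB][PAGESZ];
	int stagedowner[PPB], nlive = 0, best = 0, beststale = -1;

	for(int b = 0; b < NBLOCKS; b++){
		int stale = 0;
		for(int p = b*PPB; p < (b+1)*PPB; p++)
			if(state[p] == STALE)
				stale++;
		if(stale > beststale){ beststale = stale; best = b; }
	}
	for(int p = best*PPB; p < (best+1)*PPB; p++){
		if(state[p] == LIVE){
			memmove(stage[nlive], flash[p], PAGESZ);
			stagedowner[nlive++] = owner[p];
		}
		state[p] = FREE;	/* erasure frees the whole block at once */
	}
	for(int i = 0; i < nlive; i++){
		int p = freepage();
		memmove(flash[p], stage[i], PAGESZ);
		state[p] = LIVE;
		owner[p] = stagedowner[i];
		l2p[stagedowner[i]] = p;
	}
}

/* A logical overwrite never touches the old page: program a fresh page
   and mark the previous copy stale for a later gc. */
static void
ftlwrite(int lpage, char *buf)
{
	int p = freepage();

	if(p < 0){
		gc();
		p = freepage();
	}
	memmove(flash[p], buf, PAGESZ);
	state[p] = LIVE;
	owner[p] = lpage;
	if(l2p[lpage] >= 0)
		state[l2p[lpage]] = STALE;
	l2p[lpage] = p;
}

int
main(void)
{
	char buf[PAGESZ];

	memset(l2p, -1, sizeof l2p);
	for(int i = 0; i < 10*NLOGICAL; i++){	/* keep overwriting the same logical pages */
		memset(buf, 'a'+i%26, PAGESZ);
		ftlwrite(i%NLOGICAL, buf);
	}
	printf("logical page 0 now lives in physical page %d\n", l2p[0]);
	return 0;
}

Note that every gc pass rewrites data the host never asked to write again; that is where write amplification comes from, which matters for the next point.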
Apart from not doing seek-related optimizations and placement, you'd probably want to minimize unnecessary writes, since SSD lifetime is limited by the amount you write (it seems to be at least about 600 times the capacity, so a 1TB disk will have a 600TBW lifetime). That means avoiding metadata updates if you can; deduplication may also help. I have heard that you can never really erase data, even if you do a secure erase, so the FS should have an encryption layer. On the flip side, an SSD may *lose* data if left unpowered for a long time (and this period goes down fast with increased temperature). JEDEC specifies 1 year of retention at 30°C for consumer SSDs and 3 months at 40°C for enterprise SSDs. So maybe a FS driver should do a background scrub on reconnect if the device has been unpowered for a long time.
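
To put some purely illustrative numbers on that: a 1TB drive rated for 600TBW that sees an average of 100GB of writes a day lasts about 600e12 / 100e9 = 6000 days, i.e. roughly 16 years; push that to 1TB of writes a day (heavy logging, constant metadata rewrites) and you are down to about 600 days, under two years. That is why avoiding gratuitous writes in the FS is worth the trouble.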
> On Apr 8, 2023, at 8:12 AM, Dan Cross <crossd@gmail.com> wrote:
>
> On Sat, Apr 8, 2023 at 10:37 AM Charles Forsyth
> <charles.forsyth@gmail.com> wrote:
>> It was the different characteristics of hard drives, even decent SATA, compared to SSD and nvme that I had in mind.
>
> Since details have been requested about this, I wouldn't presume to
> speak for Charles, but some of those differences _may_ include:
>
> 1. Optimizing for the rotational latency of spinning media, and its effects vis:
> a. the layout of storage structures on the disk,
> b. placement of _data_ on the device.
> 2. Effects with respect to things that aren't considerations for rotating disks
> a. Wear-leveling may be the canonical example here
> 3. Effects at the controller level.
> a. Caching, and the effect that has on how operations are ordered to
> ensure consistency
> b. Queuing for related objects written asynchronously and
> assumptions about latency
>
> In short, when you change storage technologies, assumptions that were
> made when, say, a filesystem was initially written may be invalidated.
> Consider the BSD FFS for example: UFS was written in an era of VAXen
> and slow, 3600 RPM spinning disks like RA81s attached to relatively
> unintelligent controllers; it made a number of fundamental design
> decisions based on that, trying to optimize placement of data and
> metadata near each other (to minimize head travel--this is the whole
> cylinder group thing), an implementation that explicitly accounted for
> platter rotation with respect to scheduling operations for the
> underlying storage device, putting multiple copies of the superblock
> in multiple locations in the disk to maximize the chances of recovery
> in the event of the (all-too-common) head crashes of the era, etc.
> They also did very careful ordering of operations for soft-updates in
> UFS2 to ensure filesystem consistency when updating metadata in the
> face of a system crash (or power failure, or whatever). It turns out
> that many of those optimizations become pessimizations (or at least
> irrelevant) when you're all of a sudden writing to a solid-state
> device, never mind battery-backed DRAM on a much more advanced
> controller.
>
> - Dan C.