Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

From: Vlastimil Babka <vbabka@suse.cz>
To: paulmck@kernel.org
Cc: Uladzislau Rezki <urezki@gmail.com>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Julia Lawall <Julia.Lawall@inria.fr>,
	linux-block@vger.kernel.org, kernel-janitors@vger.kernel.org,
	bridge@lists.linux.dev, linux-trace-kernel@vger.kernel.org,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	kvm@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	"Naveen N. Rao" <naveen.n.rao@linux.ibm.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Nicholas Piggin <npiggin@gmail.com>,
	netdev@vger.kernel.org, wireguard@lists.zx2c4.com,
	linux-kernel@vger.kernel.org, ecryptfs@vger.kernel.org,
	Neil Brown <neilb@suse.de>, Olga Kornievskaia <kolga@netapp.com>,
	Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
	linux-nfs@vger.kernel.org, linux-can@vger.kernel.org,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	netfilter-devel@vger.kernel.org, coreteam@netfilter.org,
	kasan-dev <kasan-dev@googlegroups.com>
Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback
Date: Wed, 19 Jun 2024 11:28:13 +0200
Message-ID: <4cba4a48-902b-4fb6-895c-c8e6b64e0d5f@suse.cz>
In-Reply-To: <6dad6e9f-e0ca-4446-be9c-1be25b2536dd@paulmck-laptop>

On 6/18/24 7:53 PM, Paul E. McKenney wrote:
> On Tue, Jun 18, 2024 at 07:21:42PM +0200, Vlastimil Babka wrote:
>> On 6/18/24 6:48 PM, Paul E. McKenney wrote:
>> > On Tue, Jun 18, 2024 at 11:31:00AM +0200, Uladzislau Rezki wrote:
>> >> > On 6/17/24 8:42 PM, Uladzislau Rezki wrote:
>> >> > >> +
>> >> > >> +	s = container_of(work, struct kmem_cache, async_destroy_work);
>> >> > >> +
>> >> > >> +	// XXX use the real kmem_cache_free_barrier() or similar thing here
>> >> > > It implies that we need to introduce kfree_rcu_barrier(), a new API, which i
>> >> > > wanted to avoid initially.
>> >> > 
>> >> > I wanted to avoid new API or flags for kfree_rcu() users and this would
>> >> > be achieved. The barrier is used internally so I don't consider that an
>> >> > API to avoid. How difficult is the implementation is another question,
>> >> > depending on how the current batching works. Once (if) we have sheaves
>> >> > proven to work and move kfree_rcu() fully into SLUB, the barrier might
>> >> > also look different and hopefully easier. So maybe it's not worth to
>> >> > invest too much into that barrier and just go for the potentially
>> >> > longer, but easier to implement?
>> >> > 
>> >> Right. I agree here. If the cache is not empty, OK, we just defer the
>> >> work, we can even use a big 21 seconds delay, after that we just "warn"
>> >> if it is still not empty and leave it as it is, i.e. emit a warning and
>> >> we are done.
>> >> 
>> >> Destroying the cache is not something that must happen right away. 
>> > 
>> > OK, I have to ask...
>> > 
>> > Suppose that the cache is created and destroyed by a module at
>> > init/cleanup time, respectively.  Suppose that this module is rmmod'ed
>> > then very quickly insmod'ed.
>> > 
>> > Do we need to fail the insmod if the kmem_cache has not yet been fully
>> > cleaned up?
>> 
>> We don't have any such link between kmem_cache and module to detect that, so
>> we would have to start tracking that. Probably not worth the trouble.
> 
> Fair enough!
> 
>> >  If not, do we have two versions of the same kmem_cache in
>> > /proc during the overlap time?
>> 
>> Hm could happen in /proc/slabinfo but without being harmful other than
>> perhaps confusing someone. We could filter out the caches being destroyed
>> trivially.
> 
> Or mark them in /proc/slabinfo?  Yet another column, yay!!!  Or script
> breakage from flagging the name somehow, for example, trailing "/"
> character.

Yeah I've been resisting such changes to the layout and this wouldn't be
worth it, apart from changing the name itself but not in a dangerous way
like with "/" :)

>> Sysfs and debugfs might be more problematic as I suppose directory names
>> would clash. I'll have to check... might be even happening now when we do
>> detect leaked objects and just leave the cache around... thanks for the
>> question.
> 
> "It is a service that I provide."  ;-)
> 
> But yes, we might be living with it already and there might already
> be ways people deal with it.

So it seems if the sysfs/debugfs directories already exist, they will
silently not be created. Wonder if we have such cases today already because
caches with same name exist. I think we do with the zsmalloc using 32 caches
with same name that we discussed elsewhere just recently.

Also, if the cache has leaked objects and thus won't be destroyed, these
directories stay around, as does the slabinfo entry, and can prevent new
ones from being created (slabinfo lines with the same name are not
prevented).

But it wouldn't be great to make this clash possible for the temporarily
delayed removal due to kfree_rcu() and a module re-insert, since that's a
legitimate case and not a buggy state due to leaks.

The debugfs directory we could remove immediately before handing over to the
scheduled workfn, but if it turns out there was a leak and the workfn leaves
the cache around, debugfs dir will be gone and we can't check the
alloc_traces/free_traces files there (but we have the per-object info
including the traces in the dmesg splat).

The sysfs directory is currently removed only with the whole cache being
destroyed due to the sysfs/kobject lifetime model. I'd love to untangle it
for other reasons too, but haven't investigated it yet. But again it might
be useful for the sysfs dir to stay around for inspection, as for the debugfs.

We could rename the sysfs/debugfs directories before queuing the work? Add
some prefix like GOING_AWAY-$name. If leak is detected and cache stays
forever, another rename to LEAKED-$name. (and same for the slabinfo). But
multiple ones with same name might pile up, so try adding a counter then?
Probably messy to implement, but perhaps the most robust in the end? The
automatic counter could also solve the general case of people using same
name for multiple caches.
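
To make the renaming idea concrete, here is a minimal userspace sketch of
picking a non-clashing name with a prefix and an automatic counter. It is
not kernel code: struct name_table, name_exists() and claim_unique_name()
are hypothetical names invented for illustration, and the table merely
stands in for whatever sysfs/debugfs would use to detect an existing
directory.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 16
#define NAME_LEN  64

/* Hypothetical stand-in for the set of directory names already in use. */
struct name_table {
	char names[MAX_NAMES][NAME_LEN];
	int count;
};

/* Return 1 if the name is already recorded in the table, 0 otherwise. */
static int name_exists(const struct name_table *t, const char *name)
{
	for (int i = 0; i < t->count; i++)
		if (strcmp(t->names[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Build "<prefix>-<name>"; if that is taken, try "<prefix>-<name>-1",
 * "<prefix>-<name>-2", ... until a free name is found, then record it.
 * Returns 0 on success, -1 if the table is full or the name does not fit.
 */
static int claim_unique_name(struct name_table *t, const char *prefix,
			     const char *name, char out[NAME_LEN])
{
	int n;

	if (t->count >= MAX_NAMES)
		return -1;

	n = snprintf(out, NAME_LEN, "%s-%s", prefix, name);
	if (n < 0 || n >= NAME_LEN)
		return -1;

	for (int i = 1; name_exists(t, out); i++) {
		n = snprintf(out, NAME_LEN, "%s-%s-%d", prefix, name, i);
		if (n < 0 || n >= NAME_LEN)
			return -1;
	}

	/* Safe: out was checked above to fit in NAME_LEN including the NUL. */
	strcpy(t->names[t->count++], out);
	return 0;
}
```

With this, a second cache named "foo" going away while the first is still
pending would get a counter suffix instead of silently losing its
directory, and a later rename to a LEAKED prefix could go through the same
helper.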

Other ideas?

Thanks,
Vlastimil

> 
> 							Thanx, Paul
> 
>> >> > > Since you do it asynchronous can we just repeat
>> >> > > and wait until the cache is fully freed?
>> >> > 
>> >> > The problem is we want to detect the cases when it's not fully freed
>> >> > because there was an actual leak. So at some point we'd need to stop the
>> >> > repeats because we know there can no longer be any kfree_rcu()'s in
>> >> > flight since the kmem_cache_destroy() was called.
>> >> > 
>> >> Agree. As noted above, we can go with 21 seconds(as an example) interval
>> >> and just perform destroy(without repeating).
>> >> 
>> >> --
>> >> Uladzislau Rezki
>>
