From: john at keeping.me.uk (John Keeping)
Subject: cache-size implementation downsides
Date: Wed, 20 Jun 2018 00:02:59 +0100 [thread overview]
Message-ID: <20180619230259.GF1922@john.keeping.me.uk> (raw)
In-Reply-To: <20180616154621.GA1922@john.keeping.me.uk>
On Sat, Jun 16, 2018 at 04:46:21PM +0100, John Keeping wrote:
> On Wed, Jun 13, 2018 at 03:02:42PM -0400, Konstantin Ryabitsev wrote:
> > 2. I have witnessed cache corruption due to collisions (which is
> > a bug in itself). One of our frontends was hit by a lot of agressive
> > crawling of snapshots that raised the load to 60+ (many, many gzip
> > processes). After we blackholed the bot, some of the cache objects for
> > non-snapshot URLs had trailing gzip junk in them, meaning that either
> > two instances were writing to the same file, or something else resulted
> > in cache corruption. This is probably a race condition somewhere in the
> > locking code.
>
> I've had a look at this, and I think we might end up dropping our lock
> too early thanks to this code (in fill_slot()):
>
> /* Restore stdout */
> if (dup2(tmp, STDOUT_FILENO) == -1) {
>
> Before this line, STDOUT_FILENO refers to lock_fd which has a POSIX
> advisory record lock on the entire file. However, the documentation for
> that says:
>
> * If a process closes any file descriptor referring to a file, then all
> of the process's locks on that file are released, regardless of the
> file descriptor(s) on which the locks were obtained. This is
> bad: it means that a process can lose its locks on a file such as
> /etc/passwd or /etc/mtab when for some reason a library function
> decides to open, read, and close the same file.
>
> I haven't verified this, but I suspect that dup'ing the original stdout
> over STDOUT_FILENO is equivalent to closing a file descriptor referring
> to our lock file. And thus the lock is released at this point, which is
> before we rename the lock file over the cache file.
>
> If that is correct, then there is a window during which a new process
> can open the lock file to write new content and successfully acquire the
> lock on that file even though it is still being used by another process.
I confirmed this behaviour with trace-cmd.
Before:
cgit-7291 : posix_lock_inode: fl=0x0xffff88020faa6258 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff88007a8f2680 fl_pid=7291 fl_flags=FL_POSIX fl_type=F_WRLCK fl_start=0 fl_end=9223372036854775807 ret=0
cgit-7291 : fcntl_setlk: fl=0x0xffff88020faa6258 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff88007a8f2680 fl_pid=7291 fl_flags=FL_POSIX fl_type=F_WRLCK fl_start=0 fl_end=9223372036854775807 ret=0
cgit-7291 : locks_get_lock_context: dev=0x8:0x14 ino=0x4e3e13 type=F_UNLCK ctx=0xffff8801beeb7930
cgit-7291 : posix_lock_inode: fl=0x0xffffc90003627da8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff88007a8f2680 fl_pid=7291 fl_flags=FL_POSIX|FL_CLOSE fl_type=F_UNLCK fl_start=0 fl_end=9223372036854775807 ret=0
cgit-7291 : locks_remove_posix: fl=0x0xffffc90003627da8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff88007a8f2680 fl_pid=7291 fl_flags=FL_POSIX|FL_CLOSE fl_type=F_UNLCK fl_start=0 fl_end=9223372036854775807 ret=0
cgit-7291 : sys_enter_rename: oldname: 0x559bd0bd5830, newname: 0x559bd0bd57d0
cgit-7291 : sys_exit_rename: 0x0
After:
cgit-7488 : posix_lock_inode: fl=0x0xffff8802122c43e8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff8800310958c0 fl_pid=7488 fl_flags=FL_POSIX fl_type=F_WRLCK fl_start=0 fl_end=9223372036854775807 ret=0
cgit-7488 : fcntl_setlk: fl=0x0xffff8802122c43e8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff8800310958c0 fl_pid=7488 fl_flags=FL_POSIX fl_type=F_WRLCK fl_start=0 fl_end=9223372036854775807 ret=0
cgit-7488 : sys_enter_rename: oldname: 0x56512cd7f830, newname: 0x56512cd7f7d0
cgit-7488 : sys_exit_rename: 0x0
cgit-7488 : locks_get_lock_context: dev=0x8:0x14 ino=0x4e3e13 type=F_UNLCK ctx=0xffff8802144daa10
cgit-7488 : posix_lock_inode: fl=0x0xffffc900038d7da8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x0xffff880006f5b780 fl_owner=0x0xffff8800310958c0 fl_pid=7488 fl_flags=FL_POSIX|FL_CLOSE fl_type=F_UNLCK fl_start=0 fl_end=9223372036854775807 ret=0
cgit-7488 : locks_remove_posix: fl=0x0xffffc900038d7da8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x0xffff880006f5b780 fl_owner=0x0xffff8800310958c0 fl_pid=7488 fl_flags=FL_POSIX|FL_CLOSE fl_type=F_UNLCK fl_start=0 fl_end=9223372036854775807 ret=0
I'm planning to queue the patch below on jk/for-jason and send a PR in
the next day or two, but it would be nice to get a reviewed-by before I
do that.
> -- >8 --
> Subject: [PATCH] cache: close race window when unlocking slots
>
> We use POSIX advisory record locks to control access to cache slots, but
> these have an unhelpful behaviour in that they are released when any
> file descriptor referencing the file is closed by this process.
>
> Mostly this is okay, since we know we won't be opening the lock file
> anywhere else, but there is one place that it does matter: when we
> restore stdout we dup2() over a file descriptor referring to the file,
> thus closing that descriptor.
>
> Since we restore stdout before unlocking the slot, this creates a window
> during which the slot content can be overwritten. The fix is reasonably
> straightforward: simply restore stdout after unlocking the slot, but the
> diff is a bit bigger because this requires us to move the temporary
> stdout FD into struct cache_slot.
>
> Signed-off-by: John Keeping <john at keeping.me.uk>
> ---
> cache.c | 37 ++++++++++++++-----------------------
> 1 file changed, 14 insertions(+), 23 deletions(-)
>
> diff --git a/cache.c b/cache.c
> index 0901e6e..2c70be7 100644
> --- a/cache.c
> +++ b/cache.c
> @@ -29,6 +29,7 @@ struct cache_slot {
> cache_fill_fn fn;
> int cache_fd;
> int lock_fd;
> + int stdout_fd;
> const char *cache_name;
> const char *lock_name;
> int match;
> @@ -197,6 +198,13 @@ static int unlock_slot(struct cache_slot *slot, int replace_old_slot)
> else
> err = unlink(slot->lock_name);
>
> + /* Restore stdout and close the temporary FD. */
> + if (slot->stdout_fd >= 0) {
> + dup2(slot->stdout_fd, STDOUT_FILENO);
> + close(slot->stdout_fd);
> + slot->stdout_fd = -1;
> + }
> +
> if (err)
> return errno;
>
> @@ -208,42 +216,24 @@ static int unlock_slot(struct cache_slot *slot, int replace_old_slot)
> */
> static int fill_slot(struct cache_slot *slot)
> {
> - int tmp;
> -
> /* Preserve stdout */
> - tmp = dup(STDOUT_FILENO);
> - if (tmp == -1)
> + slot->stdout_fd = dup(STDOUT_FILENO);
> + if (slot->stdout_fd == -1)
> return errno;
>
> /* Redirect stdout to lockfile */
> - if (dup2(slot->lock_fd, STDOUT_FILENO) == -1) {
> - close(tmp);
> + if (dup2(slot->lock_fd, STDOUT_FILENO) == -1)
> return errno;
> - }
>
> /* Generate cache content */
> slot->fn();
>
> /* Make sure any buffered data is flushed to the file */
> - if (fflush(stdout)) {
> - close(tmp);
> + if (fflush(stdout))
> return errno;
> - }
>
> /* update stat info */
> - if (fstat(slot->lock_fd, &slot->cache_st)) {
> - close(tmp);
> - return errno;
> - }
> -
> - /* Restore stdout */
> - if (dup2(tmp, STDOUT_FILENO) == -1) {
> - close(tmp);
> - return errno;
> - }
> -
> - /* Close the temporary filedescriptor */
> - if (close(tmp))
> + if (fstat(slot->lock_fd, &slot->cache_st))
> return errno;
>
> return 0;
> @@ -393,6 +383,7 @@ int cache_process(int size, const char *path, const char *key, int ttl,
> strbuf_addstr(&lockname, ".lock");
> slot.fn = fn;
> slot.ttl = ttl;
> + slot.stdout_fd = -1;
> slot.cache_name = filename.buf;
> slot.lock_name = lockname.buf;
> slot.key = key;
> --
> 2.17.1
next prev parent reply other threads:[~2018-06-19 23:02 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-06-13 19:02 konstantin
2018-06-16 15:46 ` john
2018-06-19 23:02 ` john [this message]
2018-06-20 6:01 ` list
2018-06-20 7:44 ` john
2018-06-20 7:55 ` list
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180619230259.GF1922@john.keeping.me.uk \
--to=cgit@lists.zx2c4.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).