From mboxrd@z Thu Jan 1 00:00:00 1970
From: john at keeping.me.uk (John Keeping)
Date: Wed, 20 Jun 2018 00:02:59 +0100
Subject: cache-size implementation downsides
In-Reply-To: <20180616154621.GA1922@john.keeping.me.uk>
References: <20180613190241.GC11657@chatter> <20180616154621.GA1922@john.keeping.me.uk>
Message-ID: <20180619230259.GF1922@john.keeping.me.uk>

On Sat, Jun 16, 2018 at 04:46:21PM +0100, John Keeping wrote:
> On Wed, Jun 13, 2018 at 03:02:42PM -0400, Konstantin Ryabitsev wrote:
> > 2. I have witnessed cache corruption due to collisions (which is
> > a bug in itself). One of our frontends was hit by a lot of aggressive
> > crawling of snapshots that raised the load to 60+ (many, many gzip
> > processes). After we blackholed the bot, some of the cache objects for
> > non-snapshot URLs had trailing gzip junk in them, meaning that either
> > two instances were writing to the same file, or something else resulted
> > in cache corruption. This is probably a race condition somewhere in the
> > locking code.
>
> I've had a look at this, and I think we might end up dropping our lock
> too early thanks to this code (in fill_slot()):
>
> 	/* Restore stdout */
> 	if (dup2(tmp, STDOUT_FILENO) == -1) {
>
> Before this line, STDOUT_FILENO refers to lock_fd which has a POSIX
> advisory record lock on the entire file. However, the documentation for
> that says:
>
>     * If a process closes any file descriptor referring to a file, then all
>       of the process's locks on that file are released, regardless of the
>       file descriptor(s) on which the locks were obtained. This is
>       bad: it means that a process can lose its locks on a file such as
>       /etc/passwd or /etc/mtab when for some reason a library function
>       decides to open, read, and close the same file.
>
> I haven't verified this, but I suspect that dup'ing the original stdout
> over STDOUT_FILENO is equivalent to closing a file descriptor referring
> to our lock file. And thus the lock is released at this point, which is
> before we rename the lock file over the cache file.
>
> If that is correct, then there is a window during which a new process
> can open the lock file to write new content and successfully acquire the
> lock on that file even though it is still being used by another process.

I confirmed this behaviour with trace-cmd.

Before:

	cgit-7291 : posix_lock_inode: fl=0x0xffff88020faa6258 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff88007a8f2680 fl_pid=7291 fl_flags=FL_POSIX fl_type=F_WRLCK fl_start=0 fl_end=9223372036854775807 ret=0
	cgit-7291 : fcntl_setlk: fl=0x0xffff88020faa6258 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff88007a8f2680 fl_pid=7291 fl_flags=FL_POSIX fl_type=F_WRLCK fl_start=0 fl_end=9223372036854775807 ret=0
	cgit-7291 : locks_get_lock_context: dev=0x8:0x14 ino=0x4e3e13 type=F_UNLCK ctx=0xffff8801beeb7930
	cgit-7291 : posix_lock_inode: fl=0x0xffffc90003627da8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff88007a8f2680 fl_pid=7291 fl_flags=FL_POSIX|FL_CLOSE fl_type=F_UNLCK fl_start=0 fl_end=9223372036854775807 ret=0
	cgit-7291 : locks_remove_posix: fl=0x0xffffc90003627da8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff88007a8f2680 fl_pid=7291 fl_flags=FL_POSIX|FL_CLOSE fl_type=F_UNLCK fl_start=0 fl_end=9223372036854775807 ret=0
	cgit-7291 : sys_enter_rename: oldname: 0x559bd0bd5830, newname: 0x559bd0bd57d0
	cgit-7291 : sys_exit_rename: 0x0

After:

	cgit-7488 : posix_lock_inode: fl=0x0xffff8802122c43e8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff8800310958c0 fl_pid=7488 fl_flags=FL_POSIX fl_type=F_WRLCK fl_start=0 fl_end=9223372036854775807 ret=0
	cgit-7488 : fcntl_setlk: fl=0x0xffff8802122c43e8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff8800310958c0 fl_pid=7488 fl_flags=FL_POSIX fl_type=F_WRLCK fl_start=0 fl_end=9223372036854775807 ret=0
	cgit-7488 : sys_enter_rename: oldname: 0x56512cd7f830, newname: 0x56512cd7f7d0
	cgit-7488 : sys_exit_rename: 0x0
	cgit-7488 : locks_get_lock_context: dev=0x8:0x14 ino=0x4e3e13 type=F_UNLCK ctx=0xffff8802144daa10
	cgit-7488 : posix_lock_inode: fl=0x0xffffc900038d7da8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x0xffff880006f5b780 fl_owner=0x0xffff8800310958c0 fl_pid=7488 fl_flags=FL_POSIX|FL_CLOSE fl_type=F_UNLCK fl_start=0 fl_end=9223372036854775807 ret=0
	cgit-7488 : locks_remove_posix: fl=0x0xffffc900038d7da8 dev=0x8:0x14 ino=0x4e3e13 fl_next=0x0xffff880006f5b780 fl_owner=0x0xffff8800310958c0 fl_pid=7488 fl_flags=FL_POSIX|FL_CLOSE fl_type=F_UNLCK fl_start=0 fl_end=9223372036854775807 ret=0

I'm planning to queue the patch below on jk/for-jason and send a PR in
the next day or two, but it would be nice to get a reviewed-by before I
do that.

> -- >8 --
> Subject: [PATCH] cache: close race window when unlocking slots
>
> We use POSIX advisory record locks to control access to cache slots, but
> these have an unhelpful behaviour in that they are released when any
> file descriptor referencing the file is closed by this process.
>
> Mostly this is okay, since we know we won't be opening the lock file
> anywhere else, but there is one place that it does matter: when we
> restore stdout we dup2() over a file descriptor referring to the file,
> thus closing that descriptor.
>
> Since we restore stdout before unlocking the slot, this creates a window
> during which the slot content can be overwritten. The fix is reasonably
> straightforward: simply restore stdout after unlocking the slot, but the
> diff is a bit bigger because this requires us to move the temporary
> stdout FD into struct cache_slot.
>
> Signed-off-by: John Keeping
> ---
>  cache.c | 37 ++++++++++++++-----------------------
>  1 file changed, 14 insertions(+), 23 deletions(-)
>
> diff --git a/cache.c b/cache.c
> index 0901e6e..2c70be7 100644
> --- a/cache.c
> +++ b/cache.c
> @@ -29,6 +29,7 @@ struct cache_slot {
>  	cache_fill_fn fn;
>  	int cache_fd;
>  	int lock_fd;
> +	int stdout_fd;
>  	const char *cache_name;
>  	const char *lock_name;
>  	int match;
> @@ -197,6 +198,13 @@ static int unlock_slot(struct cache_slot *slot, int replace_old_slot)
>  	else
>  		err = unlink(slot->lock_name);
>
> +	/* Restore stdout and close the temporary FD. */
> +	if (slot->stdout_fd >= 0) {
> +		dup2(slot->stdout_fd, STDOUT_FILENO);
> +		close(slot->stdout_fd);
> +		slot->stdout_fd = -1;
> +	}
> +
>  	if (err)
>  		return errno;
>
> @@ -208,42 +216,24 @@ static int unlock_slot(struct cache_slot *slot, int replace_old_slot)
>   */
>  static int fill_slot(struct cache_slot *slot)
>  {
> -	int tmp;
> -
>  	/* Preserve stdout */
> -	tmp = dup(STDOUT_FILENO);
> -	if (tmp == -1)
> +	slot->stdout_fd = dup(STDOUT_FILENO);
> +	if (slot->stdout_fd == -1)
>  		return errno;
>
>  	/* Redirect stdout to lockfile */
> -	if (dup2(slot->lock_fd, STDOUT_FILENO) == -1) {
> -		close(tmp);
> +	if (dup2(slot->lock_fd, STDOUT_FILENO) == -1)
>  		return errno;
> -	}
>
>  	/* Generate cache content */
>  	slot->fn();
>
>  	/* Make sure any buffered data is flushed to the file */
> -	if (fflush(stdout)) {
> -		close(tmp);
> +	if (fflush(stdout))
>  		return errno;
> -	}
>
>  	/* update stat info */
> -	if (fstat(slot->lock_fd, &slot->cache_st)) {
> -		close(tmp);
> -		return errno;
> -	}
> -
> -	/* Restore stdout */
> -	if (dup2(tmp, STDOUT_FILENO) == -1) {
> -		close(tmp);
> -		return errno;
> -	}
> -
> -	/* Close the temporary filedescriptor */
> -	if (close(tmp))
> +	if (fstat(slot->lock_fd, &slot->cache_st))
>  		return errno;
>
>  	return 0;
> @@ -393,6 +383,7 @@ int cache_process(int size, const char *path, const char *key, int ttl,
>  	strbuf_addstr(&lockname, ".lock");
>  	slot.fn = fn;
>  	slot.ttl = ttl;
> +	slot.stdout_fd = -1;
>  	slot.cache_name = filename.buf;
>  	slot.lock_name = lockname.buf;
>  	slot.key = key;
> --
> 2.17.1
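
For anyone who wants to see the lock being dropped without tracing cgit
itself, the behaviour described above can probably be reproduced with a
small standalone program. This is only a rough sketch and not cgit code:
the function names, the /tmp template and the use of /dev/null are all
made up for illustration. A duplicate of the lock descriptor stands in
for the redirected STDOUT_FILENO and /dev/null stands in for the saved
stdout, so the real stdout is never touched. The check runs in a forked
child because F_GETLK does not report locks held by the calling process.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Take a POSIX (fcntl) write lock covering the whole file. */
static int lock_whole_file(int fd)
{
	/* l_start = 0 and l_len = 0 together mean "the entire file". */
	struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };

	return fcntl(fd, F_SETLK, &fl);
}

/*
 * Ask, from a child process, whether some other process holds a
 * conflicting lock on path.  Returns 1 if locked, 0 if free, -1 on error.
 */
static int is_locked(const char *path)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0) {
		struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
		int fd = open(path, O_RDWR);

		if (fd == -1 || fcntl(fd, F_GETLK, &fl) == -1)
			_exit(2);
		_exit(fl.l_type == F_UNLCK ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) > 1)
		return -1;
	return WEXITSTATUS(status);
}

/*
 * Lock a temporary file, then dup2() an unrelated descriptor over a second
 * descriptor for that file, much as restoring stdout does.  Reports whether
 * other processes saw the lock before and after the dup2().
 * Returns 0 on success, -1 on error.  (Hypothetical helper, not from cgit.)
 */
int lock_state_around_dup2(int *before, int *after)
{
	char path[] = "/tmp/lock-demo-XXXXXX";
	int lock_fd = mkstemp(path);
	int redirected, saved, ret = -1;

	if (lock_fd == -1)
		return -1;
	redirected = dup(lock_fd);		/* stand-in for STDOUT_FILENO */
	saved = open("/dev/null", O_WRONLY);	/* stand-in for the saved stdout */
	if (redirected == -1 || saved == -1 || lock_whole_file(lock_fd) == -1)
		goto out;

	*before = is_locked(path);
	/* Implicitly closes 'redirected', which refers to the locked file. */
	if (dup2(saved, redirected) == -1)
		goto out;
	*after = is_locked(path);	/* lock_fd itself is still open here */
	ret = (*before == -1 || *after == -1) ? -1 : 0;
out:
	if (redirected != -1)
		close(redirected);
	if (saved != -1)
		close(saved);
	close(lock_fd);
	unlink(path);
	return ret;
}

#ifdef DEMO_MAIN
int main(void)
{
	int before, after;

	if (lock_state_around_dup2(&before, &after))
		return 1;
	printf("locked before dup2: %d, after dup2: %d\n", before, after);
	return 0;
}
#endif
```

Built with -DDEMO_MAIN, I would expect this to report the lock as held
before the dup2() and gone afterwards on Linux, even though lock_fd is
never closed, which is consistent with the FL_CLOSE unlock appearing
ahead of the rename in the "Before" trace.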