From: Leah Neukirchen
To: leahutils@inbox.vuxu.org
Cc: voidlinux@googlegroups.com
Subject: epona.vuxu.org outage post mortem
Date: Thu, 14 Feb 2019 22:54:07 +0100
Message-ID: <87mumyvxxc.fsf@vuxu.org>

At 00:20 CET tonight (2019-02-14) epona.vuxu.org, a virtual machine
host that runs, among others, git.vuxu.org/inbox.vuxu.org and
hestia.vuxu.org, the aarch64 builder for Void Linux, went down and
only came up around 11 CET again.

This was entirely my fault, but how it happened is interesting:

I was informed that a user-space port forwarding was not working. It
was realized using socat, supervised by runit (the init system of
Void Linux):

    socat TCP4-LISTEN:3722,fork,su=nobody TCP6:hestia.vuxu.org:22

However, starting this showed the address was already in use:

    2019/02/14 00:20:44 socat[5049] E bind(5, {AF=2 0.0.0.0:3722}, 16): Address already in use

My assumption was that there was a runaway instance of socat running
(for unknown reasons), and I decided to kill all socat instances.

My usual tool of choice would have been `killall socat`, but as there
were other socat instances running on the machine, I only wanted to
kill the port 3722 ones. A quick test with `pgrep` showed a plausible
list of PIDs, so I ran

    kill $(pgrep -f socat.*3722)

which seemed to work fine at first. Several seconds later I was
greeted with this message:

    Connection to epona.vuxu.org closed by remote host.
    Connection to epona.vuxu.org closed.

And the box didn't ping anymore...

As an experienced SSH user, I took this to mean that the host had shut
down in some controlled way, else I would have gotten a `broken pipe`
message. But how could the `kill` shut down the machine? I could not
come up with any plausible theory, but it was already late and there
was alcohol involved as well. So I decided to let it rest until the
morning.

In the morning, the box still was not up and there was no evidence of
a network issue or anything. I decided to enter the Hetzner Control
Panel and trigger an "automated reset". Nothing changed, the box
still didn't ping. I tried to activate the "vKVM rescue system", to
no avail. At this point I actually assumed some hardware issue, and I
called for a "manual reset", which means someone has to get up, walk
to the machine, restart it, and watch a bit whether it seems to boot
properly.

Of course, the true reason was much simpler: the box was powered off.
Unfortunately, nothing about the Hetzner Control Panel shows you this
simple fact, so I guess I'm not the only one to send poor support
folks to go boot other people's machines.

The box booted fine and all services were restored within minutes.

The remaining question is how it's possible that the command shut
down the machine, and it's easy to answer too: `runsvdir`, the main
runit process that controls "stage 2", i.e. while the system is up,
displays error messages of all direct child processes in its
`argv[0]`, so you can check for unlogged messages with `ps`:

    runsvdir -P /run/runit/runsvdir/current log: ....logs here....

Unfortunately, in the above situation this resulted in both "socat"
and "3722" appearing in the error messages, and thus in the process
title, which made `pgrep -f` match it, so the `kill` went, as
commanded, to `runsvdir` as well. That results in exiting stage 2 and
runit performing an orderly shutdown of the system.

Duh.

Lessons learned:
- The first intuition is often right, even if it's not plausible at
  first.
- Don't use `pkill -f` as root, at least not without careful checking
  and regexp anchoring.
- If a box doesn't react to reset requests, try sending wake-on-lan
  to turn it on.
- runit should reboot by default, not shutdown!

-- 
Leah Neukirchen  http://leah.zone
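
The anchoring lesson can be illustrated without touching any real
process. What follows is only a sketch under stated assumptions: the
three command lines are a made-up sample (the runsvdir line imitates
log text in the process title, reusing the socat error quoted
earlier), the function names are hypothetical, and `grep -E` stands in
for the extended-regex matching that `pgrep -f` applies to full
command lines.

```shell
# Made-up sample of full command lines, one per line. The last entry
# imitates runsvdir carrying a child's error message in its title.
cmdlines='socat TCP4-LISTEN:3722,fork,su=nobody TCP6:hestia.vuxu.org:22
socat TCP4-LISTEN:8080,fork TCP6:example.org:80
runsvdir -P /run/runit/runsvdir/current log: socat[5049] E bind(5, {AF=2 0.0.0.0:3722}, 16): Address already in use'

# Unanchored pattern, as in the incident: it may match anywhere in the
# command line, including inside the log text of runsvdir.
loose_matches() {
    printf '%s\n' "$cmdlines" | grep -cE 'socat.*3722'
}

# Anchored pattern: the command line has to start with "socat " and
# mention the listening port explicitly.
anchored_matches() {
    printf '%s\n' "$cmdlines" | grep -cE '^socat .*LISTEN:3722'
}

echo "unanchored: $(loose_matches)"   # prints "unanchored: 2"
echo "anchored: $(anchored_matches)"  # prints "anchored: 1"
```

On a live system, listing the candidates together with their command
lines first (for example `pgrep -a -f '^socat .*LISTEN:3722'` with
procps-ng), and quoting the pattern so the shell cannot expand the
`*`, would presumably have exposed the stray match before anything
was killed.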