From: Diego Augusto Molina
Date: Sat, 16 Feb 2019 02:15:07 +0000
Subject: Re: epona.vuxu.org outage post mortem
To: Leah Neukirchen
Cc: leahutils@inbox.vuxu.org, voidlinux@googlegroups.com
On 2/14/19, Leah Neukirchen wrote:
>
> At 00:20 CET tonight (2019-02-14) epona.vuxu.org, a virtual machine
> host that runs, among others, git.vuxu.org/inbox.vuxu.org and
> hestia.vuxu.org, the aarch64 builder for Void Linux, went down and
> only came up again around 11:00 CET.
>
> This was entirely my fault, but how it happened is interesting:
>
> I was informed that a user-space port forward was not working. It was
> implemented using socat, supervised by runit (the init system of Void
> Linux):
>
>     socat TCP4-LISTEN:3722,fork,su=nobody TCP6:hestia.vuxu.org:22
>
> However, starting it showed that the address was already in use:
>
>     2019/02/14 00:20:44 socat[5049] E bind(5, {AF=2 0.0.0.0:3722}, 16): Address already in use
>
> My assumption was that a runaway instance of socat was running (for
> unknown reasons), so I decided to kill all socat instances. My usual
> tool of choice would have been `killall socat`, but as there were other
> socat instances running on the machine, I only wanted to kill the
> port 3722 ones.
>
> A quick test with `pgrep` showed a plausible list of PIDs, so I ran
>
>     kill $(pgrep -f socat.*3722)
>
> which seemed to work fine at first.
>
> Several seconds later I was greeted with this message:
>
>     Connection to epona.vuxu.org closed by remote host.
>     Connection to epona.vuxu.org closed.
>
> And the box didn't answer pings anymore...
>
> As an experienced SSH user, I took this to mean that the host had shut
> down in some controlled way; otherwise I would have gotten a
> `broken pipe` message.
>
> But how could killing a few socat processes shut down the machine? I
> could not come up with any plausible theory, but it was already late
> and there was alcohol involved as well. So I decided to leave it until
> the morning.

"Oh, no! How could someone possibly drink and sysadmin!" Said no other
sysadmin ever.
>
> In the morning, the box was still not up and there was no evidence of
> a network issue or anything else. I went into the Hetzner Control
> Panel and triggered an "automated reset". Nothing changed; the box
> still didn't answer pings. I tried to activate the "vKVM rescue
> system", to no avail.
>
> At this point I actually suspected a hardware issue, and I requested
> a "manual reset", which means someone has to get up, walk to the
> machine, restart it, and watch for a bit whether it seems to boot
> properly.
>
> Of course, the true reason was much simpler: the box was powered off.
>
> Unfortunately, nothing in the Hetzner Control Panel shows you this
> simple fact, so I guess I'm not the only one to send poor support
> folks to go boot other people's machines.
>
> The box booted fine and all services were restored within minutes.
>
> The remaining question is how the command could shut down the
> machine, and it is easy to answer, too: `runsvdir`, the main runit
> process that controls "stage 2" (i.e. while the system is up),
> displays the error messages of all its direct child processes in its
> own `argv`, so you can check for unlogged messages with `ps`:
>
>     runsvdir -P /run/runit/runsvdir/current log: ....logs here....
>
> Unfortunately, in the above situation this resulted in both "socat"
> and "3722" appearing in the error messages, and thus in the process
> title, which made `pkill -f` match it and, as commanded, kill
> `runsvdir`. That ends stage 2, and runit then performs an orderly
> shutdown of the system. Duh.
>
> Lessons learned:
> - The first intuition is often right, even if it doesn't seem
>   plausible at first.
> - Don't use `pkill -f` as root, at least not without careful checking
>   and regexp anchoring.
> - If a box doesn't react to reset requests, try sending a wake-on-LAN
>   packet to turn it on.
> - runit should reboot by default, not shut down!
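To make the anchoring lesson concrete, here is a small sketch that kills nothing. It feeds a few hypothetical command lines, modeled on the ones in this post (the exact runsvdir title text is an assumption), to `grep -E`, which uses the same extended regular expression syntax as `pgrep`. The unanchored `socat.*3722` also hits the runsvdir line, while `^socat .*3722` only matches a process whose command line starts with `socat`.

```shell
#!/bin/sh
# Hypothetical command lines modeled on the post, one per line, in the form
# `pgrep -f` matches against: the full argument list joined by spaces.
# The runsvdir line stands in for its process title after socat's bind
# error was written into it.
cmdlines='socat TCP4-LISTEN:3722,fork,su=nobody TCP6:hestia.vuxu.org:22
socat TCP4-LISTEN:8080,fork TCP4:localhost:80
runsvdir -P /run/runit/runsvdir/current log: socat[5049] E bind(5, {AF=2 0.0.0.0:3722}, 16): Address already in use'

# Count how many of those lines an extended regex matches,
# i.e. how many processes `pgrep -f PATTERN` would have selected.
count_matches() {
    printf '%s\n' "$cmdlines" | grep -cE "$1"
}

echo "unanchored 'socat.*3722':  $(count_matches 'socat.*3722') match(es)"
echo "anchored   '^socat .*3722': $(count_matches '^socat .*3722') match(es)"
```

The script reports 2 matches for the unanchored pattern (the real socat and runsvdir) and 1 for the anchored one. On a live system, `pgrep -a -f '^socat .*3722'` previews the full command lines before anything is handed to `pkill -f`; quoting the pattern also stops the shell from glob-expanding `socat.*3722` against file names in the current directory. The anchor assumes the service starts the program as plain `socat` rather than via a full path such as `/usr/bin/socat`.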
>
> --
> Leah Neukirchen http://leah.zone
>

Here's my suggestion:

    # ss -nlpt | grep 3722

That should list the instance of socat listening on TCP port 3722,
together with the PID that holds the socket (i.e., the socat process
that opened the port).

Killing that PID blindly might not always do the trick (e.g. if it runs
in a loop such as "while true; do socat ...; sleep 1; done", a new
instance simply takes its place), so you may want to kill its parents
or children too. With that PID in mind, use "ps faux" to navigate
through the process tree. My way is:

    # ps faux | grep -vF \[ | less -SRI

The grep removes kernel threads (shown in [brackets]), which would
otherwise drown the output.

Bye.
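A minimal sketch of that workflow for Linux, assuming iproute2's `ss` and reading `/proc` directly instead of parsing `ps faux`. The function names `pids_from_ss` and `ancestry` are made up for illustration: the first pulls the `pid=` fields out of `ss -p` output, the second walks the `PPid:` links from a PID up to PID 1, so you can see whether the listener sits under a shell loop, `runsv`, or `runsvdir` before deciding what to kill.

```shell
#!/bin/sh
# Linux-only sketch: reads /proc/<pid>/status, so `ps` is not required.

# Extract the owning PIDs from `ss -p` output on stdin, which lists them
# in the form users:(("socat",pid=5049,fd=5)).
pids_from_ss() {
    grep -o 'pid=[0-9][0-9]*' | cut -d= -f2 | sort -u
}

# Print "PID NAME" for a process and each of its ancestors, stopping at
# PPid 0 (the parent of PID 1, or a parent outside this PID namespace).
ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        name=$(awk '$1 == "Name:" { print $2 }' "/proc/$pid/status")
        printf '%s %s\n' "$pid" "$name"
        pid=$(awk '$1 == "PPid:" { print $2 }' "/proc/$pid/status")
    done
}

# Live usage (as root, to see other users' sockets):
#   for pid in $(ss -nlpt 'sport = :3722' | pids_from_ss); do ancestry "$pid"; done
```

The `sport = :3722` filter lets `ss` match the local port exactly, instead of grepping for a string that could also appear in an address or another port. Children of a PID can be listed with `pgrep -P PID`. If the chain runs through `runsv`, stopping the service with `sv down` is usually cleaner than killing anything by pattern.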