From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-qk0-f177.google.com (mail-qk0-f177.google.com [209.85.220.177]) by fantadrom.bsd.lv (OpenSMTPD) with ESMTP id 5e2b0b64 for ; Sun, 5 Feb 2017 10:23:27 -0500 (EST) Received: by mail-qk0-f177.google.com with SMTP id s140so34193617qke.0 for ; Sun, 05 Feb 2017 07:23:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=i3wm-org.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to; bh=lx7545y7fdpX4m/pkAcuu8/yPuQzEHm2QoVhUKwOsl8=; b=adRGULfvBkLqTphI43F3mdRP1H9dSJquFpixN5GNVF9Z6WNRRL2zsLcV3cyulWKG+x 5jPxJGsLmvCdqd7sgtONl9zfWmHMCs0znt0+Gih2O8/PzPLc5v5Vw/aehBPkfqYw1l+v Wwe9OUS331XJh6BiXq0zgJNfT0vqi4YkhV84KG0RrG+hcL9ob+ZWaQDiajQXivLKy7dy qvP3ShBpt5mKZmHp5+Go2h5bqEwaeIIjLicKh3LDNiPbrOyYzRnGxUmIx4rfmb0ElFez MoimL/sMmAOlAXvjfGz3ollwhSYpRqUOkZtjky8voDs1skuuP6NhWZ4EzSkgQ0CQV82n jfPQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to; bh=lx7545y7fdpX4m/pkAcuu8/yPuQzEHm2QoVhUKwOsl8=; b=C8nWwYd+wyBqpaiqMKAIpQVTTkmgiZjgbKv9KiF2w3+F+9ncYSs+/HpS9XZ1oDEvsJ L+kofhZYdHEn7V2Lht5XYw531MughVddI8tlsR08Gzg5Jw0ws40sRnEzmrono02MH0QC AT7zYn+3gImvqjqDWibcS78aAnZwrk/QOi+3TvVBgEwLO+Gux372SVDD2FMH1eU23CwS Agf9uWQklWzhmFEKB6jvAD9tx8lzsrQMmV3i6Gg0lb7T2imUQpe6QujYji/YzVb6n5k1 aFfbEO5xGdy4HhRU4BoOMjndq8I7dINOqYQpf3jeeUxrdoBANX2zIyJNaGfBuF9BMEgS B69Q== X-Gm-Message-State: AMke39kAL3+RfeELlrTcejMpwWK+Y+tIps0mByR8pqAium+3VFhQrMzLQsWUfnY7Yu1xBnkkahtbUSb75EyzDg== X-Received: by 10.55.164.77 with SMTP id n74mr5482839qke.322.1486308206025; Sun, 05 Feb 2017 07:23:26 -0800 (PST) X-Mailinglist: mdocml-tech Reply-To: tech@mdocml.bsd.lv MIME-Version: 1.0 Sender: michael@i3wm.org Received: by 10.55.98.83 with HTTP; Sun, 5 Feb 2017 07:23:05 -0800 (PST) X-Originating-IP: [2a02:168:4a00:0:225:90ff:fe5d:53a2] In-Reply-To: <20170204125755.GB12010@athene.usta.de> References: <20170204125755.GB12010@athene.usta.de> From: Michael Stapelberg Date: Sun, 5 Feb 2017 16:23:05 +0100 X-Google-Sender-Auth: tF0Tbx_cFlq3gunyoVWOkdGN8Fk Message-ID: Subject: Re: [PATCH] Implement -u (UNIX socket batch processing) To: tech@mdocml.bsd.lv Content-Type: multipart/mixed; boundary=94eb2c06bb2ea5fbf50547ca186c --94eb2c06bb2ea5fbf50547ca186c Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Thanks for the thoughtful response, replies inline: On Sat, Feb 4, 2017 at 1:57 PM, Ingo Schwarze wrote: > Hi Michael, > > On Sat, Jan 21, 2017 at 1:29 PM, Michael Stapelberg wrote: >> Quoting from the commit message for your convenience: > >>> The option argument identifies the file descriptor number of a UNIX >>> listening socket which was inherited from the parent process when said >>> parent exec'd mandoc. >>> >>> mandoc will accept one connection at a time on that socket. As long as >>> the connection lives, mandoc reads 1-byte messages with ancillary data >>> (control information) from the connection. The ancillary data is used t= o >>> pass three file descriptors from the parent process to mandoc. These >>> file descriptors are then dup2'ed to fd 0, fd 1 and fd 2, respectively. >>> >>> After the file descriptors have been set up, a single manpage is read >>> from stdin and processed to stdout/stderr. >>> >>> This change effectively adds a batch processing mode to mandoc, which >>> eliminates the fork+exec overhead that otherwise accompanies the >>> conversion of a large number of manpages. The resulting speed-up has >>> been observed to exceed a factor of 6, reducing the total conversion >>> wall-clock time of about a 470k manpages from more than 2 hours to >>> a mere 22 minutes. > >> To expand on the rationale: quicker conversion >> >> 1. allows us to provide our users more recent manpages (we can run >> more frequently) >> 2. hogs fewer resources which can instead be used for actually serving >> manpages >> 3. massively simplifies our disaster recovery story: we can just >> re-generate the entire repository within a small number of hours, >> instead of facing days of downtime > > I think i understand why you need this. I repeatedly thought about > it and came to the conclusion that there is little to improve with > the basic concept. In particular, opening the file descriptors, > in particular those for writing, in the controlling process is the > right thing to do. We do not want the same process that is doing > the parsing and formatting to open files for writing. That would > be unsafe in case of programming errors, and parsers are notoriously > buggy. So your original idea of passing the descriptors is clearly > better than my first idea of passing filenames instead. > >> As per http://man.openbsd.org/OpenBSD-current/man3/CMSG_DATA.3, >> control information has been around since 4.2BSD, so support across >> operating systems should be good. > > Yes, but i don't like the idea of clobbering that into main.c. > At some point, i will have to do privilege separation in mandoc. > That will almost certainly use the OpenBSD imsg library, which > internally uses cmsg. It will not be a protability problem: imsg > is very small and can easily be bundled for systems not having it, > exactly like ohash. > > But if i would commit your code now, it would conflict later, using > cmsg on two different levels. Besides, even though your code is > required for your purpose, and even though it may also help speeding > up bulk comparisons that's i'm sometimes doing to catch regressions, > it is not useful for the general public, so it does not belong into > main.c. Just like the code in cgi.c, only a handful of server > administrators will need it. > > Besides, for the daemon you need, main.c is mostly the wrong codebase. > It contains huge amounts of code that, in your server, are just in > the way: many command line options for interactive use, makewhatis(8) > integration, database and search functionality, pager and process > group control, file system lookup, preformatted passthrough... > > So your code should be in a separate program. I have called it > mandocd.c. It now has 270 lines (compare to the 1108 in main.c), > your read_fds() is the largest non-main function, and even in main(), > the meat of the code is what you wrote, plus relatively small glue. That looks good, thanks for merging the patch! > > > There was one major problem with what you wrote. You used abstract > sockets, and that is non-portable and Linux only. On other systems, > you cannot call bind(2) with a name starting in '\0'. To be portable, > you have to create the socket with socketpair(2) and without bind(2), > listen(2), and accept(2). The fact that you cannot reconnect when > you lose the connection may seem like a downside at first, but i don't > think it's really any inconvenience. If the controlling process > wants to reconnect for whatever reason, it can just as well spawn > a new mandocd. I'm not even sure that your client code contained > support for reconnecting, even though your server side supported it. Agreed. I added reconnects in the early stages of development of this patch, when I was still using a named socket. With the feature in its current form, I agree that we do not need reconnects. > > If you absolutely need reconnection capability, we have to use a > named socket in the file system. But i'd rather not do that if we > can avoid it. It just causes security grief because we would have > to protect the socket against unauthorized access. > > I committed new files mandocd.c and catman.c to bsd.lv such that > we can work on them in the tree. catman.c is an improved version > of your demo.c that actually does something useful: Try making > directories src/ and dst/, put some mdoc(7) and/or man(7) manual > pages into src/, and run > > ./catman src dst > > or > > ./catman -T html src dst > > Formatted versions appear in dst/, with the same relative file > names. Of course, this is not polished. For example, if you use > any subdirs in src/, you have to create the same subdirs in dst/ > by hand - i'll probably fix that, and other quirks, later. > > You didn't have a license on demo.c, but i assume you intended to > license it under the same license as your patch to main.c. Yes, you are correct. > Please review the Copyright/license headers in the two files > and protest if i misunderstood what you intended. > > > The important questions right now are: > > * Does the concept of a stand-alone file mandocd.c > and a "mandocd" executable separate from "mandoc" > work for you? Absolutely. > > * Does using socketpair(2) instead of socket(2), bind(2), listen(2), > accept(2) work for you? Yes. I went ahead and changed debiman to use this interface (and I realize you still marked it as experimental, so I=E2=80=99ll gladly update when necessary): https://github.com/Debian/debiman/commit/3715b1eaf9c1793b9a8c7b= 1787e2d6511ca2b004 > > Of course, any other comments on the committed code are also > welcome. I added a tiny patch which includes stdint.h which is required to find uint8_t on Linux (fixes compilation of mandocd). > > Right now, i don't plan to merge this back to VERSION_1_13. > You don't need SQLite3 support anyway, so you should be fine > switching to mandoc-1.14.1 when it comes out. Yep, sounds good. --=20 Best regards, Michael --94eb2c06bb2ea5fbf50547ca186c Content-Type: text/x-patch; charset=US-ASCII; name="0001-mandocd-add-stdint.h-to-fix-compilation-on-Linux.patch" Content-Disposition: attachment; filename="0001-mandocd-add-stdint.h-to-fix-compilation-on-Linux.patch" Content-Transfer-Encoding: base64 X-Attachment-Id: f_iystg1y50 RnJvbSBlZDgwYmE0NGVlZGQwNTQzZjlkYTliMWUyMzEyMDQyNjIxMjFmYmQ4IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBNaWNoYWVsIFN0YXBlbGJlcmcgPHN0YXBlbGJlcmdAZGViaWFu Lm9yZz4KRGF0ZTogU3VuLCA1IEZlYiAyMDE3IDE1OjM5OjQ2ICswMTAwClN1YmplY3Q6IFtQQVRD SF0gbWFuZG9jZDogYWRkIHN0ZGludC5oIHRvIGZpeCBjb21waWxhdGlvbiBvbiBMaW51eAoKLS0t CiBtYW5kb2NkLmMgfCAxICsKIDEgZmlsZSBjaGFuZ2VkLCAxIGluc2VydGlvbigrKQoKZGlmZiAt LWdpdCBhL21hbmRvY2QuYyBiL21hbmRvY2QuYwppbmRleCBmY2UyODEyLi40MDZlOTkxIDEwMDY0 NAotLS0gYS9tYW5kb2NkLmMKKysrIGIvbWFuZG9jZC5jCkBAIC0yOCw2ICsyOCw3IEBACiAjaW5j bHVkZSA8c3RkbGliLmg+CiAjaW5jbHVkZSA8c3RyaW5nLmg+CiAjaW5jbHVkZSA8dW5pc3RkLmg+ CisjaW5jbHVkZSA8c3RkaW50Lmg+CiAKICNpbmNsdWRlICJtYW5kb2MuaCIKICNpbmNsdWRlICJy b2ZmLmgiCi0tIAoyLjExLjAKCg== --94eb2c06bb2ea5fbf50547ca186c-- -- To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv