List for cgit developers and users
From: Jason at zx2c4.com (Jason A. Donenfeld)
Subject: RFE: .so filters
Date: Fri, 10 Jan 2014 16:57:02 +0100
Message-ID: <CAHmME9rgvZJ_r6B6dPLx_zC6z05dx_TS4PJi5rwA5NNYtH=1FA@mail.gmail.com>
In-Reply-To: <20140110090639.GN7608@serenity.lan>

On Fri, Jan 10, 2014 at 10:06 AM, John Keeping <john at keeping.me.uk> wrote:
>
> This seems drastically over complicated.

So here's the situation. There's a lot of "state" that we currently get
for free by using filter processes that terminate, and a persistent
filter would need to replicate all of it:

  *a* Sending arguments to the program, and distinguishing these
arguments from data [via argv in main]
  *b* Signaling when we have finished sending data to the filter [via
closing the file descriptor]
  *c* Knowing when the filter has finished sending data to cgit [via the
filter process terminating / waitpid]
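For reference, here is roughly how a one-shot filter gets all three of these for free. This is a minimal POSIX sketch, not cgit's actual filter code: the name run_oneshot_filter is hypothetical, and unlike cgit it captures the filter's output into a buffer instead of letting the filter write straight to cgit's stdout.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with argv, feed it `input`, and capture
 * its stdout into `out` (truncated to outsz - 1 bytes, NUL-terminated;
 * outsz must be at least 1). Returns the filter's exit status, or -1.
 * Writing all input before reading output is only safe for small inputs;
 * large input plus large output could deadlock on full pipe buffers.
 */
static int run_oneshot_filter(char *const argv[], const char *input,
                              char *out, size_t outsz)
{
    int in[2], outp[2];
    if (pipe(in) < 0)
        return -1;
    if (pipe(outp) < 0) {
        close(in[0]); close(in[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(in[0]); close(in[1]); close(outp[0]); close(outp[1]);
        return -1;
    }
    if (pid == 0) {
        /* (a) arguments reach the filter via argv, separate from the data. */
        dup2(in[0], STDIN_FILENO);
        dup2(outp[1], STDOUT_FILENO);
        close(in[0]); close(in[1]); close(outp[0]); close(outp[1]);
        execvp(argv[0], argv);
        _exit(127);
    }
    close(in[0]);
    close(outp[1]);

    /* (b) send the data, then close the pipe: the filter sees EOF. */
    size_t len = strlen(input), off = 0;
    while (off < len) {
        ssize_t w = write(in[1], input + off, len - off);
        if (w <= 0)
            break;
        off += (size_t)w;
    }
    close(in[1]);

    /* Drain the filter's output until EOF, keeping what fits. */
    char buf[256];
    size_t got = 0;
    ssize_t n;
    while ((n = read(outp[0], buf, sizeof buf)) > 0) {
        size_t take = (size_t)n;
        if (take > outsz - 1 - got)
            take = outsz - 1 - got;
        memcpy(out + got, buf, take);
        got += take;
    }
    out[got] = '\0';
    close(outp[0]);

    /* (c) the filter is done once the process has terminated. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Each of *a*, *b* and *c* maps onto a kernel-level event (argv, EOF, process exit), which is why none of them needs any cooperation from the filter script.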

If we skimp on any one of these requirements, we introduce either
limited functionality or race conditions. To fully replicate these
state transitions, we must either:

  *1* Use an out-of-band messaging mechanism, such as Unix signals
(what I've implemented in jd/longfilters, for example)
  *2* Use two file descriptors (which would then require the filter to
select() or similar)
  *3* Come up with an encoding scheme that separates these messages
from the data (which would then require every filter to understand
it)
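To make *3* concrete, here is what even a very simple framing would look like, using D. J. Bernstein's netstring format (`<length>:<bytes>,`) purely as an illustrative choice; nothing in this thread proposes netstrings for cgit. The function names are hypothetical. Every filter, including small shell scripts, would need an equivalent of netstring_decode before it could touch the data.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Encode `data` (len bytes) as a netstring into `out`.
 * Returns the number of bytes written, or 0 if `out` is too small. */
static size_t netstring_encode(const char *data, size_t len,
                               char *out, size_t outsz)
{
    int hdr = snprintf(out, outsz, "%zu:", len);
    if (hdr < 0 || (size_t)hdr + len + 1 > outsz)
        return 0;
    memcpy(out + hdr, data, len);
    out[(size_t)hdr + len] = ',';
    return (size_t)hdr + len + 1;
}

/* Parse one netstring from `in` (inlen bytes). On success, point *payload
 * at the data, set *plen, and return the bytes consumed; return 0 if the
 * input is malformed or incomplete. */
static size_t netstring_decode(const char *in, size_t inlen,
                               const char **payload, size_t *plen)
{
    size_t i = 0, len = 0;
    if (inlen == 0 || in[0] < '0' || in[0] > '9')
        return 0;
    while (i < inlen && in[i] >= '0' && in[i] <= '9') {
        if (len > (SIZE_MAX - 9) / 10)
            return 0; /* length would overflow */
        len = len * 10 + (size_t)(in[i] - '0');
        i++;
    }
    if (i >= inlen || in[i] != ':')
        return 0;
    i++;
    if (inlen - i < len + 1 || in[i + len] != ',')
        return 0;
    *payload = in + i;
    *plen = len;
    return i + len + 1;
}
```

The payload may contain commas, newlines or NUL bytes; only the length prefix says where it ends, which is exactly why every reader has to implement the parser.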

I don't really like any of these possibilities. I've implemented *1*
already, and while it works, it's a hassle to handle the signals in
the filter without races because of the *b* requirement above. *2* is
even harder to implement in simple scripts, so that's out. And *3* is a
full-blown disaster: it would be so invasive that we might as well use
shared libraries if we went that far. So that's out too.


What all of this points to is that persistent filters are not going
to become a general mechanism available for all filter types. I'm
going to use them specifically for email filters, with a
domain-specific encoding scheme:

  * the filter receives the email address on one line
  * the filter receives the data to filter on the next line
  * the filter then spits out its filtered data on a single line
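A filter speaking this protocol is just a loop. In the minimal sketch below, serve_email_filter is a hypothetical name and spelling '@' as " at " is only a placeholder transformation. The one thing that is easy to get wrong is flushing after every response line, since the other end is blocked waiting for it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Remove one trailing newline in place, if present. */
static void chomp(char *s)
{
    size_t n = strlen(s);
    if (n > 0 && s[n - 1] == '\n')
        s[n - 1] = '\0';
}

/*
 * Hypothetical persistent email filter: for each request, read the email
 * line and the data line, then write exactly one line back. The transform
 * (spelling '@' as " at ") is a placeholder. Returns the number of requests
 * served once the input reaches EOF.
 */
static int serve_email_filter(FILE *in, FILE *out)
{
    char *email = NULL, *data = NULL;
    size_t ecap = 0, dcap = 0;
    int served = 0;

    while (getline(&email, &ecap, in) > 0 && getline(&data, &dcap, in) > 0) {
        chomp(email); /* unused by this placeholder transform */
        chomp(data);
        for (const char *p = data; *p != '\0'; p++) {
            if (*p == '@')
                fputs(" at ", out);
            else
                fputc(*p, out);
        }
        fputc('\n', out);
        fflush(out); /* the caller blocks until this line arrives */
        served++;
    }
    free(email);
    free(data);
    return served;
}
```

A real filter would simply call serve_email_filter(stdin, stdout) from main() and exit when cgit closes the pipe.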

This scheme is obviously unsuitable for multiline or binary data. But
it is simple enough to implement in scripts that I'm fine with it.

It will require these changes:

  *a* Allowing persistent filter processes, with proper startup and
teardown and preservation of the pipes (already implemented in
jd/longfilters)
  *b* Not dup2()ing the pipe onto stdin/stdout, so that the filter close
function can read from the pipe itself and block until it receives the
filter's output (which is a bit different from how we do things now)
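On the cgit side, *a* and *b* together could look roughly like this minimal POSIX sketch. All names (struct persistent_filter, pf_start, pf_request, pf_stop) are hypothetical and not taken from jd/longfilters: the filter is started once, the parent keeps both pipe ends as FILE handles instead of dup2()ing them over its stdout, and each request blocks in fgets() until the single response line arrives.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handle for one long-lived filter process. */
struct persistent_filter {
    pid_t pid;
    FILE *to;   /* requests: email line, then data line */
    FILE *from; /* responses: exactly one line per request */
};

/* Start the filter once. The parent keeps both pipe ends; its own
 * stdout is never dup2()ed onto the pipe. */
static int pf_start(struct persistent_filter *f, char *const argv[])
{
    int req[2], resp[2];
    if (pipe(req) < 0)
        return -1;
    if (pipe(resp) < 0) {
        close(req[0]); close(req[1]);
        return -1;
    }
    f->pid = fork();
    if (f->pid < 0) {
        close(req[0]); close(req[1]); close(resp[0]); close(resp[1]);
        return -1;
    }
    if (f->pid == 0) {
        dup2(req[0], STDIN_FILENO);
        dup2(resp[1], STDOUT_FILENO);
        close(req[0]); close(req[1]); close(resp[0]); close(resp[1]);
        execvp(argv[0], argv);
        _exit(127);
    }
    close(req[0]);
    close(resp[1]);
    f->to = fdopen(req[1], "w");
    f->from = fdopen(resp[0], "r");
    return (f->to && f->from) ? 0 : -1;
}

/* Send one request and block until its response line arrives.
 * Assumes email and data contain no newlines and the response fits
 * in outsz - 1 bytes; neither case is handled in this sketch. */
static int pf_request(struct persistent_filter *f, const char *email,
                      const char *data, char *out, size_t outsz)
{
    if (fprintf(f->to, "%s\n%s\n", email, data) < 0 || fflush(f->to) != 0)
        return -1;
    if (!fgets(out, (int)outsz, f->from))
        return -1;
    out[strcspn(out, "\n")] = '\0';
    return 0;
}

/* Tear down: EOF on the request pipe ends the filter's loop. */
static int pf_stop(struct persistent_filter *f)
{
    int status;
    fclose(f->to);
    if (waitpid(f->pid, &status, 0) < 0) {
        fclose(f->from);
        return -1;
    }
    fclose(f->from);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Teardown is the only place where EOF and waitpid() are still used; during the filter's lifetime, "one line in, one line out" replaces both.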

I'm not too pumped about *b*, but it's the only way short of using
signals or some other out-of-band mechanism. I'll code this up and
report back.

