From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jason at zx2c4.com (Jason A. Donenfeld)
Date: Fri, 10 Jan 2014 16:57:02 +0100
Subject: RFE: .so filters
In-Reply-To: <20140110090639.GN7608@serenity.lan>
References: <20140109225802.GM7608@serenity.lan>
 <20140110090639.GN7608@serenity.lan>
Message-ID:

On Fri, Jan 10, 2014 at 10:06 AM, John Keeping wrote:
>
> This seems drastically over complicated.

So here's the situation. There's a lot of "state" that we're taking
advantage of in using processes that terminate, that needs to be
replicated:

*a* Sending arguments to the program, and distinguishing these
    arguments from data [via argv in main]
*b* When we are finished sending data to the filter [via a closed file
    descriptor]
*c* When the filter is finished sending data to cgit [via the filter
    process terminating / waitpid]

If we skimp on any one of these requirements, we introduce either
limited functionality or race conditions. To fully replicate these
required state transitions, we must either:

*1* Use an out of band messaging mechanism, such as unix signals (what
    I've implemented in jd/longfilters, for example)
*2* Use two file descriptors (which then would require the filter to
    select() or similar)
*3* Come up with an encoding scheme that would separate these messages
    from the data (which would then require the client to know about
    it)

I don't really like any of these possibilities. I've implemented *1*
already, and while it works, it's a hassle to implement the signal
handling without races in the filter because of the *b* requirement
above. *2* is even harder to implement in simple scripts, so that's
out. And *3* is a full-blown disaster, which would be so invasive that
we might as well use shared libraries if we're going to use this. So
that's out.

What all of this points to is the fact that persistent filters are not
going to wind up being a general thing available for all filter types.
I'm going to implement specifically email filters using it, and it's
going to have a domain-specific encoding scheme:

* the filter receives the email address on one line
* the filter receives the data to filter on the next line
* the filter then spits out its filtered data on a single line

This specificity is obviously unsuitable for any multiline filtering
or filtering of binary data. But it is simple enough to implement in
scripts that I'm fine with it. It will require these changes:

*a* Allowing persistent filter processes, with proper start-up /
    tear-down times and pipe preservation (already implemented in
    jd/longfilters)
*b* Not dup2()ing the pipe to stdin/stdout, so that the filter close
    function can read from the pipe itself, and block until it
    receives its output (which is a bit of a different way of doing
    things from how we're doing it now)

I'm not too pumped about *b*, but that's the only way unless we're to
use signals or some other OOB mechanism.

I'll code this up and report back.
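
To give a feel for how little a script would need to do under the
line-based scheme described above, here is a rough sketch of the filter
side in Python. Nothing here is cgit code: serve() and rewrite() are
made-up names, the rewrite itself is a placeholder, and reading from
stdin / writing to stdout is an assumption about how the filter would be
wired up. The one part that follows directly from the scheme is the
shape of the loop: two lines in, one line out, flushed immediately so
that the side blocking on the reply is not left waiting.

```python
import io


def rewrite(address, text):
    # Hypothetical transformation, for illustration only: obfuscate the
    # address wherever it appears in the text being filtered.
    return text.replace(address, address.replace("@", " at "))


def serve(inp, out):
    # Persistent loop: one request is two input lines (address, then
    # data); one reply is exactly one output line.
    while True:
        address = inp.readline()
        if not address:
            break  # EOF: the other side closed the pipe, so shut down
        text = inp.readline()
        if not text:
            break  # truncated request; nothing sensible to answer
        reply = rewrite(address.rstrip("\n"), text.rstrip("\n"))
        out.write(reply + "\n")
        out.flush()  # the reader blocks until this line arrives


# Small in-memory demonstration of two back-to-back requests.
demo_in = io.StringIO("a@b.c\nJohn <a@b.c>\nx@y.z\nhello\n")
demo_out = io.StringIO()
serve(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

An actual filter script would presumably call
serve(sys.stdin, sys.stdout) instead of the in-memory demo. Because a
reply is a single line, the filtered data cannot contain a newline,
which is the multiline limitation mentioned above.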