List for cgit developers and users
* RFE: .so filters
@ 2014-01-09 21:34 Jason
  2014-01-09 22:29 ` mailings
  2014-01-09 22:58 ` john
  0 siblings, 2 replies; 21+ messages in thread
From: Jason @ 2014-01-09 21:34 UTC (permalink / raw)


Hey folks,

I'm thinking about this filtering situation w.r.t. gravatar and
potentially running multiple filters on one page. Something I've been
considering is implementing a simple dlopen() mechanism for filters,
if the filter filename starts with "soname:" or "lib:" or similar, so
as to avoid the fork()ing and exec()ing we currently have, for high
frequency filters. The idea is that first use of the filter would be
dlopen()'d, but wouldn't be dlclose()'d until the end of the
processing. This way the same function could be used over and over
again without significant penalty.
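
To make that concrete, here's a rough sketch of the sort of thing I
have in mind; none of these names exist in cgit, it's just to show the
handle being cached across uses:

#include <dlfcn.h>
#include <stdio.h>

typedef int (*filter_fn)(int argc, char **argv);

/* kept open across every use of the filter during this request */
static void *filter_handle;

static int run_so_filter(const char *path, int argc, char **argv)
{
    filter_fn fn;

    if (!filter_handle) {
        filter_handle = dlopen(path, RTLD_NOW);
        if (!filter_handle) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return -1;
        }
    }
    fn = (filter_fn)dlsym(filter_handle, "filter_run");
    if (!fn) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        return -1;
    }
    return fn(argc, argv);
}

/* called once, after the whole page has been generated */
static void unload_so_filters(void)
{
    if (filter_handle) {
        dlclose(filter_handle);
        filter_handle = NULL;
    }
}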

In my first thinking of this, the method of action would be the same
as the current system -- "int filter_run(int argc, char *argv[])" is
dlopen()'d, executed, and it reads and writes to the dup2()'d file
descriptor. Unfortunately, the piping in this introduces a cost that
I'd rather avoid. In the case of gravatar (or more generally, email
author filters), we'd be better off with a "char *filter_run(int argc,
char *argv[])", that can just return the string that the html
functions will then print. This, however, breaks the current filtering
paradigm, and might not be ideal for filters that enjoy a stream of
data (such as source code filters). This distinction more or less
points toward coming up with a library API of sorts, but I really
really really don't want to add a full fledged plugin system. So this
has me leaning toward the simpler first idea.
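
In other words, the two candidate entry points would be something like
the following (hypothetical, shown side by side just to contrast them):

/* stream style: same contract as today's exec'd filters;
 * read from stdin, write to the dup2()'d stdout */
int filter_run(int argc, char *argv[]);

/* string style: handier for tiny email/author filters;
 * just return the HTML for cgit to print */
char *filter_run(int argc, char *argv[]);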

But I'm undecided at the moment. Comments and suggestions are most welcome.

Jason



* RFE: .so filters
  2014-01-09 21:34 RFE: .so filters Jason
@ 2014-01-09 22:29 ` mailings
  2014-01-09 22:58 ` john
  1 sibling, 0 replies; 21+ messages in thread
From: mailings @ 2014-01-09 22:29 UTC (permalink / raw)




On 09/01/14 22:34, Jason A. Donenfeld wrote:
> Hey folks,
>
> I'm thinking about this filtering situation w.r.t. gravatar and
> potentially running multiple filters on one page. Something I've been
> considering is implementing a simple dlopen() mechanism for filters,
> if the filter filename starts with "soname:" or "lib:" or similar, so
> as to avoid the fork()ing and exec()ing we currently have, for high
> frequency filters. The idea is that first use of the filter would be
> dlopen()'d, but wouldn't be dlclose()'d until the end of the
> processing. This way the same function could be used over and over
> again without significant penalty.

In olsrd we have something similar, but there we load the plugins when 
they are specified in the configuration file.

http://olsr.org/git/?p=olsrd.git;a=summary

>
> In my first thinking of this, the method of action would be the same
> as the current system -- "int filter_run(int argc, char *argv[])" is
> dlopen()'d, executed, and it reads and writes to the dup2()'d file
> descriptor. Unfortunately, the piping in this introduces a cost that
> I'd rather avoid. In the case of gravatar (or more generally, email
> author filters), we'd be better off with a "char *filter_run(int argc,
> char *argv[])", that can just return the string that the html
> functions will then print. This, however, breaks the current filtering
> paradigm, and might not be ideal for filters that enjoy a stream of
> data (such as source code filters). This distinction more or less
> points toward coming up with a library API of sorts, but I really
> really really don't want to add a full fledged plugin system. So this
> has me leaning toward the simpler first idea.
>
> But I'm undecided at the moment. Comments and suggestions are most welcome.
>
> Jason

-- 
Ferry Huberts



* RFE: .so filters
  2014-01-09 21:34 RFE: .so filters Jason
  2014-01-09 22:29 ` mailings
@ 2014-01-09 22:58 ` john
  2014-01-10  1:41   ` Jason
  1 sibling, 1 reply; 21+ messages in thread
From: john @ 2014-01-09 22:58 UTC (permalink / raw)


On Thu, Jan 09, 2014 at 10:34:26PM +0100, Jason A. Donenfeld wrote:
> I'm thinking about this filtering situation w.r.t. gravatar and
> potentially running multiple filters on one page. Something I've been
> considering is implementing a simple dlopen() mechanism for filters,
> if the filter filename starts with "soname:" or "lib:" or similar, so
> as to avoid the fork()ing and exec()ing we currently have, for high
> frequency filters. The idea is that first use of the filter would be
> dlopen()'d, but wouldn't be dlclose()'d until the end of the
> processing. This way the same function could be used over and over
> again without significant penalty.
> 
> In my first thinking of this, the method of action would be the same
> as the current system -- "int filter_run(int argc, char *argv[])" is
> dlopen()'d, executed, and it reads and writes to the dup2()'d file
> descriptor. Unfortunately, the piping in this introduces a cost that
> I'd rather avoid. In the case of gravatar (or more generally, email
> author filters), we'd be better off with a "char *filter_run(int argc,
> char *argv[])", that can just return the string that the html
> functions will then print. This, however, breaks the current filtering
> paradigm, and might not be ideal for filters that enjoy a stream of
> data (such as source code filters). This distinction more or less
> points toward coming up with a library API of sorts, but I really
> really really don't want to add a full fledged plugin system. So this
> has me leaning toward the simpler first idea.
> 
> But I'm undecided at the moment. Comments and suggestions are most
> welcome.

That interface doesn't really match the way the current filters work.
Currently when we open a filter we replace cgit's stdout with a pipe
into the filter process, so none of the existing CGit code will work
with this interface.  We could swap out write with a function pointer
into the filter, but I don't think we guarantee that all of the data is
written in one go which makes life harder for filter writers (although
for simple cases like author info we probably could guarantee to write
it all at once).
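
To spell out what I mean by swapping out write, something along these
lines (entirely hypothetical, nothing like it exists in cgit today):

#include <unistd.h>

/* html.c would write through a pointer instead of calling write(2)
 * directly, and opening an .so filter would repoint it */
typedef ssize_t (*html_write_fn)(const char *buf, size_t len);

static ssize_t html_write_raw(const char *buf, size_t len)
{
    return write(STDOUT_FILENO, buf, len);
}

static html_write_fn html_write = html_write_raw;

static void install_so_filter(html_write_fn filter_write)
{
    /* NB: callers may emit a line in several chunks, which is the
     * "not all in one go" problem for filter authors mentioned above */
    html_write = filter_write;
}

static void uninstall_so_filter(void)
{
    html_write = html_write_raw;
}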

If we allow filters to act incrementally, then we can just leave the
filter running and swap it in or out when required.  That would require
a single dup2 to make it work the same way that the filters currently
work.  Interestingly, there is an "htmlfd" variable in html.c but it is
never changed from STDOUT_FILENO; I wonder whether that could be used,
or whether there are other places (possibly in libgit.a code) that just
use stdout, in which case we should remove that variable.  But there is
the problem of terminating the response; Lukas' suggestion of using NUL
for that may be the best; it's not that hard to printf '\0' in shell.

OTOH, in the particular case of author details the input is more clearly
defined than for the items we currently provide filters for, so maybe it
could use a different interface.

One final point (although I don't think you're suggesting this) is that
we shouldn't require shared objects; I think scripts using stdin+stdout
are a much simpler interface and provide a much lower barrier to entry,
not least because the range of languages that can be used to implement
the filters is so much greater.



* RFE: .so filters
  2014-01-09 22:58 ` john
@ 2014-01-10  1:41   ` Jason
  2014-01-10  2:11     ` Jason
  0 siblings, 1 reply; 21+ messages in thread
From: Jason @ 2014-01-10  1:41 UTC (permalink / raw)


On Thu, Jan 9, 2014 at 11:58 PM, John Keeping <john at keeping.me.uk> wrote:
>
> That interface doesn't really match the way the current filters work.

Yes, hence the post.

>  We could swap out write with a function pointer
> into the filter, but I don't think we guarantee that all of the data is
> written in one go which makes life harder for filter writers (although
> for simple cases like author info we probably could guarantee to write
> it all at once).

I'm actually contemplating doing just this; it would be the only sane
way to reap the benefits of shared objects. Stdio will ensure that the
writes occur once per line, since stdout has line buffering enabled.
This is reasonable enough behavior.

> If we allow filters to act incrementally, then we can just leave the
> filter running and swap it in or out when required.  That would require
> a single dup2 to make it work the same way that the filters currently
> work.

So the other thing I've been considering is doing something like this,
but without shared objects. The idea is -- we exec the filter once.
The first N lines (or instead of \n we use \0 - same difference)
contain the arguments. Then the filter runs, using those arguments,
and does its thing per usual. At the end, however, it does not exit.
Instead of waitpid()ing on it in close filter, we SIGSTOP it, put the
fds back in place, etc. Then the next time that filter is called, we
SIGCONT it, it reads the first N lines as arguments again, and so
forth. I'm most tempted to go with this approach at the moment.
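
Roughly, the parent side would be something like this (hand-wavy
sketch, made-up names, no error handling):

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

struct persistent_filter {
    pid_t pid;          /* exec'd once, then parked with SIGSTOP */
    int to_child;       /* pipe that gets dup2()'d over our stdout */
    int saved_stdout;
};

static void open_filter(struct persistent_filter *f, char **argv)
{
    int i;

    kill(f->pid, SIGCONT);                 /* wake the stopped filter */
    f->saved_stdout = dup(STDOUT_FILENO);
    dup2(f->to_child, STDOUT_FILENO);      /* route our output into it */
    for (i = 0; argv[i]; i++)              /* resend the argument lines */
        dprintf(f->to_child, "%s\n", argv[i]);
}

static void close_filter(struct persistent_filter *f)
{
    dup2(f->saved_stdout, STDOUT_FILENO);  /* put the real stdout back */
    close(f->saved_stdout);
    kill(f->pid, SIGSTOP);                 /* park it until next time */
}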

> I think scripts using stdin+stdout
> are a much simpler interface and provide a much lower barrier to entry,
> not least because the range of languages that can be used to implement
> the filters is so much greater.

Yea. I'm leaning toward the SIGSTOP solution with normal scripts... We
could do a megafast shared object plugin architecture, but it seems
like that's needlessly complicated right now...



* RFE: .so filters
  2014-01-10  1:41   ` Jason
@ 2014-01-10  2:11     ` Jason
  2014-01-10  4:26       ` Jason
  2014-01-10  9:06       ` john
  0 siblings, 2 replies; 21+ messages in thread
From: Jason @ 2014-01-10  2:11 UTC (permalink / raw)


On Fri, Jan 10, 2014 at 2:41 AM, Jason A. Donenfeld <Jason at zx2c4.com> wrote:
> and does its thing per usual. At the end, however, it does not exit.
> Instead of waitpid()ing on it in close filter, we SIGSTOP it, put the
> fds back in place, etc. Then the next time that filter is called, we
> SIGCONT it, it reads the first N lines as arguments again, and so
> forth. I'm most tempted to go with this approach at the moment.

Problems abound. This has race condition issues, where the parent
process will SIGSTOP the child before the child can write its output.
This could be fixed with a more complicated signaling protocol, but
that's more complex than I'd like. Back to the drawing board.



* RFE: .so filters
  2014-01-10  2:11     ` Jason
@ 2014-01-10  4:26       ` Jason
  2014-01-10  9:06       ` john
  1 sibling, 0 replies; 21+ messages in thread
From: Jason @ 2014-01-10  4:26 UTC (permalink / raw)


On Fri, Jan 10, 2014 at 3:11 AM, Jason A. Donenfeld <Jason at zx2c4.com> wrote:
> Problems abound. This has race condition issues, where the parent
> process will SIGSTOP the child before the child can write its output.
> This could be fixed with a more complicated signaling protocol, but
> that's more complex than I'd like. Back to the drawing board.

Okay I've got it sorted. Development has begun in jd/longfilters.
I'll send the patchset to this list to solicit some opinions when I'm
done.



* RFE: .so filters
  2014-01-10  2:11     ` Jason
  2014-01-10  4:26       ` Jason
@ 2014-01-10  9:06       ` john
  2014-01-10 15:57         ` Jason
  1 sibling, 1 reply; 21+ messages in thread
From: john @ 2014-01-10  9:06 UTC (permalink / raw)


On Fri, Jan 10, 2014 at 03:11:54AM +0100, Jason A. Donenfeld wrote:
> On Fri, Jan 10, 2014 at 2:41 AM, Jason A. Donenfeld <Jason at zx2c4.com> wrote:
> > and does its thing per usual. At the end, however, it does not exit.
> > Instead of waitpid()ing on it in close filter, we SIGSTOP it, put the
> > fds back in place, etc. Then the next time that filter is called, we
> > SIGCONT it, it reads the first N lines as arguments again, and so
> > forth. I'm most tempted to go with this approach at the moment.
> 
> Problems abound. This has race condition issues, where the parent
> process will SIGSTOP the child before the child can write its output.
> This could be fixed with a more complicated signaling protocol, but
> that's more complex than I'd like. Back to the drawing board.

This seems drastically over complicated.  Why don't we just have
something like this:

    install_filter:
        if filter_running?
            dup2(filter_stdin, STDOUT_FILENO)
        else
            open_filter

    uninstall_filter:
        read until NUL or EOF

Then the filter just sits waiting for data on stdin and we don't need to
stop it at all.  It does complicate things slightly from where we are
because we can't just let the filter's writes to stdout go straight to
our (real) stdout; instead we'll have to read data back from it.

Annoyingly, although it is probably good enough in this case, we can't
just do the read in uninstall_filter in case we get to a deadlock where
we want to write to the filter but it's waiting for us to read its
output.  I suspect that means we'd need a thread to do the reading and
set a condition variable when it sees NUL or EOF.
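
Concretely, the simple non-threaded version I'm picturing is something
like this (sketch only, made-up names, and the deadlock caveat above
still applies):

#include <unistd.h>

struct pfilter {
    int in;             /* write end: cgit output -> filter's stdin */
    int out;            /* read end:  filter's stdout -> cgit */
    int saved_stdout;
};

static void install_filter(struct pfilter *f)
{
    f->saved_stdout = dup(STDOUT_FILENO);
    dup2(f->in, STDOUT_FILENO);   /* existing html_*() code keeps working */
}

static void uninstall_filter(struct pfilter *f)
{
    char c;

    dup2(f->saved_stdout, STDOUT_FILENO);
    close(f->saved_stdout);

    /* copy the filter's output to the real stdout until NUL or EOF;
     * if the filter fills its output pipe while we are still feeding
     * it, both sides block, hence the reader-thread idea above */
    while (read(f->out, &c, 1) == 1 && c != '\0')
        write(STDOUT_FILENO, &c, 1);
}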

I'd rather put that complexity in CGit and make the filter processes
really simple though - it's better to do the complex bit once than many
times!



* RFE: .so filters
  2014-01-10  9:06       ` john
@ 2014-01-10 15:57         ` Jason
  2014-01-10 17:12           ` bluewind
  0 siblings, 1 reply; 21+ messages in thread
From: Jason @ 2014-01-10 15:57 UTC (permalink / raw)


On Fri, Jan 10, 2014 at 10:06 AM, John Keeping <john at keeping.me.uk> wrote:
>
> This seems drastically over complicated.

So here's the situation. There's a lot of "state" that we're taking
advantage of in using processes that terminate, that needs to be
replicated:

  *a* Sending arguments to the program, and distinguishing these
arguments from data [via argv in main]
  *b* When we are finished sending data to the filter [via a closed
file descriptor]
  *c* When the filter is finished sending data to cgit [via the filter
process terminating / waitpid]

If we skimp on any one of these requirements, we introduce either
limited functionality or race conditions. To fully replicate these
required state transitions, we must either:

  *1* Use an out of band messaging mechanism, such as unix signals
(what I've implemented in jd/longfilters, for example)
  *2* Use two file descriptors (which then would require the filter to
select() or similar)
  *3* Come up with an encoding scheme that would separate these
messages from the data (which would then require the client to know
about it)

I don't really like any of these possibilities. I've implemented *1*
already, and while it works, it's a hassle to implement the signal
handling without races in the filter because of the *b* requirement
above. *2* is even harder to implement in simple scripts, so that's
out. And *3* is a full blown disaster, which would be so invasive that
we might as well use shared libraries if we're going to use this. So
that's out.


What all of this points to is the fact that persistent filters are not
going to wind up being a general thing available for all filter types.
I'm going to implement this specifically for email filters, and it's
going to have a domain-specific encoding scheme:

  * the filter receives the email address on one line
  * the filter receives the data to filter on the next line
  * the filter then spits out its filtered data on a single line

This specificity is obviously unsuitable for any multiline filtering
or filtering of binary data. But it is simple enough to implement in
scripts that I'm fine with it.
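
On the cgit side, each use of the filter then boils down to a single
round trip, roughly like this (sketch; names and buffer size are
arbitrary):

#include <stdio.h>

/* to_filter / from_filter wrap the two ends of the persistent
 * filter's pipes; the filter process stays running between calls */
static void filter_email(FILE *to_filter, FILE *from_filter,
                         const char *email, const char *text)
{
    char buf[4096];

    /* line 1: the email address; line 2: the data to filter */
    fprintf(to_filter, "%s\n%s\n", email, text);
    fflush(to_filter);

    /* the filter answers with a single line of filtered output */
    if (fgets(buf, sizeof(buf), from_filter))
        fputs(buf, stdout);
}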

It will require these changes:

  *a* Allowing persistent filter processes, with proper start-up /
tear-down times and pipe preservation (already implemented in
jd/longfilters)
  *b* Not dup2()ing the pipe to stdin/stdout, so that the filter close
function can read from the pipe itself, and block until it receives
its output (which is a bit of a different way of doing things from how
we're doing it now)

I'm not too pumped about *b*, but that's the only way unless we're to
use signals or some other OOB mechanism. I'll code this up and report
back.



* RFE: .so filters
  2014-01-10 15:57         ` Jason
@ 2014-01-10 17:12           ` bluewind
  2014-01-10 17:20             ` john
  2014-01-10 17:57             ` Jason
  0 siblings, 2 replies; 21+ messages in thread
From: bluewind @ 2014-01-10 17:12 UTC (permalink / raw)


On 10.01.2014 16:57, Jason A. Donenfeld wrote:
> On Fri, Jan 10, 2014 at 10:06 AM, John Keeping <john at keeping.me.uk> wrote:
>>
>> This seems drastically over complicated.
> 
> So here's the situation. There's a lot of "state" that we're taking
> advantage of in using processes that terminate, that needs to be
> replicated:

Isn't this (fast scripting with lots of features) when people normally
start using lua?

Having said that, I haven't used lua yet so don't expect any code from
me, but this example [1] looks rather simple.

[1]: http://lua-users.org/wiki/SimpleLuaApiExample




* RFE: .so filters
  2014-01-10 17:12           ` bluewind
@ 2014-01-10 17:20             ` john
  2014-01-10 17:43               ` mricon
  2014-01-10 18:00               ` Jason
  2014-01-10 17:57             ` Jason
  1 sibling, 2 replies; 21+ messages in thread
From: john @ 2014-01-10 17:20 UTC (permalink / raw)


On Fri, Jan 10, 2014 at 06:12:25PM +0100, Florian Pritz wrote:
> On 10.01.2014 16:57, Jason A. Donenfeld wrote:
> > On Fri, Jan 10, 2014 at 10:06 AM, John Keeping <john at keeping.me.uk> wrote:
> >>
> >> This seems drastically over complicated.
> > 
> > So here's the situation. There's a lot of "state" that we're taking
> > advantage of in using processes that terminate, that needs to be
> > replicated:
> 
> Isn't this (fast scripting with lots of features) when people normally
> start using lua?

I would have no problem including LuaJIT for this sort of thing.  There
was even a PoC for using Lua to format Git log messages a year or so
ago.

I was also wondering if supporting "unix:/path/to/socket" would be
useful, then the filter would connect on a Unix socket, run and
disconnect, on the assumption that the administrator has a daemon
running to do the filtering.

If we're introducing this "<type>:<spec>" support then it would be good
to do it in a reasonably generic way so that any types that add new
dependencies can be compiled out easily.  Maybe a table like this?

struct filter_handler handlers[] = {
    { "unix", open_unix_socket_filter, close_unix_socket_filter },
    { "persistent", "open_persistent_filter, close_persistent_filter },
#ifndef NO_LUA
    { "lua", open_lua_filter, close_lua_filter },
#endif
};
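
The lookup would then just be a prefix match, something like the sketch
below; the struct members and signatures are only guesses at this
point:

#include <stddef.h>
#include <string.h>

struct filter_handler {
    const char *prefix;                  /* "unix", "persistent", "lua", ... */
    void *(*open)(const char *spec);
    void (*close)(void *filter);
};

/* Pick a handler based on the "<type>:" prefix; fall back to the
 * current exec behaviour when nothing matches. */
static const struct filter_handler *
find_handler(const struct filter_handler *handlers, size_t nr,
             const char *cmd, const char **spec)
{
    size_t i, n;

    for (i = 0; i < nr; i++) {
        n = strlen(handlers[i].prefix);
        if (!strncmp(cmd, handlers[i].prefix, n) && cmd[n] == ':') {
            *spec = cmd + n + 1;
            return &handlers[i];
        }
    }
    *spec = cmd;
    return NULL;    /* no prefix: treat cmd as a plain exec filter */
}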

I might have a look at the Lua case over the weekend if no one beats me
to it.



* RFE: .so filters
  2014-01-10 17:20             ` john
@ 2014-01-10 17:43               ` mricon
  2014-01-10 18:00                 ` Jason
  2014-01-10 18:00               ` Jason
  1 sibling, 1 reply; 21+ messages in thread
From: mricon @ 2014-01-10 17:43 UTC (permalink / raw)


On 10/01/14 12:20 PM, John Keeping wrote:
> I was also wondering if supporting "unix:/path/to/socket" would be
> useful, then the filter would connect on a Unix socket, run and
> disconnect, on the assumption that the administrator has a daemon
> running to do the filtering.

As an administrator, I would be very reluctant to use this
mechanism. Administrators generally hate daemons. :)

-- 
Konstantin Ryabitsev
Senior Systems Administrator
Linux Foundation Collab Projects
Montréal, Québec




* RFE: .so filters
  2014-01-10 17:12           ` bluewind
  2014-01-10 17:20             ` john
@ 2014-01-10 17:57             ` Jason
  2014-01-10 20:03               ` bluewind
  1 sibling, 1 reply; 21+ messages in thread
From: Jason @ 2014-01-10 17:57 UTC (permalink / raw)


On Fri, Jan 10, 2014 at 6:12 PM, Florian Pritz <bluewind at xinu.at> wrote:
>
> Isn't this (fast scripting with lots of features) when people normally
> start using lua?
>

This would have the same challenges as using .so files, w.r.t. hooking
write() (or the html functions), but would be very interesting indeed,
because Lua...

Any implementation of this would probably have to work in the same way
I was thinking for the .so file. Namely, the lua script implements
event handlers for:

- filter_open(int stdout_fd, char *argv[])
- filter_write(char *data, size_t len)
- filter_close()

This seems doable. I don't know how I feel about the added size of
doing this yet, but it is enticing enough to entertain and see how the
idea develops.
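
On the C side that would mean keeping one lua_State around for the
whole request and calling into it. Very roughly, and untested, assuming
the script defines the three functions above:

#include <stddef.h>
#include <lua.h>
#include <lualib.h>
#include <lauxlib.h>

static lua_State *L;

static int lua_filter_load(const char *script)
{
    L = luaL_newstate();
    luaL_openlibs(L);
    return luaL_dofile(L, script);   /* the script defines the handlers */
}

static void lua_filter_write(const char *data, size_t len)
{
    lua_getglobal(L, "filter_write");
    lua_pushlstring(L, data, len);
    lua_pcall(L, 1, 0, 0);           /* error handling omitted */
}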



* RFE: .so filters
  2014-01-10 17:20             ` john
  2014-01-10 17:43               ` mricon
@ 2014-01-10 18:00               ` Jason
  1 sibling, 0 replies; 21+ messages in thread
From: Jason @ 2014-01-10 18:00 UTC (permalink / raw)


On Fri, Jan 10, 2014 at 6:20 PM, John Keeping <john at keeping.me.uk> wrote:
>
> I was also wondering if supporting "unix:/path/to/socket" would be
> useful, then the filter would connect on a Unix socket, run and
> disconnect, on the assumption that the administrator has a daemon
> running to do the filtering.

This has few benefits, and you still have the out of band signaling
issues. Sysadmins don't want to run more daemons.

>
> If we're introducing this "<type>:<spec>" support then it would be good
> to do it in a reasonably generic way so that any types that add new
> dependencies can be compiled out easily.  Maybe a table like this?
>
> struct filter_handler handlers[] = {
>     { "unix", open_unix_socket_filter, close_unix_socket_filter },
>     { "persistent", "open_persistent_filter, close_persistent_filter },
> #ifndef NO_LUA
>     { "lua", open_lua_filter, close_lua_filter },
> #endif
> };

This would make more sense. Look at the commit I just merged to master
where I split filters out into filter.c. This would be the place for
such a function pointer table.

> I might have a look at the Lua case over the weekend if no one beats me
> to it.

Cool. Please keep in mind the design considerations in the email I
just sent to Florian with the three functions... Before you begin, take
a peek at jd/gravatar and jd/persistent. Can't wait to see what you
come up with!



* RFE: .so filters
  2014-01-10 17:43               ` mricon
@ 2014-01-10 18:00                 ` Jason
  0 siblings, 0 replies; 21+ messages in thread
From: Jason @ 2014-01-10 18:00 UTC (permalink / raw)


On Fri, Jan 10, 2014 at 6:43 PM, Konstantin Ryabitsev <mricon at kernel.org> wrote:
> As an administrator, I would be very reluctant to use this
> mechanism. Administrators generally hate daemons. :)

Ditto.



* RFE: .so filters
  2014-01-10 17:57             ` Jason
@ 2014-01-10 20:03               ` bluewind
  2014-01-10 20:11                 ` john
  2014-01-11  2:34                 ` Jason
  0 siblings, 2 replies; 21+ messages in thread
From: bluewind @ 2014-01-10 20:03 UTC (permalink / raw)


On 10.01.2014 18:57, Jason A. Donenfeld wrote:
> On Fri, Jan 10, 2014 at 6:12 PM, Florian Pritz <bluewind at xinu.at> wrote:
>>
>> Isn't this (fast scripting with lots of features) when people normally
>> start using lua?
>>
> 
> This would have the same challenges as using .so files, w.r.t. hooking
> write() (or the html functions), but would be very interesting indeed,
> because Lua...

How about using the current fork approach but instead of calling execvp
use lua. I believe forks are pretty cheap on linux, it's the exec that's
costly.

If we do it like that we could reuse stdin/stdout, we could pass
arguments via lua tables (like command line arguments now), but we
should have little overhead for the script loading/executing.
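
Maybe something like this? Completely untested, and keep in mind I
haven't actually written any Lua C API code yet:

#include <lua.h>
#include <lualib.h>
#include <lauxlib.h>
#include <stdlib.h>
#include <unistd.h>

/* Keep the existing pipe setup, but instead of execvp()ing an external
 * interpreter, the child runs the Lua script in-process. */
static pid_t fork_lua_filter(const char *script)
{
    pid_t pid = fork();

    if (pid == 0) {
        /* child: stdin/stdout are assumed to already be dup2()'d to
         * the filter pipes, exactly as the current filter-open code
         * does for exec'd filters */
        lua_State *L = luaL_newstate();
        luaL_openlibs(L);
        exit(luaL_dofile(L, script) ? 1 : 0);  /* script reads stdin, writes stdout */
    }
    return pid;
}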




* RFE: .so filters
  2014-01-10 20:03               ` bluewind
@ 2014-01-10 20:11                 ` john
  2014-01-10 20:25                   ` bluewind
  2014-01-11  2:34                 ` Jason
  1 sibling, 1 reply; 21+ messages in thread
From: john @ 2014-01-10 20:11 UTC (permalink / raw)


On Fri, Jan 10, 2014 at 09:03:24PM +0100, Florian Pritz wrote:
> On 10.01.2014 18:57, Jason A. Donenfeld wrote:
> > On Fri, Jan 10, 2014 at 6:12 PM, Florian Pritz <bluewind at xinu.at> wrote:
> >>
> >> Isn't this (fast scripting with lots of features) when people normally
> >> start using lua?
> >>
> > 
> > This would have the same challenges as using .so files, w.r.t. hooking
> > write() (or the html functions), but would be very interesting indeed,
> > because Lua...
> 
> How about using the current fork approach but instead of calling execvp
> use lua. I believe forks are pretty cheap on linux, it's the exec that's
> costly.
> 
> If we do it like that we could reuse stdin/stdout, we could pass
> arguments via lua tables (like command line arguments now), but we
> should have little overhead for the script loading/executing.

Forking and using Lua in the child is an interesting idea.

I need to investigate how Lua generally deals with I/O, but it feels
like it will be simpler to use a simple function interface than deal
with slurping in the input in Lua.  So it may be simpler to swap out the
write function in CGit while the filter is active and collect the output
in a buffer instead, then call a Lua function and pass whatever comes
back from that to the real write(2).



* RFE: .so filters
  2014-01-10 20:11                 ` john
@ 2014-01-10 20:25                   ` bluewind
  2014-01-10 20:36                     ` john
  0 siblings, 1 reply; 21+ messages in thread
From: bluewind @ 2014-01-10 20:25 UTC (permalink / raw)


On 10.01.2014 21:11, John Keeping wrote:
> Forking and using Lua in the child is an interesting idea.
> 
> I need to investigate how Lua generally deals with I/O, but it feels
> like it will be simpler to use a simple function interface than deal
> with slurping in the input in Lua.

Looks rather easy to slurp stdin (from http://www.lua.org/pil/21.1.html):

> t = io.read("*all")         -- read the whole file
> t = string.gsub(t, ...)     -- do the job
> io.write(t)                 -- write the file





* RFE: .so filters
  2014-01-10 20:25                   ` bluewind
@ 2014-01-10 20:36                     ` john
  2014-01-10 20:56                       ` bluewind
  0 siblings, 1 reply; 21+ messages in thread
From: john @ 2014-01-10 20:36 UTC (permalink / raw)


On Fri, Jan 10, 2014 at 09:25:18PM +0100, Florian Pritz wrote:
> On 10.01.2014 21:11, John Keeping wrote:
> > Forking and using Lua in the child is an interesting idea.
> > 
> > I need to investigate how Lua generally deals with I/O, but it feels
> > like it will be simpler to use a simple function interface than deal
> > with slurping in the input in Lua.
> 
> Looks rather easy to slurp stdin (from http://www.lua.org/pil/21.1.html):

Interesting.  But I think it will be simpler from both sides if the
interface is just a function call:

    function filter(value)
        return value .. " some trailing data"
    end

The change on the CGit side is then quite easy: we just change the
switchable value in html.c from htmlfd to html_out_fn, which has the same
signature as html_raw (the default implementation).  Then we
can collect output in a strbuf until it's time to call the function.
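
A sketch of that, where html_raw and htmlfd are the existing bits and
everything else is made up (strbuf being git's string buffer, which
cgit already links against):

#include <stddef.h>
#include "strbuf.h"   /* git's strbuf; exact include path depends on the build */

void html_raw(const char *data, size_t size);   /* cgit's existing raw writer */

typedef void (*html_out_fn)(const char *data, size_t len);

static struct strbuf filter_buf = STRBUF_INIT;

/* collect everything the html_*() helpers emit while a Lua filter is
 * active, instead of letting it hit the real stdout */
static void html_collect(const char *data, size_t len)
{
    strbuf_add(&filter_buf, data, len);
}

static html_out_fn html_out = html_raw;   /* default: today's behaviour */

static void open_lua_filter(void)
{
    strbuf_reset(&filter_buf);
    html_out = html_collect;
}

static void close_lua_filter(void)
{
    html_out = html_raw;
    /* hand filter_buf.buf / filter_buf.len to the Lua filter() function
     * here and write whatever it returns to the real stdout */
    strbuf_release(&filter_buf);
}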

The only thing I'm not sure about is how the specification of the filter
function works, given that I don't think we can call a complete Lua
script as a function.  (I'm also assuming that the Lua script will be in
an external file and not stored inline in the CGit config).



* RFE: .so filters
  2014-01-10 20:36                     ` john
@ 2014-01-10 20:56                       ` bluewind
  2014-01-11  2:37                         ` Jason
  0 siblings, 1 reply; 21+ messages in thread
From: bluewind @ 2014-01-10 20:56 UTC (permalink / raw)


On 10.01.2014 21:36, John Keeping wrote:
>> Looks rather easy to slurp stdin (from http://www.lua.org/pil/21.1.html):
> 
> Interesting.  But I think it will be simpler from both sides if the
> interface is just a function call

source_filter could potentially get a rather long input and might not
need it all at once. If it can process input line by line or similar,
it makes sense to support that by writing chunks rather than buffering
everything.

I believe we currently already buffer the file (ui-tree.c print_object()
buf variable) before calling the source_filter, but keeping the
possibility to change that later is good.

Also slurping is not really that much harder than writing the function
header so I don't see a benefit in adding buffering to the cgit code.

Last but not least, it keeps the interface between "exec" and "lua"
filters the same or at least rather similar. If you can call a lua
script as if it was execed (setting argv) that would make the handler
totally transparent, but faster.

> The only thing I'm not sure about is how the specification of the filter
> function works, given that I don't think we can call a complete Lua
> script as a function.  (I'm also assuming that the Lua script will be in
> an external file and not stored inline in the CGit config).

http://www.lua.org/pil/25.2.html





* RFE: .so filters
  2014-01-10 20:03               ` bluewind
  2014-01-10 20:11                 ` john
@ 2014-01-11  2:34                 ` Jason
  1 sibling, 0 replies; 21+ messages in thread
From: Jason @ 2014-01-11  2:34 UTC (permalink / raw)


On Fri, Jan 10, 2014 at 9:03 PM, Florian Pritz <bluewind at xinu.at> wrote:
> How about using the current fork approach but instead of calling execvp
> use lua. I believe forks are pretty cheap on linux, it's the exec that's
> costly.
>
> If we do it like that we could reuse stdin/stdout, we could pass
> arguments via lua tables (like command line arguments now), but we
> should have little overhead for the script loading/executing.
>

This is a very interesting idea. But I think it defeats a lot of the
benefits of using lua in the first place. The pipe requires copying to
and from the kernel, whereas if we did it in the same process, it's
more or less just a transfer of memory within the same address space. Further,
there's no need to fork(), since what we're doing is distinctly
synchronous -- the parent process in such a fork() would simply be
wait()ing on the child to complete. So it's really not even necessary.

I think the best solution is to hook the html_raw function, which
usually calls write, into calling the lua function (referred to as
filter_write(char *data, size_t len) above). This way, the same
approach would also make other types of plugin systems easy -- .so files
or whatever else.



* RFE: .so filters
  2014-01-10 20:56                       ` bluewind
@ 2014-01-11  2:37                         ` Jason
  0 siblings, 0 replies; 21+ messages in thread
From: Jason @ 2014-01-11  2:37 UTC (permalink / raw)


On Fri, Jan 10, 2014 at 9:56 PM, Florian Pritz <bluewind at xinu.at> wrote:
> Last but not least, it keeps the interface between "exec" and "lua"
> filters the same or at least rather similar. If you can call a lua
> script as if it was execed (setting argv) that would make the handler
> totally transparent, but faster.

The whole point of this thread has been avoiding forking off new
processes, or otherwise setting up new state for each use of the filter.
By doing filter_open, filter_write, and filter_close all in the same
process, we could benefit from lazy deallocation of filter resources,
and therefore keep one lua runtime and parsed script alive across
several open/close cycles of a filter.


