Gnus development mailing list
* url-retrieve parallelism
@ 2010-12-19  0:45 Lars Magne Ingebrigtsen
  2010-12-19  2:58 ` Philipp Haselwarter
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-12-19  0:45 UTC (permalink / raw)
  To: ding

shr (and gnus-html, I guess) fires off a call to `url-retrieve' for every
<img> it finds.  If an HTML message has 1000 <img>s, then Emacs is going
to DoS the poor image web server.

We obviously want to have more than a single `url-retrieve' call going
at once, but we want to rate-limit this somewhat.  To perhaps 10 at a
time?  So we need some kind of easy callback-ey interface, I think...
But I'm wondering whether to make it a totally general library, that
would be, like:

(defun concurrent (concurrency function callback callback-arguments)
  ...)

So FUNCTION would be required to return a process object (and have a
parameter list based on `url-retrieve', which seems quite sensible), and
CONCURRENT would just maintain a queue of processes, and fire off a new
one (if any) when something returns, and so on...
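That queue-of-processes idea could be sketched roughly like this (a hypothetical sketch, not actual Gnus code; `concurrent-run' and its argument shape are made up for illustration, lexical binding is assumed for the closures, and a real version would have to guard against a sentinel firing more than once per process):

```elisp
;;; -*- lexical-binding: t -*-

(defun concurrent-run (concurrency function args-list callback)
  "Apply FUNCTION to each element of ARGS-LIST, at most CONCURRENCY at a time.
FUNCTION must return a process object.  When an element's process
finishes, CALLBACK is applied to that element's arguments and the
next queued element (if any) is started."
  (let ((queue args-list)
        (active 0)
        starter)
    (setq starter
          (lambda ()
            (while (and queue (< active concurrency))
              (let ((args (pop queue)))
                (setq active (1+ active))
                (set-process-sentinel
                 (apply function args)
                 (lambda (_proc _event)
                   ;; Simplified: assumes one sentinel call per process.
                   (setq active (1- active))
                   (apply callback args)
                   (funcall starter)))))))
    (funcall starter)))
```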

Would this be useful, or is my master's in Over Engineering showing
again?

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: url-retrieve parallelism
  2010-12-19  0:45 url-retrieve parallelism Lars Magne Ingebrigtsen
@ 2010-12-19  2:58 ` Philipp Haselwarter
  2010-12-19 15:38   ` Lars Magne Ingebrigtsen
  2010-12-19  8:32 ` Steinar Bang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Philipp Haselwarter @ 2010-12-19  2:58 UTC (permalink / raw)
  To: ding

As discussed here http://www.emacswiki.org/emacs/ConcurrentEmacs ?
That would be bloody ingenious.
Like, you could use this and I could have gnus refresh and get my mail
without emacs locking up? I'd definitely buy you a beer for that :)


-- 
Philipp Haselwarter





* Re: url-retrieve parallelism
  2010-12-19  0:45 url-retrieve parallelism Lars Magne Ingebrigtsen
  2010-12-19  2:58 ` Philipp Haselwarter
@ 2010-12-19  8:32 ` Steinar Bang
  2010-12-19  8:38   ` Steinar Bang
  2010-12-19  9:16 ` David Engster
  2010-12-19 16:50 ` Julien Danjou
  3 siblings, 1 reply; 16+ messages in thread
From: Steinar Bang @ 2010-12-19  8:32 UTC (permalink / raw)
  To: ding

>>>>> Lars Magne Ingebrigtsen <larsi@gnus.org>:

> shr (and gnus-html, I guess) fire off a call to `url-retrieve' for every
> <img> it finds.  If a HTML message has 1000 <img>s, then Emacs is going
> to do a DOS of the poor image web server.

> We obviously want to have more than a single `url-retrieve' call going
> at once,

If they are all going to the same server, I think the network-friendly
thing is to pipeline them over a single HTTP connection.  I.e. fire off
all the GET requests for the images without waiting for a response, and
handle the responses as they arrive:
  http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.1.2.2

That means that there needs to be something that listens to the incoming
byte stream and identifies what is a response header and routes it to
its handlers.
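The pipelined exchange could be sketched like this (hypothetical; `example.com' and the paths are placeholders, and real code would need a process filter that splits the byte stream back into individual responses):

```elisp
;; Open one connection and send two GETs back to back, without
;; waiting for the first response.
(let ((proc (make-network-process
             :name "pipeline"
             :host "example.com" :service 80
             :buffer (generate-new-buffer " *pipeline*"))))
  (process-send-string
   proc
   (concat "GET /a.png HTTP/1.1\r\nHost: example.com\r\n\r\n"
           "GET /b.png HTTP/1.1\r\nHost: example.com\r\n\r\n"))
  ;; A filter would then have to parse response headers out of the
  ;; incoming stream and route each body to its handler.
  proc)
```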

The old w3c libwww did that kind of thing, and curl does as well, AFAIK.
I have no idea what (url-retrieve) does.  Hm... it has a callback
argument...?  No meaningful Google matches on "url-retrieve pipeline"
though...

Re: callbacks, the way libwww did it was to let the object representing
the request live in the system, and when a response corresponding to the
request returned, the request header object was linked to the response
header object.  The MIME type of the response was used to select a
handler.  I.e. it wasn't the request that determined what handler should
handle the response.  But the request object provided context, typically
so that the handler would know where to put its handled results.

I think where I'm going with this, is that just providing a callback
sounds too simple.  It's OK if you get what you want, but not if what
you get back is something other than what you asked for.





* Re: url-retrieve parallelism
  2010-12-19  8:32 ` Steinar Bang
@ 2010-12-19  8:38   ` Steinar Bang
  2010-12-19  9:02     ` Steinar Bang
  0 siblings, 1 reply; 16+ messages in thread
From: Steinar Bang @ 2010-12-19  8:38 UTC (permalink / raw)
  To: ding

>>>>> Steinar Bang <sb@dod.no>:

> The old w3c libwww did that kind of thing, 

Of course, modeling a Lisp program on the way a C library did things is
probably not the right thing.

But I think the way it operated was sound: a request was fired off, and
the caller basically forgot about it.  When responses arrived, a handler
was created and the handled results were put in their right place, or
errors were created and displayed in an "error place" (e.g. a log or an
error window or the minibuffer or whatever).

But handling pipelining means being more intimate with the TCP
connection than you probably plan to be...?





* Re: url-retrieve parallelism
  2010-12-19  8:38   ` Steinar Bang
@ 2010-12-19  9:02     ` Steinar Bang
  2010-12-19 15:39       ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 16+ messages in thread
From: Steinar Bang @ 2010-12-19  9:02 UTC (permalink / raw)
  To: ding

One thing you could do, if curl is installed on the system:
 Start curl in a separate process, giving it all the URLs you want to
 download, then route the curl output into an Emacs buffer and parse
 it for progress information and error messages:
	http://curl.haxx.se/docs/manpage.html

curl will do things the network friendly way, and pipeline if it can,
AFAIK.
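A minimal sketch of that approach (hypothetical; `list-of-urls' is a placeholder for the list of URLs to fetch, and real code would parse the buffer contents in the sentinel):

```elisp
;; Hand a whole batch of URLs to one curl process; curl reuses
;; connections to the same host on its own.
(let* ((list-of-urls '("http://example.com/a.png"
                       "http://example.com/b.png"))
       (buf (generate-new-buffer " *curl*")))
  (apply #'start-process "curl" buf
         "curl" "--silent" "--show-error" list-of-urls)
  (set-process-sentinel
   (get-buffer-process buf)
   (lambda (_proc event)
     ;; The downloaded data is now in BUF, ready to be carved up.
     (message "curl finished: %s" event))))
```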





* Re: url-retrieve parallelism
  2010-12-19  0:45 url-retrieve parallelism Lars Magne Ingebrigtsen
  2010-12-19  2:58 ` Philipp Haselwarter
  2010-12-19  8:32 ` Steinar Bang
@ 2010-12-19  9:16 ` David Engster
  2010-12-19 15:41   ` Lars Magne Ingebrigtsen
  2010-12-19 16:50 ` Julien Danjou
  3 siblings, 1 reply; 16+ messages in thread
From: David Engster @ 2010-12-19  9:16 UTC (permalink / raw)
  To: ding

Lars Magne Ingebrigtsen writes:
> But I'm wondering whether to make it a totally general library, that
> would be, like:
>
> (defun concurrent (concurrency function callback callback-arguments)
>   ...)

Maybe the transaction queue library (tq.el, ships with Emacs) could help
you with this?  It's very small and simple; EMMS uses it to
asynchronously communicate with the music player daemon (mpd), and it
works quite well (debugging becomes a nightmare, though).
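A minimal tq.el sketch of that pattern (hypothetical host/port; mpd happens to terminate its replies with a line matching "OK", which is what the regexp keys on):

```elisp
(require 'tq)

;; Queue a question to a line-oriented server process; tq calls the
;; handler once the output matches the answer regexp.
(let* ((proc (open-network-stream "mpd" nil "localhost" 6600))
       (tq (tq-create proc)))
  (tq-enqueue tq "status\n" "^OK\n" nil
              (lambda (_closure answer)
                (message "mpd said: %s" answer))))
```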

-David




* Re: url-retrieve parallelism
  2010-12-19  2:58 ` Philipp Haselwarter
@ 2010-12-19 15:38   ` Lars Magne Ingebrigtsen
  2011-01-19 22:20     ` Ted Zlatanov
  0 siblings, 1 reply; 16+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-12-19 15:38 UTC (permalink / raw)
  To: ding

Philipp Haselwarter <philipp.haselwarter@gmx.de> writes:

> Like, you could use this and I could have gnus refresh and get my mail
> without emacs locking up?

No, not really.  This is just about scheduling external
processes/sockets.  Emacs doesn't multitask.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* Re: url-retrieve parallelism
  2010-12-19  9:02     ` Steinar Bang
@ 2010-12-19 15:39       ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 16+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-12-19 15:39 UTC (permalink / raw)
  To: ding

Steinar Bang <sb@dod.no> writes:

> One thing you could do, if curl is installed on the system:

I'd rather not rely on external programs.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* Re: url-retrieve parallelism
  2010-12-19  9:16 ` David Engster
@ 2010-12-19 15:41   ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 16+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-12-19 15:41 UTC (permalink / raw)
  To: ding

David Engster <deng@randomsample.de> writes:

> Maybe the transaction queue library (tq.el, ships with Emacs) could help
> you with this?  It's very small and simple; EMMS uses it to
> asynchronously communicate with the music player daemon (mpd), and it
> works quite well (debugging becomes a nightmare, though).

tq seems to be about queueing chatter to an external process.  That
doesn't quite apply to what `url-retrieve' does.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* Re: url-retrieve parallelism
  2010-12-19  0:45 url-retrieve parallelism Lars Magne Ingebrigtsen
                   ` (2 preceding siblings ...)
  2010-12-19  9:16 ` David Engster
@ 2010-12-19 16:50 ` Julien Danjou
  2010-12-19 17:01   ` Lars Magne Ingebrigtsen
  3 siblings, 1 reply; 16+ messages in thread
From: Julien Danjou @ 2010-12-19 16:50 UTC (permalink / raw)
  To: ding


On Sun, Dec 19 2010, Lars Magne Ingebrigtsen wrote:

> Would this be useful, or is my master's in Over Engineering showing
> again?

Sounds useful, and a bit over-engineered/paranoid too, but I do not see
any use case other than url-retrieve.  Would there be one?

-- 
Julien Danjou
❱ http://julien.danjou.info



* Re: url-retrieve parallelism
  2010-12-19 16:50 ` Julien Danjou
@ 2010-12-19 17:01   ` Lars Magne Ingebrigtsen
  2010-12-21  1:22     ` Katsumi Yamaoka
  2011-01-02  6:53     ` Lars Magne Ingebrigtsen
  0 siblings, 2 replies; 16+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-12-19 17:01 UTC (permalink / raw)
  To: ding

Julien Danjou <julien@danjou.info> writes:

> Sounds useful and a bit Over Engineering/paranoiac too, but I do not see
> any other use-case than url-retrieve. Would be there?

That's what I'm wondering...  xargs has -P max-procs, so somebody is
doing something in parallel, but I can't really think of a use case.

So perhaps this should just go into url.el.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* Re: url-retrieve parallelism
  2010-12-19 17:01   ` Lars Magne Ingebrigtsen
@ 2010-12-21  1:22     ` Katsumi Yamaoka
  2010-12-21  1:33       ` Lars Magne Ingebrigtsen
  2011-01-02  6:53     ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 16+ messages in thread
From: Katsumi Yamaoka @ 2010-12-21  1:22 UTC (permalink / raw)
  To: ding

Lars Magne Ingebrigtsen wrote:
> So perhaps this should just go into url.el.

Please note that XEmacs' url.el is very old and won't be updated.
Cf. http://article.gmane.org/gmane.emacs.gnus.general/72704

Currently it's not used, but I get:

Compiling gnus/lisp/gravatar.el...
While compiling the end of the data in file gnus/lisp/gravatar.el:
  ** the function url-retrieve-synchronously is not known to be defined.
Wrote gnus/lisp/gravatar.elc




* Re: url-retrieve parallelism
  2010-12-21  1:22     ` Katsumi Yamaoka
@ 2010-12-21  1:33       ` Lars Magne Ingebrigtsen
  2010-12-21  7:52         ` Robert Pluim
  0 siblings, 1 reply; 16+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-12-21  1:33 UTC (permalink / raw)
  To: ding

Katsumi Yamaoka <yamaoka@jpl.org> writes:

> Please note that XEmacs' url.el is very old and won't be updated.

Oh, darn.  I had forgotten about that...

I've been reading the XEmacs mailing list, and it seems like they're
making progress with their conversion to GPLv3, so you'd think they'd be
gearing up to just import stuff like url.el from Emacs now?

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* Re: url-retrieve parallelism
  2010-12-21  1:33       ` Lars Magne Ingebrigtsen
@ 2010-12-21  7:52         ` Robert Pluim
  0 siblings, 0 replies; 16+ messages in thread
From: Robert Pluim @ 2010-12-21  7:52 UTC (permalink / raw)
  To: ding

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Katsumi Yamaoka <yamaoka@jpl.org> writes:
>
>> Please note that XEmacs' url.el is very old and won't be updated.
>
> Oh, darn.  I had forgotten about that...
>
> I've been reading the XEmacs mailing list, and it seems like they're
> making progress with their conversion to GPLv3, so you'd think they'd be
> gearing up to just import stuff like url.el from Emacs now?

Yes.  There are various bits of code for tls and xml access written by
some larsi guy that are prime candidates for snarfing as well.

Robert





* Re: url-retrieve parallelism
  2010-12-19 17:01   ` Lars Magne Ingebrigtsen
  2010-12-21  1:22     ` Katsumi Yamaoka
@ 2011-01-02  6:53     ` Lars Magne Ingebrigtsen
  1 sibling, 0 replies; 16+ messages in thread
From: Lars Magne Ingebrigtsen @ 2011-01-02  6:53 UTC (permalink / raw)
  To: ding

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> So perhaps this should just go into url.el.

Thinking about this a bit more, I think it might make sense to try to
make this be a general run-in-parallel package, even though url fetching
is the only use case at present.  Especially since this has to be
supported across many {X,}Emacs versions.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* Re: url-retrieve parallelism
  2010-12-19 15:38   ` Lars Magne Ingebrigtsen
@ 2011-01-19 22:20     ` Ted Zlatanov
  0 siblings, 0 replies; 16+ messages in thread
From: Ted Zlatanov @ 2011-01-19 22:20 UTC (permalink / raw)
  To: ding

On Sun, 19 Dec 2010 16:38:55 +0100 Lars Magne Ingebrigtsen <larsi@gnus.org> wrote: 

LMI> Philipp Haselwarter <philipp.haselwarter@gmx.de> writes:
>> Like, you could use this and I could have gnus refresh and get my mail
>> without emacs locking up?

LMI> No, not really.  This is just about scheduling external
LMI> processes/sockets.  Emacs doesn't multitask.

Tom Tromey just committed some prep code for the concurrent branch.  It
may arrive this decade.

Ted





end of thread, other threads:[~2011-01-19 22:20 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-19  0:45 url-retrieve parallelism Lars Magne Ingebrigtsen
2010-12-19  2:58 ` Philipp Haselwarter
2010-12-19 15:38   ` Lars Magne Ingebrigtsen
2011-01-19 22:20     ` Ted Zlatanov
2010-12-19  8:32 ` Steinar Bang
2010-12-19  8:38   ` Steinar Bang
2010-12-19  9:02     ` Steinar Bang
2010-12-19 15:39       ` Lars Magne Ingebrigtsen
2010-12-19  9:16 ` David Engster
2010-12-19 15:41   ` Lars Magne Ingebrigtsen
2010-12-19 16:50 ` Julien Danjou
2010-12-19 17:01   ` Lars Magne Ingebrigtsen
2010-12-21  1:22     ` Katsumi Yamaoka
2010-12-21  1:33       ` Lars Magne Ingebrigtsen
2010-12-21  7:52         ` Robert Pluim
2011-01-02  6:53     ` Lars Magne Ingebrigtsen
