* Does nnweb with Google work any more?
From: Lars Magne Ingebrigtsen @ 2012-02-25 0:11 UTC
To: ding

Has the Google Groups thing where you could request stuff from

http://www.google.com/groups?as_umsgid=%s&hl=en&dmode=source

totally gone away now? I can't get it to work, at least...

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/
* Re: Does nnweb with Google work any more?
From: Reiner Steib @ 2012-02-25 16:44 UTC
To: ding

[-- Attachment #1: Type: text/plain, Size: 623 bytes --]

On Sat, Feb 25 2012, Lars Magne Ingebrigtsen wrote:

> Has the Google Groups thing where you could request stuff from
>
> http://www.google.com/groups?as_umsgid=%s&hl=en&dmode=source
>
> totally gone away now? I can't get it to work, at least...

I have a modified version of nnweb.el which supports MID searching via
http://howardk.freenix.org/ (which now redirects to
http://al.howardknight.net/). The code was last modified and tested in
2010, so you might need to adjust it. And I am not sure if all hunks in
the diff are relevant, because I didn't have enough spare time to bring
my diffs from the old cvs to git.

[-- Attachment #2: rs-nnweb-howardknight.patch --]
[-- Type: text/x-diff, Size: 1917 bytes --]

--- nnweb.el	2012-01-07 19:25:08.000000000 +0100
+++ nnweb.el	2010-01-16 13:31:22.000000000 +0100
@@ -70,6 +71,11 @@
      (address . "http://groups.google.com/groups")
      (base . "http://groups.google.com")
      (identifier . nnweb-google-identity))
+    (howardk
+     (id . "http://howardk.freenix.org/msgid.cgi?STYPE=msgid&MSGI=<%s>&GOOGLE=on")
+     (article . nnweb-howardk-wash-article)
+     (reference . identity)
+     (identifier . nnweb-howardk-identity))
     (gmane
      (article . nnweb-gmane-wash-article)
      (id . "http://gmane.org/view.php?group=%s")
@@ -296,7 +308,27 @@
 ;;; groups.google.com
 ;;;
 
+;; Updated for Google's changed interface 2008-11
 (defun nnweb-google-wash-article ()
+  (let ((case-fold-search t) url)
+    (goto-char (point-min))
+    (if (or (re-search-forward "The requested message.*could not be found."
+                               nil t)
+            (re-search-forward
+             (concat "href=\"\\(/group/[^/]+/msg/[[:alnum:]]+"
+                     "\\?dmode=source\\)\">Show original</a>") nil t))
+        (setq url (format "%s%s&output=gplain"
+                          (nnweb-definition 'base) (match-string 1)))
+      (gnus-message 3 "Requested article not found"))
+    (gnus-message 9 "URL: %s" url)
+    (erase-buffer)
+    (mm-with-unibyte-current-buffer
+      (mm-url-insert-file-contents url))
+    (unless (re-search-forward "^Message-ID:")
+      (gnus-message 3 "Requested article not found")
+      (erase-buffer))))
+
+(defun nnweb-howardk-wash-article ()
   ;; We have Google's masked e-mail addresses here. :-/
   (let ((case-fold-search t)
         (start-re "<pre>[\r\n ]*")
@@ -305,6 +337,7 @@
     (if (save-excursion
           (or (re-search-forward "The requested message.*could not be found."
                                  nil t)
+              (re-search-forward "Couldn't find article" nil t)
               (not (and (re-search-forward start-re nil t)
                         (re-search-forward end-re nil t)))))
         ;; FIXME: Don't know how to indicate "not found".

[-- Attachment #3: Type: text/plain, Size: 114 bytes --]

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/
* Re: Does nnweb with Google work any more?
From: Lars Magne Ingebrigtsen @ 2012-03-10 0:55 UTC
To: ding

Reiner Steib <reinersteib+gmane@imap.cc> writes:

> I have a modified version of nnweb.el which supports MID searching via
> http://howardk.freenix.org/ (which now redirects to
> http://al.howardknight.net/).

Interesting. So this means that Google still allows looking stuff up by
Message-ID, somehow? Or does al.howardknight.net somehow have a back
door into Google's index?
* Re: Does nnweb with Google work any more?
From: David Engster @ 2012-03-10 11:24 UTC
To: Lars Magne Ingebrigtsen; +Cc: ding

Lars Magne Ingebrigtsen writes:

> Reiner Steib <reinersteib+gmane@imap.cc> writes:
>
>> I have a modified version of nnweb.el which supports MID searching via
>> http://howardk.freenix.org/ (which now redirects to
>> http://al.howardknight.net/).
>
> Interesting. So this means that Google still allows looking stuff up by
> Message-ID, somehow?

It seems it just has slightly changed to

http://groups.google.com/groups/search?as_umsgid=

(You have to provide the message-id without angle brackets.)

-David
* Re: Does nnweb with Google work any more?
From: Lars Magne Ingebrigtsen @ 2012-03-10 12:11 UTC
To: ding

David Engster <deng@randomsample.de> writes:

> It seems it just has slightly changed to
>
> http://groups.google.com/groups/search?as_umsgid=

It doesn't seem to give me the message in question. For instance, this
should return Robert Bannister's message:

http://groups.google.com/groups/search?as_umsgid=9rvfhjFqp7U2%40mid.individual.net

But it just gives me a pointer to the entire thread the message appeared
in, apparently?
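The lookup format discussed in the two messages above (Message-ID without angle brackets, with `@` written as `%40`) can be sketched as a small Bash helper. This is a hedged illustration, not nnweb code: the names `urlencode` and `msgid_search_url` are hypothetical, the encoder assumes plain-ASCII Message-IDs, and the 2012 `groups/search?as_umsgid=` endpoint may no longer behave as described.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, a sketch only. Assumes ASCII Message-IDs and the
# 2012-era Google Groups search URL quoted in this thread.

# Percent-encode every character outside the RFC 3986 "unreserved" set.
urlencode() {
  local s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [[:alnum:].~_-]) out+=$c ;;
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;  # e.g. "@" -> "%40"
    esac
  done
  printf '%s' "$out"
}

# Drop optional surrounding angle brackets, then build the search URL.
msgid_search_url() {
  local id=$1
  id=${id#'<'}
  id=${id%'>'}
  printf 'http://groups.google.com/groups/search?as_umsgid=%s\n' "$(urlencode "$id")"
}
```

For example, `msgid_search_url '<9rvfhjFqp7U2@mid.individual.net>'` prints the same URL Lars tried above. Encoding everything outside the unreserved set is more than strictly needed for `@`, but it also covers characters such as `+` or `&` that would otherwise break the query string.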
* Re: Does nnweb with Google work any more?
From: David Engster @ 2012-03-10 13:23 UTC
To: Lars Magne Ingebrigtsen; +Cc: ding

Lars Magne Ingebrigtsen writes:

> David Engster <deng@randomsample.de> writes:
>
>> It seems it just has slightly changed to
>>
>> http://groups.google.com/groups/search?as_umsgid=
>
> It doesn't seem to give me the message in question. For instance, this
> should return Robert Bannister's message:
>
> http://groups.google.com/groups/search?as_umsgid=9rvfhjFqp7U2%40mid.individual.net
>
> But it just gives me a pointer to the entire thread the message appeared
> in, apparently?

Yes, but at least it gives you an anchor to the message in
question, in this case:

http://groups.google.com/group/rec.arts.sf.written/browse_thread/thread/d00e330e9c82797a/eeb018dcf3c1688e?q=#eeb018dcf3c1688e

which refers to the

<a name="eeb018dcf3c1688e" id="anchor_eeb018dcf3c1688e"></a>

in the sources, which is where his message begins. I'm not thrilled,
either, but I guess that's all we'll get from them.

-David
* Re: Does nnweb with Google work any more?
From: Andreas Schwab @ 2012-03-10 16:07 UTC
To: Lars Magne Ingebrigtsen; +Cc: ding

David Engster <deng@randomsample.de> writes:

> Yes, but at least it gives you an anchor to the message in
> question, in this case:
>
> http://groups.google.com/group/rec.arts.sf.written/browse_thread/thread/d00e330e9c82797a/eeb018dcf3c1688e?q=#eeb018dcf3c1688e
>
> which refers to the
>
> <a name="eeb018dcf3c1688e" id="anchor_eeb018dcf3c1688e"></a>
>
> in the sources, which is where his message begins.

You can then get the original article with
<http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
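The rewrite Andreas describes, from the browse_thread anchor URL to the per-message `?dmode=source` URL, is mechanical enough to sketch in Bash. The helper name `thread_to_source_url` is hypothetical, and the regular expression only covers the URL shape quoted in this thread; other layouts are rejected rather than guessed at.

```bash
#!/usr/bin/env bash
# Hypothetical helper, a sketch only: turn a 2012-style Google Groups
# browse_thread URL into the per-message "?dmode=source" URL.
thread_to_source_url() {
  local url=$1
  # Capture 1: group name; capture 2: the message hash after the thread hash.
  local re='^https?://groups\.google\.com/group/([^/]+)/browse_thread/thread/[0-9a-f]+/([0-9a-f]+)'
  if [[ $url =~ $re ]]; then
    printf 'http://groups.google.com/group/%s/msg/%s?dmode=source\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    echo "unrecognised thread URL: $url" >&2
    return 1
  fi
}
```

Given David's thread URL, it prints exactly the source URL Andreas gives above.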
* Re: Does nnweb with Google work any more?
From: David Engster @ 2012-03-10 17:04 UTC
To: Andreas Schwab; +Cc: Lars Magne Ingebrigtsen, ding

Andreas Schwab writes:

> David Engster <deng@randomsample.de> writes:
>
>> Yes, but at least it gives you an anchor to the message in
>> question, in this case:
>>
>> http://groups.google.com/group/rec.arts.sf.written/browse_thread/thread/d00e330e9c82797a/eeb018dcf3c1688e?q=#eeb018dcf3c1688e
>>
>> which refers to the
>>
>> <a name="eeb018dcf3c1688e" id="anchor_eeb018dcf3c1688e"></a>
>>
>> in the sources, which is where his message begins.
>
> You can then get the original article with
> <http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>

Yep, thanks. I should've looked at the options panel...

-David
* Re: Does nnweb with Google work any more?
From: Lars Magne Ingebrigtsen @ 2012-03-14 15:09 UTC
To: Andreas Schwab; +Cc: ding

Andreas Schwab <schwab@linux-m68k.org> writes:

> You can then get the original article with
> <http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>

Using curl on that URL just gives me:

<p>Your client does not have permission to get URL <code>/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source</code> from this server.

So I guess one would have to emulate the Google sign-on/cookie thing to
get this to work.

Google sucks these days.

But what to do with nnweb? Remove the Google stuff? Replace it with
the howardk stuff?
* Re: Does nnweb with Google work any more?
From: David Engster @ 2012-03-14 15:24 UTC
To: Lars Magne Ingebrigtsen; +Cc: Andreas Schwab, ding

Lars Magne Ingebrigtsen writes:

> Andreas Schwab <schwab@linux-m68k.org> writes:
>
>> You can then get the original article with
>> <http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>
>
> Using curl on that URL just gives me:
>
> <p>Your client does not have permission to get URL <code>/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source</code> from this server.
>
> So I guess one would have to emulate the Google sign-on/cookie thing to
> get this to work.

No, they just don't want to be crawled. A simple "-A foobar" will make
it work. Also, adding "&output=gplain" will give raw text.

-David
* Re: Does nnweb with Google work any more?
From: Lars Magne Ingebrigtsen @ 2012-03-14 15:28 UTC
To: ding

David Engster <deng@randomsample.de> writes:

> No, they just don't want to be crawled. A simple "-A foobar" will make
> it work. Also, adding "&output=gplain" will give raw text.

Oh, nice. :-)

curl -A foobar 'http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source&output=gplain'

works fine.

Then the only question is how to get from the Message-ID to the Google
ID. Let's see... the first URL had this snippet in the HTML:

Michael Stemper wrote: In article<9rt27vF38...@mid.individual.net>, <b>...</b></span><br><span class="a">http://groups.google.com/g/0897fef7/t/d00e330e9c82797a/d/eeb018dcf3c1688e</span>

Will there only be one of these URLs in the output?
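The working recipe in the message above (a non-default User-Agent plus `&output=gplain`) could be wrapped as a pair of Bash functions. Both function names and the `GG_USER_AGENT` override variable are hypothetical; `foobar` is simply the agent string from Lars's command, and nothing here is known about which agents Google actually filters.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, a sketch only.

# Append "&output=gplain" (David's tip) to a "?dmode=source" message URL.
raw_article_url() {
  printf '%s&output=gplain\n' "$1"
}

# Fetch the raw article with a non-default User-Agent; GG_USER_AGENT is a
# hypothetical override, defaulting to the agent used in the curl command.
fetch_raw_article() {
  local ua=${GG_USER_AGENT:-foobar}
  curl -s -A "$ua" "$(raw_article_url "$1")"
}
```

Keeping the URL construction separate from the network call means the string handling can be checked offline; only `fetch_raw_article` talks to the server.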
* Re: Does nnweb with Google work any more?
From: David Engster @ 2012-03-14 17:06 UTC
To: ding

Lars Magne Ingebrigtsen writes:

> David Engster <deng@randomsample.de> writes:
>
>> No, they just don't want to be crawled. A simple "-A foobar" will make
>> it work. Also, adding "&output=gplain" will give raw text.
>
> Oh, nice. :-)
>
> curl -A foobar 'http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source&output=gplain'
>
> works fine.
>
> Then the only question is how to get from the Message-ID to the Google
> ID. Let's see... the first URL had this snippet in the HTML:
>
> Michael Stemper wrote: In article<9rt27vF38...@mid.individual.net>, <b>...</b></span><br><span class="a">http://groups.google.com/g/0897fef7/t/d00e330e9c82797a/d/eeb018dcf3c1688e</span>
>
> Will there only be one of these URLs in the output?

No idea. Maybe it would be safer to snarf the q=#eeb018... anchor from
the title's target:

<div class="g" align=left><a href="http://www.google.com/url?url=http://groups.google.com/g/0897fef7/t/d00e330e9c82797a/d/eeb018dcf3c1688e%3Fq%3D%23eeb018dcf3c1688e&ei=WM1gT-vtOdTw_Aah-IjTDg&sa=t&ct=res&cd=1&source=groups&usg=AFQjCNGv6gXQ4vTjK4dTlhZQfwmCOamYKw" target="" dir=ltr>

Anyway, it's a pretty boring scavenger hunt.

-David
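David's suggestion of taking the message hash from the URL-encoded `%3Fq%3D%23` anchor in the result link, rather than from the plain-text URL snippet, might look like this in Bash. It is a sketch under assumptions: the name `first_result_hash` is hypothetical, the hash is assumed to be lowercase hex, and the first matching anchor on the page is assumed to be the requested message, which is exactly the open question in the thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper, a sketch only: read search-result HTML on stdin and
# print the hash from the first "...%3Fq%3D%23<hash>" anchor in a link.
first_result_hash() {
  local m
  # grep -o prints each match on its own line; keep only the first one.
  m=$(grep -oE '%3Fq%3D%23[0-9a-f]+' | head -n 1)
  [ -n "$m" ] || return 1          # no anchor found
  printf '%s\n' "${m#%3Fq%3D%23}"  # strip the encoded "?q=#" prefix
}
```

Fed the `<div class="g" ...>` line quoted above, it prints `eeb018dcf3c1688e`, the same hash that appears in Andreas's source URL.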
* Re: Does nnweb with Google work any more?
From: Lars Magne Ingebrigtsen @ 2012-03-22 20:40 UTC
To: ding

David Engster <deng@randomsample.de> writes:

> Anyway, it's a pretty boring scavenger hunt.

Yes, I feel the ennui washing over me every time I consider fixing up
this...

If anybody else feels like fixing up the Google nnweb implementation...
please do! :-)
* Re: Does nnweb with Google work any more?
From: James Cloos @ 2012-03-14 15:33 UTC
To: Lars Magne Ingebrigtsen; +Cc: Andreas Schwab, ding

>>>>> "LMI" == Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

LMI> Andreas Schwab <schwab@linux-m68k.org> writes:
>> You can then get the original article with
>> <http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>

LMI> Using curl on that URL just gives me:

LMI> <p>Your client does not have permission to get URL <code>/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source</code> from this server.

Adding --user-agent Mozilla/5.0 works.

They must filter some specific user agents.

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6
* Re: Does nnweb with Google work any more?
From: Andreas Schwab @ 2012-03-14 16:40 UTC
To: Lars Magne Ingebrigtsen; +Cc: ding

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Andreas Schwab <schwab@linux-m68k.org> writes:
>
>> You can then get the original article with
>> <http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>
>
> Using curl on that URL just gives me:
>
> <p>Your client does not have permission to get URL <code>/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source</code> from this server.
>
> So I guess one would have to emulate the Google sign-on/cookie thing to
> get this to work.

You just need the right User-Agent. :-)

Andreas.
Thread overview: 15+ messages
2012-02-25  0:11 Does nnweb with Google work any more? Lars Magne Ingebrigtsen
2012-02-25 16:44 ` Reiner Steib
2012-03-10  0:55   ` Lars Magne Ingebrigtsen
2012-03-10 11:24     ` David Engster
2012-03-10 12:11       ` Lars Magne Ingebrigtsen
2012-03-10 13:23         ` David Engster
2012-03-10 16:07           ` Andreas Schwab
2012-03-10 17:04             ` David Engster
2012-03-14 15:09             ` Lars Magne Ingebrigtsen
2012-03-14 15:24               ` David Engster
2012-03-14 15:28                 ` Lars Magne Ingebrigtsen
2012-03-14 17:06                   ` David Engster
2012-03-22 20:40                     ` Lars Magne Ingebrigtsen
2012-03-14 15:33               ` James Cloos
2012-03-14 16:40               ` Andreas Schwab