* Does nnweb with Google work any more?
From: Lars Magne Ingebrigtsen @ 2012-02-25 0:11 UTC
To: ding
Has the Google Groups thing where you could request stuff from
http://www.google.com/groups?as_umsgid=%s&hl=en&dmode=source
totally gone away now? I can't get it to work, at least...
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
* Re: Does nnweb with Google work any more?
From: Reiner Steib @ 2012-02-25 16:44 UTC
To: ding
[-- Attachment #1: Type: text/plain, Size: 623 bytes --]
On Sat, Feb 25 2012, Lars Magne Ingebrigtsen wrote:
> Has the Google Groups thing where you could request stuff from
>
> http://www.google.com/groups?as_umsgid=%s&hl=en&dmode=source
>
> totally gone away now? I can't get it to work, at least...
I have a modified version of nnweb.el which supports MID searching via
http://howardk.freenix.org/ (which now redirects to
http://al.howardknight.net/).
The code was last modified and tested in 2010, so you might need to
adjust it. And I am not sure if all hunks in the diff are relevant,
because I didn't have enough spare time to bring my diffs from the old
CVS to git.
[-- Attachment #2: rs-nnweb-howardknight.patch --]
[-- Type: text/x-diff, Size: 1917 bytes --]
--- nnweb.el 2012-01-07 19:25:08.000000000 +0100
+++ nnweb.el 2010-01-16 13:31:22.000000000 +0100
@@ -70,6 +71,11 @@
(address . "http://groups.google.com/groups")
(base . "http://groups.google.com")
(identifier . nnweb-google-identity))
+ (howardk
+ (id . "http://howardk.freenix.org/msgid.cgi?STYPE=msgid&MSGI=<%s>&GOOGLE=on")
+ (article . nnweb-howardk-wash-article)
+ (reference . identity)
+ (identifier . nnweb-howardk-identity))
(gmane
(article . nnweb-gmane-wash-article)
(id . "http://gmane.org/view.php?group=%s")
@@ -296,7 +308,27 @@
;;; groups.google.com
;;;
+;; Updated for Google's changed interface 2008-11
(defun nnweb-google-wash-article ()
+ (let ((case-fold-search t) url)
+ (goto-char (point-min))
+ (if (or (re-search-forward "The requested message.*could not be found."
+ nil t)
+ (re-search-forward
+ (concat "href=\"\\(/group/[^/]+/msg/[[:alnum:]]+"
+ "\\?dmode=source\\)\">Show original</a>") nil t))
+ (setq url (format "%s%s&output=gplain"
+ (nnweb-definition 'base) (match-string 1)))
+ (gnus-message 3 "Requested article not found"))
+ (gnus-message 9 "URL: %s" url)
+ (erase-buffer)
+ (mm-with-unibyte-current-buffer
+ (mm-url-insert-file-contents url))
+ (unless (re-search-forward "^Message-ID:" nil t)
+ (gnus-message 3 "Requested article not found")
+ (erase-buffer))))
+
+(defun nnweb-howardk-wash-article ()
;; We have Google's masked e-mail addresses here. :-/
(let ((case-fold-search t)
(start-re "<pre>[\r\n ]*")
@@ -305,6 +337,7 @@
(if (save-excursion
(or (re-search-forward "The requested message.*could not be found."
nil t)
+ (re-search-forward "Couldn't find article" nil t)
(not (and (re-search-forward start-re nil t)
(re-search-forward end-re nil t)))))
;; FIXME: Don't know how to indicate "not found".
[-- Attachment #3: Type: text/plain, Size: 114 bytes --]
Bye, Reiner.
--
,,,
(o o)
---ooO-(_)-Ooo--- | PGP key available | http://rsteib.home.pages.de/
* Re: Does nnweb with Google work any more?
From: Lars Magne Ingebrigtsen @ 2012-03-10 0:55 UTC
To: ding
Reiner Steib <reinersteib+gmane@imap.cc> writes:
> I have a modified version of nnweb.el which supports MID searching via
> http://howardk.freenix.org/ (which now redirects to
> http://al.howardknight.net/).
Interesting. So this means that Google still allows looking stuff up by
Message-ID, somehow? Or does al.howardknight.net somehow have a
back door into Google's index?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
* Re: Does nnweb with Google work any more?
From: David Engster @ 2012-03-10 11:24 UTC
To: Lars Magne Ingebrigtsen; +Cc: ding
Lars Magne Ingebrigtsen writes:
> Reiner Steib <reinersteib+gmane@imap.cc> writes:
>
>> I have a modified version of nnweb.el which supports MID searching via
>> http://howardk.freenix.org/ (which now redirects to
>> http://al.howardknight.net/).
>
> Interesting. So this means that Google still allows looking stuff up by
> Message-ID, somehow?
It seems it just has slightly changed to
http://groups.google.com/groups/search?as_umsgid=
(You have to provide the message-id without angle brackets.)
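A minimal Emacs Lisp sketch of building that lookup URL, assuming the `groups/search?as_umsgid=` endpoint quoted above and that the bare ID must be percent-encoded (so `@` becomes `%40`). The helper name `my-nnweb-google-msgid-url` is hypothetical and not part of nnweb.el:

```elisp
(require 'url-util)   ; for `url-hexify-string'

;; Hypothetical helper, not part of nnweb.el.
(defun my-nnweb-google-msgid-url (message-id)
  "Return a Google Groups search URL for MESSAGE-ID.
Surrounding angle brackets are dropped and the rest is
percent-encoded, so \"@\" becomes \"%40\"."
  (let ((id (if (string-match "\\`<\\(.*\\)>\\'" message-id)
                (match-string 1 message-id)
              message-id)))
    (concat "http://groups.google.com/groups/search?as_umsgid="
            (url-hexify-string id))))
```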
-David
* Re: Does nnweb with Google work any more?
From: Lars Magne Ingebrigtsen @ 2012-03-10 12:11 UTC
To: ding
David Engster <deng@randomsample.de> writes:
> It seems it just has slightly changed to
>
> http://groups.google.com/groups/search?as_umsgid=
It doesn't seem to give me the message in question. For instance, this
should return Robert Bannister's message:
http://groups.google.com/groups/search?as_umsgid=9rvfhjFqp7U2%40mid.individual.net
But it just gives me a pointer to the entire thread the message appeared
in, apparently?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
* Re: Does nnweb with Google work any more?
From: David Engster @ 2012-03-10 13:23 UTC
To: Lars Magne Ingebrigtsen; +Cc: ding
Lars Magne Ingebrigtsen writes:
> David Engster <deng@randomsample.de> writes:
>
>> It seems it just has slightly changed to
>>
>> http://groups.google.com/groups/search?as_umsgid=
>
> It doesn't seem to give me the message in question. For instance, this
> should return Robert Bannister's message:
>
> http://groups.google.com/groups/search?as_umsgid=9rvfhjFqp7U2%40mid.individual.net
>
> But it just gives me a pointer to the entire thread the message appeared
> in, apparently?
Yes, but at least it gives you an anchor to the message in
question, in this case:
http://groups.google.com/group/rec.arts.sf.written/browse_thread/thread/d00e330e9c82797a/eeb018dcf3c1688e?q=#eeb018dcf3c1688e
which refers to the
<a name="eeb018dcf3c1688e" id="anchor_eeb018dcf3c1688e"></a>
in the sources, which is where his message begins.
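Once the thread page has been fetched into a buffer, the start of the wanted message can be found by searching for that anchor. A sketch, assuming the anchor markup looks exactly like the snippet above; `my-nnweb-google-goto-anchor` is a hypothetical name:

```elisp
;; Hypothetical helper: find where one message starts on a thread page.
(defun my-nnweb-google-goto-anchor (message-id)
  "Move point to the <a name=\"MESSAGE-ID\"> anchor in the current buffer.
Return the anchor's position, or nil (leaving point at `point-min')
if the page has no such anchor."
  (goto-char (point-min))
  (when (re-search-forward
         (concat "<a name=\"" (regexp-quote message-id) "\"") nil t)
    ;; Leave point at the start of the anchor tag, not after it.
    (goto-char (match-beginning 0))))
```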
I'm not thrilled, either, but I guess that's all we'll get from them.
-David
* Re: Does nnweb with Google work any more?
From: Andreas Schwab @ 2012-03-10 16:07 UTC
To: Lars Magne Ingebrigtsen; +Cc: ding
David Engster <deng@randomsample.de> writes:
> Yes, but at least it gives you an anchor to the message in
> question, in this case:
>
> http://groups.google.com/group/rec.arts.sf.written/browse_thread/thread/d00e330e9c82797a/eeb018dcf3c1688e?q=#eeb018dcf3c1688e
>
> which refers to the
>
> <a name="eeb018dcf3c1688e" id="anchor_eeb018dcf3c1688e"></a>
>
> in the sources, which is where his message begins.
You can then get the original article with
<http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>
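That source URL can be derived mechanically from the thread URL: keep the group and put the anchored message ID after `/msg/`. A sketch, assuming the two URL layouts quoted in this thread; `my-nnweb-google-source-url` is a hypothetical name:

```elisp
;; Hypothetical helper: browse_thread URL -> raw-source URL.
(defun my-nnweb-google-source-url (thread-url)
  "Return the dmode=source URL for the message anchored in THREAD-URL.
THREAD-URL is assumed to look like
  .../group/GROUP/browse_thread/thread/THREAD-ID/MESSAGE-ID...
Return nil if it does not."
  (when (string-match
         (concat "/group/\\([^/]+\\)/browse_thread/thread/"
                 "[[:xdigit:]]+/\\([[:xdigit:]]+\\)")
         thread-url)
    (format "http://groups.google.com/group/%s/msg/%s?dmode=source"
            (match-string 1 thread-url)
            (match-string 2 thread-url))))
```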
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: Does nnweb with Google work any more?
From: David Engster @ 2012-03-10 17:04 UTC
To: Andreas Schwab; +Cc: Lars Magne Ingebrigtsen, ding
Andreas Schwab writes:
> David Engster <deng@randomsample.de> writes:
>
>> Yes, but at least it gives you an anchor to the message in
>> question, in this case:
>>
>> http://groups.google.com/group/rec.arts.sf.written/browse_thread/thread/d00e330e9c82797a/eeb018dcf3c1688e?q=#eeb018dcf3c1688e
>>
>> which refers to the
>>
>> <a name="eeb018dcf3c1688e" id="anchor_eeb018dcf3c1688e"></a>
>>
>> in the sources, which is where his message begins.
>
> You can then get the original article with
> <http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>
Yep, thanks. I should've looked at the options panel...
-David
* Re: Does nnweb with Google work any more?
From: Lars Magne Ingebrigtsen @ 2012-03-14 15:09 UTC
To: Andreas Schwab; +Cc: ding
Andreas Schwab <schwab@linux-m68k.org> writes:
> You can then get the original article with
> <http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>
Using curl on that URL just gives me:
<p>Your client does not have permission to get URL <code>/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source</code> from this server.
So I guess one would have to emulate the Google sign-on/cookie thing to
get this to work.
Google sucks these days.
But what to do with nnweb? Remove the Google stuff? Replace it with
the howardk stuff?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
* Re: Does nnweb with Google work any more?
From: David Engster @ 2012-03-14 15:24 UTC
To: Lars Magne Ingebrigtsen; +Cc: Andreas Schwab, ding
Lars Magne Ingebrigtsen writes:
> Andreas Schwab <schwab@linux-m68k.org> writes:
>
>> You can then get the original article with
>> <http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>
>
> Using curl on that URL just gives me:
>
> <p>Your client does not have permission to get URL <code>/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source</code> from this server.
>
> So I guess one would have to emulate the Google sign-on/cookie thing to
> get this to work.
No, they just don't want to be crawled. A simple "-A foobar" will make
it work. Also, adding "&output=gplain" will give raw text.
-David
* Re: Does nnweb with Google work any more?
From: Lars Magne Ingebrigtsen @ 2012-03-14 15:28 UTC
To: ding
David Engster <deng@randomsample.de> writes:
> No, they just don't want to be crawled. A simple "-A foobar" will make
> it work. Also, adding "&output=gplain" will give raw text.
Oh, nice. :-)
curl -A foobar 'http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source&output=gplain'
works fine.
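From inside Emacs, the same request could be made by shelling out to curl with a non-default User-Agent and `&output=gplain` appended. A sketch only: both helper names are hypothetical, it assumes curl is on `exec-path`, and it assumes the URL already carries a query string such as `?dmode=source`:

```elisp
;; Hypothetical helpers, not part of nnweb.el or mm-url.el.
(defun my-nnweb-google-curl-args (source-url &optional user-agent)
  "Return curl arguments for fetching SOURCE-URL as plain text.
SOURCE-URL must already contain a query string (e.g. ?dmode=source),
since \"&output=gplain\" is simply appended to it.  USER-AGENT
defaults to \"foobar\"; any non-blocked agent should do."
  (list "-s" "-A" (or user-agent "foobar")
        (concat source-url "&output=gplain")))

(defun my-nnweb-google-fetch-raw (source-url &optional user-agent)
  "Insert the raw article at SOURCE-URL into the current buffer.
Return curl's exit status."
  (apply #'call-process "curl" nil t nil
         (my-nnweb-google-curl-args source-url user-agent)))
```

Only the argument builder is pure; `my-nnweb-google-fetch-raw` needs network access, e.g. `(with-temp-buffer (my-nnweb-google-fetch-raw url) (buffer-string))`.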
Then the only question is how to get from the Message-ID to the Google
ID. Let's see... the first URL had this snippet in the HTML:
Michael Stemper wrote: In article<9rt27vF38...@mid.individual.net>, <b>...</b></span><br><span class="a">http://groups.google.com/g/0897fef7/t/d00e330e9c82797a/d/eeb018dcf3c1688e</span>
Will there only be one of these URLs in the output?
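One way to check is to collect every such URL on a result page and count the distinct pairs. A sketch, assuming the `groups.google.com/g/.../t/THREAD/d/MESSAGE` layout from the snippet; the meaning of the first path segment is unknown here, so it is matched loosely, and `my-nnweb-google-result-ids` is a hypothetical name:

```elisp
;; Hypothetical helper: list (THREAD-ID . MESSAGE-ID) pairs in a result page.
(defun my-nnweb-google-result-ids (html)
  "Return the distinct (THREAD-ID . MESSAGE-ID) pairs found in HTML, in order."
  (let ((start 0) ids)
    (while (string-match
            (concat "groups\\.google\\.com/g/[^/]+/t/"
                    "\\([[:xdigit:]]+\\)/d/\\([[:xdigit:]]+\\)")
            html start)
      (push (cons (match-string 1 html) (match-string 2 html)) ids)
      (setq start (match-end 0)))
    ;; The same URL can appear both as text and inside a link.
    (delete-dups (nreverse ids))))
```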
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
* Re: Does nnweb with Google work any more?
From: James Cloos @ 2012-03-14 15:33 UTC
To: Lars Magne Ingebrigtsen; +Cc: Andreas Schwab, ding
>>>>> "LMI" == Lars Magne Ingebrigtsen <larsi@gnus.org> writes:
LMI> Andreas Schwab <schwab@linux-m68k.org> writes:
>> You can then get the original article with
>> <http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>
LMI> Using curl on that URL just gives me:
LMI> <p>Your client does not have permission to get URL <code>/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source</code> from this server.
Adding --user-agent Mozilla/5.0 works.
They must filter some specific user agents.
-JimC
--
James Cloos <cloos@jhcloos.com> OpenPGP: 1024D/ED7DAEA6
* Re: Does nnweb with Google work any more?
From: Andreas Schwab @ 2012-03-14 16:40 UTC
To: Lars Magne Ingebrigtsen; +Cc: ding
Lars Magne Ingebrigtsen <larsi@gnus.org> writes:
> Andreas Schwab <schwab@linux-m68k.org> writes:
>
>> You can then get the original article with
>> <http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>
>
> Using curl on that URL just gives me:
>
> <p>Your client does not have permission to get URL <code>/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source</code> from this server.
>
> So I guess one would have to emulate the Google sign-on/cookie thing to
> get this to work.
You just need the right User-Agent. :-)
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: Does nnweb with Google work any more?
From: David Engster @ 2012-03-14 17:06 UTC
To: ding
Lars Magne Ingebrigtsen writes:
> David Engster <deng@randomsample.de> writes:
>
>> No, they just don't want to be crawled. A simple "-A foobar" will make
>> it work. Also, adding "&output=gplain" will give raw text.
>
> Oh, nice. :-)
>
> curl -A foobar 'http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source&output=gplain'
>
> works fine.
>
> Then the only question is how to get from the Message-ID to the Google
> ID. Let's see... the first URL had this snippet in the HTML:
>
> Michael Stemper wrote: In article<9rt27vF38...@mid.individual.net>, <b>...</b></span><br><span class="a">http://groups.google.com/g/0897fef7/t/d00e330e9c82797a/d/eeb018dcf3c1688e</span>
>
> Will there only be one of these URLs in the output?
No idea. Maybe it would be safer to snarf the q=#eeb018... anchor from the
title's target:
<div class="g" align=left><a href="http://www.google.com/url?url=http://groups.google.com/g/0897fef7/t/d00e330e9c82797a/d/eeb018dcf3c1688e%3Fq%3D%23eeb018dcf3c1688e&ei=WM1gT-vtOdTw_Aah-IjTDg&sa=t&ct=res&cd=1&source=groups&usg=AFQjCNGv6gXQ4vTjK4dTlhZQfwmCOamYKw"
target="" dir=ltr>
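Snarfing the anchor from that redirect link means percent-decoding it first, since the `?q=#...` part appears there as `%3Fq%3D%23...`. A sketch using `url-unhex-string` from url-util, assuming the link looks like the one above; `my-nnweb-google-anchor-from-href` is a hypothetical name:

```elisp
(require 'url-util)   ; for `url-unhex-string'

;; Hypothetical helper: pull the message anchor out of a result link.
(defun my-nnweb-google-anchor-from-href (href)
  "Return the hex message anchor encoded in the Google redirect HREF, or nil."
  (let ((decoded (url-unhex-string href)))
    ;; After decoding, "%3Fq%3D%23ID" reads "?q=#ID".
    (when (string-match "\\?q=#\\([[:xdigit:]]+\\)" decoded)
      (match-string 1 decoded))))
```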
Anyway, it's a pretty boring scavenger hunt.
-David
* Re: Does nnweb with Google work any more?
From: Lars Magne Ingebrigtsen @ 2012-03-22 20:40 UTC
To: ding
David Engster <deng@randomsample.de> writes:
> Anyway, it's a pretty boring scavenger hunt.
Yes, I feel the ennui washing over me every time I consider fixing up
this... If anybody else feels like fixing up the Google nnweb
implementation... please do! :-)
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/