Gnus development mailing list
* Does nnweb with Google work any more?
@ 2012-02-25  0:11 Lars Magne Ingebrigtsen
  2012-02-25 16:44 ` Reiner Steib
  0 siblings, 1 reply; 15+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-02-25  0:11 UTC (permalink / raw)
  To: ding

Has the Google Groups thing where you could request stuff from

http://www.google.com/groups?as_umsgid=%s&hl=en&dmode=source

totally gone away now?  I can't get it to work, at least...

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/





* Re: Does nnweb with Google work any more?
  2012-02-25  0:11 Does nnweb with Google work any more? Lars Magne Ingebrigtsen
@ 2012-02-25 16:44 ` Reiner Steib
  2012-03-10  0:55   ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 15+ messages in thread
From: Reiner Steib @ 2012-02-25 16:44 UTC (permalink / raw)
  To: ding

On Sat, Feb 25 2012, Lars Magne Ingebrigtsen wrote:

> Has the Google Groups thing where you could request stuff from
>
> http://www.google.com/groups?as_umsgid=%s&hl=en&dmode=source
>
> totally gone away now?  I can't get it to work, at least...

I have a modified version of nnweb.el which supports MID searching via
http://howardk.freenix.org/ (which now redirects to
http://al.howardknight.net/).

The code was last modified and tested in 2010, so you might need to
adjust it.  And I am not sure if all hunks in the diff are relevant,
because I didn't have enough spare time to bring my diffs from the old
cvs to git.


[-- Attachment #2: rs-nnweb-howardknight.patch --]
[-- Type: text/x-diff, Size: 1917 bytes --]

--- nnweb.el	2012-01-07 19:25:08.000000000 +0100
+++ nnweb.el	2010-01-16 13:31:22.000000000 +0100
@@ -70,6 +71,11 @@
      (address . "http://groups.google.com/groups")
      (base    . "http://groups.google.com")
      (identifier . nnweb-google-identity))
+    (howardk
+     (id . "http://howardk.freenix.org/msgid.cgi?STYPE=msgid&MSGI=<%s>&GOOGLE=on")
+     (article . nnweb-howardk-wash-article)
+     (reference . identity)
+     (identifier . nnweb-howardk-identity))
     (gmane
      (article . nnweb-gmane-wash-article)
      (id . "http://gmane.org/view.php?group=%s")
@@ -296,7 +308,27 @@
 ;;; groups.google.com
 ;;;
 
+;; Updated for Google's changed interface 2008-11
 (defun nnweb-google-wash-article ()
+  (let ((case-fold-search t) url)
+    (goto-char (point-min))
+    (if (or (re-search-forward "The requested message.*could not be found."
+			       nil t)
+	    (re-search-forward
+	     (concat "href=\"\\(/group/[^/]+/msg/[[:alnum:]]+"
+		     "\\?dmode=source\\)\">Show original</a>") nil t))
+	(setq url (format "%s%s&output=gplain"
+			  (nnweb-definition 'base) (match-string 1)))
+      (gnus-message 3 "Requested article not found"))
+    (gnus-message 9 "URL: %s" url)
+    (erase-buffer)
+    (mm-with-unibyte-current-buffer
+      (mm-url-insert-file-contents url))
+    (unless (re-search-forward "^Message-ID:")
+      (gnus-message 3 "Requested article not found")
+      (erase-buffer))))
+
+(defun nnweb-howardk-wash-article ()
   ;; We have Google's masked e-mail addresses here.  :-/
   (let ((case-fold-search t)
 	(start-re "<pre>[\r\n ]*")
@@ -305,6 +337,7 @@
     (if (save-excursion
 	  (or (re-search-forward "The requested message.*could not be found."
 				 nil t)
+	      (re-search-forward "Couldn't find article" nil t)
 	      (not (and (re-search-forward start-re nil t)
 			(re-search-forward end-re nil t)))))
 	;; FIXME: Don't know how to indicate "not found".


Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/


* Re: Does nnweb with Google work any more?
  2012-02-25 16:44 ` Reiner Steib
@ 2012-03-10  0:55   ` Lars Magne Ingebrigtsen
  2012-03-10 11:24     ` David Engster
  0 siblings, 1 reply; 15+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-03-10  0:55 UTC (permalink / raw)
  To: ding

Reiner Steib <reinersteib+gmane@imap.cc> writes:

> I have a modified version of nnweb.el which supports MID searching via
> http://howardk.freenix.org/ (which now redirects to
> http://al.howardknight.net/).

Interesting.  So this means that Google still allows looking stuff up by
Message-ID, somehow?  Or does al.howardknight.net somehow have a
back door into Google's index?

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/




* Re: Does nnweb with Google work any more?
  2012-03-10  0:55   ` Lars Magne Ingebrigtsen
@ 2012-03-10 11:24     ` David Engster
  2012-03-10 12:11       ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 15+ messages in thread
From: David Engster @ 2012-03-10 11:24 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: ding

Lars Magne Ingebrigtsen writes:
> Reiner Steib <reinersteib+gmane@imap.cc> writes:
>
>> I have a modified version of nnweb.el which supports MID searching via
>> http://howardk.freenix.org/ (which now redirects to
>> http://al.howardknight.net/).
>
> Interesting.  So this means that Google still allows looking stuff up by
> Message-ID, somehow?

It seems it just has slightly changed to

http://groups.google.com/groups/search?as_umsgid=

(You have to provide the Message-ID without angle brackets.)
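
A minimal sketch of building that search URL from a Message-ID, written in Python purely for illustration (nnweb itself is Emacs Lisp). The helper name `google_msgid_search_url` is hypothetical; only the base URL and the `as_umsgid` parameter come from this thread, and the endpoint may not behave this way any longer.

```python
from urllib.parse import quote

# Base URL as quoted in this thread (2012); treat it as historical.
SEARCH_BASE = "http://groups.google.com/groups/search?as_umsgid="


def google_msgid_search_url(message_id: str) -> str:
    """Return the search URL for a Message-ID (hypothetical helper)."""
    bare = message_id.strip()
    # The search wants the Message-ID without its surrounding angle brackets.
    if bare.startswith("<") and bare.endswith(">"):
        bare = bare[1:-1]
    # Percent-encode everything that is not URL-safe, including "@".
    return SEARCH_BASE + quote(bare, safe="")
```

Calling it with `"<9rvfhjFqp7U2@mid.individual.net>"` yields a URL ending in `as_umsgid=9rvfhjFqp7U2%40mid.individual.net`.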

-David




* Re: Does nnweb with Google work any more?
  2012-03-10 11:24     ` David Engster
@ 2012-03-10 12:11       ` Lars Magne Ingebrigtsen
  2012-03-10 13:23         ` David Engster
  0 siblings, 1 reply; 15+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-03-10 12:11 UTC (permalink / raw)
  To: ding

David Engster <deng@randomsample.de> writes:

> It seems it just has slightly changed to
>
> http://groups.google.com/groups/search?as_umsgid=

It doesn't seem to give me the message in question.  For instance, this
should return Robert Bannister's message:

http://groups.google.com/groups/search?as_umsgid=9rvfhjFqp7U2%40mid.individual.net

But it just gives me a pointer to the entire thread the message appeared
in, apparently?

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/




* Re: Does nnweb with Google work any more?
  2012-03-10 12:11       ` Lars Magne Ingebrigtsen
@ 2012-03-10 13:23         ` David Engster
  2012-03-10 16:07           ` Andreas Schwab
  0 siblings, 1 reply; 15+ messages in thread
From: David Engster @ 2012-03-10 13:23 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: ding

Lars Magne Ingebrigtsen writes:
> David Engster <deng@randomsample.de> writes:
>
>> It seems it just has slightly changed to
>>
>> http://groups.google.com/groups/search?as_umsgid=
>
> It doesn't seem to give me the message in question.  For instance, this
> should return Robert Bannister's message:
>
> http://groups.google.com/groups/search?as_umsgid=9rvfhjFqp7U2%40mid.individual.net
>
> But it just gives me a pointer to the entire thread the message appeared
> in, apparently?

Yes, but at least it gives you an anchor to the message in
question, in this case:

http://groups.google.com/group/rec.arts.sf.written/browse_thread/thread/d00e330e9c82797a/eeb018dcf3c1688e?q=#eeb018dcf3c1688e

which refers to the

<a name="eeb018dcf3c1688e" id="anchor_eeb018dcf3c1688e"></a>

in the sources, which is where his message begins.

I'm not thrilled, either, but I guess that's all we'll get from them.

-David




* Re: Does nnweb with Google work any more?
  2012-03-10 13:23         ` David Engster
@ 2012-03-10 16:07           ` Andreas Schwab
  2012-03-10 17:04             ` David Engster
  2012-03-14 15:09             ` Lars Magne Ingebrigtsen
  0 siblings, 2 replies; 15+ messages in thread
From: Andreas Schwab @ 2012-03-10 16:07 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: ding

David Engster <deng@randomsample.de> writes:

> Yes, but at least it gives you an anchor to the message in
> question, in this case:
>
> http://groups.google.com/group/rec.arts.sf.written/browse_thread/thread/d00e330e9c82797a/eeb018dcf3c1688e?q=#eeb018dcf3c1688e
>
> which refers to the
>
> <a name="eeb018dcf3c1688e" id="anchor_eeb018dcf3c1688e"></a>
>
> in the sources, which is where his message begins.

You can then get the original article with
<http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>
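
The mapping from the thread URL to the article URL can be sketched as follows, in Python for illustration only. The function name and the regular expression are assumptions derived from the two example URLs in this thread, not from any documented Google Groups URL scheme.

```python
import re
from typing import Optional

# Pattern inferred from the example thread URL quoted in this thread:
#   /group/<group>/browse_thread/thread/<thread-id>/<message-id>?q=#<message-id>
_THREAD_RE = re.compile(
    r"^(https?://groups\.google\.com)/group/([^/]+)/browse_thread/thread/"
    r"[0-9a-f]+/([0-9a-f]+)"
)


def source_url_from_thread_url(thread_url: str) -> Optional[str]:
    """Turn a browse_thread URL into the ?dmode=source article URL.

    Returns None when the URL does not look like the example form.
    """
    match = _THREAD_RE.match(thread_url)
    if match is None:
        return None
    base, group, google_id = match.groups()
    return f"{base}/group/{group}/msg/{google_id}?dmode=source"
```

Returning `None` instead of raising leaves it to the caller to report "article not found", since the page layout may differ from the one example seen here.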

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: Does nnweb with Google work any more?
  2012-03-10 16:07           ` Andreas Schwab
@ 2012-03-10 17:04             ` David Engster
  2012-03-14 15:09             ` Lars Magne Ingebrigtsen
  1 sibling, 0 replies; 15+ messages in thread
From: David Engster @ 2012-03-10 17:04 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Lars Magne Ingebrigtsen, ding

Andreas Schwab writes:
> David Engster <deng@randomsample.de> writes:
>
>> Yes, but at least it gives you an anchor to the message in
>> question, in this case:
>>
>> http://groups.google.com/group/rec.arts.sf.written/browse_thread/thread/d00e330e9c82797a/eeb018dcf3c1688e?q=#eeb018dcf3c1688e
>>
>> which refers to the
>>
>> <a name="eeb018dcf3c1688e" id="anchor_eeb018dcf3c1688e"></a>
>>
>> in the sources, which is where his message begins.
>
> You can then get the original article with
> <http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>

Yep, thanks. I should've looked at the options panel...

-David





* Re: Does nnweb with Google work any more?
  2012-03-10 16:07           ` Andreas Schwab
  2012-03-10 17:04             ` David Engster
@ 2012-03-14 15:09             ` Lars Magne Ingebrigtsen
  2012-03-14 15:24               ` David Engster
                                 ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-03-14 15:09 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: ding

Andreas Schwab <schwab@linux-m68k.org> writes:

> You can then get the original article with
> <http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>

Using curl on that URL just gives me:

<p>Your client does not have permission to get URL <code>/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source</code> from this server.

So I guess one would have to emulate the Google sign-on/cookie thing to
get this to work.

Google sucks these days.

But what to do with nnweb?  Remove the Google stuff?  Replace it with
the howardk stuff?

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/




* Re: Does nnweb with Google work any more?
  2012-03-14 15:09             ` Lars Magne Ingebrigtsen
@ 2012-03-14 15:24               ` David Engster
  2012-03-14 15:28                 ` Lars Magne Ingebrigtsen
  2012-03-14 15:33               ` James Cloos
  2012-03-14 16:40               ` Andreas Schwab
  2 siblings, 1 reply; 15+ messages in thread
From: David Engster @ 2012-03-14 15:24 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: Andreas Schwab, ding

Lars Magne Ingebrigtsen writes:
> Andreas Schwab <schwab@linux-m68k.org> writes:
>
>> You can then get the original article with
>> <http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>
>
> Using curl on that URL just gives me:
>
> <p>Your client does not have permission to get URL <code>/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source</code> from this server.
>
> So I guess one would have to emulate the Google sign-on/cookie thing to
> get this to work.

No, they just don't want to be crawled. A simple "-A foobar" will make
it work. Also, adding "&output=gplain" will give raw text.
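
For illustration, here is the same request sketched with Python's standard `urllib.request`: set an explicit User-Agent and append `output=gplain`. The helper names are hypothetical, `"foobar"` is simply the agent string from the curl example, and the 2012 endpoint may well no longer respond this way.

```python
import urllib.request


def build_source_request(url: str, user_agent: str = "foobar") -> urllib.request.Request:
    """Build (but do not send) a request for the raw article text."""
    # Append output=gplain unless the caller already supplied it.
    if "output=gplain" not in url:
        separator = "&" if "?" in url else "?"
        url = f"{url}{separator}output=gplain"
    # The server rejected the default client, so send our own User-Agent.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_source(url: str, user_agent: str = "foobar", timeout: float = 30.0) -> bytes:
    """Fetch the article as raw bytes; the caller decides how to decode them."""
    request = build_source_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Keeping request construction separate from the network call makes the URL and header handling easy to check without contacting the server.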

-David




* Re: Does nnweb with Google work any more?
  2012-03-14 15:24               ` David Engster
@ 2012-03-14 15:28                 ` Lars Magne Ingebrigtsen
  2012-03-14 17:06                   ` David Engster
  0 siblings, 1 reply; 15+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-03-14 15:28 UTC (permalink / raw)
  To: ding

David Engster <deng@randomsample.de> writes:

> No, they just don't want to be crawled. A simple "-A foobar" will make
> it work. Also, adding "&output=gplain" will give raw text.

Oh, nice.  :-)

curl -A foobar 'http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source&output=gplain'

works fine.

Then the only question is how to get from the Message-ID to the Google
ID.  Let's see...  the first URL had this snippet in the HTML:

Michael Stemper wrote: In article&lt;9rt27vF38...@mid.individual.net&gt;, <b>...</b></span><br><span class="a">http://groups.google.com/g/0897fef7/t/d00e330e9c82797a/d/eeb018dcf3c1688e</span>

Will there only be one of these URLs in the output?
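
One way to sidestep the uniqueness question is to collect every distinct candidate and let the caller decide. A rough Python sketch follows; the pattern is inferred from the single snippet above, and the function name is hypothetical.

```python
import re
from typing import List

# Short-form result URL as seen in the snippet:
#   http://groups.google.com/g/<group-hash>/t/<thread-id>/d/<message-id>
_SHORT_URL_RE = re.compile(
    r"http://groups\.google\.com/g/[0-9a-f]+/t/[0-9a-f]+/d/([0-9a-f]+)"
)


def candidate_google_ids(html_text: str) -> List[str]:
    """Return the distinct Google message IDs found, in order of appearance."""
    seen: List[str] = []
    for google_id in _SHORT_URL_RE.findall(html_text):
        if google_id not in seen:
            seen.append(google_id)
    return seen
```

A result of exactly one element is the unambiguous case; zero means "not found", and more than one means the page needs a closer look before trusting any of them.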

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/




* Re: Does nnweb with Google work any more?
  2012-03-14 15:09             ` Lars Magne Ingebrigtsen
  2012-03-14 15:24               ` David Engster
@ 2012-03-14 15:33               ` James Cloos
  2012-03-14 16:40               ` Andreas Schwab
  2 siblings, 0 replies; 15+ messages in thread
From: James Cloos @ 2012-03-14 15:33 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: Andreas Schwab, ding

>>>>> "LMI" == Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

LMI> Andreas Schwab <schwab@linux-m68k.org> writes:
>> You can then get the original article with
>> <http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>

LMI> Using curl on that URL just gives me:

LMI> <p>Your client does not have permission to get URL <code>/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source</code> from this server.

Adding --user-agent Mozilla/5.0 works.

They must filter some specific user agents.

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6




* Re: Does nnweb with Google work any more?
  2012-03-14 15:09             ` Lars Magne Ingebrigtsen
  2012-03-14 15:24               ` David Engster
  2012-03-14 15:33               ` James Cloos
@ 2012-03-14 16:40               ` Andreas Schwab
  2 siblings, 0 replies; 15+ messages in thread
From: Andreas Schwab @ 2012-03-14 16:40 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: ding

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Andreas Schwab <schwab@linux-m68k.org> writes:
>
>> You can then get the original article with
>> <http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source>
>
> Using curl on that URL just gives me:
>
> <p>Your client does not have permission to get URL <code>/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source</code> from this server.
>
> So I guess one would have to emulate the Google sign-on/cookie thing to
> get this to work.

You just need the right User-Agent. :-)

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: Does nnweb with Google work any more?
  2012-03-14 15:28                 ` Lars Magne Ingebrigtsen
@ 2012-03-14 17:06                   ` David Engster
  2012-03-22 20:40                     ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 15+ messages in thread
From: David Engster @ 2012-03-14 17:06 UTC (permalink / raw)
  To: ding

Lars Magne Ingebrigtsen writes:
> David Engster <deng@randomsample.de> writes:
>
>> No, they just don't want to be crawled. A simple "-A foobar" will make
>> it work. Also, adding "&output=gplain" will give raw text.
>
> Oh, nice.  :-)
>
> curl -A foobar 'http://groups.google.com/group/rec.arts.sf.written/msg/eeb018dcf3c1688e?dmode=source&output=gplain'
>
> works fine.
>
> Then the only question is how to get from the Message-ID to the Google
> ID.  Let's see...  the first URL had this snippet in the HTML:
>
> Michael Stemper wrote: In article&lt;9rt27vF38...@mid.individual.net&gt;, <b>...</b></span><br><span class="a">http://groups.google.com/g/0897fef7/t/d00e330e9c82797a/d/eeb018dcf3c1688e</span>
>
> Will there only be one of these URLs in the output?

No idea. Maybe it would be safer to snarf the q=#eeb018... anchor from the
title's target:

<div class="g" align=left><a href="http://www.google.com/url?url=http://groups.google.com/g/0897fef7/t/d00e330e9c82797a/d/eeb018dcf3c1688e%3Fq%3D%23eeb018dcf3c1688e&amp;ei=WM1gT-vtOdTw_Aah-IjTDg&amp;sa=t&amp;ct=res&amp;cd=1&amp;source=groups&amp;usg=AFQjCNGv6gXQ4vTjK4dTlhZQfwmCOamYKw"
target="" dir=ltr>
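
Pulling the anchor out of such an `href` could look roughly like this in Python (illustration only; the function name is hypothetical, and the layout of the redirect link is assumed from this one example):

```python
import html
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def anchor_from_title_href(href_attr: str) -> Optional[str]:
    """Extract the message anchor from a google.com/url?url=... redirect link."""
    # The attribute value is HTML-escaped (&amp;), so unescape it first.
    redirect = urlsplit(html.unescape(href_attr))
    # parse_qs percent-decodes the nested target URL in the "url" parameter.
    targets = parse_qs(redirect.query).get("url")
    if not targets:
        return None
    # After decoding, the target ends in "?q=#<anchor>"; the anchor is its fragment.
    fragment = urlsplit(targets[0]).fragment
    return fragment or None
```

For the `href` quoted above this returns `eeb018dcf3c1688e`, the same ID that appears as the last path component of the short URL.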

Anyway, it's a pretty boring scavenger hunt.

-David




* Re: Does nnweb with Google work any more?
  2012-03-14 17:06                   ` David Engster
@ 2012-03-22 20:40                     ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 15+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-03-22 20:40 UTC (permalink / raw)
  To: ding

David Engster <deng@randomsample.de> writes:

> Anyway, it's a pretty boring scavenger hunt.

Yes, I feel the ennui washing over me every time I consider fixing up
this...  If anybody else feels like fixing up the Google nnweb
implementation...  please do!  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/


