Gnus development mailing list
* Re: URIs wrapped in angle brackets are not extracted correctly
       [not found] <874phvi0ui.fsf@ID-24456.user.uni-berlin.de>
@ 2007-09-18 11:45 ` Katsumi Yamaoka
  2007-09-18 19:26   ` Christoph Conrad
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Katsumi Yamaoka @ 2007-09-18 11:45 UTC
  To: Christoph Conrad; +Cc: bugs, ding


(I added Cc: ding.)

>>>>> Christoph Conrad wrote:
> No Gnus v0.7
> GNU Emacs 23.0.50.1 (i686-pc-linux-gnu, GTK+ Version 2.10.13, multi-tty)
>  of 2007-09-03 on brabbelbox
> 200 news.gmane.org InterNetNews NNRP server INN 2.4.1 ready (posting ok).

> Hi,

> according to RFC 3986 Appendix C the following URI should be extracted
> correctly when embedded in angle brackets, containing a line break:

<http://www.faz.net/s/Rub560251485DC24AF181BBEF83E12CA16E/Doc~E990108D4B
D2D4C29A2BA6460BE3F10EC~ATpl~Ecommon~Scontent.html>

> This is not the case in current cvs-Gnus when pressing <RET>
> (widget-button-press). Only the first part of the URI before the line
> break is extracted.

There might be a reason neither `ffap-url-at-point' (ffap.el) nor
`thing-at-point' (thingatpt.el) works with such data.  I tried
making Gnus work (the patch is attached to this message).  However,
it does not yet work with cited URLs such as the following:

>>>>> Christoph Conrad wrote:
> <http://www.faz.net/s/Rub560251485DC24AF181BBEF83E12CA16E/Doc~E990108D4B
> D2D4C29A2BA6460BE3F10EC~ATpl~Ecommon~Scontent.html>

Katsumi> <http://www.jpl.org/
Katsumi> Katsumi>

I think it is hard to make it work with any type of citations.
Any idea?

> ,----
>| In practice, URIs are delimited in a variety of ways, but usually within
>| double-quotes "http://example.com/", angle brackets
>| <http://example.com/>, or just by using whitespace
>| [...]
>| In some cases, extra whitespace (spaces, line-breaks, tabs, etc.) may
>| have to be added to break a long URI across lines. The whitespace should
>| be ignored when the URI is extracted.
> `----
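
The whitespace rule quoted above can be sketched in Emacs Lisp.  This
is only an illustration: `my-unwrap-uri' is a hypothetical helper, not
part of Gnus, and it assumes the text between the angle brackets has
already been extracted as a string.  The regexp is the one the
attached patch passes to `gnus-replace-in-string'; the sketch uses the
standard `replace-regexp-in-string' instead.

```elisp
(defun my-unwrap-uri (string)
  "Return STRING with line breaks and the blanks around them removed.
Hypothetical helper for illustration; not part of Gnus."
  ;; One or more line breaks, each optionally preceded by blanks,
  ;; plus any blanks that start the continuation line.
  (replace-regexp-in-string "\\(?:[\t ]*\n\\)+[\t ]*" "" string t t))

;; Example: join the two halves of the wrapped URI from this thread.
(my-unwrap-uri
 (concat "http://www.faz.net/s/Rub560251485DC24AF181BBEF83E12CA16E/"
         "Doc~E990108D4B\n"
         "D2D4C29A2BA6460BE3F10EC~ATpl~Ecommon~Scontent.html"))
```

Blanks that are not adjacent to a line break are left alone, so this
covers only the wrapped-line case and not every kind of whitespace
the RFC mentions.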

> Best regards,
> Christoph


[-- Attachment #2: Patch to gnus-art.el --]
[-- Type: text/x-patch, Size: 2207 bytes --]

--- gnus-art.el~	2007-08-24 10:22:10 +0000
+++ gnus-art.el	2007-09-18 11:43:14 +0000
@@ -7357,9 +7357,24 @@
 	(setq regexp (eval (car entry)))
 	(goto-char beg)
 	(while (re-search-forward regexp nil t)
-	  (let* ((start (and entry (match-beginning (nth 1 entry))))
-		 (end (and entry (match-end (nth 1 entry))))
+	  (let* ((start (match-beginning (nth 1 entry)))
+		 (end (match-end (nth 1 entry)))
 		 (from (match-beginning 0)))
+	    (when (eq (car entry) 'gnus-button-url-regexp)
+	      (let ((to (match-end 0)))
+		(goto-char end)
+		(when (or (looking-at "[\t\n ]*>")
+			  (progn
+			    (goto-char start)
+			    (and (or (eq (char-before) ?<)
+				     (and (search-backward "<" nil t)
+					  (string-match
+					   "url:[\t\n ]*"
+					   (buffer-substring (match-end 0)
+							     start))))
+				 (re-search-forward "[\t\n ]*>" nil t))))
+		  (setq end (match-beginning 0)))
+		(goto-char to)))
 	    (when (and (or (eq t (nth 2 entry))
 			   (eval (nth 2 entry)))
 		       (not (gnus-button-in-region-p
@@ -7450,6 +7465,27 @@
       (if (looking-at (eval (car entry)))
 	  (setq alist nil)
 	(setq entry nil)))
+    (when (and entry
+	       (eq (car entry) 'gnus-button-url-regexp))
+      (let ((start (point))
+	    end md)
+	(goto-char (match-end (nth 1 entry)))
+	(when (save-match-data
+		(when (or (looking-at "[\t\n ]*>")
+			  (progn
+			    (goto-char start)
+			    (and (or (eq (char-before) ?<)
+				     (and (search-backward "<" nil t)
+					  (string-match
+					   "url:[\t\n ]*"
+					   (buffer-substring (match-end 0)
+							     start))))
+				 (re-search-forward "[\t\n ]*>" nil t))))
+		  (setq end (match-beginning 0))))
+	  (setq md (match-data))
+	  (setcar (nthcdr (1+ (* (nth 1 entry) 2)) md) end)
+	  (set-match-data md))
+	(goto-char start)))
     entry))
 
 (defun gnus-button-push (marker)
@@ -7460,7 +7496,9 @@
 	   (inhibit-point-motion-hooks t)
 	   (fun (nth 3 entry))
 	   (args (mapcar (lambda (group)
-			   (let ((string (match-string group)))
+			   (let ((string (gnus-replace-in-string
+					  (match-string group)
+					  "\\(?:[\t ]*\n\\)+[\t ]*" "")))
 			     (set-text-properties
 			      0 (length string) nil string)
 			     string))


* Re: URIs wrapped in angle brackets are not extracted correctly
  2007-09-18 11:45 ` URIs wrapped in angle brackets are not extracted correctly Katsumi Yamaoka
@ 2007-09-18 19:26   ` Christoph Conrad
  2007-09-18 21:03   ` Elias Oltmanns
  2007-09-29 10:39   ` Reiner Steib
  2 siblings, 0 replies; 7+ messages in thread
From: Christoph Conrad @ 2007-09-18 19:26 UTC
  To: Katsumi Yamaoka; +Cc: bugs, ding

Hi Katsumi,

> I tried making Gnus work (the patch is attached to this message).

That works, thank you very much!

> I think it is hard to make it work with any type of citations. Any
> idea?

I think some heuristic method would make that possible, but it could be
hard work to handle every kind of citation.
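
One such heuristic, sketched under stated assumptions: take the
citation prefix of the line on which the URI starts and expect the
same prefix on every continuation line.  `my-unwrap-cited-uri' and
its arguments are hypothetical names, not Gnus code, and the caller
is assumed to have determined PREFIX already.

```elisp
(defun my-unwrap-cited-uri (prefix text)
  "Remove each line break in TEXT that is followed by PREFIX.
PREFIX is the citation prefix of the line where the URI starts,
e.g. \"> \" or \"Katsumi> \".  Hypothetical helper; not Gnus code."
  (replace-regexp-in-string
   ;; Trailing blanks, the line break, the literal prefix, then
   ;; any blanks that follow the prefix.
   (concat "[\t ]*\n" (regexp-quote prefix) "[\t ]*")
   "" text t t))

;; Example: the cited URL from earlier in the thread.
(my-unwrap-cited-uri "Katsumi> " "http://www.jpl.org/\nKatsumi> Katsumi")
```

Passing the prefix explicitly avoids guessing how much of a
continuation line is citation and how much is URI, which is the
ambiguous part in the "Katsumi> Katsumi>" example.  The sketch leaves
the text unchanged when a continuation line carries a different
prefix.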

Best regards,
Christoph




* Re: URIs wrapped in angle brackets are not extracted correctly
  2007-09-18 11:45 ` URIs wrapped in angle brackets are not extracted correctly Katsumi Yamaoka
  2007-09-18 19:26   ` Christoph Conrad
@ 2007-09-18 21:03   ` Elias Oltmanns
  2007-09-29 10:39   ` Reiner Steib
  2 siblings, 0 replies; 7+ messages in thread
From: Elias Oltmanns @ 2007-09-18 21:03 UTC
  To: ding

Katsumi Yamaoka <yamaoka@jpl.org> wrote:
> (I added Cc: ding.)
>
>>>>>> Christoph Conrad wrote:
>> No Gnus v0.7
>> GNU Emacs 23.0.50.1 (i686-pc-linux-gnu, GTK+ Version 2.10.13, multi-tty)
>>  of 2007-09-03 on brabbelbox
>> 200 news.gmane.org InterNetNews NNRP server INN 2.4.1 ready (posting ok).
>
>> Hi,
>
>> according to RFC 3986 Appendix C the following URI should be extracted
>> correctly when embedded in angle brackets, containing a line break:
>
> <http://www.faz.net/s/Rub560251485DC24AF181BBEF83E12CA16E/Doc~E990108D4B
> D2D4C29A2BA6460BE3F10EC~ATpl~Ecommon~Scontent.html>
>
>> This is not the case in current cvs-Gnus when pressing <RET>
>> (widget-button-press). Only the first part of the URI before the line
>> break is extracted.
>
> There might be a reason neither `ffap-url-at-point' (ffap.el) nor
> `thing-at-point' (thingatpt.el) works with such data.  I tried
> making Gnus work (the patch is attached to this message).

May I point out another issue with the button logic in article
buffers in general?  gnus-article-add-buttons evaluates the form of
each gnus-button-alist entry, i.e., (nth 2 entry), to see whether that
entry is applicable, but gnus-button-push does not.  As a result,
buttons show up correctly in the article buffer, but if you press
enter on a button for which two entries in gnus-button-alist have a
matching regexp and the form of only one of them evaluates to
non-nil, gnus-button-push ignores the form and applies whichever of
the two entries it finds first.

I wonder whether the check for (nth 2 entry) should be moved to
gnus-button-entry or whether it should just be added to gnus-button-push
as well.
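
A rough sketch of the first option, to make the idea concrete.  Both
function names are hypothetical and this is not the actual
gnus-button-entry; the entry layout is assumed only as far as the
patch accesses it: (car entry) is the regexp or a symbol holding it,
and (nth 2 entry) is the form.  The applicability test copies the
condition visible in the patch context.

```elisp
(defun my-button-entry-applicable-p (entry)
  "Return non-nil if the form of ENTRY, (nth 2 entry), permits it."
  (or (eq t (nth 2 entry))
      (eval (nth 2 entry))))

(defun my-button-entry-at-point (alist)
  "Return the first applicable entry in ALIST matching at point.
An entry qualifies only if its regexp matches at point and its
form evaluates to non-nil.  Hypothetical; not the Gnus function."
  (let (found)
    (while (and alist (not found))
      (let ((entry (car alist)))
        ;; Check the regexp and the form together, so that a
        ;; disabled entry cannot shadow an enabled one.
        (when (and (looking-at (eval (car entry)))
                   (my-button-entry-applicable-p entry))
          (setq found entry)))
      (setq alist (cdr alist)))
    found))
```

With two entries whose regexps both match, the one whose form is nil
is skipped and the later, enabled entry is returned.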

Regards,

Elias





* Re: URIs wrapped in angle brackets are not extracted correctly
  2007-09-18 11:45 ` URIs wrapped in angle brackets are not extracted correctly Katsumi Yamaoka
  2007-09-18 19:26   ` Christoph Conrad
  2007-09-18 21:03   ` Elias Oltmanns
@ 2007-09-29 10:39   ` Reiner Steib
  2007-09-29 10:50     ` Christoph Conrad
  2007-09-30 23:55     ` Katsumi Yamaoka
  2 siblings, 2 replies; 7+ messages in thread
From: Reiner Steib @ 2007-09-29 10:39 UTC
  To: Katsumi Yamaoka; +Cc: Christoph Conrad, bugs, ding

On Tue, Sep 18 2007, Katsumi Yamaoka wrote:

>>>>>> Christoph Conrad wrote:
>> according to RFC 3986 Appendix C the following URI should be extracted
>> correctly when embedded in angle brackets, containing a line break:
[
<http://www.faz.net/s/Rub560251485DC24AF181BBEF83E12CA16E/Doc~E990108D4B
D2D4C29A2BA6460BE3F10EC~ATpl~Ecommon~Scontent.html>
]
> I tried making Gnus work (the patch is attached to this message).

I don't see this patch in CVS.  Is there any problem with it?

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/




* Re: URIs wrapped in angle brackets are not extracted correctly
  2007-09-29 10:39   ` Reiner Steib
@ 2007-09-29 10:50     ` Christoph Conrad
  2007-09-30 23:55     ` Katsumi Yamaoka
  1 sibling, 0 replies; 7+ messages in thread
From: Christoph Conrad @ 2007-09-29 10:50 UTC
  To: Katsumi Yamaoka; +Cc: bugs, ding

Hi Reiner,

> I don't see this patch in CVS.  Is there any problem with it?

Not for me - it works.

Best regards,
Christoph




* Re: URIs wrapped in angle brackets are not extracted correctly
  2007-09-29 10:39   ` Reiner Steib
  2007-09-29 10:50     ` Christoph Conrad
@ 2007-09-30 23:55     ` Katsumi Yamaoka
  2007-10-11 22:09       ` Katsumi Yamaoka
  1 sibling, 1 reply; 7+ messages in thread
From: Katsumi Yamaoka @ 2007-09-30 23:55 UTC
  To: ding; +Cc: Christoph Conrad, bugs

>>>>> Reiner Steib wrote:

> <http://www.faz.net/s/Rub560251485DC24AF181BBEF83E12CA16E/Doc~E990108D4B
> D2D4C29A2BA6460BE3F10EC~ATpl~Ecommon~Scontent.html>

> I don't see this patch in CVS.  Is there any problem with it?

That does not support wrapped and cited urls like the one above.
But I plan to improve it.  Hopefully it will also work with one
like this:

ky> <http://www.faz.net/s/Rub560251485DC24AF181BBEF83E12CA16E/Doc~E990108D4B
ky> D2D4C29A2BA6460BE3F10EC~ATpl~Ecommon~Scontent.html>

I'm going to do it in the near future (after the next emacs-w3m
release?).

Regards,




* Re: URIs wrapped in angle brackets are not extracted correctly
  2007-09-30 23:55     ` Katsumi Yamaoka
@ 2007-10-11 22:09       ` Katsumi Yamaoka
  0 siblings, 0 replies; 7+ messages in thread
From: Katsumi Yamaoka @ 2007-10-11 22:09 UTC
  To: ding; +Cc: Christoph Conrad, bugs

>>>>> Katsumi Yamaoka wrote:
>>>>>> Reiner Steib wrote:

>> <http://www.faz.net/s/Rub560251485DC24AF181BBEF83E12CA16E/Doc~E990108D4B
>> D2D4C29A2BA6460BE3F10EC~ATpl~Ecommon~Scontent.html>

>> I don't see this patch in CVS.  Is there any problem with it?

> That does not support wrapped and cited urls like the one above.

I've installed the improved version in the Gnus CVS trunk.  This
works with:

> Please visit <http://www.faz.net/s/Rub560251485DC24AF181BBEF83
> E12CA16E/Doc~E990108D4BD2D4C29A2BA6460BE3F10EC~ATpl~Ecommon~Sc
> ontent.html>.

ky> See "URL:http://www.faz.net/s/Rub560251485DC24AF181BBEF83E12
ky>     CA16E/Doc~E990108D4BD2
ky>         D4C29A2BA6460BE3F10EC~ATpl~Ecommon~Scontent.html"

but does not work with:

http://www.faz.net/s/Rub560251485DC24AF181BBEF83E12CA16E/Doc~E99
0108D4BD2D4C29A2BA6460BE3F10EC~ATpl~Ecommon~Scontent.html

> <http://www.faz.net/s/Rub560251485DC24AF181BBEF83E12CA16E/Doc~
>
> E990108D4BD2D4C29A2BA6460BE3F10EC~ATpl~Ecommon~Scontent.html>

ky> You may not see "URL:http://www.faz.net/s/Rub560251485DC24AF
yk>     181BBEF83E12CA16E/Doc~E990108D4BD2
ky>         D4C29A2BA6460BE3F10EC~ATpl~Ecommon~Scontent.html".

Regards,



Thread overview: 7+ messages
     [not found] <874phvi0ui.fsf@ID-24456.user.uni-berlin.de>
2007-09-18 11:45 ` URIs wrapped in angle brackets are not extracted correctly Katsumi Yamaoka
2007-09-18 19:26   ` Christoph Conrad
2007-09-18 21:03   ` Elias Oltmanns
2007-09-29 10:39   ` Reiner Steib
2007-09-29 10:50     ` Christoph Conrad
2007-09-30 23:55     ` Katsumi Yamaoka
2007-10-11 22:09       ` Katsumi Yamaoka
