Gnus development mailing list
* Fwd: nnrss and some (partially redundant) RSS feeds.
@ 2006-03-31 17:28 Micha
  2006-04-11 15:57 ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 4+ messages in thread
From: Micha @ 2006-03-31 17:28 UTC (permalink / raw)



[-- Attachment #1.1: Type: text/plain, Size: 3775 bytes --]


Some people told me to follow up on this message on the ding mailing
list, since « bugs » is flooded with spam. Thanks in advance.


Michael Cadilhac <michael.cadilhac@lrde.org> writes:

>   Hi !
>
>   I have some issues with an RSS feed that comes from Trac (a project
>   management tool).
>
> 1) The feed [1] has the following entries :
>
> //////////////////////////////////////////////////////////////////////
>
>    <item>
>     <title>Ticket #13 (defect) created by pouchet@lrde.epita.fr</title>
>     <author>pouchet@lrde.epita.fr</author>
>     <pubDate>Mon, 27 Mar 2006 10:00:52 GMT</pubDate>
>     <link>http://vaucanson.lrde.org/trac.cgi/ticket/13</link>
>     <description>Correction on homepage</description>
>    </item>
>
>    <item>
>     <title>Ticket #13 (defect) closed by cadilh_m</title>
>     <author>michael.cadilhac@lrde.org</author>
>     <pubDate>Mon, 27 Mar 2006 11:21:43 GMT</pubDate>
>     <link>http://vaucanson.lrde.org/trac.cgi/ticket/13</link>
>     <description>Fixed.</description>
>    </item>
>
> //////////////////////////////////////////////////////////////////////
>
>   As you can see, a ticket has been opened then closed.
>
> 2) The nnrss.el (I'm using CVS Gnus) code looks like:
>
>   Check if an item is already stored:
>
> 		 (if (setq url (nnrss-decode-entities-string
> 				(nnrss-node-text rss-ns 'link (cddr item))))
> 		     (not (gethash url nnrss-group-hashtb))
> 		   (setq extra (or (nnrss-node-text content-ns 'encoded item)
> 				   (nnrss-node-text rss-ns 'description item)))
> 		   (not (gethash extra nnrss-group-hashtb))))
>
>   Here, the hash table is keyed first by the URL and, as a fallback,
>   by the « encoded » or « description » field.
>
>
> 1 with 2)
>
>   Gosh  ! Both messages  have the  same «  link »  (URL) !  So they're
>   hashed by the same index and  the first message will be in the group
>   while the other one will never appear !
>
> 3) Then why not hash by URL _AND_ Description ?
>
>   In the RSS feed [1], we also have entries with the same URL *and*
>   the same description (only the title and the date differ). Besides
>   that, the description could be a large message.
>
> 4) So what would be a good hash index ?
>
>   What about the  concatenation between « date » (or  « pubdate ») and
>   « link » (or its fallback) ? I find that meaningful because a ticket
>   (here, in my case) couldn't be edited twice at the same time.
>
>   Alternatively, an even better hash  index would be the md5sum of the
>   whole  entry  from  XML   ;  the  drawback  being,  obviously,  the
>   computation time of this thing.
>
>   If needed, patches attached ; comments welcome :-)
>
>
> Thanks in advance.
>
> Footnotes: 
> [1]  http://vaucanson.lrde.org/trac.cgi/timeline?milestone=on&ticket=on&changeset=on&wiki=on&max=50&daysback=90&format=rss
>
>
> ----
> Note on the patches :
>
>   For the first patch :
>
>     I haven't kept backward compatibility for el-rss files: in the
>     current code, if the « date » field is empty (that is rarely the
>     case, but it can happen), it is set to the current time, and
>     that's OK.
>
>     But we compute the hash index from the original « date » field
>     (i.e. the one from the RSS feed), so I had to store it in the el
>     file and additionally in the `nnrss-group-data' list, as
>     `(nth 5 e)' of each element, in order to recompute the index
>     correctly on group loading.
>
>   For the second one :
>
>     Same thing, but I preferred to store the md5sum directly as the
>     last element, `(nth 9 e)', of each element of `nnrss-group-data',
>     in order to avoid a (hard) recomputation.
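
To make the collision described in points 1) to 4) concrete, here is a
minimal sketch, not code from nnrss.el. The `my-rss-' names are
hypothetical helpers, the items are plain plists rather than parsed XML
nodes, and the description fallback of the real code is left out. It
only shows that keying the two quoted items by link alone accepts one
of them, while keying by raw date plus link accepts both.

```elisp
;; Sketch only: every `my-rss-' name is hypothetical, not part of nnrss.el.
(defvar my-rss-items
  '((:title "Ticket #13 (defect) created by pouchet@lrde.epita.fr"
     :date "Mon, 27 Mar 2006 10:00:52 GMT"
     :link "http://vaucanson.lrde.org/trac.cgi/ticket/13")
    (:title "Ticket #13 (defect) closed by cadilh_m"
     :date "Mon, 27 Mar 2006 11:21:43 GMT"
     :link "http://vaucanson.lrde.org/trac.cgi/ticket/13"))
  "The two feed items quoted above, written by hand as plists.")

(defun my-rss-key-by-link (item)
  "Return the link of ITEM, the key used when a link is present."
  (plist-get item :link))

(defun my-rss-key-by-date-and-link (item)
  "Return the proposed key for ITEM: raw date concatenated with link."
  (concat (or (plist-get item :date) "") (plist-get item :link)))

(defun my-rss-count-new (items key-fn)
  "Count how many of ITEMS are accepted as new when keyed by KEY-FN."
  ;; String keys need an `equal' test; the default `eql' would not match.
  (let ((seen (make-hash-table :test 'equal))
        (count 0))
    (dolist (item items count)
      (let ((key (funcall key-fn item)))
        (unless (gethash key seen)
          (puthash key t seen)
          (setq count (1+ count)))))))

(my-rss-count-new my-rss-items #'my-rss-key-by-link)          ; => 1
(my-rss-count-new my-rss-items #'my-rss-key-by-date-and-link) ; => 2
```

The composite key still collides if a feed ever emits two items with
the same link and the same date string, which is the case the md5sum
variant is meant to cover.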


[-- Attachment #1.2: Patch 1: with the date field. --]
[-- Type: text/x-patch, Size: 7857 bytes --]

Index: ChangeLog
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/ChangeLog,v
retrieving revision 7.1099
diff -c -r7.1099 ChangeLog
*** ChangeLog	27 Mar 2006 09:42:59 -0000	7.1099
--- ChangeLog	27 Mar 2006 16:06:35 -0000
***************
*** 1,3 ****
--- 1,12 ----
+ 2006-03-27  Michael Cadilhac  <michael.cadilhac@lrde.org>  (tiny change)
+  
+ 	* nnrss.el (nnrss-check-group): Hash messages with the `date'
+ 	field together with the previous criteria. Store the original
+ 	`date' field in `nnrss-group-data'.
+ 	(nnrss-read-group-data): Update accordingly.
+ 	(nnrss-retrieve-headers): Update access to `nnrss-group-data' elements.
+ 	(nnrss-request-article): Likewise.
+ 
  2006-03-26  Andreas Seltenreich  <uwi7@rz.uni-karlsruhe.de>  (tiny change)
  
  	* message.el (message-resend): Bind message-generate-hashcash to
Index: nnrss.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/nnrss.el,v
retrieving revision 7.43
diff -c -r7.43 nnrss.el
*** nnrss.el	16 Jan 2006 22:57:40 -0000	7.43
--- nnrss.el	27 Mar 2006 16:06:36 -0000
***************
*** 125,131 ****
  		    (or (nth 4 e) "(nobody)")
  		    "\t"
  		    ;; date
! 		    (or (nth 5 e) "")
  		    "\t"
  		    ;; id
  		    (format "<%d@%s.nnrss>" (car e) group)
--- 125,131 ----
  		    (or (nth 4 e) "(nobody)")
  		    "\t"
  		    ;; date
! 		    (or (nth 6 e) "")
  		    "\t"
  		    ;; id
  		    (format "<%d@%s.nnrss>" (car e) group)
***************
*** 138,149 ****
  		    "-1" "\t"
  		    ;; Xref
  		    "" "\t"
! 		    (if (and (nth 6 e)
  			     (memq nnrss-description-field
  				   nnmail-extra-headers))
  			(concat (symbol-name nnrss-description-field)
  				": "
! 				(nnrss-format-string (nth 6 e))
  				"\t")
  		      "")
  		    (if (and (nth 2 e)
--- 138,149 ----
  		    "-1" "\t"
  		    ;; Xref
  		    "" "\t"
! 		    (if (and (nth 7 e)
  			     (memq nnrss-description-field
  				   nnmail-extra-headers))
  			(concat (symbol-name nnrss-description-field)
  				": "
! 				(nnrss-format-string (nth 7 e))
  				"\t")
  		      "")
  		    (if (and (nth 2 e)
***************
*** 198,210 ****
  	    (insert "Subject: " (nth 3 e) "\n"))
  	(if (nth 4 e)
  	    (insert "From: " (nth 4 e) "\n"))
! 	(if (nth 5 e)
! 	    (insert "Date: " (nnrss-format-string (nth 5 e)) "\n"))
  	(let ((header (buffer-string))
! 	      (text (nth 6 e))
  	      (link (nth 2 e))
! 	      (enclosure (nth 7 e))
! 	      (comments (nth 8 e))
  	      ;; Enable encoding of Newsgroups header in XEmacs.
  	      (default-enable-multibyte-characters t)
  	      (rfc2047-header-encoding-alist
--- 198,210 ----
  	    (insert "Subject: " (nth 3 e) "\n"))
  	(if (nth 4 e)
  	    (insert "From: " (nth 4 e) "\n"))
! 	(if (nth 6 e)
! 	    (insert "Date: " (nnrss-format-string (nth 6 e)) "\n"))
  	(let ((header (buffer-string))
! 	      (text (nth 7 e))
  	      (link (nth 2 e))
! 	      (enclosure (nth 8 e))
! 	      (comments (nth 9 e))
  	      ;; Enable encoding of Newsgroups header in XEmacs.
  	      (default-enable-multibyte-characters t)
  	      (rfc2047-header-encoding-alist
***************
*** 576,582 ****
  	  (insert-file-contents file)
  	  (eval-region (point-min) (point-max))))
        (dolist (e nnrss-group-data)
! 	(puthash (or (nth 2 e) (nth 6 e)) t nnrss-group-hashtb)
  	(when (and (car e) (> nnrss-group-min (car e)))
  	  (setq nnrss-group-min (car e)))
  	(when (and (car e) (< nnrss-group-max (car e)))
--- 576,582 ----
  	  (insert-file-contents file)
  	  (eval-region (point-min) (point-max))))
        (dolist (e nnrss-group-data)
! 	(puthash (concat (nth 5 e) (or (nth 2 e) (nth 7 e))) t nnrss-group-hashtb)
  	(when (and (car e) (> nnrss-group-min (car e)))
  	  (setq nnrss-group-min (car e)))
  	(when (and (car e) (< nnrss-group-max (car e)))
***************
*** 657,663 ****
  ;;; Snarf functions
  
  (defun nnrss-check-group (group server)
!   (let (file xml subject url extra changed author date feed-subject
  	     enclosure comments rss-ns rdf-ns content-ns dc-ns)
      (if (and nnrss-use-local
  	     (file-exists-p (setq file (expand-file-name
--- 657,663 ----
  ;;; Snarf functions
  
  (defun nnrss-check-group (group server)
!   (let (file xml subject url extra changed author date-field date feed-subject
  	     enclosure comments rss-ns rdf-ns content-ns dc-ns)
      (if (and nnrss-use-local
  	     (file-exists-p (setq file (expand-file-name
***************
*** 690,701 ****
      (dolist (item (nreverse (nnrss-find-el (intern (concat rss-ns "item")) xml)))
        (when (and (listp item)
  		 (string= (concat rss-ns "item") (car item))
! 		 (if (setq url (nnrss-decode-entities-string
! 				(nnrss-node-text rss-ns 'link (cddr item))))
! 		     (not (gethash url nnrss-group-hashtb))
! 		   (setq extra (or (nnrss-node-text content-ns 'encoded item)
! 				   (nnrss-node-text rss-ns 'description item)))
! 		   (not (gethash extra nnrss-group-hashtb))))
  	(setq subject (nnrss-node-text rss-ns 'title item))
  	(setq extra (or extra
  			(nnrss-node-text content-ns 'encoded item)
--- 690,705 ----
      (dolist (item (nreverse (nnrss-find-el (intern (concat rss-ns "item")) xml)))
        (when (and (listp item)
  		 (string= (concat rss-ns "item") (car item))
! 		 (progn
! 		   (setq date-field (or (nnrss-node-text dc-ns 'date item)
! 					(nnrss-node-text rss-ns 'pubDate item)
! 					""))
! 		   (if (setq url (nnrss-decode-entities-string
! 				  (nnrss-node-text rss-ns 'link (cddr item))))
! 		       (not (gethash (concat date-field url) nnrss-group-hashtb))
! 		     (setq extra (or (nnrss-node-text content-ns 'encoded item)
! 				     (nnrss-node-text rss-ns 'description item)))
! 		     (not (gethash (concat date-field extra) nnrss-group-hashtb)))))
  	(setq subject (nnrss-node-text rss-ns 'title item))
  	(setq extra (or extra
  			(nnrss-node-text content-ns 'encoded item)
***************
*** 705,713 ****
  	(setq author (or (nnrss-node-text rss-ns 'author item)
  			 (nnrss-node-text dc-ns 'creator item)
  			 (nnrss-node-text dc-ns 'contributor item)))
! 	(setq date (nnrss-normalize-date
! 		    (or (nnrss-node-text dc-ns 'date item)
! 			(nnrss-node-text rss-ns 'pubDate item))))
  	(setq comments (nnrss-node-text rss-ns 'comments item))
  	(when (setq enclosure (cadr (assq (intern (concat rss-ns "enclosure")) item)))
  	  (let ((url (cdr (assq 'url enclosure)))
--- 709,715 ----
  	(setq author (or (nnrss-node-text rss-ns 'author item)
  			 (nnrss-node-text dc-ns 'creator item)
  			 (nnrss-node-text dc-ns 'contributor item)))
! 	(setq date (nnrss-normalize-date date-field))
  	(setq comments (nnrss-node-text rss-ns 'comments item))
  	(when (setq enclosure (cadr (assq (intern (concat rss-ns "enclosure")) item)))
  	  (let ((url (cdr (assq 'url enclosure)))
***************
*** 737,748 ****
  	  url
  	  (and subject (nnrss-mime-encode-string subject))
  	  (and author (nnrss-mime-encode-string author))
  	  date
  	  (and extra (nnrss-decode-entities-string extra))
  	  enclosure
  	  comments)
  	 nnrss-group-data)
! 	(puthash (or url extra) t nnrss-group-hashtb)
  	(setq changed t))
        (setq extra nil))
      (when changed
--- 739,751 ----
  	  url
  	  (and subject (nnrss-mime-encode-string subject))
  	  (and author (nnrss-mime-encode-string author))
+ 	  date-field
  	  date
  	  (and extra (nnrss-decode-entities-string extra))
  	  enclosure
  	  comments)
  	 nnrss-group-data)
! 	(puthash (concat date-field (or url extra)) t nnrss-group-hashtb)
  	(setq changed t))
        (setq extra nil))
      (when changed

[-- Attachment #1.3: Patch 2: with md5sum. --]
[-- Type: text/x-patch, Size: 4443 bytes --]

Index: ChangeLog
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/ChangeLog,v
retrieving revision 7.1099
diff -c -r7.1099 ChangeLog
*** ChangeLog	27 Mar 2006 09:42:59 -0000	7.1099
--- ChangeLog	27 Mar 2006 16:54:36 -0000
***************
*** 1,3 ****
--- 1,9 ----
+ 2006-03-27  Michael Cadilhac  <micha@mahaena.lrde> (tiny change)
+ 
+ 	* nnrss.el (nnrss-check-group): Use the md5sum of the whole RSS
+ 	item as its hash index. Store this hash in `nnrss-group-data'.
+ 	(nnrss-read-group-data): Update accordingly.
+ 
  2006-03-26  Andreas Seltenreich  <uwi7@rz.uni-karlsruhe.de>  (tiny change)
  
  	* message.el (message-resend): Bind message-generate-hashcash to
Index: nnrss.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/nnrss.el,v
retrieving revision 7.43
diff -c -r7.43 nnrss.el
*** nnrss.el	16 Jan 2006 22:57:40 -0000	7.43
--- nnrss.el	27 Mar 2006 16:54:36 -0000
***************
*** 576,582 ****
  	  (insert-file-contents file)
  	  (eval-region (point-min) (point-max))))
        (dolist (e nnrss-group-data)
! 	(puthash (or (nth 2 e) (nth 6 e)) t nnrss-group-hashtb)
  	(when (and (car e) (> nnrss-group-min (car e)))
  	  (setq nnrss-group-min (car e)))
  	(when (and (car e) (< nnrss-group-max (car e)))
--- 576,582 ----
  	  (insert-file-contents file)
  	  (eval-region (point-min) (point-max))))
        (dolist (e nnrss-group-data)
! 	(puthash (nth 9 e) t nnrss-group-hashtb)
  	(when (and (car e) (> nnrss-group-min (car e)))
  	  (setq nnrss-group-min (car e)))
  	(when (and (car e) (< nnrss-group-max (car e)))
***************
*** 658,664 ****
  
  (defun nnrss-check-group (group server)
    (let (file xml subject url extra changed author date feed-subject
! 	     enclosure comments rss-ns rdf-ns content-ns dc-ns)
      (if (and nnrss-use-local
  	     (file-exists-p (setq file (expand-file-name
  					(nnrss-translate-file-chars
--- 658,664 ----
  
  (defun nnrss-check-group (group server)
    (let (file xml subject url extra changed author date feed-subject
! 	     enclosure comments rss-ns rdf-ns content-ns dc-ns hash-index)
      (if (and nnrss-use-local
  	     (file-exists-p (setq file (expand-file-name
  					(nnrss-translate-file-chars
***************
*** 690,704 ****
      (dolist (item (nreverse (nnrss-find-el (intern (concat rss-ns "item")) xml)))
        (when (and (listp item)
  		 (string= (concat rss-ns "item") (car item))
! 		 (if (setq url (nnrss-decode-entities-string
! 				(nnrss-node-text rss-ns 'link (cddr item))))
! 		     (not (gethash url nnrss-group-hashtb))
! 		   (setq extra (or (nnrss-node-text content-ns 'encoded item)
! 				   (nnrss-node-text rss-ns 'description item)))
! 		   (not (gethash extra nnrss-group-hashtb))))
  	(setq subject (nnrss-node-text rss-ns 'title item))
! 	(setq extra (or extra
! 			(nnrss-node-text content-ns 'encoded item)
  			(nnrss-node-text rss-ns 'description item)))
  	(if (setq feed-subject (nnrss-node-text dc-ns 'subject item))
  	    (setq extra (concat feed-subject "<br /><br />" extra)))
--- 690,701 ----
      (dolist (item (nreverse (nnrss-find-el (intern (concat rss-ns "item")) xml)))
        (when (and (listp item)
  		 (string= (concat rss-ns "item") (car item))
! 		 (progn (setq hash-index (md5 (prin1-to-string item)))
! 			(not (gethash hash-index nnrss-group-hashtb))))
  	(setq subject (nnrss-node-text rss-ns 'title item))
! 	(setq url (nnrss-decode-entities-string
! 		   (nnrss-node-text rss-ns 'link (cddr item))))
! 	(setq extra (or (nnrss-node-text content-ns 'encoded item)
  			(nnrss-node-text rss-ns 'description item)))
  	(if (setq feed-subject (nnrss-node-text dc-ns 'subject item))
  	    (setq extra (concat feed-subject "<br /><br />" extra)))
***************
*** 740,748 ****
  	  date
  	  (and extra (nnrss-decode-entities-string extra))
  	  enclosure
! 	  comments)
  	 nnrss-group-data)
! 	(puthash (or url extra) t nnrss-group-hashtb)
  	(setq changed t))
        (setq extra nil))
      (when changed
--- 737,746 ----
  	  date
  	  (and extra (nnrss-decode-entities-string extra))
  	  enclosure
! 	  comments
! 	  hash-index)
  	 nnrss-group-data)
! 	(puthash hash-index t nnrss-group-hashtb)
  	(setq changed t))
        (setq extra nil))
      (when changed
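
The digest used as the key in the second patch can be tried in
isolation. This is a small sketch under assumptions: the item lists
below are hand-written stand-ins for parsed XML nodes, and
`my-rss-item-digest' is a hypothetical helper that simply wraps the
same `(md5 (prin1-to-string item))' expression the patch uses.

```elisp
;; Sketch only: `my-rss-item-digest' and the variables are hypothetical.
(defun my-rss-item-digest (item)
  "Return the MD5 hex digest of the printed representation of ITEM."
  (md5 (prin1-to-string item)))

(defvar my-rss-opened
  '(item nil
         (title nil "Ticket #13 (defect) created by pouchet@lrde.epita.fr")
         (link nil "http://vaucanson.lrde.org/trac.cgi/ticket/13")
         (description nil "Correction on homepage"))
  "Hand-written stand-in for the first parsed item.")

(defvar my-rss-closed
  '(item nil
         (title nil "Ticket #13 (defect) closed by cadilh_m")
         (link nil "http://vaucanson.lrde.org/trac.cgi/ticket/13")
         (description nil "Fixed."))
  "Hand-written stand-in for the second parsed item.")

;; Same link, different content: the digests differ, so both are kept.
(string= (my-rss-item-digest my-rss-opened)
         (my-rss-item-digest my-rss-closed))              ; => nil

;; A structurally equal copy prints identically, so it is a duplicate.
(string= (my-rss-item-digest my-rss-opened)
         (my-rss-item-digest (copy-tree my-rss-opened)))  ; => t
```

One consequence of hashing the whole item is that any later change to
any field in the feed, even a corrected typo, yields a new digest, so
the item would show up again as a new article.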

[-- Attachment #1.4: Type: text/plain, Size: 303 bytes --]


-- 
|  Mieux vaut se taire                        Michaël 'Micha' Cadilhac |
|   Que de parler trop fort.               cadilh_m - Epita 2007 - CSI |
|           -- As de trèfle                     JID: micha@amessage.be |
`--  -  -                                                        - - --'

[-- Attachment #2: Type: application/pgp-signature, Size: 188 bytes --]


* Re: Fwd: nnrss and some (partially redundant) RSS feeds.
  2006-03-31 17:28 Fwd: nnrss and some (partially redundant) RSS feeds Micha
@ 2006-04-11 15:57 ` Lars Magne Ingebrigtsen
  2006-04-11 16:27   ` assignment papers (was: Fwd: nnrss and some (partially redundant) RSS feeds.) Reiner Steib
  0 siblings, 1 reply; 4+ messages in thread
From: Lars Magne Ingebrigtsen @ 2006-04-11 15:57 UTC (permalink / raw)


Micha <micha@lrde.epita.fr> writes:

>>   What about the  concatenation between « date » (or  « pubdate ») and
>>   « link » (or its fallback) ? I find that meaningful because a ticket
>>   (here, in my case) couldn't be edited twice at the same time.

Sounds like a good idea to me.

>>   If needed, patches attached ; comments welcome :-)

Do you have copyright assignment papers on file with the FSF for
Emacs/Gnus?

(Everybody: Have the Emacs people finally made the copyright assignment
list easily available, by any chance?)

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





* assignment papers (was: Fwd: nnrss and some (partially redundant) RSS feeds.)
  2006-04-11 15:57 ` Lars Magne Ingebrigtsen
@ 2006-04-11 16:27   ` Reiner Steib
  2006-04-11 16:39     ` assignment papers Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 4+ messages in thread
From: Reiner Steib @ 2006-04-11 16:27 UTC (permalink / raw)


On Tue, Apr 11 2006, Lars Magne Ingebrigtsen wrote:

> Do you have copyright assignment papers on file with the FSF for
> Emacs/Gnus?
>
> (Everybody: Have the Emacs people finally made the copyright assignment
> list easily available, by any chance?)

I asked Richard for access to the list back in January; within a few
days the admins arranged an account.  The list isn't available in
public because of privacy issues.

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/





* Re: assignment papers
  2006-04-11 16:27   ` assignment papers (was: Fwd: nnrss and some (partially redundant) RSS feeds.) Reiner Steib
@ 2006-04-11 16:39     ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 4+ messages in thread
From: Lars Magne Ingebrigtsen @ 2006-04-11 16:39 UTC (permalink / raw)


Reiner Steib <reinersteib+gmane@imap.cc> writes:

> I asked Richard for access to the list back in January; within a few
> days the admins arranged an account.  The list isn't available in
> public because of privacy issues.

Ok; I'll poke, er, ask Richard.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen



