* Fwd: nnrss and some (partially redundant) RSS feeds.
From: Micha @ 2006-03-31 17:28 UTC
Some people told me to forward this message to the ding mailing
list, « bugs » being flooded with spam. Thanks in advance.
Michael Cadilhac <michael.cadilhac@lrde.org> writes:
> Hi !
>
> I have some issues with an RSS feed that comes from Trac (a project
> management tool).
>
> 1) The feed [1] has the following entries :
>
> //////////////////////////////////////////////////////////////////////
>
> <item>
> <title>Ticket #13 (defect) created by pouchet@lrde.epita.fr</title>
> <author>pouchet@lrde.epita.fr</author>
> <pubDate>Mon, 27 Mar 2006 10:00:52 GMT</pubDate>
> <link>http://vaucanson.lrde.org/trac.cgi/ticket/13</link>
> <description>Correction on homepage</description>
> </item>
>
> <item>
> <title>Ticket #13 (defect) closed by cadilh_m</title>
> <author>michael.cadilhac@lrde.org</author>
> <pubDate>Mon, 27 Mar 2006 11:21:43 GMT</pubDate>
> <link>http://vaucanson.lrde.org/trac.cgi/ticket/13</link>
> <description>Fixed.</description>
> </item>
>
> //////////////////////////////////////////////////////////////////////
>
> As you can see, a ticket has been opened then closed.
>
> 2) The nnrss.el (I'm using CVS Gnus) code looks like:
>
> Check if an item is already stored:
>
>    (if (setq url (nnrss-decode-entities-string
>                   (nnrss-node-text rss-ns 'link (cddr item))))
>        (not (gethash url nnrss-group-hashtb))
>      (setq extra (or (nnrss-node-text content-ns 'encoded item)
>                      (nnrss-node-text rss-ns 'description item)))
>      (not (gethash extra nnrss-group-hashtb)))
>
> Here, the hash table is indexed by, first, the URL, and as
> a fallback, by the « encoded » or « description » field.
>
>
> 1 with 2)
>
> Gosh ! Both messages have the same « link » (URL) ! So they're
> hashed by the same index and the first message will be in the group
> while the other one will never appear !
>
> 3) Then why not hash by URL _AND_ Description ?
>
> In the RSS feed [1], we also have entries with the same URL *and*
> the same description (only the title and the date differ). Besides
> that, the description could be a large message.
>
> 4) So what would be a good hash index ?
>
> What about the concatenation of « date » (or « pubDate ») and
> « link » (or its fallback) ? I find that meaningful because a ticket
> (here, in my case) can't be edited twice at the same instant.
>
> Alternatively, an even better hash index would be the md5sum of the
> whole XML entry ; the drawback being, obviously, the time needed
> to compute it.
>
> If needed, patches attached ; comments welcome :-)
>
>
> Thanks in advance.
>
> Footnotes:
> [1] http://vaucanson.lrde.org/trac.cgi/timeline?milestone=on&ticket=on&changeset=on&wiki=on&max=50&daysback=90&format=rss
>
>
> ----
> Note on the patches :
>
> For the first patch :
>
> I haven't kept backward compatibility for the el-rss files : in
> the current code, if the « date » field is empty (that's rarely
> the case, but it could be), it is set to the current time and
> that's OK.
>
> But we compute the hash index from the original « date » field
> (i.e. the one from the RSS feed) ; so I had to store it in the el
> file, and hence in the `nnrss-group-data' list, as element 5
> (counting from zero, i.e. `(nth 5 e)') of each entry, in order to
> recompute the index correctly on group loading.
>
> For the second one :
>
> Same thing, but I preferred to store the md5sum directly as the
> last element (`(nth 9 e)') of each entry of `nnrss-group-data',
> in order to avoid a (hard) recomputation.
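
The collision described in 1) and 2), and the effect of the date + link
key proposed in 4), can be reproduced with a plain hash table. This is
only a minimal sketch under stated assumptions, not code from nnrss.el:
`demo-items' and `demo-count-stored' are hypothetical names, and each
feed item is reduced to a (pubDate . link) pair.

```elisp
;; Illustration only: hypothetical names, not part of nnrss.el.

(defvar demo-items
  '(("Mon, 27 Mar 2006 10:00:52 GMT"
     . "http://vaucanson.lrde.org/trac.cgi/ticket/13")
    ("Mon, 27 Mar 2006 11:21:43 GMT"
     . "http://vaucanson.lrde.org/trac.cgi/ticket/13"))
  "The two example feed items, reduced to (PUBDATE . LINK) pairs.")

(defun demo-count-stored (items key-fn)
  "Return how many ITEMS get stored when each one is keyed by KEY-FN."
  (let ((table (make-hash-table :test 'equal))
        (stored 0))
    (dolist (item items)
      (let ((key (funcall key-fn item)))
        ;; Same idea as the check quoted in 2): an item whose key is
        ;; already in the table is considered known and is skipped.
        (unless (gethash key table)
          (puthash key t table)
          (setq stored (1+ stored)))))
    stored))

;; Keyed by link only: the "closed" item is taken for a duplicate.
(demo-count-stored demo-items #'cdr)

;; Keyed by date + link: both items are stored.
(demo-count-stored demo-items
                   (lambda (item) (concat (car item) (cdr item))))
```

With the link as the only key, one of the two items is stored; with the
concatenated key, both are.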
[-- Attachment #1.2: Patch 1: with the date field. --]
[-- Type: text/x-patch, Size: 7857 bytes --]
Index: ChangeLog
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/ChangeLog,v
retrieving revision 7.1099
diff -c -r7.1099 ChangeLog
*** ChangeLog 27 Mar 2006 09:42:59 -0000 7.1099
--- ChangeLog 27 Mar 2006 16:06:35 -0000
***************
*** 1,3 ****
--- 1,12 ----
+ 2006-03-27 Michael Cadilhac <michael.cadilhac@lrde.org> (tiny change)
+
+ * nnrss.el (nnrss-check-group): Hash messages with the `date'
+ field together with the previous criteria. Store the original
+ `date' field in `nnrss-group-data'.
+ (nnrss-read-group-data): Update accordingly.
+ (nnrss-retrieve-headers): Update access to `nnrss-group-data' elements.
+ (nnrss-request-article): Likewise.
+
2006-03-26 Andreas Seltenreich <uwi7@rz.uni-karlsruhe.de> (tiny change)
* message.el (message-resend): Bind message-generate-hashcash to
Index: nnrss.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/nnrss.el,v
retrieving revision 7.43
diff -c -r7.43 nnrss.el
*** nnrss.el 16 Jan 2006 22:57:40 -0000 7.43
--- nnrss.el 27 Mar 2006 16:06:36 -0000
***************
*** 125,131 ****
(or (nth 4 e) "(nobody)")
"\t"
;; date
! (or (nth 5 e) "")
"\t"
;; id
(format "<%d@%s.nnrss>" (car e) group)
--- 125,131 ----
(or (nth 4 e) "(nobody)")
"\t"
;; date
! (or (nth 6 e) "")
"\t"
;; id
(format "<%d@%s.nnrss>" (car e) group)
***************
*** 138,149 ****
"-1" "\t"
;; Xref
"" "\t"
! (if (and (nth 6 e)
(memq nnrss-description-field
nnmail-extra-headers))
(concat (symbol-name nnrss-description-field)
": "
! (nnrss-format-string (nth 6 e))
"\t")
"")
(if (and (nth 2 e)
--- 138,149 ----
"-1" "\t"
;; Xref
"" "\t"
! (if (and (nth 7 e)
(memq nnrss-description-field
nnmail-extra-headers))
(concat (symbol-name nnrss-description-field)
": "
! (nnrss-format-string (nth 7 e))
"\t")
"")
(if (and (nth 2 e)
***************
*** 198,210 ****
(insert "Subject: " (nth 3 e) "\n"))
(if (nth 4 e)
(insert "From: " (nth 4 e) "\n"))
! (if (nth 5 e)
! (insert "Date: " (nnrss-format-string (nth 5 e)) "\n"))
(let ((header (buffer-string))
! (text (nth 6 e))
(link (nth 2 e))
! (enclosure (nth 7 e))
! (comments (nth 8 e))
;; Enable encoding of Newsgroups header in XEmacs.
(default-enable-multibyte-characters t)
(rfc2047-header-encoding-alist
--- 198,210 ----
(insert "Subject: " (nth 3 e) "\n"))
(if (nth 4 e)
(insert "From: " (nth 4 e) "\n"))
! (if (nth 6 e)
! (insert "Date: " (nnrss-format-string (nth 6 e)) "\n"))
(let ((header (buffer-string))
! (text (nth 7 e))
(link (nth 2 e))
! (enclosure (nth 8 e))
! (comments (nth 9 e))
;; Enable encoding of Newsgroups header in XEmacs.
(default-enable-multibyte-characters t)
(rfc2047-header-encoding-alist
***************
*** 576,582 ****
(insert-file-contents file)
(eval-region (point-min) (point-max))))
(dolist (e nnrss-group-data)
! (puthash (or (nth 2 e) (nth 6 e)) t nnrss-group-hashtb)
(when (and (car e) (> nnrss-group-min (car e)))
(setq nnrss-group-min (car e)))
(when (and (car e) (< nnrss-group-max (car e)))
--- 576,582 ----
(insert-file-contents file)
(eval-region (point-min) (point-max))))
(dolist (e nnrss-group-data)
! (puthash (concat (nth 5 e) (or (nth 2 e) (nth 7 e))) t nnrss-group-hashtb)
(when (and (car e) (> nnrss-group-min (car e)))
(setq nnrss-group-min (car e)))
(when (and (car e) (< nnrss-group-max (car e)))
***************
*** 657,663 ****
;;; Snarf functions
(defun nnrss-check-group (group server)
! (let (file xml subject url extra changed author date feed-subject
enclosure comments rss-ns rdf-ns content-ns dc-ns)
(if (and nnrss-use-local
(file-exists-p (setq file (expand-file-name
--- 657,663 ----
;;; Snarf functions
(defun nnrss-check-group (group server)
! (let (file xml subject url extra changed author date-field date feed-subject
enclosure comments rss-ns rdf-ns content-ns dc-ns)
(if (and nnrss-use-local
(file-exists-p (setq file (expand-file-name
***************
*** 690,701 ****
(dolist (item (nreverse (nnrss-find-el (intern (concat rss-ns "item")) xml)))
(when (and (listp item)
(string= (concat rss-ns "item") (car item))
! (if (setq url (nnrss-decode-entities-string
! (nnrss-node-text rss-ns 'link (cddr item))))
! (not (gethash url nnrss-group-hashtb))
! (setq extra (or (nnrss-node-text content-ns 'encoded item)
! (nnrss-node-text rss-ns 'description item)))
! (not (gethash extra nnrss-group-hashtb))))
(setq subject (nnrss-node-text rss-ns 'title item))
(setq extra (or extra
(nnrss-node-text content-ns 'encoded item)
--- 690,705 ----
(dolist (item (nreverse (nnrss-find-el (intern (concat rss-ns "item")) xml)))
(when (and (listp item)
(string= (concat rss-ns "item") (car item))
! (progn
! (setq date-field (or (nnrss-node-text dc-ns 'date item)
! (nnrss-node-text rss-ns 'pubDate item)
! ""))
! (if (setq url (nnrss-decode-entities-string
! (nnrss-node-text rss-ns 'link (cddr item))))
! (not (gethash (concat date-field url) nnrss-group-hashtb))
! (setq extra (or (nnrss-node-text content-ns 'encoded item)
! (nnrss-node-text rss-ns 'description item)))
! (not (gethash (concat date-field extra) nnrss-group-hashtb)))))
(setq subject (nnrss-node-text rss-ns 'title item))
(setq extra (or extra
(nnrss-node-text content-ns 'encoded item)
***************
*** 705,713 ****
(setq author (or (nnrss-node-text rss-ns 'author item)
(nnrss-node-text dc-ns 'creator item)
(nnrss-node-text dc-ns 'contributor item)))
! (setq date (nnrss-normalize-date
! (or (nnrss-node-text dc-ns 'date item)
! (nnrss-node-text rss-ns 'pubDate item))))
(setq comments (nnrss-node-text rss-ns 'comments item))
(when (setq enclosure (cadr (assq (intern (concat rss-ns "enclosure")) item)))
(let ((url (cdr (assq 'url enclosure)))
--- 709,715 ----
(setq author (or (nnrss-node-text rss-ns 'author item)
(nnrss-node-text dc-ns 'creator item)
(nnrss-node-text dc-ns 'contributor item)))
! (setq date (nnrss-normalize-date date-field))
(setq comments (nnrss-node-text rss-ns 'comments item))
(when (setq enclosure (cadr (assq (intern (concat rss-ns "enclosure")) item)))
(let ((url (cdr (assq 'url enclosure)))
***************
*** 737,748 ****
url
(and subject (nnrss-mime-encode-string subject))
(and author (nnrss-mime-encode-string author))
date
(and extra (nnrss-decode-entities-string extra))
enclosure
comments)
nnrss-group-data)
! (puthash (or url extra) t nnrss-group-hashtb)
(setq changed t))
(setq extra nil))
(when changed
--- 739,751 ----
url
(and subject (nnrss-mime-encode-string subject))
(and author (nnrss-mime-encode-string author))
+ date-field
date
(and extra (nnrss-decode-entities-string extra))
enclosure
comments)
nnrss-group-data)
! (puthash (concat date-field (or url extra)) t nnrss-group-hashtb)
(setq changed t))
(setq extra nil))
(when changed
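
A minimal sketch of how the hash table is rebuilt from stored entries
once the original date field sits at index 5, as in the patch above. It
is an illustration under assumptions, not code from nnrss.el:
`demo-entry-key', `demo-rebuild-table' and `demo-entries' are
hypothetical names, and the slots that do not take part in the key are
simply left nil.

```elisp
;; Illustration only: hypothetical names, not part of nnrss.el.

(defun demo-entry-key (e)
  "Return the hash key of stored entry E, assuming the patched layout.
Index 2 holds the link, index 5 the original date field and
index 7 the description."
  (concat (nth 5 e) (or (nth 2 e) (nth 7 e))))

(defun demo-rebuild-table (entries)
  "Return a hash table holding the key of every entry in ENTRIES."
  (let ((table (make-hash-table :test 'equal)))
    (dolist (e entries)
      (puthash (demo-entry-key e) t table))
    table))

;; Two stored entries for the same ticket.  Slots 1, 6, 8 and 9 are
;; not used for the key and are left nil in this sketch.
(defvar demo-entries
  (list (list 1 nil "http://vaucanson.lrde.org/trac.cgi/ticket/13"
              "Ticket #13 (defect) created by pouchet@lrde.epita.fr"
              "pouchet@lrde.epita.fr"
              "Mon, 27 Mar 2006 10:00:52 GMT" nil
              "Correction on homepage" nil nil)
        (list 2 nil "http://vaucanson.lrde.org/trac.cgi/ticket/13"
              "Ticket #13 (defect) closed by cadilh_m"
              "michael.cadilhac@lrde.org"
              "Mon, 27 Mar 2006 11:21:43 GMT" nil
              "Fixed." nil nil))
  "Example entries in the assumed ten-element layout.")

;; Each entry gets its own key, so the table has two elements.
(hash-table-count (demo-rebuild-table demo-entries))
```

Entries saved before the change have no slot for the original date
field, so keys rebuilt from them would likely not match the keys
computed from the feed; this is the backward-compatibility cost
mentioned in the note on the first patch.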
[-- Attachment #1.3: Patch 2: with md5sum. --]
[-- Type: text/x-patch, Size: 4443 bytes --]
Index: ChangeLog
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/ChangeLog,v
retrieving revision 7.1099
diff -c -r7.1099 ChangeLog
*** ChangeLog 27 Mar 2006 09:42:59 -0000 7.1099
--- ChangeLog 27 Mar 2006 16:54:36 -0000
***************
*** 1,3 ****
--- 1,9 ----
+ 2006-03-27 Michael Cadilhac <micha@mahaena.lrde> (tiny change)
+
+ * nnrss.el (nnrss-check-group): Use the md5sum of the whole RSS
+ item as its hash index. Store this hash in `nnrss-group-data'.
+ (nnrss-read-group-data): Update accordingly.
+
2006-03-26 Andreas Seltenreich <uwi7@rz.uni-karlsruhe.de> (tiny change)
* message.el (message-resend): Bind message-generate-hashcash to
Index: nnrss.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/nnrss.el,v
retrieving revision 7.43
diff -c -r7.43 nnrss.el
*** nnrss.el 16 Jan 2006 22:57:40 -0000 7.43
--- nnrss.el 27 Mar 2006 16:54:36 -0000
***************
*** 576,582 ****
(insert-file-contents file)
(eval-region (point-min) (point-max))))
(dolist (e nnrss-group-data)
! (puthash (or (nth 2 e) (nth 6 e)) t nnrss-group-hashtb)
(when (and (car e) (> nnrss-group-min (car e)))
(setq nnrss-group-min (car e)))
(when (and (car e) (< nnrss-group-max (car e)))
--- 576,582 ----
(insert-file-contents file)
(eval-region (point-min) (point-max))))
(dolist (e nnrss-group-data)
! (puthash (nth 9 e) t nnrss-group-hashtb)
(when (and (car e) (> nnrss-group-min (car e)))
(setq nnrss-group-min (car e)))
(when (and (car e) (< nnrss-group-max (car e)))
***************
*** 658,664 ****
(defun nnrss-check-group (group server)
(let (file xml subject url extra changed author date feed-subject
! enclosure comments rss-ns rdf-ns content-ns dc-ns)
(if (and nnrss-use-local
(file-exists-p (setq file (expand-file-name
(nnrss-translate-file-chars
--- 658,664 ----
(defun nnrss-check-group (group server)
(let (file xml subject url extra changed author date feed-subject
! enclosure comments rss-ns rdf-ns content-ns dc-ns hash-index)
(if (and nnrss-use-local
(file-exists-p (setq file (expand-file-name
(nnrss-translate-file-chars
***************
*** 690,704 ****
(dolist (item (nreverse (nnrss-find-el (intern (concat rss-ns "item")) xml)))
(when (and (listp item)
(string= (concat rss-ns "item") (car item))
! (if (setq url (nnrss-decode-entities-string
! (nnrss-node-text rss-ns 'link (cddr item))))
! (not (gethash url nnrss-group-hashtb))
! (setq extra (or (nnrss-node-text content-ns 'encoded item)
! (nnrss-node-text rss-ns 'description item)))
! (not (gethash extra nnrss-group-hashtb))))
(setq subject (nnrss-node-text rss-ns 'title item))
! (setq extra (or extra
! (nnrss-node-text content-ns 'encoded item)
(nnrss-node-text rss-ns 'description item)))
(if (setq feed-subject (nnrss-node-text dc-ns 'subject item))
(setq extra (concat feed-subject "<br /><br />" extra)))
--- 690,701 ----
(dolist (item (nreverse (nnrss-find-el (intern (concat rss-ns "item")) xml)))
(when (and (listp item)
(string= (concat rss-ns "item") (car item))
! (progn (setq hash-index (md5 (prin1-to-string item)))
! (not (gethash hash-index nnrss-group-hashtb))))
(setq subject (nnrss-node-text rss-ns 'title item))
! (setq url (nnrss-decode-entities-string
! (nnrss-node-text rss-ns 'link (cddr item))))
! (setq extra (or (nnrss-node-text content-ns 'encoded item)
(nnrss-node-text rss-ns 'description item)))
(if (setq feed-subject (nnrss-node-text dc-ns 'subject item))
(setq extra (concat feed-subject "<br /><br />" extra)))
***************
*** 740,748 ****
date
(and extra (nnrss-decode-entities-string extra))
enclosure
! comments)
nnrss-group-data)
! (puthash (or url extra) t nnrss-group-hashtb)
(setq changed t))
(setq extra nil))
(when changed
--- 737,746 ----
date
(and extra (nnrss-decode-entities-string extra))
enclosure
! comments
! hash-index)
nnrss-group-data)
! (puthash hash-index t nnrss-group-hashtb)
(setq changed t))
(setq extra nil))
(when changed
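
The md5sum variant can be tried in isolation. The sketch below assumes
that an item is a parsed XML node of the form (NAME ATTRIBUTES
CHILDREN...); `demo-item-hash', `demo-created' and `demo-closed' are
hypothetical names, and the items are trimmed to three fields taken
from the example feed.

```elisp
;; Illustration only: hypothetical names, not part of nnrss.el.

(defun demo-item-hash (item)
  "Return the MD5 hex digest of the printed representation of ITEM."
  (md5 (prin1-to-string item)))

(defvar demo-created
  '(item nil
         (title nil "Ticket #13 (defect) created by pouchet@lrde.epita.fr")
         (pubDate nil "Mon, 27 Mar 2006 10:00:52 GMT")
         (link nil "http://vaucanson.lrde.org/trac.cgi/ticket/13"))
  "Trimmed version of the first example item.")

(defvar demo-closed
  '(item nil
         (title nil "Ticket #13 (defect) closed by cadilh_m")
         (pubDate nil "Mon, 27 Mar 2006 11:21:43 GMT")
         (link nil "http://vaucanson.lrde.org/trac.cgi/ticket/13"))
  "Trimmed version of the second example item.")

;; Same link, different content: the two digests differ.
(equal (demo-item-hash demo-created) (demo-item-hash demo-closed))

;; A structurally equal copy of an item yields the same digest.
(equal (demo-item-hash demo-created)
       (demo-item-hash (copy-tree demo-created)))
```

Since the digest covers the whole printed item, any change in an item
that the feed publishes again (a reworded title, say) produces a new
key, and the item would then show up as a new article.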
--
| Mieux vaut se taire Michaël 'Micha' Cadilhac |
| Que de parler trop fort. cadilh_m - Epita 2007 - CSI |
| -- As de trèfle JID: micha@amessage.be |
`-- - - - - --'
* Re: Fwd: nnrss and some (partially redundant) RSS feeds.
From: Lars Magne Ingebrigtsen @ 2006-04-11 15:57 UTC
Micha <micha@lrde.epita.fr> writes:
>> What about the concatenation between « date » (or « pubdate ») and
>> « link » (or its fallback) ? I find that meaningful because a ticket
>> (here, in my case) couldn't be edited twice at the same time.
Sounds like a good idea to me.
>> If needed, patches attached ; comments welcome :-)
Do you have copyright assignment papers on file with the FSF for
Emacs/Gnus?
(Everybody: Have the Emacs people finally made the copyright assignment
list easily available, by any chance?)
--
(domestic pets only, the antidote for overdose, milk.)
larsi@gnus.org * Lars Magne Ingebrigtsen
* assignment papers (was: Fwd: nnrss and some (partially redundant) RSS feeds.)
From: Reiner Steib @ 2006-04-11 16:27 UTC
On Tue, Apr 11 2006, Lars Magne Ingebrigtsen wrote:
> Do you have copyright assignment papers on file with the FSF for
> Emacs/Gnus?
>
> (Everybody: Have the Emacs people finally made the copyright assignment
> list easily available, by any chance?)
I asked Richard for access to the list back in January; within a few
days the admins arranged an account. The list isn't publicly
available because of privacy issues.
Bye, Reiner.
--
,,,
(o o)
---ooO-(_)-Ooo--- | PGP key available | http://rsteib.home.pages.de/
* Re: assignment papers
From: Lars Magne Ingebrigtsen @ 2006-04-11 16:39 UTC
Reiner Steib <reinersteib+gmane@imap.cc> writes:
> I asked Richard for access to the list back in January; within a few
> days the admins arranged an account. The list isn't publicly
> available because of privacy issues.
Ok; I'll poke, er, ask Richard.
--
(domestic pets only, the antidote for overdose, milk.)
larsi@gnus.org * Lars Magne Ingebrigtsen