Some people told me to follow up on this message on the ding mailing
list, since « bugs » is flooded with spam. Thanks in advance.

Michael Cadilhac writes:

> Hi!
>
> I have some issues with an RSS feed that comes from a Trac (a project
> management tool).
>
> 1) The feed [1] has the following entries:
>
> //////////////////////////////////////////////////////////////////////
>
>   Ticket #13 (defect) created by pouchet@lrde.epita.fr
>   pouchet@lrde.epita.fr
>   Mon, 27 Mar 2006 10:00:52 GMT
>   http://vaucanson.lrde.org/trac.cgi/ticket/13
>   Correction on homepage
>
>   Ticket #13 (defect) closed by cadilh_m
>   michael.cadilhac@lrde.org
>   Mon, 27 Mar 2006 11:21:43 GMT
>   http://vaucanson.lrde.org/trac.cgi/ticket/13
>   Fixed.
>
> //////////////////////////////////////////////////////////////////////
>
>    As you can see, a ticket has been opened, then closed.
>
> 2) The nnrss.el code (I'm using CVS Gnus) looks like this:
>
>    Check whether an item is already stored:
>
>    (if (setq url (nnrss-decode-entities-string
>                   (nnrss-node-text rss-ns 'link (cddr item))))
>        (not (gethash url nnrss-group-hashtb))
>      (setq extra (or (nnrss-node-text content-ns 'encoded item)
>                      (nnrss-node-text rss-ns 'description item)))
>      (not (gethash extra nnrss-group-hashtb)))
>
>    Here, the hash table is indexed first by the URL and, as a
>    fallback, by the « encoded » or « description » field.
>
> 1 with 2)
>
>    Gosh! Both messages have the same « link » (URL)! So they are
>    hashed under the same index: the first message will be in the
>    group, while the other one will never appear!
>
> 3) Then why not hash by URL _AND_ description?
>
>    In the RSS feed [1], we also have entries with the same URL *and*
>    the same description (only the title and the date differ). Besides
>    that, the description could be a large message.
>
> 4) So what would be a good hash index?
>
>    What about the concatenation of « date » (or « pubdate ») and
>    « link » (or its fallback)? I find that meaningful because a ticket
>    (here, in my case) couldn't be edited twice at the same time.
>
>    Alternatively, an even better hash index would be the md5sum of the
>    whole entry from the XML; the drawback being, obviously, the
>    computation time of this thing.
>
> If needed, patches attached; comments welcome :-)
>
> Thanks in advance.
>
> Footnotes:
> [1] http://vaucanson.lrde.org/trac.cgi/timeline?milestone=on&ticket=on&changeset=on&wiki=on&max=50&daysback=90&format=rss
>
> ----
> Note on the patches:
>
> For the first patch:
>
>    I haven't kept backward compatibility for el-rss files: in the
>    current code, if the « date » field is empty (well, that's rarely
>    the case, but it could be), it is set to the current time, and
>    that's OK.
>
>    But we compute the hash index from the original « date » field
>    (i.e. the one from the RSS feed), so I had to store it in the el
>    file and additionally in the `nnrss-group-data' list, as the 4th
>    element of each element, in order to recompute it correctly on
>    group loading.
>
> For the second one:
>
>    Same thing, but I preferred to store the md5sum directly as the 9th
>    (and last) element of each element of `nnrss-group-data', in order
>    to avoid a (hard) re-computation.
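
To make the two candidate hash indexes from point 4 concrete, here is a
minimal standalone sketch. It is not the attached patches and not
nnrss.el code: every `my-rss-*' name is hypothetical, the table is a
stand-in for `nnrss-group-hashtb', and the entry fields are assumed to
be already extracted as plain strings.

```elisp
;; Sketch only: the `my-rss-*' names are hypothetical and are not part
;; of nnrss.el.  Fields are assumed to be plain strings (or nil).

(defvar my-rss-seen (make-hash-table :test 'equal)
  "Keys of the entries already stored (stand-in for `nnrss-group-hashtb').")

(defun my-rss-entry-key (date link)
  "Build a hash key from DATE and LINK; either may be nil.
The newline separator keeps the two fields from running together."
  (concat (or date "") "\n" (or link "")))

(defun my-rss-entry-md5 (entry-text)
  "Alternative key: the MD5 digest of ENTRY-TEXT, the raw text of an entry."
  (md5 entry-text))

(defun my-rss-new-entry-p (key)
  "Return non-nil if KEY has not been seen yet, and record it as seen."
  (unless (gethash key my-rss-seen)
    (puthash key t my-rss-seen)))
```

With the two Ticket #13 entries above, the « link » is identical but the
dates differ (10:00:52 vs. 11:21:43), so `my-rss-entry-key' gives two
distinct keys and both entries are kept. The md5 variant also separates
entries that differ only in their title, at the cost of hashing the
whole entry text.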