* ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components (was: bug in spam-check-BBDB)
From: Reiner Steib @ 2006-10-16 21:42 UTC
Cc: bugs, Ted Zlatanov

[ Let's shift this to ding.  MFT set to ding. ]

On Mon, Oct 16 2006, Ted Zlatanov wrote:

> On 9 Oct 2006, damien@repose.cx wrote:
[...]
>> spam-check-BBDB uses the following code to derive the address portion
>> of an email:
>>
>> (setq who (nth 1 (gnus-extract-address-components who)))
>>
>> On an address like:
>>
>> "front@kumamoto.ark-hotel.co.jp"<front@kumamoto.ark-hotel.co.jp>
>>
>> this returns
>>
>> "front@kumamoto.ark-hotel.co.jp
>>
>> (note the leading quote)
>>
>> I fixed the problem by using the following line instead:
>>
>> (setq who (car (ietf-drums-parse-address who)))
>
> Thanks very much, Damien.  I committed the fix to spam.el, take a look
> and let us know if it works better for you.

We have `gnus-extract-address-components' (which uses the variable
`gnus-extract-address-components'), `mail-extract-address-components'
and `ietf-drums-parse-address':

,----[ <f1> v gnus-extract-address-components RET ]
| gnus-extract-address-components is a variable defined in `gnus'.
| Its value is
| gnus-extract-address-components
|
| Documentation:
| *Function for extracting address components from a From header.
| Two pre-defined function exist: `gnus-extract-address-components',
| which is the default, quite fast, and too simplistic solution, and
| `mail-extract-address-components', which works much better, but is
| slower.
`----

I don't know what's the difference between
`mail-extract-address-components' and `ietf-drums-parse-address'
(speed, reliability, ...)?  Anyone?

It looks like the "@" in the `FULL-NAME' confused
`gnus-extract-address-components'.  Maybe we can also add a simple fix
in `gnus-extract-address-components' for this quite common case.  So
I'm not sure if using `ietf-drums-parse-address' is the correct fix.

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/
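For readers following along, the reported behavior can be reproduced with just the address regexp that appears in the gnus-util.el patches quoted in this thread. This is a minimal sketch, not the Gnus code itself: `my/first-at-token' is a hypothetical helper name, and pinning the standard syntax table is an assumption made here so that `\b' behaves the same regardless of the current buffer.

```elisp
(defun my/first-at-token (from)
  "Return the first token in FROM that contains @ or !, or nil.
Hypothetical helper; only the regexp is taken from the thread."
  ;; `string-match' consults the current syntax table for \b, so pin
  ;; the standard one to keep the result independent of the buffer.
  (with-syntax-table (standard-syntax-table)
    (when (string-match "\\b[^@ \t<>]+[!@][^@ \t<>]+\\b" from)
      (match-string 0 from))))

;; The match begins at the very start of the string, so the opening
;; double quote of the display name ends up in the result, which is
;; the leading quote described in the bug report.
(my/first-at-token
 "\"front@kumamoto.ark-hotel.co.jp\"<front@kumamoto.ark-hotel.co.jp>")
```

The point of the sketch is only that the regexp takes the first @-token it sees, so a quoted display name that itself looks like an address wins over the bracketed one.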
* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
From: Katsumi Yamaoka @ 2006-10-16 22:47 UTC

>>>>> In <v98xjgvsxr.fsf_-_@marauder.physik.uni-ulm.de>
>>>>> Reiner Steib wrote:

> I don't know what's the difference between
> `mail-extract-address-components' and `ietf-drums-parse-address'
> (speed, reliability, ...)?  Anyone?

`ietf-drums-parse-address' doesn't work with non-ASCII names but
`gnus-extract-address-components' does.  So does
`mail-extract-address-components' but it also performs a voodoo
ceremony (which is currently disabled for Japanese names by default).

(ietf-drums-parse-address "王ヶ頭ホテル <ougatou@example.com>")
 => ("ougatou@example.com")

(gnus-extract-address-components "王ヶ頭ホテル <ougatou@example.com>")
 => ("王ヶ頭ホテル" "ougatou@example.com")

(let ((mail-extr-disable-voodoo nil)) ;; Enable voodoo.
  (mail-extract-address-components "王ヶ頭ホテル <ougatou@example.com>"))
 => ("王. ヶ. 頭. ホテル" "ougatou@example.com")
* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
From: Miles Bader @ 2006-10-16 23:06 UTC

Katsumi Yamaoka <yamaoka@jpl.org> writes:
> (let ((mail-extr-disable-voodoo nil)) ;; Enable voodoo.
>   (mail-extract-address-components "王ヶ頭ホテル <ougatou@example.com>"))
>  => ("王. ヶ. 頭. ホテル" "ougatou@example.com")

Why would one _want_ to enable the "voodoo" for Japanese names?

[Indeed, why does this "voodoo" exist in the first place???  Even for
western names, it seems kind of stupid... but clearly somebody went to a
lot of trouble to implement it.]

-Miles
* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
From: Katsumi Yamaoka @ 2006-10-16 23:58 UTC

>>>>> In <8764ej273p.fsf@catnip.gol.com>
>>>>> Miles Bader <miles@gnu.org> wrote:

> Katsumi Yamaoka <yamaoka@jpl.org> writes:
>> (let ((mail-extr-disable-voodoo nil)) ;; Enable voodoo.
>>   (mail-extract-address-components "王ヶ頭ホテル <ougatou@example.com>"))
>>  => ("王. ヶ. 頭. ホテル" "ougatou@example.com")

> Why would one _want_ to enable the "voodoo" for Japanese names?

It was always enabled until the `mail-extr-disable-voodoo' variable
was introduced two years ago, so it is easy to imagine that not a
few people tripped on it.

> [Indeed, why does this "voodoo" exist in the first place???  Even for
> western names, it seems kind of stupid... but clearly somebody went to a
> lot of trouble to implement it.]

I believe such a basic function should be simple.  It was once
discussed in emacs-devel[1] (though at least Richard seemed not to
be interested so much).

[1] http://news.gmane.org/group/gmane.emacs.devel/thread=25937
* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
From: Ted Zlatanov @ 2006-10-17 14:58 UTC

On 16 Oct 2006, reinersteib+from-uce@imap.cc wrote:

> We have `gnus-extract-address-components' (which uses the variable
> `gnus-extract-address-components'), `mail-extract-address-components' and
> `ietf-drums-parse-address':
...
> I don't know what's the difference between
> `mail-extract-address-components' and `ietf-drums-parse-address'
> (speed, reliability, ...)?  Anyone?

I looked at ietf-drums-parse-address and it looked right.  But as you
note, there are three ways to do the same thing.  So I would unify
everything under gnus-extract-address-components with an optional
'mail or 'ietf parameter or something like that (user should be able
to override it, I think).  Otherwise we'll just run into the same
problem next year.

> It looks like the "@" in the `FULL-NAME' confused
> `gnus-extract-address-components'.  Maybe we can also add a simple fix
> in `gnus-extract-address-components' for this quite common case.

That's fine, but it's still quick-and-dirty and for spam-check-BBDB
for instance it's OK to be a little slower; getting bad results is
much worse.

Let me know.  I honestly don't have a preference as long as the bug is
fixed and everyone is happy.

Ted
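The "one entry point with an optional 'mail or 'ietf parameter" idea could look roughly like the sketch below. It is an illustration only: `my/extract-address-components' and its METHOD argument are hypothetical names, not an existing Gnus interface, and the normalization of the `ietf-drums-parse-address' result assumes the (ADDRESS . NAME) cons shape seen in the examples in this thread.

```elisp
(require 'gnus-util)   ; gnus-extract-address-components
(require 'mail-extr)   ; mail-extract-address-components
(require 'ietf-drums)  ; ietf-drums-parse-address

(defun my/extract-address-components (from &optional method)
  "Return a (NAME ADDRESS) list for the header value FROM.
METHOD is a hypothetical selector: nil uses the Gnus parser,
`mail' uses mail-extr, and `ietf' uses ietf-drums."
  (cond ((eq method 'mail)
         (mail-extract-address-components from))
        ((eq method 'ietf)
         ;; ietf-drums returns a cons (ADDRESS . NAME); reorder it so
         ;; that all three back ends share the (NAME ADDRESS) shape.
         (let ((parsed (ietf-drums-parse-address from)))
           (list (cdr parsed) (car parsed))))
        (t
         (gnus-extract-address-components from))))

;; Callers pick the parser explicitly and read the address the same way.
(nth 1 (my/extract-address-components "Foo Bar <foo@example.com>" 'ietf))
```

A wrapper like this keeps the caller's `(nth 1 ...)' idiom unchanged while leaving the speed versus accuracy trade-off to whoever supplies METHOD.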
* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
From: Reiner Steib @ 2006-10-28 10:22 UTC
Cc: Ted Zlatanov

On Tue, Oct 17 2006, Ted Zlatanov wrote:

> On 16 Oct 2006, reinersteib+from-uce@imap.cc wrote:
>> It looks like the "@" in the `FULL-NAME' confused
>> `gnus-extract-address-components'.  Maybe we can also add a simple fix
>> in `gnus-extract-address-components' for this quite common case.

Here's a suggestion for such a fix.  Does anyone see a problem with
it?

--8<---------------cut here---------------start------------->8---
--- gnus-util.el	04 Oct 2006 12:52:12 +0200	7.54
+++ gnus-util.el	28 Oct 2006 12:13:37 +0200
@@ -170,11 +170,16 @@
 solution than `mail-extract-address-components', which works much
 better, but is slower."
   (let (name address)
-    ;; First find the address - the thing with the @ in it.  This may
-    ;; not be accurate in mail addresses, but does the trick most of
-    ;; the time in news messages.
-    (when (string-match "\\b[^@ \t<>]+[!@][^@ \t<>]+\\b" from)
-      (setq address (substring from (match-beginning 0) (match-end 0))))
+    ;; Special case: "foo@bar" <foo@bar>, i.e. one @ in the comment and one in
+    ;; the address.
+    (cond ((string-match "[ \t]*\"[^\"]+@[^\"]+\"" from)
+           (setq address (substring from (match-end 0))
+                 name ""))
+          ;; First find the address - the thing with the @ in it.  This may
+          ;; not be accurate in mail addresses, but does the trick most of
+          ;; the time in news messages.
+          ((string-match "\\b[^@ \t<>]+[!@][^@ \t<>]+\\b" from)
+           (setq address (substring from (match-beginning 0) (match-end 0)))))
     ;; Then we check whether the "name <address>" format is used.
     (and address
          ;; Linear white space is not required.
--8<---------------cut here---------------end--------------->8---

> That's fine, but it's still quick-and-dirty and for spam-check-BBDB
> for instance it's OK to be a little slower; getting bad results is
> much worse.
>
> Let me know.  I honestly don't have a preference as long as the bug is
> fixed and everyone is happy.

If I understand correctly, using `ietf-drums-parse-address' works
"correctly" for the example in the bug report, but it makes things
worse in many other situations (sorry for abusing your name in the
examples):

,----
| ELISP> (ietf-drums-parse-address "王ヶ頭ホテル <ougatou@ex.invalid>")
| ("ougatou@ex.invalid")
| ELISP> (ietf-drums-parse-address "Теодор Златанов <loc@ex.invalid>")
| ("loc@ex.invalid")
| ELISP> (ietf-drums-parse-address "Тéödór Äpfélmuß <loc@ex.invalid>")
| ("loc@ex.invalid" . "dór pfélmuß")
| ELISP> (gnus-extract-address-components "王ヶ頭ホテル <ougatou@ex.invalid>")
| ("王ヶ頭ホテル" "ougatou@ex.invalid")
| ELISP> (gnus-extract-address-components "Теодор Златанов <loc@ex.invalid>")
| ("Теодор Златанов" "loc@ex.invalid")
| ELISP> (gnus-extract-address-components "Тéödór Äpfélmuß <loc@ex.invalid>")
| ("Тéödór Äpfélmuß" "loc@ex.invalid")
`----

I'd suggest installing my change to `gnus-extract-address-components'
and switching back to `gnus-extract-address-components' in
`spam-*-BBDB'.  Or make a defcustom in `spam.el' allowing the user to
customize the function to use.

Bye, Reiner.

P.S.: I've fixed (capitalization and sentence end) some of your recent
ChangeLog entries [1].  Could you please fix the other entries (in
trunk and v5-10) as well?  Could you also fix the following
problematic entry?

,----
| 2004-04-22  Teodor Zlatanov  <tzz@lifelogs.com>
|
|       FIXME: Make separate entries for each person.
|
|       From Dan Christensen <jdc@uwo.ca>, asjo@koldfront.dk (Adam
|       Sjøgren), Wes Hardaker <wes@hardakers.net>, and Michael Shields
|       <shields@msrl.com>:
`----

[1] http://article.gmane.org/gmane.emacs.gnus.commits/4848
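The defcustom alternative mentioned in this message might be sketched as follows. All names here (`my-spam-extract-address-function', `my-spam-address-of') are hypothetical and do not exist in spam.el; the only assumption about the chosen function is that it returns a (NAME ADDRESS) list, as `gnus-extract-address-components' does in the examples above.

```elisp
(require 'gnus-util)

(defcustom my-spam-extract-address-function
  #'gnus-extract-address-components
  "Function that splits a From header value into (NAME ADDRESS).
Hypothetical option sketching the defcustom idea from this thread."
  :type 'function
  :group 'spam)

(defun my-spam-address-of (who)
  "Return the address part of WHO using the customized parser.
Mirrors the `(nth 1 ...)' idiom quoted from spam-check-BBDB."
  (nth 1 (funcall my-spam-extract-address-function who)))

;; With the default value this goes through the Gnus parser.
(my-spam-address-of "Foo Bar <foo@example.com>")
```

Users who prefer the slower but more thorough parser could then set the option to `mail-extract-address-components', which returns the same list shape.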
* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
From: Katsumi Yamaoka @ 2006-10-30 3:23 UTC
Cc: Ted Zlatanov

>>>>> In <v93b98oi3m.fsf@marauder.physik.uni-ulm.de>

> Here's a suggestion for such a fix.  Does anyone see a problem with
> it?

It doesn't strip brackets and whitespace:

(gnus-extract-address-components "\"foo@bar\" <bar@baz>")
 => (nil " <bar@baz>")

In addition, "foo@bar" might be a valid (nick)name of the sender.
How about the following?

--8<---------------cut here---------------start------------->8---
--- gnus-util.el~	2006-10-03 06:40:00 +0000
+++ gnus-util.el	2006-10-30 03:20:42 +0000
@@ -173,8 +173,12 @@
     ;; First find the address - the thing with the @ in it.  This may
     ;; not be accurate in mail addresses, but does the trick most of
     ;; the time in news messages.
-    (when (string-match "\\b[^@ \t<>]+[!@][^@ \t<>]+\\b" from)
-      (setq address (substring from (match-beginning 0) (match-end 0))))
+    (cond (;; Special case: "foo@bar" <foo@bar>, i.e. one @ in the comment
+           ;; and one in the address.
+           (string-match "<\\([^@ \t<>]+[!@][^@ \t<>]+\\)>" from)
+           (setq address (substring from (match-beginning 1) (match-end 1))))
+          ((string-match "\\b[^@ \t<>]+[!@][^@ \t<>]+\\b" from)
+           (setq address (substring from (match-beginning 0) (match-end 0)))))
     ;; Then we check whether the "name <address>" format is used.
     (and address
          ;; Linear white space is not required.
--8<---------------cut here---------------end--------------->8---
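To try the two-branch lookup from the patch above outside of Gnus, a standalone sketch such as the following may help. `my/address-of' is a hypothetical helper, the regexps are copied from the patch, and the standard syntax table is pinned as an assumption so that `\b' does not depend on the current buffer. It covers only the address lookup, not the name extraction that follows in the real function.

```elisp
(defun my/address-of (from)
  "Return the address part of FROM, preferring a <bracketed> address.
Hypothetical helper mirroring the two-branch `cond' in the patch."
  (with-syntax-table (standard-syntax-table)
    (cond
     ;; First choice: an address enclosed in angle brackets.
     ((string-match "<\\([^@ \t<>]+[!@][^@ \t<>]+\\)>" from)
      (match-string 1 from))
     ;; Fallback: the first token containing @ or !, as before.
     ((string-match "\\b[^@ \t<>]+[!@][^@ \t<>]+\\b" from)
      (match-string 0 from)))))

(my/address-of "\"foo@bar\" <bar@baz>")  ; => "bar@baz"
(my/address-of "foo@bar.example")        ; => "foo@bar.example"
```

Checking the bracketed form first means a display name that merely looks like an address no longer shadows the real one, while bare addresses still go through the old regexp.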
* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
From: Reiner Steib @ 2006-10-30 11:56 UTC

[ Resent.  The first try (<news:v9ejsq16u9.fsf@marauder.physik.uni-ulm.de>
  posted via Gmane) didn't appear on gnus.org nor gmane.org yet. ]

On Mon, Oct 30 2006, Katsumi Yamaoka wrote:

>>>>>> In <v93b98oi3m.fsf@marauder.physik.uni-ulm.de>
>
>> Here's a suggestion for such a fix.  Does anyone see a problem with
>> it?
>
> It doesn't strip brackets and whitespace:
>
> (gnus-extract-address-components "\"foo@bar\" <bar@baz>")
>  => (nil " <bar@baz>")

Oops.  Thanks for checking.

> In addition, "foo@bar" might be a valid (nick)name of the sender.
> How about the following?

Beside the comment (see [1]), it looks fine.  I'd suggest to install it in
trunk and v5-10 if nobody finds a problem with it.

With this change to `gnus-extract-address-components', can we revert spam.el
to revision 7.82 or do we need to include parts of the changes from revision
7.82 to 7.85?  If so, we probably also need to backport these to v5-10.

Bye, Reiner.

[1] This comment...

> +    (cond (;; Special case: "foo@bar" <foo@bar>, i.e. one @ in the comment
> +           ;; and one in the address.

... doesn't match the code:

> +           (string-match "<\\([^@ \t<>]+[!@][^@ \t<>]+\\)>" from)
> +           (setq address (substring from (match-beginning 1) (match-end 1))))
* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
From: Ted Zlatanov @ 2006-10-30 19:00 UTC

On 30 Oct 2006, reinersteib+gmane@imap.cc wrote:

> With this change to `gnus-extract-address-components', can we revert spam.el
> to revision 7.82 or do we need to include parts of the changes from revision
> 7.82 to 7.85?  If so, we probably also need to backport these to v5-10.

I would rather revert to the original version.  Using g-e-a-c makes a
lot more sense, as it was originally.  Also, I had to build extra
precautions because ietf-drums-parse-address broke on many addresses,
which is just annoying.

Ted
* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
From: Reiner Steib @ 2006-11-14 17:21 UTC

On Mon, Oct 30 2006, Reiner Steib wrote:

> Beside the comment (see [1]), it looks fine.  I'd suggest to install it in
> trunk and v5-10 if nobody finds a problem with it.
>
> With this change to `gnus-extract-address-components', can we revert spam.el
> to revision 7.82 or do we need to include parts of the changes from revision
> 7.82 to 7.85?  If so, we probably also need to backport these to v5-10.

Now that Ted has reverted spam.el, I think we should also add your
patch for `gnus-extract-address-components' (or did you encounter any
problems with it in the meantime?).

Bye, Reiner.
* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
From: Katsumi Yamaoka @ 2006-11-14 23:41 UTC

>>>>> In <v9irhi3pwq.fsf@marauder.physik.uni-ulm.de>
>>>>> Reiner Steib wrote:

> Now that Ted has reverted spam.el, I think we should also add your
> patch for `gnus-extract-address-components' (or did you encounter any
> problems with it in the meantime?).

I've installed the change of <b9ypscazduf.fsf@jpl.org>.  I think it
can be regarded as a simple bug fix, so I've done in the v5-10 branch
as well.

Regards,