Gnus development mailing list
* ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components (was: bug in spam-check-BBDB)
       [not found] ` <g69hcy4t8g0.fsf@lifelogs.com>
@ 2006-10-16 21:42   ` Reiner Steib
  2006-10-16 22:47     ` ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components Katsumi Yamaoka
  2006-10-17 14:58     ` Ted Zlatanov
  0 siblings, 2 replies; 11+ messages in thread
From: Reiner Steib @ 2006-10-16 21:42 UTC
  Cc: bugs, Ted Zlatanov

[ Let's shift this to ding.  MFT set to ding. ]

On Mon, Oct 16 2006, Ted Zlatanov wrote:

> On  9 Oct 2006, damien@repose.cx wrote:
[...]
>> spam-check-BBDB uses the following code to derive the address portion
>> of an email:
>>
>> 	  (setq who (nth 1 (gnus-extract-address-components who)))
>>
>> On an address like:
>>
>> "front@kumamoto.ark-hotel.co.jp"<front@kumamoto.ark-hotel.co.jp>
>>
>> this returns
>>
>> "front@kumamoto.ark-hotel.co.jp
>>
>> (note the leading quote)
>>
>> I fixed the problem by using the following line instead:
>>
>> 	  (setq who (car (ietf-drums-parse-address who)))
>
> Thanks very much, Damien.  I committed the fix to spam.el; take a look
> and let us know if it works better for you.

We have `gnus-extract-address-components' (which uses the variable
`gnus-extract-address-components'), `mail-extract-address-components' and
`ietf-drums-parse-address':

,----[ <f1> v gnus-extract-address-components RET ]
| gnus-extract-address-components is a variable defined in `gnus'.
| Its value is 
| gnus-extract-address-components
| 
| Documentation:
| *Function for extracting address components from a From header.
| Two pre-defined function exist: `gnus-extract-address-components',
| which is the default, quite fast, and too simplistic solution, and
| `mail-extract-address-components', which works much better, but is
| slower.
`----

I don't know what the difference is between
`mail-extract-address-components' and `ietf-drums-parse-address'
(speed, reliability, ...).  Anyone?

It looks like the "@" in the `FULL-NAME' confused
`gnus-extract-address-components'.  Maybe we can also add a simple fix
in `gnus-extract-address-components' for this quite common case.

So I'm not sure if using `ietf-drums-parse-address' is the correct
fix.
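
A minimal sketch of why the quote leaks into the address.  It uses only
the address regexp from `gnus-extract-address-components' (the one shown
in the gnus-util.el patches in this thread); the helper name
`my-first-address-match' is hypothetical and not part of Gnus.  The
match runs in a temporary buffer on the assumption that the standard
syntax table, where `"' is not a word constituent, is in effect for `\b':

```elisp
;; Hypothetical helper, not part of Gnus: return the first substring of
;; FROM matched by the address regexp.
(defun my-first-address-match (from)
  "Return the first address-like substring of FROM, or nil."
  (with-temp-buffer
    ;; `\\b' depends on the current syntax table; a temporary buffer
    ;; uses the standard one.
    (when (string-match "\\b[^@ \t<>]+[!@][^@ \t<>]+\\b" from)
      (match-string 0 from))))

(my-first-address-match
 "\"front@kumamoto.ark-hotel.co.jp\"<front@kumamoto.ark-hotel.co.jp>")
```

The regexp takes the first thing with an "@" in it, so the match starts
at the opening quote of the name part and stops at the word boundary
before the closing quote.  That gives the result quoted in the bug
report.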

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/




* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
  2006-10-16 21:42   ` ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components (was: bug in spam-check-BBDB) Reiner Steib
@ 2006-10-16 22:47     ` Katsumi Yamaoka
  2006-10-16 23:06       ` Miles Bader
  2006-10-17 14:58     ` Ted Zlatanov
  1 sibling, 1 reply; 11+ messages in thread
From: Katsumi Yamaoka @ 2006-10-16 22:47 UTC


>>>>> In <v98xjgvsxr.fsf_-_@marauder.physik.uni-ulm.de>
>>>>>	Reiner Steib wrote:

> I don't know what the difference is between
> `mail-extract-address-components' and `ietf-drums-parse-address'
> (speed, reliability, ...).  Anyone?

`ietf-drums-parse-address' doesn't work with non-ASCII names but
`gnus-extract-address-components' does.  So does
`mail-extract-address-components' but it also performs a voodoo
ceremony (which is currently disabled for Japanese names by
default).

(ietf-drums-parse-address "王ヶ頭ホテル <ougatou@example.com>")
 => ("ougatou@example.com")

(gnus-extract-address-components "王ヶ頭ホテル <ougatou@example.com>")
 => ("王ヶ頭ホテル" "ougatou@example.com")

(let ((mail-extr-disable-voodoo nil)) ;; Enable voodoo.
  (mail-extract-address-components "王ヶ頭ホテル <ougatou@example.com>"))
 => ("王. ヶ. 頭. ホテル" "ougatou@example.com")




* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
  2006-10-16 22:47     ` ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components Katsumi Yamaoka
@ 2006-10-16 23:06       ` Miles Bader
  2006-10-16 23:58         ` Katsumi Yamaoka
  0 siblings, 1 reply; 11+ messages in thread
From: Miles Bader @ 2006-10-16 23:06 UTC


Katsumi Yamaoka <yamaoka@jpl.org> writes:
> (let ((mail-extr-disable-voodoo nil)) ;; Enable voodoo.
>   (mail-extract-address-components "王ヶ頭ホテル <ougatou@example.com>"))
>  => ("王. ヶ. 頭. ホテル" "ougatou@example.com")

Why would one _want_ to enable the "voodoo" for Japanese names?

[Indeed, why does this "voodoo" exist in the first place???  Even for
western names, it seems kind of stupid... but clearly somebody went to a
lot of trouble to implement it.]

-Miles

-- 
(\(\
(^.^)
(")")
*This is the cute bunny virus, please copy this into your sig so it can spread.





* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
  2006-10-16 23:06       ` Miles Bader
@ 2006-10-16 23:58         ` Katsumi Yamaoka
  0 siblings, 0 replies; 11+ messages in thread
From: Katsumi Yamaoka @ 2006-10-16 23:58 UTC


>>>>> In <8764ej273p.fsf@catnip.gol.com>
>>>>>	Miles Bader <miles@gnu.org> wrote:

> Katsumi Yamaoka <yamaoka@jpl.org> writes:

>> (let ((mail-extr-disable-voodoo nil)) ;; Enable voodoo.
>>   (mail-extract-address-components "王ヶ頭ホテル <ougatou@example.com>"))
>>  => ("王. ヶ. 頭. ホテル" "ougatou@example.com")

> Why would one _want_ to enable the "voodoo" for Japanese names?

It was always enabled until the `mail-extr-disable-voodoo'
variable was introduced two years ago, so it is easy to imagine
that not a few people tripped on it.

> [Indeed, why does this "voodoo" exist in the first place???  Even for
> western names, it seems kind of stupid... but clearly somebody went to a
> lot of trouble to implement it.]

I believe such a basic function should be simple.  It was once
discussed in emacs-devel[1] (though Richard, at least, did not seem
very interested).

[1] http://news.gmane.org/group/gmane.emacs.devel/thread=25937




* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
  2006-10-16 21:42   ` ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components (was: bug in spam-check-BBDB) Reiner Steib
  2006-10-16 22:47     ` ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components Katsumi Yamaoka
@ 2006-10-17 14:58     ` Ted Zlatanov
  2006-10-28 10:22       ` Reiner Steib
  1 sibling, 1 reply; 11+ messages in thread
From: Ted Zlatanov @ 2006-10-17 14:58 UTC


On 16 Oct 2006, reinersteib+from-uce@imap.cc wrote:

> We have `gnus-extract-address-components' (which uses the variable
> `gnus-extract-address-components'), `mail-extract-address-components' and
> `ietf-drums-parse-address':
...
> I don't know what the difference is between
> `mail-extract-address-components' and `ietf-drums-parse-address'
> (speed, reliability, ...).  Anyone?

I looked at ietf-drums-parse-address and it looked right.  But as you
note, there are three ways to do the same thing.  So I would unify
everything under gnus-extract-address-components with an optional
'mail or 'ietf parameter or something like that (the user should be able
to override it, I think).  Otherwise we'll just run into the same
problem next year.

> It looks like the "@" in the `FULL-NAME' confused
> `gnus-extract-address-components'.  Maybe we can also add a simple fix
> in `gnus-extract-address-components' for this quite common case.

That's fine, but it's still quick-and-dirty.  For spam-check-BBDB,
for instance, it's OK to be a little slower; getting bad results is
much worse.

Let me know.  I honestly don't have a preference as long as the bug is
fixed and everyone is happy.

Ted




* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
  2006-10-17 14:58     ` Ted Zlatanov
@ 2006-10-28 10:22       ` Reiner Steib
  2006-10-30  3:23         ` Katsumi Yamaoka
  0 siblings, 1 reply; 11+ messages in thread
From: Reiner Steib @ 2006-10-28 10:22 UTC
  Cc: Ted Zlatanov

On Tue, Oct 17 2006, Ted Zlatanov wrote:

> On 16 Oct 2006, reinersteib+from-uce@imap.cc wrote:
>> It looks like the "@" in the `FULL-NAME' confused
>> `gnus-extract-address-components'.  Maybe we can also add a simple fix
>> in `gnus-extract-address-components' for this quite common case.

Here's a suggestion for such a fix.  Does anyone see a problem with
it?

--8<---------------cut here---------------start------------->8---
--- gnus-util.el	04 Oct 2006 12:52:12 +0200	7.54
+++ gnus-util.el	28 Oct 2006 12:13:37 +0200	
@@ -170,11 +170,16 @@
 solution than `mail-extract-address-components', which works much better, but
 is slower."
   (let (name address)
-    ;; First find the address - the thing with the @ in it.  This may
-    ;; not be accurate in mail addresses, but does the trick most of
-    ;; the time in news messages.
-    (when (string-match "\\b[^@ \t<>]+[!@][^@ \t<>]+\\b" from)
-      (setq address (substring from (match-beginning 0) (match-end 0))))
+    ;; Special case: "foo@bar" <foo@bar>, i.e. one @ in the comment and one in
+    ;; the address.
+    (cond ((string-match "[ \t]*\"[^\"]+@[^\"]+\"" from)
+	   (setq address (substring from (match-end 0))
+		 name ""))
+	  ;; First find the address - the thing with the @ in it.  This may
+	  ;; not be accurate in mail addresses, but does the trick most of
+	  ;; the time in news messages.
+	  ((string-match "\\b[^@ \t<>]+[!@][^@ \t<>]+\\b" from)
+	   (setq address (substring from (match-beginning 0) (match-end 0)))))
     ;; Then we check whether the "name <address>" format is used.
     (and address
 	 ;; Linear white space is not required.
--8<---------------cut here---------------end--------------->8---

> That's fine, but it's still quick-and-dirty.  For spam-check-BBDB,
> for instance, it's OK to be a little slower; getting bad results is
> much worse.
>
> Let me know.  I honestly don't have a preference as long as the bug is
> fixed and everyone is happy.

If I understand correctly, using `ietf-drums-parse-address' works
"correctly" for the example in the bug report, but it makes things worse
in many other situations (sorry for abusing your name in the examples):

,----
| ELISP> (ietf-drums-parse-address "王ヶ頭ホテル <ougatou@ex.invalid>")
| ("ougatou@ex.invalid")
| ELISP> (ietf-drums-parse-address "Теодор Златанов <loc@ex.invalid>")
| ("loc@ex.invalid")
| ELISP> (ietf-drums-parse-address "Тéödór Äpfélmuß <loc@ex.invalid>")
| ("loc@ex.invalid" . "dór pfélmuß")
| ELISP> (gnus-extract-address-components "王ヶ頭ホテル <ougatou@ex.invalid>")
| ("王ヶ頭ホテル" "ougatou@ex.invalid")
| ELISP> (gnus-extract-address-components "Теодор Златанов <loc@ex.invalid>")
| ("Теодор Златанов" "loc@ex.invalid")
| ELISP> (gnus-extract-address-components "Тéödór Äpfélmuß <loc@ex.invalid>")
| ("Тéödór Äpfélmuß" "loc@ex.invalid")
`----

I'd suggest installing my change to `gnus-extract-address-components'
and switching back to `gnus-extract-address-components' in `spam-*-BBDB'.
Or we could add a defcustom in `spam.el' that lets the user customize the
function to use.
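
A rough sketch of what such a defcustom could look like.  The option
name `my-spam-extract-address-function' and the helper
`my-spam-address-of' are hypothetical and are not taken from spam.el;
the only assumptions about Gnus are that `gnus-util' provides
`gnus-extract-address-components' and that both candidate functions
return a list of the form (NAME ADDRESS):

```elisp
(require 'gnus-util)   ; provides `gnus-extract-address-components'

;; Hypothetical option; the name is illustrative, not from spam.el.
(defcustom my-spam-extract-address-function
  #'gnus-extract-address-components
  "Function used to split a From header into a list (NAME ADDRESS)."
  :type '(choice (function-item gnus-extract-address-components)
                 (function-item mail-extract-address-components)
                 (function :tag "Other function"))
  :group 'spam)

;; Hypothetical helper showing how a caller would use the option.
(defun my-spam-address-of (from)
  "Return the address part of FROM via `my-spam-extract-address-function'."
  (nth 1 (funcall my-spam-extract-address-function from)))

(my-spam-address-of "Some Name <user@example.com>")
```

`ietf-drums-parse-address' is left out of the choices on purpose: as the
examples above show, it returns a cons (ADDRESS . NAME) rather than a
list (NAME ADDRESS), so it would need a small wrapper before it could be
used here.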

Bye, Reiner.

P.S.: I've fixed (capitalization and sentence end) some of your recent
      ChangeLog entries [1].  Could you please fix the other entries
      (in trunk and v5-10) as well?  Could you also fix the following
      problematic entry?

,----
| 2004-04-22  Teodor Zlatanov  <tzz@lifelogs.com>
| 
| 	FIXME: Make separate entries for each person.
| 
| 	From Dan Christensen <jdc@uwo.ca>, asjo@koldfront.dk (Adam
| 	Sjøgren), Wes Hardaker <wes@hardakers.net>, and Michael Shields
| 	<shields@msrl.com>:
`----

[1] http://article.gmane.org/gmane.emacs.gnus.commits/4848
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/




* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
  2006-10-28 10:22       ` Reiner Steib
@ 2006-10-30  3:23         ` Katsumi Yamaoka
  2006-10-30 11:56           ` Reiner Steib
  0 siblings, 1 reply; 11+ messages in thread
From: Katsumi Yamaoka @ 2006-10-30  3:23 UTC
  Cc: Ted Zlatanov

>>>>> In <v93b98oi3m.fsf@marauder.physik.uni-ulm.de>

> Here's a suggestion for such a fix.  Does anyone see a problem with
> it?

It doesn't strip brackets and whitespace:

(gnus-extract-address-components "\"foo@bar\" <bar@baz>")
 => (nil " <bar@baz>")

In addition, "foo@bar" might be a valid (nick)name of the sender.
How about the following?

--8<---------------cut here---------------start------------->8---
--- gnus-util.el~	2006-10-03 06:40:00 +0000
+++ gnus-util.el	2006-10-30 03:20:42 +0000
@@ -173,8 +173,12 @@
     ;; First find the address - the thing with the @ in it.  This may
     ;; not be accurate in mail addresses, but does the trick most of
     ;; the time in news messages.
-    (when (string-match "\\b[^@ \t<>]+[!@][^@ \t<>]+\\b" from)
-      (setq address (substring from (match-beginning 0) (match-end 0))))
+    (cond (;; Special case: "foo@bar" <foo@bar>, i.e. one @ in the comment
+	   ;; and one in the address.
+	   (string-match "<\\([^@ \t<>]+[!@][^@ \t<>]+\\)>" from)
+	   (setq address (substring from (match-beginning 1) (match-end 1))))
+	  ((string-match "\\b[^@ \t<>]+[!@][^@ \t<>]+\\b" from)
+	   (setq address (substring from (match-beginning 0) (match-end 0)))))
     ;; Then we check whether the "name <address>" format is used.
     (and address
 	 ;; Linear white space is not required.
--8<---------------cut here---------------end--------------->8---
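
The address-finding step of this patch can be tried on its own with a
sketch like the following.  The two regexps are copied from the patch;
the function name `my-find-address' is hypothetical, and the rest of
`gnus-extract-address-components' (the name extraction) is deliberately
left out.  The temporary buffer is there only so that `\b' is evaluated
against the standard syntax table:

```elisp
;; Hypothetical stand-alone version of the address-finding step; only
;; the two regexps come from the patch.
(defun my-find-address (from)
  "Return the address part of FROM, preferring one in angle brackets."
  (with-temp-buffer
    (cond
     ;; An address enclosed in <...> wins, even if the name has an "@".
     ((string-match "<\\([^@ \t<>]+[!@][^@ \t<>]+\\)>" from)
      (match-string 1 from))
     ;; Otherwise fall back to the first thing that looks like an address.
     ((string-match "\\b[^@ \t<>]+[!@][^@ \t<>]+\\b" from)
      (match-string 0 from)))))

(my-find-address "\"foo@bar\" <bar@baz>")  ; uses the bracketed address
(my-find-address "foo@bar (Some Name)")     ; no brackets: fallback regexp
```

With the bracketed form tried first, an "@" in the name part no longer
matters, and the brackets and surrounding whitespace are stripped because
only the first match group is kept.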




* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
  2006-10-30  3:23         ` Katsumi Yamaoka
@ 2006-10-30 11:56           ` Reiner Steib
  2006-10-30 19:00             ` Ted Zlatanov
  2006-11-14 17:21             ` Reiner Steib
  0 siblings, 2 replies; 11+ messages in thread
From: Reiner Steib @ 2006-10-30 11:56 UTC


[ Resent.  The first try
  (<news:v9ejsq16u9.fsf@marauder.physik.uni-ulm.de> posted via Gmane)
  hasn't appeared on gnus.org or gmane.org yet. ]

On Mon, Oct 30 2006, Katsumi Yamaoka wrote:

>>>>>> In <v93b98oi3m.fsf@marauder.physik.uni-ulm.de>
>
>> Here's a suggestion for such a fix.  Does anyone see a problem with
>> it?
>
> It doesn't strip brackets and whitespace:
>
> (gnus-extract-address-components "\"foo@bar\" <bar@baz>")
>  => (nil " <bar@baz>")

Oops.  Thanks for checking.

> In addition, "foo@bar" might be a valid (nick)name of the sender.
> How about the following?

Besides the comment (see [1]), it looks fine.  I'd suggest installing it in
trunk and v5-10 if nobody finds a problem with it.

With this change to `gnus-extract-address-components', can we revert spam.el
to revision 7.82 or do we need to include parts of the changes from revision
7.82 to 7.85?  If so, we probably also need to backport these to v5-10.

Bye, Reiner.

[1] This comment...

> +    (cond (;; Special case: "foo@bar" <foo@bar>, i.e. one @ in the comment
> +	   ;; and one in the address.

... doesn't match the code:

> +	   (string-match "<\\([^@ \t<>]+[!@][^@ \t<>]+\\)>" from)
> +	   (setq address (substring from (match-beginning 1) (match-end 1))))
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/




* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
  2006-10-30 11:56           ` Reiner Steib
@ 2006-10-30 19:00             ` Ted Zlatanov
  2006-11-14 17:21             ` Reiner Steib
  1 sibling, 0 replies; 11+ messages in thread
From: Ted Zlatanov @ 2006-10-30 19:00 UTC


On 30 Oct 2006, reinersteib+gmane@imap.cc wrote:

> With this change to `gnus-extract-address-components', can we revert spam.el
> to revision 7.82 or do we need to include parts of the changes from revision
> 7.82 to 7.85?  If so, we probably also need to backport these to v5-10.

I would rather revert to the original version.  Using g-e-a-c, as it
was originally, makes a lot more sense.  Also, I had to build in extra
precautions because ietf-drums-parse-address broke on many addresses,
which is just annoying.

Ted




* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
  2006-10-30 11:56           ` Reiner Steib
  2006-10-30 19:00             ` Ted Zlatanov
@ 2006-11-14 17:21             ` Reiner Steib
  2006-11-14 23:41               ` Katsumi Yamaoka
  1 sibling, 1 reply; 11+ messages in thread
From: Reiner Steib @ 2006-11-14 17:21 UTC


On Mon, Oct 30 2006, Reiner Steib wrote:

> Besides the comment (see [1]), it looks fine.  I'd suggest installing it in
> trunk and v5-10 if nobody finds a problem with it.
>
> With this change to `gnus-extract-address-components', can we revert spam.el
> to revision 7.82 or do we need to include parts of the changes from revision
> 7.82 to 7.85?  If so, we probably also need to backport these to v5-10.

Now that Ted has reverted spam.el, I think we should also add your
patch for `gnus-extract-address-components' (or did you encounter any
problems with it in the meantime?).

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/





* Re: ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components
  2006-11-14 17:21             ` Reiner Steib
@ 2006-11-14 23:41               ` Katsumi Yamaoka
  0 siblings, 0 replies; 11+ messages in thread
From: Katsumi Yamaoka @ 2006-11-14 23:41 UTC


>>>>> In <v9irhi3pwq.fsf@marauder.physik.uni-ulm.de>
>>>>>	Reiner Steib wrote:

> Now that Ted has reverted spam.el, I think we should also add your
> patch for `gnus-extract-address-components' (or did you encounter any
> problems with it in the meantime?).

I've installed the change from <b9ypscazduf.fsf@jpl.org>.  I think
it can be regarded as a simple bug fix, so I've installed it in the v5-10
branch as well.

Regards,




end of thread, other threads:[~2006-11-14 23:41 UTC | newest]

Thread overview: 11+ messages
     [not found] <87mz851t59.fsf@mobile.repose.cx>
     [not found] ` <g69hcy4t8g0.fsf@lifelogs.com>
2006-10-16 21:42   ` ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components (was: bug in spam-check-BBDB) Reiner Steib
2006-10-16 22:47     ` ietf-drums-parse-address, gnus-extract-address-components, mail-extract-address-components Katsumi Yamaoka
2006-10-16 23:06       ` Miles Bader
2006-10-16 23:58         ` Katsumi Yamaoka
2006-10-17 14:58     ` Ted Zlatanov
2006-10-28 10:22       ` Reiner Steib
2006-10-30  3:23         ` Katsumi Yamaoka
2006-10-30 11:56           ` Reiner Steib
2006-10-30 19:00             ` Ted Zlatanov
2006-11-14 17:21             ` Reiner Steib
2006-11-14 23:41               ` Katsumi Yamaoka
