Gnus development mailing list
* scoring on subject
@ 2004-06-17 13:43 Dirk Meyer
  2004-06-17 17:05 ` Ted Zlatanov
  2004-06-17 20:22 ` Jesper Harder
  0 siblings, 2 replies; 5+ messages in thread
From: Dirk Meyer @ 2004-06-17 13:43 UTC


Hi,

I want to filter some spam from Japan in newsgroups. Even if it's not
spam, I can't read it. I could score on the content-type:

("head"
  ("Content-Type: text/plain; charset=\"ISO-2022-JP\"" -1000 nil S))

That works, but scoring on "head" makes scoring _very_ slow.
Scoring on the subject would be easy. For example, a subject like this:

Subject: =?iso-2022-jp?b?GyRCN2MwQhsoQlBDGyRCJT0lVSVIRExITiEjNHw0VjhCGyhC?=
	=?iso-2022-jp?b?GyRCRGokSSRsJEckYhsoQjUbJEJLXCRHGyhCMTUwMDA=?=
	=?iso-2022-jp?b?GyRCISobKEI=?=

can be detected as spam by scanning for iso-2022-jp. But Gnus
translated it into the correct character set, so scoring doesn't work
anymore. How can I score on the subject as it is in the mail?


Dischi

-- 
panic ("No CPUs found.  System halted.\n");
	2.4.3 linux/arch/parisc/kernel/setup.c






* Re: scoring on subject
  2004-06-17 13:43 scoring on subject Dirk Meyer
@ 2004-06-17 17:05 ` Ted Zlatanov
  2004-06-17 20:22 ` Jesper Harder
  1 sibling, 0 replies; 5+ messages in thread
From: Ted Zlatanov @ 2004-06-17 17:05 UTC
  Cc: ding

On Thu, 17 Jun 2004, dischi@tzi.de wrote:

> Subject: =?iso-2022-jp?b?GyRCN2MwQhsoQlBDGyRCJT0lVSVIRExITiEjNHw0VjhCGyhC?=
> 	=?iso-2022-jp?b?GyRCRGokSSRsJEckYhsoQjUbJEJLXCRHGyhCMTUwMDA=?=
> 	=?iso-2022-jp?b?GyRCISobKEI=?=
> 
> can be detected as spam by scanning for iso-2022-jp. But Gnus
> translated it into the correct character set, so scoring doesn't work
> anymore. How can I score on the subject as it is in the mail?

Maybe what you really need is to find out the character set of a
subject string, and that should be the scoring test
(subject-encoding)?  I think Gnus should keep presenting the subject
string in its decoded form, because the only thing you really need out
of the example above is the encoding.  The rest is useless without
decoding.

Ted




* Re: scoring on subject
  2004-06-17 13:43 scoring on subject Dirk Meyer
  2004-06-17 17:05 ` Ted Zlatanov
@ 2004-06-17 20:22 ` Jesper Harder
  2004-06-17 23:10   ` Katsumi Yamaoka
  1 sibling, 1 reply; 5+ messages in thread
From: Jesper Harder @ 2004-06-17 20:22 UTC


Dirk Meyer <dischi@tzi.de> writes:

> Subject: =?iso-2022-jp?b?GyRCN2MwQhsoQlBDGyRCJT0lVSVIRExITiEjNHw0VjhCGyhC?=
> 	=?iso-2022-jp?b?GyRCRGokSSRsJEckYhsoQjUbJEJLXCRHGyhCMTUwMDA=?=
> 	=?iso-2022-jp?b?GyRCISobKEI=?=
>
> can be detected as spam by scanning for iso-2022-jp. But Gnus
> translated it into the correct character set, so scoring doesn't work
> anymore. How can I score on the subject as it is in the mail?

Toggle `nnmail-mail-splitting-decodes'.  Or maybe better use regexp
categories, e.g. \cj matches Japanese.

-- 
Jesper Harder                                <http://purl.org/harder/>




* Re: scoring on subject
  2004-06-17 20:22 ` Jesper Harder
@ 2004-06-17 23:10   ` Katsumi Yamaoka
  2004-06-18 14:44     ` Dirk Meyer
  0 siblings, 1 reply; 5+ messages in thread
From: Katsumi Yamaoka @ 2004-06-17 23:10 UTC


>>>>> In <m3llim53z4.fsf@defun.localdomain>
>>>>>	Jesper Harder <harder@ifa.au.dk> wrote:

> Dirk Meyer <dischi@tzi.de> writes:

>> Subject: =?iso-2022-jp?b?GyRCN2MwQhsoQlBDGyRCJT0lVSVIRExITiEjNHw0VjhCGyhC?=
>> 	=?iso-2022-jp?b?GyRCRGokSSRsJEckYhsoQjUbJEJLXCRHGyhCMTUwMDA=?=
>> 	=?iso-2022-jp?b?GyRCISobKEI=?=

That really is spam, selling illegally copied software.

> Toggle `nnmail-mail-splitting-decodes'.  Or maybe better use regexp
> categories, e.g. \cj matches Japanese.

And also \cc, \ch and \cy can be used for Chinese, Korean and
Cyrillic respectively (note that \ should be quoted as "\\").
You can find them in the international/characters.el file.
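
As a rough sketch of the suggestion above, score file entries using
these categories might look like the following. The -1000 score is
simply borrowed from the "head" example earlier in the thread, and the
`r' (regexp) match type is an assumption about how the entries would be
written. Which characters belong to each category depends on the Emacs
version in use.

```elisp
;; Hypothetical score file entries: lower the score of any article
;; whose decoded Subject contains a character in the given category.
;; Inside the string each backslash is doubled, so "\\cj" is the
;; regexp \cj.
(("subject"
  ("\\cj" -1000 nil r)    ; Japanese
  ("\\cc" -1000 nil r)    ; Chinese
  ("\\ch" -1000 nil r)    ; Korean
  ("\\cy" -1000 nil r)))  ; Cyrillic
```

Because these entries match the subject after Gnus has decoded it, they
do not depend on seeing the raw iso-2022-jp encoded-words, and they
avoid the slower "head" scoring.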

BTW, since I similarly cannot read those languages, I'm using
the following procmail filters:

:0
* ^Content-Type:.+charset=\"?(\
default_charset\
|euc-kr\
|gb2312(_charset)?\
|ks_c_5601-1987\
|windows-125[12]\
)\"?
/dev/null

:0
* (^From:|^Subject:).*=\?(\
big5\
|euc-kr\
|gb2312\
|ks_c_5601-1987\
|windows-125[12]\
)\?
/dev/null
-- 
Katsumi Yamaoka <yamaoka@jpl.org>




* Re: scoring on subject
  2004-06-17 23:10   ` Katsumi Yamaoka
@ 2004-06-18 14:44     ` Dirk Meyer
  0 siblings, 0 replies; 5+ messages in thread
From: Dirk Meyer @ 2004-06-18 14:44 UTC


Katsumi Yamaoka wrote:
>>>>>> In <m3llim53z4.fsf@defun.localdomain>
>>>>>>	Jesper Harder <harder@ifa.au.dk> wrote:
>
>> Dirk Meyer <dischi@tzi.de> writes:
>
>>> Subject: =?iso-2022-jp?b?GyRCN2MwQhsoQlBDGyRCJT0lVSVIRExITiEjNHw0VjhCGyhC?=
>>> 	=?iso-2022-jp?b?GyRCRGokSSRsJEckYhsoQjUbJEJLXCRHGyhCMTUwMDA=?=
>>> 	=?iso-2022-jp?b?GyRCISobKEI=?=
>
> That really is spam, selling illegally copied software.
>
>> Toggle `nnmail-mail-splitting-decodes'.  Or maybe better use regexp
>> categories, e.g. \cj matches Japanese.
>
> And also \cc, \ch and \cy can be used for Chinese, Korean and
> Cyrillic respectively (note that \ should be quoted as "\\").
> You can find them in the international/characters.el file.

Thanks, that's what I was looking for. Works great. 

> BTW, since I similarly cannot read those languages, I'm using
> the following procmail filters:

I have procmail filters, too, but I need a _fast_ filter for
newsgroups.


Dischi

-- 
Monday is an awful way to spend 1/7th of your life.




