Gnus development mailing list
* Korean characters with nnmail-split-methods
From: Jinhyok Heo @ 2002-01-15 11:57 UTC


----
(setq nnmail-split-methods
      `(
	("SPAM"                  "^Subject:.*광고")
        ("SPAM"                  "^Subject:.*홍보")))
----

In short, the above split rule doesn't work.

광고 and 홍보 are Korean words meaning "advertisement" and "promotion", i.e. spam. :(

I tried to filter out that spam with the above split rule, but it
failed. Of course, the Subject headers of the spam messages were not
MIME-encoded with a charset.

-- 
| Jinhyok Heo (novembre @ ournature.org || http://ournature.org/~novembre/)
|--------------------------------------------------------------------------
| "We are still reaching for the sky. In the developed countries people
|  are coming back down, saying, `It's empty up there.'" --- a Ladakhi monk




* Re: Korean characters with nnmail-split-methods
From: Kai Großjohann @ 2002-01-15 14:08 UTC
  Cc: ding

Jinhyok Heo <novembreN0$PAM@ournature.org> writes:

> I tried to filter out that spam with the above split rule, but it
> failed. Of course, the Subject headers of the spam messages were not
> MIME-encoded with a charset.

Gnus doesn't decode before splitting for performance reasons.  Can
you put in variants for the various encodings?

kai
-- 
Simplification good!  Oversimplification bad!  (Larry Wall)
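One reading of Kai's suggestion is "match the undecoded spellings of each word". The following Python sketch (an illustration, not Gnus code) makes that reading concrete. It assumes the spam arrives as raw 8-bit EUC-KR, with UTF-8 added only for illustration, and it models an undecoded header as one Latin-1 character per byte. `raw_spellings` and `subject_rule` are hypothetical helpers.

```python
import re


def raw_spellings(word, charsets=("euc_kr", "utf_8")):
    """Hypothetical helper: WORD's bytes in each charset, one character per byte.

    Decoding the bytes as Latin-1 maps every byte to exactly one character,
    which models a header that nobody has decoded yet.
    """
    return {word.encode(cs).decode("latin-1") for cs in charsets}


def subject_rule(words, charsets=("euc_kr", "utf_8")):
    """Hypothetical helper: one Subject regexp matching every raw spelling."""
    alternatives = sorted(re.escape(s) for w in words for s in raw_spellings(w, charsets))
    return re.compile(r"^Subject:.*(?:" + "|".join(alternatives) + ")", re.MULTILINE)


rule = subject_rule(["광고", "홍보"])
spam = "Subject: 광고 now".encode("euc_kr").decode("latin-1")  # simulated raw header
print(bool(rule.search(spam)))  # True
```

Such a rule deliberately stops matching once a header has been decoded to 광고. It targets only what a splitter sees before any decoding.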




* Re: Korean characters with nnmail-split-methods
From: Oystein Viggen @ 2002-01-15 14:14 UTC
  Cc: ding

* [Jinhyok Heo] 

> ----
> (setq nnmail-split-methods
>       `(
> 	("SPAM"                  "^Subject:.*광고")
>         ("SPAM"                  "^Subject:.*홍보")))
> ----

I've had great success with the following procmail recipe:

:0 fhw
^Subject:.*=\?ks_c_5601-1987\?
| formail -A"X-Dustemikler: spam"

Then I can split on "X-Dustemikler: spam" in nnmail-split-methods.

Oystein
-- 
When in doubt: Flaunt.
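For readers without procmail, here is a Python sketch of the recipe's logic. It illustrates the recipe and does not replace procmail; `tag_header` is a hypothetical name. The match ignores case because procmail does so by default, and the tag is appended unconditionally, as `formail -A` does.

```python
import re

# The recipe's condition.  procmail ignores case unless the D flag is given,
# so IGNORECASE mirrors that; "." does not cross line boundaries here.
CONDITION = re.compile(r"^Subject:.*=\?ks_c_5601-1987\?", re.IGNORECASE | re.MULTILINE)


def tag_header(header: str, tag: str = "X-Dustemikler: spam") -> str:
    """Hypothetical stand-in for the `fhw` filter: append TAG (as `formail -A`
    does) when the Subject carries a ks_c_5601-1987 encoded-word."""
    if CONDITION.search(header):
        return header.rstrip("\n") + "\n" + tag + "\n"
    return header
```

On the Gnus side, a plain rule such as ("SPAM" "^X-Dustemikler: spam") in `nnmail-split-methods` should then pick up the tagged mail.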




* Re: Korean characters with nnmail-split-methods
From: Jinhyok Heo @ 2002-01-15 14:20 UTC


>>>>> "KG" == Kai Großjohann <Kai.Grossjohann@CS.Uni-Dortmund.DE> writes:

    KG> Gnus doesn't decode before splitting for performance reasons.
    KG> Can you put in variants for the various encodings?

There's been a misunderstanding.

The Subject headers of these spams are usually not encoded, so I think
Gnus doesn't have to decode them before splitting.

-- 
| Jinhyok Heo (novembre @ ournature.org || http://ournature.org/~novembre/)
|--------------------------------------------------------------------------
| "We are still reaching for the sky. In the developed countries people
|  are coming back down, saying, `It's empty up there.'" --- a Ladakhi monk




* Re: Korean characters with nnmail-split-methods
From: Henrik Enberg @ 2002-01-15 14:31 UTC
  Cc: ding

Jinhyok Heo <novembreN0$PAM@ournature.org> writes:

> The Subject headers of these spams are usually not encoded, so I think
> Gnus doesn't have to decode them before splitting.

I use the following nnmail-split-fancy rule with success.

        ("from\\|subject" ".*\\?ks_c_5601-1987" "mail.spam")

Henrik
-- 
I know the human being and fish can coexist peacefully.
		-- George W. Bush




* Re: Korean characters with nnmail-split-methods
From: Niklas Morberg @ 2002-01-15 14:33 UTC
  Cc: ding

I'm also having problems with this, see:

<http://www.gnus.org/list-archives/ding/200111/msg00526.html>
and
<http://www.gnus.org/list-archives/ding/200112/msg00420.html>

I've sent Lars the output from the lisp debugger (I had to
use breakpoints since the debugger was never entered --
maybe `Search failed' is not considered an error?).

Niklas






* Re: Korean characters with nnmail-split-methods
From: Kai Großjohann @ 2002-01-15 14:38 UTC
  Cc: Jinhyok Heo, ding

Oystein Viggen <oysteivi@tihlde.org> writes:

> I've had great success with the following procmail recipe:
>
> :0 fhw
> ^Subject:.*=\?ks_c_5601-1987\?
> | formail -A"X-Dustemikler: spam"

I have this vague feeling that a Korean like Jinhyok might get
legitimate mail with Korean subjects.  Therefore, it is probably not
a good idea for him (?) to delete all Korean mail.

kai
-- 
Simplification good!  Oversimplification bad!  (Larry Wall)




* Re: Korean characters with nnmail-split-methods
From: Henrik Enberg @ 2002-01-15 14:52 UTC
  Cc: Oystein Viggen, Jinhyok Heo, ding

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> Oystein Viggen <oysteivi@tihlde.org> writes:
>
>> I've had great success with the following procmail recipe:
>>
>> :0 fhw
>> ^Subject:.*=\?ks_c_5601-1987\?
>> | formail -A"X-Dustemikler: spam"
>
> I have this vague feeling that a Korean like Jinhyok might get
> legitimate mail with Korean subjects.  Therefore, it is probably not
> a good idea for him (?) to delete all Korean mail.

But don't legitimate senders normally encode their mail properly?  It
seems that most spammers fail to do so.

Henrik
-- 
If a person doesn't have the capacity that we all want that person to
have, I suspect hope is in the far distant future, if at all.
		-- George W. Bush





* Re: Korean characters with nnmail-split-methods
From: Jinhyok Heo @ 2002-01-19  4:51 UTC


>>>>> "HE" == Henrik Enberg <henrik@enberg.org> writes:

    HE> But don't legitimate senders normally encode their mail
    HE> properly?  It seems that most spammers fail to do so.

You're right in my case. Most of the spammers who bug me don't encode
their Subject headers.

-- 
| Jinhyok Heo (novembre @ ournature.org || http://ournature.org/~novembre/)
|--------------------------------------------------------------------------
| "We are still reaching for the sky. In the developed countries people
|  are coming back down, saying, `It's empty up there.'" --- a Ladakhi monk




* Re: Korean characters with nnmail-split-methods
From: Lars Magne Ingebrigtsen @ 2002-01-19 23:18 UTC


Niklas Morberg <niklas.morberg@axis.com> writes:

> I've sent Lars the output from the lisp debugger (I had to
> use breakpoints since the debugger was never entered --
> maybe `Search failed' is not considered an error?).

You probably have to set `debug-ignored-errors' to nil for it to give
you a backtrace on all errors.

-- 
(domestic pets only, the antidote for overdose, milk.)
   larsi@gnus.org * Lars Magne Ingebrigtsen




* Re: Korean characters with nnmail-split-methods
From: Lars Magne Ingebrigtsen @ 2002-01-19 23:22 UTC


Jinhyok Heo <novembreN0$PAM@ournature.org> writes:

> ----
> (setq nnmail-split-methods
>       `(
> 	("SPAM"                  "^Subject:.*광고")
>         ("SPAM"                  "^Subject:.*홍보")))
> ----
>
> In short, the above split rule doesn't work.

If you put one of these spams into a buffer and try to
`(re-search-forward "^Subject:.*광고")', do you get any matches?

My guess is that Gnus has inhibited all the external-to-internal
parsing things, so "광고" won't match anything.  For it to match, Gnus
would have to do the MULE thing (external-to-internal decoding) in
addition to the MIME stuff, which usually isn't necessary in the case
of spammers.

I'm surprised that nobody has seen this bug before.  Or perhaps they
have, and are as nonplussed as to what to do about it as I am.  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   larsi@gnus.org * Lars Magne Ingebrigtsen
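Lars's diagnosis can be reproduced outside Emacs with a small Python sketch. It assumes that the spam's Subject arrives as raw EUC-KR bytes. A Latin-1 view of those bytes stands in for a header that has not gone through external-to-internal decoding.

```python
import re

rule = re.compile(r"^Subject:.*광고", re.MULTILINE)  # the split rule's regexp

raw = "Subject: 광고".encode("euc_kr")  # assumed on-the-wire bytes (EUC-KR)
undecoded = raw.decode("latin-1")        # one character per byte, no charset conversion
decoded = raw.decode("euc_kr")           # external-to-internal conversion applied

print(bool(rule.search(undecoded)))  # False: the bytes are not the characters 광고
print(bool(rule.search(decoded)))    # True
```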




* Re: Korean characters with nnmail-split-methods
From: Jinhyok Heo @ 2002-01-20  9:37 UTC


>>>>> "LMI" == Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

    LMI> If you put one of these spams into a buffer and try to
    LMI> `(re-search-forward "^Subject:.*광고")', do you get any matches?

Yes, that finds "광고".

Any idea?

-- 
| Jinhyok Heo (novembre @ ournature.org || http://ournature.org/~novembre/)
|--------------------------------------------------------------------------
| "We are still reaching for the sky. In the developed countries people
|  are coming back down, saying, `It's empty up there.'" --- a Ladakhi monk




* Re: Korean characters with nnmail-split-methods
From: Lars Magne Ingebrigtsen @ 2002-01-20 15:36 UTC


Jinhyok Heo <novembreN0$PAM@ournature.org> writes:

>     LMI> If you put one of these spams into a buffer and try to
>     LMI> `(re-search-forward "^Subject:.*광고")', do you get any matches?
>
> Yes, that finds "광고".
>
> Any idea?

You could try setting `nnmail-incoming-coding-system' to a coding
system that will correctly read/convert Korean external encoding into
internal encoding.  That would be `euc-kr', I guess...

-- 
(domestic pets only, the antidote for overdose, milk.)
   larsi@gnus.org * Lars Magne Ingebrigtsen




* Re: Korean characters with nnmail-split-methods
From: Jinhyok Heo @ 2002-01-21  2:37 UTC


>>>>> "LMI" == Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

    LMI> You could try setting `nnmail-incoming-coding-system' to a
    LMI> coding system that will correctly read/convert Korean
    LMI> external encoding into internal encoding.  That would be
    LMI> `euc-kr', I guess...

When I set n-i-c-s to euc-kr as you suggested, my Korean mail is
saved incorrectly: the characters are corrupted.

Every Korean letter is saved as '~'.

-- 
| Jinhyok Heo (novembre @ ournature.org || http://ournature.org/~novembre/)
|--------------------------------------------------------------------------
| "We are still reaching for the sky. In the developed countries people
|  are coming back down, saying, `It's empty up there.'" --- a Ladakhi monk




* Re: Korean characters with nnmail-split-methods
From: Lars Magne Ingebrigtsen @ 2002-01-21  2:41 UTC


Jinhyok Heo <novembreN0$PAM@ournature.org> writes:

> When I set n-i-c-s to euc-kr as you suggested, my Korean mail is
> saved incorrectly: the characters are corrupted.

So that's definitely not the way to go.  Hm.  How about if you used
the external encoding in the split rules instead?

-- 
(domestic pets only, the antidote for overdose, milk.)
   larsi@gnus.org * Lars Magne Ingebrigtsen




* Re: Korean characters with nnmail-split-methods
From: Jinhyok Heo @ 2002-01-21  2:55 UTC


>>>>> "LMI" == Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

    LMI> Jinhyok Heo <novembreN0$PAM@ournature.org> writes:
    >  When I set n-i-c-s to euc-kr as you suggested, my Korean mail
    >  is saved incorrectly: the characters are corrupted.

    LMI> So that's definitely not the way to go.  Hm.  How about if
    LMI> you used the external encoding in the split rules instead?

Could you tell me more about how I can do that?

-- 
| Jinhyok Heo (novembre @ ournature.org || http://ournature.org/~novembre/)
|--------------------------------------------------------------------------
| "We are still reaching for the sky. In the developed countries people
|  are coming back down, saying, `It's empty up there.'" --- a Ladakhi monk




* Re: Korean characters with nnmail-split-methods
From: Lars Magne Ingebrigtsen @ 2002-01-21 19:44 UTC


Jinhyok Heo <novembreN0$PAM@ournature.org> writes:

>     LMI> So that's definitely not the way to go.  Hm.  How about if
>     LMI> you used the external encoding in the split rules instead?
>
> Could you tell me more about how I can do that?

If you open a file containing that word without doing any decoding of
the file, you'll get something pretty binary-looking that you can put
in the split rules.  I think.

But that's probably the wrong approach, anyway.
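Lars's first idea, putting the external encoding directly into the split rule, amounts to writing the word's bytes instead of its characters. The hypothetical Python helper below prints those bytes as octal escapes for use inside an Emacs Lisp string. Whether such a string actually matches depends on the split buffer holding the undecoded bytes. That is an assumption, and Lars himself hedges with "I think".

```python
def elisp_byte_escapes(word: str, charset: str = "euc_kr") -> str:
    """Return WORD's bytes in CHARSET as Emacs Lisp octal escapes (hypothetical helper)."""
    # Each byte becomes a backslash followed by its octal value, e.g. 0xB1 -> \261.
    return "".join("\\%o" % byte for byte in word.encode(charset))


if __name__ == "__main__":
    for word in ("광고", "홍보"):
        print(f'("SPAM" "^Subject:.*{elisp_byte_escapes(word)}")')
```

For 광고 in EUC-KR, the escapes come out as \261\244\260\355.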

nnmail does this when doing the splitting:

	;; Copy the headers into the work buffer.
	(insert-buffer-substring obuf beg end)
	;; Fold continuation lines.
	(goto-char (point-min))
	(while (re-search-forward "\\(\r?\n[ \t]+\\)+" nil t)
	  (replace-match " " t t))
       ...

So if we did optional parsing/decoding of the headers after they were
put into the work buffer, that should probably do the trick?  The
headers should be rfc2047-decoded, and it should be possible to
establish a default charset.

Anybody got any comments on this?  Is this a reasonable approach?
          
-- 
(domestic pets only, the antidote for overdose, milk.)
   larsi@gnus.org * Lars Magne Ingebrigtsen
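Lars's proposal above works in three steps: unfold the headers, RFC 2047-decode them, and apply a default charset to unlabelled 8-bit text. Below is a standalone Python sketch of that pre-pass. It is not the code that went into nnmail. `headers_for_splitting`, its `default_charset` parameter, and the choice of EUC-KR as that default are all assumptions made for illustration. The unfolding regexp is the one from the nnmail excerpt.

```python
import re
from email.header import decode_header

# Same regexp as the nnmail excerpt: newline(s) followed by blanks are folds.
FOLD = re.compile(r"(\r?\n[ \t]+)+")
# A header field line: printable ASCII name (no spaces or colons), then a colon.
FIELD = re.compile(r"^([!-9;-~]+):(.*)$")


def decode_value(value: str) -> str:
    """RFC 2047-decode one header value; text without encoded-words is returned as-is."""
    out = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # decode_header hands back bytes once any encoded-word is present;
            # unlabelled chunks were encoded with raw-unicode-escape.
            try:
                chunk = chunk.decode(charset or "raw-unicode-escape", errors="replace")
            except LookupError:  # unknown charset label: keep the bytes visible
                chunk = chunk.decode("latin-1")
        out.append(chunk)
    return "".join(out)


def headers_for_splitting(raw: bytes, default_charset: str = "euc_kr") -> str:
    """Hypothetical pre-pass: decode 8-bit bytes with DEFAULT_CHARSET, unfold
    continuation lines, then RFC 2047-decode each field's value."""
    text = FOLD.sub(" ", raw.decode(default_charset, errors="replace"))
    lines = []
    for line in text.split("\n"):
        match = FIELD.match(line)
        if match:
            name, value = match.groups()
            line = f"{name}: {decode_value(value).strip()}"
        lines.append(line)
    return "\n".join(lines)
```

After this pass, a rule written with internal characters, such as "^Subject:.*광고", can match both encoded and raw 8-bit spam. If a mail's unlabelled bytes are not actually in the assumed default charset, they will be decoded wrongly. This is the risk behind Lars's "Beware!".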




* Re: Korean characters with nnmail-split-methods
From: Kai Großjohann @ 2002-01-22  9:40 UTC


Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> So if we did optional parsing/decoding of the headers after they were
> put into the work buffer, that should probably do the trick?

I think that would DTRT.  Not that I'm an expert on this...

kai
-- 
Simplification good!  Oversimplification bad!  (Larry Wall)




* Re: Korean characters with nnmail-split-methods
From: Lars Magne Ingebrigtsen @ 2002-01-26 22:09 UTC


Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

>> So if we did optional parsing/decoding of the headers after they were
>> put into the work buffer, that should probably do the trick?
>
> I think that would DTRT.  Not that I'm an expert on this...

Ok; I've now added this.  Beware!  If your mail ends up in the wrong
place, give a shout.

-- 
(domestic pets only, the antidote for overdose, milk.)
   larsi@gnus.org * Lars Magne Ingebrigtsen


