Gnus development mailing list
* Stack overflow: Stack overflow in regexp matcher
From: Adam Sjøgren @ 2010-07-31 14:03 UTC
  To: xemacs-beta; +Cc: ding

  Hi.


The last couple of days I have been getting this message from XEmacs:

  "Stack overflow: Stack overflow in regexp matcher"

Usually when checking for new email in Gnus.

If I just try (g) again, the message doesn't re-appear.

It is rather curious. Has anyone else experienced this? I don't think I
have changed my Gnus-configuration recently.


Versions:

 XEmacs 21.5  (beta29) "garbanzo" f3eca926258e+ [Lucid] (x86_64-pc-linux, Mule) of Mon Jul 26 2010 on topper
 No Gnus v0.11 (f24a49038102c4566d5f8ba1ce1ff2156cff6ed5)


  Best regards,

    Adam

-- 
 "Omvejviser"                                                 Adam Sjøgren
                                                         asjo@koldfront.dk


* Re: Stack overflow: Stack overflow in regexp matcher
From: Adam Sjøgren @ 2010-07-31 22:02 UTC
  To: xemacs-beta; +Cc: ding

On Sat, 31 Jul 2010 16:03:14 +0200, Adam wrote:

> The last couple of days I have been getting this message from XEmacs:

>   "Stack overflow: Stack overflow in regexp matcher"

Also, I just noticed that the XEmacs process had grown to use 3.2 GB of
RAM (since this morning).


  Best regards,

    Adam

-- 
 "I hate dancing. So, this, to me, is a living                Adam Sjøgren
  embodyment of hell."                                   asjo@koldfront.dk


* Re: Stack overflow: Stack overflow in regexp matcher
From: Adam Sjøgren @ 2010-08-01  5:25 UTC
  To: xemacs-beta; +Cc: ding

On Sat, 31 Jul 2010 16:03:14 +0200, Adam wrote:

> The last couple of days I have been getting this message from XEmacs:

>   "Stack overflow: Stack overflow in regexp matcher"

> Usually when checking for new email in Gnus.

I just saw that my ~/Mail/active file had grown to 259KB - mostly
because of the line for my nnml:hyskenstræde group, which was repeating

  hyskenstrÃÂÂÂ...‚‚¦de 25 1 y

for a total line length of 131090 characters.
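
One way to read that number, as a rough sketch in Python rather than anything Gnus itself does: if each mail check re-reads the already UTF-8 encoded group name as Latin-1 and encodes it to UTF-8 again, the two bytes of 'æ' double on every pass. The `reencode` helper and the assumption that 131090 counts bytes are both illustrative guesses, not taken from the Gnus source.

```python
def reencode(data: bytes) -> bytes:
    """Hypothetical helper: misread UTF-8 bytes as Latin-1, then encode as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


# The clean active-file line (without the trailing newline): 20 bytes.
line = "hyskenstræde 25 1 y".encode("utf-8")

passes = 0
while len(line) < 131090:
    # ASCII bytes keep their size; every byte >= 0x80 becomes two bytes.
    line = reencode(line)
    passes += 1

print(passes, len(line))  # 16 131090
```

Under those assumptions the line reaches exactly 131090 bytes after 16 re-encodings: 18 ASCII bytes plus 2^17 bytes for the single non-ASCII letter.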

I have tried removing the line from the active file and recreating it by
regenerating the nnml: server, but I haven't succeeded in reproducing
the problem.

I guess if anything it is related to a recent change in Gnus, so
removing xemacs-beta Followup-To.


  Best regards,

    Adam

-- 
 "We get our thursdays from a banana."                        Adam Sjøgren
                                                         asjo@koldfront.dk


* Double encoding in ~/Mail/active
From: Adam Sjøgren @ 2010-08-01  7:55 UTC
  To: ding; +Cc: Katsumi Yamaoka

Ok, I have gotten further with this. After a regenerate on the nnml:
server, my nnml:hyskenstræde group is listed like this in ~/Mail/active:

  $ grep hysken Mail/active | hexdump -C
  00000000  68 79 73 6b 65 6e 73 74  72 c3 a6 64 65 20 32 36  |hyskenstr..de 26|
  00000010  20 31 20 79 0a                                    | 1 y.|
  00000015
  $

UTF-8 encoded 'æ'; looks correct.

After receiving an email in Gnus, this happens:

  $ grep hysken Mail/active | hexdump -C
  00000000  68 79 73 6b 65 6e 73 74  72 c3 83 c2 a6 64 65 20  |hyskenstr....de |
  00000010  32 36 20 31 20 79 0a                              |26 1 y.|
  00000017
  $ 

Double encoding!

Every time I 'g'et new email, I get an extra level of encoding. I.e. two
fetches of email later, this is what it looks like:

  $ grep hysken Mail/active | hexdump -C
  00000000  68 79 73 6b 65 6e 73 74  72 c3 83 c2 83 c3 82 c2  |hyskenstr.......|
  00000010  83 c3 83 c2 82 c3 82 c2  a6 64 65 20 32 36 20 31  |.........de 26 1|
  00000020  20 79 0a                                          | y.|
  00000023
  $ 
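
The three dumps above are consistent with the UTF-8 bytes being treated as Latin-1 characters and encoded to UTF-8 again on each pass. A minimal Python sketch of that assumption follows; `reencode` is a hypothetical helper for illustration, not a Gnus or XEmacs function.

```python
def reencode(data: bytes) -> bytes:
    """Misread UTF-8 bytes as Latin-1 text, then encode that text as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


name = "hyskenstræde".encode("utf-8")            # correct on-disk form
after_first_fetch = reencode(name)                # one extra level
after_third_fetch = reencode(reencode(after_first_fetch))  # two fetches later

for label, data in [
    ("clean", name),
    ("first fetch", after_first_fetch),
    ("third fetch", after_third_fetch),
]:
    # bytes.hex(" ") prints space-separated hex pairs, like hexdump -C
    print(f"{label:12} {data.hex(' ')}")
```

The printed byte runs between `72` ("r") and `64 65` ("de") can be compared directly with the hexdump output: `c3 a6`, then `c3 83 c2 a6`, then the 16-byte sequence.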

This is with XEmacs 21.5 (beta29) "garbanzo" bc6a5c7f4128+ and No Gnus
v0.11 (latest git).

If I revert f24a49038102c4566d5f8ba1ce1ff2156cff6ed5 the double, triple,
etc. encoding doesn't happen.

It looks like something is amiss with that latest commit?


  Best regards,

    Adam

-- 
 "We get our thursdays from a banana."                        Adam Sjøgren
                                                         asjo@koldfront.dk





* Re: Double encoding in ~/Mail/active
From: Katsumi Yamaoka @ 2010-08-01 10:58 UTC
  To: ding

asjo@koldfront.dk (Adam Sjøgren) wrote:
[...]
> Double encoding!
[...]
> It looks like something is amiss with that latest commit?

Sorry for enbugging.  I'm going to fix it asap.  Maybe I have to
make it encode nnml group names conditionally.  The thread concerning
my last change begins with:

http://groups.google.co.jp/group/gnu.emacs.gnus/browse_thread/thread/13545d8dfc4f0b47




* Re: Double encoding in ~/Mail/active
From: Adam Sjøgren @ 2010-08-01 11:13 UTC
  To: ding

On Sun, 01 Aug 2010 19:58:26 +0900, Katsumi wrote:

> Sorry for enbugging.  I'm going to fix it asap.

Great!

> Maybe I have to make it encode nnml group names conditionally.

I didn't quite understand the changes you made (but that is not saying
much, I am still an elisp newbie).

Does nnmail-active-file-coding-system come into play here somehow?


  Best regards,

    Adam

-- 
 "The laws of perspective have been repealed!                 Adam Sjøgren
  Objects no longer diminish in size with distance!"     asjo@koldfront.dk





* Re: Double encoding in ~/Mail/active
From: Katsumi Yamaoka @ 2010-08-02  0:14 UTC
  To: ding

Adam Sjøgren wrote:
> On Sun, 01 Aug 2010 19:58:26 +0900, Katsumi wrote:
>> Sorry for enbugging.  I'm going to fix it asap.
> Great!
>> Maybe I have to make it encode nnml group names conditionally.

Fixed.  I must apologize to XEmacs users who keep track of the
latest Gnus.  If you have nnml groups whose names contain
non-ASCII characters, you will want to fix the ~/Mail/active
file, and possibly the ~/.newsrc.eld file, manually.  Beat me!
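
For the manual repair, one possible approach is to peel off the extra layers until doing so would no longer leave valid UTF-8. The Python sketch below assumes the damage came from UTF-8 bytes being re-encoded as if they were Latin-1; `unwind` is a hypothetical helper, and it would wrongly collapse a group name that legitimately contains a sequence such as "Ã¦", so back up the file and check the result by eye.

```python
def unwind(data: bytes) -> bytes:
    """Remove repeated Latin-1 -> UTF-8 re-encoding layers (assumed cause)."""
    while True:
        try:
            # Undo one layer: read as UTF-8, write each character as one byte.
            candidate = data.decode("utf-8").encode("latin-1")
            # Only accept the result if it is still valid UTF-8.
            candidate.decode("utf-8")
        except UnicodeError:
            return data  # peeling further would corrupt the name
        if candidate == data:
            return data  # pure ASCII, nothing left to undo
        data = candidate


# The double-encoded line from the hexdump in this thread.
damaged = bytes.fromhex(
    "68 79 73 6b 65 6e 73 74 72 c3 83 c2 a6 64 65 20 32 36 20 31 20 79"
)
print(unwind(damaged).decode("utf-8"))  # hyskenstræde 26 1 y
```

The loop stops one step before the bytes stop being valid UTF-8, which is why the correctly encoded `c3 a6` is left alone.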

The purpose of my last change was to make sure that the names of
groups that nnmail-split creates[1] are encoded in nnml-group-alist.
However, it encoded not only newly created group names but also
existing ones.  It happens only in XEmacs.  In Emacs, double
encoding does not happen even when encode-coding-string is applied
twice.

(decode-coding-string
 (encode-coding-string
  (encode-coding-string "hyskenstræde" 'utf-8)
  'utf-8)
 'utf-8)
 => "hyskenstræde"

This is why I didn't notice the ill effect of the last change.

[1] The case where the regexps in the splitting rules have
 something like "\\1" that is replaced with the actual text.




* Re: Double encoding in ~/Mail/active
From: Adam Sjøgren @ 2010-08-02 21:43 UTC
  To: ding

On Mon, 02 Aug 2010 09:14:08 +0900, Katsumi wrote:

> It happens only in XEmacs. In Emacs, double encoding does not happen
> even when encode-coding-string is applied twice.

Is this a GNU Emacs or an XEmacs bug? I guess it should be reported?

> (decode-coding-string
>  (encode-coding-string
>   (encode-coding-string "hyskenstræde" 'utf-8)
>   'utf-8)
>  'utf-8)
>  => "hyskenstræde"

> This is why I didn't notice the ill effect of the last change.

That was a tricky one; thanks for fixing it!


  Best regards,

     Adam

-- 
 "The laws of perspective have been repealed!                 Adam Sjøgren
  Objects no longer diminish in size with distance!"     asjo@koldfront.dk





* Re: Double encoding in ~/Mail/active
From: Katsumi Yamaoka @ 2010-08-02 23:21 UTC
  To: ding

Adam Sjøgren wrote:
> On Mon, 02 Aug 2010 09:14:08 +0900, Katsumi wrote:
>> It happens only in XEmacs. In Emacs, double encoding does not happen
>> even when encode-coding-string is applied twice.
> Is this a GNU Emacs or an XEmacs bug? I guess it should be reported?

No, it's my fault. ;-p
I forgot the existence of the `nnmail-group-names-not-encoded-p'
variable that I made for group names nnmail-split creates.




* Re: Double encoding in ~/Mail/active
From: Adam Sjøgren @ 2010-08-03  5:54 UTC
  To: ding

On Tue, 03 Aug 2010 08:21:17 +0900, Katsumi wrote:

> Adam Sjøgren wrote:

>> On Mon, 02 Aug 2010 09:14:08 +0900, Katsumi wrote:

>>> It happens only in XEmacs. In Emacs, double encoding does not
>>> happen even when encode-coding-string is applied twice.

>> Is this a GNU Emacs or an XEmacs bug? I guess it should be reported?

> No, it's my fault. ;-p

But isn't it inconsistent "in a bad way" that GNU Emacs won't double
encode a string and XEmacs will? Wouldn't it be better if both emacsen
behaved the same?

I.e. your example:

 (decode-coding-string
  (encode-coding-string
   (encode-coding-string "hyskenstræde" 'utf-8)
   'utf-8)
  'utf-8)

ought to give either "hyskenstræde" or "hyskenstrÃ¦de" on both emacsen?

(I don't know which of the behaviors would be correct, though).


  Best regards,

    Adam

-- 
 "The laws of perspective have been repealed!                 Adam Sjøgren
  Objects no longer diminish in size with distance!"     asjo@koldfront.dk





* Re: Double encoding in ~/Mail/active
From: Katsumi Yamaoka @ 2010-08-04  0:38 UTC
  To: ding

Adam Sjøgren wrote:
> On Tue, 03 Aug 2010 08:21:17 +0900, Katsumi wrote:
[...]
>>> Is this a GNU Emacs or an XEmacs bug? I guess it should be reported?
>> No, it's my fault. ;-p

> But isn't it inconsistent "in a bad way" that GNU Emacs won't double
> encode a string and XEmacs will? Wouldn't it be better if both emacsen
> behaved the same?

> I.e. your example:

>  (decode-coding-string
>   (encode-coding-string
>    (encode-coding-string "hyskenstræde" 'utf-8)
>    'utf-8)
>   'utf-8)

> ought to give either "hyskenstræde" or "hyskenstrÃ¦de" on both emacsen?

First of all, I believe what is bad is source code that does
the double encoding (which I wrote ;-).

I don't know exactly how Emacs decides whether to encode an encoded
string, but I guess the point is multibyteness or something similar.
In Emacs, an encoded string is always a unibyte string, which is not
worth encoding.  Double encoding does happen in Emacs 21 and 22,
though, and the following form returns the same result as XEmacs:

(decode-coding-string
 (encode-coding-string
  (string-make-multibyte
   (encode-coding-string "hyskenstræde" 'utf-8))
  'utf-8)
 'utf-8)

But in Emacs 23 and later (i.e. Unicode Emacsen), double encoding
doesn't happen even for that form.  Those Emacsen may examine what
the data actually are.
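
As a loose analogy only, in Python rather than Emacs Lisp: when encoded data carries a distinct type, as a unibyte string does in Emacs, a second encoding request can be recognised and skipped; without that distinction every byte is treated as a character and encoded again. The function names below are hypothetical and do not describe how either Emacs or XEmacs is implemented.

```python
from typing import Union


def encode_once(name: Union[str, bytes], coding: str = "utf-8") -> bytes:
    """Encode text, but pass already-encoded bytes through untouched."""
    if isinstance(name, bytes):
        return name  # the type says it is already encoded
    return name.encode(coding)


def encode_blindly(raw: bytes) -> bytes:
    """No type distinction: treat every byte as a character and encode it."""
    return raw.decode("latin-1").encode("utf-8")


once = encode_once("hyskenstræde")
twice = encode_once(once)          # second call is a no-op
doubled = encode_blindly(once)     # second call adds a layer

print(twice.decode("utf-8"))    # hyskenstræde
print(doubled.decode("utf-8"))  # hyskenstrÃ¦de
```

The conditional in `encode_once` is the same idea as encoding group names only when they are known not to be encoded yet.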

OTOH, XEmacs has no concept of multibyteness.  Making XEmacs behave
like recent Emacsen could mean making it repeat the long journey
that Emacs trudged along, and I think that would be for nothing.
Again, the fault was mine. ;-)


