Gnus development mailing list
From: Reiner Steib <reinersteib+gmane@imap.cc>
Cc: emacs-pretest-bug@gnu.org, Alexandre Oliva <lxoliva@fsfla.org>,
	ding@gnus.org
Subject: Re: Slow operations on buffers of tens of megabytes
Date: Mon, 06 Nov 2006 10:21:39 +0100
Message-ID: <v9psc1q6ak.fsf@marauder.physik.uni-ulm.de>
In-Reply-To: <b4mmz75gljp.fsf@jpl.org> (Katsumi Yamaoka's message of "Mon\, 06 Nov 2006 15\:02\:18 +0900")

On Mon, Nov 06 2006, Katsumi Yamaoka wrote:

>>>>>> In <E1GgwcW-0000OJ-Pp@fencepost.gnu.org> Richard Stallman wrote:
>
>>     Scoring of the messages closer to the beginning of the buffer is fast,
>>     but as we move to higher-numbered messages, that are closer to the end
>>     of such big files/buffers, gnus will only score 2-3 messages per
>>     minute, and that's what kills performance.
[...]
> (setq gnus-article-button-face nil
>       gnus-signature-face nil
>       gnus-summary-selected-face nil
>       gnus-treat-highlight-citation nil
>       gnus-treat-emphasize nil)
>
> If it makes Gnus fast, improving the performance will be worth
> trying.  However, I didn't feel any difference, though it might
> be because I don't have huge mail folders.

I don't think this matches the problem description.  When scanning big
mbox files, article display isn't involved.  Or am I missing
something?

My guess is that it's a problem with case-fold-search when searching
for "X-Gnus-Article-Number" in mbox files in Emacs 22, as analyzed by
Elias Oltmanns back in May:

,----[ http://thread.gmane.org/gmane.emacs.devel/53901/focus=54013 ]
| From: Elias Oltmanns <oltmanns <at> uni-bonn.de>
| Subject: Re: New buffer-case-table makes search_buffer painfully slow
| Newsgroups: gmane.emacs.devel
| Date: 2006-05-06 19:10:08 GMT
| 
| Elias Oltmanns <oltmanns <at> uni-bonn.de> wrote:
| > Hi all,
| >
| > switching from emacs 21 to emacs 22 has a very significant performance
| > impact on packages that make heavy use of search_buffer. An example
| > that actually made me aware of this problem is gnus processing large
| > mbox files. Further analysis of this problem revealed that in emacs 22
| > an "i" in the search string makes search_buffer use simple_search()
| > instead of boyer_moore(). 
| 
| Emacs 22's EQUIVALENCES table relates i, and thus I as well, to two
| more characters with character codes 331857 and 331856. On
| www.unicode.org the character look up engine couldn't find a match for
| U+51051 or U+51050 saying that most likely those codes weren't
| assigned to any characters yet.
| 
| So, here is a plain question: Is there a bug in the case-table in
| emacs 22 or does the search engine on www.unicode.org for some reason
| miss certain character ranges? Slightly biassed, I'm disregarding the
| possibility of me being unable to use www.unicode.org properly, which,
| in fact, might well be the reason for my confusion.
| 
| Second question: If the case-table was right, what would be the right
| way to tackle the problem described in my original post? For me the
| following snippet in .emacs solves the problem:
| --- ~/.emacs ---
| (unless (< emacs-major-version 22)
|   (set-case-syntax 331856 "w" (standard-case-table))
|   (set-case-syntax 331857 "w" (standard-case-table)))
| --- ~/.emacs ---
| 
| This, of course, is a dirty hack and I'm wondering whether emacs
| should provide a feature to "clean up" the EQUIVALENCES table in the
| ascii range in order to avoid falling back to a slow search
| algorithm when we are searching for pure ascii strings. Or do you
| think that packages like gnus which make heavy use of
| re-search-forward should handle these performance issues
| themselves---or indeed the users.
`----

Alexandre, could you please check whether the hack suggested by Elias
makes your problem go away?
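
A rough way to see whether case folding is what slows the search down
is to time the same literal search with `case-fold-search' bound to t
and to nil.  The following is only a sketch: the helper name
`my-time-header-search' and the synthetic buffer contents are
assumptions made up for illustration, not anything Gnus does itself,
and the numbers will depend on the Emacs version and its case table.

```elisp
;; Hypothetical timing helper.  The function name and the synthetic
;; buffer are assumptions for illustration only, not part of Gnus.
(require 'benchmark)

(defun my-time-header-search (fold)
  "Return seconds spent searching with `case-fold-search' set to FOLD."
  (with-temp-buffer
    ;; Fill the buffer with mbox-like text that lacks the header ...
    (dotimes (_ 20000)
      (insert "From nobody Mon Nov  6 10:21:39 2006\n"
              "Subject: filler\n\nbody text\n\n"))
    ;; ... and put the header only at the very end, so that every
    ;; search has to scan the whole buffer.
    (insert "X-Gnus-Article-Number: 1\n")
    (let ((case-fold-search fold))
      ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-TIME).
      (car (benchmark-run 10
             (goto-char (point-min))
             (search-forward "X-Gnus-Article-Number" nil t))))))

(message "fold: %.3fs  no-fold: %.3fs"
         (my-time-header-search t)
         (my-time-header-search nil))
```

If the case table is the culprit, the first figure should be clearly
larger than the second; if both are about the same, the slowdown
probably comes from somewhere else.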

Richard proposed a fix for this, but AFAICS, this has not been
implemented:

,----[ http://thread.gmane.org/gmane.emacs.devel/53901/focus=54025 ]
| From: Richard Stallman <rms <at> gnu.org>
| Subject: Re: New buffer-case-table makes search_buffer painfully slow
| Newsgroups: gmane.emacs.devel
| Date: 2006-05-07 05:01:27 GMT
|
| I think this has to do with the special characters for Turkish,
| lower-case i without dot and upper-case I with dot.  In Turkish,
| upcasing and downcasing preserve the dot, or the absence of the dot.
| 
| I think these lines in characters.el are the cause of the problem.
| 
|   (set-downcase-syntax  ?İ ?i tbl)
|   (set-upcase-syntax    ?I ?ı tbl)
| 
| They set up only half of what Turkish needs.
| They make dotless-i upcase into I, and they make
| I-with-dot downcase into i.  They can't do vice versa
| because that would break things for other languages.
| So they are not really useful.  We could simply delete them.
| 
| We could also add a minor mode to set up the case table all the way
| for Turkish.
| 
| Would someone like to do that?
`----
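
To make the minor mode idea more concrete, here is a minimal sketch of
what it might look like.  The names `my-turkish-case-table' and
`my-turkish-case-mode' are hypothetical; nothing like this exists in
Emacs as far as this thread shows.  It assumes that the buffer
otherwise uses the standard case table and relies on `copy-case-table'
and `set-case-syntax-pair' from case-table.el.

```elisp
;; Hypothetical sketch of a buffer-local minor mode that sets up the
;; full Turkish case pairs.  All `my-' names are assumptions.

(defvar my-turkish-case-table
  (let ((tbl (copy-case-table (standard-case-table))))
    ;; U+0130 (capital I with dot above) <-> U+0069 (small i)
    (set-case-syntax-pair ?İ ?i tbl)
    ;; U+0049 (capital I) <-> U+0131 (small dotless i)
    (set-case-syntax-pair ?I ?ı tbl)
    tbl)
  "Copy of the standard case table with both Turkish i pairs set up.")

(define-minor-mode my-turkish-case-mode
  "Use Turkish case conversion rules in the current buffer."
  :lighter " TrCase"
  ;; Assumption: the buffer used the standard case table before the
  ;; mode was turned on, so that is what gets restored.
  (set-case-table (if my-turkish-case-mode
                      my-turkish-case-table
                    (standard-case-table))))
```

Because the modified table is installed only in buffers where the mode
is on, the standard case table, and thus searches in other buffers,
would stay untouched.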

Looking at the ChangeLog, it seems that the relevant code in
`characters.el' ...

,----[ international/characters.el ]
| ;; In some languages, U+0049 LATIN CAPITAL LETTER I and U+0131 LATIN
| ;; SMALL LETTER DOTLESS I make a case pair, and so do U+0130 LATIN
| ;; CAPITAL LETTER I WITH DOT ABOVE and U+0069 LATIN SMALL LETTER I.
| ;; Thus we have to check language-environment to handle casing
| ;; correctly.  Currently only I<->i is available.
| [...] 
|   (set-downcase-syntax  ?İ ?i tbl)
|   (set-upcase-syntax    ?I ?ı tbl)
`----

... has been changed back and forth several times:

,----[ ChangeLog ]
| 2005-04-01  Kenichi Handa  <handa@m17n.org>
| 
| 	* international/characters.el: Enable the correct case setting for
| 	dotless-i and dotted-I.
| 
| 2005-02-02  Kenichi Handa  <handa@m17n.org>
| 
| 	* international/characters.el: Cancel previous change for
| 	I-WITH-DOT-ABOVE and DOTLESS-i.
| 
| 2005-02-02  Kenichi Handa  <handa@m17n.org>
| 
| 	* international/latin-5.el (tbl): Setup cases of I-WITH-DOT-ABOVE,
| 	DOTLESS-i.
| 
| 	* international/characters.el: Setup cases of GREEK-FINAL-SIGMA,
| 	Y-WITH-DIAERESIS, I-WITH-DOT-ABOVE, DOTLESS-i.
`----

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/
