From: Reiner Steib <reinersteib+gmane@imap.cc>
Cc: emacs-pretest-bug@gnu.org, Alexandre Oliva <lxoliva@fsfla.org>,
ding@gnus.org
Subject: Re: Slow operations on buffers of tens of megabytes
Date: Mon, 06 Nov 2006 10:21:39 +0100
Message-ID: <v9psc1q6ak.fsf@marauder.physik.uni-ulm.de>
In-Reply-To: <b4mmz75gljp.fsf@jpl.org> (Katsumi Yamaoka's message of "Mon, 06 Nov 2006 15:02:18 +0900")
On Mon, Nov 06 2006, Katsumi Yamaoka wrote:
>>>>>> In <E1GgwcW-0000OJ-Pp@fencepost.gnu.org> Richard Stallman wrote:
>
>> Scoring of the messages closer to the beginning of the buffer is fast,
>> but as we move to higher-numbered messages, that are closer to the end
>> of such big files/buffers, gnus will only score 2-3 messages per
>> minute, and that's what kills performance.
[...]
> (setq gnus-article-button-face nil
> gnus-signature-face nil
> gnus-summary-selected-face nil
> gnus-treat-highlight-citation nil
> gnus-treat-emphasize nil)
>
> If it makes Gnus fast, improving the performance will be worth
> trying. However, I didn't feel any difference, though it might
> be because I don't have huge mail folders.
I don't think this matches the problem description. When scanning big
mbox files, article display isn't involved. Or am I missing
something?
My guess is that it's a problem with `case-fold-search' when searching
for "X-Gnus-Article-Number" in mbox files in Emacs 22, as analyzed by
Elias Oltmanns back in June:
,----[ http://thread.gmane.org/gmane.emacs.devel/53901/focus=54013 ]
| From: Elias Oltmanns <oltmanns <at> uni-bonn.de>
| Subject: Re: New buffer-case-table makes search_buffer painfully slow
| Newsgroups: gmane.emacs.devel
| Date: 2006-05-06 19:10:08 GMT
|
| Elias Oltmanns <oltmanns <at> uni-bonn.de> wrote:
| > Hi all,
| >
| > switching from emacs 21 to emacs 22 has a very significant performance
| > impact on packages that make heavy use of search_buffer. An example
| > that actually made me aware of this problem is gnus processing large
| > mbox files. Further analysis of this problem revealed that in emacs 22
| > an "i" in the search string makes search_buffer use simple_search()
| > instead of boyer_moore().
|
| Emacs 22's EQUIVALENCES table relates i, and thus I as well, to two
| more characters with character codes 331857 and 331856. On
| www.unicode.org the character look up engine couldn't find a match for
| U+51051 or U+51050 saying that most likely those codes weren't
| assigned to any characters yet.
|
| So, here is a plain question: Is there a bug in the case-table in
| emacs 22 or does the search engine on www.unicode.org for some reason
| miss certain character ranges? Slightly biased, I'm disregarding the
| possibility of me being unable to use www.unicode.org properly, which,
| in fact, might well be the reason for my confusion.
|
| Second question: If the case-table was right, what would be the right
| way to tackle the problem described in my original post? For me the
| following snippet in .emacs solves the problem:
| --- ~/.emacs ---
| (unless (< emacs-major-version 22)
| (set-case-syntax 331856 "w" (standard-case-table))
| (set-case-syntax 331857 "w" (standard-case-table)))
| --- ~/.emacs ---
|
| This, of course, is a dirty hack and I'm wondering whether emacs
| should provide a feature to "clean up" the EQUIVALENCES table in the
| ascii range in order to avoid falling back to a slow search
| algorithm when we are searching for pure ascii strings. Or do you
| think that packages like gnus which make heavy use of
| re-search-forward should handle these performance issues
| themselves---or indeed the users.
`----
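To check whether case folding is really what makes the search slow, one
could time a single `search-forward' for the header near the end of a
large buffer, once with `case-fold-search' bound to nil and once bound
to t. This is only a rough sketch: it assumes `benchmark-run' from
benchmark.el is available, and the helper name `my-search-timing' and
the synthetic filler text are made up for illustration. If Elias'
analysis applies, the t case should be clearly slower on an affected
Emacs 22 build; on a build without the problem the two timings should
be similar.

```elisp
(require 'benchmark)

(defun my-search-timing (case-fold)
  "Hypothetical helper: time one `search-forward' for a header near the
end of a large synthetic buffer, with `case-fold-search' bound to
CASE-FOLD.  Return the elapsed time in seconds."
  (with-temp-buffer
    ;; Roughly 4 MB of mbox-like filler, then the header we look for.
    (dotimes (_ 100000)
      (insert "From: someone@example.org\nSubject: filler\n\n"))
    (insert "X-Gnus-Article-Number: 1\n")
    (goto-char (point-min))
    (let ((case-fold-search case-fold))
      ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
      (car (benchmark-run 1
             (search-forward "X-Gnus-Article-Number" nil t))))))

(message "case-fold-search nil: %.3fs" (my-search-timing nil))
(message "case-fold-search t:   %.3fs" (my-search-timing t))
```

Note that the search string contains an "i" (in "Article"), which is
what triggers the fallback Elias describes.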
Alexandre, could you please check whether the hack suggested by Elias
makes your problem go away?
Richard proposed a fix for this, but AFAICS, this has not been
implemented:
,----[ http://thread.gmane.org/gmane.emacs.devel/53901/focus=54025 ]
| From: Richard Stallman <rms <at> gnu.org>
| Subject: Re: New buffer-case-table makes search_buffer painfully slow
| Newsgroups: gmane.emacs.devel
| Date: 2006-05-07 05:01:27 GMT
|
| I think this has to do with the special characters for Turkish,
| lower-case i without dot and upper-case I with dot. In Turkish,
| upcasing and downcasing preserve the dot, or the absence of the dot.
|
| I think these lines in characters.el are the cause of the problem.
|
| (set-downcase-syntax ?İ ?i tbl)
| (set-upcase-syntax ?I ?ı tbl)
|
| They set up only half of what Turkish needs.
| They make dotless-i upcase into I, and they make
| I-with-dot downcase into i. They can't do vice versa
| because that would break things for other languages.
| So they are not really useful. We could simply delete them.
|
| We could also add a minor mode to set up the case table all the way
| for Turkish.
|
| Would someone like to do that?
`----
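For illustration, the Turkish minor mode Richard suggests might look
roughly like the following. This is only a sketch: `turkish-case-table'
and `turkish-case-mode' are hypothetical names, not existing Emacs
features. It works on a copy of the standard case table and uses
`set-case-syntax-pair' to pair I with dotless ı and dotted İ with i, so
both directions of the Turkish mapping are set up in that buffer only.
Turning the mode off simply reinstalls the standard case table rather
than whatever table the buffer had before.

```elisp
(defvar turkish-case-table
  (let ((tbl (copy-case-table (standard-case-table))))
    ;; Full Turkish pairs: dotted İ <-> i and plain I <-> dotless ı.
    (set-case-syntax-pair ?İ ?i tbl)
    (set-case-syntax-pair ?I ?ı tbl)
    tbl)
  "Hypothetical case table with complete Turkish dotted/dotless I casing.")

(define-minor-mode turkish-case-mode
  "Hypothetical buffer-local mode that installs `turkish-case-table'."
  :lighter " TR"
  (if turkish-case-mode
      (set-case-table turkish-case-table)
    ;; Sketch only: fall back to the standard table when disabled.
    (set-case-table (standard-case-table))))
```

Because the table is installed per buffer, the I<->i pairing in the
standard case table stays untouched for other languages.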
Looking at the ChangeLog, it seems that the relevant code in
`characters.el' ...
,----[ international/characters.el ]
| ;; In some languages, U+0049 LATIN CAPITAL LETTER I and U+0131 LATIN
| ;; SMALL LETTER DOTLESS I make a case pair, and so do U+0130 LATIN
| ;; CAPITAL LETTER I WITH DOT ABOVE and U+0069 LATIN SMALL LETTER I.
| ;; Thus we have to check language-environment to handle casing
| ;; correctly. Currently only I<->i is available.
| [...]
| (set-downcase-syntax ?İ ?i tbl)
| (set-upcase-syntax ?I ?ı tbl)
`----
... has been changed back and forth several times:
,----[ ChangeLog ]
| 2005-04-01 Kenichi Handa <handa@m17n.org>
|
| * international/characters.el: Enable the correct case setting for
| dotless-i and dotted-I.
|
| 2005-02-02 Kenichi Handa <handa@m17n.org>
|
| * international/characters.el: Cancel previous change for
| I-WITH-DOT-ABOVE and DOTLESS-i.
|
| 2005-02-02 Kenichi Handa <handa@m17n.org>
|
| * international/latin-5.el (tbl): Setup cases of I-WITH-DOT-ABOVE,
| DOTLESS-i.
|
| * international/characters.el: Setup cases of GREEK-FINAL-SIGMA,
| Y-WITH-DIAERESIS, I-WITH-DOT-ABOVE, DOTLESS-i.
`----
Bye, Reiner.
--
,,,
(o o)
---ooO-(_)-Ooo--- | PGP key available | http://rsteib.home.pages.de/
Thread overview: 20+ messages
2006-11-05 5:37 Alexandre Oliva
2006-11-06 5:02 ` Richard Stallman
2006-11-06 6:02 ` Katsumi Yamaoka
2006-11-06 9:21 ` Reiner Steib [this message]
2006-11-06 20:00 ` Alexandre Oliva
2006-11-07 14:13 ` Reiner Steib
2006-11-08 14:43 ` Reiner Steib
2006-11-09 22:00 ` Alexandre Oliva
2006-11-10 18:42 ` Richard Stallman
2006-11-11 0:37 ` Reiner Steib
2006-11-13 16:40 ` Kevin Rodgers
2006-11-14 12:26 ` Richard Stallman
2006-11-13 17:28 ` Reiner Steib
2006-11-19 9:49 ` Elias Oltmanns
2006-11-20 12:59 ` Richard Stallman
2006-11-20 18:22 ` Elias Oltmanns
2006-11-21 7:47 ` Richard Stallman
2006-11-21 8:18 ` Kenichi Handa
2006-11-22 13:15 ` Richard Stallman
2006-11-12 5:14 ` Richard Stallman