Gnus development mailing list
 help / color / mirror / Atom feed
* Regular expression performance in emacs, number of cached regexps
@ 1998-05-17 17:49 Gregory S. Stark
  1998-05-17 19:20 ` Kai Grossjohann
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Gregory S. Stark @ 1998-05-17 17:49 UTC (permalink / raw)
  Cc: ding


In profiling an elisp program I was working on I found that an inordinate
quantity of string memory was being allocated in a function I thought I had
optimized extensively. Using the source, I eventually found that the memory
was allocated by the regular expression searching functions to cache a small
constant number of compiled regular expressions.

Unfortunately in my code, which was fairly short and simple, the behaviour was
pathological. If I recall correctly, Emacs cached the five most recently used
compiled regular expressions, and I was looping through about six or seven. So
Emacs was repeatedly recompiling the same six or seven regular expressions and
allocating memory only to be thrown away as garbage shortly later, causing
excessive regular expression compilations and garbage collections.

I've since optimized my package to use less than five regular expressions, but
my concern is that several other large packages including W3 and Gnus use
re-search-forward and looking-at liberally in code that is otherwise quite
carefully optimized (more or less). Reducing the number of regular expressions
these packages use in loops is likely to be quite hard. I suspect increasing
the compile-time configuration constant governing the size of this cache will
be a big speed and memory boost for these packages.

greg

In case people are curious, here is the code I use to profile memory usage in
my packages. I've suggested something like this be integrated in elp.el, which
I don't think would really be all that hard, but I don't think elp.el is under
active development, which is a shame, it's a great tool.

(defadvice the-function-to-profile (around profile-memory)
  (let (before after delta)
    (setq before (memory-use-counts))
    ad-do-it
    (setq after (memory-use-counts))
    (while before
    (push (- (pop after) (pop before)) delta))
    (apply 'message
           "MEMORY USE cons:%d fl:%d vec:%d sym:%d str:%d misc:%d int:%d"
           (nreverse delta))))



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Regular expression performance in emacs, number of cached regexps
  1998-05-17 17:49 Regular expression performance in emacs, number of cached regexps Gregory S. Stark
@ 1998-05-17 19:20 ` Kai Grossjohann
       [not found] ` <vafhg2o1z7k.fsf@ramses..cs.uni-dortmund.de>
  1998-05-18  5:29 ` Richard Stallman
  2 siblings, 0 replies; 12+ messages in thread
From: Kai Grossjohann @ 1998-05-17 19:20 UTC (permalink / raw)
  Cc: ding, w3-beta

>>>>> gsstark@mit.edu (Gregory S. Stark) writes:

  > [...] If I recall correctly, Emacs cached the five most recently
  > used compiled regular expressions, and I was looping through about
  > six or seven. So Emacs was repeatedly recompiling the same six or
  > seven regular expressions and allocating memory only to be thrown
  > away as garbage shortly later, causing excessive regular
  > expression compilations and garbage collections. [...]

I think I read that the limit has been increased to maybe 20 or so in
a more recent Emacs (the yet to be released 20.3?).

But I'm sure RMS knows best :-)

kai
-- 
A large number of young women don't trust men with beards.
(BFBS Radio)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Regular expression performance in emacs, number of cached regexps
       [not found] ` <vafhg2o1z7k.fsf@ramses..cs.uni-dortmund.de>
@ 1998-05-17 19:33   ` Hrvoje Niksic
  0 siblings, 0 replies; 12+ messages in thread
From: Hrvoje Niksic @ 1998-05-17 19:33 UTC (permalink / raw)


Kai Grossjohann <grossjohann@charly.cs.uni-dortmund.de> writes:

> >>>>> gsstark@mit.edu (Gregory S. Stark) writes:
> 
>   > [...] If I recall correctly, Emacs cached the five most recently
>   > used compiled regular expressions, and I was looping through about
>   > six or seven. So Emacs was repeatedly recompiling the same six or
>   > seven regular expressions and allocating memory only to be thrown
>   > away as garbage shortly later, causing excessive regular
>   > expression compilations and garbage collections. [...]
> 
> I think I read that the limit has been increased to maybe 20 or so in
> a more recent Emacs (the yet to be released 20.3?).

Yes.  So did XEmacs, a few releases ago.

(see definition of REGEXP_CACHE_SIZE in `src/search.c'.)

> But I'm sure RMS knows best :-)

Ha ha ha.

-- 
Hrvoje Niksic <hniksic@srce.hr> | Student at FER Zagreb, Croatia
--------------------------------+--------------------------------
Be nice to your kids.
They'll choose your nursing home.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Regular expression performance in emacs, number of cached regexps
  1998-05-17 17:49 Regular expression performance in emacs, number of cached regexps Gregory S. Stark
  1998-05-17 19:20 ` Kai Grossjohann
       [not found] ` <vafhg2o1z7k.fsf@ramses..cs.uni-dortmund.de>
@ 1998-05-18  5:29 ` Richard Stallman
  1998-05-18 15:49   ` William M. Perry
  1998-05-19  7:55   ` Jan Vroonhof
  2 siblings, 2 replies; 12+ messages in thread
From: Richard Stallman @ 1998-05-18  5:29 UTC (permalink / raw)
  Cc: ding, w3-beta

I can easily increase this cache size--the question is,
how far should I increase it?

It was already increased to 20 in Emacs 20.  So I don't
think I need to increase it now.  How about switching to Emacs 20?



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Regular expression performance in emacs, number of cached regexps
  1998-05-18  5:29 ` Richard Stallman
@ 1998-05-18 15:49   ` William M. Perry
  1998-05-19  4:45     ` Richard Stallman
  1998-05-19  7:55   ` Jan Vroonhof
  1 sibling, 1 reply; 12+ messages in thread
From: William M. Perry @ 1998-05-18 15:49 UTC (permalink / raw)
  Cc: gsstark, ding, w3-beta

Richard Stallman <rms@santafe.edu> writes:

> I can easily increase this cache size--the question is, how far should I
> increase it?
> 
> It was already increased to 20 in Emacs 20.  So I don't think I need to
> increase it now.  How about switching to Emacs 20?

  Well, as people are wont to say nowadays - memory is cheap, and so is
disk.  If I'm reading the code right, the regexp cache only takes 398 bytes
(worst case on a 64bit machine)... so 20 cache entries are only 7960 (~
7.75k).

  I would propose bumping this up to about 64k total (164 entries).  This
would help _alot_ with gnus and Emacs/W3.

-Bill P.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Regular expression performance in emacs, number of cached regexps
  1998-05-18 15:49   ` William M. Perry
@ 1998-05-19  4:45     ` Richard Stallman
  1998-05-19 13:52       ` William M. Perry
  1998-05-20  6:43       ` Dale Hagglund
  0 siblings, 2 replies; 12+ messages in thread
From: Richard Stallman @ 1998-05-19  4:45 UTC (permalink / raw)
  Cc: gsstark, ding, w3-beta

      Well, as people are wont to say nowadays - memory is cheap, and so is
    disk.  If I'm reading the code right, the regexp cache only takes 398 bytes
    (worst case on a 64bit machine)... so 20 cache entries are only 7960 (~
    7.75k).

There must be a misunderstanding, because the data in a cache
entry has variable size.  Perhaps you're looking only at the
cache structure itself and not its contents.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Regular expression performance in emacs, number of cached regexps
  1998-05-18  5:29 ` Richard Stallman
  1998-05-18 15:49   ` William M. Perry
@ 1998-05-19  7:55   ` Jan Vroonhof
  1 sibling, 0 replies; 12+ messages in thread
From: Jan Vroonhof @ 1998-05-19  7:55 UTC (permalink / raw)


Richard Stallman <rms@santafe.edu> writes:

> I can easily increase this cache size--the question is,
> how far should I increase it?

Make it settible from lisp? A quick glance at the code [1] gives the
impression that it is accessed by pointer chasing anyway so you would
use not (much) speed by the compiler not able to make static array
optimisations. 

Jan

[1] of XEmacs, which is all I have availible here at the moment,
sorry.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Regular expression performance in emacs, number of cached regexps
  1998-05-19  4:45     ` Richard Stallman
@ 1998-05-19 13:52       ` William M. Perry
  1998-05-20  1:38         ` Richard Stallman
  1998-05-20  6:43       ` Dale Hagglund
  1 sibling, 1 reply; 12+ messages in thread
From: William M. Perry @ 1998-05-19 13:52 UTC (permalink / raw)
  Cc: gsstark, ding, w3-beta

Richard Stallman <rms@santafe.edu> writes:

>     Well, as people are wont to say nowadays - memory is cheap, and so is
>     disk.  If I'm reading the code right, the regexp cache only takes 398
>     bytes (worst case on a 64bit machine)... so 20 cache entries are only
>     7960 (~ 7.75k).
> 
> There must be a misunderstanding, because the data in a cache entry has
> variable size.  Perhaps you're looking only at the cache structure itself
> and not its contents.

  I was assuming this was fairly small - how bad could a pathological posix 
regexp get? :)  Another 64k total?  Megabytes?  I've never delved too much
into the internals of regexp matching - it scares me.

-Bill P.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Regular expression performance in emacs, number of cached regexps
  1998-05-19 13:52       ` William M. Perry
@ 1998-05-20  1:38         ` Richard Stallman
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Stallman @ 1998-05-20  1:38 UTC (permalink / raw)
  Cc: gsstark, ding, w3-beta

If the cache had 64 items, I could easily imagine it might occupy 64k,
counting the contents.  I don't consider 64k insignificant.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Regular expression performance in emacs, number of cached regexps
  1998-05-19  4:45     ` Richard Stallman
  1998-05-19 13:52       ` William M. Perry
@ 1998-05-20  6:43       ` Dale Hagglund
  1998-05-20 11:47         ` William M. Perry
  1998-05-21  7:13         ` Richard Stallman
  1 sibling, 2 replies; 12+ messages in thread
From: Dale Hagglund @ 1998-05-20  6:43 UTC (permalink / raw)
  Cc: wmperry, gsstark, ding, w3-beta

Maybe adding a new function along the lines of

	(re-compile REGEXP)

that returns a compiled form of REGEXP would be a useful approach.
Along with this, all the various regexp search functions could check
if they're given a compiled or an uncompiled regexp, and either check
the cache and compile if necessary, or just use the given compiled
form.

This would let authors of regexp-heavy code pre-compile their own
regexps, avoiding the entire issue of cache size.  The down-side of
this is that over-zealous use of this could chew quite a bit of space
in compiled regexps.

Is there some reason why this sort of approach is a bad idea?  I've
used emacs extensively for many years, and written a fair amount of
lisp code, but I'm only looked at the C internals a few times.

Dale.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Regular expression performance in emacs, number of cached regexps
  1998-05-20  6:43       ` Dale Hagglund
@ 1998-05-20 11:47         ` William M. Perry
  1998-05-21  7:13         ` Richard Stallman
  1 sibling, 0 replies; 12+ messages in thread
From: William M. Perry @ 1998-05-20 11:47 UTC (permalink / raw)
  Cc: rms, gsstark, ding, w3-beta

Dale Hagglund <rdh@best.com> writes:

> Maybe adding a new function along the lines of
> 
> 	(re-compile REGEXP)
> 
> that returns a compiled form of REGEXP would be a useful approach.  Along
> with this, all the various regexp search functions could check if they're
> given a compiled or an uncompiled regexp, and either check the cache and
> compile if necessary, or just use the given compiled form.
> 
> This would let authors of regexp-heavy code pre-compile their own
> regexps, avoiding the entire issue of cache size.  The down-side of this
> is that over-zealous use of this could chew quite a bit of space in
> compiled regexps.
> 
> Is there some reason why this sort of approach is a bad idea?  I've used
> emacs extensively for many years, and written a fair amount of lisp code,
> but I'm only looked at the C internals a few times.

  This is actually something that is on my todo list, but I have zero free
time right now due to my mother's illness, so its been on the back burner
for a while.

-bill p.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Regular expression performance in emacs, number of cached regexps
  1998-05-20  6:43       ` Dale Hagglund
  1998-05-20 11:47         ` William M. Perry
@ 1998-05-21  7:13         ` Richard Stallman
  1 sibling, 0 replies; 12+ messages in thread
From: Richard Stallman @ 1998-05-21  7:13 UTC (permalink / raw)
  Cc: wmperry, gsstark, ding, w3-beta

I've thought about this idea before.  I am not sure it is a bad idea,
but I'm not sure it is worth its weight, so to speak.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~1998-05-21  7:13 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1998-05-17 17:49 Regular expression performance in emacs, number of cached regexps Gregory S. Stark
1998-05-17 19:20 ` Kai Grossjohann
     [not found] ` <vafhg2o1z7k.fsf@ramses..cs.uni-dortmund.de>
1998-05-17 19:33   ` Hrvoje Niksic
1998-05-18  5:29 ` Richard Stallman
1998-05-18 15:49   ` William M. Perry
1998-05-19  4:45     ` Richard Stallman
1998-05-19 13:52       ` William M. Perry
1998-05-20  1:38         ` Richard Stallman
1998-05-20  6:43       ` Dale Hagglund
1998-05-20 11:47         ` William M. Perry
1998-05-21  7:13         ` Richard Stallman
1998-05-19  7:55   ` Jan Vroonhof

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).