Gnus development mailing list
* timezone.el patterns in emacs 19.34
@ 1997-05-02 23:18 Ken Raeburn
  0 siblings, 0 replies; 13+ messages in thread
From: Ken Raeburn @ 1997-05-02 23:18 UTC



FYI, in case anyone wants to try out this change...

------- Start of forwarded message -------
Date: Fri, 2 May 1997 04:44:11 -0400
Message-Id: <199705020844.EAA06520@kr-laptop.cygnus.com>
From: Ken Raeburn <raeburn@cygnus.com>
To: umerin@mse.kyutech.ac.jp, rms@gnu.ai.mit.edu
Subject: timezone.el patterns in emacs 19.34


I noticed some time ago that timezone-parse-date took up a large
portion of the time of building a summary buffer in Gnus.  Tonight I
did a little instrumenting of that routine.  If my results are
correct, all the mail headers I've processed since (over 12000) *all*
matched one of the regexp patterns, namely the one labeled

	   ;; Styles: (1) and (2) with timezone and buggy timezone

which means basically

	[wkday,] DD MMM YYYY hh:mm[:ss] [TZ]

This looks like the right form for news headers as well.

This is the fourth regexp to be tried.  The first two are looking for:

	wkday, DD MMM hh:mm:ss [T] YYYY [TZ]

with and without the timezone specification; I haven't seen this form
actually match anything.  The third regexp checked is styles 1/2
without the timezone.

Blindly assuming that the regexps are all mutually exclusive in the
strings they'll match, I moved this one to the top of the `cond'
expression for some timing tests, using some data from my mail
headers.  The speed improvement (testing just the inner guts of
timezone-parse-date, without the code for stripping text properties)
was about a factor of 2.5.

So if it is in fact safe, I think it'd be worth moving this pattern to
the top of the list.
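
A minimal sketch of the reordering idea, written in Python rather than the Emacs Lisp of timezone.el. The two regexps are simplified stand-ins for the styles described above, not the patterns that ship with Emacs, and `parse_date`, `COMMON` and `RARE` are hypothetical names. It only shows that trying the common pattern first cuts the number of failed regexp attempts, and it relies on the same assumption as the message: the patterns never both match the same string.

```python
import re

# Simplified stand-ins for two timezone.el styles (assumptions, not the
# regexps that ship with Emacs).
COMMON = ("DD MMM YYYY hh:mm[:ss] [TZ]", re.compile(
    r"(\d{1,2}) ([A-Za-z]{3}) (\d{4}) (\d{1,2}):(\d{2})"
    r"(?::(\d{2}))?(?: ([-+]\d{4}|[A-Za-z]+))?"))
RARE = ("DD MMM hh:mm:ss YYYY", re.compile(
    r"(\d{1,2}) ([A-Za-z]{3}) (\d{1,2}):(\d{2}):(\d{2}) (\d{4})"))


def parse_date(text, patterns):
    """Try each (label, regexp) in order; return (label, groups, attempts)."""
    for attempts, (label, regexp) in enumerate(patterns, start=1):
        match = regexp.search(text)  # substring match, not a full match
        if match:
            return label, match.groups(), attempts
    return None, (), len(patterns)


header = "Fri, 2 May 1997 04:44:11 -0400"
old_order = parse_date(header, [RARE, COMMON])
new_order = parse_date(header, [COMMON, RARE])
print(old_order[2], new_order[2])  # 2 1
```

Both orders return the same fields for this header; only the number of patterns tried differs.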

Does anything else use timezone-parse-date so intensely, that might
"prefer" the order used now?

Ken

------- End of forwarded message -------



* Re: timezone.el patterns in emacs 19.34
  1997-05-03 23:44       ` Ken Raeburn
  1997-05-04  0:41         ` Hrvoje Niksic
@ 1997-05-08 12:37         ` Lars Magne Ingebrigtsen
  1 sibling, 0 replies; 13+ messages in thread
From: Lars Magne Ingebrigtsen @ 1997-05-08 12:37 UTC


Ken Raeburn <raeburn@cygnus.com> writes:

> Okay, so how would you get around the requirement of matching a
> substring, without using regexps?

Have a look at parse-time.el (written by Erik Naggum), which is
included in the Gnus distribution.  It uses no regexps to handle date
parsing, and it's faster than the timezone functions.  It doesn't
understand ISO8601 dates yet, though.

Gnus doesn't use this yet, but will probably start using it instead of
timezone in Quassia Gnus.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen



* Re: timezone.el patterns in emacs 19.34
  1997-05-04 20:55                 ` Hrvoje Niksic
@ 1997-05-04 22:55                   ` Stainless Steel Rat
  0 siblings, 0 replies; 13+ messages in thread
From: Stainless Steel Rat @ 1997-05-04 22:55 UTC


-----BEGIN PGP SIGNED MESSAGE-----

>>>>> "Hrv" == Hrvoje Niksic <hniksic@srce.hr> writes:

Hrv> Can you elaborate?  As far as I know, the majority of countries
Hrv> (*including* the US) gives the program author copyright by default.
Hrv> However, there are still two cases:

Under US Copyright law, apart from the Berne Agreements -- International
Copyright law -- one may legally forfeit all rights to a work and place it
into the Public Domain.  The Copyright Act of 1976 does not suspend this
aspect of US Copyright law.

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3
Charset: noconv

iQCVAwUBM20Ty56VRH7BJMxHAQEEkQP5AQwoCakjH83O5odcxqUXLIG0g2cpuM/P
Kpde1NRp2DzuioIcMgmFbl8ST22HZ1Ca0IjWsnCpphQcpVii2CPLYsFZSsYEJA4Y
mwjYuOi4QLf87qrba4nEHlzqxyfcpWK02rEdvs0yqMYL1i1yr82soF29E3UE8dbs
0J5azycFmBE=
=a6lk
-----END PGP SIGNATURE-----
-- 
Rat <ratinox@peorth.gweep.net>    \ If Happy Fun Ball begins to smoke, get
PGP Key: at a key server near you! \ away immediately. Seek shelter and cover
                                    \ head.



* Re: timezone.el patterns in emacs 19.34
  1997-05-04 20:49               ` Johan Danielsson
@ 1997-05-04 20:55                 ` Hrvoje Niksic
  1997-05-04 22:55                   ` Stainless Steel Rat
  0 siblings, 1 reply; 13+ messages in thread
From: Hrvoje Niksic @ 1997-05-04 20:55 UTC


[ note: this is becoming completely off-topic to ding ]

joda@pdc.kth.se (Johan Danielsson) writes:
> And of course, this applies only to the US. In Europe, in general,
> there is no such thing as `not copyrighted'.

Can you elaborate?  As far as I know, the majority of countries
(*including* the US) gives the program author copyright by default.
However, there are still two cases:

1) the program written before the Berne convention, without copyright
   notice attached; and

2) the program deliberately disowned by its author.

Besides, I try to avoid the distinction between "Europe" and the US.
Europe consists of many countries, each with its own set of laws and a
bouquet of law-related bogosities.

-- 
Hrvoje Niksic <hniksic@srce.hr> | Student at FER Zagreb, Croatia
--------------------------------+--------------------------------
* Q: What is an experienced Emacs user?
* A: A person who wishes that the terminal had pedals.



* Re: timezone.el patterns in emacs 19.34
  1997-05-04 19:55             ` Hrvoje Niksic
@ 1997-05-04 20:49               ` Johan Danielsson
  1997-05-04 20:55                 ` Hrvoje Niksic
  0 siblings, 1 reply; 13+ messages in thread
From: Johan Danielsson @ 1997-05-04 20:49 UTC
  Cc: ding

Hrvoje Niksic <hniksic@srce.hr> writes:

> Public domain software 
>     Public domain software is software that is not copyrighted. It is
>     a special case of non-copylefted free software, which means that
>     some copies or modified versions may not be free at all.
>     Sometimes people use the term ``public domain'' in a loose fashion
>     to mean ``free'' or ``available gratis.'' However, ``public
>     domain'' is a legal term and means, precisely, ``not
>     copyrighted''.  For clarity, we recommend using ``public domain''
>     for that meaning only, and using other terms to convey the other
>     meanings.

And of course, this applies only to the US. In Europe, in general,
there is no such thing as `not copyrighted'.

Just to set things straight.

/Johan



* Re: timezone.el patterns in emacs 19.34
  1997-05-04  3:03           ` Ken Raeburn
@ 1997-05-04 19:55             ` Hrvoje Niksic
  1997-05-04 20:49               ` Johan Danielsson
  0 siblings, 1 reply; 13+ messages in thread
From: Hrvoje Niksic @ 1997-05-04 19:55 UTC


Ken Raeburn <raeburn@cygnus.com> writes:

> as the lisp or yacc versions?  Maybe.  Would it be worth spending the
> time?  I doubt it, at least for right now.  If you want to do it
> anyways, go for it....

I will not do it, unless a good speedup in Gnus (and possibly other
packages, but my focus is on Gnus, which uses the routine heavily) is
the result.  I don't consider it a particularly fun thing to write, but
if it's a good point for speed optimization -- why not?

> > > > Then, there are copyright problems with it.
> > > No, there aren't.
> > Yes, there are.  The public domain code is not to be introduced in
> > important parts of Emacs, as it is compromisable.
> 
> "Compromisable?"  Why, because the FSF might not be able to get
> paperwork saying it really is PD?  They could try, and I suspect such
> paperwork would be enough.

No.  By definition, PD software is the software that is not
copyrighted -- it does not have a legal owner (which should be the
person or persons signing the papers).  From www.fsf.org:

Public domain software 
    Public domain software is software that is not copyrighted. It is
    a special case of non-copylefted free software, which means that
    some copies or modified versions may not be free at all.
    Sometimes people use the term ``public domain'' in a loose fashion
    to mean ``free'' or ``available gratis.'' However, ``public
    domain'' is a legal term and means, precisely, ``not
    copyrighted''.  For clarity, we recommend using ``public domain''
    for that meaning only, and using other terms to convey the other
    meanings.

Stallman considers PD programs unsafe to use in GNU.

    * Public domain.

    If you put the program in the public domain, we prefer to have a
    signed piece of paper--a disclaimer of rights--from you confirming
    this.  If the program is not very important, we can do without
    one; the worst that could happen is that we might some day be
    forced to stop using it.

    The law says that anyone can copyright a modified version of the
    public domain work.  (This doesn't restrict the original, which
    remains in the public domain; only the changes are copyrighted.)
    If we make extensive changes, we will probably do this and add our
    usual copyleft.  If we make small changes, we will leave the
    version we distribute in the public domain.

> (Actually, I'd be surprised if there weren't some other GNU programs
> using a date parser already.  For example, GNU date in sh-utils.)

Yes, there are pieces of public-domain software in GNU, but they are
either dispensable, or there is not yet a free replacement (as in the
case of `getdate.y').

-- 
Hrvoje Niksic <hniksic@srce.hr> | Student at FER Zagreb, Croatia
--------------------------------+--------------------------------
Oh lord won't you buy me a color TV...



* Re: timezone.el patterns in emacs 19.34
  1997-05-04  0:41         ` Hrvoje Niksic
@ 1997-05-04  3:03           ` Ken Raeburn
  1997-05-04 19:55             ` Hrvoje Niksic
  0 siblings, 1 reply; 13+ messages in thread
From: Ken Raeburn @ 1997-05-04  3:03 UTC



Yes, I think you could write a faster date parser in C than either the
lisp or yacc versions.  (This much is obvious -- as a worst case, use
the existing lisp interpreter or yacc output, both of which are C, and
improve things from there.)  Would it be as readable and maintainable
as the lisp or yacc versions?  Maybe.  Would it be worth spending the
time?  I doubt it, at least for right now.  If you want to do it
anyways, go for it....

> > > Then, there are copyright problems with it.
> > No, there aren't.
> Yes, there are.  The public domain code is not to be introduced in
> important parts of Emacs, as it is compromisable.

"Compromisable?"  Why, because the FSF might not be able to get
paperwork saying it really is PD?  They could try, and I suspect such
paperwork would be enough.  (Actually, I'd be surprised if there
weren't some other GNU programs using a date parser already.  For
example, GNU date in sh-utils.)

> > > And, I'd like to have a fast routine that matches those 7 or so
> > > format strings.
> > Yes, that's the problem I see with it -- it'll handle more than we
> > want.
> `getdate.y' would handle even more.

Uh, yes, that's what I meant to say.
The "it" I was referring to was getdate.y.



* Re: timezone.el patterns in emacs 19.34
  1997-05-03 23:44       ` Ken Raeburn
@ 1997-05-04  0:41         ` Hrvoje Niksic
  1997-05-04  3:03           ` Ken Raeburn
  1997-05-08 12:37         ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 13+ messages in thread
From: Hrvoje Niksic @ 1997-05-04  0:41 UTC


Ken Raeburn <raeburn@cygnus.com> writes:

> Okay, so how would you get around the requirement of matching a
> substring, without using regexps?

Simply make as many routines as there are formats.  For example, if
the format is

   Wkd, blah blah

then I search for week days, if I find them, look for `,', etc.  It's
tedious, but I can guarantee that it would be *much* faster than the
current Lisp code.
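
A rough sketch of what one such per-format routine might look like, in Python for brevity rather than the C being proposed. `parse_wkd_style`, `read_number` and `skip_spaces` are hypothetical names, the format handled is assumed to be `Wkd, DD MMM YYYY hh:mm`, and, unlike timezone.el, the match is anchored at the start of the string rather than searching for a substring.

```python
DIGITS = "0123456789"
WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def read_number(text, pos):
    """Read decimal digits starting at pos; return (value, new_pos) or None."""
    end = pos
    while end < len(text) and text[end] in DIGITS:
        end += 1
    if end == pos:
        return None
    return int(text[pos:end]), end


def skip_spaces(text, pos):
    """Return the index of the first non-space character at or after pos."""
    while pos < len(text) and text[pos] == " ":
        pos += 1
    return pos


def parse_wkd_style(text):
    """Match 'Wkd, DD MMM YYYY hh:mm' by hand, without regexps.

    Returns (year, month, day, hour, minute) as integers, or None.
    """
    # Weekday name first, then the comma, as described above.
    if text[:3] not in WEEKDAYS or text[3:4] != ",":
        return None
    day = read_number(text, skip_spaces(text, 4))
    if day is None:
        return None
    pos = skip_spaces(text, day[1])
    name = text[pos:pos + 3]
    if name not in MONTHS:
        return None
    month = MONTHS.index(name) + 1
    year = read_number(text, skip_spaces(text, pos + 3))
    if year is None:
        return None
    hour = read_number(text, skip_spaces(text, year[1]))
    if hour is None or text[hour[1]:hour[1] + 1] != ":":
        return None
    minute = read_number(text, hour[1] + 1)
    if minute is None:
        return None
    return year[0], month, day[0], hour[0], minute[0]


print(parse_wkd_style("Fri, 2 May 1997 04:44:11 -0400"))  # (1997, 5, 2, 4, 44)
```

Each character is looked at once and nothing is backtracked, which is where the expected speed would come from; the cost is one routine like this per supported format.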

> If you can do it, and get much better performance than we've got now
> with the lisp code, great.  But as I just indicated in other mail,
> my trivial little patch already addresses the performance issue to
> some degree.  So you'll have to do even better. :-)

I certainly won't do it before I get the exact numbers of how much
time is spent in timezone.el, before and after your patch.  It is too
much work to speed up Summary buffer building by 5% or so.

> And, by the way, if you do go ahead with this, I'd suggest returning
> numbers instead of strings when practical.  They can be turned into
> strings easily enough in lisp, and the gnus-dd-mmm usage just converts
> them back to numbers and discards the strings, which means more
> needless garbage to collect.  (Don't underestimate the cost of garbage
> collection.  I've managed to speed up some code by tweaking it to
> allocate less short-term heap storage.)

Yup.

> > NO!  I hate getdate.y.  First, because of the `.y' thingie. :-)
> 
> Aside from the substring issue, I think a parser would be well suited
> to this task.  I see two advantages to it:
>  * Tokenizes first, then looks for patterns in the tokens; this means
>    any backtracking doesn't have to re-process every character.
>  * Yacc sets up a FSM for matching multiple sequences in parallel (as
>    do the more sophisticated regexp matchers), so if a match is going
>    to be made, it'll be made in one pass; the hairy work is done at
>    build time.

But it's huge, and we don't need all that cruft.  In a C
implementation of timezone.el, it would make little difference whether
the parser is "intelligent" or not.

> > Then, there are copyright problems with it.
> 
> No, there aren't.

Yes, there are.  The public domain code is not to be introduced in
important parts of Emacs, as it is compromisable.

> > And, I'd like to have a fast routine that matches those 7 or so
> > format strings.
> 
> Yes, that's the problem I see with it -- it'll handle more than we
> want.

`getdate.y' would handle even more.

> A date header saying "+23hours" is bogus, and should be treated
> as such.  Though if getdate.y matches more "real" date formats than
> timezone-parse-date, I don't think it would be a bad idea to support
> them as well, since this is timezone-parse-date we're talking about,
> and not gnus-parse-conforming-date-header.

If I ever rewrite `timezone-parse-date', it will recognize the dates
it recognizes now.

-- 
Hrvoje Niksic <hniksic@srce.hr> | Student at FER Zagreb, Croatia
--------------------------------+--------------------------------
Good pings come in small packets.



* Re: timezone.el patterns in emacs 19.34
  1997-05-03  4:04     ` Hrvoje Niksic
@ 1997-05-03 23:44       ` Ken Raeburn
  1997-05-04  0:41         ` Hrvoje Niksic
  1997-05-08 12:37         ` Lars Magne Ingebrigtsen
  0 siblings, 2 replies; 13+ messages in thread
From: Ken Raeburn @ 1997-05-03 23:44 UTC


Hrvoje Niksic <hniksic@srce.hr> writes:

> > Besides, I think you'd probably wind up doing the regexp bit
> > anyways, just at the C level.  These patterns don't have to match
> > the *entire* string, just some substring.
> I would do no regexp matching.  If I had to resort to regexps, then I
> would rather leave it in Lisp, as it is.

Okay, so how would you get around the requirement of matching a
substring, without using regexps?  If you can do it, and get much
better performance than we've got now with the lisp code, great.  But
as I just indicated in other mail, my trivial little patch already
addresses the performance issue to some degree.  So you'll have to do
even better. :-)

And, by the way, if you do go ahead with this, I'd suggest returning
numbers instead of strings when practical.  They can be turned into
strings easily enough in lisp, and the gnus-dd-mmm usage just converts
them back to numbers and discards the strings, which means more
needless garbage to collect.  (Don't underestimate the cost of garbage
collection.  I've managed to speed up some code by tweaking it to
allocate less short-term heap storage.)

> NO!  I hate getdate.y.  First, because of the `.y' thingie. :-)

Aside from the substring issue, I think a parser would be well suited
to this task.  I see two advantages to it:
 * Tokenizes first, then looks for patterns in the tokens; this means
   any backtracking doesn't have to re-process every character.
 * Yacc sets up a FSM for matching multiple sequences in parallel (as
   do the more sophisticated regexp matchers), so if a match is going
   to be made, it'll be made in one pass; the hairy work is done at
   build time.
Certainly, yacc isn't the be-all and end-all of parsing technology, by
a long shot.  But it's up to this task.
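
The first point can be sketched in a few lines of Python; this is an illustration only, not getdate.y and not a yacc-generated parser. `tokenize`, `match_shape` and `SHAPES` are hypothetical names, and the shapes are tried one after another rather than in parallel as a generated state machine would do. What it does show is that the characters are scanned once, and any retrying happens over a short list of tokens.

```python
DIGITS = "0123456789"

# Hypothetical token shapes: "num", "word", or a literal punctuation character.
SHAPES = [
    ("DD MMM hh:mm:ss YYYY", ["num", "word", "num", ":", "num", ":", "num", "num"]),
    ("DD MMM YYYY hh:mm", ["num", "word", "num", "num", ":", "num"]),
]


def tokenize(text):
    """Split a date string into (kind, value) tokens in a single pass."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch in DIGITS:
            end = pos
            while end < len(text) and text[end] in DIGITS:
                end += 1
            tokens.append(("num", int(text[pos:end])))
            pos = end
        elif ch.isalpha():
            end = pos
            while end < len(text) and text[end].isalpha():
                end += 1
            tokens.append(("word", text[pos:end]))
            pos = end
        elif ch.isspace():
            pos += 1
        else:
            tokens.append((ch, ch))  # punctuation is its own kind
            pos += 1
    return tokens


def match_shape(tokens, shapes):
    """Find the first shape that occurs as a contiguous run of token kinds."""
    kinds = [kind for kind, _ in tokens]
    for label, shape in shapes:
        for start in range(len(kinds) - len(shape) + 1):
            if kinds[start:start + len(shape)] == shape:
                values = [value for _, value in tokens[start:start + len(shape)]]
                return label, values
    return None, []


tokens = tokenize("Fri, 2 May 1997 04:44:11 -0400")
print(match_shape(tokens, SHAPES))
# ('DD MMM YYYY hh:mm', [2, 'May', 1997, 4, ':', 44])
```

Matching on a run of tokens also keeps the "substring" behaviour: the leading weekday and the trailing seconds and zone are simply outside the matched run.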

> Then, there are copyright problems with it.

No, there aren't.  The version I'm looking at has a "public domain, no
copyright" label on it.  There would be no problem with incorporating
a copy into emacs.

> And, I'd like to have a fast routine that matches those 7 or so
> format strings.

Yes, that's the problem I see with it -- it'll handle more than we
want.  A date header saying "+23hours" is bogus, and should be treated
as such.  Though if getdate.y matches more "real" date formats than
timezone-parse-date, I don't think it would be a bad idea to support
them as well, since this is timezone-parse-date we're talking about,
and not gnus-parse-conforming-date-header.

Ken



* Re: timezone.el patterns in emacs 19.34
  1997-05-03  1:14 ` Hrvoje Niksic
  1997-05-03  3:07   ` Ken Raeburn
@ 1997-05-03 22:41   ` Ken Raeburn
  1 sibling, 0 replies; 13+ messages in thread
From: Ken Raeburn @ 1997-05-03 22:41 UTC


Hrvoje Niksic <hniksic@srce.hr> writes:

> > I noticed some time ago that timezone-parse-date took up a large
> > portion of the time of building a summary buffer in Gnus.
> How large a portion?  20%, 50%?

I did some timing tests last night using elp.  I had to restart emacs
at one point, so the numbers may not be entirely reliable, but it
seemed to go from almost 20% to about 7% with my timezone.el change.
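
As a rough consistency check on these figures, the arithmetic can be worked through as below. The 2.5 is the speedup reported at the start of this thread for the inner part of timezone-parse-date on different test data, so the two measurements are not expected to agree exactly; `share_after` and `implied_speedup` are hypothetical helper names, and the model assumes everything outside the routine takes the same time before and after.

```python
def share_after(share_before, speedup):
    """Fraction of total time a routine takes once it alone is sped up."""
    routine = share_before / speedup   # new cost of the routine
    rest = 1.0 - share_before          # everything else is unchanged
    return routine / (rest + routine)


def implied_speedup(share_before, share_now):
    """Speedup of the routine needed to move its share from before to now."""
    routine = share_now * (1.0 - share_before) / (1.0 - share_now)
    return share_before / routine


print(round(share_after(0.20, 2.5), 3))       # 0.091
print(round(implied_speedup(0.20, 0.07), 2))  # 3.32
```

Under this model a 2.5x speedup would take a 20% share to about 9%, and a drop to 7% would correspond to a speedup of roughly 3.3x, which is in the same range given how approximate the elp numbers are said to be.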

I got confirmation from RMS today that he is going to include that
change.



* Re: timezone.el patterns in emacs 19.34
  1997-05-03  3:07   ` Ken Raeburn
@ 1997-05-03  4:04     ` Hrvoje Niksic
  1997-05-03 23:44       ` Ken Raeburn
  0 siblings, 1 reply; 13+ messages in thread
From: Hrvoje Niksic @ 1997-05-03  4:04 UTC


Ken Raeburn <raeburn@cygnus.com> writes:

> > it should be written in C.  It's not that hard to write, and the
> > speedup would measure by hundreds (all those regexps would become
> > unnecessary).
> 
> I'm not so sure about converting it to straight C; I could imagine
> wanting to change the supported format list someday, and wanting to do
> so easily -- i.e., without recompiling emacs.

I think it's a small price to pay for a big speedup of Summary buffer
building.  Emacs is not a particularly fast piece of software, and I
don't think we should miss things like this.  The speedup would be
enormous, really.

The extensibility is not lost, as this subr can be overridden with
whatever you like, including the old version.

> Besides, I think you'd probably wind up doing the regexp bit
> anyways, just at the C level.  These patterns don't have to match
> the *entire* string, just some substring.

I would do no regexp matching.  If I had to resort to regexps, then I
would rather leave it in Lisp, as it is.

> Now, if we can ignore or work around that substring issue (not a
> trivial issue, Gnus isn't the only user of that code), and require a
> match against the whole string, then using getdate.y might be a win.
> I understand it's supposed to be pretty comprehensive.

NO!  I hate getdate.y.  First, because of the `.y' thingie. :-)  Then,
there are copyright problems with it.  And, I'd like to have a fast
routine that matches those 7 or so format strings.  Nothing more,
nothing less.  If someone wants more -- the doors of Lisp are wide
open.  I opt for speed in this case.

-- 
Hrvoje Niksic <hniksic@srce.hr> | Student at FER Zagreb, Croatia
--------------------------------+--------------------------------
Oh lord won't you buy me a color TV...



* Re: timezone.el patterns in emacs 19.34
  1997-05-03  1:14 ` Hrvoje Niksic
@ 1997-05-03  3:07   ` Ken Raeburn
  1997-05-03  4:04     ` Hrvoje Niksic
  1997-05-03 22:41   ` Ken Raeburn
  1 sibling, 1 reply; 13+ messages in thread
From: Ken Raeburn @ 1997-05-03  3:07 UTC


Hrvoje Niksic <hniksic@srce.hr> writes:

> > I noticed some time ago that timezone-parse-date took up a large
> > portion of the time of building a summary buffer in Gnus.
> How large a portion?  20%, 50%?  Looking at timezone.el, it looks like

I don't remember for sure; the timing I did was some time ago.  In my
recent checks (mostly to verify the names of the routines before I
started looking at them) I interrupted the "generating summary" phase,
with debug-on-quit set, and that's where it usually wound up according
to the backtrace.

> it should be written in C.  It's not that hard to write, and the
> speedup would measure by hundreds (all those regexps would become
> unnecessary).

I'm not so sure about converting it to straight C; I could imagine
wanting to change the supported format list someday, and wanting to do
so easily -- i.e., without recompiling emacs.  (In fact, the ISO
format Francois Pinard just recommended using may require changes to
get the time zone right.)  Besides, I think you'd probably wind up
doing the regexp bit anyways, just at the C level.  These patterns
don't have to match the *entire* string, just some substring.

And if you do one regexp match, well, that doesn't seem much better
than what you'd do in what appears to be the common case under Gnus,
if you use the reordering I suggested.

Now, if we can ignore or work around that substring issue (not a
trivial issue, Gnus isn't the only user of that code), and require a
match against the whole string, then using getdate.y might be a win.
I understand it's supposed to be pretty comprehensive.

Ken



* Re: timezone.el patterns in emacs 19.34
       [not found] <199705020844.EAA06520@kr-laptop.cygnus.com>
@ 1997-05-03  1:14 ` Hrvoje Niksic
  1997-05-03  3:07   ` Ken Raeburn
  1997-05-03 22:41   ` Ken Raeburn
  0 siblings, 2 replies; 13+ messages in thread
From: Hrvoje Niksic @ 1997-05-03  1:14 UTC


Ken Raeburn <raeburn@cygnus.com> writes:

> I noticed some time ago that timezone-parse-date took up a large
> portion of the time of building a summary buffer in Gnus.

How large a portion?  20%, 50%?  Looking at timezone.el, it looks like
it should be written in C.  It's not that hard to write, and the
speedup would measure by hundreds (all those regexps would become
unnecessary).

-- 
Hrvoje Niksic <hniksic@srce.hr> | Student at FER Zagreb, Croatia
--------------------------------+--------------------------------
ED WILL NOT CORRUPT YOUR PRECIOUS BODILY FLUIDS!!



end of thread

Thread overview: 13+ messages
1997-05-02 23:18 timezone.el patterns in emacs 19.34 Ken Raeburn
     [not found] <199705020844.EAA06520@kr-laptop.cygnus.com>
1997-05-03  1:14 ` Hrvoje Niksic
1997-05-03  3:07   ` Ken Raeburn
1997-05-03  4:04     ` Hrvoje Niksic
1997-05-03 23:44       ` Ken Raeburn
1997-05-04  0:41         ` Hrvoje Niksic
1997-05-04  3:03           ` Ken Raeburn
1997-05-04 19:55             ` Hrvoje Niksic
1997-05-04 20:49               ` Johan Danielsson
1997-05-04 20:55                 ` Hrvoje Niksic
1997-05-04 22:55                   ` Stainless Steel Rat
1997-05-08 12:37         ` Lars Magne Ingebrigtsen
1997-05-03 22:41   ` Ken Raeburn
