Announcements and discussions for Gnus, the GNU Emacs Usenet newsreader
* Fancy split - simple regexp problem
@ 2011-09-20 18:55 Adam Sjøgren
  2011-10-06  5:05 ` Tim Landscheidt
  0 siblings, 1 reply; 11+ messages in thread
From: Adam Sjøgren @ 2011-09-20 18:55 UTC
  To: info-gnus-english

I am trying to add a split to my nnmail-split-fancy that will catch
headers like this:

  X-Bugzilla-Product: gdm

and put matching emails in the group bugzilla.[productname], i.e.
"bugzilla.gdm" in this example.

When I try with this split:

  ("X-Bugzilla-Product" "\\w+" "bugzilla.\\&")

B q tells me:

  "This message would go to bugzilla.m"

If I change the regexp to "\\w\\w\\w" it goes to bugzilla.gdm - but I
don't fancy (haha) writing an entry for each possible length.

What am I doing wrong?

I even tried M-x regexp-builder to make a regexp that starts with \\<
and ends with \\> and matches "gdm", but for some reason I can't make
the result work in fancy splitting.

I'm sure there is an obvious thing, I just can't see it.


  Best regards,

    Adam

-- 
 "You make a fair point. Knowing when not to say              Adam Sjøgren
  anything is not something the lazyweb had              asjo@koldfront.dk
  historically been good at."

* Re: Fancy split - simple regexp problem
  2011-09-20 18:55 Fancy split - simple regexp problem Adam Sjøgren
@ 2011-10-06  5:05 ` Tim Landscheidt
  2011-10-06  9:22   ` Štěpán Němec
  0 siblings, 1 reply; 11+ messages in thread
From: Tim Landscheidt @ 2011-10-06  5:05 UTC
  To: info-gnus-english

asjo@koldfront.dk (Adam Sjøgren) wrote:

> [...]
> If I change the regexp to "\\w\\w\\w" it goes to bugzilla.gdm - but I
> don't fancy (haha) writing an entry for each possible length.

> What am I doing wrong?
> [...]

Nothing, I think :-). I personally don't use fancy splitting,
but a deeper look at (at least Gnus 5.13's) code seems to
locate the culprit in Emacs' *backward* regular expression
"non-greedity": Position point at the end of "bugzilla.gdm",
C-u C-r "\w+" - et voilà, only one character is matched.

  I don't understand why the match is performed backward and
not forward, and the code (nnmail-split-it) looks far too
complicated to just flip it around.
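
For anyone who wants to see this outside Gnus, here is a minimal
sketch. It assumes a temporary buffer with the default syntax table,
and `my-backward-match' is a made-up helper name, not something from
Gnus or Emacs:

```elisp
(defun my-backward-match (regexp text)
  "Insert TEXT in a temp buffer and search backward for REGEXP from the end.
Return the matched string, or nil if nothing matched."
  (with-temp-buffer
    (insert text)
    (goto-char (point-max))
    (when (re-search-backward regexp nil t)
      (match-string 0))))

;; The match starts as close to point as possible, so it is one char long.
(my-backward-match "\\w+" "bugzilla.gdm")        ; => "m"
;; A fixed-length regexp cannot start any later than the "g".
(my-backward-match "\\w\\w\\w" "bugzilla.gdm")   ; => "gdm"
;; Anchoring at a word start has the same effect for any length.
(my-backward-match "\\<\\w+" "bugzilla.gdm")     ; => "gdm"
```

Whether the anchored form also behaves this way inside
nnmail-split-it is not verified here.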

Tim

* Re: Fancy split - simple regexp problem
  2011-10-06  5:05 ` Tim Landscheidt
@ 2011-10-06  9:22   ` Štěpán Němec
  2011-10-20 12:50     ` Tim Landscheidt
  0 siblings, 1 reply; 11+ messages in thread
From: Štěpán Němec @ 2011-10-06  9:22 UTC
  To: info-gnus-english

On Thu, 06 Oct 2011 07:05:56 +0200
Tim Landscheidt wrote:

> Nothing, I think :-). I personally don't use fancy split-
> ting, but a deeper look at (at least Gnus 5.13's) code seems
> to locate the culprit in Emacs' *backward* regular expres-
> sion "non-greedity": Position point at the end of
> "bugzilla.gdm", C-u C-r "\w+" - et voilà, only one character
> is matched.

That's not "non-greedity", that's brokenness IMO. Reported as GNU Emacs
bug #9681.

-- 
Štěpán

_______________________________________________
info-gnus-english mailing list
info-gnus-english@gnu.org
https://lists.gnu.org/mailman/listinfo/info-gnus-english

* Re: Fancy split - simple regexp problem
  2011-10-06  9:22   ` Štěpán Němec
@ 2011-10-20 12:50     ` Tim Landscheidt
  2011-10-21  9:14       ` Štěpán Němec
  0 siblings, 1 reply; 11+ messages in thread
From: Tim Landscheidt @ 2011-10-20 12:50 UTC
  To: info-gnus-english

Štěpán Němec <stepnem@gmail.com> wrote:

>> Nothing, I think :-). I personally don't use fancy split-
>> ting, but a deeper look at (at least Gnus 5.13's) code seems
>> to locate the culprit in Emacs' *backward* regular expres-
>> sion "non-greedity": Position point at the end of
>> "bugzilla.gdm", C-u C-r "\w+" - et voilà, only one character
>> is matched.

> That's not "non-greedity", that's brokenness IMO. Reported as GNU Emacs
> bug #9681.

As your opinion doesn't seem to be shared by the Emacs
developers, I'd suggest filing a bug for Gnus (as well :-)).

  It shouldn't be too hard to fix if someone starts by
documenting it :-).

Tim


* Re: Fancy split - simple regexp problem
  2011-10-20 12:50     ` Tim Landscheidt
@ 2011-10-21  9:14       ` Štěpán Němec
  2011-10-23 23:08         ` Tim Landscheidt
  0 siblings, 1 reply; 11+ messages in thread
From: Štěpán Němec @ 2011-10-21  9:14 UTC
  To: info-gnus-english

On Thu, 20 Oct 2011 14:50:06 +0200
Tim Landscheidt wrote:

> Štěpán Němec <stepnem@gmail.com> wrote:
>
>>> Nothing, I think :-). I personally don't use fancy split-
>>> ting, but a deeper look at (at least Gnus 5.13's) code seems
>>> to locate the culprit in Emacs' *backward* regular expres-
>>> sion "non-greedity": Position point at the end of
>>> "bugzilla.gdm", C-u C-r "\w+" - et voilà, only one character
>>> is matched.
>
>> That's not "non-greedity", that's brokenness IMO. Reported as GNU Emacs
>> bug #9681.
>
> As your opinion doesn't seem to be shared by the Emacs de-
> velopers, I'd suggest filing a bug for Gnus (as well :-)).

Be my guest. (I'm not interested in the Gnus problem in particular, as I
don't use splitting.)

Also, I don't think "doesn't seem to be shared" is a fair description.
I think most of them acknowledge the problem, but don't consider it
important enough to invest the (apparently rather big) amount of effort
necessary to fix it.

The brokenness is hardly deniable.

-- 
Štěpán

* Re: Fancy split - simple regexp problem
  2011-10-21  9:14       ` Štěpán Němec
@ 2011-10-23 23:08         ` Tim Landscheidt
  2011-10-24 10:40           ` Štěpán Němec
  0 siblings, 1 reply; 11+ messages in thread
From: Tim Landscheidt @ 2011-10-23 23:08 UTC
  To: info-gnus-english

Štěpán Němec <stepnem@gmail.com> wrote:

>>>> Nothing, I think :-). I personally don't use fancy split-
>>>> ting, but a deeper look at (at least Gnus 5.13's) code seems
>>>> to locate the culprit in Emacs' *backward* regular expres-
>>>> sion "non-greedity": Position point at the end of
>>>> "bugzilla.gdm", C-u C-r "\w+" - et voilà, only one character
>>>> is matched.

>>> That's not "non-greedity", that's brokenness IMO. Reported as GNU Emacs
>>> bug #9681.

>> As your opinion doesn't seem to be shared by the Emacs de-
>> velopers, I'd suggest filing a bug for Gnus (as well :-)).

> Be my guest. (I'm not interested in the Gnus problem in particular, as I
> don't use splitting.)

I do, but not the fancy variant.

> Also, I don't think "doesn't seem to be shared" is a fair description.
> I think most of them acknowledge the problem, but don't consider it
> important enough to invest the (apparently rather big) amount of effort
> necessary to fix it.

> The brokenness is hardly deniable.

Don't get me wrong, fundamentally I agree with you that it
constitutes what in MySQL would be called a "gotcha" - something
totally unexpected, yet documented somewhere.

  But I disagree with you that this is a *bug*, as it *is*
documented and it is logistically impossible to change Emacs'
behaviour in this regard without assessing whether this breaks
potentially quazillions lines of code. And I don't think that
the developers consider it a bug either.

Tim


* Re: Fancy split - simple regexp problem
  2011-10-23 23:08         ` Tim Landscheidt
@ 2011-10-24 10:40           ` Štěpán Němec
  2011-10-24 22:08             ` Tim Landscheidt
  0 siblings, 1 reply; 11+ messages in thread
From: Štěpán Němec @ 2011-10-24 10:40 UTC
  To: info-gnus-english

On Mon, 24 Oct 2011 01:08:05 +0200
Tim Landscheidt wrote:

> Don't get me wrong, fundamentally I'm with you on that it
> constitutes what in MySQL would be called a "gotcha" - some-
> thing totally unexpected, yet somewhere documented.

No, it's not documented (definitely not clearly and not enough, which is
really the same as not at all), see my bug report.

>   But I disagree with you to consider this a *bug*, as it
> *is* documented and it is logistically impossible to change
> Emacs' behaviour in this regard without assessing whether
> this breaks potentially quazillions lines of code. And I
> don't think that the developers consider it a bug either.

That's ridiculous. Are you saying that making _backward_ regexp search
_really_ do backward, not forward matching is going to break
"quazillions lines of code"? Not that I really care, but I very much
doubt that. What's much more likely is that it would _fix_ both known
and unknown breakages and unexpected behaviours, such as the one that
started this thread.

Anyway, if you want to continue arguing about this please use the bug
thread.

-- 
Štěpán

* Re: Fancy split - simple regexp problem
  2011-10-24 10:40           ` Štěpán Němec
@ 2011-10-24 22:08             ` Tim Landscheidt
  0 siblings, 0 replies; 11+ messages in thread
From: Tim Landscheidt @ 2011-10-24 22:08 UTC
  To: info-gnus-english

Štěpán Němec <stepnem@gmail.com> wrote:

>> Don't get me wrong, fundamentally I'm with you on that it
>> constitutes what in MySQL would be called a "gotcha" - some-
>> thing totally unexpected, yet somewhere documented.

> No it's not documented (definitely not clearly and not enough, which is
> really the same as not at all), see my bug report.

It is, cf. the docstring for re-search-backward.

>>   But I disagree with you to consider this a *bug*, as it
>> *is* documented and it is logistically impossible to change
>> Emacs' behaviour in this regard without assessing whether
>> this breaks potentially quazillions lines of code. And I
               ^^^^^^^^^^^
>> don't think that the developers consider it a bug either.

> That's ridiculous. Are you saying that making _backward_ regexp search
> _really_ do backward, not forward matching is going to break
> "quazillions lines of code"? Not that I really care, but I very much
> doubt that. What's much more likely is that it would _fix_ both known
> and unknown breakages and unexpected behaviours, such as the one that
> started this thread.

No, I haven't said that it is going to break them, but that
it has the potential to do so.

> Anyway, if you want to continue arguing about this please use the bug
> thread.

I don't want to continue, nor do I want to be misquoted.

Tim


* Re: Fancy split - simple regexp problem
       [not found]   ` <mailman.1950.1320607816.15868.info-gnus-english@gnu.org>
@ 2012-01-03 23:51     ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 11+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-01-03 23:51 UTC
  To: Adam Sjøgren; +Cc: info-gnus-english

asjo@koldfront.dk (Adam Sjøgren) writes:

> Thanks for the suggestion - it doesn't seem to work, though:
>
> I added the defadvice to my .gnus (and eval'ed it); the split I am using
> is this:
>
>   ("X-Bugzilla-Product" "\\(\\w+\\)" "bugzilla.\\1")
>
> and the email has this header:
>
>   X-Bugzilla-Product: evince
>
> But B q says:
>
>   This message would go to bugzilla.e
>
> If I type M-x posix-search-backward \w+ I only get one char matched -
> same as typing C-u M-x \w+ - so maybe I overlooked something, or some
> other remedy is needed?

Searching forward does match the entire word, though, so I would have
thought that that should work, actually.
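
A sketch of the forward-matching idea, for illustration only:
`my-bugzilla-group' is a hypothetical helper that takes the raw
header text as a string. How it would be hooked into a fancy split is
left open, and nothing here has been tried inside Gnus.

```elisp
(defun my-bugzilla-group (headers)
  "Return \"bugzilla.PRODUCT\" for the X-Bugzilla-Product field in HEADERS.
HEADERS is a string of raw header lines.  Return nil if the field is absent."
  (let ((case-fold-search t))
    ;; `string-match' scans forward, so \\w+ takes the whole word.
    (when (string-match "^X-Bugzilla-Product:[ \t]*\\(\\w+\\)" headers)
      (concat "bugzilla." (downcase (match-string 1 headers))))))

(my-bugzilla-group "Subject: test\nX-Bugzilla-Product: evince\n")
;; => "bugzilla.evince"
```

Like the split in the original question, this stops at the first
non-word character, so a product name containing a hyphen would be
cut short.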

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/

* Re: Fancy split - simple regexp problem
  2011-10-26  8:41 ` Nicolas Berthier
@ 2011-11-06 19:30   ` Adam Sjøgren
       [not found]   ` <mailman.1950.1320607816.15868.info-gnus-english@gnu.org>
  1 sibling, 0 replies; 11+ messages in thread
From: Adam Sjøgren @ 2011-11-06 19:30 UTC
  To: info-gnus-english

On Wed, 26 Oct 2011 10:41:45 +0200, Nicolas wrote:

> You may be interested in trying the following hack (or something
> similar) which makes fancy splitting use a greedy yet much slower
> backward searching algorithm:

> (defadvice nnmail-split-it (around my-greedy-nnmail-split-it activate)
>   "Around advice temporarily replacing `re-search-backward' with
> `posix-search-backward'."
>   (flet ((re-search-backward (regexp &optional bound noerr count)
> 	   (posix-search-backward regexp bound noerr count)))
>       ad-do-it))

Thanks for the suggestion - it doesn't seem to work, though:

I added the defadvice to my .gnus (and eval'ed it); the split I am using
is this:

  ("X-Bugzilla-Product" "\\(\\w+\\)" "bugzilla.\\1")

and the email has this header:

  X-Bugzilla-Product: evince

But B q says:

  This message would go to bugzilla.e

If I type M-x posix-search-backward \w+ I only get one char matched -
same as typing C-u C-r \w+ - so maybe I overlooked something, or some
other remedy is needed?
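
The observation can be reproduced outside Gnus with something like
the following sketch. `my-backward-match-with' is a made-up helper
name, and a temporary buffer with the default syntax table is
assumed:

```elisp
(defun my-backward-match-with (search-fn regexp text)
  "Insert TEXT in a temp buffer and call SEARCH-FN for REGEXP from the end.
Return the matched string, or nil if nothing matched."
  (with-temp-buffer
    (insert text)
    (goto-char (point-max))
    (when (funcall search-fn regexp nil t)
      (match-string 0))))

;; Both functions pick the match that starts closest to point.
(my-backward-match-with #'re-search-backward "\\w+" "bugzilla.evince")    ; => "e"
(my-backward-match-with #'posix-search-backward "\\w+" "bugzilla.evince") ; => "e"
```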


  Best regards,

    Adam

-- 
 "I always liked songs with parentheses in the title."        Adam Sjøgren
                                                         asjo@koldfront.dk

* Re: Fancy split - simple regexp problem
       [not found] <mailman.3990.1316544929.939.info-gnus-english@gnu.org>
@ 2011-10-26  8:41 ` Nicolas Berthier
  2011-11-06 19:30   ` Adam Sjøgren
       [not found]   ` <mailman.1950.1320607816.15868.info-gnus-english@gnu.org>
  0 siblings, 2 replies; 11+ messages in thread
From: Nicolas Berthier @ 2011-10-26  8:41 UTC
  To: Adam Sjøgren; +Cc: info-gnus-english

The following message is a courtesy copy of an article
that has been posted to gnu.emacs.gnus as well.


Adam Sjøgren wrote:
> [...]
>
> When I try with this split:
>
>   ("X-Bugzilla-Product" "\\w+" "bugzilla.\\&")
>
> B q tells me:
>
>   "This message would go to bugzilla.m"
>
> [...]
>
> I'm sure there is an obvious thing, I just can't see it.

As you can see on the other branch of this thread, others are trolling
on this topic. So no, it's not that obvious. IMO, the problem is
indeed the greediness of `re-search-backward', which apparently is not
POSIX compliant [1].

You may be interested in trying the following hack (or something
similar) which makes fancy splitting use a greedy yet much slower
backward searching algorithm:

(require 'cl)  ; `flet' comes from the cl library

(defadvice nnmail-split-it (around my-greedy-nnmail-split-it activate)
  "Around advice temporarily replacing `re-search-backward' with
`posix-search-backward'."
  ;; Rebind the function only for the duration of the advised call.
  (flet ((re-search-backward (regexp &optional bound noerr count)
           (posix-search-backward regexp bound noerr count)))
    ad-do-it))

Nicolas

[1] http://www.gnu.org/s/emacs/manual/html_node/elisp/POSIX-Regexps.html

-- 
Nicolas Berthier                              FSF Student member #7975
