From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID:
Date: Sun, 26 Oct 2008 21:55:57 -0700
From: "Russ Cox"
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@9fans.net>
Subject: Re: [9fans] non greedy regular expressions
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <6520c845566013ada472281bf9c0da73@coraid.com> <2e4a50a0810241652r38d2aa1ft2b6fb9104d2988ae@mail.gmail.com>
Topicbox-Message-UUID: 27714752-ead4-11e9-9d60-3106f5b1d025

As I explained in an earlier post, your suggested

> /ABC(.*?)CBA/

is less robust than Charles's spelled-out version, since
yours doesn't handle nested expressions well. That's a
serious enough limitation that your scenario stops being
a compelling argument for leftmost-first matching and
non-greedy operators.

> Then universality wins and we may end up using perl/python exclusively.

No one said you couldn't. If they let you do your job
faster than acme and sam do, no one here is going to
tell you not to use them.

> My question then is: wouldn't it be better to switch to the
> leftmost-first paradigm, hence open possible use of (non-)greedy
> operators, and in a way contribute to an accord with perl/python
> syntax? And use a good algorithm for that all? But maybe it's not
> worth and the current state is just sufficient...

Leftmost-first matching is difficult to explain.
When POSIX had to define the rules for regexps,
they chose to define them as leftmost-longest
even though all the implementations were leftmost-first,
because describing leftmost-first precisely was too complex.

Leftmost-first matching is not widely understood.
At the beginning of this conversation, you believed that
in the absence of non-greedy operators, leftmost-first
matching produced leftmost-longest matches. I would
guess that more people believe this than know otherwise.
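
To make the difference concrete, here is a minimal sketch, assuming
Python's re module as a stand-in for a leftmost-first engine. The
helper leftmost_longest is hypothetical: it emulates leftmost-longest
by brute force and is not how sam, acme, or any POSIX matcher is
implemented. The last call shows what the non-greedy pattern quoted
above captures on nested input.

```python
import re


def leftmost_first(pattern, text):
    """Return the match found by Python's backtracking engine (leftmost-first)."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


def leftmost_longest(pattern, text):
    """Hypothetical helper: emulate leftmost-longest matching by brute force.

    Start positions are tried left to right; for each one, candidate end
    positions are tried from longest to shortest, and the first substring
    that the pattern matches in full is returned.
    """
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return text[start:end]
    return None


# No non-greedy operators involved, yet the two rules disagree:
# leftmost-first takes the first alternative that succeeds.
print(leftmost_first(r"a|ab", "xab"))    # a
print(leftmost_longest(r"a|ab", "xab"))  # ab

# The non-greedy pattern stops at the first CBA, so on nested input
# the outer ABC is paired with the inner CBA.
print(leftmost_first(r"ABC(.*?)CBA", "ABC 1 ABC 2 CBA 3 CBA"))  # ABC 1 ABC 2 CBA
```

The brute-force helper is quadratic in the input length per pattern
and is only meant to show which substring each rule selects.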

Leftmost-first matching only lets you write a regexp that
is half right (see above) in the use case that you are
focusing on.

I don't see the benefit of switching from leftmost-longest
to leftmost-first. Certainly an "accord with perl/python"
is not good justification. Playing catch-up with Perl and
Python regexps can only lead to madness. Both systems
are complex enough that essentially no one completely
understands them.

Russ