From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 08 Sep 2014 15:10:37 +0100
From: Peter Stephenson
To: Zsh Hackers' List
Subject: VARARR in pattern code
Message-id: <20140908151037.5ec31e8a@pwslap01u.europe.root.pri>
Organization: Samsung Cambridge Solution Centre
List-Id: Zsh Workers List
X-Seq: 33130

While looking at the problem with repeated *'s, I notice that inside
the pattern code for closures (*'s, #'s and ##'s) there's a VARARR:

	    /*
	     * Array to record the start of characters for
	     * backtracking.
	     */
	    VARARR(char, charstart, patinend-patinput);

If you're interested, that was added to fix a very similar problem with
pathological backtracking involving negated matches with "~" or "^".
It's otherwise a strange thing to have in pattern matching code (and it
may be why the performance with multiple "*"s was quite so bad).

We just changed all VARARRs to use heap allocation.  It occurs to me
that this one can be hit many times when backtracking through a pattern
with a lot of closures.  I wonder if this one should be a special case
(zalloc, if that's efficient enough?).  I haven't done any experiments,
so I may be being alarmist.

It might be possible to optimise the use of charstart out entirely in
some cases.

pws
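
For illustration only, here is a minimal sketch of one way such a special
case could look: a scratch buffer that lives inline when the request is
small and only goes to the allocator when it is large.  This is not zsh
code.  The names (struct scratch, scratch_get, scratch_release) and the
256-byte threshold are hypothetical, and plain malloc/free stand in for
whatever allocator (zalloc or otherwise) would really be used.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold for the inline case; not taken from zsh. */
#define SCRATCH_INLINE 256

/* Hypothetical scratch buffer: small requests use inline_buf. */
struct scratch {
    char inline_buf[SCRATCH_INLINE];
    char *ptr;      /* points at inline_buf or at malloc'd memory */
    int on_heap;    /* non-zero if ptr must be passed to free() */
};

/*
 * Return a zeroed buffer of len bytes, or NULL if allocation fails.
 * Requests up to SCRATCH_INLINE bytes never touch the allocator.
 */
static char *scratch_get(struct scratch *s, size_t len)
{
    if (len <= SCRATCH_INLINE) {
        s->ptr = s->inline_buf;
        s->on_heap = 0;
    } else {
        s->ptr = malloc(len);   /* stand-in for the real allocator */
        s->on_heap = 1;
        if (!s->ptr) {
            s->on_heap = 0;
            return NULL;
        }
    }
    memset(s->ptr, 0, len);
    return s->ptr;
}

/* Release the buffer; only the large case has anything to free. */
static void scratch_release(struct scratch *s)
{
    if (s->on_heap)
        free(s->ptr);
    s->ptr = NULL;
    s->on_heap = 0;
}
```

In the closure code the length would presumably be patinend-patinput, as
in the VARARR above.  The trade-off is that the inline buffer costs stack
space at every level of recursion, so whether this helps at all would
need measuring.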