From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI autolearn=ham autolearn_force=no version=3.4.4 Received: from minnie.tuhs.org (minnie.tuhs.org [IPv6:2600:3c01:e000:146::1]) by inbox.vuxu.org (Postfix) with ESMTP id 7FFF827168 for ; Wed, 22 May 2024 20:49:20 +0200 (CEST) Received: from minnie.tuhs.org (localhost [IPv6:::1]) by minnie.tuhs.org (Postfix) with ESMTP id 5DD7343B60; Thu, 23 May 2024 04:49:14 +1000 (AEST) Received: from mcvoy.com (mcvoy.com [192.169.23.250]) by minnie.tuhs.org (Postfix) with ESMTPS id D3AED43B56 for ; Thu, 23 May 2024 04:49:07 +1000 (AEST) Received: by mcvoy.com (Postfix, from userid 3546) id 0654035E982; Wed, 22 May 2024 11:49:04 -0700 (PDT) Date: Wed, 22 May 2024 11:49:04 -0700 From: Larry McVoy To: Paul Winalski Message-ID: <20240522184904.GK25728@mcvoy.com> References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Message-ID-Hash: EGV3QZLHTV6WYGH6DM47NVGS36QEX6WX X-Message-ID-Hash: EGV3QZLHTV6WYGH6DM47NVGS36QEX6WX X-MailFrom: lm@mcvoy.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Luther Johnson , tuhs@tuhs.org X-Mailman-Version: 3.3.6b1 Precedence: list Subject: [TUHS] Re: A fuzzy awk. List-Id: The Unix Heritage Society mailing list Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Wed, May 22, 2024 at 11:37:39AM -0400, Paul Winalski wrote: > On Tue, May 21, 2024 at 2:12???PM Luther Johnson > wrote: > > > I like this anecdote because it points out the difference between being > > able to handle and process bizarre conditions, as if they were something > > that should work, which is maybe not that helpful, vs. detecting them and > > doing something reasonable, like failiing with a "limit exceeded" message > > > That is in fact precisely how the DEC compiler handled the 100 nested > parentheses condition. > > > . A silent, insidious failure down the line because a limit was exceeded > > is never good. > > > Amen! One should always do bounds checking when dealing with fixed-size > aggregate data structures. One compiler that I worked on got a bug report > of bad code being generated. The problem was an illegal optimization that > never should have triggered but did due to a corrupted data table. Finding > the culprit of the corruption took hours. It finally turned out to be due > to overflow of an adjacent data table in use elsewhere in the compiler. > The routine to add another entry to that table didn't check for table > overflow. We invented a data structure that gets around this problem nicely. It's an array of pointers that starts at [1] instead of [0]. The [0] entry encodes 2 things: In the upper bits, the log(2) the size of the array. So all arrays have at least [0] and [1]. So 2 pointers is the smallest array and that was important to us, we wanted it to scale up and scale down. In the lower bits, we record the number of used entries in the array. We assumed 32 bit pointers and with those we got ~134 million entries as our maximum number of entries. Usage is like char **space = allocLines(4); // start with space for 4 entries space = addLine(space, "I am [1]"); space = addLine(space, "I am [2]"); space = addLine(space, "I am [3]"); space = addLine(space, "I am [4]"); // realloc's to 8 entries freelines(space, 0); // second arg is typically 0 or free() It works GREAT. We used it all over BitKeeper, for stuff as small as commit comments to arrays of data structures. It scales down, scales up. Helper functions: /* * liblines - interfaces for autoexpanding data structures * * s= allocLines(n) * pre allocate space for slightly less than N entries. * s = addLine(s, line) * add line to s, allocating as needed. * line must be a pointer to preallocated space. * freeLines(s, freep) * free the lines array; if freep is set, call that on each entry. * if freep is 0, do not free each entry. * buf = popLine(s) * return the most recently added line (not an alloced copy of it) * reverseLines(s) * reverse the order of the lines in the array * sortLines(space, compar) * sort the lines using the compar function if set, else string_sort() * removeLine(s, which, freep) * look for all lines which match "which" and remove them from the array * returns number of matches found * removeLineN(s, i, freep) * remove the 'i'th line. * lines = splitLine(buf, delim, lines) * split buf on any/all chars in delim and put the tokens in lines. * buf = joinLines(":", s) * return one string which is all the strings glued together with ":" * does not free s, caller must free s. * buf = findLine(lines, needle); * Return the index the line in lines that matches needle */ It's all open source, apache licensed, but you'd have to tease it out of the bitkeeper source tree. Wouldn't be that hard and it would be useful.