From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI autolearn=ham autolearn_force=no version=3.4.4 Received: from minnie.tuhs.org (minnie.tuhs.org [50.116.15.146]) by inbox.vuxu.org (Postfix) with ESMTP id 12FCD22384 for ; Wed, 22 May 2024 22:18:05 +0200 (CEST) Received: from minnie.tuhs.org (localhost [IPv6:::1]) by minnie.tuhs.org (Postfix) with ESMTP id ECF6A43B3A; Thu, 23 May 2024 06:18:00 +1000 (AEST) Received: from mcvoy.com (mcvoy.com [192.169.23.250]) by minnie.tuhs.org (Postfix) with ESMTPS id E75BE43A94 for ; Thu, 23 May 2024 06:17:56 +1000 (AEST) Received: by mcvoy.com (Postfix, from userid 3546) id E0A7735E982; Wed, 22 May 2024 13:17:53 -0700 (PDT) Date: Wed, 22 May 2024 13:17:53 -0700 From: Larry McVoy To: tuhs@tuhs.org Message-ID: <20240522201753.GL25728@mcvoy.com> References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> <20240522184904.GK25728@mcvoy.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240522184904.GK25728@mcvoy.com> User-Agent: Mutt/1.5.24 (2015-08-30) Message-ID-Hash: ST6NJ2YNXXXCVJ4OMFPG44NJ3KF6PF7M X-Message-ID-Hash: ST6NJ2YNXXXCVJ4OMFPG44NJ3KF6PF7M X-MailFrom: lm@mcvoy.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.6b1 Precedence: list Subject: [TUHS] Re: A fuzzy awk. List-Id: The Unix Heritage Society mailing list Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Wayne teased this into a stand alone library here: https://github.com/wscott/bksupport On Wed, May 22, 2024 at 11:49:04AM -0700, Larry McVoy wrote: > On Wed, May 22, 2024 at 11:37:39AM -0400, Paul Winalski wrote: > > On Tue, May 21, 2024 at 2:12???PM Luther Johnson > > wrote: > > > > > I like this anecdote because it points out the difference between being > > > able to handle and process bizarre conditions, as if they were something > > > that should work, which is maybe not that helpful, vs. detecting them and > > > doing something reasonable, like failiing with a "limit exceeded" message > > > > > That is in fact precisely how the DEC compiler handled the 100 nested > > parentheses condition. > > > > > . A silent, insidious failure down the line because a limit was exceeded > > > is never good. > > > > > Amen! One should always do bounds checking when dealing with fixed-size > > aggregate data structures. One compiler that I worked on got a bug report > > of bad code being generated. The problem was an illegal optimization that > > never should have triggered but did due to a corrupted data table. Finding > > the culprit of the corruption took hours. It finally turned out to be due > > to overflow of an adjacent data table in use elsewhere in the compiler. > > The routine to add another entry to that table didn't check for table > > overflow. > > We invented a data structure that gets around this problem nicely. It's > an array of pointers that starts at [1] instead of [0]. The [0] > entry encodes 2 things: > > In the upper bits, the log(2) the size of the array. So all arrays > have at least [0] and [1]. So 2 pointers is the smallest array and > that was important to us, we wanted it to scale up and scale down. > > In the lower bits, we record the number of used entries in the array. > We assumed 32 bit pointers and with those we got ~134 million entries > as our maximum number of entries. > > Usage is like > > char **space = allocLines(4); // start with space for 4 entries > > space = addLine(space, "I am [1]"); > space = addLine(space, "I am [2]"); > space = addLine(space, "I am [3]"); > space = addLine(space, "I am [4]"); // realloc's to 8 entries > > freelines(space, 0); // second arg is typically 0 or free() > > It works GREAT. We used it all over BitKeeper, for stuff as small as > commit comments to arrays of data structures. It scales down, scales > up. Helper functions: > > /* > * liblines - interfaces for autoexpanding data structures > * > * s= allocLines(n) > * pre allocate space for slightly less than N entries. > * s = addLine(s, line) > * add line to s, allocating as needed. > * line must be a pointer to preallocated space. > * freeLines(s, freep) > * free the lines array; if freep is set, call that on each entry. > * if freep is 0, do not free each entry. > * buf = popLine(s) > * return the most recently added line (not an alloced copy of it) > * reverseLines(s) > * reverse the order of the lines in the array > * sortLines(space, compar) > * sort the lines using the compar function if set, else string_sort() > * removeLine(s, which, freep) > * look for all lines which match "which" and remove them from the array > * returns number of matches found > * removeLineN(s, i, freep) > * remove the 'i'th line. > * lines = splitLine(buf, delim, lines) > * split buf on any/all chars in delim and put the tokens in lines. > * buf = joinLines(":", s) > * return one string which is all the strings glued together with ":" > * does not free s, caller must free s. > * buf = findLine(lines, needle); > * Return the index the line in lines that matches needle > */ > > It's all open source, apache licensed, but you'd have to tease it out of > the bitkeeper source tree. Wouldn't be that hard and it would be useful. -- --- Larry McVoy Retired to fishing http://www.mcvoy.com/lm/boat