From: Larry McVoy <lm@mcvoy.com>
To: tuhs@tuhs.org
Subject: [TUHS] Re: A fuzzy awk.
Date: Wed, 22 May 2024 13:17:53 -0700 [thread overview]
Message-ID: <20240522201753.GL25728@mcvoy.com> (raw)
In-Reply-To: <20240522184904.GK25728@mcvoy.com>
Wayne teased this into a stand alone library here:
https://github.com/wscott/bksupport
On Wed, May 22, 2024 at 11:49:04AM -0700, Larry McVoy wrote:
> On Wed, May 22, 2024 at 11:37:39AM -0400, Paul Winalski wrote:
> > On Tue, May 21, 2024 at 2:12???PM Luther Johnson <luther.johnson@makerlisp.com>
> > wrote:
> >
> > > I like this anecdote because it points out the difference between being
> > > able to handle and process bizarre conditions, as if they were something
> > > that should work, which is maybe not that helpful, vs. detecting them and
> > > doing something reasonable, like failiing with a "limit exceeded" message
> > >
> > That is in fact precisely how the DEC compiler handled the 100 nested
> > parentheses condition.
> >
> > > . A silent, insidious failure down the line because a limit was exceeded
> > > is never good.
> > >
> > Amen! One should always do bounds checking when dealing with fixed-size
> > aggregate data structures. One compiler that I worked on got a bug report
> > of bad code being generated. The problem was an illegal optimization that
> > never should have triggered but did due to a corrupted data table. Finding
> > the culprit of the corruption took hours. It finally turned out to be due
> > to overflow of an adjacent data table in use elsewhere in the compiler.
> > The routine to add another entry to that table didn't check for table
> > overflow.
>
> We invented a data structure that gets around this problem nicely. It's
> an array of pointers that starts at [1] instead of [0]. The [0]
> entry encodes 2 things:
>
> In the upper bits, the log(2) the size of the array. So all arrays
> have at least [0] and [1]. So 2 pointers is the smallest array and
> that was important to us, we wanted it to scale up and scale down.
>
> In the lower bits, we record the number of used entries in the array.
> We assumed 32 bit pointers and with those we got ~134 million entries
> as our maximum number of entries.
>
> Usage is like
>
> char **space = allocLines(4); // start with space for 4 entries
>
> space = addLine(space, "I am [1]");
> space = addLine(space, "I am [2]");
> space = addLine(space, "I am [3]");
> space = addLine(space, "I am [4]"); // realloc's to 8 entries
>
> freelines(space, 0); // second arg is typically 0 or free()
>
> It works GREAT. We used it all over BitKeeper, for stuff as small as
> commit comments to arrays of data structures. It scales down, scales
> up. Helper functions:
>
> /*
> * liblines - interfaces for autoexpanding data structures
> *
> * s= allocLines(n)
> * pre allocate space for slightly less than N entries.
> * s = addLine(s, line)
> * add line to s, allocating as needed.
> * line must be a pointer to preallocated space.
> * freeLines(s, freep)
> * free the lines array; if freep is set, call that on each entry.
> * if freep is 0, do not free each entry.
> * buf = popLine(s)
> * return the most recently added line (not an alloced copy of it)
> * reverseLines(s)
> * reverse the order of the lines in the array
> * sortLines(space, compar)
> * sort the lines using the compar function if set, else string_sort()
> * removeLine(s, which, freep)
> * look for all lines which match "which" and remove them from the array
> * returns number of matches found
> * removeLineN(s, i, freep)
> * remove the 'i'th line.
> * lines = splitLine(buf, delim, lines)
> * split buf on any/all chars in delim and put the tokens in lines.
> * buf = joinLines(":", s)
> * return one string which is all the strings glued together with ":"
> * does not free s, caller must free s.
> * buf = findLine(lines, needle);
> * Return the index the line in lines that matches needle
> */
>
> It's all open source, apache licensed, but you'd have to tease it out of
> the bitkeeper source tree. Wouldn't be that hard and it would be useful.
--
---
Larry McVoy Retired to fishing http://www.mcvoy.com/lm/boat
next prev parent reply other threads:[~2024-05-22 20:18 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-20 14:09 Serissa
2024-05-21 1:56 ` Rob Pike
2024-05-21 2:47 ` Larry McVoy
2024-05-21 2:54 ` Lawrence Stewart
2024-05-21 3:36 ` Rob Pike
2024-05-21 11:59 ` Peter Weinberger (温博格) via TUHS
2024-05-21 3:53 ` George Michaelson
2024-05-21 16:59 ` Paul Winalski
2024-05-21 17:56 ` segaloco via TUHS
2024-05-21 18:12 ` Luther Johnson
2024-05-22 15:37 ` Paul Winalski
2024-05-22 18:49 ` Larry McVoy
2024-05-22 20:17 ` Larry McVoy [this message]
2024-05-22 3:26 ` Dave Horsfall
2024-05-22 5:08 ` Alexis
2024-05-22 13:12 ` Warner Losh
-- strict thread matches above, loose matches on Subject: below --
2024-05-23 13:49 Douglas McIlroy
2024-05-23 20:52 ` Rob Pike
2024-05-24 5:41 ` andrew
2024-05-24 7:17 ` Ralph Corderoy
2024-05-24 7:41 ` Rob Pike
2024-05-24 11:56 ` Dan Halbert
2024-05-25 0:17 ` Bakul Shah via TUHS
2024-05-25 0:57 ` G. Branden Robinson
2024-05-25 13:56 ` David Arnold
2024-05-25 17:18 ` Paul Winalski
2024-05-25 17:36 ` Tom Perrine
2024-05-20 13:06 [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.) Douglas McIlroy
2024-05-20 13:25 ` [TUHS] " Chet Ramey
2024-05-20 13:41 ` [TUHS] Re: A fuzzy awk Ralph Corderoy
2024-05-20 14:26 ` Chet Ramey
2024-05-22 13:44 ` arnold
2024-05-20 13:54 ` Ralph Corderoy
2024-05-19 23:08 [TUHS] The 'usage: ...' message. (Was: On Bloat...) Douglas McIlroy
2024-05-20 0:58 ` [TUHS] " Rob Pike
2024-05-20 3:19 ` arnold
2024-05-20 9:20 ` [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.) Ralph Corderoy
2024-05-20 13:10 ` [TUHS] " Chet Ramey
2024-05-20 13:30 ` [TUHS] Re: A fuzzy awk Ralph Corderoy
2024-05-20 13:48 ` Chet Ramey
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240522201753.GL25728@mcvoy.com \
--to=lm@mcvoy.com \
--cc=tuhs@tuhs.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).