The Unix Heritage Society mailing list
 help / color / mirror / Atom feed
From: Larry McVoy <lm@mcvoy.com>
To: tuhs@tuhs.org
Subject: [TUHS] Re: A fuzzy awk.
Date: Wed, 22 May 2024 13:17:53 -0700	[thread overview]
Message-ID: <20240522201753.GL25728@mcvoy.com> (raw)
In-Reply-To: <20240522184904.GK25728@mcvoy.com>

Wayne teased this into a stand alone library here: 

https://github.com/wscott/bksupport

On Wed, May 22, 2024 at 11:49:04AM -0700, Larry McVoy wrote:
> On Wed, May 22, 2024 at 11:37:39AM -0400, Paul Winalski wrote:
> > On Tue, May 21, 2024 at 2:12???PM Luther Johnson <luther.johnson@makerlisp.com>
> > wrote:
> > 
> > > I like this anecdote because it points out the difference between being
> > > able to handle and process bizarre conditions, as if they were something
> > > that should work, which is maybe not that helpful, vs. detecting them and
> > > doing something reasonable, like failiing with a "limit exceeded" message
> > >
> > That is in fact precisely how the DEC compiler handled the 100 nested
> > parentheses condition.
> > 
> > > . A silent, insidious failure down the line because a limit was exceeded
> > > is never good.
> > >
> > Amen!  One should always do bounds checking when dealing with fixed-size
> > aggregate data structures.  One compiler that I worked on got a bug report
> > of bad code being generated.  The problem was an illegal optimization that
> > never should have triggered but did due to a corrupted data table.  Finding
> > the culprit of the corruption took hours.  It finally turned out to be due
> > to overflow of an adjacent data table in use elsewhere in the compiler.
> > The routine to add another entry to that table didn't check for table
> > overflow.
> 
> We invented a data structure that gets around this problem nicely.  It's
> an array of pointers that starts at [1] instead of [0].  The [0]
> entry encodes 2 things:
> 
> In the upper bits, the log(2) the size of the array.  So all arrays
> have at least [0] and [1].  So 2 pointers is the smallest array and
> that was important to us, we wanted it to scale up and scale down.
> 
> In the lower bits, we record the number of used entries in the array.
> We assumed 32 bit pointers and with those we got ~134 million entries
> as our maximum number of entries.
> 
> Usage is like
> 
> char **space = allocLines(4);	// start with space for 4 entries
> 
> space = addLine(space, "I am [1]");
> space = addLine(space, "I am [2]");
> space = addLine(space, "I am [3]");
> space = addLine(space, "I am [4]");	// realloc's to 8 entries
> 
> freelines(space, 0);	// second arg is typically 0 or free()
> 
> It works GREAT.  We used it all over BitKeeper, for stuff as small as
> commit comments to arrays of data structures.  It scales down, scales
> up.  Helper functions:
> 
> /*
>  * liblines - interfaces for autoexpanding data structures
>  *
>  * s= allocLines(n)
>  *      pre allocate space for slightly less than N entries.
>  * s = addLine(s, line)
>  *      add line to s, allocating as needed.
>  *      line must be a pointer to preallocated space.
>  * freeLines(s, freep)
>  *      free the lines array; if freep is set, call that on each entry.
>  *      if freep is 0, do not free each entry.
>  * buf = popLine(s)
>  *      return the most recently added line (not an alloced copy of it)
>  * reverseLines(s)
>  *      reverse the order of the lines in the array
>  * sortLines(space, compar)
>  *      sort the lines using the compar function if set, else string_sort()
>  * removeLine(s, which, freep)
>  *      look for all lines which match "which" and remove them from the array
>  *      returns number of matches found
>  * removeLineN(s, i, freep)
>  *      remove the 'i'th line.
>  * lines = splitLine(buf, delim, lines)
>  *      split buf on any/all chars in delim and put the tokens in lines.
>  * buf = joinLines(":", s)
>  *      return one string which is all the strings glued together with ":"
>  *      does not free s, caller must free s.
>  * buf = findLine(lines, needle);
>  *      Return the index the line in lines that matches needle
>  */
> 
> It's all open source, apache licensed, but you'd have to tease it out of
> the bitkeeper source tree.  Wouldn't be that hard and it would be useful.

-- 
---
Larry McVoy           Retired to fishing          http://www.mcvoy.com/lm/boat

  reply	other threads:[~2024-05-22 20:18 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-20 14:09 Serissa
2024-05-21  1:56 ` Rob Pike
2024-05-21  2:47   ` Larry McVoy
2024-05-21  2:54     ` Lawrence Stewart
2024-05-21  3:36       ` Rob Pike
2024-05-21 11:59         ` Peter Weinberger (温博格) via TUHS
2024-05-21  3:53   ` George Michaelson
2024-05-21 16:59   ` Paul Winalski
2024-05-21 17:56     ` segaloco via TUHS
2024-05-21 18:12     ` Luther Johnson
2024-05-22 15:37       ` Paul Winalski
2024-05-22 18:49         ` Larry McVoy
2024-05-22 20:17           ` Larry McVoy [this message]
2024-05-22  3:26     ` Dave Horsfall
2024-05-22  5:08       ` Alexis
2024-05-22 13:12         ` Warner Losh
  -- strict thread matches above, loose matches on Subject: below --
2024-05-23 13:49 Douglas McIlroy
2024-05-23 20:52 ` Rob Pike
2024-05-24  5:41   ` andrew
2024-05-24  7:17   ` Ralph Corderoy
2024-05-24  7:41     ` Rob Pike
2024-05-24 11:56     ` Dan Halbert
2024-05-25  0:17   ` Bakul Shah via TUHS
2024-05-25  0:57     ` G. Branden Robinson
2024-05-25 13:56     ` David Arnold
2024-05-25 17:18     ` Paul Winalski
2024-05-25 17:36       ` Tom Perrine
2024-05-20 13:06 [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.) Douglas McIlroy
2024-05-20 13:25 ` [TUHS] " Chet Ramey
2024-05-20 13:41   ` [TUHS] Re: A fuzzy awk Ralph Corderoy
2024-05-20 14:26     ` Chet Ramey
2024-05-22 13:44     ` arnold
2024-05-20 13:54 ` Ralph Corderoy
2024-05-19 23:08 [TUHS] The 'usage: ...' message. (Was: On Bloat...) Douglas McIlroy
2024-05-20  0:58 ` [TUHS] " Rob Pike
2024-05-20  3:19   ` arnold
2024-05-20  9:20     ` [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.) Ralph Corderoy
2024-05-20 13:10       ` [TUHS] " Chet Ramey
2024-05-20 13:30         ` [TUHS] Re: A fuzzy awk Ralph Corderoy
2024-05-20 13:48           ` Chet Ramey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240522201753.GL25728@mcvoy.com \
    --to=lm@mcvoy.com \
    --cc=tuhs@tuhs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).