* ssam-1.6 and libutf-2.7
From: Alistair Crooks @ 1997-02-21 14:31 UTC
  To: sam-fans

[This has already been posted to the wily list. At the risk of offending
those people who will thus see this message twice, I thought some folks
here might be interested. - agc]

I've made available new versions of libutf, some utf routines including
UTF-aware regular expressions, and ssam, a stream editor using the sam
command set. Please note the name change, and the new URLs:

A complete list of changes follows at the end of this mail, but the
changes to ssam are mainly cosmetic and bug fixes, whilst I have started
implementing language-specific matching and ordering, using a function
called utflangcmp().
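
As a rough illustration of language-specific ordering, the sketch below
compares two strings by each letter's position in an alphabet string,
with letters grouped in square brackets sharing one position (as the
utf-2.7 changes describe for v and w in Swedish). This is not the libutf
code: alpha_ordinal(), lang_cmp(), demo_alphabet and the alphabet format
are assumptions made for this example, and it handles single-byte
characters only, whereas utflangcmp() works on UTF text and its real
signature is not given in this mail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified alphabet: letters in [] share one position. */
const char demo_alphabet[] = "abcdefghijklmnopqrstu[vw]xyz";

/* Return the collation position of c in alphabet, or -1 if absent. */
int alpha_ordinal(const char *alphabet, char c)
{
    int ord = 0;
    int in_group = 0;
    const char *p;

    assert(alphabet != NULL);
    for (p = alphabet; *p != '\0'; p++) {
        if (*p == '[') {
            in_group = 1;
        } else if (*p == ']') {
            in_group = 0;
            ord++;              /* the whole group used one position */
        } else {
            if (*p == c)
                return ord;
            if (!in_group)
                ord++;
        }
    }
    return -1;
}

/* Compare s1 and s2 by collation position; returns <0, 0 or >0.
 * Characters missing from the alphabet all get position -1, so they
 * sort before every letter and compare equal to each other. */
int lang_cmp(const char *alphabet, const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    for (; *s1 != '\0' && *s2 != '\0'; s1++, s2++) {
        int o1 = alpha_ordinal(alphabet, *s1);
        int o2 = alpha_ordinal(alphabet, *s2);

        if (o1 != o2)
            return o1 - o2;
    }
    /* when one string is a prefix of the other, the shorter sorts first */
    return (*s1 != '\0') - (*s2 != '\0');
}
```

With this alphabet, lang_cmp(demo_alphabet, "vatten", "watten") treats
the two words as equal, because v and w sit in the same bracketed group.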

Many thanks to Bengt Kleberg for the
provision of Swedish, Finnish, Danish and Norwegian alphabets.

As usual, the correct way to install the software is:

% tar xvzf utf-2.7.tar.gz
% cd utf-2.7
% ./configure
% make tst
% make install
% cd ..
% tar xvzf ssam-1.6.tar.gz
% cd ssam-1.6
% ./configure
% make tst
% make install

This release has been tested on UTS 4.3.2 (S390 mainframe), Solaris
2.4 (SS5), and NetBSD/i386 1.2C.

Take care,

ssam-1.6 changes
+ tarted up explanation code, and added a test
+ moved stuff around in ssam()
+ moved ure match arrays from the program stack to within ssam_t.
We now allocate space for the match offsets when we know how many
we'll need. This removes the hardcoded limit on subexpressions.
+ implemented a saner way of introducing default `p' command. We now
do this when parsing, rather than on execution. Removes some cruft
from execution functions.
+ ran gcc -Wall again, and cleaned up miscellaneous warnings,
changing etc on the way.
+ added code to free match array, if requested via flags. Modify
existing free checks, so that de-allocation takes place if storage
was allocated, not if it was used.
+ re-code 'x' and 'y' commands to take advantage of improved ure ^
matching code. 'g', 'v' and 's' commands are unaffected. This is
actually a significant speedup, especially when searching for
anchored matches in large strings/files.
+ split writing files part of ssam() out into ssamcommit(), and
call it accordingly. This gives us more control over file writing.
+ changed Makefile to track change to the name of the library
+ deleted "urelang.h", which doesn't exist anymore, and added "utf.h"
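
The move from stack-based match arrays to storage sized once the number
of subexpressions is known, together with the "free if allocated" rule,
might look roughly like the sketch below. Every name in it (match_t,
matcher_t, count_groups(), matcher_init(), matcher_free()) is
hypothetical and not taken from ssam or libutf, and counting unescaped
'(' characters is a simplification that ignores character classes.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long start;             /* offset where the match begins */
    long end;               /* offset just past the match */
} match_t;

typedef struct {
    match_t *matches;       /* NULL until allocated */
    size_t   nmatches;      /* number of slots in matches[] */
} matcher_t;

/* Count unescaped '(' in a pattern (simplified: ignores [...] classes). */
size_t count_groups(const char *pat)
{
    size_t n = 0;

    assert(pat != NULL);
    for (; *pat != '\0'; pat++) {
        if (*pat == '\\' && pat[1] != '\0')
            pat++;          /* skip the escaped character */
        else if (*pat == '(')
            n++;
    }
    return n;
}

/* Allocate one slot for the whole match plus one per subexpression,
 * so there is no fixed upper limit. Returns 0 on success, -1 on failure. */
int matcher_init(matcher_t *m, const char *pat)
{
    assert(m != NULL);
    m->nmatches = count_groups(pat) + 1;
    m->matches = calloc(m->nmatches, sizeof(*m->matches));
    if (m->matches == NULL) {
        m->nmatches = 0;
        return -1;
    }
    return 0;
}

/* Free whenever storage was allocated, whether or not it was ever used. */
void matcher_free(matcher_t *m)
{
    assert(m != NULL);
    if (m->matches != NULL) {
        free(m->matches);
        m->matches = NULL;
    }
    m->nmatches = 0;
}
```

Keying the free on the pointer rather than on a "was used" flag means a
pattern that was parsed but never executed still releases its storage.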

utf-2.7 changes
+ fixed a bug in ^ matching - anchored searches were only tried once,
which didn't take into account the case where the string to be matched
included newline characters.
+ re-arrange tests so that error tests are done at end. Add a test
for anchored beginning of line matching
+ added utflangcmp function, with a couple of supporting functions
to get ordinal number of bits. Added findword test program, and
one extra test case.
+ Swedish and Finnish alphabets from Bengt Kleberg
+ changed langcoll.utf file so that letters in brackets [] in an
alphabet have the same collation ordering (e.g. v and w in Swedish).
Modified all utf functions that use utfrune on the alphabets
+ bug fix for definition of ETCDIR - not incorporated in previous
changes from Alan Watson (my mistake)
+ renamed library from libure to libutf (at suggestion of Alan
Watson). Changed Makefile to make this possible.
+ fixed bug where v and w in Swedish weren't comparing as the same
+ Norwegian and Danish alphabets from Bengt Kleberg
+ fixed a bug whereby language names were occasionally misconstrued
(the old "English not found, using English" problem)
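
The ^ fix listed first in the utf-2.7 changes amounts to retrying an
anchored match at the start of every line of the subject, rather than
only at offset 0. A minimal sketch of that idea follows, using a literal
prefix in place of a full regular expression; find_anchored() is a
hypothetical helper written for this example, not a libutf function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the offset of the first line in text that begins with prefix,
 * or -1 if no line does. A line starts at offset 0 or just after '\n'. */
long find_anchored(const char *text, const char *prefix)
{
    size_t plen;
    const char *line = text;

    assert(text != NULL && prefix != NULL);
    plen = strlen(prefix);
    while (line != NULL) {
        if (strncmp(line, prefix, plen) == 0)
            return (long)(line - text);
        line = strchr(line, '\n');  /* find the end of this line */
        if (line != NULL)
            line++;                 /* next candidate is just after '\n' */
    }
    return -1;
}
```

Trying the anchor only once would find "foo" in "foo\nbar\n" but miss
"bar", even though "bar" also begins a line.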

