The Unix Heritage Society mailing list
 help / color / mirror / Atom feed
* [TUHS] A Reiser tour de force
@ 2022-04-01 15:59 Douglas McIlroy
  2022-04-01 17:15 ` David Barto
  0 siblings, 1 reply; 9+ messages in thread
From: Douglas McIlroy @ 2022-04-01 15:59 UTC (permalink / raw)
  To: TUHS main list

The recent discussion about Research choosing BSD's paging over
Reiser-London's brought to mind a stunning program by Reiser that
Research did adopt.

A critical primitive in the Blit terminal was bitblt (block transfer
of a rectangular area). It was used ubiquitously, for example to
refresh data when window-stacking changed, to move data within a
window, or to pop up a menu.. The display memory was word-oriented, so
bitblt was fraught with niggling details about bit alignment and
overlap of source and destination. A general bitblt subroutine was a
rats' nest of conditionals--grossly inefficient for important special
cases like scrolling.

Bitblt got refined (i.e. elaborated) several times before Reiser did
away with it entirely. Instead he wrote a just-in-time generator of
optimal code. Thousands of distinct variants, which varied in size
from 16 to 72 bytes, could be produced by the same 400 lines of
assembler code.

Doug

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [TUHS] A Reiser tour de force
  2022-04-01 15:59 [TUHS] A Reiser tour de force Douglas McIlroy
@ 2022-04-01 17:15 ` David Barto
  2022-04-01 17:26   ` Jon Steinhart
  0 siblings, 1 reply; 9+ messages in thread
From: David Barto @ 2022-04-01 17:15 UTC (permalink / raw)
  To: Douglas McIlroy; +Cc: TUHS main list



> On Apr 1, 2022, at 8:59 AM, Douglas McIlroy <douglas.mcilroy@dartmouth.edu> wrote:
> 
> The recent discussion about Research choosing BSD's paging over
> Reiser-London's brought to mind a stunning program by Reiser that
> Research did adopt.
> 
> A critical primitive in the Blit terminal was bitblt (block transfer
> of a rectangular area). It was used ubiquitously, for example to
> refresh data when window-stacking changed, to move data within a
> window, or to pop up a menu.. The display memory was word-oriented, so
> bitblt was fraught with niggling details about bit alignment and
> overlap of source and destination. A general bitblt subroutine was a
> rats' nest of conditionals--grossly inefficient for important special
> cases like scrolling.
> 
> Bitblt got refined (i.e. elaborated) several times before Reiser did
> away with it entirely. Instead he wrote a just-in-time generator of
> optimal code. Thousands of distinct variants, which varied in size
> from 16 to 72 bytes, could be produced by the same 400 lines of
> assembler code.
> 
> Doug

Does this exist for the rest of us to study?

	David

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [TUHS] A Reiser tour de force
  2022-04-01 17:15 ` David Barto
@ 2022-04-01 17:26   ` Jon Steinhart
  2022-04-01 19:41     ` Steffen Nurpmeso
  0 siblings, 1 reply; 9+ messages in thread
From: Jon Steinhart @ 2022-04-01 17:26 UTC (permalink / raw)
  To: TUHS main list

David Barto writes:
> 
>
>
> > On Apr 1, 2022, at 8:59 AM, Douglas McIlroy <douglas.mcilroy@dartmouth.edu> wrote:
> > 
> > The recent discussion about Research choosing BSD's paging over
> > Reiser-London's brought to mind a stunning program by Reiser that
> > Research did adopt.
> > 
> > A critical primitive in the Blit terminal was bitblt (block transfer
> > of a rectangular area). It was used ubiquitously, for example to
> > refresh data when window-stacking changed, to move data within a
> > window, or to pop up a menu.. The display memory was word-oriented, so
> > bitblt was fraught with niggling details about bit alignment and
> > overlap of source and destination. A general bitblt subroutine was a
> > rats' nest of conditionals--grossly inefficient for important special
> > cases like scrolling.
> > 
> > Bitblt got refined (i.e. elaborated) several times before Reiser did
> > away with it entirely. Instead he wrote a just-in-time generator of
> > optimal code. Thousands of distinct variants, which varied in size
> > from 16 to 72 bytes, could be produced by the same 400 lines of
> > assembler code.
> > 
> > Doug
>
> Does this exist for the rest of us to study?
>
> 	David

It's not insanely complicated by modern standards.  Without any knowledge
of other work, I did the same thing for a 68020 based graphics system where
the JIT code went into the I-cache and was amazingly fast for its day.

If I remember correctly, things started with an outer-loop test to see
if there were overlapping regions to determine whether to go forward
to backwards to avoid having the destination trash the source.

Then, there was an inner loop test for whether or not the left edge
was on a word boundary, the right edge was on a word boundary, or
whether or not both edges were inside of the same word.  Depending
on the results the generated inner loop code had left edge handling,
non-edge handling, and right edge handling code.

Finally, code was generated plugging in whatever was needed for the
op-code.

Maybe this is just a semantic nit, but to me this is just a better
implementation of bitblt, not doing away with it.

Jon

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [TUHS] A Reiser tour de force
  2022-04-01 17:26   ` Jon Steinhart
@ 2022-04-01 19:41     ` Steffen Nurpmeso
  2022-04-01 21:29       ` Rob Pike
  2022-04-01 21:43       ` Jon Steinhart
  0 siblings, 2 replies; 9+ messages in thread
From: Steffen Nurpmeso @ 2022-04-01 19:41 UTC (permalink / raw)
  To: Jon Steinhart; +Cc: TUHS main list

Jon Steinhart wrote in
 <202204011726.231HQFm03349496@darkstar.fourwinds.com>:
 |David Barto writes:
 |>> On Apr 1, 2022, at 8:59 AM, Douglas McIlroy <douglas.mcilroy@dartmouth.e\
 |>> du> wrote:
 |>> The recent discussion about Research choosing BSD's paging over
 |>> Reiser-London's brought to mind a stunning program by Reiser that
 |>> Research did adopt.
 |>> 
 |>> A critical primitive in the Blit terminal was bitblt (block transfer
 |>> of a rectangular area). It was used ubiquitously, for example to
 ...
 |>> Bitblt got refined (i.e. elaborated) several times before Reiser did
 |>> away with it entirely. Instead he wrote a just-in-time generator of
 |>> optimal code. Thousands of distinct variants, which varied in size
 |>> from 16 to 72 bytes, could be produced by the same 400 lines of
 |>> assembler code.
 ...
 |> Does this exist for the rest of us to study?
 ...
 |It's not insanely complicated by modern standards.  Without any knowledge
 |of other work, I did the same thing for a 68020 based graphics system where
 |the JIT code went into the I-cache and was amazingly fast for its day.
 |
 |If I remember correctly, things started with an outer-loop test to see
 |if there were overlapping regions to determine whether to go forward
 |to backwards to avoid having the destination trash the source.
 ...

Only to add that "modern standard" C libraries define
"overlapping" by means of exclusivity, meaning that memcpy(x,
&x[1], 1) is an invalid overlapping that requires memmove() to be
used.

 |#?132(0.02 1/76)|alp-2022:tmp$ cat t.c
 |#include <string.h>
 |void doit(char *cp);
 |int main(void){
 |  char buf[3] = {'1', '2', '\0'};
 |  doit(buf);
 |  return buf[0];
 |}
 |void doit(char *cp){
 |  memcpy(cp, &cp[1], 2);
 |}
 |#?0(0.02 1/76)|alp-2022:tmp$ gcc -D_FORTIFY_SOURCE=2 -O -o zt t.c
 |#?0(0.02 1/76)|alp-2022:tmp$ ./zt
 |Illegal instruction
 |#?132(0.02 1/76)|alp-2022:tmp$ gcc --version
 |gcc (Alpine 11.2.1_git20220219) 11.2.1 20220219
 |Copyright (C) 2021 Free Software Foundation, Inc.
 |This is free software; see the source for copying conditions.  There is NO
 |warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 |
 |#?0(0.02 1/76)|alp-2022:tmp$

Your mileage will vary though.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [TUHS] A Reiser tour de force
  2022-04-01 19:41     ` Steffen Nurpmeso
@ 2022-04-01 21:29       ` Rob Pike
  2022-04-01 21:31         ` Rob Pike
  2022-04-01 21:43       ` Jon Steinhart
  1 sibling, 1 reply; 9+ messages in thread
From: Rob Pike @ 2022-04-01 21:29 UTC (permalink / raw)
  To: Jon Steinhart, TUHS main list

A copy of our paper is at https://9p.io/cm/cs/doc/87/archtr.ps.gz

-rob

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [TUHS] A Reiser tour de force
  2022-04-01 21:29       ` Rob Pike
@ 2022-04-01 21:31         ` Rob Pike
  0 siblings, 0 replies; 9+ messages in thread
From: Rob Pike @ 2022-04-01 21:31 UTC (permalink / raw)
  To: Jon Steinhart, TUHS main list

And yes, this _was_ bitblt, its apotheosis, not its elimination.

-rbo

On Sat, Apr 2, 2022 at 8:29 AM Rob Pike <robpike@gmail.com> wrote:
>
> A copy of our paper is at https://9p.io/cm/cs/doc/87/archtr.ps.gz
>
> -rob

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [TUHS] A Reiser tour de force
  2022-04-01 19:41     ` Steffen Nurpmeso
  2022-04-01 21:29       ` Rob Pike
@ 2022-04-01 21:43       ` Jon Steinhart
  1 sibling, 0 replies; 9+ messages in thread
From: Jon Steinhart @ 2022-04-01 21:43 UTC (permalink / raw)
  To: TUHS main list

Steffen Nurpmeso writes:
> Jon Steinhart wrote in
>  <202204011726.231HQFm03349496@darkstar.fourwinds.com>:
>  |David Barto writes:
>  |>> On Apr 1, 2022, at 8:59 AM, Douglas McIlroy <douglas.mcilroy@dartmouth.e\
>  |>> du> wrote:
>  |>> The recent discussion about Research choosing BSD's paging over
>  |>> Reiser-London's brought to mind a stunning program by Reiser that
>  |>> Research did adopt.
>  |>> 
>  |>> A critical primitive in the Blit terminal was bitblt (block transfer
>  |>> of a rectangular area). It was used ubiquitously, for example to
>  ...
>  |>> Bitblt got refined (i.e. elaborated) several times before Reiser did
>  |>> away with it entirely. Instead he wrote a just-in-time generator of
>  |>> optimal code. Thousands of distinct variants, which varied in size
>  |>> from 16 to 72 bytes, could be produced by the same 400 lines of
>  |>> assembler code.
>  ...
>  |> Does this exist for the rest of us to study?
>  ...
>  |It's not insanely complicated by modern standards.  Without any knowledge
>  |of other work, I did the same thing for a 68020 based graphics system where
>  |the JIT code went into the I-cache and was amazingly fast for its day.
>  |
>  |If I remember correctly, things started with an outer-loop test to see
>  |if there were overlapping regions to determine whether to go forward
>  |to backwards to avoid having the destination trash the source.
>  ...
>
> Only to add that "modern standard" C libraries define
> "overlapping" by means of exclusivity, meaning that memcpy(x,
> &x[1], 1) is an invalid overlapping that requires memmove() to be
> used.

Uh, yeah, but not relevant at least to the code I did which was written
in C but generated machine code, not calls to library functions.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [TUHS] A Reiser tour de force
  2022-04-03 11:22 Paul Ruizendaal via TUHS
@ 2022-04-03 12:24 ` Rob Pike
  0 siblings, 0 replies; 9+ messages in thread
From: Rob Pike @ 2022-04-03 12:24 UTC (permalink / raw)
  To: Paul Ruizendaal; +Cc: TUHS main list

That is not Reiser's implementation. It's not even a compiling
implementation, it's just some loops. I think that's one of Bart's.

Reiser's was just assembly language, in a single file, not pseudo-C.

I went looking around github but couldn't find it. I've got some feelers out.

-rob

On Sun, Apr 3, 2022 at 9:25 PM Paul Ruizendaal via TUHS
<tuhs@minnie.tuhs.org> wrote:
>
> A not-very-thorough search at tuhs turned up V9/jerq/src/lib/j/bitblt.c
> It appears to be a pre-Reiser bitblt, not what was asked for.
>
>
> The Reiser code is in the V8 jerq tarball that Dan Cross donated:
> v8jerq.tar.bz2
>
> It is in file blit/src/libj/bitblt.s (attached below for convenience). It is 750 lines of 68K assembler. It does not appear to have been ported to the Bellmac 32 CPU. Maybe it did not make sense in that context.
>
> Paul
>
>
> There also is a file “bitblt.C” in that same directory (dated May 82, versus Aug 82 for the assembler file) that seems to have a similar approach coded up in C, with a big switch where each case has a lot of “asm()” statements. It is about 400 lines long. No author is mentioned.
>
> I have not investigated deeply, but at first glance it is possible that the approach was first coded up as a kind of threaded code in C (with inline asm) and later that Summer redone as fully hand coded assembler. Just a guess.
>
> Paul
>
> ====
>
> #include <jerq.h>
> #define LEFTDIR 8
> #define NOSHIFT 4
> #define DAMMIT 4 /* you'll see why */
> #undef sw
>
> bitblt(sm,r,dm,p,fc)
> Bitmap *sm,*dm;
> Rectangle r;
> Point p;
> int fc;
> {
> register Word *source,*dest,*_source,*_dest; /* %a2-%a5 */
> register UWord m,mask1,mask2; /* %d2,%d3,%d4 */
> register int a,b,i; /* %d5,%d6,%d7 */
>
> int j,h,w,dx1,sw,dw;
>
> /* clip to the destination Bitmap only */
> #define rp dest
> #define pp source
> rp = (int *) &(dm->rect);
> pp = (int *) &p;
> if ((a = *rp++ - *pp++) > 0) {
> *(pp-1) += a;
> r.origin.x += a;
> }
> if ((a = *rp++ - *pp) > 0) {
> *pp += a;
> r.origin.y += a;
> }
> if ((a = r.origin.x + *rp++ - *(pp-1)) < r.corner.x)
> r.corner.x = a;
> if ((a = r.origin.y + *rp - *pp) < r.corner.y)
> r.corner.y = a;
> i = r.corner.y - r.origin.y; /* going to be h */
> a = r.corner.x - r.origin.x - 1; /* going to be dx1 */
> if (i <= 0 || a < 0)
> return;
> if (a < 16)
> goto narrow;
> h = i;
> dx1 = a; /* i and b are regs, avoid work! */
> sw = sm->width << 1; /* sleazy hack to avoid shift */
> dw = dm->width << 1; /* in outer, inner loops */
> w = ((p.x+dx1) >> 4) - (p.x >> 4) - 1; /* inner loop */
> mask1 = ~topbits[p.x & 15];
> mask2 = topbits[((p.x+dx1) & 15) + 1];
> if (sm == dm) { /* may have to mess with loop order */
> if (r.origin.y < p.y) { /* swap top with bottom */
> r.origin.y += h-1;
> p.y += h-1;
> sw = -sw;
> dw = -dw;
> }
> if (r.origin.x < p.x) { /* swap left with right */
> fc |= LEFTDIR;
> r.origin.x += dx1;
> p.x += dx1;
> }
> }
> _dest = addr(dm,p);
> _source = addr(sm,r.origin);
> a = (p.x&15) - (r.origin.x&15);
> if (a < 0)
> a += 16;
> else /* a == 0 means no shift, remember that */
> _source--; /* else grab long and shift right */
> b = 16 - a;
> if (a == 0)
> fc |= NOSHIFT;
> source = _source;
> dest = _dest;
> switch (fc) {
> case F_STORE | NOSHIFT:
> b = w;
> _source++;
> source = _source;
> a = h; /* a is free => use it */
> do {
> *dest++ = (~mask1 & *dest) | (mask1 & *source++);
> if ((i = b>>2) > 0) do {
> *((long *)dest)++ = *((long *)source)++;
> *((long *)dest)++ = *((long *)source)++;
> } while (--i > 0);
> if ((i = b&3) > 0) do {
> *dest++ = *source++;
> } while (--i > 0);
> *dest = (~mask2 & *dest) | (mask2 & *source);
> (char *) _source += sw;
> source = _source;
> (char *) _dest += dw;
> dest = _dest;
> } while (--a > 0);
> break;
> case F_STORE:
> do {
> asm(" mov.l (%a2)+,%d2 # (long) m = *source++");
> asm(" ror.l %d5,%d2 # rotate m right by a");
> *dest++ = (~mask1 & *dest) | (mask1 & m);
> asm(" ror.l %d6,%d2 # rotate m right by b");
> if ((i = w) > 0) do {
> m = *source++;
> asm(" ror.l %d5,%d2 # m >> a");
> *dest++ = m;
> asm(" ror.l %d6,%d2 # m >> b");
> } while (--i > 0);
> m = *source;
> asm(" ror.l %d5,%d2 # m >> a");
> *dest = (~mask2 & *dest) | (mask2 & m);
> (char *) _source += sw;
> source = _source;
> (char *) _dest += dw;
> dest = _dest;
> } while (--h > 0);
> break;
> case F_STORE | NOSHIFT | LEFTDIR:
> b = w;
> _source++;
> source = _source;
> a = h;
> do {
> *dest = (~mask2 & *dest) | (mask2 & *source);
> if ((i = b>>2) > 0) do {
> *--((long *)dest) = *--((long *)source);
> *--((long *)dest) = *--((long *)source);
> } while (--i > 0);
> if ((i = b&3) > 0) do {
> *(--dest) = *(--source);
> } while (--i > 0);
> dest--;
> *dest = (~mask1 & *dest) | (mask1 & *(--source));
> (char *) _source += sw;
> source = _source;
> (char *) _dest += dw;
> dest = _dest;
> } while (--a > 0);
> break;
> case F_STORE | LEFTDIR:
> do {
> asm(" mov.l (%a2),%d2 # (long) m = *source");
> asm(" ror.l %d5,%d2 # m >> a");
> *dest = (~mask2 & *dest) | (mask2 & m);
> asm(" rol.l %d5,%d2 # m << a");
> if ((i = w) > 0) do {
> m = *(--source);
> asm(" rol.l %d6,%d2 # m << b");
> *(--dest) = m;
> asm(" rol.l %d5,%d2 # m << a");
> } while (--i > 0);
> m = *(--source);
> asm(" rol.l %d6,%d2 # m << b");
> dest--;
> *dest = (~mask1 & *dest) | (mask1 & m);
> (char *) _source += sw;
> source = _source;
> (char *) _dest += dw;
> dest = _dest;
> } while (--h > 0);
> break;
> case F_OR:
> case F_OR | NOSHIFT:
> do {
> asm(" mov.l (%a2)+,%d2 # (long) m = *source++");
> asm(" ror.l %d5,%d2 # rotate m right by a");
> *dest++ |= (mask1 & m);
> asm(" ror.l %d6,%d2 # rotate m right by b");
> if ((i = w) > 0) do {
> m = *source++;
> asm(" ror.l %d5,%d2 # m >> a");
> *dest++ |= m;
> asm(" ror.l %d6,%d2 # m >> b");
> } while (--i > 0);
> m = *source;
> asm(" ror.l %d5,%d2 # m >> a");
> *dest |= (mask2 & m);
> (char *) _source += sw;
> source = _source;
> (char *) _dest += dw;
> dest = _dest;
> } while (--h > 0);
> break;
> case F_OR | LEFTDIR:
> case F_OR | NOSHIFT | LEFTDIR:
> do {
> asm(" mov.l (%a2),%d2 # (long) m = *source");
> asm(" ror.l %d5,%d2 # m >> a");
> *dest |= (mask2 & m);
> asm(" rol.l %d5,%d2 # m << a");
> if ((i = w) > 0) do {
> m = *(--source);
> asm(" rol.l %d6,%d2 # m << b");
> *(--dest) |= m;
> asm(" rol.l %d5,%d2 # m << a");
> } while (--i > 0);
> m = *(--source);
> asm(" rol.l %d6,%d2 # m << b");
> dest--;
> *dest |= (mask1 & m);
> (char *) _source += sw;
> source = _source;
> (char *) _dest += dw;
> dest = _dest;
> } while (--h > 0);
> break;
> case F_CLR:
> case F_CLR | NOSHIFT:
> do {
> asm(" mov.l (%a2)+,%d2 # (long) m = *source++");
> asm(" ror.l %d5,%d2 # rotate m right by a");
> *dest++ &= ~(mask1 & m);
> asm(" ror.l %d6,%d2 # rotate m right by b");
> if ((i = w) > 0) do {
> m = *source++;
> asm(" ror.l %d5,%d2 # m >> a");
> asm(" not.w %d2 # m = ~m");
> *dest++ &= m;
> asm(" ror.l %d6,%d2 # m >> b");
> } while (--i > 0);
> m = *source;
> asm(" ror.l %d5,%d2 # m >> a");
> *dest &= ~(mask2 & m);
> (char *) _source += sw;
> source = _source;
> (char *) _dest += dw;
> dest = _dest;
> } while (--h > 0);
> break;
> case F_CLR | LEFTDIR:
> case F_CLR | NOSHIFT | LEFTDIR:
> do {
> asm(" mov.l (%a2),%d2 # (long) m = *source");
> asm(" ror.l %d5,%d2 # m >> a");
> *dest &= ~(mask2 & m);
> asm(" rol.l %d5,%d2 # m << a");
> if ((i = w) > 0) do {
> m = *(--source);
> asm(" rol.l %d6,%d2 # m << b");
> asm(" not.w %d2 # m = ~m");
> *(--dest) &= m;
> asm(" rol.l %d5,%d2 # m << a");
> } while (--i > 0);
> m = *(--source);
> asm(" rol.l %d6,%d2 # m << b");
> dest--;
> *dest &= ~(mask1 & m);
> (char *) _source += sw;
> source = _source;
> (char *) _dest += dw;
> dest = _dest;
> } while (--h > 0);
> break;
> case F_XOR | NOSHIFT:
> b = w;
> _source++;
> source = _source;
> a = h;
> do {
> *dest++ ^= (mask1 & *source++);
> if ((i = b>>2) > 0) do {
> *((long *)dest)++ ^= *((long *)source)++;
> *((long *)dest)++ ^= *((long *)source)++;
> } while (--i > 0);
> if ((i = b&3) > 0) do {
> *dest++ ^= *source++;
> } while (--i > 0);
> *dest ^= (mask2 & *source);
> (char *) _source += sw;
> source = _source;
> (char *) _dest += dw;
> dest = _dest;
> } while (--a > 0);
> break;
> case F_XOR:
> default:
> do {
> asm(" mov.l (%a2)+,%d2 # (long) m = *source++");
> asm(" ror.l %d5,%d2 # rotate m right by a");
> *dest++ ^= (mask1 & m);
> asm(" ror.l %d6,%d2 # rotate m right by b");
> if ((i = w) > 0) do {
> m = *source++;
> asm(" ror.l %d5,%d2 # m >> a");
> *dest++ ^= m;
> asm(" ror.l %d6,%d2 # m >> b");
> } while (--i > 0);
> m = *source;
> asm(" ror.l %d5,%d2 # m >> a");
> *dest ^= (mask2 & m);
> (char *) _source += sw;
> source = _source;
> (char *) _dest += dw;
> dest = _dest;
> } while (--h > 0);
> break;
> case F_XOR | NOSHIFT | LEFTDIR:
> b = w;
> _source++;
> source = _source;
> a = h;
> do {
> *dest ^= (mask2 & *source);
> if ((i = b>>2) > 0) do {
> *--((long *)dest) ^= *--((long *)source);
> *--((long *)dest) ^= *--((long *)source);
> } while (--i > 0);
> if ((i = b&3) > 0) do {
> *(--dest) ^= *(--source);
> } while (--i > 0);
> dest--;
> *dest ^= (mask1 & *(--source));
> (char *) _source += sw;
> source = _source;
> (char *) _dest += dw;
> dest = _dest;
> } while (--a > 0);
> break;
> case F_XOR | LEFTDIR:
> do {
> asm(" mov.l (%a2),%d2 # (long) m = *source");
> asm(" ror.l %d5,%d2 # m >> a");
> *dest ^= (mask2 & m);
> asm(" rol.l %d5,%d2 # m << a");
> if ((i = w) > 0) do {
> m = *(--source);
> asm(" rol.l %d6,%d2 # m << b");
> *(--dest) ^= m;
> asm(" rol.l %d5,%d2 # m << a");
> } while (--i > 0);
> m = *(--source);
> asm(" rol.l %d6,%d2 # m << b");
> dest--;
> *dest ^= (mask1 & m);
> (char *) _source += sw;
> source = _source;
> (char *) _dest += dw;
> dest = _dest;
> } while (--h > 0);
> break;
> }
> return;
> narrow:
> /*
> * width is 16 bits or less, so we can do it by reading and
> * writing 32 bits at a time
> */
> _source = (Word *) sm;
> _dest = (Word *) dm;
> mask1 = ((Bitmap *) _source)->width; /* source increment */
> mask1 <<= 1; /* hack to add to an address register */
> mask2 = ((Bitmap *) _dest)->width; /* dest increment */
> mask2 <<= 1;
> if (_source == _dest && r.origin.y < p.y) { /* swap top with bottom */
> r.origin.y += i-1;
> p.y += i-1;
> mask1 = -mask1;
> mask2 = -mask2;
> }
> asm(" mov &0,%d6 # (long) b = 0");
> b = topbits[a+1];
> a = (16 - (p.x & 15)); /* hocus pocus to get long mask */
> asm(" rol.l %d5,%d6 # (long) b <<= a");
> a = 16 - a - (r.origin.x & 15); /* shift constant */
> if (a < 0) { /* guess what! -1 == 63 to the 68000!!! */
> fc |= DAMMIT; /* not fatal, just slow */
> }
> source = addr(_source,r.origin);
> dest = addr(_dest,p);
> switch (fc) {
> case F_STORE:
> case F_STORE | DAMMIT:
> asm(" mov.l %d6,%d1 # prepare inverse mask");
> asm(" not.l %d1 ");
> do {
> asm(" mov.l (%a2),%d2 # m = *source");
> asm(" ror.l %d5,%d2 # rotate m right by a");
> asm(" and.l %d6,%d2 # m &= b");
> asm(" mov.l (%a3),%d0 # m |= *dest&~b");
> asm(" and.l %d1,%d0 ");
> asm(" or.l %d0,%d2 ");
> asm(" mov.l %d2,(%a3) # *dest = m");
> (char *) source += (int) mask1;
> (char *) dest += (int) mask2;
> } while (--i > 0);
> break;
> case F_OR:
> case F_OR | DAMMIT:
> do {
> asm(" mov.l (%a2),%d2 # m = *source");
> asm(" ror.l %d5,%d2 # rotate m right by a");
> asm(" and.l %d6,%d2 # m &= b");
> asm(" or.l %d2,(%a3) # *dest |= m");
> (char *) source += (int) mask1;
> (char *) dest += (int) mask2;
> } while (--i > 0);
> break;
> case F_CLR:
> case F_CLR | DAMMIT:
> do {
> asm(" mov.l (%a2),%d2 # m = *source");
> asm(" ror.l %d5,%d2 # rotate m right by a");
> asm(" and.l %d6,%d2 # m &= b");
> asm(" not.l %d2 # m = ^m");
> asm(" and.l %d2,(%a3) # *dest &= m");
> (char *) source += (int) mask1;
> (char *) dest += (int) mask2;
> } while (--i > 0);
> break;
> case F_XOR:
> do {
> asm(" mov.l (%a2),%d2 # m = *source");
> asm(" ror.l %d5,%d2 # rotate m right by a");
> asm(" and.l %d6,%d2 # m &= b");
> asm(" eor.l %d2,(%a3) # *dest ^= m");
> (char *) source += (int) mask1;
> (char *) dest += (int) mask2;
> } while (--i > 0);
> break;
> case F_XOR | DAMMIT:
> a = -a;
> do {
> asm(" mov.l (%a2),%d2 # m = *source");
> asm(" rol.l %d5,%d2 # rotate m left by a");
> asm(" and.l %d6,%d2 # m &= b");
> asm(" eor.l %d2,(%a3) # *dest ^= m");
> (char *) source += (int) mask1;
> (char *) dest += (int) mask2;
> } while (--i > 0);
> break;
> }
> return;
> }
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [TUHS] A Reiser tour de force
@ 2022-04-03 11:22 Paul Ruizendaal via TUHS
  2022-04-03 12:24 ` Rob Pike
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Ruizendaal via TUHS @ 2022-04-03 11:22 UTC (permalink / raw)
  To: TUHS main list

[-- Attachment #1: Type: text/plain, Size: 12603 bytes --]

>> A not-very-thorough search at tuhs turned up V9/jerq/src/lib/j/bitblt.c
>> It appears to be a pre-Reiser bitblt, not what was asked for.
> 
> 
> The Reiser code is in the V8 jerq tarball that Dan Cross donated:
> v8jerq.tar.bz2 <https://www.tuhs.org/Archive/Distributions/Research/Dan_Cross_v8/v8jerq.tar.bz2>
> 
> It is in file blit/src/libj/bitblt.s (attached below for convenience). It is 750 lines of 68K assembler. It does not appear to have been ported to the Bellmac 32 CPU. Maybe it did not make sense in that context.
> 
> Paul

There also is a file “bitblt.C” in that same directory (dated May 82, versus Aug 82 for the assembler file) that seems to have a similar approach coded up in C, with a big switch where each case has a lot of “asm()” statements. It is about 400 lines long. No author is mentioned.

I have not investigated deeply, but at first glance it is possible that the approach was first coded up as a kind of threaded code in C (with inline asm) and later that Summer redone as fully hand coded assembler. Just a guess.

Paul

====

#include <jerq.h>
#define LEFTDIR	8
#define NOSHIFT 4
#define DAMMIT	4		/* you'll see why */
#undef	sw

bitblt(sm,r,dm,p,fc)
Bitmap *sm,*dm;
Rectangle r;
Point p;
int fc;
{
	register Word *source,*dest,*_source,*_dest;	/* %a2-%a5 */
	register UWord m,mask1,mask2;		/* %d2,%d3,%d4 */
	register int a,b,i;			/* %d5,%d6,%d7 */

	int j,h,w,dx1,sw,dw;

	/* clip to the destination Bitmap only */
#define	rp	dest
#define pp	source
	rp = (int *) &(dm->rect);
	pp = (int *) &p;
	if ((a = *rp++ - *pp++) > 0) {
		*(pp-1) += a;
		r.origin.x += a;
	}
	if ((a = *rp++ - *pp) > 0) {
		*pp += a;
		r.origin.y += a;
	}
	if ((a = r.origin.x + *rp++ - *(pp-1)) < r.corner.x)
		r.corner.x = a;
	if ((a = r.origin.y + *rp - *pp) < r.corner.y)
		r.corner.y = a;
	i = r.corner.y - r.origin.y;	/* going to be h */
	a = r.corner.x - r.origin.x - 1;	/* going to be dx1 */
	if (i <= 0 || a < 0)
		return;
	if (a < 16)
		goto narrow;
	h = i; 
	dx1 = a;		/* i and b are regs, avoid work! */
	sw = sm->width << 1;	/* sleazy hack to avoid shift */
	dw = dm->width << 1;	/* in outer, inner loops */
	w = ((p.x+dx1) >> 4) - (p.x >> 4) - 1;	/* inner loop */
	mask1 = ~topbits[p.x & 15];
	mask2 = topbits[((p.x+dx1) & 15) + 1];
	if (sm == dm) {		/* may have to mess with loop order */
		if (r.origin.y < p.y) {		/* swap top with bottom */
			r.origin.y += h-1;
			p.y += h-1;
			sw = -sw;
			dw = -dw;
		}
		if (r.origin.x < p.x) {	/* swap left with right */
			fc |= LEFTDIR;
			r.origin.x += dx1;
			p.x += dx1;
		}
	}
	_dest = addr(dm,p);
	_source = addr(sm,r.origin);
	a = (p.x&15) - (r.origin.x&15);
	if (a < 0)
		a += 16;
	else	/* a == 0 means no shift, remember that */
		_source--;	/* else grab long and shift right */
	b = 16 - a;
	if (a == 0)
		fc |= NOSHIFT;
	source = _source;
	dest = _dest;
	switch (fc) {
	case F_STORE | NOSHIFT:
		b = w;
		_source++;
		source = _source;
		a = h;		/* a is free => use it */
		do {
			*dest++ = (~mask1 & *dest) | (mask1 & *source++);
			if ((i = b>>2) > 0) do {
				*((long *)dest)++ = *((long *)source)++;
				*((long *)dest)++ = *((long *)source)++;
			} while (--i > 0);
			if ((i = b&3) > 0) do {
				*dest++ = *source++;
			} while (--i > 0);
			*dest = (~mask2 & *dest) | (mask2 & *source);
			(char *) _source += sw;
			source = _source;
			(char *) _dest += dw;
			dest = _dest;
		} while (--a > 0);
		break;
	case F_STORE:
		do {
asm("			mov.l	(%a2)+,%d2	# (long) m = *source++");
asm("			ror.l	%d5,%d2		# rotate m right by a");
			*dest++ = (~mask1 & *dest) | (mask1 & m);
asm("			ror.l	%d6,%d2		# rotate m right by b");
			if ((i = w) > 0) do {
				m = *source++;
asm("				ror.l	%d5,%d2 	# m >> a");
				*dest++ = m;
asm("				ror.l	%d6,%d2		# m >> b");
			} while (--i > 0);
			m = *source;
asm("			ror.l	%d5,%d2		# m >> a");
			*dest = (~mask2 & *dest) | (mask2 & m);
			(char *) _source += sw;
			source = _source;
			(char *) _dest += dw;
			dest = _dest;
		} while (--h > 0);
		break;
	case F_STORE | NOSHIFT | LEFTDIR:
		b = w;
		_source++;
		source = _source;
		a = h;
		do {
			*dest = (~mask2 & *dest) | (mask2 & *source);
			if ((i = b>>2) > 0) do {
				*--((long *)dest) = *--((long *)source);
				*--((long *)dest) = *--((long *)source);
			} while (--i > 0);
			if ((i = b&3) > 0) do {
				*(--dest) = *(--source);
			} while (--i > 0);
			dest--;
			*dest = (~mask1 & *dest) | (mask1 & *(--source));
			(char *) _source += sw;
			source = _source;
			(char *) _dest += dw;
			dest = _dest;
		} while (--a > 0);
		break;
	case F_STORE | LEFTDIR:
		do {
asm("			mov.l	(%a2),%d2	# (long) m = *source");
asm("			ror.l	%d5,%d2		# m >> a");
			*dest = (~mask2 & *dest) | (mask2 & m);
asm("			rol.l	%d5,%d2		# m << a");
			if ((i = w) > 0) do {
				m = *(--source);
asm("				rol.l	%d6,%d2		# m << b");
				*(--dest) = m;
asm("				rol.l	%d5,%d2		# m << a");
			} while (--i > 0);
			m = *(--source);
asm("			rol.l	%d6,%d2		# m << b");
			dest--;
			*dest = (~mask1 & *dest) | (mask1 & m);
			(char *) _source += sw;
			source = _source;
			(char *) _dest += dw;
			dest = _dest;
		} while (--h > 0);
		break;
	case F_OR:
	case F_OR | NOSHIFT:
		do {
asm("			mov.l	(%a2)+,%d2	# (long) m = *source++");
asm("			ror.l	%d5,%d2		# rotate m right by a");
			*dest++ |= (mask1 & m);
asm("			ror.l	%d6,%d2		# rotate m right by b");
			if ((i = w) > 0) do {
				m = *source++;
asm("				ror.l	%d5,%d2 	# m >> a");
				*dest++ |= m;
asm("				ror.l	%d6,%d2		# m >> b");
			} while (--i > 0);
			m = *source;
asm("			ror.l	%d5,%d2		# m >> a");
			*dest |= (mask2 & m);
			(char *) _source += sw;
			source = _source;
			(char *) _dest += dw;
			dest = _dest;
		} while (--h > 0);
		break;
	case F_OR | LEFTDIR:
	case F_OR | NOSHIFT | LEFTDIR:
		do {
asm("			mov.l	(%a2),%d2	# (long) m = *source");
asm("			ror.l	%d5,%d2		# m >> a");
			*dest |= (mask2 & m);
asm("			rol.l	%d5,%d2		# m << a");
			if ((i = w) > 0) do {
				m = *(--source);
asm("				rol.l	%d6,%d2		# m << b");
				*(--dest) |= m;
asm("				rol.l	%d5,%d2		# m << a");
			} while (--i > 0);
			m = *(--source);
asm("			rol.l	%d6,%d2		# m << b");
			dest--;
			*dest |= (mask1 & m);
			(char *) _source += sw;
			source = _source;
			(char *) _dest += dw;
			dest = _dest;
		} while (--h > 0);
		break;
	case F_CLR:
	case F_CLR | NOSHIFT:
		do {
asm("			mov.l	(%a2)+,%d2	# (long) m = *source++");
asm("			ror.l	%d5,%d2		# rotate m right by a");
			*dest++ &= ~(mask1 & m);
asm("			ror.l	%d6,%d2		# rotate m right by b");
			if ((i = w) > 0) do {
				m = *source++;
asm("				ror.l	%d5,%d2 	# m >> a");
asm("				not.w	%d2		# m = ~m");
				*dest++ &= m;
asm("				ror.l	%d6,%d2		# m >> b");
			} while (--i > 0);
			m = *source;
asm("			ror.l	%d5,%d2		# m >> a");
			*dest &= ~(mask2 & m);
			(char *) _source += sw;
			source = _source;
			(char *) _dest += dw;
			dest = _dest;
		} while (--h > 0);
		break;
	case F_CLR | LEFTDIR:
	case F_CLR | NOSHIFT | LEFTDIR:
		do {
asm("			mov.l	(%a2),%d2	# (long) m = *source");
asm("			ror.l	%d5,%d2		# m >> a");
			*dest &= ~(mask2 & m);
asm("			rol.l	%d5,%d2		# m << a");
			if ((i = w) > 0) do {
				m = *(--source);
asm("				rol.l	%d6,%d2		# m << b");
asm("				not.w	%d2		# m = ~m");
				*(--dest) &= m;
asm("				rol.l	%d5,%d2		# m << a");
			} while (--i > 0);
			m = *(--source);
asm("			rol.l	%d6,%d2		# m << b");
			dest--;
			*dest &= ~(mask1 & m);
			(char *) _source += sw;
			source = _source;
			(char *) _dest += dw;
			dest = _dest;
		} while (--h > 0);
		break;
	case F_XOR | NOSHIFT:
		b = w;
		_source++;
		source = _source;
		a = h;
		do {
			*dest++ ^= (mask1 & *source++);
			if ((i = b>>2) > 0) do {
				*((long *)dest)++ ^= *((long *)source)++;
				*((long *)dest)++ ^= *((long *)source)++;
			} while (--i > 0);
			if ((i = b&3) > 0) do {
				*dest++ ^= *source++;
			} while (--i > 0);
			*dest ^= (mask2 & *source);
			(char *) _source += sw;
			source = _source;
			(char *) _dest += dw;
			dest = _dest;
		} while (--a > 0);
		break;
	case F_XOR:
	default:
		do {
asm("			mov.l	(%a2)+,%d2	# (long) m = *source++");
asm("			ror.l	%d5,%d2		# rotate m right by a");
			*dest++ ^= (mask1 & m);
asm("			ror.l	%d6,%d2		# rotate m right by b");
			if ((i = w) > 0) do {
				m = *source++;
asm("				ror.l	%d5,%d2 	# m >> a");
				*dest++ ^= m;
asm("				ror.l	%d6,%d2		# m >> b");
			} while (--i > 0);
			m = *source;
asm("			ror.l	%d5,%d2		# m >> a");
			*dest ^= (mask2 & m);
			(char *) _source += sw;
			source = _source;
			(char *) _dest += dw;
			dest = _dest;
		} while (--h > 0);
		break;
	case F_XOR | NOSHIFT | LEFTDIR:
		b = w;
		_source++;
		source = _source;
		a = h;
		do {
			*dest ^= (mask2 & *source);
			if ((i = b>>2) > 0) do {
				*--((long *)dest) ^= *--((long *)source);
				*--((long *)dest) ^= *--((long *)source);
			} while (--i > 0);
			if ((i = b&3) > 0) do {
				*(--dest) ^= *(--source);
			} while (--i > 0);
			dest--;
			*dest ^= (mask1 & *(--source));
			(char *) _source += sw;
			source = _source;
			(char *) _dest += dw;
			dest = _dest;
		} while (--a > 0);
		break;
	case F_XOR | LEFTDIR:
		do {
asm("			mov.l	(%a2),%d2	# (long) m = *source");
asm("			ror.l	%d5,%d2		# m >> a");
			*dest ^= (mask2 & m);
asm("			rol.l	%d5,%d2		# m << a");
			if ((i = w) > 0) do {
				m = *(--source);
asm("				rol.l	%d6,%d2		# m << b");
				*(--dest) ^= m;
asm("				rol.l	%d5,%d2		# m << a");
			} while (--i > 0);
			m = *(--source);
asm("			rol.l	%d6,%d2		# m << b");
			dest--;
			*dest ^= (mask1 & m);
			(char *) _source += sw;
			source = _source;
			(char *) _dest += dw;
			dest = _dest;
		} while (--h > 0);
		break;
	}
	return;
narrow:
	/*
	 * width is 16 bits or less, so we can do it by reading and
	 * writing 32 bits at a time
	 */
	_source = (Word *) sm;
	_dest = (Word *) dm;
	mask1 = ((Bitmap *) _source)->width;	/* source increment */
	mask1 <<= 1;		/* hack to add to an address register */
	mask2 = ((Bitmap *) _dest)->width;	/* dest increment */
	mask2 <<= 1;
	if (_source == _dest && r.origin.y < p.y) {	/* swap top with bottom */
		r.origin.y += i-1;
		p.y += i-1;
		mask1 = -mask1;
		mask2 = -mask2;
	}
asm("	mov	&0,%d6		# (long) b = 0");
	b = topbits[a+1];
	a = (16 - (p.x & 15));	/* hocus pocus to get long mask */
asm("	rol.l	%d5,%d6		# (long) b <<= a");
	a = 16 - a - (r.origin.x & 15);	/* shift constant */
	if (a < 0) {		/* guess what! -1 == 63 to the 68000!!! */
		fc |= DAMMIT;	/* not fatal, just slow */
	}
	source = addr(_source,r.origin);
	dest = addr(_dest,p);
	switch (fc) {
	case F_STORE:
	case F_STORE | DAMMIT:
asm("		mov.l	%d6,%d1		# prepare inverse mask");
asm("		not.l	%d1		");
		do {
asm("			mov.l	(%a2),%d2	# m = *source");
asm("			ror.l	%d5,%d2		# rotate m right by a");
asm("			and.l	%d6,%d2		# m &= b");
asm("			mov.l	(%a3),%d0	# m |= *dest&~b");
asm("			and.l	%d1,%d0		");
asm("			or.l	%d0,%d2		");
asm("			mov.l	%d2,(%a3)	# *dest = m");
			(char *) source += (int) mask1;
			(char *) dest += (int) mask2;
		} while (--i > 0);
		break;
	case F_OR:
	case F_OR | DAMMIT:
		do {
asm("			mov.l	(%a2),%d2	# m = *source");
asm("			ror.l	%d5,%d2		# rotate m right by a");
asm("			and.l	%d6,%d2		# m &= b");
asm("			or.l	%d2,(%a3)	# *dest |= m");
			(char *) source += (int) mask1;
			(char *) dest += (int) mask2;
		} while (--i > 0);
		break;
	case F_CLR:
	case F_CLR | DAMMIT:
		do {
asm("			mov.l	(%a2),%d2	# m = *source");
asm("			ror.l	%d5,%d2		# rotate m right by a");
asm("			and.l	%d6,%d2		# m &= b");
asm("			not.l	%d2		# m = ^m");
asm("			and.l	%d2,(%a3)	# *dest &= m");
			(char *) source += (int) mask1;
			(char *) dest += (int) mask2;
		} while (--i > 0);
		break;
	case F_XOR:
		do {
asm("			mov.l	(%a2),%d2	# m = *source");
asm("			ror.l	%d5,%d2		# rotate m right by a");
asm("			and.l	%d6,%d2		# m &= b");
asm("			eor.l	%d2,(%a3)	# *dest ^= m");
			(char *) source += (int) mask1;
			(char *) dest += (int) mask2;
		} while (--i > 0);
		break;
	case F_XOR | DAMMIT:
		a = -a;
		do {
asm("			mov.l	(%a2),%d2	# m = *source");
asm("			rol.l	%d5,%d2		# rotate m left by a");
asm("			and.l	%d6,%d2		# m &= b");
asm("			eor.l	%d2,(%a3)	# *dest ^= m");
			(char *) source += (int) mask1;
			(char *) dest += (int) mask2;
		} while (--i > 0);
		break;
	}
	return;
}


[-- Attachment #2: Type: text/html, Size: 58167 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-04-03 12:27 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-01 15:59 [TUHS] A Reiser tour de force Douglas McIlroy
2022-04-01 17:15 ` David Barto
2022-04-01 17:26   ` Jon Steinhart
2022-04-01 19:41     ` Steffen Nurpmeso
2022-04-01 21:29       ` Rob Pike
2022-04-01 21:31         ` Rob Pike
2022-04-01 21:43       ` Jon Steinhart
2022-04-03 11:22 Paul Ruizendaal via TUHS
2022-04-03 12:24 ` Rob Pike

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).