The Unix Heritage Society mailing list
* [TUHS] why is sum reporting different checksums between v6 and v7
@ 2015-12-12  0:03 Will Senn
  2015-12-12  0:10 ` Clem cole
  2015-12-12  0:38 ` Random832
  0 siblings, 2 replies; 8+ messages in thread
From: Will Senn @ 2015-12-12  0:03 UTC


All,

While working on the latest episode of my saga about moving files 
between v6 and v7, I noticed that the sum utility from v6 reports a 
different checksum than the sum utility from v7 does for the 
same file. To confirm, I did the following on both systems:

# echo "Hello, World" > hi.txt
# cat hi.txt
Hello, World

Then on v6:
# sum hi.txt
1106 1

But on v7:
# sum hi.txt
37264     1

There is no man page for the utility on v6, and it's written in assembler. 
On v7, there's a man page and it's written in C:
# man sum
...
      Sum calculates and prints a 16-bit checksum for the named
      file, and also prints the number of blocks in the file.
...

A few questions:
1. I'll eventually be able to read assembly and learn what the v6 
utility is doing the hard way, but does anyone know what's going on here?
2. Why is sum reporting different checksums between v6 and v7?
3. Do you know of an alternative way to check that the bytes were 
transferred exactly? I used od and then compared the text representation 
of the bytes on the host using diff. Apart from differences in od output 
between v6 and v7 related to duplicate lines, it worked OK, but it is clunky.

Thanks,

Will



* [TUHS] why is sum reporting different checksums between v6 and v7
@ 2015-12-12  0:30 Noel Chiappa
  2015-12-12  1:07 ` Random832
  0 siblings, 1 reply; 8+ messages in thread
From: Noel Chiappa @ 2015-12-12  0:30 UTC


    > From: Will Senn

    > I noticed that the sum utility from v6 reports a different checksum
    > than it does using the sum utility from v7 for the same file.
    > ... does anyone know what's going on here?
    > Why is sum reporting different checksums between v6 and v7?

The two use different algorithms to accumulate the sum (I have added comments
to the relevant portion of the V6 assembler version, to help understand it):

V6:
	/ R0 holds the byte count, R5 the running 16-bit sum
	mov	$buf,r2		/ Pointer to buffer in R2
2:	movb	(r2)+,r4	/ Get new byte into R4 (sign extends!)
	add	r4,r5		/ Add to running sum
	adc	r5		/ If the add carried out of bit 15, add the carry into the low end of the sum
	sob	r0,2b		/ Decrement R0; if any bytes left, go around again

Read the description of MOVB in the PDP-11 Processor manual.

V7:
	while ((c = getc(f)) != EOF) {
		nbytes++;
		if (sum&01)
			sum = (sum>>1) + 0x8000;
		else
			sum >>= 1;
		sum += c;
		sum &= 0xFFFF;
		}

I'm not clear on some of that, so I'll leave its exact workings as an
exercise, but I'm pretty sure it's not an equivalent algorithm (as in,
something that would produce the same results); it's certainly not
identical. (The right shift is basically a rotate, so it's not a straight sum;
it's more like the Fletcher checksum used by XNS, if anyone remembers that.)

Among the parts I don't get: sum is declared as 'unsigned', presumably
16 bits, so the last line _should_ be a no-op!? Also, with 'c' being
implicitly declared as an 'int', does the assignment sign-extend? I have a
vague memory that it does. And does the right shift, when the low bit is one,
really do what the code seems to indicate? I have this bit set that ASR on
the PDP-11 copies the high bit rather than shifting in a 0 (check the
processor manual). That is, of course, assuming that the compiler implements
the '>>' with an ASR, not a ROR followed by a clear of the high bit, or something.
  
	Noel



* [TUHS] why is sum reporting different checksums between v6 and v7
@ 2015-12-12  1:22 Noel Chiappa
  2015-12-12  1:46 ` Random832
  2015-12-12  1:50 ` John Cowan
  0 siblings, 2 replies; 8+ messages in thread
From: Noel Chiappa @ 2015-12-12  1:22 UTC


    > From: Random832

    > Interestingly, the SysIII sum.c program, which I assume yields the same
    > result for this input, appears to go through the whole input
    > accumulating the sum of all the bytes into a long, then adds the two
    > halves of the long at the end rather than after every byte.

That's the same hack a lot of TCP/IP checksum routines used on machines with
longer words: add the words, then fold the result into the shorter length at
the end. The one I wrote for the 68K back in '84 did that.

    > This suggests that the two programs would give different results for
    > very large files that overflow a 32-bit value.

No, I don't think so, depending on the exact details of the implementation.
As long as, when folding the two halves together, you add any carry back into
the sum, you get the same result as doing it in a 16-bit sum. (If my memory of
how this all works is correct - the neurons aren't what they used to be,
especially late in the day... :-)

    > Also, if this sign extends, then its behavior on "negative" (high bit
    > set) bytes is likely to be very different from the SysIII one, which
    > uses getc.

I have this bit set that in C, 'char' is defined to be signed, and
furthermore that when you assign a shorter int to a longer one, the sign is
extended. So if one has a char holding '0200' octal (i.e. -128), assigning it
to a 16-bit int should result in the latter holding '0177600' (i.e. still
-128). So in fact I think they probably act the same.

	Noel




Thread overview: 8+ messages
2015-12-12  0:03 [TUHS] why is sum reporting different checksums between v6 and v7 Will Senn
2015-12-12  0:10 ` Clem cole
2015-12-12  0:38 ` Random832
2015-12-12  0:30 Noel Chiappa
2015-12-12  1:07 ` Random832
2015-12-12  1:22 Noel Chiappa
2015-12-12  1:46 ` Random832
2015-12-12  1:50 ` John Cowan
