The Unix Heritage Society mailing list
 help / color / mirror / Atom feed
* [Unix-jun72] Kernel built under V1
@ 2008-05-18 14:56 James A. Markevitch
  2008-05-18 18:04 ` Tim Newsham
  0 siblings, 1 reply; 10+ messages in thread
From: James A. Markevitch @ 2008-05-18 14:56 UTC (permalink / raw)


For fun, I tried building the kernel under V1 and booting it and it
looks like it simply works.

I did the following, outside the simulator:

        tools/assemv2
        mkdir fs/usr/kernel
        cp -pi build/u?.s fs/usr/kernel
        tools/imgbuild
        boot/installboot
        tools/pdp11 boot/simh.cfg

Then, I logged in as root and did:

        chdir /usr/kernel
        as u?.s
        chdir ../boot
        sh run
        msys2 u ../kernel/a.out

More realistically, the kernel should be built in /usr/sys.

James Markevitch



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Unix-jun72] Kernel built under V1
  2008-05-18 14:56 [Unix-jun72] Kernel built under V1 James A. Markevitch
@ 2008-05-18 18:04 ` Tim Newsham
  2008-05-20  5:16   ` [Unix-jun72] status on disassembler Doug Merritt
  0 siblings, 1 reply; 10+ messages in thread
From: Tim Newsham @ 2008-05-18 18:04 UTC (permalink / raw)


> I did the following, outside the simulator:
>
>        tools/assemv2
>        mkdir fs/usr/kernel
>        cp -pi build/u?.s fs/usr/kernel
>        tools/imgbuild
>        boot/installboot
>        tools/pdp11 boot/simh.cfg

If you grab the latest svn sources the process is a lot simpler.
You just have to

        cd build
        make
        <cp extra files into root or usr>
        rm protofs
        make install

but the extra commands aren't necessary because in the latest Makefile
we're installing /usr/src with the commands sources and /usr/sys with
the kernel sources.

>        msys2 u ../kernel/a.out

Great!

> James Markevitch

Tim Newsham
http://www.thenewsh.com/~newsham/



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Unix-jun72] status on disassembler
  2008-05-18 18:04 ` Tim Newsham
@ 2008-05-20  5:16   ` Doug Merritt
  2008-05-20  7:39     ` Warren Toomey
  2008-05-20 19:49     ` Tim Newsham
  0 siblings, 2 replies; 10+ messages in thread
From: Doug Merritt @ 2008-05-20  5:16 UTC (permalink / raw)


Interim status on disassembler:

Usually we think "crank out opcodes and operands, how hard could it be?"

But for our effort, we really would like to have disassembly produce
something as close to the original source code as possible, including
many things such as which parts are code versus data, and much else.

It turns out there is in fact "much else", and I have been addressing
all that, starting from the basic "decode opcodes and operands"
disassembler of Jeffery L. Post.

My initial efforts are directed at disassembling s2 programs and
comparing them to reconstructed s1 sources. This is a rich experimental
playground, for now.

It's coming along very nicely, but there are a lot of "little"
issues, and I will only skim the surface here.

It understands all v1 syscalls, and disassembles appropriately.

It generates all needed "not in assembler" syscall definitions, when used.

My first approach to "temporary labels" (1f/1b, see Knuth) failed badly;
if anyone has insights on how such things should be disassembled,
please tell me; I'm still mentally going through various possible
algorithms.

Currently it supports a "directives file" to insert human-defined labels
and parameter types. The latter successfully disassembles constructs
of the form
     jsr  f5, mesg; <This is a mesg\n\0>

...after being told that address (say 0200) equals "mesg" and that
that subr takes a jsr_r5 parameter of type asciz.

It does other cool stuff, too, but alas, still fails in all sorts
of horrible ways, e.g. in not being aware of the difference between
rvals and lvals when it should.

In general, I'm trying to automate any aspects of disassembly that
can be, to avoid needing human intervention. Doing so increases
the fidelity of the reconstruction: minimizing intervention
is inherently desirable, for historical reconstruction.

Another example of philosophy: it converts BCS to BES if it follows a syscall.

In short, it is pretty cool in certain regards, but is not ready
for prime time by any means. It's not beta, it's not alpha, it's
prototype -- but pretty promising.

Handling simple BSS issues is another TODO example (probably next on my list)

Just thought I should report on current progress, since
people get impatient sometimes. Feedback appreciated.
   Doug
--
Professional Wild-eyed Visionary        Member, Crusaders for a Better Tomorrow



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Unix-jun72] status on disassembler
  2008-05-20  5:16   ` [Unix-jun72] status on disassembler Doug Merritt
@ 2008-05-20  7:39     ` Warren Toomey
  2008-05-20 19:49     ` Tim Newsham
  1 sibling, 0 replies; 10+ messages in thread
From: Warren Toomey @ 2008-05-20  7:39 UTC (permalink / raw)


On Mon, May 19, 2008 at 10:16:28PM -0700, Doug Merritt wrote:
> Interim status on disassembler:
> My first approach to "temporary labels" (1f/1b, see Knuth) failed badly;
> if anyone has insights on how such things should be disassembled,
> please tell me; I'm still mentally going through various possible
> algorithms.

Doug, I was also halfway through a disassembler. I take a 2-pass approach.
On the 1st pass I collect interesting addresses, e.g. for jsrs, branches
etc. Then I allocate labels based on address order. Then on the second
pass I use the labels when printing. I'll send you a copy of the work
so far.

Cheers,
	Warren



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Unix-jun72] status on disassembler
  2008-05-20  5:16   ` [Unix-jun72] status on disassembler Doug Merritt
  2008-05-20  7:39     ` Warren Toomey
@ 2008-05-20 19:49     ` Tim Newsham
  2008-05-21  7:44       ` Doug Merritt
  1 sibling, 1 reply; 10+ messages in thread
From: Tim Newsham @ 2008-05-20 19:49 UTC (permalink / raw)


> My first approach to "temporary labels" (1f/1b, see Knuth) failed badly;
> if anyone has insights on how such things should be disassembled,
> please tell me; I'm still mentally going through various possible
> algorithms.

This is a standard register allocation problem (ie. assigning registers to 
variables when compiling a program).  You need to figure out over which 
ranges of the program the labels are "live".  Then you need to figure out 
which ones are live at the same time and build a graph of which labels 
cannot share the same label.  Then you go through and start assigning 
labels in a way that fits this constraint.

http://en.wikipedia.org/wiki/Register_allocation
http://en.wikipedia.org/wiki/Graph_coloring

Should be covered in compilers text books.

>   Doug

Tim Newsham
http://www.thenewsh.com/~newsham/



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Unix-jun72] status on disassembler
  2008-05-20 19:49     ` Tim Newsham
@ 2008-05-21  7:44       ` Doug Merritt
  2008-05-21  9:49         ` Brad Parker
                           ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Doug Merritt @ 2008-05-21  7:44 UTC (permalink / raw)


Tim Newsham wrote:
> This is a standard register allocation problem (ie. assigning registers to 

Thanks for the thought, but I don't think it is, quite, in part because
of the presence of backward branches:

2:
	br	1f
	br	2b
1:

Here the 2 could be a 1, despite overlapping ranges:

1:
	br	1f
	br	1b
1:

In standard register allocation, live/dead ranges are forward only:

	mov	$0,r0
	mov	$1,r1
	mov	r0,(sp)+
	mov	r1,(sp)+

r0 and r1 cannot be subsumed into a single register (completely
ignoring the possibility of other optimizations as off topic here).

Perhaps that's merely a superficial difference, but it's enough
to confuse my thinking so far, if so.

To further complicate things, it would be especially nice to allocate
labels not just in a legal way (that preserves the semantics of the
program), but actually in a way as similar as possible to the way that 
humans were doing so, for the sake of improved reconstruction. (A
complication with *that* is that, in some cases, the labels are
obviously subtopimal, presumably due to the historical sequence of edits.)

Some adaptation of e.g. graph coloring might conceivably do fine at the
former but not the latter, so if I succeeded at the first I would
still want to keep improving it in the direction of the second.

P.S. And obviously it would yield to a big enough hammer (dynamic
programming, exhaustive backtracking, genetic algorithms, simulated
annealing, etc), but I dislike the idea. I'm currently pondering
establishing a partial order and using a nice simple topological sort,
and/or attempting a simulation of how a human might go about it --
which might end up being the simplest way to accomplish both goals.
	Doug
--
Professional Wild-eyed Visionary        Member, Crusaders for a Better Tomorrow



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Unix-jun72] status on disassembler
  2008-05-21  7:44       ` Doug Merritt
@ 2008-05-21  9:49         ` Brad Parker
  2008-05-21 13:40         ` Brad Parker
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Brad Parker @ 2008-05-21  9:49 UTC (permalink / raw)



Doug Merritt wrote:
>Tim Newsham wrote:
>> This is a standard register allocation problem (ie. assigning registers to 
>
>Thanks for the thought, but I don't think it is, quite, in part because
>of the presence of backward branches:
>
>2:
>	br	1f
>	br	2b
>1:
>
>Here the 2 could be a 1, despite overlapping ranges:
>
>1:
>	br	1f
>	br	1b
>1:

I'm confused.  Those are both the same, aren't they?

Isn't the simplest thing to do 2 passes and generate numeric labels as
needed?  i.e.

L0001:
	br	L0002
	br	L0001
L0002:

That's ugly, but it's correct and will assemble so I'd start there.

Once you have that, and you have all the code in memory as some sort of
tree, it should be possible to make a pass an attempt to clean this
up a little.

Here's a snipped of real code, with the uninteresting parts turned into "..."

badsys:
	...
	mov	$3f,u.namep
	...
	br	1f
	...
	br	2f
1:
	...
2:
	br	sysexit
3:
	<core\0\0>

-------

badsys:
	...
	mov	$L0001,u.namep
	...
	br	L0002
	...
	br	L0003
L0002:
	...
L0003:
	br	sysexit
L0001:
	<core\0\0>

You should be able to scan the code and check for references to each
location.  Assume for a second you have an vector with one entry for
each instruction location.  You could then chain references off each
element.  With that information you could tell the "lifetime" of a
label. i.e.  at L0001 above you could look and decide that that no one
else referenced it going forward.

I would think that as you scanned that vector from beginning to end you
could keep a count of active references, and use that to reset your
local label number.

Let's try it.

   badsys:
01	...
02 	mov	$L0001,u.namep
03	...
04	br	L0002
05	...
06	br	L0003
   L0002:
07	...
   L0003:
08	br	sysexit
   L0001:
09	<core\0\0>
10


01
02 uses L0001      +1
03
04 uses L0002      +2
05
06 uses L0003      +3
07 defines L0002   +2; look forward, no refs to L0002 so decrement
08 defines L0003   +1; look forward, no refs to L0003 so decrement
09 defines L0001   0; look forward, no refs to L0001 so decrement
10

I'm probably trivializing it, but I'm willing to put my money where my mouth
is if you want me to take a look at your code.

I think it has to be a 2 pass operation, however. I think this is true of
any good disassembler; they are generally multiple pass.  Deciding where
the "data" is sometimes takes a little work, which can sometimes be
a heuristic like "it looks like ascii and there is a reference to it",
so having a tree of references generally helps.

This is not unlike binary recompilation where you follow each possible
code path to a terminal node.

-brad




^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Unix-jun72] status on disassembler
  2008-05-21  7:44       ` Doug Merritt
  2008-05-21  9:49         ` Brad Parker
@ 2008-05-21 13:40         ` Brad Parker
  2008-05-21 14:05         ` Brad Parker
  2008-05-21 21:34         ` Tim Newsham
  3 siblings, 0 replies; 10+ messages in thread
From: Brad Parker @ 2008-05-21 13:40 UTC (permalink / raw)



Doug Merritt wrote:
>
>In standard register allocation, live/dead ranges are forward only:

I think I figured out one way to assign local labels.  Basically I
discover the "span" of each reference.  I only concider small spans.  I
bump a count when the span starts and decrement when it ends.  I assign
labels using the count.

I have an example using ls I'll send in a bit.

-brad




^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Unix-jun72] status on disassembler
  2008-05-21  7:44       ` Doug Merritt
  2008-05-21  9:49         ` Brad Parker
  2008-05-21 13:40         ` Brad Parker
@ 2008-05-21 14:05         ` Brad Parker
  2008-05-21 21:34         ` Tim Newsham
  3 siblings, 0 replies; 10+ messages in thread
From: Brad Parker @ 2008-05-21 14:05 UTC (permalink / raw)



Here's an example from 'ls'.  It's not ment to be good dis-assembly,
just a rough approximation, but it shows references and how the short
spans get used to assign labels.

ps: I'm not trying to write a disassembler.  This is from a binary code
recompilation hobby project.  It just happened that there was enough
info already in the code to figure out where the labels might go.

I'm happy to share the code if this would be helpful.

-brad

...
004374: span 4346 - 4376
004416: span 4410 - 4532
004436: span 4436 - 4446
004454: span 4434 - 4454
004456: span 4456 - 4476
...
local 4374; 1 (4346-4376)
assign1 4376; 1
assign1 4410; 0
assign1 4424; 0
assign1 4434; 0
local 4436; 1 (4436-4446)
assign1 4446; 2
assign1 4452; 1
local 4454; 1 (4434-4454)
local 4456; 0 (4456-4476)
assign1 4476; 1
...
004374: L1
 2 004376
 2 004346
004376: 
 1 004374
004410: 
 1 004416
004416: L2318
 2 004532
 2 004524
 2 004452
 2 004410
004424: 
 1 004602
004434: 
 1 004454
004436: L2
 2 004446
004446: 
 1 004436
004452: 
 1 004416
004454: L1
 2 004434
004456: L1
 2 004476
004476: 
 1 004456
...

 12    004370 000000 HALT R5
 12    004372 000207 RTS R5
1:
 12 -> 004374 112423 MOVB R4 R3
 13    004376 001376 BNE R3 ; (ref 4374, 1b)
 13    004400 005303 DEC R3
 13    004402 126327 CPMB R3 R7
              177777
       004406 000057 ?
 14    004410 001402 BEQ R7 ; (ref 4416, L2318)
 14    004412 112723 MOVB R7 R3
              000057
L2318:
 14 -> 004416 004567 JSR R7
              002722
 14    004422 044052 BIC R0 R2
 15    004424 103466 BLO R2 ; (ref 4602, L2434)
 15    004426 012746 MOV R7 R6
              000004
 15    004432 005700 TST R0
 16    004434 001007 BNE R0 ; (ref 4454, 1f)
2:
 16 -> 004436 004567 JSR R7
              002702
 16    004442 044052 BIC R0 R2
 16    004444 005316 DEC R6
 17    004446 001373 BNE R6 ; (ref 4436, 2b)
 17    004450 005726 TST R6
 18    004452 000761 BR R6 ; (ref 4416, L2318)
1:
 17 -> 004454 010302 MOV R3 R2
1:
 17 -> 004456 004567 JSR R7
              002662
 17    004462 044052 BIC R0 R2
 17    004464 010021 MOV R0 R1
 17    004466 110022 MOVB R0 R2
 17    004470 000300 SWAB R0
 17    004472 110022 MOVB R0 R2
 17    004474 005316 DEC R6
 18    004476 001367 BNE R6 ; (ref 4456, 1b)
 18    004500 005726 TST R6
 18    004502 105022 CLRB R2
 18    004504 005767 TST R7
              003232



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Unix-jun72] status on disassembler
  2008-05-21  7:44       ` Doug Merritt
                           ` (2 preceding siblings ...)
  2008-05-21 14:05         ` Brad Parker
@ 2008-05-21 21:34         ` Tim Newsham
  3 siblings, 0 replies; 10+ messages in thread
From: Tim Newsham @ 2008-05-21 21:34 UTC (permalink / raw)


> Thanks for the thought, but I don't think it is, quite, in part because
> of the presence of backward branches:
>
> 2:
> 	br	1f
> 	br	2b
> 1:
>
> Here the 2 could be a 1, despite overlapping ranges:
>
> 1:
> 	br	1f
> 	br	1b
> 1:

I think you can consider "1f" and "1b" to be two separate names for
the sake of the register coloring algorithm (both of which happen
to exist at the point where the "1:" label is defined).  In many
cases only the forward one will be live, in other cases only the
backward one will be live, but sometimes both will be live simultaneously.
This does add a slight difference to the graph because "1f" and "1b"
must be given the same number "1" when both are alive at the same
definition site.

> To further complicate things, it would be especially nice to allocate
> labels not just in a legal way (that preserves the semantics of the
> program), but actually in a way as similar as possible to the way that
> humans were doing so, for the sake of improved reconstruction. (A
> complication with *that* is that, in some cases, the labels are
> obviously subtopimal, presumably due to the historical sequence of edits.)

I think doing a minimum coloring will be pretty close to what a human
would do.  If you want to, you can try to swap around labels so that
the outter most labels have low numbers and the inner labels (nested
inside loops) have the higher numbers...

Tim Newsham
http://www.thenewsh.com/~newsham/



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2008-05-21 21:34 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-05-18 14:56 [Unix-jun72] Kernel built under V1 James A. Markevitch
2008-05-18 18:04 ` Tim Newsham
2008-05-20  5:16   ` [Unix-jun72] status on disassembler Doug Merritt
2008-05-20  7:39     ` Warren Toomey
2008-05-20 19:49     ` Tim Newsham
2008-05-21  7:44       ` Doug Merritt
2008-05-21  9:49         ` Brad Parker
2008-05-21 13:40         ` Brad Parker
2008-05-21 14:05         ` Brad Parker
2008-05-21 21:34         ` Tim Newsham

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).