* [Unix-jun72] Kernel built under V1
@ 2008-05-18 14:56 James A. Markevitch
2008-05-18 18:04 ` Tim Newsham
0 siblings, 1 reply; 10+ messages in thread
From: James A. Markevitch @ 2008-05-18 14:56 UTC (permalink / raw)
For fun, I tried building the kernel under V1 and booting it and it
looks like it simply works.
I did the following, outside the simulator:
tools/assemv2
mkdir fs/usr/kernel
cp -pi build/u?.s fs/usr/kernel
tools/imgbuild
boot/installboot
tools/pdp11 boot/simh.cfg
Then, I logged in as root and did:
chdir /usr/kernel
as u?.s
chdir ../boot
sh run
msys2 u ../kernel/a.out
More realistically, the kernel should be built in /usr/sys.
James Markevitch
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] Kernel built under V1
2008-05-18 14:56 [Unix-jun72] Kernel built under V1 James A. Markevitch
@ 2008-05-18 18:04 ` Tim Newsham
2008-05-20 5:16 ` [Unix-jun72] status on disassembler Doug Merritt
0 siblings, 1 reply; 10+ messages in thread
From: Tim Newsham @ 2008-05-18 18:04 UTC (permalink / raw)
> I did the following, outside the simulator:
>
> tools/assemv2
> mkdir fs/usr/kernel
> cp -pi build/u?.s fs/usr/kernel
> tools/imgbuild
> boot/installboot
> tools/pdp11 boot/simh.cfg
If you grab the latest svn sources the process is a lot simpler.
You just have to
cd build
make
<cp extra files into root or usr>
rm protofs
make install
but the extra commands aren't necessary because in the latest Makefile
we're installing /usr/src with the commands sources and /usr/sys with
the kernel sources.
> msys2 u ../kernel/a.out
Great!
> James Markevitch
Tim Newsham
http://www.thenewsh.com/~newsham/
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] status on disassembler
2008-05-18 18:04 ` Tim Newsham
@ 2008-05-20 5:16 ` Doug Merritt
2008-05-20 7:39 ` Warren Toomey
2008-05-20 19:49 ` Tim Newsham
0 siblings, 2 replies; 10+ messages in thread
From: Doug Merritt @ 2008-05-20 5:16 UTC (permalink / raw)
Interim status on disassembler:
Usually we think "crank out opcodes and operands, how hard could it be?"
But for our effort, we really would like to have disassembly produce
something as close to the original source code as possible, including
many things such as which parts are code versus data, and much else.
It turns out there is in fact "much else", and I have been addressing
all that, starting from the basic "decode opcodes and operands"
disassembler of Jeffery L. Post.
My initial efforts are directed at disassembling s2 programs and
comparing them to reconstructed s1 sources. This is a rich experimental
playground, for now.
It's coming along very nicely, but there are a lot of "little"
issues, and I will only skim the surface here.
It understands all v1 syscalls, and disassembles appropriately.
It generates all needed "not in assembler" syscall definitions, when used.
My first approach to "temporary labels" (1f/1b, see Knuth) failed badly;
if anyone has insights on how such things should be disassembled,
please tell me; I'm still mentally going through various possible
algorithms.
Currently it supports a "directives file" to insert human-defined labels
and parameter types. The latter successfully disassembles constructs
of the form
jsr f5, mesg; <This is a mesg\n\0>
...after being told that address (say 0200) equals "mesg" and that
that subr takes a jsr_r5 parameter of type asciz.
It does other cool stuff, too, but alas, still fails in all sorts
of horrible ways, e.g. in not being aware of the difference between
rvals and lvals when it should.
In general, I'm trying to automate any aspects of disassembly that
can be, to avoid needing human intervention. Doing so increases
the fidelity of the reconstruction: minimizing intervention
is inherently desirable, for historical reconstruction.
Another example of philosophy: it converts BCS to BES if it follows a syscall.
In short, it is pretty cool in certain regards, but is not ready
for prime time by any means. It's not beta, it's not alpha, it's
prototype -- but pretty promising.
Handling simple BSS issues is another TODO example (probably next on my list)
Just thought I should report on current progress, since
people get impatient sometimes. Feedback appreciated.
Doug
--
Professional Wild-eyed Visionary Member, Crusaders for a Better Tomorrow
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] status on disassembler
2008-05-20 5:16 ` [Unix-jun72] status on disassembler Doug Merritt
@ 2008-05-20 7:39 ` Warren Toomey
2008-05-20 19:49 ` Tim Newsham
1 sibling, 0 replies; 10+ messages in thread
From: Warren Toomey @ 2008-05-20 7:39 UTC (permalink / raw)
On Mon, May 19, 2008 at 10:16:28PM -0700, Doug Merritt wrote:
> Interim status on disassembler:
> My first approach to "temporary labels" (1f/1b, see Knuth) failed badly;
> if anyone has insights on how such things should be disassembled,
> please tell me; I'm still mentally going through various possible
> algorithms.
Doug, I was also halfway through a disassembler. I take a 2-pass approach.
On the 1st pass I collect interesting addresses, e.g. for jsrs, branches
etc. Then I allocate labels based on address order. Then on the second
pass I use the labels when printing. I'll send you a copy of the work
so far.
Cheers,
Warren
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] status on disassembler
2008-05-20 5:16 ` [Unix-jun72] status on disassembler Doug Merritt
2008-05-20 7:39 ` Warren Toomey
@ 2008-05-20 19:49 ` Tim Newsham
2008-05-21 7:44 ` Doug Merritt
1 sibling, 1 reply; 10+ messages in thread
From: Tim Newsham @ 2008-05-20 19:49 UTC (permalink / raw)
> My first approach to "temporary labels" (1f/1b, see Knuth) failed badly;
> if anyone has insights on how such things should be disassembled,
> please tell me; I'm still mentally going through various possible
> algorithms.
This is a standard register allocation problem (ie. assigning registers to
variables when compiling a program). You need to figure out over which
ranges of the program the labels are "live". Then you need to figure out
which ones are live at the same time and build a graph of which labels
cannot share the same label. Then you go through and start assigning
labels in a way that fits this constraint.
http://en.wikipedia.org/wiki/Register_allocation
http://en.wikipedia.org/wiki/Graph_coloring
Should be covered in compilers text books.
> Doug
Tim Newsham
http://www.thenewsh.com/~newsham/
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] status on disassembler
2008-05-20 19:49 ` Tim Newsham
@ 2008-05-21 7:44 ` Doug Merritt
2008-05-21 9:49 ` Brad Parker
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Doug Merritt @ 2008-05-21 7:44 UTC (permalink / raw)
Tim Newsham wrote:
> This is a standard register allocation problem (ie. assigning registers to
Thanks for the thought, but I don't think it is, quite, in part because
of the presence of backward branches:
2:
br 1f
br 2b
1:
Here the 2 could be a 1, despite overlapping ranges:
1:
br 1f
br 1b
1:
In standard register allocation, live/dead ranges are forward only:
mov $0,r0
mov $1,r1
mov r0,(sp)+
mov r1,(sp)+
r0 and r1 cannot be subsumed into a single register (completely
ignoring the possibility of other optimizations as off topic here).
Perhaps that's merely a superficial difference, but it's enough
to confuse my thinking so far, if so.
To further complicate things, it would be especially nice to allocate
labels not just in a legal way (that preserves the semantics of the
program), but actually in a way as similar as possible to the way that
humans were doing so, for the sake of improved reconstruction. (A
complication with *that* is that, in some cases, the labels are
obviously subtopimal, presumably due to the historical sequence of edits.)
Some adaptation of e.g. graph coloring might conceivably do fine at the
former but not the latter, so if I succeeded at the first I would
still want to keep improving it in the direction of the second.
P.S. And obviously it would yield to a big enough hammer (dynamic
programming, exhaustive backtracking, genetic algorithms, simulated
annealing, etc), but I dislike the idea. I'm currently pondering
establishing a partial order and using a nice simple topological sort,
and/or attempting a simulation of how a human might go about it --
which might end up being the simplest way to accomplish both goals.
Doug
--
Professional Wild-eyed Visionary Member, Crusaders for a Better Tomorrow
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] status on disassembler
2008-05-21 7:44 ` Doug Merritt
@ 2008-05-21 9:49 ` Brad Parker
2008-05-21 13:40 ` Brad Parker
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: Brad Parker @ 2008-05-21 9:49 UTC (permalink / raw)
Doug Merritt wrote:
>Tim Newsham wrote:
>> This is a standard register allocation problem (ie. assigning registers to
>
>Thanks for the thought, but I don't think it is, quite, in part because
>of the presence of backward branches:
>
>2:
> br 1f
> br 2b
>1:
>
>Here the 2 could be a 1, despite overlapping ranges:
>
>1:
> br 1f
> br 1b
>1:
I'm confused. Those are both the same, aren't they?
Isn't the simplest thing to do 2 passes and generate numeric labels as
needed? i.e.
L0001:
br L0002
br L0001
L0002:
That's ugly, but it's correct and will assemble so I'd start there.
Once you have that, and you have all the code in memory as some sort of
tree, it should be possible to make a pass an attempt to clean this
up a little.
Here's a snipped of real code, with the uninteresting parts turned into "..."
badsys:
...
mov $3f,u.namep
...
br 1f
...
br 2f
1:
...
2:
br sysexit
3:
<core\0\0>
-------
badsys:
...
mov $L0001,u.namep
...
br L0002
...
br L0003
L0002:
...
L0003:
br sysexit
L0001:
<core\0\0>
You should be able to scan the code and check for references to each
location. Assume for a second you have an vector with one entry for
each instruction location. You could then chain references off each
element. With that information you could tell the "lifetime" of a
label. i.e. at L0001 above you could look and decide that that no one
else referenced it going forward.
I would think that as you scanned that vector from beginning to end you
could keep a count of active references, and use that to reset your
local label number.
Let's try it.
badsys:
01 ...
02 mov $L0001,u.namep
03 ...
04 br L0002
05 ...
06 br L0003
L0002:
07 ...
L0003:
08 br sysexit
L0001:
09 <core\0\0>
10
01
02 uses L0001 +1
03
04 uses L0002 +2
05
06 uses L0003 +3
07 defines L0002 +2; look forward, no refs to L0002 so decrement
08 defines L0003 +1; look forward, no refs to L0003 so decrement
09 defines L0001 0; look forward, no refs to L0001 so decrement
10
I'm probably trivializing it, but I'm willing to put my money where my mouth
is if you want me to take a look at your code.
I think it has to be a 2 pass operation, however. I think this is true of
any good disassembler; they are generally multiple pass. Deciding where
the "data" is sometimes takes a little work, which can sometimes be
a heuristic like "it looks like ascii and there is a reference to it",
so having a tree of references generally helps.
This is not unlike binary recompilation where you follow each possible
code path to a terminal node.
-brad
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] status on disassembler
2008-05-21 7:44 ` Doug Merritt
2008-05-21 9:49 ` Brad Parker
@ 2008-05-21 13:40 ` Brad Parker
2008-05-21 14:05 ` Brad Parker
2008-05-21 21:34 ` Tim Newsham
3 siblings, 0 replies; 10+ messages in thread
From: Brad Parker @ 2008-05-21 13:40 UTC (permalink / raw)
Doug Merritt wrote:
>
>In standard register allocation, live/dead ranges are forward only:
I think I figured out one way to assign local labels. Basically I
discover the "span" of each reference. I only concider small spans. I
bump a count when the span starts and decrement when it ends. I assign
labels using the count.
I have an example using ls I'll send in a bit.
-brad
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] status on disassembler
2008-05-21 7:44 ` Doug Merritt
2008-05-21 9:49 ` Brad Parker
2008-05-21 13:40 ` Brad Parker
@ 2008-05-21 14:05 ` Brad Parker
2008-05-21 21:34 ` Tim Newsham
3 siblings, 0 replies; 10+ messages in thread
From: Brad Parker @ 2008-05-21 14:05 UTC (permalink / raw)
Here's an example from 'ls'. It's not ment to be good dis-assembly,
just a rough approximation, but it shows references and how the short
spans get used to assign labels.
ps: I'm not trying to write a disassembler. This is from a binary code
recompilation hobby project. It just happened that there was enough
info already in the code to figure out where the labels might go.
I'm happy to share the code if this would be helpful.
-brad
...
004374: span 4346 - 4376
004416: span 4410 - 4532
004436: span 4436 - 4446
004454: span 4434 - 4454
004456: span 4456 - 4476
...
local 4374; 1 (4346-4376)
assign1 4376; 1
assign1 4410; 0
assign1 4424; 0
assign1 4434; 0
local 4436; 1 (4436-4446)
assign1 4446; 2
assign1 4452; 1
local 4454; 1 (4434-4454)
local 4456; 0 (4456-4476)
assign1 4476; 1
...
004374: L1
2 004376
2 004346
004376:
1 004374
004410:
1 004416
004416: L2318
2 004532
2 004524
2 004452
2 004410
004424:
1 004602
004434:
1 004454
004436: L2
2 004446
004446:
1 004436
004452:
1 004416
004454: L1
2 004434
004456: L1
2 004476
004476:
1 004456
...
12 004370 000000 HALT R5
12 004372 000207 RTS R5
1:
12 -> 004374 112423 MOVB R4 R3
13 004376 001376 BNE R3 ; (ref 4374, 1b)
13 004400 005303 DEC R3
13 004402 126327 CPMB R3 R7
177777
004406 000057 ?
14 004410 001402 BEQ R7 ; (ref 4416, L2318)
14 004412 112723 MOVB R7 R3
000057
L2318:
14 -> 004416 004567 JSR R7
002722
14 004422 044052 BIC R0 R2
15 004424 103466 BLO R2 ; (ref 4602, L2434)
15 004426 012746 MOV R7 R6
000004
15 004432 005700 TST R0
16 004434 001007 BNE R0 ; (ref 4454, 1f)
2:
16 -> 004436 004567 JSR R7
002702
16 004442 044052 BIC R0 R2
16 004444 005316 DEC R6
17 004446 001373 BNE R6 ; (ref 4436, 2b)
17 004450 005726 TST R6
18 004452 000761 BR R6 ; (ref 4416, L2318)
1:
17 -> 004454 010302 MOV R3 R2
1:
17 -> 004456 004567 JSR R7
002662
17 004462 044052 BIC R0 R2
17 004464 010021 MOV R0 R1
17 004466 110022 MOVB R0 R2
17 004470 000300 SWAB R0
17 004472 110022 MOVB R0 R2
17 004474 005316 DEC R6
18 004476 001367 BNE R6 ; (ref 4456, 1b)
18 004500 005726 TST R6
18 004502 105022 CLRB R2
18 004504 005767 TST R7
003232
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] status on disassembler
2008-05-21 7:44 ` Doug Merritt
` (2 preceding siblings ...)
2008-05-21 14:05 ` Brad Parker
@ 2008-05-21 21:34 ` Tim Newsham
3 siblings, 0 replies; 10+ messages in thread
From: Tim Newsham @ 2008-05-21 21:34 UTC (permalink / raw)
> Thanks for the thought, but I don't think it is, quite, in part because
> of the presence of backward branches:
>
> 2:
> br 1f
> br 2b
> 1:
>
> Here the 2 could be a 1, despite overlapping ranges:
>
> 1:
> br 1f
> br 1b
> 1:
I think you can consider "1f" and "1b" to be two separate names for
the sake of the register coloring algorithm (both of which happen
to exist at the point where the "1:" label is defined). In many
cases only the forward one will be live, in other cases only the
backward one will be live, but sometimes both will be live simultaneously.
This does add a slight difference to the graph because "1f" and "1b"
must be given the same number "1" when both are alive at the same
definition site.
> To further complicate things, it would be especially nice to allocate
> labels not just in a legal way (that preserves the semantics of the
> program), but actually in a way as similar as possible to the way that
> humans were doing so, for the sake of improved reconstruction. (A
> complication with *that* is that, in some cases, the labels are
> obviously subtopimal, presumably due to the historical sequence of edits.)
I think doing a minimum coloring will be pretty close to what a human
would do. If you want to, you can try to swap around labels so that
the outter most labels have low numbers and the inner labels (nested
inside loops) have the higher numbers...
Tim Newsham
http://www.thenewsh.com/~newsham/
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2008-05-21 21:34 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-05-18 14:56 [Unix-jun72] Kernel built under V1 James A. Markevitch
2008-05-18 18:04 ` Tim Newsham
2008-05-20 5:16 ` [Unix-jun72] status on disassembler Doug Merritt
2008-05-20 7:39 ` Warren Toomey
2008-05-20 19:49 ` Tim Newsham
2008-05-21 7:44 ` Doug Merritt
2008-05-21 9:49 ` Brad Parker
2008-05-21 13:40 ` Brad Parker
2008-05-21 14:05 ` Brad Parker
2008-05-21 21:34 ` Tim Newsham
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).