* [Unix-jun72] Kernel built under V1 @ 2008-05-18 14:56 James A. Markevitch 2008-05-18 18:04 ` Tim Newsham 0 siblings, 1 reply; 10+ messages in thread From: James A. Markevitch @ 2008-05-18 14:56 UTC (permalink / raw) For fun, I tried building the kernel under V1 and booting it and it looks like it simply works. I did the following, outside the simulator: tools/assemv2 mkdir fs/usr/kernel cp -pi build/u?.s fs/usr/kernel tools/imgbuild boot/installboot tools/pdp11 boot/simh.cfg Then, I logged in as root and did: chdir /usr/kernel as u?.s chdir ../boot sh run msys2 u ../kernel/a.out More realistically, the kernel should be built in /usr/sys. James Markevitch ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] Kernel built under V1 2008-05-18 14:56 [Unix-jun72] Kernel built under V1 James A. Markevitch @ 2008-05-18 18:04 ` Tim Newsham 2008-05-20 5:16 ` [Unix-jun72] status on disassembler Doug Merritt 0 siblings, 1 reply; 10+ messages in thread From: Tim Newsham @ 2008-05-18 18:04 UTC (permalink / raw) > I did the following, outside the simulator: > > tools/assemv2 > mkdir fs/usr/kernel > cp -pi build/u?.s fs/usr/kernel > tools/imgbuild > boot/installboot > tools/pdp11 boot/simh.cfg If you grab the latest svn sources the process is a lot simpler. You just have to cd build make <cp extra files into root or usr> rm protofs make install but the extra commands aren't necessary because in the latest Makefile we're installing /usr/src with the commands sources and /usr/sys with the kernel sources. > msys2 u ../kernel/a.out Great! > James Markevitch Tim Newsham http://www.thenewsh.com/~newsham/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] status on disassembler 2008-05-18 18:04 ` Tim Newsham @ 2008-05-20 5:16 ` Doug Merritt 2008-05-20 7:39 ` Warren Toomey 2008-05-20 19:49 ` Tim Newsham 0 siblings, 2 replies; 10+ messages in thread From: Doug Merritt @ 2008-05-20 5:16 UTC (permalink / raw) Interim status on disassembler: Usually we think "crank out opcodes and operands, how hard could it be?" But for our effort, we really would like to have disassembly produce something as close to the original source code as possible, including many things such as which parts are code versus data, and much else. It turns out there is in fact "much else", and I have been addressing all that, starting from the basic "decode opcodes and operands" disassembler of Jeffery L. Post. My initial efforts are directed at disassembling s2 programs and comparing them to reconstructed s1 sources. This is a rich experimental playground, for now. It's coming along very nicely, but there are a lot of "little" issues, and I will only skim the surface here. It understands all v1 syscalls, and disassembles appropriately. It generates all needed "not in assembler" syscall definitions, when used. My first approach to "temporary labels" (1f/1b, see Knuth) failed badly; if anyone has insights on how such things should be disassembled, please tell me; I'm still mentally going through various possible algorithms. Currently it supports a "directives file" to insert human-defined labels and parameter types. The latter successfully disassembles constructs of the form jsr f5, mesg; <This is a mesg\n\0> ...after being told that address (say 0200) equals "mesg" and that that subr takes a jsr_r5 parameter of type asciz. It does other cool stuff, too, but alas, still fails in all sorts of horrible ways, e.g. in not being aware of the difference between rvals and lvals when it should. In general, I'm trying to automate any aspects of disassembly that can be, to avoid needing human intervention. Doing so increases the fidelity of the reconstruction: minimizing intervention is inherently desirable, for historical reconstruction. Another example of philosophy: it converts BCS to BES if it follows a syscall. In short, it is pretty cool in certain regards, but is not ready for prime time by any means. It's not beta, it's not alpha, it's prototype -- but pretty promising. Handling simple BSS issues is another TODO example (probably next on my list) Just thought I should report on current progress, since people get impatient sometimes. Feedback appreciated. Doug -- Professional Wild-eyed Visionary Member, Crusaders for a Better Tomorrow ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] status on disassembler 2008-05-20 5:16 ` [Unix-jun72] status on disassembler Doug Merritt @ 2008-05-20 7:39 ` Warren Toomey 2008-05-20 19:49 ` Tim Newsham 1 sibling, 0 replies; 10+ messages in thread From: Warren Toomey @ 2008-05-20 7:39 UTC (permalink / raw) On Mon, May 19, 2008 at 10:16:28PM -0700, Doug Merritt wrote: > Interim status on disassembler: > My first approach to "temporary labels" (1f/1b, see Knuth) failed badly; > if anyone has insights on how such things should be disassembled, > please tell me; I'm still mentally going through various possible > algorithms. Doug, I was also halfway through a disassembler. I take a 2-pass approach. On the 1st pass I collect interesting addresses, e.g. for jsrs, branches etc. Then I allocate labels based on address order. Then on the second pass I use the labels when printing. I'll send you a copy of the work so far. Cheers, Warren ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] status on disassembler 2008-05-20 5:16 ` [Unix-jun72] status on disassembler Doug Merritt 2008-05-20 7:39 ` Warren Toomey @ 2008-05-20 19:49 ` Tim Newsham 2008-05-21 7:44 ` Doug Merritt 1 sibling, 1 reply; 10+ messages in thread From: Tim Newsham @ 2008-05-20 19:49 UTC (permalink / raw) > My first approach to "temporary labels" (1f/1b, see Knuth) failed badly; > if anyone has insights on how such things should be disassembled, > please tell me; I'm still mentally going through various possible > algorithms. This is a standard register allocation problem (ie. assigning registers to variables when compiling a program). You need to figure out over which ranges of the program the labels are "live". Then you need to figure out which ones are live at the same time and build a graph of which labels cannot share the same label. Then you go through and start assigning labels in a way that fits this constraint. http://en.wikipedia.org/wiki/Register_allocation http://en.wikipedia.org/wiki/Graph_coloring Should be covered in compilers text books. > Doug Tim Newsham http://www.thenewsh.com/~newsham/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] status on disassembler 2008-05-20 19:49 ` Tim Newsham @ 2008-05-21 7:44 ` Doug Merritt 2008-05-21 9:49 ` Brad Parker ` (3 more replies) 0 siblings, 4 replies; 10+ messages in thread From: Doug Merritt @ 2008-05-21 7:44 UTC (permalink / raw) Tim Newsham wrote: > This is a standard register allocation problem (ie. assigning registers to Thanks for the thought, but I don't think it is, quite, in part because of the presence of backward branches: 2: br 1f br 2b 1: Here the 2 could be a 1, despite overlapping ranges: 1: br 1f br 1b 1: In standard register allocation, live/dead ranges are forward only: mov $0,r0 mov $1,r1 mov r0,(sp)+ mov r1,(sp)+ r0 and r1 cannot be subsumed into a single register (completely ignoring the possibility of other optimizations as off topic here). Perhaps that's merely a superficial difference, but it's enough to confuse my thinking so far, if so. To further complicate things, it would be especially nice to allocate labels not just in a legal way (that preserves the semantics of the program), but actually in a way as similar as possible to the way that humans were doing so, for the sake of improved reconstruction. (A complication with *that* is that, in some cases, the labels are obviously subtopimal, presumably due to the historical sequence of edits.) Some adaptation of e.g. graph coloring might conceivably do fine at the former but not the latter, so if I succeeded at the first I would still want to keep improving it in the direction of the second. P.S. And obviously it would yield to a big enough hammer (dynamic programming, exhaustive backtracking, genetic algorithms, simulated annealing, etc), but I dislike the idea. I'm currently pondering establishing a partial order and using a nice simple topological sort, and/or attempting a simulation of how a human might go about it -- which might end up being the simplest way to accomplish both goals. Doug -- Professional Wild-eyed Visionary Member, Crusaders for a Better Tomorrow ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] status on disassembler 2008-05-21 7:44 ` Doug Merritt @ 2008-05-21 9:49 ` Brad Parker 2008-05-21 13:40 ` Brad Parker ` (2 subsequent siblings) 3 siblings, 0 replies; 10+ messages in thread From: Brad Parker @ 2008-05-21 9:49 UTC (permalink / raw) Doug Merritt wrote: >Tim Newsham wrote: >> This is a standard register allocation problem (ie. assigning registers to > >Thanks for the thought, but I don't think it is, quite, in part because >of the presence of backward branches: > >2: > br 1f > br 2b >1: > >Here the 2 could be a 1, despite overlapping ranges: > >1: > br 1f > br 1b >1: I'm confused. Those are both the same, aren't they? Isn't the simplest thing to do 2 passes and generate numeric labels as needed? i.e. L0001: br L0002 br L0001 L0002: That's ugly, but it's correct and will assemble so I'd start there. Once you have that, and you have all the code in memory as some sort of tree, it should be possible to make a pass an attempt to clean this up a little. Here's a snipped of real code, with the uninteresting parts turned into "..." badsys: ... mov $3f,u.namep ... br 1f ... br 2f 1: ... 2: br sysexit 3: <core\0\0> ------- badsys: ... mov $L0001,u.namep ... br L0002 ... br L0003 L0002: ... L0003: br sysexit L0001: <core\0\0> You should be able to scan the code and check for references to each location. Assume for a second you have an vector with one entry for each instruction location. You could then chain references off each element. With that information you could tell the "lifetime" of a label. i.e. at L0001 above you could look and decide that that no one else referenced it going forward. I would think that as you scanned that vector from beginning to end you could keep a count of active references, and use that to reset your local label number. Let's try it. badsys: 01 ... 02 mov $L0001,u.namep 03 ... 04 br L0002 05 ... 06 br L0003 L0002: 07 ... L0003: 08 br sysexit L0001: 09 <core\0\0> 10 01 02 uses L0001 +1 03 04 uses L0002 +2 05 06 uses L0003 +3 07 defines L0002 +2; look forward, no refs to L0002 so decrement 08 defines L0003 +1; look forward, no refs to L0003 so decrement 09 defines L0001 0; look forward, no refs to L0001 so decrement 10 I'm probably trivializing it, but I'm willing to put my money where my mouth is if you want me to take a look at your code. I think it has to be a 2 pass operation, however. I think this is true of any good disassembler; they are generally multiple pass. Deciding where the "data" is sometimes takes a little work, which can sometimes be a heuristic like "it looks like ascii and there is a reference to it", so having a tree of references generally helps. This is not unlike binary recompilation where you follow each possible code path to a terminal node. -brad ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] status on disassembler 2008-05-21 7:44 ` Doug Merritt 2008-05-21 9:49 ` Brad Parker @ 2008-05-21 13:40 ` Brad Parker 2008-05-21 14:05 ` Brad Parker 2008-05-21 21:34 ` Tim Newsham 3 siblings, 0 replies; 10+ messages in thread From: Brad Parker @ 2008-05-21 13:40 UTC (permalink / raw) Doug Merritt wrote: > >In standard register allocation, live/dead ranges are forward only: I think I figured out one way to assign local labels. Basically I discover the "span" of each reference. I only concider small spans. I bump a count when the span starts and decrement when it ends. I assign labels using the count. I have an example using ls I'll send in a bit. -brad ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] status on disassembler 2008-05-21 7:44 ` Doug Merritt 2008-05-21 9:49 ` Brad Parker 2008-05-21 13:40 ` Brad Parker @ 2008-05-21 14:05 ` Brad Parker 2008-05-21 21:34 ` Tim Newsham 3 siblings, 0 replies; 10+ messages in thread From: Brad Parker @ 2008-05-21 14:05 UTC (permalink / raw) Here's an example from 'ls'. It's not ment to be good dis-assembly, just a rough approximation, but it shows references and how the short spans get used to assign labels. ps: I'm not trying to write a disassembler. This is from a binary code recompilation hobby project. It just happened that there was enough info already in the code to figure out where the labels might go. I'm happy to share the code if this would be helpful. -brad ... 004374: span 4346 - 4376 004416: span 4410 - 4532 004436: span 4436 - 4446 004454: span 4434 - 4454 004456: span 4456 - 4476 ... local 4374; 1 (4346-4376) assign1 4376; 1 assign1 4410; 0 assign1 4424; 0 assign1 4434; 0 local 4436; 1 (4436-4446) assign1 4446; 2 assign1 4452; 1 local 4454; 1 (4434-4454) local 4456; 0 (4456-4476) assign1 4476; 1 ... 004374: L1 2 004376 2 004346 004376: 1 004374 004410: 1 004416 004416: L2318 2 004532 2 004524 2 004452 2 004410 004424: 1 004602 004434: 1 004454 004436: L2 2 004446 004446: 1 004436 004452: 1 004416 004454: L1 2 004434 004456: L1 2 004476 004476: 1 004456 ... 12 004370 000000 HALT R5 12 004372 000207 RTS R5 1: 12 -> 004374 112423 MOVB R4 R3 13 004376 001376 BNE R3 ; (ref 4374, 1b) 13 004400 005303 DEC R3 13 004402 126327 CPMB R3 R7 177777 004406 000057 ? 14 004410 001402 BEQ R7 ; (ref 4416, L2318) 14 004412 112723 MOVB R7 R3 000057 L2318: 14 -> 004416 004567 JSR R7 002722 14 004422 044052 BIC R0 R2 15 004424 103466 BLO R2 ; (ref 4602, L2434) 15 004426 012746 MOV R7 R6 000004 15 004432 005700 TST R0 16 004434 001007 BNE R0 ; (ref 4454, 1f) 2: 16 -> 004436 004567 JSR R7 002702 16 004442 044052 BIC R0 R2 16 004444 005316 DEC R6 17 004446 001373 BNE R6 ; (ref 4436, 2b) 17 004450 005726 TST R6 18 004452 000761 BR R6 ; (ref 4416, L2318) 1: 17 -> 004454 010302 MOV R3 R2 1: 17 -> 004456 004567 JSR R7 002662 17 004462 044052 BIC R0 R2 17 004464 010021 MOV R0 R1 17 004466 110022 MOVB R0 R2 17 004470 000300 SWAB R0 17 004472 110022 MOVB R0 R2 17 004474 005316 DEC R6 18 004476 001367 BNE R6 ; (ref 4456, 1b) 18 004500 005726 TST R6 18 004502 105022 CLRB R2 18 004504 005767 TST R7 003232 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Unix-jun72] status on disassembler 2008-05-21 7:44 ` Doug Merritt ` (2 preceding siblings ...) 2008-05-21 14:05 ` Brad Parker @ 2008-05-21 21:34 ` Tim Newsham 3 siblings, 0 replies; 10+ messages in thread From: Tim Newsham @ 2008-05-21 21:34 UTC (permalink / raw) > Thanks for the thought, but I don't think it is, quite, in part because > of the presence of backward branches: > > 2: > br 1f > br 2b > 1: > > Here the 2 could be a 1, despite overlapping ranges: > > 1: > br 1f > br 1b > 1: I think you can consider "1f" and "1b" to be two separate names for the sake of the register coloring algorithm (both of which happen to exist at the point where the "1:" label is defined). In many cases only the forward one will be live, in other cases only the backward one will be live, but sometimes both will be live simultaneously. This does add a slight difference to the graph because "1f" and "1b" must be given the same number "1" when both are alive at the same definition site. > To further complicate things, it would be especially nice to allocate > labels not just in a legal way (that preserves the semantics of the > program), but actually in a way as similar as possible to the way that > humans were doing so, for the sake of improved reconstruction. (A > complication with *that* is that, in some cases, the labels are > obviously subtopimal, presumably due to the historical sequence of edits.) I think doing a minimum coloring will be pretty close to what a human would do. If you want to, you can try to swap around labels so that the outter most labels have low numbers and the inner labels (nested inside loops) have the higher numbers... Tim Newsham http://www.thenewsh.com/~newsham/ ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2008-05-21 21:34 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-05-18 14:56 [Unix-jun72] Kernel built under V1 James A. Markevitch 2008-05-18 18:04 ` Tim Newsham 2008-05-20 5:16 ` [Unix-jun72] status on disassembler Doug Merritt 2008-05-20 7:39 ` Warren Toomey 2008-05-20 19:49 ` Tim Newsham 2008-05-21 7:44 ` Doug Merritt 2008-05-21 9:49 ` Brad Parker 2008-05-21 13:40 ` Brad Parker 2008-05-21 14:05 ` Brad Parker 2008-05-21 21:34 ` Tim Newsham
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).