The Unix Heritage Society mailing list
 help / color / mirror / Atom feed
From: brad@heeltoe.com (Brad Parker)
Subject: [Unix-jun72] status on disassembler
Date: Wed, 21 May 2008 05:49:25 -0400	[thread overview]
Message-ID: <27294.1211363365@mini> (raw)
In-Reply-To: <20080521074439.9D3575A52C@remarque.org>


Doug Merritt wrote:
>Tim Newsham wrote:
>> This is a standard register allocation problem (ie. assigning registers to 
>
>Thanks for the thought, but I don't think it is, quite, in part because
>of the presence of backward branches:
>
>2:
>	br	1f
>	br	2b
>1:
>
>Here the 2 could be a 1, despite overlapping ranges:
>
>1:
>	br	1f
>	br	1b
>1:

I'm confused.  Those are both the same, aren't they?

Isn't the simplest thing to do 2 passes and generate numeric labels as
needed?  i.e.

L0001:
	br	L0002
	br	L0001
L0002:

That's ugly, but it's correct and will assemble so I'd start there.

Once you have that, and you have all the code in memory as some sort of
tree, it should be possible to make a pass an attempt to clean this
up a little.

Here's a snipped of real code, with the uninteresting parts turned into "..."

badsys:
	...
	mov	$3f,u.namep
	...
	br	1f
	...
	br	2f
1:
	...
2:
	br	sysexit
3:
	<core\0\0>

-------

badsys:
	...
	mov	$L0001,u.namep
	...
	br	L0002
	...
	br	L0003
L0002:
	...
L0003:
	br	sysexit
L0001:
	<core\0\0>

You should be able to scan the code and check for references to each
location.  Assume for a second you have an vector with one entry for
each instruction location.  You could then chain references off each
element.  With that information you could tell the "lifetime" of a
label. i.e.  at L0001 above you could look and decide that that no one
else referenced it going forward.

I would think that as you scanned that vector from beginning to end you
could keep a count of active references, and use that to reset your
local label number.

Let's try it.

   badsys:
01	...
02 	mov	$L0001,u.namep
03	...
04	br	L0002
05	...
06	br	L0003
   L0002:
07	...
   L0003:
08	br	sysexit
   L0001:
09	<core\0\0>
10


01
02 uses L0001      +1
03
04 uses L0002      +2
05
06 uses L0003      +3
07 defines L0002   +2; look forward, no refs to L0002 so decrement
08 defines L0003   +1; look forward, no refs to L0003 so decrement
09 defines L0001   0; look forward, no refs to L0001 so decrement
10

I'm probably trivializing it, but I'm willing to put my money where my mouth
is if you want me to take a look at your code.

I think it has to be a 2 pass operation, however. I think this is true of
any good disassembler; they are generally multiple pass.  Deciding where
the "data" is sometimes takes a little work, which can sometimes be
a heuristic like "it looks like ascii and there is a reference to it",
so having a tree of references generally helps.

This is not unlike binary recompilation where you follow each possible
code path to a terminal node.

-brad




  reply	other threads:[~2008-05-21  9:49 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-18 14:56 [Unix-jun72] Kernel built under V1 James A. Markevitch
2008-05-18 18:04 ` Tim Newsham
2008-05-20  5:16   ` [Unix-jun72] status on disassembler Doug Merritt
2008-05-20  7:39     ` Warren Toomey
2008-05-20 19:49     ` Tim Newsham
2008-05-21  7:44       ` Doug Merritt
2008-05-21  9:49         ` Brad Parker [this message]
2008-05-21 13:40         ` Brad Parker
2008-05-21 14:05         ` Brad Parker
2008-05-21 21:34         ` Tim Newsham

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=27294.1211363365@mini \
    --to=brad@heeltoe.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).