The Unix Heritage Society mailing list
* [TUHS] Lexical comparator
       [not found] <200309181041.h8IAfAWe000686@skeeve.com>
@ 2003-09-18 11:45 ` Warren Toomey
  0 siblings, 0 replies; 2+ messages in thread
From: Warren Toomey @ 2003-09-18 11:45 UTC (permalink / raw)


On Thu, Sep 18, 2003 at 01:41:10PM +0300, Aharon Robbins wrote:
> > If anybody has Unix kernel trees which they cannot divulge due to licensing
> > restrictions, I'd appreciate you creating tokenised files of the kernel
> > source and e-mailing them to me.
> 
> Hmmm.  Just between us chickens, given tokenized versions of an entire tree,
> how hard would it be to recreate a functional kernel?

Pretty damn hard. All identifiers (variable names) are reduced to
a single token. Actually, that's not true. The meaning of the names
is removed and replaced with numeric identifiers that are unique to
each file. Here's a tokenised portion of 32V (bio.c):

   56:   struct id10 * 
   57:   id13 ( id14 , id15 ) 
   58:   id16 id14 ; 
   59:   id17 id15 ; 
   60:   { 
   61:   register struct id10 * id18 ; 
   62:   
   63:   id18 = id19 ( id14 , id15 ) ; 
   64:   if ( id18 ->id20 & id21 ) { 
   65:   #ifdef id1 
   66:   id9 . id5 ++ ; 
   67:   #endif 
   68:   return( id18 ) ; 
   69:   } 
   70:   id18 ->id20 |= id22 ; 
   71:   id18 ->id23 = id24 ; 
   72:   ( * id25 [ id26 ( id14 ) ] . id27 ) ( id18 ) ; 
   73:   #ifdef id1 
   74:   id9 . id3 ++ ; 
   75:   #endif 
   76:   id28 ( id18 ) ; 
   77:   return( id18 ) ; 
   78:   } 

Now go and check the actual source and work out which function it is!
[ see http://minnie.tuhs.org/UnixTree/32VKern/usr/src/sys/sys/bio.c.html ]
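The per-file renaming described above can be sketched roughly as follows.
This is only an illustration, not the Ctcompare code: the function name
anonymise(), the short keyword list and the numbering in order of first
appearance are all assumptions, and comments, string literals and
preprocessor lines are not treated specially.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_IDS  256   /* illustrative limit on distinct names per file */
#define MAX_NAME 64    /* illustrative limit on identifier length */

/* Partial keyword list (an assumption); keywords are copied verbatim. */
static int is_keyword(const char *w)
{
    static const char *kw[] = { "struct", "register", "if", "else",
                                "return", "int", "char", "while", "for" };
    for (size_t i = 0; i < sizeof kw / sizeof kw[0]; i++)
        if (strcmp(w, kw[i]) == 0)
            return 1;
    return 0;
}

/*
 * Copy src to dst (capacity dstlen), replacing each non-keyword
 * identifier with idN, where N is assigned in order of first
 * appearance, starting at 1. Returns 0 on success, -1 if a limit
 * or the output buffer is exceeded.
 */
int anonymise(const char *src, char *dst, size_t dstlen)
{
    char names[MAX_IDS][MAX_NAME];
    int nnames = 0;
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    if (dstlen == 0)
        return -1;

    while (*src) {
        if (isalpha((unsigned char)*src) || *src == '_') {
            char word[MAX_NAME];
            size_t n = 0;

            while (isalnum((unsigned char)*src) || *src == '_') {
                if (n + 1 >= MAX_NAME)
                    return -1;
                word[n++] = *src++;
            }
            word[n] = '\0';

            if (is_keyword(word)) {
                if (out + n >= dstlen)
                    return -1;
                memcpy(dst + out, word, n);
                out += n;
            } else {
                int id = -1;
                for (int i = 0; i < nnames; i++)
                    if (strcmp(names[i], word) == 0) { id = i; break; }
                if (id < 0) {
                    if (nnames == MAX_IDS)
                        return -1;
                    strcpy(names[nnames], word);
                    id = nnames++;
                }
                int w = snprintf(dst + out, dstlen - out, "id%d", id + 1);
                if (w < 0 || (size_t)w >= dstlen - out)
                    return -1;
                out += (size_t)w;
            }
        } else if (isdigit((unsigned char)*src)) {
            /* Numeric literal such as 0x1f: copy the whole run verbatim. */
            while (isalnum((unsigned char)*src) || *src == '_') {
                if (out + 1 >= dstlen)
                    return -1;
                dst[out++] = *src++;
            }
        } else {
            /* Operators, punctuation and whitespace pass through. */
            if (out + 1 >= dstlen)
                return -1;
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
    return 0;
}
```

With this sketch, two fragments that differ only in the names they use
produce identical output, while the names themselves cannot be read back
from the result.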

	Warren



* [TUHS] Lexical comparator
  2003-09-15 22:48 ` [TUHS] Lexical comparator, was " Warren Toomey
@ 2003-09-18  2:56   ` Warren Toomey
  0 siblings, 0 replies; 2+ messages in thread
From: Warren Toomey @ 2003-09-18  2:56 UTC (permalink / raw)


On Tue, Sep 16, 2003 at 08:48:53AM +1000, Warren Toomey wrote:
> While we're on the topic, I saw esr's code shredder/comparator that works
> on lines of code. This isn't going to work if variables get renamed etc.
> I'm writing a code comparator that works on a lexical basis, comparing
> C tokens. It's only going to be proof of concept (i.e. slow), but I
> should have it done by week's end and I'll pop a notice in here when it's
> ready.

Well, it's done. The software is now available at
http://minnie.tuhs.org/Programs/Ctcompare. I have also made available
some tokenised source trees so you can do some comparisons straight away.
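As a rough illustration of why comparing tokens survives renaming where
comparing lines does not, here is a minimal sketch that finds the longest
run of matching token codes in two streams. The token codes and the
function longest_common_run() are hypothetical, the sketch collapses every
identifier to a single code for simplicity, and the brute-force search is
an assumption made for brevity, not a description of how Ctcompare works.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; every identifier maps to TOK_ID. */
enum {
    TOK_ID = 1, TOK_ASSIGN, TOK_LPAREN, TOK_RPAREN, TOK_COMMA, TOK_SEMI
};

/*
 * Return the length of the longest run of consecutive token codes
 * that appears in both a[0..na) and b[0..nb). Brute force: every
 * pair of starting positions is tried, so it is slow on large inputs.
 */
size_t longest_common_run(const int *a, size_t na,
                          const int *b, size_t nb)
{
    size_t best = 0;

    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < na; i++) {
        for (size_t j = 0; j < nb; j++) {
            size_t k = 0;
            while (i + k < na && j + k < nb && a[i + k] == b[j + k])
                k++;
            if (k > best)
                best = k;
        }
    }
    return best;
}
```

Because the streams hold token codes rather than text, `bp = getblk(dev);`
and `x = f(y);` yield the same sequence, so a renamed copy still shows up
as a long common run.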

If anybody has Unix kernel trees which they cannot divulge due to licensing
restrictions, I'd appreciate you creating tokenised files of the kernel
source and e-mailing them to me.

Thanks!
	Warren



end of thread, other threads:[~2003-09-18 11:45 UTC | newest]

Thread overview: 2+ messages
     [not found] <200309181041.h8IAfAWe000686@skeeve.com>
2003-09-18 11:45 ` [TUHS] Lexical comparator Warren Toomey
2003-09-15 20:39 [TUHS] Fwd: Helping in the battle against SCO Norman Wilson
2003-09-15 22:48 ` [TUHS] Lexical comparator, was " Warren Toomey
2003-09-18  2:56   ` [TUHS] Lexical comparator Warren Toomey
