The Unix Heritage Society mailing list
* [TUHS] Fwd: Helping in the battle against SCO
@ 2003-09-15 20:39 Norman Wilson
  2003-09-15 22:48 ` [TUHS] Lexical comparator, was " Warren Toomey
  2003-09-16  3:01 ` [TUHS] Fwd: Helping in the battle against SCO M. Warner Losh
  0 siblings, 2 replies; 6+ messages in thread
From: Norman Wilson @ 2003-09-15 20:39 UTC


Andru Luvisi:

  If SCO holds up a piece of common code and the good guys have no
  response, that is bad.

  If SCO holds up a piece of common code and the good guys already know
  that it actually came from BSD, and are prepared to demonstrate such,
  that is good.

  If SCO holds up a piece of common code and the good guys already know
  that it was contributed to Linux by SCO/Caldera themselves, and are
  prepared to demonstrate such, that is good.

  If there is infringing code, it should be taken out of Linux as quickly
  as possible.

======

I'll grant all those points, but if the idea is to defang SCO, the
effort still seems fruitless to me.

System V and Linux both contain appallingly large volumes of code.
(On a list that discusses the UNIX of the 1970s, perhaps I can say
that without creating undue ruckus.)  The odds are that quite a
lot of the code is similar.  Should we really spend months and months
tracking it all down and trying to declare where each line came from,
or should we wait until SCO declares a specific set of cases that matter
(as they must do sooner or later or abandon the court battle)?

When one is faced with an enormous set of possible computations, of
which only a handful are likely to be needed in the end, lazy evaluation
is usually the better choice.

It does seem sensible to me for the Linux community to do its best to
hunt down any infringing code, and to try to assess whether there's a
serious problem lurking that nobody had noticed.  But that ought to be
a matter of basic ethics, having nothing to do with SCO.  I doubt it
is likely to make much difference to the court battle anyway: SCO's
claim is that the infringing code is there now, that it was put there
deliberately at IBM's instigation to do harm to them, and that the harm
already exists; removing it now won't change any of that.  I think it's
a good idea to remove any infringements that are there now, even if they
are trivial ones; but let's not fool ourselves that it will pull SCO's
fangs to do so.

Norman Wilson
Toronto ON



* [TUHS] Lexical comparator, was Re: the battle against SCO
  2003-09-15 20:39 [TUHS] Fwd: Helping in the battle against SCO Norman Wilson
@ 2003-09-15 22:48 ` Warren Toomey
  2003-09-16  0:46   ` Rhys Weatherley
  2003-09-18  2:56   ` [TUHS] Lexical comparator Warren Toomey
  2003-09-16  3:01 ` [TUHS] Fwd: Helping in the battle against SCO M. Warner Losh
  1 sibling, 2 replies; 6+ messages in thread
From: Warren Toomey @ 2003-09-15 22:48 UTC


On Mon, Sep 15, 2003 at 04:39:31PM -0400, Norman Wilson wrote:
> It does seem sensible to me for the Linux community to do its best to
> hunt down any infringing code... But that ought to be a matter of basic 
> ethics, having nothing to do with SCO.  I doubt it is likely to make 
> much difference to the court battle anyway... I think it's
> a good idea to remove any infringements that are there now, even if they
> are trivial ones; but let's not fool ourselves that it will pull SCO's
> fangs to do so.
 
For me it's not just a matter of defeating SCO, it's also one of sheer
indignation in the face of Saganesque FUD ("billions and billions of
lines of code"). I seriously want to know if there's even the tiniest
possibility that SCO is right, or if they're just Smoking Crack Often.
 
While we're on the topic, I saw esr's code shredder/comparator that works
on lines of code. This isn't going to work if variables get renamed etc.
I'm writing a code comparator that works on a lexical basis, comparing
C tokens. It's only going to be proof of concept (i.e. slow), but I
should have it done by week's end and I'll pop a notice in here when it's
ready.
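
A minimal sketch of why a token-level comparison survives renaming where
a line-level one does not. The tokeniser, the keyword list and the
function name lexical_signature are simplified assumptions for
illustration; this is neither esr's tool nor the comparator announced in
this thread.

```python
import re

# Simplified C tokeniser (an assumption): identifiers, integer literals,
# and single punctuation characters. Comments and strings are not handled.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

# A small, deliberately incomplete set of C keywords.
KEYWORDS = {"if", "else", "return", "int", "char", "struct",
            "register", "while", "for"}


def lexical_signature(source):
    """Return the token stream with every non-keyword identifier
    collapsed to the placeholder 'ID'."""
    signature = []
    for tok in TOKEN_RE.findall(source):
        if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            signature.append("ID")
        else:
            signature.append(tok)
    return signature


# Two fragments that differ only in variable names and spacing.
a = "int total = count + 1;\nreturn total;"
b = "int sum = n + 1;\nreturn   sum;"

# Comparing raw lines reports no match at all.
line_match = a.splitlines() == b.splitlines()

# Comparing token signatures reports an exact match.
token_match = lexical_signature(a) == lexical_signature(b)

print("line-based match: ", line_match)
print("token-based match:", token_match)
```

The same normalisation that defeats renaming also makes unrelated but
structurally similar code look alike, so any match still needs a human
to inspect it.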
 
Cheers,
        Warren



* [TUHS] Lexical comparator, was Re: the battle against SCO
  2003-09-15 22:48 ` [TUHS] Lexical comparator, was " Warren Toomey
@ 2003-09-16  0:46   ` Rhys Weatherley
  2003-09-18  2:56   ` [TUHS] Lexical comparator Warren Toomey
  1 sibling, 0 replies; 6+ messages in thread
From: Rhys Weatherley @ 2003-09-16  0:46 UTC


On Tuesday 16 September 2003 08:48 am, Warren Toomey wrote:

> While we're on the topic, I saw esr's code shredder/comparator that works
> on lines of code. This isn't going to work if variables get renamed etc.

I'd like to point out that the more steps that are taken to factor out 
identifier names, whitespace conventions, etc, the closer you approach a 
situation where the tool says "both programs are written in the same 
programming language" or "both programs use binary searching somewhere in 
their code".  Which, while true, isn't terribly useful to know.  A human 
being still needs to wade through the results and inspect them manually.

Cheers,

Rhys Weatherley.




* [TUHS] Fwd: Helping in the battle against SCO
  2003-09-15 20:39 [TUHS] Fwd: Helping in the battle against SCO Norman Wilson
  2003-09-15 22:48 ` [TUHS] Lexical comparator, was " Warren Toomey
@ 2003-09-16  3:01 ` M. Warner Losh
  1 sibling, 0 replies; 6+ messages in thread
From: M. Warner Losh @ 2003-09-16  3:01 UTC


In message: <20030915204355.EB9311E5D at minnie.tuhs.org>
            norman at nose.cs.utoronto.ca (Norman Wilson) writes:
: tracking it all down and trying to declare where each line came from,

In BSD land, we can do that.  We have cvs annotate.  Looks like the
stubborn refusal to use source code control, and to have only a few
people putting things together, makes it a lot harder to track things
down after the fact.  Good call that.

Warner



* [TUHS] Lexical comparator
  2003-09-15 22:48 ` [TUHS] Lexical comparator, was " Warren Toomey
  2003-09-16  0:46   ` Rhys Weatherley
@ 2003-09-18  2:56   ` Warren Toomey
  1 sibling, 0 replies; 6+ messages in thread
From: Warren Toomey @ 2003-09-18  2:56 UTC


On Tue, Sep 16, 2003 at 08:48:53AM +1000, Warren Toomey wrote:
> While we're on the topic, I saw esr's code shredder/comparator that works
> on lines of code. This isn't going to work if variables get renamed etc.
> I'm writing a code comparator that works on a lexical basis, comparing
> C tokens. It's only going to be proof of concept (i.e. slow), but I
> should have it done by week's end and I'll pop a notice in here when it's
> ready.

Well, it's done. The software is now available at
http://minnie.tuhs.org/Programs/Ctcompare. I have also made available
some tokenised source trees so you can do some comparisons straight away.

If anybody has Unix kernel trees which they cannot divulge due to licensing
restrictions, I'd appreciate you creating tokenised files of the kernel
source and e-mailing them to me.

Thanks!
	Warren



* [TUHS] Lexical comparator
       [not found] <200309181041.h8IAfAWe000686@skeeve.com>
@ 2003-09-18 11:45 ` Warren Toomey
  0 siblings, 0 replies; 6+ messages in thread
From: Warren Toomey @ 2003-09-18 11:45 UTC


On Thu, Sep 18, 2003 at 01:41:10PM +0300, Aharon Robbins wrote:
> > If anybody has Unix kernel trees which they cannot divulge due to licensing
> > restrictions, I'd appreciate you creating tokenised files of the kernel
> > source and e-mailing them to me.
> 
> Hmmm.  Just between us chickens, given tokenized versions of an entire tree,
> how hard would it be to recreate a functional kernel?

Pretty damn hard. All identifiers (variable names) are reduced to
a single token. Actually, that's not true. The meaning of the names
is removed and replaced with numeric identifiers that are unique to
each file. Here's a tokenised portion of 32V (bio.c):

   56:   struct id10 * 
   57:   id13 ( id14 , id15 ) 
   58:   id16 id14 ; 
   59:   id17 id15 ; 
   60:   { 
   61:   register struct id10 * id18 ; 
   62:   
   63:   id18 = id19 ( id14 , id15 ) ; 
   64:   if ( id18 ->id20 & id21 ) { 
   65:   #ifdef id1 
   66:   id9 . id5 ++ ; 
   67:   #endif 
   68:   return( id18 ) ; 
   69:   } 
   70:   id18 ->id20 |= id22 ; 
   71:   id18 ->id23 = id24 ; 
   72:   ( * id25 [ id26 ( id14 ) ] . id27 ) ( id18 ) ; 
   73:   #ifdef id1 
   74:   id9 . id3 ++ ; 
   75:   #endif 
   76:   id28 ( id18 ) ; 
   77:   return( id18 ) ; 
   78:   } 

Now go and check the actual source and work out which function it is!
[ see http://minnie.tuhs.org/UnixTree/32VKern/usr/src/sys/sys/bio.c.html ]
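
The per-file renumbering described above can be sketched roughly as
follows. This is a hypothetical illustration only: the thread does not
give Ctcompare's actual token format or numbering rules, so the
first-appearance numbering, the tokeniser, and the name tokenise_file
are all assumptions.

```python
import re

# Simplified C tokeniser (an assumption): identifiers, integer literals,
# a few multi-character operators, then any single punctuation character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|->|\+\+|\|=|\S")

# A small, deliberately incomplete set of C keywords that stay readable.
KEYWORDS = {"if", "else", "return", "int", "char", "struct",
            "register", "while", "for"}


def tokenise_file(source):
    """Replace each distinct identifier with idN, numbered in order of
    first appearance within this one file (assumed scheme)."""
    ids = {}   # identifier name -> "idN", private to this file
    out = []
    for tok in TOKEN_RE.findall(source):
        if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            if tok not in ids:
                ids[tok] = "id%d" % (len(ids) + 1)
            out.append(ids[tok])
        else:
            out.append(tok)
    return " ".join(out)


# Hypothetical fragments, not taken from any real kernel source.
file_a = "buf->flags |= READ; buf->count = n;"
file_b = "bp->status |= BUSY; bp->len = size;"

tok_a = tokenise_file(file_a)
tok_b = tokenise_file(file_b)

print(tok_a)
print(tok_b)
print("same shape:", tok_a == tok_b)
```

Because the name-to-number table is private to each file and is not
part of the output, the original names cannot be read back from the
tokenised text, and the same idN in two files need not refer to the
same thing.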

	Warren


