From mboxrd@z Thu Jan 1 00:00:00 1970
To: 9fans@cse.psu.edu
From: Andrew Stitt
Message-ID: 
Content-Type: TEXT/PLAIN; charset=US-ASCII
References: 
Subject: Re: [9fans] dumb question
Date: Thu, 27 Jun 2002 09:16:50 +0000
Topicbox-Message-UUID: bb2c9140-eaca-11e9-9e20-41e7f4b1d025

On Wed, 26 Jun 2002, rob pike, esq. wrote:
> > I beg to differ: tar uses memory, it uses system resources, and I fail to
> > see how you think this is just as good as recursively copying files. The
> > point is that I shouldn't have to needlessly use this other program (for
> > Tape ARchives) to copy directories. If this is on a fairly busy file
> > server, needlessly running tar twice is simply wasteful and unacceptable
> > when you could just follow the directory tree.
>
> The command
> 	tar c ... | tar x ...
> will have almost identical cost to any explicit recursive copy, since
> it does the same amount of work.  The only extra overhead is writing

It does not do the same amount of work. It runs two processes, which use at
least two pages instead of one, and it pushes all the data through an
algorithm to pack it into an archive, then immediately reverses the same
algorithm to unpack it. I don't see what's so complicated about this concept:
it needlessly processes the data.

I'm sure that reading and writing in parallel with two processes would be
faster; guess what would be faster still? A parallel cp. Why? Because cp
doesn't archive and then de-archive. It's as simple as that. I can almost
guarantee you that tar c | tar x uses more clock cycles than cp, because two
processes are simultaneously packing and unpacking data for a _local_
transfer. (See the P.S. at the end for a way to actually measure it.)

In reality the work ought to be done mostly by a DMA controller that simply
copies sectors from one place to another, while the other activities (adding
inodes or table entries and so on) happen in parallel. Remember that I/O is
done in the background because it is slow. With tar you have to at least move
the data into pages in memory, process it, push it through a pipe where it
gets copied into another process's address space, process it again, and then
write it to disk. If I'm copying a large directory tree, I'm going to take a
serious performance hit for that. If I can just have a well-written cp
program let the disk copy sectors around, hopefully with DMA, that is going
to be faster.

Maybe on your newer, faster computers the difference is small enough to say
"who cares", but cutting corners like that is what keeps processor
manufacturers in business. You keep trading speed for blind simplicity, and
two months later your hardware is obsolete, so you buy newer, faster hardware
that runs all the old kludges fast enough, and then you cut corners on the
new stuff too. Take Microsoft, for example. They wrote Windows 95, which IIRC
runs reasonably well even on a mere 386; now we've got Windows XP, which
lists several hundred MHz as the minimum required speed. Yet how much does it
really add? The UI is nearly the same; sure, it has some bells and whistles,
but you can take those away. I wouldn't dare try it on a 386 (much less use
the CD as anything more than a coaster, but anyway). Run Win95 on a P4 and it
is pretty darn fast; compare that to WinXP on a P4 and I doubt you will get
the same performance.

I'm being extreme, I know, but seriously, my points follow logically: tar |
tar is not a more efficient solution, nor is it in any way acceptable. Sure,
I may not run cp -R a lot, but when you've got 50-100 people using your
network, maybe they will.
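To make the comparison concrete, here is what I mean, side by side. This is
only a sketch: the directory names are made up, and the recursive cp is the
program I am asking for, not something the system ships today.

	# rob's suggestion: two processes; every byte is packed into tar's
	# block format, pushed through a pipe into the other process's
	# address space, and immediately unpacked again
	# (copies everything under /n/old/src into /n/new/src, which must exist)
	@{cd /n/old/src && tar c .} | @{cd /n/new/src && tar x}

	# what I want: one process that walks the tree, makes each directory,
	# and copies each file straight from source to destination, with no
	# archive format in the middle
	cp -R /n/old/src /n/new/src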
Either way, it's a complete waste of resources that could be better used
elsewhere. Can someone tell me what's so difficult to see about that?

> to and reading from the pipe, which is so cheap - and entirely local -
> that it is insignificant.
>
> Whether you cp or tar, you must read the files from one tree and write
> them to another.
>
> -rob
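P.S. Rather than argue about clock cycles in the abstract, the two approaches
are easy to time on any Unix box that has both tar and a recursive cp. The
paths below are just placeholders, and the result will obviously depend on
the machine and the tree:

	# copy the same tree both ways and let time report the cost of each
	mkdir /tmp/viatar
	time sh -c 'cd /some/big/tree && tar cf - . | (cd /tmp/viatar && tar xf -)'
	time cp -R /some/big/tree /tmp/viacp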