From mboxrd@z Thu Jan 1 00:00:00 1970
To: 9fans@cse.psu.edu
From: Andrew Stitt
Message-ID: 
Content-Type: TEXT/PLAIN; charset=US-ASCII
References: 
Subject: Re: [9fans] dumb question
Date: Thu, 27 Jun 2002 09:16:50 +0000
Topicbox-Message-UUID: bb2c9140-eaca-11e9-9e20-41e7f4b1d025

On Wed, 26 Jun 2002, rob pike, esq. wrote:
> > I beg to differ: tar uses memory, it uses system resources, and I fail to
> > see how you think this is just as good as recursively copying files. The
> > point is that I shouldn't have to needlessly use this other program (for
> > Tape ARchives) to copy directories. If this is on a fairly busy file
> > server, needlessly running tar twice is simply wasteful and unacceptable
> > when you could just follow the directory tree.
>
> The command
> 	tar c ... | tar x ...
> will have almost identical cost to any explicit recursive copy, since
> it does the same amount of work.  The only extra overhead is writing

It does not do the same amount of work. It runs two processes, which use at
least two pages instead of one, and it pushes all the data through an
algorithm to pack it into an archive, then immediately reverses the same
algorithm to unpack it. I don't see what's so complicated about this concept:
it needlessly processes the data.

I'm sure that reading and writing in parallel with two processes would be
faster; guess what would be faster still? A parallel cp. Why? Because cp
doesn't archive and then de-archive. It's as simple as that. I can almost
guarantee you that tar c | tar x uses more clock cycles than cp, because two
processes are simultaneously packing and unpacking data for a _local_
transfer. (See the P.S. at the end for a way to actually measure it.)

In reality the work ought to be done mostly by a DMA controller that simply
copies sectors from one place to another, while the other activities (adding
inodes or table entries and so on) happen in parallel. Remember that I/O is
done in the background because it is slow. With tar you have to at least move
the data into pages in memory, process it, push it through a pipe where it
gets copied into another process's address space, process it again, and then
write it to disk. If I'm copying a large directory tree, I'm going to take a
serious performance hit for that. If I can just have a well-written cp
program let the disk copy sectors around, hopefully with DMA, that is going
to be faster.

Maybe on your newer, faster computers the difference is small enough to say
"who cares", but cutting corners like that is what keeps processor
manufacturers in business. You keep trading speed for blind simplicity, and
two months later your hardware is obsolete, so you buy newer, faster hardware
that runs all the old kludges fast enough, and then you cut corners on the
new stuff too. Take Microsoft, for example. They wrote Windows 95, which IIRC
runs reasonably well even on a mere 386; now we've got Windows XP, which
lists several hundred MHz as the minimum required speed. Yet how much does it
really add? The UI is nearly the same; sure, it has some bells and whistles,
but you can take those away. I wouldn't dare try it on a 386 (much less use
the CD as anything more than a coaster, but anyway). Run Win95 on a P4 and it
is pretty darn fast; compare that to WinXP on a P4 and I doubt you will get
the same performance.

I'm being extreme, I know, but seriously, my points follow logically: tar |
tar is not a more efficient solution, nor is it in any way acceptable. Sure,
I may not run cp -R a lot, but when you've got 50-100 people using your
network, maybe they will.
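To make the comparison concrete, here is what I mean, side by side. This is
only a sketch: the directory names are made up, and the recursive cp is the
program I am asking for, not something the system ships today.

	# rob's suggestion: two processes; every byte is packed into tar's
	# block format, pushed through a pipe into the other process's
	# address space, and immediately unpacked again
	# (copies everything under /n/old/src into /n/new/src, which must exist)
	@{cd /n/old/src && tar c .} | @{cd /n/new/src && tar x}

	# what I want: one process that walks the tree, makes each directory,
	# and copies each file straight from source to destination, with no
	# archive format in the middle
	cp -R /n/old/src /n/new/src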
Either way, it's a complete waste of resources that could be better used
elsewhere. Can someone tell me what's so difficult to see about that?

> to and reading from the pipe, which is so cheap - and entirely local -
> that it is insignificant.
>
> Whether you cp or tar, you must read the files from one tree and write
> them to another.
>
> -rob
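P.S. Rather than argue about clock cycles in the abstract, the two approaches
are easy to time on any Unix box that has both tar and a recursive cp. The
paths below are just placeholders, and the result will obviously depend on
the machine and the tree:

	# copy the same tree both ways and let time report the cost of each
	mkdir /tmp/viatar
	time sh -c 'cd /some/big/tree && tar cf - . | (cd /tmp/viatar && tar xf -)'
	time cp -R /some/big/tree /tmp/viacp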