From mboxrd@z Thu Jan 1 00:00:00 1970 From: reed@reedmedia.net (Jeremy C. Reed) Date: Thu, 8 Apr 2010 21:11:33 -0500 (CDT) Subject: [pups] extract old archive format? In-Reply-To: <1270775431.29470.for-standards-violators@oclsc.org> References: <1270775431.29470.for-standards-violators@oclsc.org> Message-ID: On Thu, 8 Apr 2010, Norman Wilson wrote: > Bob Eager: > > The 'ar' format of that vintage is trivial, and documentation easily > found. I wrote programs to read it back in 1976! > > ====== > > That's nothing. Either Ken or Dennis wrote such a program > years before that! > > Warren even has a binary somewhere to prove it! > > Seriously, it's a binary format, so I don't know that > it would be easy to process in awk. (At least not in > awk-classic; stuff that works only in ghootandwaveawk > is not all that interesting to me.) But the format is > simple, and any language new or old that can handle > binary data without tears should do. I can't see how to do it in awk either. > If I didn't have an overfull plate already (and a > visit to the Auto-Electrocution Consultant tomorrow, > and one to the Canal Rooting Clinic Monday--proving > that one should follow Father's advice and Stay Away > >>From The Canal, Neddie) it would be interesting to > collect the different specifications for ar headers > over the years, and write a small suite of programs > to read them. Perhaps in Python, just to be difficult. > (Why isn't there a language called Goon, Warren?) Well I found the ar specification (in ar.5 not ar.1). struct ar_hdr { char ar_name[14]; long ar_date; char ar_uid; char ar_gid; int ar_mode; long ar_size; }; This is same as the old ar.c source. (plus more in the manual page.) Now my problem is I don't know what "long" or "int" is on the old PDP-11 / system 5 this was made on. And I read about PDP-11 "middle endianess" (first time I heard of "middle"). So I had (wrong but gets ar_name and ar_size correct for my few tests for the first header but chops two characters into the data section). struct { char ar_name[14]; int32_t ar_date; char ar_uid; char ar_gid; uint16_t ar_mode; uint16_t ar_size; } ar_buf; Well I know above is wrong because ar_size and ar_date should be the same. But I get ar_size correct each time. But it also loses the next two bytes from the data. So I am guessing I have some endian issue where I am getting some things reversed. Any ideas? Note I am not using any system 5 or PDP-11 system. I am using a modern little endian (amd64) system to extract the files that were created in 1970s. Once I figure out the structure and endianness (if applicable) I will share back my code so others can extract ...