All, I'm tooling along during our newfangled rolling blackouts and frigid
temperatures (in Texas!) and reading some good old unix books. I keep coming
across the commands cut and paste and join and suchlike. I use cut all the
time for stuff like:

    ls -l | tr -s ' ' | cut -f1,4,9 -d \ 
    ...
    -rw-r--r-- staff main.rs

and

    who | grep wsenn | cut -c 1-8,10-17
    wsenn    console
    wsenn    ttys000

but that's just cuz it's convenient and useful. To my knowledge, I've never
used paste or join outside of initially coming across them. But, they seem
to 'fit' with cut. My question for y'all is, was there a subset of related
utilities that these were part of that served some common purpose? On a
related note, join seems like part of an aborted (aka never fully realized)
attempt at a text based rdb to me... What say you?

Will
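[Editor's note: the way the three tools complement one another can be
sketched with a pair of invented files; the file names and data below are
made up for illustration, not taken from the thread.]

```shell
# Two tiny "relations" keyed on a login name (invented data).
printf 'chd\tDave\nwsenn\tWill\n'   > users.txt
printf 'chd\twheel\nwsenn\tstaff\n' > shells.txt

cut -f2 users.txt           # project a column out of one file
paste users.txt shells.txt  # zip the two files together line by line
join users.txt shells.txt   # merge the files on their common first field
                            # (both inputs must be sorted on that field)
```

cut and paste are inverses of a sort (columns out, columns back in), while
join is the relational merge on a shared key.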
On Tue, 16 Feb 2021, Will Senn wrote:

> To my knowledge, I've never used paste or join outside of initially
> coming across them. But, they seem to 'fit' with cut. My question for
> y'all is, was there a subset of related utilities that these were part
> of that served some common purpose? On a related note, join seems like
> part of an aborted (aka never fully realized) attempt at a text based
> rdb to me...

I use "cut" a fair bit, rarely use "paste", but as for "join" and RDBs,
just look at the man page: "join — relational database operator". As for
future use, who knows? Could be a fun project for someone with time on
their hands (not me!).

-- Dave, who once implemented a "join" operation with BDB
> To my knowledge, I've never used paste or join outside of initially
> coming across them. But, they seem to 'fit' with cut. My question for
> y'all is, was there a subset of related utilities that these were
> part of that served some common purpose? On a related note, join
> seems like part of an aborted (aka never fully realized) attempt at a
> text based rdb to me...

My copy is hiding from me, so I can't be sure, but iirc Bourne's _The
Unix System_ (978-0-201-13791-0) had a section on this sort of "text
database" and may have discussed the `join` command.

De
On 2/16/21 3:02 PM, Dave Horsfall wrote:

> On Tue, 16 Feb 2021, Will Senn wrote:
>
>> To my knowledge, I've never used paste or join outside of initially
>> coming across them. But, they seem to 'fit' with cut. My question for
>> y'all is, was there a subset of related utilities that these were
>> part of that served some common purpose? On a related note, join
>> seems like part of an aborted (aka never fully realized) attempt at a
>> text based rdb to me...
>
> I use "cut" a fair bit, rarely use "paste", but as for "join" and
> RDBs, just look at the man page: "join — relational database
> operator". As for future use, who knows? Could be a fun project for
> someone with time on their hands (not me!).
>
> -- Dave, who once implemented a "join" operation with BDB

Oh brother! RTFM... properly... :). Still, I'm curious about the
history.

Will
On Tue, 16 Feb 2021, Will Senn wrote:
> Oh brother! RTFM... properly... :). Still, I'm curious about the
> history.
We all have our moments :-) Yes, I'd like to know the history too; those
tools definitely have a database-ish look about them. All the bits seem
to be there; they just have to be, ahem, joined together...
-- Dave
Will Senn wrote,
> join seems like part of an aborted (aka never fully realized) attempt at a text based rdb to me
As the original author of join, I can attest that there was no thought
of parlaying join into a database system. It was inspired by
databases, but liberated from them, much as grep was liberated from an
editor.
Doug
On 2/16/21 7:08 PM, M Douglas McIlroy wrote:

> Will Senn wrote,
>
>> join seems like part of an aborted (aka never fully realized) attempt
>> at a text based rdb to me
>
> As the original author of join, I can attest that there was no thought
> of parlaying join into a database system. It was inspired by
> databases, but liberated from them, much as grep was liberated from an
> editor.
>
> Doug

Nice! Thanks Doug. Too bad, though... one gets ever tired of having to
log into db's and a simple text db system would be useful. Even sqlite,
which I love, requires login to get at information... I'm already logged
in, why can't I just ask for my info and have it returned?

Will
On 2/16/21 6:16 PM, Will Senn wrote:

> Nice! Thanks Doug. Too bad, though... one gets ever tired of having to
> log into db's and a simple text db system would be useful. Even
> sqlite, which I love, requires login to get at information... I'm
> already logged in, why can't I just ask for my info and have it
> returned?

What do you mean by "log into db's" in relation to SQLite? I've never
needed to enter a username and password to access SQLite.

If you /do/ mean username and password, I believe that some DBs will
allow you to authenticate using Kerberos. Thus you should be able to
streamline DB access along with access to many other things.

If you /don't/ mean username and password, then what do you mean? Are
you referring to needing to run a command to open and access the SQLite
DB? Taking a quick gander at sqlite3 --help makes me think that you can
append the SQL(ite) command that you want to run to the command line.

-- 
Grant. . . . unix || die
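[Editor's note: the one-shot usage Grant infers from sqlite3 --help looks
like this in practice; the table and column names below are invented for
illustration.]

```shell
# A throwaway database file; the "server" is just this file.
db=$(mktemp)

# Create, populate, and query in separate one-shot invocations:
# no interactive session, no login, back to the shell prompt each time.
sqlite3 "$db" "CREATE TABLE who (user TEXT, line TEXT);"
sqlite3 "$db" "INSERT INTO who VALUES ('wsenn', 'console');"
sqlite3 "$db" "SELECT user, line FROM who;"

rm -f "$db"
```

Each invocation opens the file, runs the SQL given as the argument, prints
the result, and exits, much like any other Unix filter.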
On 2/16/21 7:43 PM, Grant Taylor via TUHS wrote:

> On 2/16/21 6:16 PM, Will Senn wrote:
>> Nice! Thanks Doug. Too bad, though... one gets ever tired of having
>> to log into db's and a simple text db system would be useful. Even
>> sqlite, which I love, requires login to get at information... I'm
>> already logged in, why can't I just ask for my info and have it
>> returned?
>
> What do you mean by "log into db's" in relation to SQLite? I've never
> needed to enter a username and password to access SQLite.
>
> If you /do/ mean username and password, I believe that some DBs will
> allow you to authenticate using Kerberos. Thus you should be able to
> streamline DB access along with access to many other things.
>
> If you /don't/ mean username and password, then what do you mean? Are
> you referring to needing to run a command to open and access the
> SQLite DB? Taking a quick gander at sqlite3 --help makes me think
> that you can append the SQL(ite) command that you want to run to the
> command line.

Oops. That's right, no username & password, but you still need to bring
it up and interact with it... except, as you say, you can enter your sql
as an argument to the executable. OK, I suppose ... grump, grump... Not
quite what I was thinking, but I'd be hard pressed to argue the
difference between creating a handful of files in the filesystem (vs
tables in sqlite) and then using some unix filter utilities to access
and combine the file relations (vs passing sql to sqlite) other than,
it'd be fun if there were select, col, row (grep?), join (inner, outer,
natural) utils that worked with text without the need to worry about the
finickiness of the database (don't stone me as a database unbeliever,
I've used plenty in my day).

Will
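[Editor's note: much of the inner/outer/natural trio Will wishes for is
already in POSIX join(1); a sketch with invented files.]

```shell
# Two sorted, space-separated "relations" keyed on a login name.
printf 'chd wheel\nwsenn staff\n' > a
printf 'wsenn console\n'          > b

join a b        # inner (natural) join on the first field
join -a 1 a b   # left outer join: unmatched lines from a are kept
join -v 1 a b   # "anti-join": only the lines of a with no match in b
```

The -a and -v flags are both in the POSIX specification of join, so the
relational vocabulary is there, just spelled differently than in SQL.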
I'm not sure what you're thinking of, but there is no login in SQLite:
its only access control is at the DB level, and that's Unix file
permissions.

Carl Strozzi's NOSQL system (not to be confused with the concept of
NoSQL databases) is a relational database built using ordinary Unix
utilities and pipelines. Each table is a TSV file with a header line
whose fields are the column names prefixed by ^A so that they always
sort to the top. It also provides commands like "jointable", which is
"join" wrapped in an awk script that collects the column names from the
tables and does a natural join. The package can be downloaded from
<http://www.strozzi.it/shared/nosql/nosql-4.1.11.tar.gz>. The
documentation is shonky, but the code works nicely.

On Tue, Feb 16, 2021 at 8:17 PM Will Senn <will.senn@gmail.com> wrote:

> On 2/16/21 7:08 PM, M Douglas McIlroy wrote:
>
>> Will Senn wrote,
>>> join seems like part of an aborted (aka never fully realized)
>>> attempt at a text based rdb to me
>>
>> As the original author of join, I can attest that there was no thought
>> of parlaying join into a database system. It was inspired by
>> databases, but liberated from them, much as grep was liberated from an
>> editor.
>>
>> Doug
>
> Nice! Thanks Doug. Too bad, though... one gets ever tired of having to
> log into db's and a simple text db system would be useful. Even
> sqlite, which I love, requires login to get at information... I'm
> already logged in, why can't I just ask for my info and have it
> returned?
>
> Will
On 2/16/21 7:26 PM, Will Senn wrote:

> Oops. That's right, no username & password, but you still need to
> bring it up and interact with it... except, as you say, you can enter
> your sql as an argument to the executable. OK, I suppose ... grump,
> grump...

;-) Take a moment and grump. I know that I've made similar mistakes
from unknown options.

> Not quite what I was thinking, but I'd be hard pressed to argue the
> difference between creating a handful of files in the filesystem
> (vs tables in sqlite) and then using some unix filter utilities to
> access and combine the file relations (vs passing sql to sqlite)

I don't know where the line is to transition from stock text files and
an actual DB. I naively suspect that by the time you need an index, you
should have transitioned to a DB.

> other than, it'd be fun if there were select, col, row (grep?), join
> (inner, outer, natural), utils that worked with text without the need
> to worry about the finickiness of the database

I'm confident that it's quite possible to do similar types of, if not
actually the same, operations with traditional Unix utilities vs SQL,
at least for relatively simple queries. The last time I looked, join
didn't want to work on more than two inputs at one time. So you're left
with something like two different joins, one of which works on the
output from the other one.

I suspect that one of the differences is where the data lives. If it's
STDIO, then traditional Unix utilities are king. If it's something
application specific and only accessed by said application, then a DB
is probably a better bet.

Then there's the fact that some consider file systems to be a big DB
that is mounted. }:-)

> (don't stone me as a database unbeliever, I've used plenty in my day).

Use of something does not implicitly make you a supporter of or
advocate for something. ;-)

I like SQLite and Berkeley DB in that they don't require a full RDBMS
running. Instead, an application can load what it needs and access the
DB itself.

I don't remember how many files SQLite uses to store a DB. A single (or
few) file(s) make it relatively easy to exchange DBs with people. E.g.
someone can populate the DB and then send copies of it to coworkers for
their distributed use. Something that's harder to do with a typical
RDBMS.

-- 
Grant. . . . unix || die
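[Editor's note: the two-joins workaround Grant describes is a one-liner,
since join accepts "-" for standard input; the three files here are
invented for illustration.]

```shell
# Three sorted "tables" sharing a login-name key (invented data).
printf 'wsenn Will\n'    > people
printf 'wsenn staff\n'   > groups
printf 'wsenn console\n' > ttys

# join handles exactly two inputs at a time, but "-" names stdin,
# so the output of one join simply feeds the next.
join people groups | join - ttys
```

This chains as far as you like, one pipeline stage per extra table, which
is the Unix-filter answer to a multi-way SQL JOIN.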
Grant Taylor via TUHS <tuhs@minnie.tuhs.org> wrote:
> I don't know where the line is to transition from stock text files and
> an actual DB. I naively suspect that by the time you need an index, you
> should have transitioned to a DB.
Didn't AT&T Research at some point write a database, called Daytona,
that worked like ordinary Unix commands? E.g. it just sat there in disk
files when you weren't using it. There was no "database server". When
you wanted to do some operation on it, you ran a command, which read the
database and did what you wanted and wrote out results and stopped and
returned to the shell prompt. How novel!
Supposedly it had high performance on large collections of data,
with millions or billions of records. Things like telephone billing
data.
I found a couple of conference papers about it, but never saw specs for
it, not even man pages. How did Daytona fit into Unix history? Was
it ever part of a Unix release?
John
daytona was always a separate commercial product. it was an extremely
large, very efficient database. you should think of it as analogous to a
large postgres system. rick greer was the primary author; an overview
paper is
http://www09.sigmod.org/sigmod/sigmod99/eproceedings/papers/greer.pdf

for many years, probably now as well, it was the main way that at&t
stored per-call information. as of the mid 2000s, it had over 2 trillion
calls in it.

> On Feb 17, 2021, at 2:14 AM, John Gilmore <gnu@toad.com> wrote:
>
> Grant Taylor via TUHS <tuhs@minnie.tuhs.org> wrote:
>> I don't know where the line is to transition from stock text files and
>> an actual DB. I naively suspect that by the time you need an index, you
>> should have transitioned to a DB.
>
> Didn't AT&T Research at some point write a database, called Daytona,
> that worked like ordinary Unix commands? E.g. it just sat there in disk
> files when you weren't using it. There was no "database server". When
> you wanted to do some operation on it, you ran a command, which read the
> database and did what you wanted and wrote out results and stopped and
> returned to the shell prompt. How novel!
>
> Supposedly it had high performance on large collections of data,
> with millions or billions of records. Things like telephone billing
> data.
>
> I found a couple of conference papers about it, but never saw specs for
> it, not even man pages. How did Daytona fit into Unix history? Was
> it ever part of a Unix release?
>
> John
On Tue, 16 Feb 2021, Grant Taylor via TUHS wrote:
> Then there's the fact that some consider file systems to be a big DB
> that is mounted. }:-)
It is; it's a hierarchical DB (and is still used as such).
-- Dave, who remembers the hierarchical/relational DB wars
On Wed, Feb 17, 2021 at 5:16 AM John Gilmore <gnu@toad.com> wrote:

> Grant Taylor via TUHS <tuhs@minnie.tuhs.org> wrote:
>> I don't know where the line is to transition from stock text files and
>> an actual DB. I naively suspect that by the time you need an index, you
>> should have transitioned to a DB.
>
> Didn't AT&T Research at some point write a database, called Daytona,
> that worked like ordinary Unix commands? E.g. it just sat there in disk
> files when you weren't using it. There was no "database server". When
> you wanted to do some operation on it, you ran a command, which read the
> database and did what you wanted and wrote out results and stopped and
> returned to the shell prompt. How novel!
>
> Supposedly it had high performance on large collections of data,
> with millions or billions of records. Things like telephone billing
> data.
>
> I found a couple of conference papers about it, but never saw specs for
> it, not even man pages. How did Daytona fit into Unix history? Was
> it ever part of a Unix release?

It seems that Andrew has addressed Daytona, but there was a small
database package called `pq` that shipped with plan9 at one point that I
believe started life on Unix. It was based on "flat" text files as the
underlying data source, and one would describe relations internally
using some mechanism (almost certainly another special file). An
interesting feature was that it was "implicitly relational": you
specified the data you wanted and it constructed and executed a query
internally: no need to "JOIN" tables on attributes and so forth. I
believe it supported indices that were created via a special command. I
think it was used as the data source for the AT&T internal "POST"
system. A big downside was that you could not add records to the
database in real time.

It was taken to Cibernet Inc (they did billing reconciliation for
wireless carriers. That is, you have an AT&T phone but make a call
that's picked up by T-Mobile's tower: T-Mobile lets you make the call
but AT&T has to pay them for the service. I contracted for them for a
short time when I got out of the Marine Corps---the first time) and
enhanced and renamed "Eteron", and the record append issue was, I
believe, solved. Sadly, I think that technology was lost when Cibernet
was acquired. It was kind of cool.

- Dan C.
The last group I was on before I left the labs in 1992 was the POST
team. pq stood for "post query," but POST consisted of -

- mailx: (from SVR3.1) as the mail user agent
- UPAS: (from research UNIX) as the mail delivery agent
- pq: the program to query the database
- EV: (pronounced like the biblical name) the database (and the genesis
  program to create indices)
- post: program to combine all the above to read email and to send mail
  via queries

pq by default would look up people:

    pq lastname       find all people with lastname, same as
                      pq last=lastname
    pq first.last     find all people with first last, same as
                      pq first=first/last=last
    pq first.m.last   find all people with first m last, same as
                      pq first=first/middle=m/last=last

This is how email to dennis.m.ritchie@att.com worked to send it on to
research!dmr. You could send mail to a whole department via /org=45267,
or the whole division via /org=45, or a whole location via /loc=mh, or
just the two people in a specific office via /loc=mh/room=2f-164. These
are "AND"s; an "OR" is just another query after it on the same line.

There were some special extensions -

- prefix, e.g. pq mackin* got all mackin, mackintosh, mackinson, etc.
- soundex, e.g. pq mackin~ got all with a last name sounding like
  mackin, so names such as mackin, mckinney, mckinnie, mickin, mikami,
  etc. (mackintosh and mackinson did not match the soundex, therefore
  not included)

The EV database was general and fairly simple. It was a directory with
files called "Data" and "Proto" in it. "Data" was plain text,
pipe-delimited fields, newline-separated records -

    123456|ritchie|dennis|m||r320|research!dmr|11273|mh|2c-517|908|582|3770

(used data from that preserved at https://www.bell-labs.com/usr/dmr/www/)

"Proto" defined the fields in a record (I don't remember the exact
syntax anymore) -

    id      n  i
    last    a  i
    first   a  i
    middle  a  -
    suffix  a  -
    soundex a  i
    email   a  i
    org     n  i
    loc     a  i
    room    a  i
    area    n  i
    exch    n  i
    ext     n  i

"n" means a number, so 00001 was the same as 1, and "a" means alpha; the
"i" or "-" told genesis whether an index should be generated or not. I
think it had more, but that has faded with the years.

If indices were generated they would point to the block number in Data,
so an lseek(2) could get to the record quickly. I believe there were two
levels of block-pointing indices (sort of like inode block pointers had
direct and indirect blocks). So every time you added records to Data you
had to regenerate all the indices, which was very time consuming.

The nice thing about text Data was grep(1) worked just fine, or
cut -d'|' or awk -F'|', but pq was much faster with a large number of
records.

-Brian

Dan Cross <crossd at gmail.com> wrote:

> It seems that Andrew has addressed Daytona, but there was a small
> database package called `pq` that shipped with plan9 at one point that
> I believe started life on Unix. It was based on "flat" text files as
> the underlying data source, and one would describe relations
> internally using some mechanism (almost certainly another special
> file). An interesting feature was that it was "implicitly relational":
> you specified the data you wanted and it constructed and executed a
> query internally: no need to "JOIN" tables on attributes and so forth.
> I believe it supported indices that were created via a special
> command. I think it was used as the data source for the AT&T internal
> "POST" system. A big downside was that you could not add records to
> the database in real time.
>
> It was taken to Cibernet Inc (they did billing reconciliation for
> wireless carriers. That is, you have an AT&T phone but make a call
> that's picked up by T-Mobile's tower: T-Mobile lets you make the call
> but AT&T has to pay them for the service. I contracted for them for a
> short time when I got out of the Marine Corps---the first time) and
> enhanced and renamed "Eteron" and the record append issue was, I
> believe, solved. Sadly, I think that technology was lost when Cibernet
> was acquired. It was kind of cool.
>
> - Dan C.
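[Editor's note: the grep/awk access Brian mentions can be sketched
against a one-record Data file in the layout he describes; the record is
the dmr example from his message, and the field positions follow his
Proto listing.]

```shell
# A one-record Data file in EV's pipe-delimited layout.
cat > Data <<'EOF'
123456|ritchie|dennis|m||r320|research!dmr|11273|mh|2c-517|908|582|3770
EOF

# grep works on the plain text directly...
grep '|ritchie|' Data

# ...and awk gives field-wise queries: per the Proto layout,
# field 2 is the last name and field 7 the email address.
awk -F'|' '$2 == "ritchie" { print $7 }' Data
```

No index, no daemon; a full scan per query, which is exactly the
trade-off Brian notes pq's indices were there to avoid.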
The Plan 9 version of pq can be found here: https://9p.io/sources/extra/pq.tgz Cheers, Anthony
On Tue, Feb 16, 2021 at 08:26:11PM -0600, Will Senn wrote:
[...]
> Oops. That's right, no username & password, but you still need to
> bring it up and interact with it... except, as you say, you can enter
> your sql as an argument to the executable. OK, I suppose ... grump,
> grump... Not quite what I was thinking, but I'd be hard pressed to
> argue the difference between creating a handful of files in the
> filesystem (vs tables in sqlite) and then using some unix filter
> utilities to access and combine the file relations (vs passing sql to
> sqlite) other than, it'd be fun if there were select, col, row
> (grep?), join (inner, outer, natural), utils that worked with text
> without the need to worry about the finickiness of the database
> (don't stone me as a database unbeliever, I've used plenty in my day).

I am not sure if this is what you are looking for, but sections 3 and 4
of "The AWK Programming Language" (by Aho, Kernighan and Weinberger)
have a description of very nice data processing scripts written in AWK.
Might even work in gawk. Might even work, actually - I had no time to
type the code into files and give it a try.

Personally, I would rather use awk for this than multiple command line
utilities. Might be a bit nicer to a modern system with process
accounting enabled (I once wrote a shell script processing mailbox
files, plenty of echos and greps, but since then have seen the light
and I repented). On the other hand, on a multiprocessor computer, each
part of a pipe runs in parallel, but I guess this has been said already.

Also, found this in my notes - if you, or anybody from the future,
would like a quick glimpse of "what awk":

:: Drinking coffee with AWK
https://lobste.rs/s/hdljia/drinking_coffee_with_awk
https://opensource.com/article/19/2/drinking-coffee-awk

:: Using AWK and R to parse 25tb
https://lobste.rs/s/kgah5l/using_awk_r_parse_25tb
https://livefreeordichotomize.com/2019/06/04/using_awk_and_r_to_parse_25tb/

-- 
Regards,
Tomasz Rola

--
** A C programmer asked whether computer had Buddha's nature.      **
** As the answer, master did "rm -rif" on the programmer's home    **
** directory. And then the C programmer became enlightened...      **
**                                                                 **
** Tomasz Rola          mailto:tomasz_rola@bigfoot.com             **
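[Editor's note: in the spirit of the AWK book's relational scripts that
Tomasz mentions, here is a sketch (not code from the book) of a two-file
lookup join done entirely inside awk; the files and data are invented.]

```shell
# Employees and departments, keyed on a department number.
printf 'dmr 11273\nken 11271\n'            > emp
printf '11273 research\n11271 computing\n' > dept

# First pass (NR==FNR, i.e. still reading dept) loads the department
# table into an array; second pass looks each employee's number up in it.
# Unlike join(1), neither input needs to be sorted.
awk 'NR == FNR { name[$1] = $2; next } { print $1, name[$2] }' dept emp
```

This is the classic awk idiom for a hash join in a single process, where
the join/sort pipeline would have needed several.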