The Unix Heritage Society mailing list
From: William Corcoran <wlc@jctaylor.com>
To: Richard Tobin <richard@inf.ed.ac.uk>
Cc: "tuhs@tuhs.org" <tuhs@tuhs.org>
Subject: Re: [TUHS] Awk for CSV files
Date: Sun, 13 Oct 2019 18:46:01 +0000
Message-ID: <E6A771F4-70CD-4D8C-884F-F0CD806B5B20@jctaylor.com>
In-Reply-To: <20191013135344.E0F4C292AD4E@macaroni.inf.ed.ac.uk>

Today, PDP-11 ports such as v7m, SVR1, and 2.11BSD will, for example, stay booted and operational for long periods under simulation.

With these older UNIX variants, working with awk and even the classic shell tools is often problematic.  Moreover, resource constraints seem to be a persistent annoyance under simulation.

When dealing with even moderately sized text files, one is often left writing a C program to ameliorate the limitations of relying exclusively on awk and the other classic shell tools.  It’s not a leap to suggest that users running UNIX on actual metal instead of simulation faced the same resource challenges.

Holy cow, have things changed.  Today, awk and the other classic shell tools are amazing.  Resource limitations are rare or even non-existent, especially in the Cloud.  Google seems to have led the way in taming unstructured data.  Even email today is virtually one huge text stream where its binary element is masked by even more text.  Text, text, text!  All of this text data (CSV or whatever) has paved the way for, and extended the meaningful life of, the classic shell tools and even newer tools that are now classics, especially when an RDB is involved.

Just don’t hit that null or you might need to ameliorate with C again.   

Truly,

Bill Corcoran


> On Oct 13, 2019, at 10:35 AM, Richard Tobin <richard@inf.ed.ac.uk> wrote:
> 
> I was reminded of this by Larry's comment:
> 
>> I miss Brian on this list.  I've interacted with him over the years, the
>> one I remember the most was I was trying to do an awk like interface to a
>> key/value "database".
> 
> Recently I've had to deal with a lot of data in CSV
> (comma-separated-value) format.  Awk is *almost* perfect for this, but
> of course doesn't handle the quoting of fields that contain commas.
> One can usually work around it by finding a character that doesn't
> occur in the data and converting the CSV file to use that as the
> separator, but it's not ideal.
> 
> Awk's input could easily be modified to handle CSV files, but output
> would be a bit more difficult, because you don't specify field
> boundaries explicitly on output.  One possibility would be a printf()
> format specifier that takes a field and quotes it appropriately.
> 
> -- Richard
> 
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
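
The two ideas in the quoted message can be sketched outside awk.  What
follows is only an illustration under stated assumptions, not anything
proposed in the thread: it uses Python's standard csv module rather than
awk, and the function names csv_to_delimited and quote_field are
hypothetical.  The first function is the workaround described above
(re-emit the data with a separator that does not occur in it, so that
awk -F can split fields naively); the second is roughly what a quoting
printf() specifier would have to do for one field on output.

```python
import csv
import io


def csv_to_delimited(text, sep="\x1f"):
    """Re-emit CSV text using `sep` between fields (hypothetical helper).

    The default separator is the ASCII unit separator, assumed here to be
    absent from the data; the function refuses to continue if it is not.
    """
    rows = list(csv.reader(io.StringIO(text, newline="")))
    for row in rows:
        for field in row:
            # A field containing the separator or a newline would be
            # ambiguous to a line-and-separator tool such as awk.
            if sep in field or "\n" in field:
                raise ValueError("field cannot be represented: %r" % field)
    return "".join(sep.join(row) + "\n" for row in rows)


def quote_field(field):
    """Quote one field for CSV output, only when quoting is needed."""
    if any(c in field for c in ',"\r\n'):
        # Embedded double quotes are doubled, then the field is wrapped.
        return '"' + field.replace('"', '""') + '"'
    return field
```

The converted output could then be handed to awk with a matching -F
value, and quote_field applied to each field when writing CSV back out.
Raising an error when the separator appears in the data is a design
choice made for this sketch; a real tool might instead pick another
separator.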
