The Unix Heritage Society mailing list
* [TUHS] Awk for CSV files
From: Richard Tobin @ 2019-10-13 13:53 UTC
  To: tuhs

I was reminded of this by Larry's comment:

> I miss Brian on this list.  I've interacted with him over the years, the
> one I remember the most was I was trying to do an awk like interface to a
> key/value "database".

Recently I've had to deal with a lot of data in CSV
(comma-separated-value) format.  Awk is *almost* perfect for this, but
of course doesn't handle the quoting of fields that contain commas.
One can usually work around it by finding a character that doesn't
occur in the data and converting the CSV file to use that as the
separator, but it's not ideal.
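
As a minimal sketch of that workaround (not code from this thread), Python's standard `csv` module can do the quote-aware parsing and re-emit each record with its fields joined by a character assumed not to occur in the data. Here that character is assumed to be the ASCII unit separator, 0x1F. The hypothetical `csv_to_separated` function checks that assumption instead of trusting it, and also rejects fields with embedded line breaks, which would still confuse a line-oriented awk.

```python
import csv
import io

# Assumption: the ASCII unit separator (0x1F) never occurs in the data.
UNIT_SEP = "\x1f"


def csv_to_separated(text: str, sep: str = UNIT_SEP) -> str:
    """Convert CSV text to one record per line, fields joined by `sep`."""
    out = []
    # csv.reader does the quote handling; newline="" leaves line endings
    # untranslated so quoted fields spanning lines are parsed correctly.
    for row in csv.reader(io.StringIO(text, newline="")):
        for field in row:
            if sep in field:
                raise ValueError(f"separator {sep!r} occurs in field {field!r}")
            if "\n" in field or "\r" in field:
                # A line-oriented awk would see this record as two records.
                raise ValueError(f"line break inside field {field!r}")
        out.append(sep.join(row))
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    sample = 'name,quote\n"Kernighan, Brian","said ""hi"""\n'
    # Show the separator as "|" so the output is readable on a terminal.
    print(csv_to_separated(sample).replace(UNIT_SEP, "|"), end="")
```

The converted text can then be split on that single character, for example with `awk -F '\037'`, since POSIX awk interprets backslash escapes in the `-F` value.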

Awk's input could easily be modified to handle CSV files, but output
would be a bit more difficult, because you don't specify field
boundaries explicitly on output.  One possibility would be a printf()
format specifier that takes a field and quotes it appropriately.
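
To make both halves of this concrete, here is a rough Python sketch; it is neither awk nor an existing awk feature. The hypothetical `split_csv_record` shows quote-aware input splitting, and the hypothetical `csv_quote` does roughly what the suggested printf() specifier might: wrap a field in double quotes only when it contains the separator, a quote, or a line break, doubling any embedded quotes. Both follow the common RFC 4180 conventions and assume a record contains no embedded line breaks.

```python
def split_csv_record(line: str, sep: str = ",") -> list[str]:
    """Split one CSV record into fields, honouring double-quoted fields.

    Assumes the record has no embedded line breaks; a trailing CR/LF is dropped.
    """
    line = line.rstrip("\r\n")
    fields: list[str] = []
    field: list[str] = []
    in_quotes = False
    i = 0
    while i < len(line):
        c = line[i]
        if in_quotes:
            if c == '"' and i + 1 < len(line) and line[i + 1] == '"':
                field.append('"')  # "" inside a quoted field is one literal quote
                i += 1
            elif c == '"':
                in_quotes = False  # closing quote
            else:
                field.append(c)  # separators inside quotes are ordinary text
        elif c == '"':
            in_quotes = True  # lenient: a quote may open anywhere, not only at field start
        elif c == sep:
            fields.append("".join(field))
            field = []
        else:
            field.append(c)
        i += 1
    fields.append("".join(field))
    return fields


def csv_quote(field: str, sep: str = ",") -> str:
    """Quote a field for CSV output only if it needs quoting."""
    if any(ch in field for ch in (sep, '"', "\n", "\r")):
        return '"' + field.replace('"', '""') + '"'
    return field


if __name__ == "__main__":
    record = '1,"Kernighan, Brian","said ""hi"""'
    fields = split_csv_record(record)
    print(fields)  # ['1', 'Kernighan, Brian', 'said "hi"']
    print(",".join(csv_quote(f) for f in fields))  # same text as `record`
```

Because quoting is decided one field at a time, output needs a hook that knows where each field begins and ends; the splitter, by contrast, sees the whole record and can track quote state as it goes.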

-- Richard

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



* Re: [TUHS] Awk for CSV files
From: arnold @ 2019-10-13 14:57 UTC
  To: tuhs, richard

Using awk with CSV isn't new. Googling 'awk csv' gets you a bunch of results.

There is also a dynamically loadable 'csv' extension for gawk in the
gawkextlib project.  Contact me off-list if you want more details.

Thanks,

Arnold



* Re: [TUHS] Awk for CSV files
From: William Corcoran @ 2019-10-13 18:46 UTC
  To: Richard Tobin; +Cc: tuhs

Today, PDP-11 ports such as v7m, SVR1, and 2.11BSD will stay booted and operational for long periods under simulation.

With these older UNIX variants, working with awk and even the classic shell tools is often problematic.  Moreover, resource constraints seem to be a persistent annoyance under simulation.

When dealing with even moderately sized text files, one is often left writing a C program to work around the limitations of using awk and the other classic shell tools alone.  It’s not a leap to suggest that users running UNIX on actual metal rather than under simulation faced the same resource challenges.

Holy cow, have things changed.  Today, awk and the other classic shell tools are amazing.  Resource limitations are rare or even non-existent, especially in the Cloud.  Google seems to have led the way in taming unstructured data.  Even email today is virtually one huge text stream in which its binary element is masked by even more text.  Text, text, text!  All of this text data (CSV or whatever) has paved the way for, and extended the meaningful life of, the classic shell tools and even newer tools that are now classics, especially when an RDB is involved.

Just don’t hit that null, or you might need to fall back on C again.

Truly,

Bill Corcoran




* Re: [TUHS] Awk for CSV files
From: Henry Bent @ 2019-10-13 19:36 UTC
  To: William Corcoran; +Cc: tuhs


On Sun, 13 Oct 2019 at 14:54, William Corcoran <wlc@jctaylor.com> wrote:

> Today, working with v7m, SVR1, and bsd2.11 all PDP11 ports, for example,
> will stay booted and operational for long periods under simulation.
>

This has certainly been my experience as well with BSD ports that were
mature for their era.  Most of the problems I have encountered have been
TCP/IP issues in 2.11BSD and Ultrix 3.1, triggered by traffic they were
never designed to expect.


> With these older UNIX variants, working with awk and even the classic
> shell tools is often problematic.  Moreover resource constraints seem to be
> a persistent annoyance under simulation.
>

I think our expectations of a 16-bit Unix are bound to be well out of
proportion now.  When 16- and 32-bit Unix systems existed side by side, it
was easy to keep the resource limitations of both in mind when programming.
Now that 16-bit systems have moved completely out of general end-user space,
we only consider the constraints of the 32- and 64-bit systems that exist
side by side.

Some of my interests lie in preserving early '90s hardware with original
operating systems, and working within their constraints.  Porting modern
tools to SunOS 4 or Ultrix 4 or IRIX 4 (huh, everyone really was stuck on
the same version) is a challenge, but it can be done as long as those tools
do not require shared libraries.  Backporting C99 code to C89 is often not
as difficult as it would seem, though it can sometimes be laborious.  On the
other hand, porting modern tools to my PDP-11/23 running 2.9BSD is a total
non-starter for reasons that I am sure I do not have to elaborate upon here.

-Henry



* Re: [TUHS] Awk for CSV files
From: William Corcoran @ 2019-10-13 22:30 UTC
  To: Henry Bent; +Cc: tuhs


Agreed.  Although it seems like a lot of those processor “bits” beyond 32 are now put to use in senseless, saccharine AI endeavors like autocorrect.  I type “its” and it comes out as “it’s” when it clearly should have been “its”, and it’s depressing, even more so when I miss it.

Truly,

Bill Corcoran


Thread overview: 5+ messages
2019-10-13 13:53 [TUHS] Awk for CSV files Richard Tobin
2019-10-13 14:57 ` arnold
2019-10-13 18:46 ` William Corcoran
2019-10-13 19:36   ` Henry Bent
2019-10-13 22:30     ` William Corcoran
