From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <447EF168.10209@comtv.ru>
Date: Thu, 1 Jun 2006 17:53:44 +0400
From: Victor Nazarov
User-Agent: Mozilla Thunderbird 1.0.7 (Windows/20050923)
MIME-Version: 1.0
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] csv files -> embarrasing
References:
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Topicbox-Message-UUID: 58c2cfc2-ead1-11e9-9d60-3106f5b1d025

Steve Simon wrote:

>Ok, I have spent half an hour trying to parse CSV files
>and it's getting embarrasing, I could do it in C but I should
>be able to use rc + sed + awk.
>
>The problem is that some of my CSV files fields contain whitespace
>and thus have double quotes around them.
>
>I thought rc knows about %q quotes strings so I could use it to
>do my parsing, but it fails, can this be done, or is C the answer?
>seems a shame to resort to sledge hammers.
>
>-Steve
>
>cpu% cat file.csv
>a,b,"c,d,e",f,g
>p,q,r,s,t
>
>cpu%
>cpu% cat extract
>#!/bin/rc
>
>sed 's/"([^"]*)"/''\1''/g; s/,/ /g' $* |
>	while (s=`{read})
>		echo $s(1) $s(3) $s(4)
>
>
>cpu% extract file.csv
>a 'c d
>p r s
>

I thought about this case today. On native Plan 9 the solution is
quite easy. Programs share the environment, so the answer is to let a
child rc do the quote parsing and set s:

sed 's/"([^"]*)"/''\1''/g; s/,/ /g' $* |
	while (s=`{read}) {
		echo 's=('$"s')' | rc
		echo $s(1) $s(3) $s(4)
	}
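Steve mentions C as the fallback. For comparison, here is a minimal C sketch of the same quoted-field splitting, written for this note rather than taken from the thread. The name split_csv is hypothetical, and the sketch assumes commas separate fields, a field may be wrapped in double quotes to protect embedded commas or spaces, and quoted fields contain neither escaped ("") quotes nor newlines.

```c
#include <assert.h>
#include <string.h>

/*
 * split_csv: hypothetical helper, not from the thread.
 * Splits one CSV line in place into at most maxf fields and returns
 * how many fields were found. Assumes quoted fields contain no
 * escaped ("") quotes and no embedded newlines.
 */
static int split_csv(char *line, char **fields, int maxf)
{
    char *r = line;   /* read cursor */
    char *w = line;   /* write cursor; never runs ahead of r */
    int n = 0;

    assert(line != NULL && fields != NULL && maxf > 0);
    line[strcspn(line, "\r\n")] = '\0';   /* drop a trailing newline */

    while (n < maxf) {
        fields[n++] = w;
        if (*r == '"') {                  /* quoted field: copy up to the closing quote */
            r++;
            while (*r != '\0' && *r != '"')
                *w++ = *r++;
            if (*r == '"')
                r++;                      /* skip the closing quote */
        }
        while (*r != '\0' && *r != ',')   /* unquoted text up to the next comma */
            *w++ = *r++;
        if (*r == '\0') {                 /* end of line: terminate the last field */
            *w = '\0';
            break;
        }
        r++;                              /* skip the comma ... */
        *w++ = '\0';                      /* ... and terminate this field */
    }
    return n;
}
```

Reading each line with fgets and printing fields[0], fields[2] and fields[3] would give the same columns as Steve's extract script. The difference is that the quoted field keeps its commas (c,d,e), because no sed pass turns them into spaces.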