* Parsing CVS files
From: Dominik Vogt @ 2019-02-02 18:59 UTC
To: Zsh Users

Hi folks,

I'm looking for an easy way to split the lines of a .csv file into
the fields of an array variable. There's a script that does that
somewhere on the net. But that script parses lines character by
character and just manages to parse about 100 (long) lines per
second.

Fields in a .csv file are separated by commas, *but* commas
between a pair of quotes do not split. Or phrased differently:
Commas that have an even number of double quotes to their left do
split, but commas with an odd number to their left don't split.

Any ideas for a quick implementation?

Ciao

Dominik ^_^ ^_^

-- 
Dominik Vogt
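The parity rule stated above can be applied per field instead of per character: split the line on every comma, then glue consecutive pieces back together (restoring the comma) for as long as the running count of double quotes is odd. What follows is a minimal sketch, not a tested drop-in: `csv_split` and the result array `csv_fields` are hypothetical names, the quotes are left in the fields, and quoted fields spanning several lines are not handled. It is written for zsh but avoids zsh-only parameter flags, so it should also run under bash.

```shell
# Hypothetical helper: split $1 on the commas that have an even number of
# double quotes to their left; the fields go into the global array csv_fields.
csv_split() {
  local rest=$1 piece stripped cur='' quotes=0 pending=0 last=0
  csv_fields=()
  while (( ! last )); do
    # Take the text up to the next comma (or the remainder of the line).
    case $rest in
      *,*) piece=${rest%%,*}; rest=${rest#*,} ;;
      *)   piece=$rest; last=1 ;;
    esac
    # A pending field means the previous comma was inside quotes: put it back.
    if (( pending )); then
      cur=$cur,$piece
    else
      cur=$piece
    fi
    # Count the double quotes in this piece by deleting them and
    # comparing lengths.
    stripped=${piece//\"/}
    quotes=$(( quotes + ${#piece} - ${#stripped} ))
    if (( quotes % 2 == 0 )); then
      csv_fields+=("$cur")          # even count: the comma really splits
      cur='' quotes=0 pending=0
    else
      pending=1                     # odd count: still inside a quoted field
    fi
  done
  # Unbalanced quotes: keep the leftover text as the last field.
  if (( pending )); then
    csv_fields+=("$cur")
  fi
}

line='abc,"efg, hehe,yeah",c,,d'
csv_split "$line"
printf '[%s]\n' "${csv_fields[@]}"
# [abc]
# ["efg, hehe,yeah"]
# [c]
# []
# [d]
```

The loop runs once per comma rather than once per character, and empty fields survive. Because only the parity of the count matters, doubled quotes ("") used as escapes inside a quoted field do not confuse it: each pair leaves the parity unchanged.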
* Re: Parsing CVS files
From: Peter Stephenson @ 2019-02-02 20:14 UTC
To: Zsh Users

> I'm looking for an easy way to split the lines of a .csv file into
> the fields of an array variable. There's a script that does that
> somewhere on the net. But that script parses lines character by
> character and just manages to parse about 100 (long) lines per
> second.
>
> Fields in a .csv file are separated by commas, *but* commas
> between a pair of quotes do not split. Or phrased differently:
> Commas that have an even number of double quotes to their left do
> split, but commas with an odd number to their left don't split.
>
> Any ideas for a quick implementation?

Sebastian has done similar things so may have better ideas.

If you're happy to use shell syntax --- in other words, the other
forms of quoting are active, not just double quotes, so backslashes
and single quotes might do inconvenient things --- and you're not too
bothered about unquoted spaces, which will cause extra splitting, you
can use this trick:

% line='This,"is, quite possibly, a",line,"of,stuff","with,commas"'
% print -rl ${(Q)${${(z)${line//,/, }}%%,}//, /,}
This
is, quite possibly, a
line
of,stuff
with,commas

Each comma gets a space added, then the line is split on
syntactically active spaces; any comma at the end of a field is
removed; the remaining commas are restored. The (Q) flag on the
outermost step strips the quotes; leave it out to keep them.

If you need to be careful about unquoted spaces, you need to be
cleverer: e.g. backslash-quote them and then remove the backslashes
later. E.g., up to subtle effects associated with backslashes,

print -rl ${${${${(z)${${line// /\\ }//,/, }}%%,}//, /,}//\\ / }

will retain existing spaces.

Also, if you want to keep empty fields, you'll need the final result
to use "${(@)this}".
Probably easiest to assign to an array as otherwise the quotes will
affect the substitution.

If you're worried about subtle effects with backslashes, I don't think
you're ever going to be satisfied with a quick and dirty hack like
this, so you'll have to decide how sophisticated you need to be.

pws
* Re: Parsing CVS files
From: Sebastian Gniazdowski @ 2019-02-03 4:19 UTC
To: Peter Stephenson
Cc: Zsh Users

On Sat, 2 Feb 2019 at 21:27, Peter Stephenson
> Also, if you want to keep empty fields, you'll need the final result
> to use "${(@)this}". Probably easiest to assign to an array as otherwise
> the quotes will affect the substitution.

I've had to use "(@)" for every segment, i.e.:

% line='abc,"efg, hehe,yeah",c,,d'
% print -rl "${(@)${(@)${(@)${(z)${${line// /\\ }//,/, }}%%,}//, /,}//\\ / }"
abc
"efg, hehe,yeah"
c

d

And with a random "(@)" missing:

% print -rl "${(@)${${(@)${(z)${${line// /\\ }//,/, }}%%,}//, /,}//\\ / }"
abc "efg, hehe,yeah" c  d

-- 
Sebastian Gniazdowski
News: https://twitter.com/ZdharmaI
IRC: https://kiwiirc.com/client/chat.freenode.net:+6697/#zplugin
Blog: http://zdharma.org
* Re: Parsing CVS files
From: Dominik Vogt @ 2019-02-05 21:30 UTC
To: zsh-users

On Sun, Feb 03, 2019 at 05:19:08AM +0100, Sebastian Gniazdowski wrote:
> On Sat, 2 Feb 2019 at 21:27, Peter Stephenson
> > Also, if you want to keep empty fields, you'll need the final result
> > to use "${(@)this}". Probably easiest to assign to an array as otherwise
> > the quotes will affect the substitution.
>
> I've had to use "(@)" for every segment, i.e.:
>
> % line='abc,"efg, hehe,yeah",c,,d'
> % print -rl "${(@)${(@)${(@)${(z)${${line// /\\ }//,/, }}%%,}//, /,}//\\ / }"
> abc
> "efg, hehe,yeah"
> c
>
> d
>
> And with a random "(@)" missing:
> % print -rl "${(@)${${(@)${(z)${${line// /\\ }//,/, }}%%,}//, /,}//\\ / }"
> abc "efg, hehe,yeah" c  d

Uff, looks complicated but should fix my problem. Thanks!

Spaces and backslashes are no real problem because they can be
replaced with some unique character sequence before splitting.

Ciao

Dominik ^_^ ^_^

-- 
Dominik Vogt