zsh-users
* iterating through a hierarchy with a filter
@ 2008-04-10  2:56 Alexy Khrabrov
  2008-04-10  8:51 ` Peter Stephenson
  0 siblings, 1 reply; 11+ messages in thread
From: Alexy Khrabrov @ 2008-04-10  2:56 UTC (permalink / raw)
  To: zsh-users

Greetings -- I have a series of XML files scattered around a  
hierarchy.  I also have a filter script which reads stdin and writes  
stdout in a standard Unix way.

I need to create a mirror hierarchy where files are results of  
applying the filter to the originals, replacing the original  
extension, say .xml, with the result extension, say .txt.  Except for  
the extension change, the hierarchy should be preserved.

Ideally, my script would not know anything about this whole process,  
so I can take any filter and use it for my transforms.  The hierarchy  
for the transformed tree must be created anew so I can easily throw it  
away (i.e. we do not output the results next to the originals).  The  
original and result extensions should be provided as a parameter to  
the process.

Which zshfoo can I use for it above and beyond find with exec helper  
script?

Cheers,
Alexy



* Re: iterating through a hierarchy with a filter
  2008-04-10  2:56 iterating through a hierarchy with a filter Alexy Khrabrov
@ 2008-04-10  8:51 ` Peter Stephenson
  2008-04-10  9:03   ` Alexy Khrabrov
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Stephenson @ 2008-04-10  8:51 UTC (permalink / raw)
  To: zsh-users

Alexy Khrabrov wrote:
> Greetings -- I have a series of XML files scattered around a  
> hierarchy.  I also have a filter script which reads stdin and writes  
> stdout in a standard Unix way.
> 
> I need to create a mirror hierarchy where files are results of  
> applying the filter to the originals, replacing the original  
> extension, say .xml, with the result extension, say .txt.  Except for  
> the extension change, the hierarchy should be preserved.
> 
> Ideally, my script would not know anything about this whole process,  
> so I can take any filter and use it for my transforms.  The hierarchy  
> for the transformed tree must be created anew so I can easily throw it  
> away (i.e. we do not output the results next to the originals).  The  
> original and result extensions should be provided as a parameter to  
> the process.
> 
> Which zshfoo can I use for it above and beyond find with exec helper  
> script?

zsh doesn't have any specific code for descending hierarchies, so you
would have to do that by trickery.  If it's shallow enough that globbing
the whole thing in one go will work, you can do things along the lines
of (untested):

for file1 in source/**/*.xml; do
  file2=dest/${${file1##source/}:r}.txt
  destdir=${file2:h}
  [[ -d $destdir ]] || mkdir -p $destdir
  filter <$file1 >$file2
done

If that doesn't work even with a few small tweaks, you'll probably have
to tell us why before we can advise better.

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070



* Re: iterating through a hierarchy with a filter
  2008-04-10  8:51 ` Peter Stephenson
@ 2008-04-10  9:03   ` Alexy Khrabrov
  2008-04-10  9:31     ` Peter Stephenson
  2008-04-10 10:52     ` Thor Andreassen
  0 siblings, 2 replies; 11+ messages in thread
From: Alexy Khrabrov @ 2008-04-10  9:03 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-users



On Apr 10, 2008, at 1:51 AM, Peter Stephenson wrote:
> [...]
>
> zsh doesn't have any specific code for descending hierarchies, so you
> would have to do that by trickery.  If it's shallow enough that globbing
> the whole thing in one go will work, you can do things along the lines
> of (untested):
>
> for file1 in source/**/*.xml; do
>  file2=dest/${${file1##source/}:r}.txt
>  destdir=${file2:h}
>  [[ -d $destdir ]] || mkdir -p $destdir
>  filter <$file1 >$file2
> done
>
> If that doesn't work even with a few small tweaks, you'll probably have
> to tell us why before we can advise better.

Well, my hierarchy is a million small files.  So I doubt globbing will  
work -- should I try?  :)

Cheers,
Alexy



* Re: iterating through a hierarchy with a filter
  2008-04-10  9:03   ` Alexy Khrabrov
@ 2008-04-10  9:31     ` Peter Stephenson
  2008-04-10  9:34       ` Peter Stephenson
  2008-04-10 10:52     ` Thor Andreassen
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Stephenson @ 2008-04-10  9:31 UTC (permalink / raw)
  To: zsh-users

On Thu, 10 Apr 2008 02:03:21 -0700
Alexy Khrabrov <deliverable@gmail.com> wrote:
> On Apr 10, 2008, at 1:51 AM, Peter Stephenson wrote:
> >
> > for file1 in source/**/*.xml; do
> >  file2=dest/${${file1##source/}:r}.txt
> >  destdir=${file2:h}
> >  [[ -d $destdir ]] || mkdir -p $destdir
> >  filter <$file1 >$file2
> > done
> >
> > If that doesn't work even with a few small tweaks, you'll probably  
> > have to tell us why before we can advise better.
> 
> Well, my hierarchy is a million small files.  So I doubt globbing will  
> work -- should I try?  :)

You might get lucky, but it sounds like you need to do it the hard way.
You can use that code as the core, except you can move the directory
handling out of the way and end up with something like (again this is
untested):


handledir() {
  local sdir=$1 ddir=$2
  local dir file tail

  [[ -d $ddir ]] || mkdir -p $ddir

  for dir in $sdir/*(/N); do
     handledir $dir $ddir/${dir:t}
  done

  for file in $sdir/*.xml(N); do
     filter <$file >$ddir/{$file:t}
  done
}

handledir sourcedir destdir


The only special zsh features are the glob qualifiers.  (/) restricts
the matches to directories, and (N) forces the expression to expand to
nothing at all if no files matched.

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070



* Re: iterating through a hierarchy with a filter
  2008-04-10  9:31     ` Peter Stephenson
@ 2008-04-10  9:34       ` Peter Stephenson
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Stephenson @ 2008-04-10  9:34 UTC (permalink / raw)
  Cc: zsh-users

Peter Stephenson wrote:
>      filter <$file >$ddir/{$file:t}
                            ^^^^^^^^^

That should be ${file:t:r}.txt

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070



* Re: iterating through a hierarchy with a filter
  2008-04-10  9:03   ` Alexy Khrabrov
  2008-04-10  9:31     ` Peter Stephenson
@ 2008-04-10 10:52     ` Thor Andreassen
  2008-04-10 10:56       ` Thor Andreassen
  2008-04-10 11:42       ` Stephane Chazelas
  1 sibling, 2 replies; 11+ messages in thread
From: Thor Andreassen @ 2008-04-10 10:52 UTC (permalink / raw)
  To: zsh-users

On Thu, Apr 10, 2008 at 02:03:21AM -0700, Alexy Khrabrov wrote:
> 
> On Apr 10, 2008, at 1:51 AM, Peter Stephenson wrote:

[...]

> >for file1 in source/**/*.xml; do
> > file2=dest/${${file1##source/}:r}.txt
> > destdir=${file2:h}
> > [[ -d $destdir ]] || mkdir -p $destdir
> > filter <$file1 >$file2
> >done

[...]

> Well, my hierarchy is a million small files.  So I doubt globbing will  
> work -- should I try?  :)

Doing it as a stream should work, e.g.:

find source/ -iname '*.xml' | while read file1; do 
  file2=dest/${${file1##source/}:r}.txt
  destdir=${file2:h}
  [[ -d $destdir ]] || mkdir -p $destdir
  filter < $file1 > $file2
done

-- 
Thor



* Re: iterating through a hierarchy with a filter
  2008-04-10 10:52     ` Thor Andreassen
@ 2008-04-10 10:56       ` Thor Andreassen
  2008-04-10 11:42       ` Stephane Chazelas
  1 sibling, 0 replies; 11+ messages in thread
From: Thor Andreassen @ 2008-04-10 10:56 UTC (permalink / raw)
  To: zsh-users

On Thu, Apr 10, 2008 at 12:52:45PM +0200, Thor Andreassen wrote:

[...]

> find source/ -iname '*.xml' | while read file1; do 

Improvement, use:

find source/ -type f -iname '*.xml'

to make sure you only filter real files.

-- 
Thor



* Re: iterating through a hierarchy with a filter
  2008-04-10 10:52     ` Thor Andreassen
  2008-04-10 10:56       ` Thor Andreassen
@ 2008-04-10 11:42       ` Stephane Chazelas
  2008-04-14 21:39         ` Alexy Khrabrov
  1 sibling, 1 reply; 11+ messages in thread
From: Stephane Chazelas @ 2008-04-10 11:42 UTC (permalink / raw)
  To: zsh-users

On Thu, Apr 10, 2008 at 12:52:45PM +0200, Thor Andreassen wrote:
[...]
> find source/ -iname '*.xml' | while read file1; do 

That should be while IFS= read -r file1

but that assumes that the file names don't contain newline
characters. -iname is a GNU extension; it's neither POSIX nor
Unix. Since you're already using GNU extensions, you could use
-print0:

find source -iname '*.xml' -print0 | while IFS= read -rd$'\0' file1...

as $'\0' won't be found in a file name.

>   file2=dest/${${file1##source/}:r}.txt
>   destdir=${file2:h}
>   [[ -d $destdir ]] || mkdir -p $destdir
>   filter < $file1 > $file2
> done
[...]
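
Putting the corrections to the streaming loop together, a complete version might look like the sketch below. It avoids zsh-only syntax, so it should also run under bash. `mirror_stream` and the upper-casing `filter` are hypothetical names introduced for the example; `-iname` and `-print0` still require a find that supports them.

```zsh
# "filter" is a placeholder for the real filter; here it upper-cases stdin.
filter() { tr a-z A-Z; }

# Mirror every .xml file under $1 into $2 as .txt, streaming names from find.
mirror_stream() {
  local src=${1%/} dest=${2%/} file1 file2
  find "$src" -type f -iname '*.xml' -print0 |
  while IFS= read -r -d '' file1; do     # NUL-delimited: safe for any file name
    file2=$dest/${file1#"$src"/}         # same relative path under dest
    file2=${file2%.*}.txt                # swap the extension
    mkdir -p -- "${file2%/*}"
    filter < "$file1" > "$file2"
  done
}
```

Usage would be along the lines of `mirror_stream source dest`. Because names are consumed one at a time from the pipe, the size of the tree does not matter.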

This can be done in zsh with:

for file1 in **/*.(#i)xml(.NDoN); do

That builds the whole list first, but you can also do:

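process() {
  # a +cmd glob qualifier passes the candidate file name in $REPLY, not $1
  local file1=$REPLY file2 destdir
  file2=dest/${${file1#source/}:r}.txt
  destdir=${file2:h}
  [[ -d $destdir ]] || mkdir -p -- $destdir
  filter < $file1 > $file2
  return 1   # non-zero: the file is dropped from the result, so no list is built
}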
: **/*.(#i)xml(.NDoN+process)


POSIXly, you could do:

find source -type f -name '*.[xX][mM][lL]' -exec sh -c '
  for file1 do
    file2=${file1%.*}.txt
    file2=dest/${file2#source/}
    destdir=${file2%/*}
    [ -d "$destdir" ] || mkdir -p -- "$destdir"
    filter < "$file1" > "$file2"
  done' inline {} +
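
One detail of the POSIX version that is easy to miss: the word after the script text (`inline` here) becomes `$0`, and the file names that find substitutes for `{} +` become `$1`, `$2`, and so on, which is what `for file1 do` iterates over. A minimal sketch, with made-up file names standing in for what find would append:

```zsh
# The word after the script fills $0; the remaining words are the positional
# parameters. "a.xml" and "b c.xml" are made-up names for illustration.
out=$(sh -c '
  for file1 do
    printf "%s: %s\n" "$0" "$file1"
  done' inline a.xml "b c.xml")

printf '%s\n' "$out"
```

If the placeholder word were omitted, the first file name would be taken as `$0` and silently skipped by the loop.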

-- 
Stéphane



* Re: iterating through a hierarchy with a filter
  2008-04-10 11:42       ` Stephane Chazelas
@ 2008-04-14 21:39         ` Alexy Khrabrov
  2008-04-14 23:33           ` Vincent Lefevre
  0 siblings, 1 reply; 11+ messages in thread
From: Alexy Khrabrov @ 2008-04-14 21:39 UTC (permalink / raw)
  To: zsh-users

I have the filter given as a parameter to my script, invoked as  
suggested,

$filter < $file1 > $file2

If I give a single existing script as a parameter, it works fine.  If,  
however, I give it

walk 'iconv -f utf8 -t cp1251' srcdir tgtdir ...

-- I get "command not found" for 'iconv -f utf8 -t cp1251' at the line  
above.  Since the walk script starts with 

#!/bin/zsh
filter=$1

I wonder what kind of quoting happens and how to "dequote" it so the  
command line will look indeed like

iconv -f utf8 -t cp1251 < $file1 > $file2

E.g., doing filter="$1" doesn't change it.
Cheers,
Alexy



* Re: iterating through a hierarchy with a filter
  2008-04-14 21:39         ` Alexy Khrabrov
@ 2008-04-14 23:33           ` Vincent Lefevre
  2008-04-15  9:51             ` Peter Stephenson
  0 siblings, 1 reply; 11+ messages in thread
From: Vincent Lefevre @ 2008-04-14 23:33 UTC (permalink / raw)
  To: zsh-users

On 2008-04-14 14:39:25 -0700, Alexy Khrabrov wrote:
> I have the filter given as a parameter to my script, invoked as  
> suggested,
>
> $filter < $file1 > $file2
>
> If I give a single existing script as a parameter, it works fine.  If,  
> however, I give it
>
> walk 'iconv -f utf8 -t cp1251' srcdir tgtdir ...
>
> -- I get "command not found" for 'iconv -f utf8 -t cp1251' at the line  
> above.  Since the walk script starts with
>
> #!/bin/zsh
> filter=$1

You need to do sh word-splitting on $1 and make $filter an array:

filter=(${=1})

Alternatively, you can enable sh word-splitting globally.

-- 
Vincent Lefèvre <vincent@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)



* Re: iterating through a hierarchy with a filter
  2008-04-14 23:33           ` Vincent Lefevre
@ 2008-04-15  9:51             ` Peter Stephenson
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Stephenson @ 2008-04-15  9:51 UTC (permalink / raw)
  To: zsh-users

On Tue, 15 Apr 2008 01:33:53 +0200
Vincent Lefevre <vincent@vinc17.org> wrote:
> On 2008-04-14 14:39:25 -0700, Alexy Khrabrov wrote:
> > If I give a single existing script as a parameter, it works fine.  If,  
> > however, I give it
> >
> > walk 'iconv -f utf8 -t cp1251' srcdir tgtdir ...
> >
> > -- I get "command not found" for 'iconv -f utf8 -t cp1251' at the line  
> > above.
> 
> You need to do sh word-splitting on $1 and make $filter an array:
> 
> filter=(${=1})
> 
> Alternatively, you can enable sh word-splitting globally.

That should work fine in this case.

More generally, I would be inclined to take the attitude that the argument
to your script is a complete command line in itself.  In that case, the
logical thing to do is to "eval" the variable that contains it.  That means
that you can put anything there you would in a normal zsh command line.  If
you want the command to be executed "at arm's length", put the eval inside
parentheses:

(eval $1) <input >output

I have a vague memory that the shell is smart enough only to fork once
if there's a single external command in the eval, but the mists of time may
be confusing me.
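
A short sketch of the eval approach wrapped in a function. `apply_filter` is a hypothetical name, and its argument order (command line, input file, output file) is an assumption made for the example. The argument is double-quoted so the same text also behaves under sh or bash; in zsh the unquoted `$1` shown above is not split either.

```zsh
# Treat $1 as a complete command line and run it in a subshell,
# with $2 as its standard input and $3 as its standard output.
apply_filter() {
  (eval "$1") < "$2" > "$3"
}
```

Because the string is evaluated as a full command line, it may contain pipelines and quoting, for example `apply_filter 'iconv -f utf8 -t cp1251 | tr -d "\r"' in.xml out.txt`. The usual caveat applies: only pass strings you trust.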

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070

