zsh-users
* is text file?
@ 1997-09-28 22:31 Jose Unpingco
  1997-09-29  0:06 ` Bart Schaefer
  0 siblings, 1 reply; 9+ messages in thread
From: Jose Unpingco @ 1997-09-28 22:31 UTC (permalink / raw)
  To: zsh

hi,

I usually use Perl's -T in a function to check if a file is ASCII
or binary. Is there a way to do this using zsh?

Something like

% ls **/*(flag here for ASCII)

would be nice.

Thank you for your time and consideration.  
----------------------------------------------------------------------
        Jose Unpingco   Mail Code ECE 0407; WK# (619) 534-5904
----------------------------------------------------------------------



* Re: is text file?
  1997-09-28 22:31 is text file? Jose Unpingco
@ 1997-09-29  0:06 ` Bart Schaefer
  1997-09-29 16:25   ` Greg Badros
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Schaefer @ 1997-09-29  0:06 UTC (permalink / raw)
  To: Jose Unpingco, zsh

On Sep 28,  3:31pm, Jose Unpingco wrote:
} Subject: is text file?
}
} I usually use Perl's -T in a function to check if a file is ASCII
} or binary. Is there a way to do this using zsh?

I'm sure Larry Wall will forgive me for saying that perl's -T is a hack.
It reads a chunk of the file and guesses whether the whole file is ASCII
based on the contents of that fragment.

Zsh's globbing uses only information from readdir() and stat()/lstat(),
and hopefully is going to stay that way.

An approximation might be (with extendedglob set):

% ls **/*~*(${~${(j/|/)fignore}})(.)

That is, all plain files that do not have extensions listed in `fignore'.
You could change (.) to (.^*) to omit executables, but that would also
omit most shell scripts.

(Somebody tell me why the extra ${~...} is needed in that expression.)

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com



* Re: is text file?
  1997-09-29  0:06 ` Bart Schaefer
@ 1997-09-29 16:25   ` Greg Badros
  1997-09-29 17:44     ` Bart Schaefer
  1997-09-29 21:28     ` TGAPE!
  0 siblings, 2 replies; 9+ messages in thread
From: Greg Badros @ 1997-09-29 16:25 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Jose Unpingco, zsh

"Bart Schaefer" <schaefer@brasslantern.com> writes:

> On Sep 28,  3:31pm, Jose Unpingco wrote:
> } I usually use Perl's -T in a function to check if a file is ASCII
> } or binary. Is there a way to do this using zsh?
> 
> I'm sure Larry Wall will forgive me for saying that perl's -T is a hack.
> It reads a chunk of the file and guesses whether the whole file is ASCII
> based on the contents of that fragment.
> 
> Zsh's globbing uses only information from readdir() and stat()/lstat(),
> and hopefully is going to stay that way.
> 

I disagree.  The reason perl's -T exists *even though* it is such a hack
is because it is damn useful.  I'm sick of accidentally grepping through
binaries, and would love a zsh feature that would let me do:

grep foo *(T.) # search for foo in all non-binary files

Why must zsh's globbing restrict itself to only filesystem
meta-information?  Yes, I understand a (T) glob modifier would be slower
since zsh would have to read the first bit of the file (another seek),
but who cares -- the user's time saved by being able to restrict the set
of files a glob matches more usefully is worth a lot.  Yes, there are
other ways to restrict a glob in a similar way (for example, using a
script that echoes its arguments after removing names of binary files
[using file]), but they are far worse hacks than letting zsh do it for
you.

Perhaps it would seem less hacky if there were a general
user-programmable glob feature that would call a function on each
filename and accept that file for the glob iff the function returns
0.  Then the way that you determine what kind of file a filename points
to is not part of the shell, but the nice glob modifier interface is
permitted. 

> An approximation might be (with extendedglob set):
> 
> % ls **/*~*(${~${(j/|/)fignore}})(.)
> 
> That is, all plain files that do not have extensions listed in `fignore'.
> You could change (.) to (.^*) to omit executables, but that would also
> omit most shell scripts.
> 
> (Somebody tell me why the extra ${~...} is needed in that expression.)

I'm fairly certain I'll never type such an incantation (how long did it
take to dream it up? :-) ).

Greg J. Badros
gjb@cs.washington.edu
Seattle, WA  USA
http://www.cs.washington.edu/homes/gjb



* Re: is text file?
  1997-09-29 16:25   ` Greg Badros
@ 1997-09-29 17:44     ` Bart Schaefer
  1997-09-29 21:28     ` TGAPE!
  1 sibling, 0 replies; 9+ messages in thread
From: Bart Schaefer @ 1997-09-29 17:44 UTC (permalink / raw)
  To: Greg Badros; +Cc: zsh

On Sep 29,  9:25am, Greg Badros wrote:
} Subject: Re: is text file?
}
} > % ls **/*~*(${~${(j/|/)fignore}})(.)
} 
} I'm fairly certain I'll never type such an incantation (how long did it
} take to dream it up? :-) ).

Oh, I dreamt it up almost immediately.  Took me about eight tries to get
the syntax right, though (mostly because of needing that ${~}).

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com



* Re: is text file?
  1997-09-29 16:25   ` Greg Badros
  1997-09-29 17:44     ` Bart Schaefer
@ 1997-09-29 21:28     ` TGAPE!
  1997-09-30  3:53       ` Bart Schaefer
  1997-09-30 16:19       ` Greg Badros
  1 sibling, 2 replies; 9+ messages in thread
From: TGAPE! @ 1997-09-29 21:28 UTC (permalink / raw)
  To: Greg Badros; +Cc: schaefer, unpingco, zsh-users

Greg Badros wrote:
>
> "Bart Schaefer" <schaefer@brasslantern.com> writes:
>
> Perhaps it would seem less hacky if there were a general
> user-programmable glob feature that would call a function on each
> filename and accept that file for the glob iff the function returns
> 0.  Then the way that you determine what kind of file a filename points
> to is not part of the shell, but the nice glob modifier interface is
> permitted. 

It thusly degenerates to the case of running a find operation which execs
file on all of your files, and greps out binaries & data.  Nothing really
gained, except baggage.

>> An approximation might be (with extendedglob set):
>>
>> % ls **/*~*(${~${(j/|/)fignore}})(.)
>>
>> That is, all plain files that do not have extensions listed in `fignore'.
>> You could change (.) to (.^*) to omit executables, but that would also
>> omit most shell scripts.
>>
>> (Somebody tell me why the extra ${~...} is needed in that expression.)
>
> I'm fairly certain I'll never type such an incantation (how long did it
> take to dream it up? :-) ).

Leave wizard's school now.  You don't have the potential.  That
incantation is trivial compared to some things I've done.  Remember,
everything can be done as a perl one-liner.  This can be translated into
a zsh command line.  There's an emacs minor editing mode which already
does it, however.

Ed

(well, everything except testing links, rather than what they point to.
Or am I missing something?  Please tell me I am; I've a tchell script
I don't want to admit to owning.  Overly-complicated programs should
never be written in csh-family shells.)



* Re: is text file?
  1997-09-29 21:28     ` TGAPE!
@ 1997-09-30  3:53       ` Bart Schaefer
  1997-09-30 16:19       ` Greg Badros
  1 sibling, 0 replies; 9+ messages in thread
From: Bart Schaefer @ 1997-09-30  3:53 UTC (permalink / raw)
  To: zsh-users

On Sep 29,  9:28pm, TGAPE! wrote:
} Subject: Re: is text file?
}
} everything can be done as a perl one-liner.  This can be translated into
} a zsh command line.  There's an emacs minor editing mode which already
} does it, however.

An emacs mode that does what?  Everything as a perl one-liner?  Translation
of perl into zsh?  Or grepping only in text files?  (Do you mean igrep?)

} (well, everything except testing links, rather than what they point to.
} Or am I missing something?  Please tell me I am

The `-' qualifier makes all additional qualifiers follow symlinks, e.g.:

% print -rc *(-T)

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com



* Re: is text file?
  1997-09-29 21:28     ` TGAPE!
  1997-09-30  3:53       ` Bart Schaefer
@ 1997-09-30 16:19       ` Greg Badros
  1997-09-30 18:56         ` Bart Schaefer
  1 sibling, 1 reply; 9+ messages in thread
From: Greg Badros @ 1997-09-30 16:19 UTC (permalink / raw)
  To: TGAPE!; +Cc: schaefer, unpingco, zsh-users

TGAPE! <tgape@cyberramp.net> writes:

> Greg Badros wrote:
> >
> > "Bart Schaefer" <schaefer@brasslantern.com> writes:
> >
[ NOTE: This is misattributed to Bart Schaefer; I wrote this ]
> > Perhaps it would seem less hacky if there were a general
> > user-programmable glob feature that would call a function on each
> > filename and accept that file for the glob iff the function returns
> > 0.  Then the way that you determine what kind of file a filename points
> > to is not part of the shell, but the nice glob modifier interface is
> > permitted. 
> 
> It thusly degenerates to the case of running a find operation which execs
> file on all of your files, and greps out binaries & data.  Nothing really
> gained, except baggage.

No, then you simply add a built-in test to zsh that is true iff that
argument is a text file.  No extra execs, but still clean.

> 
> >> An approximation might be (with extendedglob set):
> >>
> >> % ls **/*~*(${~${(j/|/)fignore}})(.)
> >>
> >> That is, all plain files that do not have extensions listed in `fignore'.
> >> You could change (.) to (.^*) to omit executables, but that would also
> >> omit most shell scripts.
> >>
> >> (Somebody tell me why the extra ${~...} is needed in that expression.)
> >
> > I'm fairly certain I'll never type such an incantation (how long did it
> > take to dream it up? :-) ).
> 
> Leave wizard's school now.  You don't have the potential.  That
> incantation is trivial compared to some things I've done.  Remember,

But you apparently can't even attribute text in emails properly.

The point isn't whether I could figure out such a line, it's whether
being able to throw together nonsensical characters correctly after 8
attempts proves anything.  Yes, zsh can [almost] do it, but it's way
easier to just use find or a cmd-line filter that removes arguments that
aren't text files.

Greg



* Re: is text file?
  1997-09-30 16:19       ` Greg Badros
@ 1997-09-30 18:56         ` Bart Schaefer
  1997-09-30 20:02           ` Greg Badros
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Schaefer @ 1997-09-30 18:56 UTC (permalink / raw)
  To: Greg Badros; +Cc: zsh-users

On Sep 30,  9:19am, Greg Badros wrote:
} Subject: Re: is text file?
}
} ... you simply add a built-in test to zsh that is true iff that
} argument is a text file.

You could write this yourself as a module using 3.1.x.

Anyway, the following works in 3.0.5 (but not earlier because typeset -U
didn't work properly):

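    text() {
      # Byte values accepted as text: BS, TAB, LF, FF, CR, and printable ASCII.
      local file ascii
      ascii=(8 9 10 12 13 {32..126})
      local -U bytes
      for file
      do
        # With -U, duplicates collapse, so the count grows only when the
        # first 256 bytes of the file contain a value outside `ascii'.
        bytes=( $ascii $(od -An -td1 -N 256 $file) )
        [[ $#bytes -eq $#ascii ]] && echo $file
      done
    }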

    grep foo $(text **/*(-.))

Increase or decrease 256 in the `od' command to get more or less accurate
guesses as to whether a file is text or not.  Adjust `ascii' if you want
to include a larger character set.  Add `[[ -f $file ]] || continue' if
you don't want to worry about the (-.) in the glob pattern.

This does assume GNU `od', I guess.
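
Where GNU `od' is not available, roughly the same guess can be made with
head, tr and wc.  The following is a minimal sketch, not code from this
thread: the name `is_text' is hypothetical, and it assumes a `head' that
accepts -c.  It deletes the byte values in the `ascii' list above from the
first 256 bytes and treats the file as text when nothing is left over.

```shell
# is_text FILE: succeed when the first 256 bytes of FILE are all "text" bytes.
# Hypothetical helper; written to run in zsh as well as POSIX-style shells.
is_text() {
  # Only consider readable regular files (symlinks are followed).
  [ -f "$1" ] && [ -r "$1" ] || return 1
  # Octal 010 011 012 014 015 and 040-176 are decimal 8 9 10 12 13 and 32-126.
  # Whatever survives the deletion is a byte outside that set.
  [ "$(head -c 256 < "$1" |
        LC_ALL=C tr -d '\010\011\012\014\015\040-\176' |
        wc -c)" -eq 0 ]
}
```

In zsh this could be used as
`for f in **/*(-.); do is_text $f && print -r -- $f; done'.
Like the `text' function, it treats an empty file as text.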

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com



* Re: is text file?
  1997-09-30 18:56         ` Bart Schaefer
@ 1997-09-30 20:02           ` Greg Badros
  0 siblings, 0 replies; 9+ messages in thread
From: Greg Badros @ 1997-09-30 20:02 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Greg Badros, zsh-users

I use this in ~/zsh-fns/only-text-files:

# Filter out non-text-files from the argument list
# Usage: grep -c foo `only-text-files *`
# print -r -- keeps backslashes and leading dashes in names intact
file -f =(for i in "$@"; do print -r -- $i; done) |\
   awk -F: '{name = $1; $1 = ""; if ($0 ~ / text( |$)/) { print name }}'

I think it's important to use "file", as that is the system-supported way
of testing whether a file contains text.  The file man page also
strongly encourages the use of the string "text" in files which are
text, so it's pretty reliable to use the regexp (one possible exception
is postscript files).
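
One caveat with splitting on `:' is that a filename which itself contains
a colon gets truncated.  A minimal sketch of one way around that, assuming
a `file' that supports -b (brief output, without the filename prefix); the
name `only_text_files' is hypothetical and this is not the script above.
It runs `file' once per argument, so it trades speed for robustness, and
it matches " text" anywhere in the description, which is slightly looser
than the regexp above.

```shell
# only_text_files FILE...: print each argument that `file` describes as text.
# Hypothetical helper; starts one `file` process per argument.
only_text_files() {
  for f in "$@"; do
    # -b prints only the description, so colons in names cannot confuse us.
    case $(file -b -- "$f") in
      *" text"*) printf '%s\n' "$f" ;;
    esac
  done
}
```

Usage would mirror the original: grep -c foo `only_text_files *`.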

Greg J. Badros
gjb@cs.washington.edu
Seattle, WA  USA
http://www.cs.washington.edu/homes/gjb



