9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] du and find
@ 2009-12-28 23:05 anonymous
  2009-12-28 23:09 ` lucio
                   ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: anonymous @ 2009-12-28 23:05 UTC (permalink / raw)
  To: 9fans

It is suggested to use
    du -a | awk '{print $2}'
instead of find. But what if a filename contains spaces? For example, if a
file is named "foo bar", then awk will output only "foo".
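
A minimal illustration of the failure, assuming an rc shell (the file name is made up):

	touch 'foo bar'
	du -a | awk '{print $2}'	# the entry for 'foo bar' loses everything after
					# the space: awk's default field separator is any
					# run of blanks or tabs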




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2009-12-28 23:05 [9fans] du and find anonymous
@ 2009-12-28 23:09 ` lucio
  2009-12-28 23:14 ` Steve Simon
  2009-12-29 17:59 ` Tim Newsham
  2 siblings, 0 replies; 51+ messages in thread
From: lucio @ 2009-12-28 23:09 UTC (permalink / raw)
  To: 9fans

>     du -a | awk '{print $2}'
du -a | awk '{$1=""; print}'

will be a good approximation...

++L




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2009-12-28 23:05 [9fans] du and find anonymous
  2009-12-28 23:09 ` lucio
@ 2009-12-28 23:14 ` Steve Simon
  2009-12-29 17:59 ` Tim Newsham
  2 siblings, 0 replies; 51+ messages in thread
From: Steve Simon @ 2009-12-28 23:14 UTC (permalink / raw)
  To: 9fans

> It is suggested to use
>    du -a | awk '{print $2}'
> instead of find. But what if filename contains spaces?

how about

	du -a | awk '{$1=""; print}'

This does print a leading space but is simple enough,
or perhaps

	du -a | while(s=`{read}) echo $s(2-)

which is more accurate but arguably more complex.

-Steve



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2009-12-28 23:05 [9fans] du and find anonymous
  2009-12-28 23:09 ` lucio
  2009-12-28 23:14 ` Steve Simon
@ 2009-12-29 17:59 ` Tim Newsham
  2009-12-29 18:28   ` Don Bailey
                     ` (2 more replies)
  2 siblings, 3 replies; 51+ messages in thread
From: Tim Newsham @ 2009-12-29 17:59 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> It is suggested to use
>    du -a | awk '{print $2}'
> instead of find. But what if filename contains spaces? For example if
> file is named "foo bar" then awk will output "foo" only.

What about

    du -a | sed 's/^[0-9]*<tab>//g'

no loss on spaces in filenames.
no loss on tabs in filenames.
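
(Here <tab> stands for a literal tab character typed into the pattern; written
out, and assuming Plan 9 sed, that is:)

	du -a | sed 's/^[0-9]*	//'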

Tim Newsham | www.thenewsh.com/~newsham | thenewsh.blogspot.com



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2009-12-29 17:59 ` Tim Newsham
@ 2009-12-29 18:28   ` Don Bailey
  2009-12-29 20:16   ` Rob Pike
  2010-05-03 12:13   ` Mathieu Lonjaret
  2 siblings, 0 replies; 51+ messages in thread
From: Don Bailey @ 2009-12-29 18:28 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 532 bytes --]

Chicken dinner!

On Tue, Dec 29, 2009 at 10:59 AM, Tim Newsham <newsham@lava.net> wrote:

> It is suggested to use
>>   du -a | awk '{print $2}'
>> instead of find. But what if filename contains spaces? For example if
>> file is named "foo bar" then awk will output "foo" only.
>>
>
> What about
>
>   du -a | sed 's/^[0-9]*<tab>//g'
>
> no loss on spaces in filenames.
> no loss on tabs in filenames.
>
> Tim Newsham | www.thenewsh.com/~newsham |
> thenewsh.blogspot.com
>
>

[-- Attachment #2: Type: text/html, Size: 1156 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2009-12-29 17:59 ` Tim Newsham
  2009-12-29 18:28   ` Don Bailey
@ 2009-12-29 20:16   ` Rob Pike
  2009-12-30  7:44     ` anonymous
  2010-05-03 12:13   ` Mathieu Lonjaret
  2 siblings, 1 reply; 51+ messages in thread
From: Rob Pike @ 2009-12-29 20:16 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

The 'g' is unnecessary.

-rob

On Wed, Dec 30, 2009 at 4:59 AM, Tim Newsham <newsham@lava.net> wrote:
>> It is suggested to use
>>   du -a | awk '{print $2}'
>> instead of find. But what if filename contains spaces? For example if
>> file is named "foo bar" then awk will output "foo" only.
>
> What about
>
>   du -a | sed 's/^[0-9]*<tab>//g'
>
> no loss on spaces in filenames.
> no loss on tabs in filenames.
>
> Tim Newsham | www.thenewsh.com/~newsham | thenewsh.blogspot.com
>
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2009-12-29 20:16   ` Rob Pike
@ 2009-12-30  7:44     ` anonymous
  0 siblings, 0 replies; 51+ messages in thread
From: anonymous @ 2009-12-30  7:44 UTC (permalink / raw)
  To: 9fans

Ok, so it is better to use
du -a | sed 's/^.*	//'
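
(For comparison, a small sketch with a literal tab in each pattern, assuming
Plan 9 sed; the greedy form loses part of any name that itself contains a tab:)

	du -a | sed 's/^[0-9]*	//'	# strips only the leading size field
	du -a | sed 's/^.*	//'		# greedy: strips everything up to the last tab on the line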




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2009-12-29 17:59 ` Tim Newsham
  2009-12-29 18:28   ` Don Bailey
  2009-12-29 20:16   ` Rob Pike
@ 2010-05-03 12:13   ` Mathieu Lonjaret
  2010-05-03 12:18     ` Akshat Kumar
  2010-05-03 14:03     ` erik quanstrom
  2 siblings, 2 replies; 51+ messages in thread
From: Mathieu Lonjaret @ 2010-05-03 12:13 UTC (permalink / raw)
  To: 9fans

Hello,

just because reviving old threads is fun...
I've just found out about this:

http://betterthangrep.com/

it does not seem to work out of the box (expecting some unix paths), but
since there's a perl port and that thing is supposed to be more or
less self contained (for the standalone version), maybe it's not too
much work for someone interested enough.

Cheers,
Mathieu




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-03 12:13   ` Mathieu Lonjaret
@ 2010-05-03 12:18     ` Akshat Kumar
  2010-05-03 12:26       ` Mathieu Lonjaret
  2010-05-03 13:17       ` Rudolf Sykora
  2010-05-03 14:03     ` erik quanstrom
  1 sibling, 2 replies; 51+ messages in thread
From: Akshat Kumar @ 2010-05-03 12:18 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

From the website:

"ack is written purely in Perl, and takes advantage
of the power of Perl's regular expressions."

Forgive my ignorance and irrelevance to this topic,
but what are the advantages of Perl's regular
expressions, over the implementation we have
currently in Plan 9?


Thanks,
ak

On Mon, May 3, 2010 at 5:13 AM, Mathieu Lonjaret
<mathieu.lonjaret@gmail.com> wrote:
> Hello,
>
> just because reviving old threads is fun...
> I've just found out about this:
>
> http://betterthangrep.com/
>
> it does not seem to work out of the box (expecting some unix paths), but
> since there's a perl port and that thing is supposed to be more or
> less self contained (for the standalone version), maybe it's not too
> much work for someone interested enough.
>
> Cheers,
> Mathieu
>
>
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-03 12:18     ` Akshat Kumar
@ 2010-05-03 12:26       ` Mathieu Lonjaret
  2010-05-03 12:49         ` tlaronde
  2010-05-03 13:10         ` Ethan Grammatikidis
  2010-05-03 13:17       ` Rudolf Sykora
  1 sibling, 2 replies; 51+ messages in thread
From: Mathieu Lonjaret @ 2010-05-03 12:26 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 215 bytes --]

No idea, probably none.

that would not be the interesting point, if any.  it's just that the
tool is already there and (should be) simpler to use than piping
various commands around, as they illustrate below.

[-- Attachment #2: Type: message/rfc822, Size: 5495 bytes --]

From: Akshat Kumar <akumar@mail.nanosouffle.net>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] du and find
Date: Mon, 3 May 2010 05:18:56 -0700
Message-ID: <t2nfe41879c1005030518l8ea8cbd1u78d15fe07c54006@mail.gmail.com>

From the website:

"ack is written purely in Perl, and takes advantage
of the power of Perl's regular expressions."

Forgive my ignorance and irrelevance to this topic,
but what are the advantages of Perl's regular
expressions, over the implementation we have
currently in Plan 9?


Thanks,
ak

On Mon, May 3, 2010 at 5:13 AM, Mathieu Lonjaret
<mathieu.lonjaret@gmail.com> wrote:
> Hello,
>
> just because reviving old threads is fun...
> I've just found out about this:
>
> http://betterthangrep.com/
>
> it does not seem to work out of the box (expecting some unix paths), but
> since there's a perl port and that thing is supposed to be more or
> less self contained (for the standalone version), maybe it's not too
> much work for someone interested enough.
>
> Cheers,
> Mathieu
>
>
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-03 12:26       ` Mathieu Lonjaret
@ 2010-05-03 12:49         ` tlaronde
  2010-05-03 13:10         ` Ethan Grammatikidis
  1 sibling, 0 replies; 51+ messages in thread
From: tlaronde @ 2010-05-03 12:49 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, May 03, 2010 at 02:26:07PM +0200, Mathieu Lonjaret wrote:
> No idea, probably none.
>
> that would not be the interesting point, if any.  it's just that the
> tool is already there and (should be) simpler to use than piping
> various commands around, as they illustrate below.

> Date: Mon, 3 May 2010 05:18:56 -0700
> From: Akshat Kumar <akumar@mail.nanosouffle.net>
> Subject: Re: [9fans] du and find
> To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
>
> From the website:
>
> "ack is written purely in Perl, and takes advantage
> of the power of Perl's regular expressions."
>
> Forgive my ignorance and irrelevance to this topic,
> but what are the advantages of Perl's regular
> expressions, over the implementation we have
> currently in Plan 9?

I found the answer, in fact, a long time ago: because they simply do not
know that ed(1) exists, nor sed(1), etc.

A group providing an ISDN router based on Debian required a lot of
memory and disk space. I asked: why??? that much for _that_?!! The
answer: we need perl(1) installed. But what for? Answer: to replace
@@GATEWAY@@ and so on with customized values in a file... (They hadn't
even thought of building the distribution on a vulcan and simply
installing it on the target.)

They didn't know about ed(1). So I told them that regexps were ed(1),
and that ed(1) was required by POSIX.2. And I tried to give the
demonstration... only to find that Debian didn't provide ed(1) by
default. I asked Debian: why the f...?!!? Answer: GNU's Not Unix...

And that day I realized I was not GNU. And I switched to *BSD, before
asking myself some questions that led me to Plan9...
--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-03 12:26       ` Mathieu Lonjaret
  2010-05-03 12:49         ` tlaronde
@ 2010-05-03 13:10         ` Ethan Grammatikidis
  2010-05-03 13:41           ` Steve Simon
  1 sibling, 1 reply; 51+ messages in thread
From: Ethan Grammatikidis @ 2010-05-03 13:10 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On 3 May 2010, at 13:26, Mathieu Lonjaret wrote:

> No idea, probably none.
>
> that would not be the interesting point, if any.  it's just that the
> tool is already there and (should be) simpler to use than piping
> various commands around, as they illustrate below.

Ack looks cute, but I think a fairly simple shell script could do all
of what ack does without requiring perl. I imagine it would be faster
still by not using Perl's broken regular expressions, and bear in mind
on Plan 9 you'd probably want to make a wrapper for grep anyway if you
do a lot of recursive searching.

--
Simplicity does not precede complexity, but follows it. -- Alan Perlis




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-03 12:18     ` Akshat Kumar
  2010-05-03 12:26       ` Mathieu Lonjaret
@ 2010-05-03 13:17       ` Rudolf Sykora
  2010-05-03 14:53         ` erik quanstrom
  1 sibling, 1 reply; 51+ messages in thread
From: Rudolf Sykora @ 2010-05-03 13:17 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 3 May 2010 14:18, Akshat Kumar <akumar@mail.nanosouffle.net> wrote:
> Forgive my ignorance and irrelevance to this topic,
> but what are the advantages of Perl's regular
> expressions, over the implementation we have
> currently in Plan 9?

Regexps in Plan9 are on one hand much less powerful than Perl's, on
the other hand they are (thanks to their simplicity) much quicker.
Often one doesn't need Perl's power, and in such cases Plan9's regexps
are better. But sometimes...

Just compare:
http://www.amk.ca/python/howto/regex/
to
regexp(7)

... particularly e.g. Lookahead Assertions, Non-capturing and Named Groups.

It's always been easier for me to use python's/perl's regular
expressions when I needed to process a text file than to use plan9's.
For simple things, e.g. while editing an ordinary text in acme/sam,
plan9's regexps are just fine.

Also read Russ Cox text:
http://swtch.com/~rsc/regexp/regexp1.html
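
(For a concrete taste, a hypothetical one-liner assuming a Unix perl 5.10 or
later; it uses a lookahead and a named group, neither of which Plan 9's regexp
library provides:)

	# print the value of every key=value line whose key is not 'path'
	perl -ne 'print "$+{val}\n" if /^(?!path=)\w+=(?<val>\S+)/' file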

Ruda



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-03 13:10         ` Ethan Grammatikidis
@ 2010-05-03 13:41           ` Steve Simon
  2010-05-03 15:18             ` Ethan Grammatikidis
  0 siblings, 1 reply; 51+ messages in thread
From: Steve Simon @ 2010-05-03 13:41 UTC (permalink / raw)
  To: 9fans

> on Plan 9 you'd probably want to make a wrapper for grep anyway if you
> do a lot of recursive searching.

Or just apply runs grep -r patch...

-Steve



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-03 12:13   ` Mathieu Lonjaret
  2010-05-03 12:18     ` Akshat Kumar
@ 2010-05-03 14:03     ` erik quanstrom
  1 sibling, 0 replies; 51+ messages in thread
From: erik quanstrom @ 2010-05-03 14:03 UTC (permalink / raw)
  To: 9fans

> http://betterthangrep.com/
>
> it does not seem to work out of the box (expecting some unix paths), but
> since there's a perl port and that thing is supposed to be more or
> less self contained (for the standalone version), maybe it's not too
> much work for someone interested enough.

don't be silly.  russ wrote something like this in pure sh(1) for p9p,
g.  i reimplemented it in pure rc(1) and added gh (grep in headers)
and gf (grep function).  since g is an engine, you can add other
specialized search functions.

contrib quanstro/g
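
(A minimal rc sketch of such an engine, assuming Plan 9 rc, du, awk and grep;
a hypothetical illustration, not quanstro's actual g script:)

#!/bin/rc
# g pattern [dir]: grep C-ish source files under dir (default .)
rfork e
if(~ $#* 0){
	echo 'usage: g pattern [dir]' >[1=2]
	exit usage
}
dir=.
if(~ $#* 2) dir=$2
# list every file, keep source-file names, grep them;
# assumes source file names contain no spaces
grep -n $1 `{du -a $dir | awk '-F\t' '{print $2}' | grep '\.[chsy]$'}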

- erik



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-03 13:17       ` Rudolf Sykora
@ 2010-05-03 14:53         ` erik quanstrom
  2010-05-03 18:34           ` Jorden M
  0 siblings, 1 reply; 51+ messages in thread
From: erik quanstrom @ 2010-05-03 14:53 UTC (permalink / raw)
  To: 9fans

> It's always been easier for me to use python's/perl's regular
> expressions when I needed to process a text file than to use plan9's.
> For simple things, e.g. while editing an ordinary text in acme/sam,
> plan9's regexps are just fine.

i find it hard to think of cases where i would need
such sophistication and where tokenization or
tokenization plus parsing wouldn't be a better idea.

for example, you could write a re to parse the output
of ls -l and or ps.  but awk '{print $field}' is so much
easier to write and read.

so in all, i view perl "regular" expressions as a tough sell.
i think they're harder to write, harder to read, require more
and more unstable code, and slower.

one could speculate that perl, by encouraging a
monolithic, rather than tools-based approach;
and cleverness over clarity made perl expressions
the logical next step.  if so, i question the assumptions.

- erik



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-03 13:41           ` Steve Simon
@ 2010-05-03 15:18             ` Ethan Grammatikidis
  2010-05-03 15:29               ` jake
  2010-05-03 15:37               ` Steve Simon
  0 siblings, 2 replies; 51+ messages in thread
From: Ethan Grammatikidis @ 2010-05-03 15:18 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On 3 May 2010, at 14:41, Steve Simon wrote:

>> on Plan 9 you'd probably want to make a wrapper for grep anyway if
>> you
>> do a lot of recursive searching.
>
> Or just apply runs grep -r patch...

% man 1 grep | grep '\-r'
%

>
> -Steve
>

--
Simplicity does not precede complexity, but follows it. -- Alan Perlis




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-03 15:18             ` Ethan Grammatikidis
@ 2010-05-03 15:29               ` jake
  2010-05-03 15:46                 ` Ethan Grammatikidis
  2010-05-03 15:37               ` Steve Simon
  1 sibling, 1 reply; 51+ messages in thread
From: jake @ 2010-05-03 15:29 UTC (permalink / raw)
  To: 9fans

> On 3 May 2010, at 14:41, Steve Simon wrote:
>> Or just apply runs grep -r patch...
>
> % man 1 grep | grep '\-r'
> %
>
Key word being patch.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-03 15:18             ` Ethan Grammatikidis
  2010-05-03 15:29               ` jake
@ 2010-05-03 15:37               ` Steve Simon
  1 sibling, 0 replies; 51+ messages in thread
From: Steve Simon @ 2010-05-03 15:37 UTC (permalink / raw)
  To: 9fans

> > Or just apply runs grep -r patch...
> % man 1 grep | grep '\-r'

s/runs/ron's/

see 9fans passim for the patch.

-Steve



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-03 15:29               ` jake
@ 2010-05-03 15:46                 ` Ethan Grammatikidis
  0 siblings, 0 replies; 51+ messages in thread
From: Ethan Grammatikidis @ 2010-05-03 15:46 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On 3 May 2010, at 16:29, jake@9srv.net wrote:

>> On 3 May 2010, at 14:41, Steve Simon wrote:
>>> Or just apply runs grep -r patch...
>>
>> % man 1 grep | grep '\-r'
>> %
>>
> Key word being patch.

Oh right! Well, if the point of this thread is to talk about something
better than grep -r in the first place...

eh, whateva.

--
Simplicity does not precede complexity, but follows it. -- Alan Perlis




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-03 14:53         ` erik quanstrom
@ 2010-05-03 18:34           ` Jorden M
  2010-05-04 10:01             ` Ethan Grammatikidis
  0 siblings, 1 reply; 51+ messages in thread
From: Jorden M @ 2010-05-03 18:34 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, May 3, 2010 at 10:53 AM, erik quanstrom <quanstro@quanstro.net> wrote:
>> It's always been easier for me to use python's/perl's regular
>> expressions when I needed to process a text file than to use plan9's.
>> For simple things, e.g. while editing an ordinary text in acme/sam,
>> plan9's regexps are just fine.
>
> i find it hard to think of cases where i would need
> such sophistication and where tokenization or
> tokenization plus parsing wouldn't be a better idea.

A lot of the `sophisticated' Perl I've seen uses some horrible regexes
when really the job would have been done better and faster by a
simple, job-specific parser.

I've yet to find out why this happens so much, but I think I can
narrow it to a combination of ignorance, laziness, and perhaps that
all-too-frequent assumption `oh, I can do this in 10 lines with perl!'
I guess by the time you've written half a parser in line noise, it's
too late to quit while you're behind.

>
> for example, you could write a re to parse the output
> of ls -l and or ps.  but awk '{print $field}' is so much
> easier to write and read.
>
> so in all, i view perl "regular" expressions as a tough sell.
> i think they're harder to write, harder to read, require more
> and more unstable code, and slower.
>
> one could speculate that perl, by encouraging a
> monolithic, rather than tools-based approach;
> and cleverness over clarity made perl expressions
> the logical next step.  if so, i question the assumptions.
>
> - erik
>
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-03 18:34           ` Jorden M
@ 2010-05-04 10:01             ` Ethan Grammatikidis
  2010-05-04 10:29               ` Robert Raschke
  2010-05-04 15:38               ` Jorden M
  0 siblings, 2 replies; 51+ messages in thread
From: Ethan Grammatikidis @ 2010-05-04 10:01 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On 3 May 2010, at 19:34, Jorden M wrote:

> On Mon, May 3, 2010 at 10:53 AM, erik quanstrom
> <quanstro@quanstro.net> wrote:
>>> It's always been easier for me to use python's/perl's regular
>>> expressions when I needed to process a text file than to use
>>> plan9's.
>>> For simple things, e.g. while editing an ordinary text in acme/sam,
>>> plan9's regexps are just fine.
>>
>> i find it hard to think of cases where i would need
>> such sophistication and where tokenization or
>> tokenization plus parsing wouldn't be a better idea.
>
> A lot of the `sophisticated' Perl I've seen uses some horrible regexes
> when really the job would have been done better and faster by a
> simple, job-specific parser.
>
> I've yet to find out why this happens so much, but I think I can
> narrow it to a combination of ignorance, laziness, and perhaps that
> all-too-frequent assumption `oh, I can do this in 10 lines with perl!'
> I guess by the time you've written half a parser in line noise, it's
> too late to quit while you're behind.

I think it's ignorance and something. I'm not sure what that something
is. I am sure if you tried to suggest writing a parser to many of the
open-sourcers I've talked to you would be treated as if you were
suggesting a big job rather than a small one. "Why Write a Parser,"
they would ask, "when I can just scribble a few little lines of perl?"

Maybe it's humans' natural tendencies toward hierarchy coming into
play. Stuff known by Teachers and Masters easily takes on a bizarre
kind of importance, rank is unconsciously attached, and the student
naturally but unconsciously feels he is not of sufficient rank to
attempt the Master's Way. That explanation does pre-suppose humans
have a very strong natural tendency to hierarchy. I find sufficient
evidence within myself to believe it's true, as unpopular as the idea
may be. Perhaps some people are more strongly inclined that way than
others. Anyway, it's the only explanation I can imagine for the
phenomena.

>
>>
>> for example, you could write a re to parse the output
>> of ls -l and or ps.  but awk '{print $field}' is so much
>> easier to write and read.
>>
>> so in all, i view perl "regular" expressions as a tough sell.
>> i think they're harder to write, harder to read, require more
>> and more unstable code, and slower.
>>
>> one could speculate that perl, by encouraging a
>> monolithic, rather than tools-based approach;
>> and cleverness over clarity made perl expressions
>> the logical next step.  if so, i question the assumptions.
>>
>> - erik
>>
>>
>

--
Simplicity does not precede complexity, but follows it. -- Alan Perlis




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-04 10:01             ` Ethan Grammatikidis
@ 2010-05-04 10:29               ` Robert Raschke
  2010-05-04 15:38               ` Jorden M
  1 sibling, 0 replies; 51+ messages in thread
From: Robert Raschke @ 2010-05-04 10:29 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 1617 bytes --]

On Tue, May 4, 2010 at 11:01 AM, Ethan Grammatikidis <eekee57@fastmail.fm>wrote:

> On 3 May 2010, at 19:34, Jorden M wrote:
>
>> I've yet to find out why this happens so much, but I think I can
>> narrow it to a combination of ignorance, laziness, and perhaps that
>> all-too-frequent assumption `oh, I can do this in 10 lines with perl!'
>> I guess by the time you've written half a parser in line noise, it's
>> too late to quit while you're behind.
>>
>
> I think it's ignorance and something. I'm not sure what that something is.
> I am sure if you tried to suggest writing a parser to many of the
> open-sourcers I've talked to you would be treated as if you were suggesting
> a big job rather than a small one. "Why Write a Parser,"  they would ask,
> "when I can just scribble a few little lines of perl?"
>
>
I'd think it's simply not knowing that there are easier ways of doing it. It
is just not taught. Also, people learn about parsers in that really scary
module about compilers and never give them a second thought afterwards. And
anything else to do with strings is usually hopelessly complicated stuff
involving indices into character arrays.

Then there's the "kudos" of writing write-only code. Even the writer doesn't
understand it anymore, but nobody else knows that, so ...

I always found it a wee bit sad that Icon (http://www.cs.arizona.edu/icon/)
never really had much of an impact in the "let's take this string apart"
problem domain. If I need something quick and dirty, it's my "secret" tool
for "parsing" stuff quickly. String scanning is trivial.

Robby

[-- Attachment #2: Type: text/html, Size: 2226 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-04 10:01             ` Ethan Grammatikidis
  2010-05-04 10:29               ` Robert Raschke
@ 2010-05-04 15:38               ` Jorden M
  2010-05-04 16:56                 ` Gabriel Díaz
  1 sibling, 1 reply; 51+ messages in thread
From: Jorden M @ 2010-05-04 15:38 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Tue, May 4, 2010 at 6:01 AM, Ethan Grammatikidis <eekee57@fastmail.fm> wrote:
>
> On 3 May 2010, at 19:34, Jorden M wrote:
>
>> On Mon, May 3, 2010 at 10:53 AM, erik quanstrom <quanstro@quanstro.net>
>> wrote:
>>>>
>>>> It's always been easier for me to use python's/perl's regular
>>>> expressions when I needed to process a text file than to use plan9's.
>>>> For simple things, e.g. while editing an ordinary text in acme/sam,
>>>> plan9's regexps are just fine.
>>>
>>> i find it hard to think of cases where i would need
>>> such sophistication and where tokenization or
>>> tokenization plus parsing wouldn't be a better idea.
>>
>> A lot of the `sophisticated' Perl I've seen uses some horrible regexes
>> when really the job would have been done better and faster by a
>> simple, job-specific parser.
>>
>> I've yet to find out why this happens so much, but I think I can
>> narrow it to a combination of ignorance, laziness, and perhaps that
>> all-too-frequent assumption `oh, I can do this in 10 lines with perl!'
>> I guess by the time you've written half a parser in line noise, it's
>> too late to quit while you're behind.
>
> I think it's ignorance and something. I'm not sure what that something is. I
> am sure if you tried to suggest writing a parser to many of the
> open-sourcers I've talked to you would be treated as if you were suggesting

I can attest that it's not just open-source folk.

> a big job rather than a small one. "Why Write a Parser,"  they would ask,
> "when I can just scribble a few little lines of perl?"

That phenomenon is true, and if you take it further once that person
is done writing their abominable perl, and point out that they've
written a parser anyway, but poorly (not to mention one that would
have to be totally rewritten to be modified), they look at you
crosseyed and say `whatever.'

>
> Maybe it's humans' natural tendencies toward hierarchy coming into play.
> Stuff known by Teachers and Masters easily takes on a bizarre kind of
> importance, rank is unconsciously attached, and the student naturally but
> unconsciously feels he is not of sufficient rank to attempt the Master's
> Way. That explanation does pre-suppose humans have a very strong natural
> tendency to hierarchy. I find sufficient evidence within myself to believe
> it's true, as unpopular as the idea may be. Perhaps some people are more
> strongly inclined that way than others. Anyway, it's the only explanation I
> can imagine for the phenomena.
>

Pretty much. Like Raschke mentioned, people as students are
conditioned to think that parsers are hard to do because they're a
piece of a compiler, and that Dragon book is too big and scary and
only Gods can write compilers and parsers, etc.. Another function of
the `parsers are too hard' mentality is that people don't recognize
the difference between something that's regular and something that's
CF, and spend days scratching their head wondering why their regexes
break all over the place. Situations often become complicated when
self-proclaimed perl experts drop in and go, `oh here, you just add
this case and that case and you should be fine X% of the time!', where
X is a BS figure pulled out of you know where.

I think what we have here can be construed as a failure of CS
education, which fits right in with the many failures of education at
large.

>>
>>>
>>> for example, you could write a re to parse the output
>>> of ls -l and or ps.  but awk '{print $field}' is so much
>>> easier to write and read.
>>>
>>> so in all, i view perl "regular" expressions as a tough sell.
>>> i think they're harder to write, harder to read, require more
>>> and more unstable code, and slower.
>>>
>>> one could speculate that perl, by encouraging a
>>> monolithic, rather than tools-based approach;
>>> and cleverness over clarity made perl expressions
>>> the logical next step.  if so, i question the assumptions.
>>>
>>> - erik
>>>
>>>
>>
>
> --
> Simplicity does not precede complexity, but follows it. -- Alan Perlis
>
>
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-04 15:38               ` Jorden M
@ 2010-05-04 16:56                 ` Gabriel Díaz
  2010-05-04 18:39                   ` Karljurgen Feuerherm
  0 siblings, 1 reply; 51+ messages in thread
From: Gabriel Díaz @ 2010-05-04 16:56 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Hello


(about students/trainees and perl)

Being able to recognize what you've studied in your daily work is quite difficult in most places. Also, your work objectives are rarely related to correctness in the scientific sense. I mean, something correct or good enough for the business may not be correct or good enough from the science point of view.

Speaking of non-programming-related business, for me it's enough if a student is able to use, or ask for, a programming language to solve a task: perl, vbscript or whatever. I've seen, a couple of times, students matching two lists of thousands of entries by hand, either on paper or in the original excel format. And I've seen mentors and managers agree with the method. If they can write regexps, even ugly ones, that's enough; you can show them alternatives, suggest other ways, etc.

The failure is not the school's, or not completely. The tools are given to you; it is not usual that you can choose the tool you want to use to finish a task. In nice places, you might be able to propose one. . .

slds.

gabi




----- Original Message ----
From: Jorden M <jrm8005@gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Sent: Tue, May 4, 2010 5:38:35 PM
Subject: Re: [9fans] du and find

On Tue, May 4, 2010 at 6:01 AM, Ethan Grammatikidis <eekee57@fastmail.fm> wrote:
>
> On 3 May 2010, at 19:34, Jorden M wrote:
>
>> On Mon, May 3, 2010 at 10:53 AM, erik quanstrom <quanstro@quanstro.net>
>> wrote:
>>>>
>>>> It's always been easier for me to use python's/perl's regular
>>>> expressions when I needed to process a text file than to use plan9's.
>>>> For simple things, e.g. while editing an ordinary text in acme/sam,
>>>> plan9's regexps are just fine.
>>>
>>> i find it hard to think of cases where i would need
>>> such sophistication and where tokenization or
>>> tokenization plus parsing wouldn't be a better idea.
>>
>> A lot of the `sophisticated' Perl I've seen uses some horrible regexes
>> when really the job would have been done better and faster by a
>> simple, job-specific parser.
>>
>> I've yet to find out why this happens so much, but I think I can
>> narrow it to a combination of ignorance, laziness, and perhaps that
>> all-too-frequent assumption `oh, I can do this in 10 lines with perl!'
>> I guess by the time you've written half a parser in line noise, it's
>> too late to quit while you're behind.
>
> I think it's ignorance and something. I'm not sure what that something is. I
> am sure if you tried to suggest writing a parser to many of the
> open-sourcers I've talked to you would be treated as if you were suggesting

I can attest that it's not just open-source folk.

> a big job rather than a small one. "Why Write a Parser,"  they would ask,
> "when I can just scribble a few little lines of perl?"

That phenomenon is true, and if you take it further once that person
is done writing their abominable perl, and point out that they've
written a parser anyway, but poorly (not to mention one that would
have to be totally rewritten to be modified), they look at you
crosseyed and say `whatever.'

>
> Maybe it's humans' natural tendencies toward hierarchy coming into play.
> Stuff known by Teachers and Masters easily takes on a bizarre kind of
> importance, rank is unconsciously attached, and the student naturally but
> unconsciously feels he is not of sufficient rank to attempt the Master's
> Way. That explanation does pre-suppose humans have a very strong natural
> tendency to hierarchy. I find sufficient evidence within myself to believe
> it's true, as unpopular as the idea may be. Perhaps some people are more
> strongly inclined that way than others. Anyway, it's the only explanation I
> can imagine for the phenomena.
>

Pretty much. Like Raschke mentioned, people as students are
conditioned to think that parsers are hard to do because they're a
piece of a compiler, and that Dragon book is too big and scary and
only Gods can write compilers and parsers, etc.. Another function of
the `parsers are too hard' mentality is that people don't recognize
the difference between something that's regular and something that's
CF, and spend days scratching their head wondering why their regexes
break all over the place. Situations often become complicated when
self-proclaimed perl experts drop in and go, `oh here, you just add
this case and that case and you should be fine X% of the time!', where
X is a BS figure pulled out of you know where.

I think what we have here can be construed as a failure of CS
education, which fits right in with the many failures of education at
large.

>>
>>>
>>> for example, you could write a re to parse the output
>>> of ls -l and or ps.  but awk '{print $field}' is so much
>>> easier to write and read.
>>>
>>> so in all, i view perl "regular" expressions as a tough sell.
>>> i think they're harder to write, harder to read, require more
>>> and more unstable code, and slower.
>>>
>>> one could speculate that perl, by encouraging a
>>> monolithic, rather than tools-based approach;
>>> and cleverness over clarity made perl expressions
>>> the logical next step.  if so, i question the assumptions.
>>>
>>> - erik
>>>
>>>
>>
>
> --
> Simplicity does not precede complexity, but follows it. -- Alan Perlis
>
>
>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-05-04 16:56                 ` Gabriel Díaz
@ 2010-05-04 18:39                   ` Karljurgen Feuerherm
  0 siblings, 0 replies; 51+ messages in thread
From: Karljurgen Feuerherm @ 2010-05-04 18:39 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

My impression as an undergraduate in CS was that most of my peers were
mechanics rather than artists. They could ape things, but only a few
could see past what was shown and apply the principles abstractly.

This may have to do with failures earlier in their education--I remember
that, again, peers could do 'picture frame problems' but without any real
comprehension of the actual algebra.

On the other hand, it may just be a question of what human beings are,
and how few artists there are, proportionately speaking....

K

>>> Gabriel Díaz <gdiaz@rejaa.com> 04/05/2010 12:56 pm >>>
Hello


(about students/trainees and perl)

Being able to recognize what you've studied in your daily work is quite
difficult in most places. Also, your work objectives are rarely related
to correctness in the scientific sense. I mean, something correct or
good enough for the business may not be correct or good enough from the
science point of view.

Speaking of non-programming-related business, for me it's enough if a
student is able to use, or ask for, a programming language to solve a
task: perl, vbscript or whatever. I've seen, a couple of times, students
matching two lists of thousands of entries by hand, either on paper or
in the original excel format. And I've seen mentors and managers agree
with the method. If they can write regexps, even ugly ones, that's
enough; you can show them alternatives, suggest other ways, etc.

The failure is not the school's, or not completely. The tools are given
to you; it is not usual that you can choose the tool you want to use to
finish a task. In nice places, you might be able to propose one. . .

slds.

gabi




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-01-03  2:40         ` erik quanstrom
@ 2010-01-06 20:44           ` Akshat Kumar
  0 siblings, 0 replies; 51+ messages in thread
From: Akshat Kumar @ 2010-01-06 20:44 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 190 bytes --]

>> Given the way Unix programs
>> behave you can't replace arg list with an arg fd (I used to
>
> didn't know this was "unixfans".  will keep that in mind.
>
> - erik

Jules says...

[-- Attachment #2: topic.jpg --]
[-- Type: image/jpeg, Size: 66069 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-01-03  2:31       ` Bakul Shah
@ 2010-01-03  2:40         ` erik quanstrom
  2010-01-06 20:44           ` Akshat Kumar
  0 siblings, 1 reply; 51+ messages in thread
From: erik quanstrom @ 2010-01-03  2:40 UTC (permalink / raw)
  To: 9fans

> Given the way Unix programs
> behave you can't replace arg list with an arg fd (I used to

didn't know this was "unixfans".  will keep that in mind.

- erik



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-01-03  1:49     ` erik quanstrom
@ 2010-01-03  2:31       ` Bakul Shah
  2010-01-03  2:40         ` erik quanstrom
  0 siblings, 1 reply; 51+ messages in thread
From: Bakul Shah @ 2010-01-03  2:31 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, 02 Jan 2010 20:49:39 EST erik quanstrom <quanstro@quanstro.net>  wrote:
> > And can eat up a lot of memory or even run out of it.  On a
> > 2+ year old MacBookPro "find -x /" takes 4.5 minutes for 1.6M
> > files and 155MB to hold paths.  My 11 year old machine has 64MB
> > and over a million files on a rather slow disk. Your solution
> > would run out of space on it.
>
> modern cat wouldn't fit in core on the early pdps unix was
> developed on!

No point in gratuitously obsoleting old machines.  I am
running FreeBSD-7.2 on my 11yo machine and so far it has
stood up well enough.

> just to be fair, could you fit your 1.6m files on your 11yo machine?
> i'm guessing you couldn't.

Yes. It's on its third disk. A 6yo 80G IDE disk.

> > Basically this is just streams programming for arguments
> > instead of data.
>
> that's fine.  but it's no excuse to hobble exec.  not unless
> you're prepared to replace argument lists with an argument
> fd.

Not sure how exec is hobbled.  Given the way Unix programs
behave you can't replace arg list with an arg fd (I used to
carry around a library to do just that, but the problem is all
the standard programs). Anyway, I don't see how xargs can be
gotten rid of.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-01-02 18:43   ` roger peppe
@ 2010-01-03  2:28     ` Anthony Sorace
  0 siblings, 0 replies; 51+ messages in thread
From: Anthony Sorace @ 2010-01-03  2:28 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Rog said:

> that's why breadth-first might be useful, by putting
> shallower files earlier in the search results - i often
> do grep foo *.[ch] */*.[ch] */*/*.[ch] to achieve
> a similar result, but you have to guess the depth that way.

for what it's worth, dan's walk.c has a -d option for limiting search
depth. it's not breadth-first, but is still nice.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-01-02 23:21   ` Bakul Shah
@ 2010-01-03  1:49     ` erik quanstrom
  2010-01-03  2:31       ` Bakul Shah
  0 siblings, 1 reply; 51+ messages in thread
From: erik quanstrom @ 2010-01-03  1:49 UTC (permalink / raw)
  To: 9fans

> And can eat up a lot of memory or even run out of it.  On a
> 2+ year old MacBookPro "find -x /" takes 4.5 minutes for 1.6M
> files and 155MB to hold paths.  My 11 year old machine has 64MB
> and over a million files on a rather slow disk. Your solution
> would run out of space on it.

modern cat wouldn't fit in core on the early pdps unix was
developed on!

just to be fair, could you fit your 1.6m files on your 11yo machine?
i'm guessing you couldn't.

> Basically this is just streams programming for arguments
> instead of data.

that's fine.  but it's no excuse to hobble exec.  not unless
you're prepared to replace argument lists with an argument
fd.

- erik



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-01-02 19:47 ` erik quanstrom
@ 2010-01-02 23:21   ` Bakul Shah
  2010-01-03  1:49     ` erik quanstrom
  0 siblings, 1 reply; 51+ messages in thread
From: Bakul Shah @ 2010-01-02 23:21 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, 02 Jan 2010 14:47:26 EST erik quanstrom <quanstro@quanstro.net>  wrote:
>
> my beef with xargs is only that it is used as an excuse
> for not fixing exec in unix.  it's also used to bolster the
> "that's a rare case" argument.

I often do something like the following:

  find . -type f <condition> | xargs grep -l <pattern> | xargs <command>

If by "fixing exec in unix" you mean allowing something like

  <command> $(grep -l <pattern> $(find . -type f <condition>))

then <command> would take far too long to even get started.
And can eat up a lot of memory or even run out of it.  On a
2+ year old MacBookPro "find -x /" takes 4.5 minutes for 1.6M
files and 155MB to hold paths.  My 11 year old machine has 64MB
and over a million files on a rather slow disk. Your solution
would run out of space on it.  Now granted I should update it
to a more balanced system but mechanisms should continue
working even if one doesn't have an optimal system.  At least
xargs gives me that choice.

Basically this is just streams programming for arguments
instead of data. Ideally all the args would be taken from a
stream (and specifying args on a command line would be just a
convenience) but it is too late for that.  Often unix
commands have a -r option to walk a file tree but it would've
been nicer to have the tree walk factored out. Then you can
do things like breadth first walk etc. and have everyone
benefit.
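
(For instance, a small illustration assuming a Unix userland; the walker on the
left of the pipe can be swapped, say for a breadth-first one, without touching
grep:)

	grep -r pattern .                    # tree walk built into the command
	find . -type f | xargs grep pattern  # tree walk factored out into the pipeline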



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
       [not found] <<df49a7371001021043p2a990207od65457a068b7828@mail.gmail.com>
@ 2010-01-02 19:47 ` erik quanstrom
  2010-01-02 23:21   ` Bakul Shah
  0 siblings, 1 reply; 51+ messages in thread
From: erik quanstrom @ 2010-01-02 19:47 UTC (permalink / raw)
  To: 9fans

> i'm not saying it can't be passed in an argument list, just that
> xargs gives you a lazy evaluation of the walk
> of the file tree which can result in a faster result
> when the result is found earlier in the file list.

i have no problem with breadth-first.

my beef with xargs is only that it is used as an excuse
for not fixing exec in unix.  it's also used to bolster the
"that's a rare case" argument.

imho "rare case" arguments work best if the downside
is that the "rare case" is slow or awkward.  in this case
it's broken.

> that's why breadth-first might be useful, by putting
> shallower files earlier in the search results - i often
> do grep foo *.[ch] */*.[ch] */*/*.[ch] to achieve
> a similar result, but you have to guess the depth that way.

clearly i'm not in your league.  my source trees are
smaller than that.  no more than two levels.  or, it
doesn't matter.  on a few machines laying around
the house (details on the poorly-chosen disks here:
http://www.quanstro.net/plan9/fs.html)

i7 2666		0.36u 0.42s 7.35r
Atom 1605	0.83u 1.88s 8.85r
AMD64 2007	0.94u 0.97s 12.51r

and on the fast 10gbe stuff at coraid
Xeon5000 1865 	0.45u 0.83s 4.33r	10gbe myricom
PIV/Xeon 3003 	0.66u 1.41s 7.32r	i82573

(it would be fun to put the i7 with 10gbe
together!)

it's easy to try the completely uncached case at
coraid since the working set is about 7gb and the
cache is only 3.5

Xeon5000 1874 	0.50u 0.84s 27.63r

- erik



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-01-02  2:02 ` erik quanstrom
  2010-01-02  5:29   ` anonymous
@ 2010-01-02 18:43   ` roger peppe
  2010-01-03  2:28     ` Anthony Sorace
  1 sibling, 1 reply; 51+ messages in thread
From: roger peppe @ 2010-01-02 18:43 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2010/1/2 erik quanstrom <quanstro@quanstro.net>:
>> and /sys/src isn't by any means the largest tree i like to grep
>> (for instance, searching for lost files with a name i longer remember,
>> i've been known to search through all the files in my home directory,
>> ~425000 files at last count)
>>
>> sometimes i think it would be nice if du had a breadth-first option.
>
> aren't you contradicting yourself?  at 128 characters/file,
> that's only 52mb -- 2% of memory on a typical system these days.
> why can't it be passed as an argument list?

i'm not saying it can't be passed in an argument list, just that
xargs gives you a lazy evaluation of the walk
of the file tree which can result in a faster result
when the result is found earlier in the file list.

that's why breadth-first might be useful, by putting
shallower files earlier in the search results - i often
do grep foo *.[ch] */*.[ch] */*/*.[ch] to achieve
a similar result, but you have to guess the depth that way.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-01-02 17:05 ` erik quanstrom
@ 2010-01-02 18:18   ` anonymous
  0 siblings, 0 replies; 51+ messages in thread
From: anonymous @ 2010-01-02 18:18 UTC (permalink / raw)
  To: 9fans

Yes, you are right. I had forgotten about the cache. But probably the
cache is the reason why du -a takes 25s?




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
       [not found] <<20100102052943.GA9871@machine>
@ 2010-01-02 17:05 ` erik quanstrom
  2010-01-02 18:18   ` anonymous
  0 siblings, 1 reply; 51+ messages in thread
From: erik quanstrom @ 2010-01-02 17:05 UTC (permalink / raw)
  To: 9fans

> On Fri, Jan 01, 2010 at 09:02:28PM -0500, erik quanstrom wrote:
> > > you've got a fast system.
> > > in at least one system i use, du -a of /sys/src takes about 25s.
> >
> > i have a humble 2y.o. single-core 35w celeron as a fileserver.
> >
> Speed of `du' depends on I/O, not CPU.

really?  have you tested this?  i've always found the
two to be related.

first, most fileservers have an in-memory block cache.
unless your active set is really big, most directories should
be in the in-memory cache.

when in memory cache, the time it takes to acquire
block cache locks and copy data dominates.  for
fossil+venti this factor is multiplied by 2 plus 2 trips
through the kernel.  so for memory-cached blocks,
fileserver speed is entirely dependent on network+
cpu.  the proportion of memory cache is of course
proportional to one's cache sizes.

sure, you can take this to extremes where the size of
the memory cache is so small, that the memory cache
doesn't matter, or the speed of the network is so slow
(find /n/sources) that nothing other than disk io or
network speed matters.

second, each directory read requires a number of 9p messages.
in the current system, each incurs the full rtt penalty.
so the network latency is a really big factor in du
performance.

you can test to see how the du speed is related to
network performance very easily if you have the right
sort of network card.  just adjust the interrupt coalescing
values.  (Tidv/Tadv in ether82563, or
echo coal $µs>/net/ether$n/clone for etherm10g)

and the cool thing is that especially for tcp, i've
found cpu speed and network latency to be pretty
important.

- erik



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-01-02  2:02 ` erik quanstrom
@ 2010-01-02  5:29   ` anonymous
  2010-01-02 18:43   ` roger peppe
  1 sibling, 0 replies; 51+ messages in thread
From: anonymous @ 2010-01-02  5:29 UTC (permalink / raw)
  To: 9fans

On Fri, Jan 01, 2010 at 09:02:28PM -0500, erik quanstrom wrote:
> > you've got a fast system.
> > in at least one system i use, du -a of /sys/src takes about 25s.
>
> i have a humble 2y.o. single-core 35w celeron as a fileserver.
>
Speed of `du' depends on I/O, not CPU.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
       [not found] <<df49a7371001011744o6687fd59l451d690ea56edea5@mail.gmail.com>
@ 2010-01-02  2:02 ` erik quanstrom
  2010-01-02  5:29   ` anonymous
  2010-01-02 18:43   ` roger peppe
  0 siblings, 2 replies; 51+ messages in thread
From: erik quanstrom @ 2010-01-02  2:02 UTC (permalink / raw)
  To: 9fans

> because the limit is big enough that cases that break the
> limit almost never happen except in this case?

we can easily fit all the files in most any system in memory.
why shouldn't that be the limit?   see below.

> > i'm not sure i understand when and why this would be useful.  nobody
> > has a real worm anymore.  i can walk /sys/src in 0.5s.
>
> you've got a fast system.
> in at least one system i use, du -a of /sys/src takes about 25s.

i have a humble 2y.o. single-core 35w celeron as a fileserver.

> and /sys/src isn't by any means the largest tree i like to grep
> (for instance, searching for lost files with a name i no longer remember,
> i've been known to search through all the files in my home directory,
> ~425000 files at last count)
>
> sometimes i think it would be nice if du had a breadth-first option.

aren't you contradicting yourself?  at 128 characters/file,
that's only 52mb -- 2% of memory on a typical system these days.
why can't it be passed as an argument list?

- erik



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2010-01-02  0:51 ` erik quanstrom
@ 2010-01-02  1:44   ` roger peppe
  0 siblings, 0 replies; 51+ messages in thread
From: roger peppe @ 2010-01-02  1:44 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2010/1/2 erik quanstrom <quanstro@quanstro.net>:
> using xargs does work around the problem.  but then why not
> go all the way and remove ` from rc?  after all, ` only works some
> of the time?

because the limit is big enough that cases that break the
limit almost never happen except in this case?

> i'm not sure i understand when and why this would be useful.  nobody
> has a real worm anymore.  i can walk /sys/src in 0.5s.

you've got a fast system.
in at least one system i use, du -a of /sys/src takes about 25s.

and /sys/src isn't by any means the largest tree i like to grep
(for instance, searching for lost files with a name i no longer remember,
i've been known to search through all the files in my home directory,
~425000 files at last count)

sometimes i think it would be nice if du had a breadth-first option.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
       [not found] <<df49a7371001011234r3aaaf961n8ce253a6681e74b1@mail.gmail.com>
@ 2010-01-02  0:51 ` erik quanstrom
  2010-01-02  1:44   ` roger peppe
  0 siblings, 1 reply; 51+ messages in thread
From: erik quanstrom @ 2010-01-02  0:51 UTC (permalink / raw)
  To: 9fans

> i don't really see why xargs (the idea, not the usual unix implementations)
> is inherently such a bad idea. years ago i wrote an ultra simple version
> with no options, and it's about 80 lines of code, which i use to grep
> through all of /sys/src for example.

that's interesting.  my objection to xargs is all about the idea.
the usual way of doing things breaks if there are too many files.
using xargs does work around the problem.  but then why not
go all the way and remove ` from rc?  after all, ` only works some
of the time?

> if you always split on \n (which is banned in filenames in plan 9) and
> don't interpret any other metacharacters, what's the problem?
>
> it's also nice because you often get some results before you've
> walked the entire file tree.

i'm not sure i understand when and why this would be useful.  nobody
has a real worm anymore.  i can walk /sys/src in 0.5s.  grepping takes
about 12s; saving 1/24th the time (best case) doesn't seem like a
big win.  (the ratio is 10:1 on coraid's fileserver.)

- erik



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2009-12-29  0:41 erik quanstrom
  2009-12-29  1:03 ` Lyndon Nerenberg (VE6BBM/VE7TFX)
@ 2010-01-01 20:34 ` roger peppe
  1 sibling, 0 replies; 51+ messages in thread
From: roger peppe @ 2010-01-01 20:34 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2009/12/29 erik quanstrom <quanstro@quanstro.net>:
> what seems more important to me is a way to unlimit the size
> of argv.  otherwise we'll need to go down the hideous xargs path.
> (apologies to hideous functions everywhere for the slur.)

i don't really see why xargs (the idea, not the usual unix implementations)
is inherently such a bad idea. years ago i wrote an ultra simple version
with no options, and it's about 80 lines of code, which i use to grep
through all of /sys/src for example.

if you always split on \n (which is banned in filenames in plan 9) and
don't interpret any other metacharacters, what's the problem?

it's also nice because you often get some results before you've
walked the entire file tree.
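
(A minimal sketch of that idea, assuming Plan 9 rc and read(1); the batch size
is arbitrary and the script, here called myxargs, is a hypothetical
illustration, not roger's 80-line version:)

#!/bin/rc
# read newline-separated names from stdin and run the given command on
# batches of at most 100 of them, e.g.
#	du -a | sed 's/^[0-9]*	//' | myxargs grep -n pattern
rfork e
ifs='
'	# split `{read} output on newlines only, so spaces in names survive
batch=()
while(f=`{read}){
	batch=($batch $f)
	if(~ $#batch 100){
		$* $batch
		batch=()
	}
}
if(! ~ $#batch 0) $* $batch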



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
       [not found] <<af7cc2a5be1668a4f2cb708f5bd96f67@yyc.orthanc.ca>
@ 2009-12-29  1:22 ` erik quanstrom
  0 siblings, 0 replies; 51+ messages in thread
From: erik quanstrom @ 2009-12-29  1:22 UTC (permalink / raw)
  To: 9fans

On Mon Dec 28 20:04:48 EST 2009, lyndon@orthanc.ca wrote:
> > what seems more important to me is a way to unlimit the size
> > of argv.  otherwise we'll need to go down the hideous xargs path.
>
> How often have you run up against the current limit?  I've yet to hit
> it in anything other than contrived tests.  And even those took work.

several times a month; just often enough to be irritating.  since storage
is still going exponential, i expect this to get worse.

minooka; grep pattern `{find /sys}
grep: virtual memory allocation failed

- erik



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2009-12-29  1:00     ` anonymous
@ 2009-12-29  1:13       ` Don Bailey
  0 siblings, 0 replies; 51+ messages in thread
From: Don Bailey @ 2009-12-29  1:13 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 845 bytes --]

To be fair, the correct script on Plan 9 is academic. Just do what gets the
job done for you now. Don't go down an academic black hole. These guys have
been arguing about `find` since 2002.

D

On Mon, Dec 28, 2009 at 6:00 PM, anonymous <aim0shei@lavabit.com> wrote:

> > While it's true that you'll have misses on tabs in filenames, it's much
> more
> > rare to have a tab in a filename than it is to have a space, yes?
> >
>
> I don't have spaces either, but a correct script should not make any
> assumptions.
>
> There is an interesting date on http://swtch.com/plan9history/:
> March 23, 1999   allow spaces in file names
>
> I think it would be better to just disallow whitespace (spaces and
> tabs) in file names. It looks like the idea of using awk was there
> before whitespace was allowed, so there was no problem.
>
>
>

[-- Attachment #2: Type: text/html, Size: 1242 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2009-12-29  0:41 erik quanstrom
@ 2009-12-29  1:03 ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  2010-01-01 20:34 ` roger peppe
  1 sibling, 0 replies; 51+ messages in thread
From: Lyndon Nerenberg (VE6BBM/VE7TFX) @ 2009-12-29  1:03 UTC (permalink / raw)
  To: 9fans

> what seems more important to me is a way to unlimit the size
> of argv.  otherwise we'll need to go down the hideous xargs path.

How often have you run up against the current limit?  I've yet to hit
it in anything other than contrived tests.  And even those took work.

> find and walk are about the same program.

There are a few versions about.  Dan's has exactly the right lack of
options to meet my needs. Others might too, but his is the version I
found first.

--lyndon




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2009-12-28 23:39   ` Don Bailey
@ 2009-12-29  1:00     ` anonymous
  2009-12-29  1:13       ` Don Bailey
  0 siblings, 1 reply; 51+ messages in thread
From: anonymous @ 2009-12-29  1:00 UTC (permalink / raw)
  To: 9fans

> While it's true that you'll have misses on tabs in filenames, it's much more
> rare to have a tab in a filename than it is to have a space, yes?
>

I don't have spaces either, but a correct script should not make any assumptions.

There is an interesting date on http://swtch.com/plan9history/:
March 23, 1999	 allow spaces in file names

I think it would be better to just disallow whitespace (spaces and
tabs) in file names. It looks like the idea of using awk was there
before whitespace was allowed, so there was no problem then.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* [9fans] du and find
@ 2009-12-29  0:41 erik quanstrom
  2009-12-29  1:03 ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  2010-01-01 20:34 ` roger peppe
  0 siblings, 2 replies; 51+ messages in thread
From: erik quanstrom @ 2009-12-29  0:41 UTC (permalink / raw)
  To: 9fans

>> ; du -a | awk '-F\t' '{print $2}' -
>
>All this nonsense because the dogmatists refuse to accept
>/n/sources/contrib/cross/walk.c into the distribution.

find and walk are about the same program.  my version of
find started with andrey's.  his find page (http://mirtchovski.com/p9/find/)
is dated 31-jul-2004, predating the given walk.c by ~18 months,
though i don't know which was written first.

the reason i started fiddling with find was to see if it couldn't
go a bit faster than du. (it did.)

my canonical examples of its use are
	find | grep whereisthatfile
and
	grep whereisthatfunction `{find /sys/src|grep '\.[chlsy]$'}

i don't think it's that important that it be in the distribution;
it's a convenience.

what seems more important to me is a way to unlimit the size
of argv.  otherwise we'll need to go down the hideous xargs path.
(apologies to hideous functions everywhere for the slur.)

- erik



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2009-12-28 23:31   ` Don Bailey
@ 2009-12-28 23:50     ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  0 siblings, 0 replies; 51+ messages in thread
From: Lyndon Nerenberg (VE6BBM/VE7TFX) @ 2009-12-28 23:50 UTC (permalink / raw)
  To: 9fans

> du -a | awk '-F\t' '{print $2}' -

All this nonsense because the dogmatists refuse to accept
/n/sources/contrib/cross/walk.c into the distribution.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2009-12-28 23:35 ` erik quanstrom
@ 2009-12-28 23:39   ` Don Bailey
  2009-12-29  1:00     ` anonymous
  0 siblings, 1 reply; 51+ messages in thread
From: Don Bailey @ 2009-12-28 23:39 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 490 bytes --]

While it's true that you'll have misses on tabs in filenames, it's much more
rare to have a tab in a filename than it is to have a space, yes? There is
no loss on a single quote character. You're quoting the command line
argument.

On Mon, Dec 28, 2009 at 4:35 PM, erik quanstrom <quanstro@quanstro.net> wrote:

> On Mon Dec 28 18:32:36 EST 2009, don.bailey@gmail.com wrote:
>
> > du -a | awk '-F\t' '{print $2}' -
> >
>
> lossage on tabs and ' in filenames.
>
> - erik
>
>

[-- Attachment #2: Type: text/html, Size: 902 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
       [not found] <<68eb39920912281531jd0e4661j56adfc589a370dfc@mail.gmail.com>
@ 2009-12-28 23:35 ` erik quanstrom
  2009-12-28 23:39   ` Don Bailey
  0 siblings, 1 reply; 51+ messages in thread
From: erik quanstrom @ 2009-12-28 23:35 UTC (permalink / raw)
  To: 9fans

On Mon Dec 28 18:32:36 EST 2009, don.bailey@gmail.com wrote:

> du -a | awk '-F\t' '{print $2}' -
>

lossage on tabs and ' in filenames.

- erik



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
  2009-12-28 23:25 ` erik quanstrom
@ 2009-12-28 23:31   ` Don Bailey
  2009-12-28 23:50     ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  0 siblings, 1 reply; 51+ messages in thread
From: Don Bailey @ 2009-12-28 23:31 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 393 bytes --]

du -a | awk '-F\t' '{print $2}' -

On Mon, Dec 28, 2009 at 4:25 PM, erik quanstrom <quanstro@quanstro.net> wrote:

> i agree that du -a has a few holes.  too bad whitespace
> is allowed in file names.  i use the attached find.c.
> it's also available as contrib quanstro/find.  by default
> the output is quoted so that it can be reparsed properly
> with rc or gettokens.
>
> - erik

[-- Attachment #2: Type: text/html, Size: 709 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [9fans] du and find
       [not found] <<20091228230510.GA25423@machine>
@ 2009-12-28 23:25 ` erik quanstrom
  2009-12-28 23:31   ` Don Bailey
  0 siblings, 1 reply; 51+ messages in thread
From: erik quanstrom @ 2009-12-28 23:25 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 260 bytes --]

i agree that du -a has a few holes.  too bad whitespace
is allowed in file names.  i use the attached find.c.
it's also available as contrib quanstro/find.  by default
the output is quoted so that it can be reparsed properly
with rc or gettokens.

- erik
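
as a sketch of what that quoting buys, here is a toy consumer: it reads
the names back on standard input and undoes the quoting with tokenize
(see getfields(2)), which honors rc-style quotes.  the stat-and-print
body is only an illustration, as is the program name mentioned below.

	#include <u.h>
	#include <libc.h>
	#include <bio.h>

	void
	main(void)
	{
		Biobuf in;
		char *s, *name[1];
		Dir *d;

		Binit(&in, 0, OREAD);
		while((s = Brdline(&in, '\n')) != nil){
			s[Blinelen(&in)-1] = 0;
			if(tokenize(s, name, 1) != 1)	/* 'foo bar' comes back as one field */
				continue;
			if((d = dirstat(name[0])) == nil){
				fprint(2, "can't stat %s: %r\n", name[0]);
				continue;
			}
			print("%lld\t%s\n", d->length, name[0]);
			free(d);
		}
		exits(0);
	}

run as, say, find /sys/src | sizes (the name is made up), it keeps
'foo bar' as one file name where the awk pipelines elsewhere in this
thread would split it.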

[-- Attachment #2: find.c --]
[-- Type: text/plain, Size: 2491 bytes --]

#include <u.h>
#include <libc.h>
#include <bio.h>

char 	*defargv[] = {".", 0};
char	*fmt = "%q\n";
int	flag[256];
uint	dev;
uint	type;
Biobuf	out;

void
warn(char *s)
{
	if(flag['f'] == 0)
		fprint(2, "find: %s: %r\n", s);
}

void
usage(void)
{
	fprint(2, "usage: find [-1dfq] [path ...]\n");
	exits("usage");
}

/*  if you think this scales you'd be wrong.  this is 1/128th of a linear search.  */

enum{
	Ncache		= 128,		/* must be power of two */
	Cachebits	= Ncache-1,
};

typedef struct{
	vlong	qpath;
	uint	dev;
	uchar	type;
} Fsig;

typedef	struct	Cache	Cache;
struct Cache{
	Fsig	*cache;
	int	n;
	int	nalloc;
} cache[Ncache];

void
clearcache(void)
{
	int i;

	for(i = 0; i < nelem(cache); i++)
		free(cache[i].cache);
	memset(cache, 0, nelem(cache)*sizeof cache[0]);
}

/* return 1 if this (type, dev, qid.path) has been seen before; otherwise remember it */
int
seen(Dir *dir)
{
	Fsig 	*f;
	Cache 	*c;
	int	i;

	c = &cache[dir->qid.path&Cachebits];
	f = c->cache;
	for(i = 0; i < c->n; i++)
		if(dir->qid.path == f[i].qpath
			&& dir->type == f[i].type
			&& dir->dev == f[i].dev)
			return 1;
	if(i == c->nalloc){
		c->nalloc += 20;
		f = c->cache = realloc(c->cache, c->nalloc*sizeof *f);
	}
	f[c->n].qpath = dir->qid.path;
	f[c->n].type = dir->type;
	f[c->n].dev = dir->dev;
	c->n++;
	return 0;
}

/* with -1, refuse to descend into directories on a different device than the first one seen */
int
dskip(Dir *d)
{
	if(flag['1']){
		if(dev == 0 && type == 0){
			dev = d->dev;
			type = d->type;
		}
		if(d->dev != dev || d->type != type)
			return 0;
	}
	return 1;
}

/* never walk ., .., or a directory we have already visited */
int
skip(Dir *d)
{
	if(strcmp(d->name, ".") == 0|| strcmp(d->name, "..") == 0 || seen(d))
		return 1;
	return 0;
}

/* print name, then walk it recursively; with -d only directories are printed */
void
find(char *name)
{
	int fd, n;
	Dir *buf, *p, *e;
	char file[256];

	if((fd = open(name, OREAD)) < 0) {
		warn(name);
		return;
	}
	Bprint(&out, fmt, name);
	for(; (n = dirread(fd, &buf)) > 0; free(buf))
		for(p = buf, e = p+n; p < e; p++){
			snprint(file, sizeof file, "%s/%s", name, p->name);
			if((p->qid.type&QTDIR) == 0 || !dskip(p)){
				if(!flag['d'])
					Bprint(&out, fmt, file);
			}else if(!skip(p))
				find(file);
		}
	close(fd);
}

void
main(int argc, char *argv[])
{
	doquote = needsrcquote;
	quotefmtinstall();

	ARGBEGIN{
	case 'd':
	case 'f':
	case '1':
		flag[ARGC()] = 1;
		break;
	case 'q':
		fmt = "%s\n";	/* -q: print names raw instead of rc-quoted */
		break;
	default:
		usage();
	}ARGEND

	Binit(&out, 1, OWRITE);
	if(argc == 0)
		argv = defargv;
	for(; *argv; argv++){
		find(*argv);
		clearcache();
	}
	Bterm(&out);
	exits(0);
}


^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2010-05-04 18:39 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-12-28 23:05 [9fans] du and find anonymous
2009-12-28 23:09 ` lucio
2009-12-28 23:14 ` Steve Simon
2009-12-29 17:59 ` Tim Newsham
2009-12-29 18:28   ` Don Bailey
2009-12-29 20:16   ` Rob Pike
2009-12-30  7:44     ` anonymous
2010-05-03 12:13   ` Mathieu Lonjaret
2010-05-03 12:18     ` Akshat Kumar
2010-05-03 12:26       ` Mathieu Lonjaret
2010-05-03 12:49         ` tlaronde
2010-05-03 13:10         ` Ethan Grammatikidis
2010-05-03 13:41           ` Steve Simon
2010-05-03 15:18             ` Ethan Grammatikidis
2010-05-03 15:29               ` jake
2010-05-03 15:46                 ` Ethan Grammatikidis
2010-05-03 15:37               ` Steve Simon
2010-05-03 13:17       ` Rudolf Sykora
2010-05-03 14:53         ` erik quanstrom
2010-05-03 18:34           ` Jorden M
2010-05-04 10:01             ` Ethan Grammatikidis
2010-05-04 10:29               ` Robert Raschke
2010-05-04 15:38               ` Jorden M
2010-05-04 16:56                 ` Gabriel Díaz
2010-05-04 18:39                   ` Karljurgen Feuerherm
2010-05-03 14:03     ` erik quanstrom
     [not found] <<20091228230510.GA25423@machine>
2009-12-28 23:25 ` erik quanstrom
2009-12-28 23:31   ` Don Bailey
2009-12-28 23:50     ` Lyndon Nerenberg (VE6BBM/VE7TFX)
     [not found] <<68eb39920912281531jd0e4661j56adfc589a370dfc@mail.gmail.com>
2009-12-28 23:35 ` erik quanstrom
2009-12-28 23:39   ` Don Bailey
2009-12-29  1:00     ` anonymous
2009-12-29  1:13       ` Don Bailey
2009-12-29  0:41 erik quanstrom
2009-12-29  1:03 ` Lyndon Nerenberg (VE6BBM/VE7TFX)
2010-01-01 20:34 ` roger peppe
     [not found] <<af7cc2a5be1668a4f2cb708f5bd96f67@yyc.orthanc.ca>
2009-12-29  1:22 ` erik quanstrom
     [not found] <<df49a7371001011234r3aaaf961n8ce253a6681e74b1@mail.gmail.com>
2010-01-02  0:51 ` erik quanstrom
2010-01-02  1:44   ` roger peppe
     [not found] <<df49a7371001011744o6687fd59l451d690ea56edea5@mail.gmail.com>
2010-01-02  2:02 ` erik quanstrom
2010-01-02  5:29   ` anonymous
2010-01-02 18:43   ` roger peppe
2010-01-03  2:28     ` Anthony Sorace
     [not found] <<20100102052943.GA9871@machine>
2010-01-02 17:05 ` erik quanstrom
2010-01-02 18:18   ` anonymous
     [not found] <<df49a7371001021043p2a990207od65457a068b7828@mail.gmail.com>
2010-01-02 19:47 ` erik quanstrom
2010-01-02 23:21   ` Bakul Shah
2010-01-03  1:49     ` erik quanstrom
2010-01-03  2:31       ` Bakul Shah
2010-01-03  2:40         ` erik quanstrom
2010-01-06 20:44           ` Akshat Kumar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).