Gnus development mailing list
* nnml compression: state of the art?
@ 2001-03-29  8:22 Bill White
  2001-03-29 13:04 ` Karl Kleinpaste
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Bill White @ 2001-03-29  8:22 UTC


Is it possible nowadays to have gnus automatically gzip nnml files
when they're written?  If so, is it then possible for some tool to
grep the unzipped files?  Maybe I could add gunzip to this thing
somewhere?

   find . -path '/billw/Mail/*' -type f -print0 | xargs -0 -e grep -ni <search-string>

Yes, I just ran out of disk space.

Cheers -

bw
-- 
Bill White . billw@wolfram.com . http://members.wri.com/billw
"No ma'am, we're musicians."



* Re: nnml compression: state of the art?
  2001-03-29  8:22 nnml compression: state of the art? Bill White
@ 2001-03-29 13:04 ` Karl Kleinpaste
  2001-03-29 15:47 ` Stainless Steel Rat
  2001-03-30 10:44 ` Bill White
  2 siblings, 0 replies; 12+ messages in thread
From: Karl Kleinpaste @ 2001-03-29 13:04 UTC


See zgrep(1).
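
For the original question (grepping a tree that mixes plain and gzipped
nnml files), a minimal sketch in POSIX shell follows.  It assumes GNU
gzip's zgrep, which passes files that are not gzip-compressed through
unchanged.  The function name search_mail is made up for illustration,
and skipping .overview files is a choice of this sketch, not something
zgrep or Gnus requires.

```sh
# search_mail DIR PATTERN
# Print the names of files under DIR that contain PATTERN
# (case-insensitive), whether or not they are gzip-compressed.
# Assumption: zgrep is the GNU gzip version.
search_mail() {
    # "-exec ... {} +" batches file names much like xargs does,
    # without tripping over unusual characters in names.
    find "$1" -type f ! -name .overview -exec zgrep -i -l -e "$2" {} +
}
```

Usage would look like: search_mail "$HOME/Mail" 'search-string'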



* Re: nnml compression: state of the art?
  2001-03-29  8:22 nnml compression: state of the art? Bill White
  2001-03-29 13:04 ` Karl Kleinpaste
@ 2001-03-29 15:47 ` Stainless Steel Rat
  2001-03-29 17:12   ` Randal L. Schwartz
  2001-03-30 10:44 ` Bill White
  2 siblings, 1 reply; 12+ messages in thread
From: Stainless Steel Rat @ 2001-03-29 15:47 UTC


* Bill White <billw@wolfram.com>  on Thu, 29 Mar 2001
| Is it possible nowadays to have gnus automatically gzip nnml files
| when they're written?

I don't know.  I haven't used it myself in years.  Back then it was read
only.  Though if you are using nnml you may not get the results you might
expect.  Depends on your filesystem's block size.  For example, if you are
using 4K blocks, the smallest chunk of disk that can be allocated is 4K.
If a message is 3.9K it will take one 4K block.  If you compress it to 2.5K
it will still consume one 4K block.  So you save nothing.
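
The rounding argument can be sketched with shell arithmetic.  This is
only an illustration: blocks_needed is a hypothetical helper, the byte
counts are the 3.9K and 2.5K figures from the paragraph converted to
bytes, and it ignores filesystem details such as fragments or tail
packing.

```sh
# blocks_needed SIZE_BYTES BLOCK_BYTES
# Number of whole blocks needed to hold SIZE_BYTES (rounds up).
blocks_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# A 3.9K message and its 2.5K compressed form on a 4K-block filesystem.
blocks_needed 3994 4096   # prints 1
blocks_needed 2560 4096   # prints 1
```

Both sizes round up to the same single block, which is why compressing
such a message frees nothing.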

The most efficient thing would be to occasionally run a find command like
this:

  find $HOME/Mail \
    -name ".overview" -prune -o \
    -name "*.gz" -prune -o \
    -type f -size +4k -print | xargs gzip -1

substituting your filesystem's block size in the +4k segment.  This will
skip compressing .overview files and files that are already compressed.
Anything more than minimal compression will not gain you much.
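
A variant of the same idea that takes the block size as a parameter is
sketched below.  The name compress_mail is hypothetical; the "k" suffix
to -size is assumed to be available (GNU and BSD find have it, strict
POSIX find does not); and "-exec ... {} +" stands in for the xargs
pipeline.  Whether Gnus will then read the resulting .gz files is a
separate question.

```sh
# compress_mail DIR BLOCK_KB
# gzip -1 every regular file under DIR larger than BLOCK_KB kibibytes,
# skipping .overview files and anything already named *.gz.
compress_mail() {
    find "$1" \
        -name .overview -prune -o \
        -name '*.gz' -prune -o \
        -type f -size +"$2"k -exec gzip -1 {} +
}
```

Usage would look like: compress_mail "$HOME/Mail" 4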

| If so, is it then possible for some tool to grep the unzipped files?

zgrep
-- 
Rat <ratinox@peorth.gweep.net>    \ Happy Fun Ball contains a liquid core,
Minion of Nathan - Nathan says Hi! \ which, if exposed due to rupture, should
PGP Key: at a key server near you!  \ not be touched, inhaled, or looked at.




* Re: nnml compression: state of the art?
  2001-03-29 15:47 ` Stainless Steel Rat
@ 2001-03-29 17:12   ` Randal L. Schwartz
  2001-03-29 19:15     ` Stainless Steel Rat
  0 siblings, 1 reply; 12+ messages in thread
From: Randal L. Schwartz @ 2001-03-29 17:12 UTC
  Cc: (ding)

>>>>> "Rat" == Stainless Steel Rat <ratinox@peorth.gweep.net> writes:

Rat> The most efficient thing would be to occasionally run a find command like
Rat> this:

Rat>   find $HOME/Mail \
Rat>     -name ".overview" -prune -o \
Rat>     -name "*.gz" -prune -o \
Rat>     -type f -size +4k -print | xargs gzip -1

Rat> substituting your filesystem's block size in the +4k segment.  This will
Rat> skip compressing .overview files and files that are already compressed.
Rat> Anything more than minimal compression will not gain you much.

    #!/usr/bin/perl
    use strict;
    $|++;

    use File::Find;

    find sub {
      return unless /^(\d+)$/ and -f and -s _ > 65535 and -A _ > 0.1;
      system '/bin/gzip', "-9v", $File::Find::name;
    }, "/home/merlyn/Mail";


-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



* Re: nnml compression: state of the art?
  2001-03-29 17:12   ` Randal L. Schwartz
@ 2001-03-29 19:15     ` Stainless Steel Rat
  2001-03-29 19:21       ` Paul Jarc
                         ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Stainless Steel Rat @ 2001-03-29 19:15 UTC


* merlyn@stonehenge.com (Randal L. Schwartz)  on Thu, 29 Mar 2001
|       return unless /^(\d+)$/ and -f and -s _ > 65535 and -A _ > 0.1;

What, exactly, does this do?

|       system '/bin/gzip', "-9v", $File::Find::name;

For the most part all that -9 will gain you over -1 is everything runs
slower.
-- 
Rat <ratinox@peorth.gweep.net>    \ Caution: Happy Fun Ball may suddenly
Minion of Nathan - Nathan says Hi! \ accelerate to dangerous speeds.
PGP Key: at a key server near you!  \ 




* Re: nnml compression: state of the art?
  2001-03-29 19:15     ` Stainless Steel Rat
@ 2001-03-29 19:21       ` Paul Jarc
  2001-03-29 22:01         ` Stainless Steel Rat
  2001-03-29 19:46       ` Michael Livshin
  2001-03-29 19:47       ` Alan Shutko
  2 siblings, 1 reply; 12+ messages in thread
From: Paul Jarc @ 2001-03-29 19:21 UTC


Stainless Steel Rat <ratinox@peorth.gweep.net> writes:
> * merlyn@stonehenge.com (Randal L. Schwartz)  on Thu, 29 Mar 2001
> |       return unless /^(\d+)$/ and -f and -s _ > 65535 and -A _ > 0.1;
> 
> What, exactly, does this do?

Skips the file if its name is not numeric, or if it is not a regular
file, or if it is smaller than 65536 bytes, or if it has been accessed
within the last 0.1 days.
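
For comparison, roughly the same filter can be written with find.  This
is a sketch under assumptions: -amin is a GNU/BSD extension, 0.1 days
is taken as 144 minutes, and find measures age from when it runs while
Perl's -A measures from script start, so the edge cases may differ
slightly.  The function name list_big_old_messages is made up.

```sh
# list_big_old_messages DIR
# Print regular files under DIR whose names are all digits, that are
# larger than 65535 bytes, and that were last accessed more than
# 144 minutes (0.1 days) ago.
list_big_old_messages() {
    find "$1" -type f \
        -name '[0-9]*' ! -name '*[!0-9]*' \
        -size +65535c -amin +144
}
```

The two -name tests together require a name that starts with a digit
and contains no non-digit, which matches what /^(\d+)$/ accepts for
ordinary nnml article names.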


paul



* Re: nnml compression: state of the art?
  2001-03-29 19:15     ` Stainless Steel Rat
  2001-03-29 19:21       ` Paul Jarc
@ 2001-03-29 19:46       ` Michael Livshin
  2001-03-29 19:47       ` Alan Shutko
  2 siblings, 0 replies; 12+ messages in thread
From: Michael Livshin @ 2001-03-29 19:46 UTC


Stainless Steel Rat <ratinox@peorth.gweep.net> writes:

> * merlyn@stonehenge.com (Randal L. Schwartz)  on Thu, 29 Mar 2001
> |       return unless /^(\d+)$/ and -f and -s _ > 65535 and -A _ > 0.1;
> 
> What, exactly, does this do?

shows you yet *another* way to do the proverbial "it" in Perl.  not
sure what, though.

-- 
(only legal replies to this address are accepted)

In an experiment to determine the precise amount of beer required to
enjoy this film, I passed out.     -- dave o'brien, on Highlander II



* Re: nnml compression: state of the art?
  2001-03-29 19:15     ` Stainless Steel Rat
  2001-03-29 19:21       ` Paul Jarc
  2001-03-29 19:46       ` Michael Livshin
@ 2001-03-29 19:47       ` Alan Shutko
  2001-03-29 22:26         ` Stainless Steel Rat
  2 siblings, 1 reply; 12+ messages in thread
From: Alan Shutko @ 2001-03-29 19:47 UTC


Stainless Steel Rat <ratinox@peorth.gweep.net> writes:

> * merlyn@stonehenge.com (Randal L. Schwartz)  on Thu, 29 Mar 2001

> |       system '/bin/gzip', "-9v", $File::Find::name;
> 
> For the most part all that -9 will gain you over -1 is everything runs
> slower.

I disagree.  I tried it on a mail folder I have around:

20299155        Unzipped
 5719754        Gzipped -1
 4761790        Gzipped -9
 4795281        Gzipped (default)

Not much win in size over the default, but definitely better than -1.

In terms of time:

[14:45:05] wesley:~/Library/MailArchive $ time gzip \#linux.kernel.2001-02.gz#

real	0m7.041s
user	0m4.870s
sys	0m0.150s
[14:45:14] wesley:~/Library/MailArchive $ !gun
gunzip \#linux.kernel.2001-02.gz#.gz
[14:45:21] wesley:~/Library/MailArchive $ time gzip -9 \#linux.kernel.2001-02.gz#

real	0m7.732s
user	0m5.530s
sys	0m0.160s
[14:45:34] wesley:~/Library/MailArchive $ gunzip \#linux.kernel.2001-02.gz#.gz
[14:46:26] wesley:~/Library/MailArchive $ time gzip -1 \#linux.kernel.2001-02.gz#

real	0m5.661s
user	0m2.830s
sys	0m0.220s


So, in general, I just leave things alone unless I want things to run
faster.

-- 
Alan Shutko <ats@acm.org> - In a variety of flavors!
A CONS is an object which cares.  -- Bernie Greenberg



* Re: nnml compression: state of the art?
  2001-03-29 19:21       ` Paul Jarc
@ 2001-03-29 22:01         ` Stainless Steel Rat
  2001-03-30 13:01           ` Randal L. Schwartz
  0 siblings, 1 reply; 12+ messages in thread
From: Stainless Steel Rat @ 2001-03-29 22:01 UTC


* prj@po.cwru.edu (Paul Jarc)  on Thu, 29 Mar 2001
| Skips the file if its name is not numeric, or if it is not a regular
| file, or if it is smaller than 65536 bytes, or if it has been accessed
| within the last 0.1 days.

Ah, well, that 65k size minimum makes it useless for nnml, where the vast
majority of messages are 5-10k.  When compressing nnml files you want to
hit every file that is larger than the block size.  I don't know anything
that uses 64k blocks other than Windows.
-- 
Rat <ratinox@peorth.gweep.net>    \ Caution: Happy Fun Ball may suddenly
Minion of Nathan - Nathan says Hi! \ accelerate to dangerous speeds.
PGP Key: at a key server near you!  \ 




* Re: nnml compression: state of the art?
  2001-03-29 19:47       ` Alan Shutko
@ 2001-03-29 22:26         ` Stainless Steel Rat
  0 siblings, 0 replies; 12+ messages in thread
From: Stainless Steel Rat @ 2001-03-29 22:26 UTC


* Alan Shutko <ats@acm.org>  on Thu, 29 Mar 2001
| I disagree.  I tried it on a mail folder I have around:

Context: nnml files are generally ~4k, not 20MB.  A more relevant example:

-rw-------    1 ratinox  ratinox      4974 Mar 29 17:04 test
-rw-------    1 ratinox  ratinox      2640 Mar 29 17:04 test.1.gz
-rw-------    1 ratinox  ratinox      2557 Mar 29 17:04 test.9.gz

On a system like mine with 4k blocks, test requires two blocks (8k), while
test.1.gz and test.9.gz require one block (4k) each.  The net gain in free
space is one block (4k) regardless of -1 or -9.

On a system with 2k blocks, test requires three blocks (6k), while
test.1.gz and test.9.gz require two blocks (4k) each.  The net gain in free
space is one block (2k) regardless of -1 or -9.

On a system with 1k blocks, test requires five blocks (5k), while test.1.gz
and test.9.gz require three blocks (3k) each.  The net gain in free space
is two blocks (2k) regardless of -1 or -9.

On a system with 512 byte blocks, test requires 10 blocks (5k), while
test.1.gz requires 6 blocks (3k) and test.9.gz requires 5 blocks (2.5k).  A
net savings of 512 bytes.  This is reaching the point of diminishing
returns, because if test were 7 bytes longer then the gzip -9 file would be
slightly over 512 * 5 and require 6 blocks (3k), again for no net gain.
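
The per-block-size figures above can be reproduced with a short shell
sketch.  The helper name net_gain is hypothetical; the byte counts are
the ones from the ls listing (test = 4974, test.1.gz = 2640,
test.9.gz = 2557); and the model only rounds sizes up to whole blocks,
ignoring any other filesystem overhead.

```sh
# net_gain ORIG_BYTES COMPRESSED_BYTES BLOCK_BYTES
# Bytes of disk freed by compression once both sizes are rounded up
# to whole blocks.
net_gain() {
    orig_blocks=$(( ($1 + $3 - 1) / $3 ))
    comp_blocks=$(( ($2 + $3 - 1) / $3 ))
    echo $(( (orig_blocks - comp_blocks) * $3 ))
}

# Compare gzip -1 and gzip -9 for each block size discussed.
for bs in 4096 2048 1024 512; do
    echo "$bs: -1 frees $(net_gain 4974 2640 "$bs"), -9 frees $(net_gain 4974 2557 "$bs")"
done
```

Only the 512-byte case separates the two levels (2048 bytes freed with
-1 against 2560 with -9); at every larger block size the gain is
identical.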

If you have a few very large files then gzip -9 is usually better if space
is the primary concern.  If you have lots and lots of small files then gzip
-1 is almost always better, except in a few odd cases.

By the way:

-rw-------    1 ratinox  ratinox      4974 Mar 29 17:04 test
-rw-------    1 ratinox  ratinox      2640 Mar 29 17:04 test.1.gz
-rw-------    1 ratinox  ratinox      2623 Mar 29 17:04 test.2.gz
-rw-------    1 ratinox  ratinox      2607 Mar 29 17:04 test.3.gz
-rw-------    1 ratinox  ratinox      2561 Mar 29 17:04 test.4.gz
-rw-------    1 ratinox  ratinox      2557 Mar 29 17:04 test.5.gz
-rw-------    1 ratinox  ratinox      2557 Mar 29 17:04 test.6.gz
-rw-------    1 ratinox  ratinox      2557 Mar 29 17:04 test.7.gz
-rw-------    1 ratinox  ratinox      2557 Mar 29 17:04 test.8.gz
-rw-------    1 ratinox  ratinox      2557 Mar 29 17:04 test.9.gz

Anything beyond -5 nets you no extra space savings for small files.
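
A listing like this can be regenerated for any message with a small
loop.  This is a sketch, not the commands used to produce the table
above: gzip_sizes is a made-up name, and the file is compressed to
standard output so the original is left untouched.

```sh
# gzip_sizes FILE
# Print "level compressed-bytes" for gzip levels 1 through 9.
gzip_sizes() {
    for level in 1 2 3 4 5 6 7 8 9; do
        # -c writes to stdout, so FILE itself is not replaced.
        size=$(gzip -c -"$level" "$1" | wc -c)
        echo "$level $((size))"
    done
}
```

Usage would look like: gzip_sizes test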
-- 
Rat <ratinox@peorth.gweep.net>    \ Do not use Happy Fun Ball on concrete.
Minion of Nathan - Nathan says Hi! \ 
PGP Key: at a key server near you!  \ 




* Re: nnml compression: state of the art?
  2001-03-29  8:22 nnml compression: state of the art? Bill White
  2001-03-29 13:04 ` Karl Kleinpaste
  2001-03-29 15:47 ` Stainless Steel Rat
@ 2001-03-30 10:44 ` Bill White
  2 siblings, 0 replies; 12+ messages in thread
From: Bill White @ 2001-03-30 10:44 UTC


On Thu Mar 29 2001 at 02:22, Bill White <billw@wolfram.com> said:

    bw> Is it possible nowadays to have gnus automatically gzip nnml
    bw> files when they're written?  If so, is it then possible for
    bw> some tool to grep the unzipped files?  Maybe I could add
    bw> gunzip to this thing somewhere?

Thanks for all the replies and discussion, and especially the pointers
to zgrep.  I learn about the coolest software here on the ding list.

Cheers -

bw
-- 
Bill White . billw@wolfram.com . http://members.wri.com/billw
"No ma'am, we're musicians."



* Re: nnml compression: state of the art?
  2001-03-29 22:01         ` Stainless Steel Rat
@ 2001-03-30 13:01           ` Randal L. Schwartz
  0 siblings, 0 replies; 12+ messages in thread
From: Randal L. Schwartz @ 2001-03-30 13:01 UTC
  Cc: (ding)

>>>>> "Rat" == Stainless Steel Rat <ratinox@peorth.gweep.net> writes:

Rat> * prj@po.cwru.edu (Paul Jarc)  on Thu, 29 Mar 2001
Rat> | Skips the file if its name is not numeric, or if it is not a regular
Rat> | file, or if it is smaller than 65536 bytes, or if it has been accessed
Rat> | within the last 0.1 days.

Rat> Ah, well, that 65k size minimum makes it useless for nnml, where the vast
Rat> majority of messages are 5-10k.  When compressing nnml files you want to
Rat> hit every file that is larger than the block size.  I don't know anything
Rat> that uses 64k blocks other than Windows.

Well, for my purposes, I seem to get a lot of MIME crap that I of
course archive because I'm a packrat (I keep bumping up against the
65536 file limit in glimpseindex!).  So this script is tailored to
compress these huge friggin attachments that people now seem to think
are OK to send.

I meant it as a model.  Lower that 65k down to 4k, and it does what
y'all were talking about.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



end of thread

Thread overview: 12+ messages
2001-03-29  8:22 nnml compression: state of the art? Bill White
2001-03-29 13:04 ` Karl Kleinpaste
2001-03-29 15:47 ` Stainless Steel Rat
2001-03-29 17:12   ` Randal L. Schwartz
2001-03-29 19:15     ` Stainless Steel Rat
2001-03-29 19:21       ` Paul Jarc
2001-03-29 22:01         ` Stainless Steel Rat
2001-03-30 13:01           ` Randal L. Schwartz
2001-03-29 19:46       ` Michael Livshin
2001-03-29 19:47       ` Alan Shutko
2001-03-29 22:26         ` Stainless Steel Rat
2001-03-30 10:44 ` Bill White
