9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] fossil pb: a clue?
@ 2012-01-13 11:30 tlaronde
  2012-01-13 13:30 ` erik quanstrom
                   ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: tlaronde @ 2012-01-13 11:30 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I'm still trying to find out why I have fossil data twice the size of
the files served.

Since this is almost exactly twice the size (minus the MB of kerTeX), I
wonder if fossil still has a plan9.iso recorded from the installation,
not visible when mounting main.

Since I did the installation locally, without a CDROM (doc explaining
the howto is under review), the bzip2 was uncompressed in the fossil
area by the installation scripts. But was the iso removed after copying
the data, and perhaps after unmounting /n/dist/ or whatever? And if it
has not been removed, what is its path under fossil?

As a brute-force check, I tried grep(1) on /dev/sdC0/fossil. I found
nothing looking like a "plan9.iso" filename entry, but the same matched
data was printed twice...

Is there a way to print the list of pathnames registered in fossil (an
absolute lstree of what fossil has), to find whether the iso is
registered "somewhere" without showing up?

--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: a clue?
  2012-01-13 11:30 [9fans] fossil pb: a clue? tlaronde
@ 2012-01-13 13:30 ` erik quanstrom
  2012-01-13 13:38 ` [9fans] fossil pb: FOUND! tlaronde
  2012-01-13 13:59 ` [9fans] fossil pb: a clue? David du Colombier
  2 siblings, 0 replies; 57+ messages in thread
From: erik quanstrom @ 2012-01-13 13:30 UTC (permalink / raw)
  To: 9fans

have you already done something like
	du -a | sort -nr | sed 20q
in the main tree?  (it may make sense to
remount /srv/boot someplace else to avoid device files, etc.)
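
something like this, assuming your root file system is posted as
/srv/boot (as in the standard install):

	; mount /srv/boot /n/boot; cd /n/boot
	; du -a | sort -nr | sed 20q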

- erik



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [9fans] fossil pb: FOUND!
  2012-01-13 11:30 [9fans] fossil pb: a clue? tlaronde
  2012-01-13 13:30 ` erik quanstrom
@ 2012-01-13 13:38 ` tlaronde
  2012-01-13 13:59   ` erik quanstrom
  2012-01-13 13:59 ` [9fans] fossil pb: a clue? David du Colombier
  2 siblings, 1 reply; 57+ messages in thread
From: tlaronde @ 2012-01-13 13:38 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Summary of the previous episodes: after reinstalling Plan9 with almost
only the vanilla distribution, du(1) announces 325 MB, while fossil
uses twice that space to store it.

I suspected, since it was clearly almost precisely twice (minus some tmp
files and kerTeX), that the problem was that the plan9.iso was still
there, at least in fossil, and not showing.

On the console, I tried in turn (looking at the scripts in pc/inst/)
the places where it could be, and found:

stat /active/dist/plan9.iso plan9.iso glenda sys 664 289990656

But the surprise is that it was _not_ hidden. It _was_ there, under /dist
... but apparently not added to the summary made by du(1)? Does du(1)
"know" that some dirs are mount points taking (normally) no real space,
and skip them? Because this would mean one can add whatever files in
there and fill fossil while du(1) ignores it all...

On a side note, the print from du(1) is not accurate with the "-h" flag:

term% du -sh /
347.8285G	/

I have megabytes, not gigabytes.

--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 13:38 ` [9fans] fossil pb: FOUND! tlaronde
@ 2012-01-13 13:59   ` erik quanstrom
  2012-01-13 14:08     ` tlaronde
  0 siblings, 1 reply; 57+ messages in thread
From: erik quanstrom @ 2012-01-13 13:59 UTC (permalink / raw)
  To: 9fans

> But the surprise is that it was _not_ hidden. It _was_ there, under /dist
> ... but apparently not added to the summary made by du(1)? Does du(1)
> "know" that some dirs are mount points taking (normally) no real space,
> and skip them? Because this would mean one can add whatever files in
> there and fill fossil while du(1) ignores it all...
>
> On a side note, the print from du(1) is not accurate with the "-h" flag:
>
> term% du -sh /
> 347.8285G	/
>
> I have megabytes, not gigabytes.

i think your / has mounts and binds that are confusing du.  you
need to remount your root file system someplace free of mounts
or binds on top, e.g.:

	; mount /srv/boot /n/boot; cd /n/boot
	; du -s .>[2=]
	4049232	.
	; du -sh .>[2=]
	3.861649G	.
	; hoc
	4049232 / 1024 / 1024
	3.86164855957

- erik



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: a clue?
  2012-01-13 11:30 [9fans] fossil pb: a clue? tlaronde
  2012-01-13 13:30 ` erik quanstrom
  2012-01-13 13:38 ` [9fans] fossil pb: FOUND! tlaronde
@ 2012-01-13 13:59 ` David du Colombier
  2012-01-13 14:11   ` tlaronde
  2 siblings, 1 reply; 57+ messages in thread
From: David du Colombier @ 2012-01-13 13:59 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

How did you delete the plan9.iso file?

If you have used rm(1) or fossilcons(4) remove, the blocks
should be properly unallocated from Fossil. But if you used
fossilcons(4) clri for example, you have to manually reclaim
the abandoned storage with clre and bfree, with the help of
fossil/flchk or fossilcons(4) check.

See the following illustration:

# our current empty fossil
main: df
main: 40,960 used + 1,071,710,208 free = 1,071,751,168 (0% used)

# we copy a file on fossil, then remove it properly
% cp /386/9pcf /n/fossil
main: df
main: 3,661,824 used + 1,068,089,344 free = 1,071,751,168 (0% used)
main: remove /active/9pcf
main: df
main: 57,344 used + 1,071,693,824 free = 1,071,751,168 (0% used)

# we copy a file on fossil, then remove it with clri
% cp /386/9pcf /n/fossil
main: df
main: 3,661,824 used + 1,068,089,344 free = 1,071,751,168 (0% used)
main: check
checking epoch 1...
check: visited 1/130829 blocks (0%)
fsys blocks: total=130829 used=447(0.3%) free=130382(99.7%) lost=0(0.0%)
fsck: 0 clri, 0 clre, 0 clrp, 0 bclose
main: clri /active/9pcf
main: df
main: 3,661,824 used + 1,068,089,344 free = 1,071,751,168 (0% used)
main: check
checking epoch 1...
check: visited 1/130829 blocks (0%)
fsys blocks: total=130829 used=447(0.3%) free=130382(99.7%) lost=0(0.0%)
error: non referenced entry in source /active[0]
fsck: 0 clri, 1 clre, 0 clrp, 0 bclose

# we identify the abandoned storage and reclaim it with bfree
term% fossil/flchk -f fossil.img | sed -n 's/^# //p'
clre 0x5 0
term% fossil/flchk -f fossil.img | sed -n 's/^# bclose (.*) .*/bfree \1/p'
bfree 0x7
bfree 0x8
[...]
main: clre 0x5 0
block 0x5 0 40
000000001FF420002900000000000000003691B900000000000000000000000071875E60000001A1
main: bfree 0x7
label 0x7 0 1 1 4294967295 0x71875e60
main: bfree 0x8
label 0x8 1 1 1 4294967295 0x71875e60
[...]
main: check
checking epoch 1...
check: visited 1/130829 blocks (0%)
fsys blocks: total=130829 used=7(0.0%) free=130822(100.0%) lost=0(0.0%)
fsck: 0 clri, 0 clre, 0 clrp, 0 bclose
main: df
main: 3,661,824 used + 1,068,089,344 free = 1,071,751,168 (0% used)

Note that it doesn't update c->fl->nused, reported by df.

--
David du Colombier



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 13:59   ` erik quanstrom
@ 2012-01-13 14:08     ` tlaronde
  2012-01-13 14:47       ` erik quanstrom
  0 siblings, 1 reply; 57+ messages in thread
From: tlaronde @ 2012-01-13 14:08 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jan 13, 2012 at 08:59:32AM -0500, erik quanstrom wrote:
> > But the surprise is that it was _not_ hidden. It _was_ there, under /dist
> > ... but apparently not added to the summary made by du(1)? Does du(1)
> > "know" that some dirs are mount points taking (normally) no real space,
> > and skip them? Because this would mean one can add whatever files in
> > there and fill fossil while du(1) ignores it all...
> >
> > On a side note, the print from du(1) is not accurate with the "-h" flag:
> >
> > term% du -sh /
> > 347.8285G	/
> >
> > I have megabytes, not gigabytes.
>
> i think your / has mounts and binds that are confusing du.  you
> need to remount your root file system someplace free of mounts
> or binds on top, e.g.:
>
> 	; mount /srv/boot /n/boot; cd /n/boot
> 	; du -s .>[2=]
> 	4049232	.
> 	; du -sh .>[2=]
> 	3.861649G	.
> 	; hoc
> 	4049232 / 1024 / 1024
> 	3.86164855957

Do you mean only the 347.8285G, or also the /dist/plan9.iso that was
not seen? Because, for the gigabytes, it is just a formatting error,
since without the option it correctly reports 350 MB. Since your test is
with gigabytes, the G suffix is correct there. But it may simply be (I
didn't look at the source) that it prints a G suffix for megabytes too...
--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: a clue?
  2012-01-13 13:59 ` [9fans] fossil pb: a clue? David du Colombier
@ 2012-01-13 14:11   ` tlaronde
  0 siblings, 0 replies; 57+ messages in thread
From: tlaronde @ 2012-01-13 14:11 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jan 13, 2012 at 02:59:52PM +0100, David du Colombier wrote:
> How did you delete the plan9.iso file?
>

I used rm(1) in user space, and then, on the console, check, since it
is said to reclaim the space. And it did.

Fossil now uses half the space it did before, and this matches the real
data here.

> If you have used rm(1) or fossilcons(4) remove, the blocks
> should be properly unallocated from Fossil. But if you used
> fossilcons(4) clri for example, you have to manually reclaim
> the abandoned storage with clre and bfree, with the help of
> fossil/flchk or fossilcons(4) check.
>
> See the following illustration:
>
>[...]

Thanks for the clarifications.

What puzzles me for now is the du(1) hole...

--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 14:08     ` tlaronde
@ 2012-01-13 14:47       ` erik quanstrom
  2012-01-13 16:01         ` tlaronde
  0 siblings, 1 reply; 57+ messages in thread
From: erik quanstrom @ 2012-01-13 14:47 UTC (permalink / raw)
  To: 9fans

> > i think your / has mounts and binds that are confusing du.  you
> > need to remount your root file system someplace free of mounts
> > or binds on top, e.g.:
> >
> > 	; mount /srv/boot /n/boot; cd /n/boot
> > 	; du -s .>[2=]
> > 	4049232	.
> > 	; du -sh .>[2=]
> > 	3.861649G	.
> > 	; hoc
> > 	4049232 / 1024 / 1024
> > 	3.86164855957
>
> Do you mean only the 347.8285G, or also the /dist/plan9.iso that was
> not seen? Because, for the gigabytes, it is just a formatting error,
> since without the option it correctly reports 350 MB. Since your test is
> with gigabytes, the G suffix is correct there. But it may simply be (I
> didn't look at the source) that it prints a G suffix for megabytes too...

please try what i suggested.  i've shown that du -h works properly
on my system.

- erik



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 14:47       ` erik quanstrom
@ 2012-01-13 16:01         ` tlaronde
  2012-01-13 16:16           ` erik quanstrom
  2012-01-13 16:17           ` David du Colombier
  0 siblings, 2 replies; 57+ messages in thread
From: tlaronde @ 2012-01-13 16:01 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jan 13, 2012 at 09:47:25AM -0500, erik quanstrom wrote:
>
> please try what i suggested.  i've shown that du -h works properly
> on my system.
>

Indeed, remounting yields the correct MB result.

What I missed is that the result of du(1) is not in bytes but in
_kilobytes_ (since I know there are appx. 300 MB, I thought it was bytes).

So this means that the plan9.iso, being "only" 250 MB, had a small impact
on a printed result wrongly multiplied by 1000 or so. So the file was
not hidden; it is the whole du(1) count that is wrong.

The mounts in my profile are the vanilla ones (the only customizations
are for the network, the mouse, the keyboard). I do not play with the
namespace.

Do you have an idea where to look to find the offending
instructions? /boot(8)?
--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 16:01         ` tlaronde
@ 2012-01-13 16:16           ` erik quanstrom
  2012-01-13 16:34             ` tlaronde
  2012-01-13 16:17           ` David du Colombier
  1 sibling, 1 reply; 57+ messages in thread
From: erik quanstrom @ 2012-01-13 16:16 UTC (permalink / raw)
  To: 9fans

> The mounts in my profile are the vanilla ones (the only customizations
> are for the network, the mouse, the keyboard). I do not play with the
> namespace.
>
> Do you have an idea where to look to find the offending
> instructions? /boot(8)?

they're not offending!  not all files in a typical plan 9 namespace
make sense to du.  for example, if you're running rio, it doesn't make
sense to add that to the file total.  also, if you have a disk in
/dev/sdXX/data, that file will be added to the total, as will any
partitions of that disk, etc.
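
e.g. (sizes made up), the raw disk and its partitions alone dwarf the
file system:

	; du -sh /dev/sdC0
	149.0509G	/dev/sdC0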

- erik



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 16:01         ` tlaronde
  2012-01-13 16:16           ` erik quanstrom
@ 2012-01-13 16:17           ` David du Colombier
  2012-01-13 16:41             ` tlaronde
  1 sibling, 1 reply; 57+ messages in thread
From: David du Colombier @ 2012-01-13 16:17 UTC (permalink / raw)
  To: 9fans

> The mounts in my profile are the vanilla ones (the only customizations
> are for the network, the mouse, the keyboard). I do not play with the
> namespace.
>
> Do you have an idea where to look to find the offending
> instructions? /boot(8)?

It's probably simply because /root is a recursive bind.

See /lib/namespace.
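
The relevant lines in /lib/namespace are roughly these (quoting from
memory, check your own copy):

	mount -aC #s/boot /root $rootspec
	bind -a $rootdir /

i.e. the file server tree is mounted on /root and then bound over /, so
the same files are reachable both at / and again under /root, and a
recursive walk from / counts them (at least) twice.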

--
David du Colombier



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 16:16           ` erik quanstrom
@ 2012-01-13 16:34             ` tlaronde
  2012-01-13 16:42               ` David du Colombier
  2012-01-13 16:44               ` Vivien MOREAU
  0 siblings, 2 replies; 57+ messages in thread
From: tlaronde @ 2012-01-13 16:34 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jan 13, 2012 at 11:16:11AM -0500, erik quanstrom wrote:
>
> they're not offending!  not all files in a typical plan 9 namespace
> make sense to du.  for example, if you're running rio, it doesn't make
> sense to add that to the file total.  also, if you have a disk in
> /dev/sdXX/data, that file will be added to the total, as will any
> partitions of that disk, etc.

This means that du(1) is fine for listing, but, as far as sizes go, the
"du -s" total does not make a lot of sense?

On a side note. When using mount(8) without arguments on a typical Unix,
one can see what is mounted where. Is there some way to find the
"organization" of the namespace on Plan9? (What is mount'ed and what is
bind'ed?)
--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 16:17           ` David du Colombier
@ 2012-01-13 16:41             ` tlaronde
  2012-01-13 16:50               ` Charles Forsyth
                                 ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: tlaronde @ 2012-01-13 16:41 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jan 13, 2012 at 05:17:34PM +0100, David du Colombier wrote:
> > The mounts in my profile are the vanilla ones (the only customizations
> > are for the network, the mouse, the keyboard). I do not play with the
> > namespace.
> >
> > Have you an idea where to look to find what are the offending
> > instructions? /boot(8)?
>
> It's probably simply because /root is a recursive bind.
>
> See /lib/namespace.

Yes, but reading "Getting Dot-Dot right" by Rob Pike, I thought that the
solution was to have, underneath, one unique pathname for a file. Date(1)
can format UTC; whatever the user presentation, underneath there is only
the UTC. Namespace is a way to manage the nicknames, or the presentation
of data; to manage different views of the "real" thing, but underneath
there is a unique pathname; a pathname finally resolved to something
(no infinite recursion).

So am I wrong?
--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 16:34             ` tlaronde
@ 2012-01-13 16:42               ` David du Colombier
  2012-01-13 16:44               ` Vivien MOREAU
  1 sibling, 0 replies; 57+ messages in thread
From: David du Colombier @ 2012-01-13 16:42 UTC (permalink / raw)
  To: 9fans

> On a side note. When using mount(8) without arguments on a typical
> Unix, one can see what is mounted where. Is there some way to find the
> "organization" of the namespace on Plan9? (What is mount'ed and what
> is bind'ed?)

ns(1)
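
For example, to see how / is put together:

	% ns | grep /root
	mount -aC '#s/boot' /root 
	bind -a /root /

(ns prints, for the whole namespace, the bind(1) and mount(1) commands
that would recreate it; the output above is approximate.)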

--
David du Colombier



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 16:34             ` tlaronde
  2012-01-13 16:42               ` David du Colombier
@ 2012-01-13 16:44               ` Vivien MOREAU
  2012-01-13 16:50                 ` tlaronde
  1 sibling, 1 reply; 57+ messages in thread
From: Vivien MOREAU @ 2012-01-13 16:44 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

tlaronde@polynum.com writes:

> On a side note. When using mount(8) without arguments on a typical
> Unix, one can see what is mounted where. Is there some way to find the
> "organization" of the namespace on Plan9? (What is mount'ed and what
> is bind'ed?)

Sure... ns(1) :-)

--
Vivien



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 16:41             ` tlaronde
@ 2012-01-13 16:50               ` Charles Forsyth
  2012-01-13 17:05                 ` tlaronde
  2012-01-13 17:02               ` tlaronde
  2012-01-13 17:37               ` erik quanstrom
  2 siblings, 1 reply; 57+ messages in thread
From: Charles Forsyth @ 2012-01-13 16:50 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 621 bytes --]

The name space can contain loops. Du and a few others try to detect that,
using Qids, to avoid being annoying, but the loops are there.
Open (and chdir etc) do indeed record the name used to open the file, and
that helps resolve the ".." problem (now done slightly differently from
the paper, I think, but I'm not certain), but that name won't have loops
because it's a finite string interpreted from left to right.

You can easily build a looped space to test it:
 mkdir /tmp/y
 mkdir /tmp/y/z
 bind /tmp /tmp/y/z
# have fun
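
And the recorded names stay finite even though the space loops:
 cd /tmp/y/z/y/z
 pwd	# prints the lexical name you used: /tmp/y/z/y/z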

On 13 January 2012 16:41, <tlaronde@polynum.com> wrote:

> So am I wrong?
>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 16:44               ` Vivien MOREAU
@ 2012-01-13 16:50                 ` tlaronde
  0 siblings, 0 replies; 57+ messages in thread
From: tlaronde @ 2012-01-13 16:50 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jan 13, 2012 at 05:44:36PM +0100, Vivien MOREAU wrote:
> tlaronde@polynum.com writes:
>
> > On a side note. When using mount(8) without arguments on a typical
> > Unix, one can see what is mounted where. Is there some way to find the
> > "organization" of the namespace on Plan9? (What is mount'ed and what
> is bind'ed?)
>
> Sure... ns(1) :-)

Missed this one... Thanks!
--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 16:41             ` tlaronde
  2012-01-13 16:50               ` Charles Forsyth
@ 2012-01-13 17:02               ` tlaronde
  2012-01-13 17:11                 ` Charles Forsyth
  2012-01-13 17:24                 ` Nicolas Bercher
  2012-01-13 17:37               ` erik quanstrom
  2 siblings, 2 replies; 57+ messages in thread
From: tlaronde @ 2012-01-13 17:02 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jan 13, 2012 at 05:41:01PM +0100, tlaronde@polynum.com wrote:
> >
> > See /lib/namespace.
>
> Yes, but reading "Getting Dot-Dot right" by Rob Pike, I thought that the
> solution was to have, underneath, one unique pathname for a file. Date(1)
> can format UTC; whatever the user presentation, underneath there is only
> the UTC. Namespace is a way to manage the nicknames, or the presentation
> of data; to manage different views of the "real" thing, but underneath
> there is a unique pathname; a pathname finally resolved to something
> (no infinite recursion).

Answering myself: du(1) -s makes the sum of each entry it has printed.
If entries are repeated (because of multiple binds), they appear in the
sum.

So du(1) does what it says: the sum.

I never thought that perhaps, under Unices, du(1) with hard links will
produce the same misleading result...
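
A quick way to see it (sizes here are made up):

term% mkdir -p /tmp/t/one /tmp/t/two
term% dd -if /dev/zero -of /tmp/t/one/f -bs 1024 -count 1024
1024+0 records in
1024+0 records out
term% bind /tmp/t/one /tmp/t/two
term% du -s /tmp/t
2050	/tmp/t

The same blocks are counted once under one and once under two.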
--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 16:50               ` Charles Forsyth
@ 2012-01-13 17:05                 ` tlaronde
  0 siblings, 0 replies; 57+ messages in thread
From: tlaronde @ 2012-01-13 17:05 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jan 13, 2012 at 04:50:34PM +0000, Charles Forsyth wrote:
>[...]
> You can easily build a looped space to test it:
>  mkdir /tmp/y
>  mkdir /tmp/y/z
>  bind /tmp /tmp/y/z
> # have fun
>

Since it is by erring that one learns, I learned a lot today!
--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 17:02               ` tlaronde
@ 2012-01-13 17:11                 ` Charles Forsyth
  2012-01-13 17:24                 ` Nicolas Bercher
  1 sibling, 0 replies; 57+ messages in thread
From: Charles Forsyth @ 2012-01-13 17:11 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 289 bytes --]

It was a long time ago, but I think some versions of du used dev/ino to
avoid counting the same file twice.

On 13 January 2012 17:02, <tlaronde@polynum.com> wrote:

> I never thought that perhaps, under Unices, du(1) with hard links will
> produce the same misleading result...
>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 17:02               ` tlaronde
  2012-01-13 17:11                 ` Charles Forsyth
@ 2012-01-13 17:24                 ` Nicolas Bercher
  2012-01-13 17:44                   ` tlaronde
  1 sibling, 1 reply; 57+ messages in thread
From: Nicolas Bercher @ 2012-01-13 17:24 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

tlaronde@polynum.com wrote:
> I never thought that perhaps, under Unices, du(1) with hard links will
> produce the same misleading result...

And fortunately, du on Unices handles this correctly!
(the -l option toggles whether hardlinked files are counted several
times or only once)
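
For example, with GNU du (sizes approximate):

$ dd if=/dev/zero of=a bs=1M count=10
$ ln a b
$ du -sh .
11M	.
$ du -lsh .
21M	.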

Nicolas



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 16:41             ` tlaronde
  2012-01-13 16:50               ` Charles Forsyth
  2012-01-13 17:02               ` tlaronde
@ 2012-01-13 17:37               ` erik quanstrom
  2012-01-13 17:58                 ` tlaronde
  2 siblings, 1 reply; 57+ messages in thread
From: erik quanstrom @ 2012-01-13 17:37 UTC (permalink / raw)
  To: 9fans

> the UTC. Namespace is a way to manage the nicknames, or the presentation
> of data; to manage different views of the "real" thing, but underneath
> there is a unique pathname; a pathname finally resolved to something
> (no infinite recursion).

what "real" thing?  from the perspective of a user program,
neglecting #, every file access is through the namespace.

each file server has a unique path name, called the qid (as charles
mentioned).  but between (instances of) file servers, qids are not unique.
in general, the problem i think is hard.  but fortunately most reasonable
questions of unique files can be answered straightforwardly.

- erik



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 17:24                 ` Nicolas Bercher
@ 2012-01-13 17:44                   ` tlaronde
  0 siblings, 0 replies; 57+ messages in thread
From: tlaronde @ 2012-01-13 17:44 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jan 13, 2012 at 06:24:39PM +0100, Nicolas Bercher wrote:
tlaronde@polynum.com wrote:
> >I never thought that perhaps, under Unices, du(1) with hard links will
> >produce the same misleading result...
> 
> And fortunately, du on Unices handles this correctly!
> (the -l option toggles whether hardlinked files are counted several
> times or only once)

But one could argue that in this case the only possibility would be
du(1) with some flag producing the grand "true" total without a
detailed listing.

Because, if that is not the case, either another instance of an already
seen hardlink is not printed in the listing (but this is arbitrary), or
the sum is incorrect (in the sense that it is not the sum of the sizes
displayed) ;)

The devil is in the details.
-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 17:37               ` erik quanstrom
@ 2012-01-13 17:58                 ` tlaronde
  2012-01-13 18:14                   ` erik quanstrom
                                     ` (3 more replies)
  0 siblings, 4 replies; 57+ messages in thread
From: tlaronde @ 2012-01-13 17:58 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jan 13, 2012 at 12:37:17PM -0500, erik quanstrom wrote:
>[...]
> each file server has a unique path name, called the qid (as charles
> mentioned).  but between (instances of) file servers, qids are not unique.

But a "fully qualified qid", I mean, at this very moment, for the
kernel, a resource is some qid served by some server. So (srv,qid) is
an uniq identifier, even if only in a local context.

We are a lot to call ourselves: "I" or "me". But in the context, this is
an uniq identifier (because all other mees are not me!).

For this, IP has found an elegant solution. There are identifiers that
are only local. As long as there is no interconnexion, the identifiers
are not absolutely uniq, but relatively uniq. And this is sufficient.

But I realize that the problem is hard. And that all in all, the correct
information is available from the file servers, and that when the
namespace is concerned, we have all access potentially to huge
resources; so by the nature of interconnexions, the answer is fuzzy.

I will not exchange the distributed nature of Plan9; and the namespace;
and the everything is a file etc. against the ability to have du(1)
telling me "acurately" what is stored here and only here (since I have
other means to know with the console).

But this was obviously not clear for me till now!
--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 17:58                 ` tlaronde
@ 2012-01-13 18:14                   ` erik quanstrom
  2012-01-13 21:00                     ` Yaroslav
       [not found]                   ` <CAG3N4d8c56DRSbt30k3EkgnyvrPSLbFkWH-kKapm7CVmKsu9og@mail.gmail.c>
                                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 57+ messages in thread
From: erik quanstrom @ 2012-01-13 18:14 UTC (permalink / raw)
  To: 9fans

> > each file server has a unique path name, called the qid (as charles
> > mentioned).  but between (instances of) file servers, qids are not unique.

/dev/pid has a different size depending on what your pid happens to be.  so
i think your statement is still too strong.

> I would not exchange the distributed nature of Plan9, the namespace,
> the everything-is-a-file, etc. for the ability to have du(1) telling me
> "accurately" what is stored here and only here (since I have other
> means to know, with the console).

taking the original problem — "how much disk space am i using", i think
you're in a better position than a unix user would be.  you can always
mount the fileserver serving the on-disk files from / someplace unique
and get an accurate count from there.  you're right that that's not a
general solution.  but then again, you have a specific question to which
there's an easy (if specific) answer.
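
i.e. the same remount trick as before, something like

	; mount /srv/boot /n/boot; du -sh /n/boot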

- erik



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 18:14                   ` erik quanstrom
@ 2012-01-13 21:00                     ` Yaroslav
  2012-01-13 22:14                       ` Charles Forsyth
  0 siblings, 1 reply; 57+ messages in thread
From: Yaroslav @ 2012-01-13 21:00 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

please note that the sum du returns may be bigger than the actual
storage used anyway - think deduping and compression done at venti
level.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
       [not found]                   ` <CAG3N4d8c56DRSbt30k3EkgnyvrPSLbFkWH-kKapm7CVmKsu9og@mail.gmail.c>
@ 2012-01-13 21:02                     ` erik quanstrom
  0 siblings, 0 replies; 57+ messages in thread
From: erik quanstrom @ 2012-01-13 21:02 UTC (permalink / raw)
  To: 9fans

On Fri Jan 13 16:01:35 EST 2012, yarikos@gmail.com wrote:
> please note that the sum du returns may be bigger than the actual
> storage used anyway - think deduping and compression done at venti
> level.

not everyone has a venti.

- erik



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 21:00                     ` Yaroslav
@ 2012-01-13 22:14                       ` Charles Forsyth
  0 siblings, 0 replies; 57+ messages in thread
From: Charles Forsyth @ 2012-01-13 22:14 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 246 bytes --]

that's very true. i rely on that quite a bit, generating new copies
frequently, expecting that it won't consume much more space.

On 13 January 2012 21:00, Yaroslav <yarikos@gmail.com> wrote:

> think deduping and compression done at venti


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
       [not found]                   ` <CAOw7k5hU=F2tynnFHtoz=AJ=HiFq2oLYhz4Rg-QgM+rv_gu5Ow@mail.gmail.c>
@ 2012-01-13 22:17                     ` erik quanstrom
  2012-01-13 23:10                       ` Aram Hăvărneanu
  0 siblings, 1 reply; 57+ messages in thread
From: erik quanstrom @ 2012-01-13 22:17 UTC (permalink / raw)
  To: 9fans

On Fri Jan 13 17:16:01 EST 2012, charles.forsyth@gmail.com wrote:

> that's very true. i rely on that quite a bit, generating new copies
> frequently, expecting that it won't consume much more space.

an extra copy or 100 of the distribution
will be <1% of a new hard drive, even with
no de-dup.

- erik



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 22:17                     ` erik quanstrom
@ 2012-01-13 23:10                       ` Aram Hăvărneanu
  2012-01-13 23:14                         ` Francisco J Ballesteros
  2012-01-13 23:24                         ` cinap_lenrek
  0 siblings, 2 replies; 57+ messages in thread
From: Aram Hăvărneanu @ 2012-01-13 23:10 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

erik quanstrom wrote:
> an extra copy or 100 of the distribution
> will be <1% of a new hard drive, even with
> no de-dup.

Sure, but there's other data than that. I do music, as a hobby.  A
project for an electronic track can have 20GB because everything I use
is "statically linked" into it.  Doing it this way has all the
advantages static linking for binaries has.

When your tracks have 20GB but 90% the data is shared, and you keep
full history for your track, dedup becomes invaluable.

-- 
Aram Hăvărneanu



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 23:10                       ` Aram Hăvărneanu
@ 2012-01-13 23:14                         ` Francisco J Ballesteros
  2012-01-13 23:23                           ` Aram Hăvărneanu
  2012-01-14  0:30                           ` Bakul Shah
  2012-01-13 23:24                         ` cinap_lenrek
  1 sibling, 2 replies; 57+ messages in thread
From: Francisco J Ballesteros @ 2012-01-13 23:14 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

but if you insert extra music in front of your track dedup in venti won't help.
or would it?

On Sat, Jan 14, 2012 at 12:10 AM, Aram Hăvărneanu <aram.h@mgk.ro> wrote:
> erik quanstrom wrote:
>> an extra copy or 100 of the distribution
>> will be <1% of a new hard drive, even with
>> no de-dup.
>
> Sure, but there's other data than that. I do music, as a hobby.  A
> project for an electronic track can have 20GB because everything I use
> is "statically linked" into it.  Doing it this way has all the
> advantages static linking for binaries has.
>
> When your tracks have 20GB but 90% the data is shared, and you keep
> full history for your track, dedup becomes invaluable.
>
> --
> Aram Hăvărneanu
>



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 23:14                         ` Francisco J Ballesteros
@ 2012-01-13 23:23                           ` Aram Hăvărneanu
  2012-01-14  0:30                           ` Bakul Shah
  1 sibling, 0 replies; 57+ messages in thread
From: Aram Hăvărneanu @ 2012-01-13 23:23 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, Jan 14, 2012 at 12:14 AM, Francisco J Ballesteros <nemo@lsub.org> wrote:
> but if you insert extra music in front of your track dedup in venti won't help.
> or would it?

It wouldn't. In practice it seems that it usually appends, probably
for performance reasons, so for me it has worked absolutely great so
far.

-- 
Aram Hăvărneanu



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 23:10                       ` Aram Hăvărneanu
  2012-01-13 23:14                         ` Francisco J Ballesteros
@ 2012-01-13 23:24                         ` cinap_lenrek
  1 sibling, 0 replies; 57+ messages in thread
From: cinap_lenrek @ 2012-01-13 23:24 UTC (permalink / raw)
  To: 9fans

dedubstep!

--
cinap



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-13 23:14                         ` Francisco J Ballesteros
  2012-01-13 23:23                           ` Aram Hăvărneanu
@ 2012-01-14  0:30                           ` Bakul Shah
  2012-01-14  1:01                             ` dexen deVries
  1 sibling, 1 reply; 57+ messages in thread
From: Bakul Shah @ 2012-01-14  0:30 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, 14 Jan 2012 00:14:25 +0100 Francisco J Ballesteros <nemo@lsub.org>  wrote:
> but if you insert extra music in front of your track dedup in venti won't help.
> or would it?

No. Venti operates at block level.

You are better off using an SCM like mercurial (though commits
are likely to be slow).  In case you were wondering, the
mercurial repo format does seem to be `dedup' friendly as new
data is appended at the end.

$ du -sh .hg
100M	.hg
$ ls -l .hg/store/data/foo.d
-rw-r--r--  1 xxxxx  xxxxx  104857643 Jan 13 16:13 .hg/store/data/foo.d
$ cp .hg/store/data/foo.d xxx 	# save a copy of repo data for foo
$ echo 1 | cat - foo > bar && mv bar foo # prepend a couple of bytes to foo
$ hg commit -m'test4'
$ ls -l .hg/store/data/foo.d
-rw-r--r--  1 xxxxx  xxxxx  104857657 Jan 13 16:16 .hg/store/data/foo.d
$ cmp xxx .hg/store/data/foo.d		# compare old repo data with new
cmp: EOF on xxx
$ du -sh .hg
100M	.hg



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-14  0:30                           ` Bakul Shah
@ 2012-01-14  1:01                             ` dexen deVries
  2012-01-14 13:26                               ` erik quanstrom
  2012-01-14 13:27                               ` erik quanstrom
  0 siblings, 2 replies; 57+ messages in thread
From: dexen deVries @ 2012-01-14  1:01 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Saturday 14 January 2012 01:30:32 Bakul Shah wrote:
> On Sat, 14 Jan 2012 00:14:25 +0100 Francisco J Ballesteros <nemo@lsub.org>  
wrote:
> > but if you insert extra music in front of your track dedup in venti won't
> > help. or would it?
> 
> No. Venti operates at block level.


there are two ways around it:

0)
use of rolling-checksum enables decent block-level deduplication on files that 
are modified in the middle; some info:
http://svana.org/kleptog/rgzip.html
http://blog.kodekabuki.com/post/11135148692/rsync-internals

in short, a rolling checksum is used to find reasonable restart points; for 
us, block boundaries. probably could be overlaid over Venti;
rollingchecksumfs anybody?

1)
Git uses diff-based format for long-term compacted storage, plus some gzip 
compression. i don't know specifics, but IIRC it's pretty much standard diff.

it's fairly CPU- and memory-intensive on larger (10...120MB in my case) text 
files, but produces beautiful result:

i have a cronjob take dump of a dozen MySQL databases; each some 10...120MB of 
SQL (textual). each daily dump collection is committed into Git; the overall 
daily collection size grew from some 10MB two years ago to about 410MB today; 
over two years about 700 commits.

each dump differs slightly in content from yesterday's and the changes are
scattered all over the files; it would not de-duplicate block-level too well.

yet the Git storage, after compaction (which takes a few minutes on a slow 
desktop), totals about 200MB, all the commits included. yep; less storage 
taken by two years' worth of Git storage than by one daily dump.

perhaps Git's current diff format would not handle binary files very well, but 
there are binary diffs available out there.
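
to illustrate the rolling-checksum idea from 0), here is a toy
content-defined chunker in C (not venti's format and not rsync's exact
algorithm, just a sketch).  it reads standard input and prints
(offset, length) pairs; a boundary is declared wherever a rolling hash
of the last 64 bytes has its low 13 bits equal to zero, so an insertion
near the front of a file only disturbs the chunks around the insertion
point and the boundaries downstream re-synchronise.

#include <u.h>
#include <libc.h>

enum {
	Win	= 64,		/* rolling window size */
	Mask	= (1<<13)-1,	/* boundary when low 13 bits are 0: ~8K average chunks */
};

void
main(void)
{
	uchar buf[8192], win[Win];
	u32int h, p31, b;
	int i, n, w;
	vlong off, last;

	/* p31 = 31^Win, used to cancel the byte leaving the window */
	p31 = 1;
	for(i = 0; i < Win; i++)
		p31 *= 31;

	h = 0;
	w = 0;
	off = 0;
	last = 0;
	memset(win, 0, sizeof win);
	while((n = read(0, buf, sizeof buf)) > 0){
		for(i = 0; i < n; i++, off++){
			b = win[w];		/* byte falling out of the window */
			win[w] = buf[i];
			w = (w+1) % Win;
			h = h*31 + buf[i] - b*p31;	/* roll the hash */
			if((h & Mask) == 0){
				print("chunk %lld %lld\n", last, off+1-last);
				last = off+1;
			}
		}
	}
	if(off > last)
		print("chunk %lld %lld\n", last, off-last);
	exits(nil);
}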


-- 
dexen deVries

> Gresham’s Law for Computing:
>   The Fast drives out the Slow even if the Fast is Wrong.

William Kahan in
http://www.cs.berkeley.edu/~wkahan/Stnfrd50.pdf



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
       [not found]                   ` <CAEAzY39VJhaWD03PruMoS2A+bCP62XDTdgob1hgtjp6qHtjdSA@mail.gmail.c>
@ 2012-01-14 13:07                     ` erik quanstrom
  0 siblings, 0 replies; 57+ messages in thread
From: erik quanstrom @ 2012-01-14 13:07 UTC (permalink / raw)
  To: 9fans

On Fri Jan 13 18:24:59 EST 2012, aram.h@mgk.ro wrote:
> On Sat, Jan 14, 2012 at 12:14 AM, Francisco J Ballesteros <nemo@lsub.org> wrote:
> > but if you insert extra music in front of your track dedup in venti won't help.
> > or would it?
>
> It wouldn't. In practice it seems that it usually appends, probably
> for performance reasons, so for me it has worked absolutely great so
> far.

ken's file server will work the same way for appends.  you won't get a new
copy of the whole file in the worm, just the additional blocks + a copy of
the last partial + a new copy of some metadata.

- erik



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-14  1:01                             ` dexen deVries
@ 2012-01-14 13:26                               ` erik quanstrom
  2012-01-14 15:00                                 ` hiro
  2012-01-14 13:27                               ` erik quanstrom
  1 sibling, 1 reply; 57+ messages in thread
From: erik quanstrom @ 2012-01-14 13:26 UTC (permalink / raw)
  To: 9fans

> 0)
> use of rolling-checksum enables decent block-level deduplication on files that
> are modified in the middle; some info:
> http://svana.org/kleptog/rgzip.html
> http://blog.kodekabuki.com/post/11135148692/rsync-internals
>
> in short, a rolling checksum is used to find reasonable restart points; for
> us, block boundaries. probably could be overlaid over Venti;
> rollingchecksumfs anybody?
>
> 1)
> Git uses diff-based format for long-term compacted storage, plus some gzip
> compression. i don't know specifics, but IIRC it's pretty much standard diff.
>
> it's fairly CPU- and memory-intensive on larger (10...120MB in my case) text
> files, but produces beautiful result:
>
> i have a cronjob take dump of a dozen MySQL databases; each some 10...120MB of
> SQL (textual). each daily dump collection is committed into Git; the overall
> daily collection size grew from some 10MB two years ago to about 410MB today;
> over two years about 700 commits.
>
> each dump differs slightly in content from yesterday's and the changes are
> scattered all over the files; it would not de-duplicate block-level too well.
>
> yet the Git storage, after compaction (which takes a few minutes on a slow
> desktop), totals about 200MB, all the commits included. yep; less storage
> taken by two years' worth of Git storage than by one daily dump.

given the fact that most disks are very large, and most people's non-media
storage requirements are very small, why is the compelling.

from what i've seen, people have the following requirements for storage:
1.  speed
2.  speed
3.  speed
4.  large caches.

- erik



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-14  1:01                             ` dexen deVries
  2012-01-14 13:26                               ` erik quanstrom
@ 2012-01-14 13:27                               ` erik quanstrom
  1 sibling, 0 replies; 57+ messages in thread
From: erik quanstrom @ 2012-01-14 13:27 UTC (permalink / raw)
  To: 9fans

> given the fact that most disks are very large, and most people's non-media
> storage requirements are very small, why is the compelling.

shoot. ready.  aim.  i ment, "why is this compelling?".  sorry.

- erik



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-14 13:26                               ` erik quanstrom
@ 2012-01-14 15:00                                 ` hiro
  2012-01-14 15:06                                   ` Charles Forsyth
       [not found]                                   ` <CAOw7k5h2T+xuxbJhwTxPMOjG3K14KarrJPXFmH9EHdHJnXFpPA@mail.gmail.c>
  0 siblings, 2 replies; 57+ messages in thread
From: hiro @ 2012-01-14 15:00 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

venti is too big, buy bigger disks and forget venti.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-14 15:00                                 ` hiro
@ 2012-01-14 15:06                                   ` Charles Forsyth
       [not found]                                   ` <CAOw7k5h2T+xuxbJhwTxPMOjG3K14KarrJPXFmH9EHdHJnXFpPA@mail.gmail.c>
  1 sibling, 0 replies; 57+ messages in thread
From: Charles Forsyth @ 2012-01-14 15:06 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 410 bytes --]

Although drives are larger now, even SSDs, there is great satisfaction in
being able to make copies of large trees arbitrarily, without having to
worry about them adding any more than just the changed files to the
write-once stored set.
I do this fairly often during testing.

On 14 January 2012 15:00, hiro <23hiro@googlemail.com> wrote:

> venti is too big, buy bigger disks and forget venti.
>
>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
       [not found]                                   ` <CAOw7k5h2T+xuxbJhwTxPMOjG3K14KarrJPXFmH9EHdHJnXFpPA@mail.gmail.c>
@ 2012-01-14 15:29                                     ` erik quanstrom
  2012-01-14 16:16                                       ` Aram Hăvărneanu
                                                         ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: erik quanstrom @ 2012-01-14 15:29 UTC (permalink / raw)
  To: 9fans

On Sat Jan 14 10:07:25 EST 2012, charles.forsyth@gmail.com wrote:

> Although drives are larger now, even SSDs, there is great satisfaction in
> being able to make copies of large trees arbitrarily, without having to
> worry about them adding any more than just the changed files to the
> write-once stored set.
> I do this fairly often during testing.

(as an aside, one assumes changed files + directory tree as the
a/mtimes are changed.)

such satisfaction is not deniable, but is it a good tradeoff?

my /sys/src is 191mb.  i can make a complete copy every day and never
delete it for the next 2739 years (give or take :)) and not run out
of disk space.  (most of the copies i make are deleted before they are
committed to the worm.)

i think it would be fair to argue that source and executables are
negligeable users of storage.  media files, which are already compressed,
tend to dominate.

the tradeoff for this compression is a large amount of memory,
fragmentation, and cpu usage.  that is to say, storage latency.

so i wonder if we're not spending all our resources trying to optimize
only a few percent of our storage needs.

- erik



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-14 15:29                                     ` erik quanstrom
@ 2012-01-14 16:16                                       ` Aram Hăvărneanu
       [not found]                                       ` <CAEAzY3_9jpi6j-C1u87OKaEazajOBwkvbEBdO5f1eUJysJbH1A@mail.gmail.c>
  2012-01-14 18:39                                       ` Charles Forsyth
  2 siblings, 0 replies; 57+ messages in thread
From: Aram Hăvărneanu @ 2012-01-14 16:16 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

erik quanstrom wrote:
> i think it would be fair to argue that source and executables are
> negligible users of storage.  media files, which are already compressed,
> tend to dominate.

What about virtual machine images?

> the tradeoff for this compression is a large amount of memory,
> fragmentation, and cpu usage.  that is to say, storage latency.

I have 24GB RAM. My primary laptops have 8GB RAM. I have all this RAM
not because of dedup but because I do memory intensive tasks, like
running virtual machines. I believe this is true for many users.

I'm of a completely different opinion regarding fragmentation. On
SSDs, it's a non issue. Historically, one of the hardest things to do
right in a filesystem was minimizing fragmentation. Today you don't
have to do it so there's less complexity to manage in the file system.
Even if you still have rotating rust to store the bulk of the data, a
small SSD cache in front of it renders fragmentation irrelevant.

My CPU can SHA-1 hash orders of magnitude faster than it can read from
disk, and that's using only generic instructions, plus, it's sitting
idle anyway.

> so i wonder if we're not spending all our resources trying to optimize
> only a few percent of our storage needs.

Dedup is certainly not a panacea, but it is useful for many workloads.

-- 
Aram Hăvărneanu



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
       [not found]                                       ` <CAEAzY3_9jpi6j-C1u87OKaEazajOBwkvbEBdO5f1eUJysJbH1A@mail.gmail.c>
@ 2012-01-14 16:32                                         ` erik quanstrom
  2012-01-14 18:01                                           ` Aram Hăvărneanu
                                                             ` (5 more replies)
  0 siblings, 6 replies; 57+ messages in thread
From: erik quanstrom @ 2012-01-14 16:32 UTC (permalink / raw)
  To: 9fans

> What about virtual machine images?
> 
> > the tradeoff for this compression is a large amount of memory,
> > fragmentation, and cpu usage.  that is to say, storage latency.
> 
> I have 24GB RAM. My primary laptops have 8GB RAM. I have all this RAM
> not because of dedup but because I do memory intensive tasks, like
> running virtual machines. I believe this is true for many users.

russ posted some notes on how much memory and disk bandwidth are
required to write at a constant b/w of Xmb/s to venti.  venti requires
enormous resources to provide this capability.

also, 24gb isn't really much storage.  that's 1000 vm images/disk, assuming
that you store the regions with all zeros.

one thing to note is that we're silently comparing block (ish) storage (venti)
to file systems.  this isn't really a useful comparison.  i don't know of many
folks who store big disk images on file systems.

we have some customers who do do this, and they use the vsx to clone
a base vm image.  there's no de-dup, but only the changed extents get
stored.

> I'm of a completely different opinion regarding fragmentation. On
> SSDs, it's a non issue. 

that's not correct.  a very good ssd will do only about 10,000 r/w random
iops.  (certainly they show better numbers for the easy case of compressible
100% write workloads.)  that's less than 40mb/s.  on the other hand, a good ssd will do
about 10x, if reading sequentially.

> My CPU can SHA-1 hash orders of magnitude faster than it can read from
> disk, and that's using only generic instructions, plus, it's sitting
> idle anyway.

it's not clear to me that the sha-1 hash in venti has any real bearing on
venti's end performance.  do you have any data or references for this?

- erik



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-14 16:32                                         ` erik quanstrom
@ 2012-01-14 18:01                                           ` Aram Hăvărneanu
       [not found]                                           ` <CAEAzY39pUNCTs6kMYnYoukx3TH8OuhgcmhSF+nVW5jX0iTCYvA@mail.gmail.c>
                                                             ` (4 subsequent siblings)
  5 siblings, 0 replies; 57+ messages in thread
From: Aram Hăvărneanu @ 2012-01-14 18:01 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> russ posted some notes on how much memory and disk bandwidth are
> required to write at a constant b/w of Xmb/s to venti.  venti requires
> enormous resources to provide this capability.

Maybe, I was talking generally about the concept of
content-addressable storage, not venti in particular. I believe it's
possible to do CAS without a major performance hit; look at ZFS, for
example.

> one thing to note is that we're silently comparing block (ish) storage (venti)
> to file systems.  this isn't really a useful comparison.  i don't know of many
> folks who store big disk images on file systems.

But many want to back up these images somewhere, and venti makes a
good candidate.

In my experience, a machine serving iSCSI or AoE to VMs running on
different machines is pretty common, and iSCSI or AoE is often done in
software, sometimes using big files on a local file system.  I don't
know any other way to do it in Linux, if you export block storage
directly, you lose a lot of flexibility.

On Solaris, ZFS takes a different approach, you can ask ZFS to give
you a virtual LUN bypassing the VFS completely.

>> I'm of a completely different opinion regarding fragmentation. On
>> SSDs, it's a non issue.
>
> that's not correct.  a very good ssd will do only about 10,000 r/w random
> iops.  (certainly they show better numbers for the easy case of compressible
> 100% write workloads.)  that's less than 40mb/s.  on the other hand, a good ssd will do
> about 10x, if reading sequentially.

Sure, but 1,000 iops gives you only a 10% performance hit. With
rotating rust 10 iops give you the same 10% hit, two orders of
magnitude difference. In my experience, even if you are ignoring the
fragmentation issue completely, your files will be less than 100 times
more fragmented compared with a traditional filesystem so your system
overall will be less affected by fragmentation.

-- 
Aram Hăvărneanu



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-14 15:29                                     ` erik quanstrom
  2012-01-14 16:16                                       ` Aram Hăvărneanu
       [not found]                                       ` <CAEAzY3_9jpi6j-C1u87OKaEazajOBwkvbEBdO5f1eUJysJbH1A@mail.gmail.c>
@ 2012-01-14 18:39                                       ` Charles Forsyth
  2 siblings, 0 replies; 57+ messages in thread
From: Charles Forsyth @ 2012-01-14 18:39 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 240 bytes --]

That only affects the directories, which are tiny, not the files.

On 14 January 2012 15:29, erik quanstrom <quanstro@quanstro.net> wrote:

> (as an aside, one assumes changed files + directory tree as the
> a/mtimes are changed.)
>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
       [not found]                                           ` <CAEAzY39pUNCTs6kMYnYoukx3TH8OuhgcmhSF+nVW5jX0iTCYvA@mail.gmail.c>
@ 2012-01-14 20:43                                             ` erik quanstrom
  2012-01-14 21:39                                               ` Aram Hăvărneanu
  0 siblings, 1 reply; 57+ messages in thread
From: erik quanstrom @ 2012-01-14 20:43 UTC (permalink / raw)
  To: 9fans

> Maybe, I was talking generally about the concept of
> content-addressable storage, not venti in particular. I believe it's
> possible to do CAS without a major performance hit; look at ZFS, for
> example.

do you have any reference to ZFS being content-addressed storage?

> >> I'm of a completely different opinion regarding fragmentation. On
> >> SSDs, it's a non issue.
> >
> > that's not correct.  a very good ssd will do only about 10,000 r/w random
> > iops.  (certainly they show better numbers for the easy case of compressible
> > 100% write workloads.)  that's less than 40mb/s.  on the other hand, a good ssd will do
> > about 10x, if reading sequentially.
> 
> Sure, but 1,000 iops gives you only a 10% performance hit. With
> rotating rust 10 iops give you the same 10% hit, two orders of
> magnitude difference. In my experience, even if you are ignoring the
> fragmentation issue completely, your files will be less than 100 times
> more fragmented compared with a traditional filesystem so your system
> overall will be less affected by fragmentation.

your claim was, random access is free on ssds.  and i don't see how these
numbers bolster your claim at all.

- erik



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-14 20:43                                             ` erik quanstrom
@ 2012-01-14 21:39                                               ` Aram Hăvărneanu
  0 siblings, 0 replies; 57+ messages in thread
From: Aram Hăvărneanu @ 2012-01-14 21:39 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> do you have any reference to ZFS being content-addressed storage?

It's not purely content-addressed storage, but it implements
deduplication in the same way venti does:
http://blogs.oracle.com/bonwick/entry/zfs_dedup (the blog post offers
only a high-level overview; you have to dig into the code to see the
implementation).
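
To make the analogy concrete, here is a toy sketch in Go (purely
illustrative, nothing like venti's or ZFS's actual code) of the
property the two share: a block's address is a hash of its contents,
so writing the same block twice stores it only once.

package main

import (
	"crypto/sha1"
	"fmt"
)

// Score is a content address: the SHA-1 hash of a block.
type Score [sha1.Size]byte

// Store is a toy content-addressed block store.  Blocks go into an
// append-only log; the index maps a score to the block's offset.
type Store struct {
	log   []byte
	index map[Score]int64 // score -> offset in log
	size  map[Score]int64 // score -> block length
}

func NewStore() *Store {
	return &Store{index: make(map[Score]int64), size: make(map[Score]int64)}
}

// Write stores a block and returns its score.  If a block with the
// same contents was stored before, nothing is appended: deduplication
// falls out of content addressing for free.
func (s *Store) Write(block []byte) Score {
	sc := Score(sha1.Sum(block))
	if _, ok := s.index[sc]; ok {
		return sc // duplicate: already stored
	}
	s.index[sc] = int64(len(s.log))
	s.size[sc] = int64(len(block))
	s.log = append(s.log, block...)
	return sc
}

// Read fetches a block by its score.
func (s *Store) Read(sc Score) ([]byte, bool) {
	off, ok := s.index[sc]
	if !ok {
		return nil, false
	}
	return s.log[off : off+s.size[sc]], true
}

func main() {
	st := NewStore()
	a := st.Write([]byte("hello"))
	b := st.Write([]byte("hello"))   // same contents: same score, no growth
	fmt.Println(a == b, len(st.log)) // prints: true 5
}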

> your claim was, random access is free on ssds.  and i don't see how these
> numbers bolster your claim at all.

Maybe my wording wasn't the best, so I'll try again.  SSDs can do
100 times more iops than HDDs for the same relative performance hit.
In my experience, ignoring fragmentation completely leads to average
fragmentation being significantly less than 100 times worse than on
a traditional filesystem that tries hard to avoid it. In practice,
I've found that a fragmented filesystem on an SSD performs at worst 10%
behind the non-fragmented best-case scenario. I'd trade 10%
performance for significantly simpler code any time.

The paragraph above ignores caching.  I've run ZFS for many years
without an SSD and I haven't noticed the fragmentation, because of very
aggressive caching (see the ARC algorithm).

-- 
Aram Hăvărneanu



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
       [not found]                                           ` <CAEAzY3842EWp=WByCPAm7yGEK2h5b+1hkbwm5NoRPTH_2F5CVA@mail.gmail.c>
@ 2012-01-14 21:54                                             ` erik quanstrom
  2012-01-14 22:11                                               ` Aram Hăvărneanu
  0 siblings, 1 reply; 57+ messages in thread
From: erik quanstrom @ 2012-01-14 21:54 UTC (permalink / raw)
  To: 9fans

On Sat Jan 14 16:40:46 EST 2012, aram.h@mgk.ro wrote:
> > do you have any reference to ZFS being content-addressed storage?
> 
> It's not purely content-addressed storage, but it implements
> deduplication in the same way venti does:
> http://blogs.oracle.com/bonwick/entry/zfs_dedup (the blog post offers
> only a high level overview, you have to dig into the code to see the
> implementation).

content addressed means given the content, you can generate the address.
this is NOT true of zfs at all.

> > your claim was, random access is free on ssds.  and i don't see how these
> > numbers bolster your claim at all.
> 
> Maybe my wording wasn't the best, so I'll try again.  SSDs can do
> 100times more iops than HDDs for a given performance hit ratio.  In my
> experience, ignoring fragmentation completely leads to average
> fragmentation being significantly less than 100 times worse compared
> to a traditional filesystem that tries hard to avoid it. In practice,
> I've found that a fragmented filesystem on a SSD performs at worst 10%
> behind the non-fragmented best case scenario. I'd trade 10%
> performance for significantly simpler code anytime.
> 
> The phrase above is ignoring caching.  I've ran ZFS for many years
> without a SSD and I haven't noticed the fragmentation because of very
> aggressive caching (see the ARC algorithm).

you keep changing the subject.  your original claim was that random
access is not slower than sequential access for ssds.  you haven't backed
this argument up.  the relative performance of ssds vs hard drives
and caching are completely irrelevant.

- erik



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-14 21:54                                             ` erik quanstrom
@ 2012-01-14 22:11                                               ` Aram Hăvărneanu
  0 siblings, 0 replies; 57+ messages in thread
From: Aram Hăvărneanu @ 2012-01-14 22:11 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> content addressed means given the content, you can generate the address.
> this is NOT true of zfs at all.

How come? With venti, the address is the SHA-1 hash; with ZFS, you get
to choose the hash, but it can still be a hash.

> you keep changing the subject.  your original claim was that random
> access is not slower than sequential access for ssds.  you haven't backed
> this argument up.  the relative performance of ssds vs hard drives
> and caching are completely irrelevant.

My original claim was that fragmentation is a non-issue if you have
SSDs.  I still claim this, and I expanded on the context in my previous
post.  Of course random I/O is slower than sequential I/O, SSD or
not, but in practice filesystem fragmentation causes an amount of
random I/O much smaller than what an SSD can handle, so throughput in
the fragmented case is close to the throughput in the sequential case.

I don't think that caching is completely irrelevant.  If I have to
choose between a complex scheme that avoids fragmentation and a simple
caching scheme that renders it irrelevant for a particular workload,
I'll choose the caching scheme because it's simpler.
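
As a toy illustration of what "renders it irrelevant" means here
(plain LRU in Go, purely illustrative; ARC, which ZFS actually uses,
is far smarter about balancing recency against frequency): once a
block is in memory, its on-disk placement no longer matters for that
workload.

package main

import (
	"container/list"
	"fmt"
)

// blockReader stands in for the device: one call is one (possibly
// random) I/O.
type blockReader func(lba int64) []byte

// lruCache is a plain least-recently-used read cache.
type lruCache struct {
	cap   int
	ll    *list.List              // front = most recently used
	items map[int64]*list.Element // lba -> element in ll
	read  blockReader
}

type entry struct {
	lba  int64
	data []byte
}

func newLRU(capacity int, read blockReader) *lruCache {
	return &lruCache{
		cap:   capacity,
		ll:    list.New(),
		items: make(map[int64]*list.Element),
		read:  read,
	}
}

func (c *lruCache) Get(lba int64) []byte {
	if e, ok := c.items[lba]; ok {
		c.ll.MoveToFront(e) // hit: no device I/O at all
		return e.Value.(*entry).data
	}
	data := c.read(lba) // miss: one device read
	c.items[lba] = c.ll.PushFront(&entry{lba, data})
	if c.ll.Len() > c.cap {
		old := c.ll.Back()
		c.ll.Remove(old)
		delete(c.items, old.Value.(*entry).lba)
	}
	return data
}

func main() {
	misses := 0
	dev := func(lba int64) []byte { misses++; return make([]byte, 4096) }
	c := newLRU(1024, dev)
	for pass := 0; pass < 3; pass++ {
		for lba := int64(0); lba < 100; lba++ {
			c.Get(lba)
		}
	}
	fmt.Println("logical reads: 300, device I/Os:", misses) // 100
}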

-- 
Aram Hăvărneanu



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
       [not found]                                           ` <CAEAzY39GD4QoTRf2S0Nd4rN+vSyG1tsKyP3rFWu5a0mFN=sH6w@mail.gmail.c>
@ 2012-01-14 22:42                                             ` erik quanstrom
  2012-01-14 23:03                                               ` Aram Hăvărneanu
  2012-01-14 23:32                                               ` Bakul Shah
  0 siblings, 2 replies; 57+ messages in thread
From: erik quanstrom @ 2012-01-14 22:42 UTC (permalink / raw)
  To: 9fans

On Sat Jan 14 17:12:49 EST 2012, aram.h@mgk.ro wrote:
> > content addressed means given the content, you can generate the address.
> > this is NOT true of zfs at all.
>
> How come? With venti, the address is the SHA-1 hash, with ZFS, you get
> to chose the hash, but it can still be a hash.

because in zfs the hash is not used as an address (lba).

> My original claim was that fragmentation is a non issue if you have
> SSDs.  I still claim this and I expanded on the context in my previous
> post.  Of course that random I/O is slower than sequential I/O, SSD or
> not, but in practice, filesystem fragmentation causes an amount or
> random I/O much less than what a SSD can handle, so throughput in the
> fragmented case is close to the throughput in the sequential case.
>
> I don't think that caching is completely irrelevant.  If I have to
> chose between a complex scheme that avoids fragmentation and a simple
> caching scheme that renders it irrelevant for a particular workload,
> I'll chose the caching scheme because it's simpler.

by all means, show us the numbers.  personally, i believe the mfgrs are not
lying when they say that random i/o yields 1/10th the performance (at best)
of sequential i/o.

since you assert that it is ssds that make random i/o a non issue, and
not caching, logically caching is not relevant to your point.

- erik

ps.  you're presenting a false choice between caching and fragmentation.
case in point, ken's fs doesn't fragment as much as venti (and does not
increase fragmentation over time) and yet it caches.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-14 22:42                                             ` erik quanstrom
@ 2012-01-14 23:03                                               ` Aram Hăvărneanu
  2012-01-14 23:32                                               ` Bakul Shah
  1 sibling, 0 replies; 57+ messages in thread
From: Aram Hăvărneanu @ 2012-01-14 23:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

>> How come? With venti, the address is the SHA-1 hash, with ZFS, you get
>> to chose the hash, but it can still be a hash.
>
> because in zfs the hash is not used as an address (lba).

But by this definition venti isn't either. In venti, the hash is
translated to an lba by the index cache. In ZFS, the hash is translated
to an lba by the DDT (the dedup table). Both the Venti index cache and
the DDT can be regenerated from the data if they become corrupted.
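
A toy illustration of that last point (Go, purely illustrative, not
the real venti or ZFS on-disk formats): if blocks live in an
append-only log with a small length header, the hash-to-offset index
is derived state that one scan over the data can rebuild.

package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
)

// appendBlock writes a block to a toy arena as [4-byte length][data].
func appendBlock(arena, block []byte) []byte {
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(block)))
	return append(append(arena, hdr[:]...), block...)
}

// rebuildIndex recreates the hash -> offset map from nothing but the
// arena itself, by walking it and rehashing every block.  Losing the
// index therefore loses no data.
func rebuildIndex(arena []byte) map[[sha1.Size]byte]int64 {
	index := make(map[[sha1.Size]byte]int64)
	for off := 0; off < len(arena); {
		n := int(binary.BigEndian.Uint32(arena[off : off+4]))
		data := arena[off+4 : off+4+n]
		index[sha1.Sum(data)] = int64(off + 4)
		off += 4 + n
	}
	return index
}

func main() {
	var arena []byte
	arena = appendBlock(arena, []byte("first block"))
	arena = appendBlock(arena, []byte("second block"))
	fmt.Println(len(rebuildIndex(arena)), "index entries recovered") // 2
}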

> by all means, show us the numbers.  personally, i believe the mfgrs are not
> lying when they say that random i/o yields 1/10th the performance (at best)
> of sequential i/o.

I will, tomorrow.

> since you assert that it is ssds that make random i/o a non issue, and
> not caching, logically caching is not relevant to your point.

I've been trying to claim two things at the same time; these two
things are unrelated. Both SSDs and caching alleviate fragmentation
issues, in different ways.

> ps.  you're presenting a false choice between caching and fragmentation.
> case in point, ken's fs doesn't fragment as much as venti (and does not
> increase fragmentation over time) and yet it caches.

If that's the impression I gave, I'm sorry for the misunderstanding,
but that's not what I meant. Of course there's no choice to be made
between caching and fragmentation. Every filesystem caches.

-- 
Aram Hăvărneanu



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-14 22:42                                             ` erik quanstrom
  2012-01-14 23:03                                               ` Aram Hăvărneanu
@ 2012-01-14 23:32                                               ` Bakul Shah
  2012-01-14 23:45                                                 ` Aram Hăvărneanu
  1 sibling, 1 reply; 57+ messages in thread
From: Bakul Shah @ 2012-01-14 23:32 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, 14 Jan 2012 17:42:12 EST erik quanstrom <quanstro@quanstro.net>  wrote:
> On Sat Jan 14 17:12:49 EST 2012, aram.h@mgk.ro wrote:
> > > content addressed means given the content, you can generate the address.
> > > this is NOT true of zfs at all.
> >
> > How come? With venti, the address is the SHA-1 hash, with ZFS, you get
> > to chose the hash, but it can still be a hash.
>
> because in zfs the hash is not used as an address (lba).

True.

> > My original claim was that fragmentation is a non issue if you have
> > SSDs.  I still claim this and I expanded on the context in my previous
> > post.  Of course that random I/O is slower than sequential I/O, SSD or
> > not, but in practice, filesystem fragmentation causes an amount or
> > random I/O much less than what a SSD can handle, so throughput in the
> > fragmented case is close to the throughput in the sequential case.
> >
> > I don't think that caching is completely irrelevant.  If I have to
> > chose between a complex scheme that avoids fragmentation and a simple
> > caching scheme that renders it irrelevant for a particular workload,
> > I'll chose the caching scheme because it's simpler.
>
> by all means, show us the numbers.  personally, i believe the mfgrs are not
> lying when they say that random i/o yields 1/10th the performance (at best)
> of sequential i/o.


Intel 320 300GB SSD numbers (for example):
seq read:  270MBps
rnd read:  39.5Kiops == 158MBps @ 4KB
seq write: 205MBps
rnd write: 23.0Kiops == 92MBps  @ 4KB

SSDs don't have to contend with seek times, but you have to pay
the erase cost (which cannot be hidden in "background GC" when
you are going full tilt).

For venti you'd pick 8k at least, so the write throughput will
be higher than 92MBps (but not double). IIRC ZFS picks a much
larger block size, so it suffers less here.

You have to check the numbers using the block sizes relevant to
you.
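
To make the arithmetic explicit, a trivial sketch (Go, purely
illustrative): throughput is just iops times block size.  The small
differences from the figures above are rounding of K and MB, and the
8KB line is a hypothetical upper bound that assumes iops don't drop
with block size, which on real devices they do (hence "but not
double").

package main

import "fmt"

// mbps converts an iops figure at a given block size into MB/s.
func mbps(iops, blockBytes float64) float64 {
	return iops * blockBytes / 1e6
}

func main() {
	// The Intel 320 random-I/O figures quoted above, at 4KB:
	fmt.Printf("rnd read  39.5K iops @ 4KB: %3.0f MB/s\n", mbps(39500, 4096))
	fmt.Printf("rnd write 23.0K iops @ 4KB: %3.0f MB/s\n", mbps(23000, 4096))
	// Hypothetical: the same write iops at an 8KB (venti-sized) block.
	fmt.Printf("rnd write 23.0K iops @ 8KB: %3.0f MB/s\n", mbps(23000, 8192))
}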



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-14 23:32                                               ` Bakul Shah
@ 2012-01-14 23:45                                                 ` Aram Hăvărneanu
  0 siblings, 0 replies; 57+ messages in thread
From: Aram Hăvărneanu @ 2012-01-14 23:45 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

>> > How come? With venti, the address is the SHA-1 hash, with ZFS, you get
>> > to chose the hash, but it can still be a hash.
>>
>> because in zfs the hash is not used as an address (lba).
>
> True.

As I said, neither is it in Venti. The hash is translated to an lba by
the index, a table that can be recreated from the data if it's missing
or corrupt. ZFS also uses an index table, the DDT, which has the same
property as Venti's index: it can be recreated by reading the data.

> Intel 320 300GB SSD numbers (for example):
> seq read:  270MBps
> rnd read:  39.5Kiops == 158MBps @ 4KB
> seq write: 205MBps
> rnd write: 23.0Kiops == 92MBps  @ 4KB

Those are just raw random-I/O stats; you are not interpreting them to
see what the penalty would be for a given fragmentation level and
block size. I'm writing that interpretation up tomorrow.

> IIRC ZFS picks a much
> larger block size so it suffers less here.

ZFS will try to use big blocks if it can, at most 128kB right now, but
I don't see how this is relevant if you read random 4K logical blocks.

Note that both Venti and ZFS will write a big file
sequentially on the disk; that's (partially) what the index table/DDT
is for. In this case, the hashes of sequential blocks within a file
simply map (via the index) to sequential disk addresses.

-- 
Aram Hăvărneanu



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
       [not found]                                           ` <CAEAzY39ddvxRSwSP+Eh7kujJEs3nYh4kQegfuSisgdgB9qMQ4g@mail.gmail.c>
@ 2012-01-15 13:12                                             ` erik quanstrom
  2012-01-15 14:07                                               ` Aram Hăvărneanu
  0 siblings, 1 reply; 57+ messages in thread
From: erik quanstrom @ 2012-01-15 13:12 UTC (permalink / raw)
  To: 9fans

On Sat Jan 14 18:46:44 EST 2012, aram.h@mgk.ro wrote:
> >> > How come? With venti, the address is the SHA-1 hash, with ZFS, you get
> >> > to chose the hash, but it can still be a hash.
> >>
> >> because in zfs the hash is not used as an address (lba).
> >
> > True.
>
> As I said, neither in Venti. The hash is translated to a lba by the
> index, a table that can be recreated from the data if it's missing or
> if it's corrupt. ZFS also uses an index table called DDT. It also has
> the same properties as Venti's index, it can be created by reading the
> data.

you've confused the internal implementation
with the public programming interface.

venti IS content-addressed.  this is because, in the programming interface,
one addresses data by its hash.  venti could internally store its bits
in the holes in swiss cheese and it would still be content addressed,
not cheese addressed.

on the other hand, zfs is not content addressed.  this is because, as
an iscsi target, zfs will be addressing data by block offset rather
than hash.  zfs could store its bits in venti, and it still would NOT be
content addressed nor would it be venti-addressed.
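
put as two toy interfaces (illustrative go, not any real api), the
distinction is the shape of the call, not what happens underneath:

package main

import "crypto/sha1"

// content-addressed (venti's model): the caller names data by a hash
// of the data itself.  write gets no say in placement; read takes a
// score, not a location.
type ContentStore interface {
	Write(data []byte) (score [sha1.Size]byte, err error)
	Read(score [sha1.Size]byte) ([]byte, error)
}

// block-addressed (what an iscsi target presents): the caller names a
// location, whatever dedup or hashing happens behind the curtain.
type BlockStore interface {
	WriteAt(lba int64, data []byte) error
	ReadAt(lba int64, n int) ([]byte, error)
}

func main() {} // nothing to run; the point is the shape of the two interfaces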

- erik



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-15 13:12                                             ` erik quanstrom
@ 2012-01-15 14:07                                               ` Aram Hăvărneanu
  0 siblings, 0 replies; 57+ messages in thread
From: Aram Hăvărneanu @ 2012-01-15 14:07 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> you've confused the internal implementation
> with the public programming interface.

The properties we're discussing here (dedup, fragmentation,
performance) are an artifact of the implementation, not of the
interface. Fossil+Venti would perform the same if Venti exported its
interface only to Fossil, and not to the whole world.

In ZFS terms, Fossil is akin to the ZPL (ZFS POSIX layer): it
implements filesystem semantics over another layer, the DMU for ZFS
and Venti for Plan 9.

Please note that ZFS exports more than the filesystem interface:
there's also iSCSI (as you noticed), zfs send/recv (remarkably similar
to venti/write and venti/read, even in their use of the standard
input/output interface), and even bits of the DMU are exported. Hell,
even the concrete VFS implementation is exported (though only in
kernel mode), otherwise you could not write layered filesystems (not
as nice and simple as Plan 9's bind(2), but akin to STREAMS filters at
the filesystem layer). You know I've worked on the in-kernel SMB/CIFS
filesystem that sits on top of ZFS, right? Well, when you're doing
this you have to be careful around this content-addressed thing. I've
used the interface you claim doesn't exist. It's not public? Why is
this relevant to the properties of the system?

> on the other hand, zfs is not content addressed.  this is because, as
> an iscsi target, zfs will be addressing data by block offset rather
> than hash.  zfs could store its bits in venti, and it still would NOT be
> content addressed nor would it be venti-addressed.

That's not fair at all; I could just as well claim that venti is not
content addressed because you use hierarchical names to address data
in a vac archive. Vac is just a layer over venti, like the ZPL and
iSCSI are layers over the DMU.

In this case the properties of the system depend on the properties of
the underlying layer, not of the interface, and that layer is content
addressed, even by your definition.

-- 
Aram Hăvărneanu



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
       [not found]                                           ` <CAEAzY3-mPRW3jGYWbXeTk7Sc+nC6Xn1NhK2a=Tb2yKkm06EStQ@mail.gmail.c>
@ 2012-01-15 14:25                                             ` erik quanstrom
  2012-01-15 14:39                                               ` Charles Forsyth
  0 siblings, 1 reply; 57+ messages in thread
From: erik quanstrom @ 2012-01-15 14:25 UTC (permalink / raw)
  To: 9fans

> sits on top of ZFS, right? Well, when you're doing this you have to be
> careful around this content-addressed thing. I've used the interface
> you claim it doesn't exist. It's not public?  Why is this relevant to

well then, please provide a pointer if this is a public interface.

the reason why this is relevant is because we don't call ssds
fancy wafl-addressed storage (assuming that's how they do it),
because that's not the interface one gets.  we don't call
raid appliances raid-addressed storage (assuming they're using
raid), because that's not the interface presented.

- erik



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [9fans] fossil pb: FOUND!
  2012-01-15 14:25                                             ` erik quanstrom
@ 2012-01-15 14:39                                               ` Charles Forsyth
  0 siblings, 0 replies; 57+ messages in thread
From: Charles Forsyth @ 2012-01-15 14:39 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 15 January 2012 14:25, erik quanstrom <quanstro@quanstro.net> wrote:

> I've used the interface
> > you claim it doesn't exist. It's not public?  ...
>
> well then, please provide a pointer if this is a public interface.


I think he was saying you might not know about it because it isn't public,
although he's used it. "It's not public?" read with rising intonation?

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2012-01-15 14:39 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-13 11:30 [9fans] fossil pb: a clue? tlaronde
2012-01-13 13:30 ` erik quanstrom
2012-01-13 13:38 ` [9fans] fossil pb: FOUND! tlaronde
2012-01-13 13:59   ` erik quanstrom
2012-01-13 14:08     ` tlaronde
2012-01-13 14:47       ` erik quanstrom
2012-01-13 16:01         ` tlaronde
2012-01-13 16:16           ` erik quanstrom
2012-01-13 16:34             ` tlaronde
2012-01-13 16:42               ` David du Colombier
2012-01-13 16:44               ` Vivien MOREAU
2012-01-13 16:50                 ` tlaronde
2012-01-13 16:17           ` David du Colombier
2012-01-13 16:41             ` tlaronde
2012-01-13 16:50               ` Charles Forsyth
2012-01-13 17:05                 ` tlaronde
2012-01-13 17:02               ` tlaronde
2012-01-13 17:11                 ` Charles Forsyth
2012-01-13 17:24                 ` Nicolas Bercher
2012-01-13 17:44                   ` tlaronde
2012-01-13 17:37               ` erik quanstrom
2012-01-13 17:58                 ` tlaronde
2012-01-13 18:14                   ` erik quanstrom
2012-01-13 21:00                     ` Yaroslav
2012-01-13 22:14                       ` Charles Forsyth
     [not found]                   ` <CAG3N4d8c56DRSbt30k3EkgnyvrPSLbFkWH-kKapm7CVmKsu9og@mail.gmail.c>
2012-01-13 21:02                     ` erik quanstrom
     [not found]                   ` <CAOw7k5hU=F2tynnFHtoz=AJ=HiFq2oLYhz4Rg-QgM+rv_gu5Ow@mail.gmail.c>
2012-01-13 22:17                     ` erik quanstrom
2012-01-13 23:10                       ` Aram Hăvărneanu
2012-01-13 23:14                         ` Francisco J Ballesteros
2012-01-13 23:23                           ` Aram Hăvărneanu
2012-01-14  0:30                           ` Bakul Shah
2012-01-14  1:01                             ` dexen deVries
2012-01-14 13:26                               ` erik quanstrom
2012-01-14 15:00                                 ` hiro
2012-01-14 15:06                                   ` Charles Forsyth
     [not found]                                   ` <CAOw7k5h2T+xuxbJhwTxPMOjG3K14KarrJPXFmH9EHdHJnXFpPA@mail.gmail.c>
2012-01-14 15:29                                     ` erik quanstrom
2012-01-14 16:16                                       ` Aram Hăvărneanu
     [not found]                                       ` <CAEAzY3_9jpi6j-C1u87OKaEazajOBwkvbEBdO5f1eUJysJbH1A@mail.gmail.c>
2012-01-14 16:32                                         ` erik quanstrom
2012-01-14 18:01                                           ` Aram Hăvărneanu
     [not found]                                           ` <CAEAzY39pUNCTs6kMYnYoukx3TH8OuhgcmhSF+nVW5jX0iTCYvA@mail.gmail.c>
2012-01-14 20:43                                             ` erik quanstrom
2012-01-14 21:39                                               ` Aram Hăvărneanu
     [not found]                                           ` <CAEAzY3842EWp=WByCPAm7yGEK2h5b+1hkbwm5NoRPTH_2F5CVA@mail.gmail.c>
2012-01-14 21:54                                             ` erik quanstrom
2012-01-14 22:11                                               ` Aram Hăvărneanu
     [not found]                                           ` <CAEAzY39GD4QoTRf2S0Nd4rN+vSyG1tsKyP3rFWu5a0mFN=sH6w@mail.gmail.c>
2012-01-14 22:42                                             ` erik quanstrom
2012-01-14 23:03                                               ` Aram Hăvărneanu
2012-01-14 23:32                                               ` Bakul Shah
2012-01-14 23:45                                                 ` Aram Hăvărneanu
     [not found]                                           ` <CAEAzY39ddvxRSwSP+Eh7kujJEs3nYh4kQegfuSisgdgB9qMQ4g@mail.gmail.c>
2012-01-15 13:12                                             ` erik quanstrom
2012-01-15 14:07                                               ` Aram Hăvărneanu
     [not found]                                           ` <CAEAzY3-mPRW3jGYWbXeTk7Sc+nC6Xn1NhK2a=Tb2yKkm06EStQ@mail.gmail.c>
2012-01-15 14:25                                             ` erik quanstrom
2012-01-15 14:39                                               ` Charles Forsyth
2012-01-14 18:39                                       ` Charles Forsyth
2012-01-14 13:27                               ` erik quanstrom
2012-01-13 23:24                         ` cinap_lenrek
     [not found]                   ` <CAEAzY39VJhaWD03PruMoS2A+bCP62XDTdgob1hgtjp6qHtjdSA@mail.gmail.c>
2012-01-14 13:07                     ` erik quanstrom
2012-01-13 13:59 ` [9fans] fossil pb: a clue? David du Colombier
2012-01-13 14:11   ` tlaronde

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).