9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] FS to skip/put-together duplicate files
@ 2007-08-10 11:39 Enrico Weigelt
  2007-08-10 11:51 ` Gabriel Diaz
  2007-08-10 12:26 ` erik quanstrom
  0 siblings, 2 replies; 18+ messages in thread
From: Enrico Weigelt @ 2007-08-10 11:39 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


Hi folks,


I host a lot of web applications which share 99% of their code.
Disk space is not the issue, but bandwidth for remote backup is.
So my idea is to let a filesystem automatically link together
identical files in storage, but present them as separate ones.
Once a file gets changed, it will be unlinked/copied automatically.

Is there already such a filesystem?


thx
-- 
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service - http://www.metux.de/
---------------------------------------------------------------------
 Please visit the OpenSource QM Taskforce:
 	http://wiki.metux.de/public/OpenSource_QM_Taskforce
 Patches / Fixes for a lot dozens of packages in dozens of versions:
	http://patches.metux.de/
---------------------------------------------------------------------



* Re: [9fans] FS to skip/put-together duplicate files
  2007-08-10 11:39 [9fans] FS to skip/put-together duplicate files Enrico Weigelt
@ 2007-08-10 11:51 ` Gabriel Diaz
  2007-08-10 12:26 ` erik quanstrom
  1 sibling, 0 replies; 18+ messages in thread
From: Gabriel Diaz @ 2007-08-10 11:51 UTC (permalink / raw)
  To: weigelt, Fans of the OS Plan 9 from Bell Labs

hello

i think venti coalesces duplicates at the block level, so if the file
contents are the same you will have two files with references to the
same blocks, but read the venti paper to be sure :)
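
A minimal sketch of block-level coalescing, assuming fixed 8 KiB blocks addressed by their SHA-1 hash (venti calls this hash a score). `BlockStore`, `put_file` and `get_file` are hypothetical names for illustration, not venti's interface.

```python
import hashlib

BLOCK_SIZE = 8192  # assumed block size for this sketch


class BlockStore:
    """Toy content-addressed store: each distinct block is kept once, keyed by SHA-1."""

    def __init__(self):
        self.blocks = {}  # score (hex SHA-1) -> block bytes

    def put_file(self, data):
        """Split data into fixed-size blocks; return the list of block scores."""
        scores = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            score = hashlib.sha1(block).hexdigest()
            self.blocks.setdefault(score, block)  # a block already present is not stored again
            scores.append(score)
        return scores

    def get_file(self, scores):
        """Reassemble a file from its list of scores."""
        return b"".join(self.blocks[s] for s in scores)


# 25,600 bytes of stand-in "application code" with no repeated blocks
app = b"".join(i.to_bytes(4, "big") for i in range(6400))

store = BlockStore()
site_a = store.put_file(app)
site_b = store.put_file(app)             # identical copy adds no new blocks
blocks_after_copy = len(store.blocks)

patched = bytearray(app)
patched[0] ^= 0xFF                       # change one byte in the second copy
site_b = store.put_file(bytes(patched))  # only the touched block is stored anew
blocks_after_patch = len(store.blocks)
```

Two identical files end up as two score lists pointing at the same blocks, and a one-byte change costs one new block. That is the unlink-on-change behaviour asked for, seen from the storage side.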

regards.

gabi





* Re: [9fans] FS to skip/put-together duplicate files
  2007-08-10 11:39 [9fans] FS to skip/put-together duplicate files Enrico Weigelt
  2007-08-10 11:51 ` Gabriel Diaz
@ 2007-08-10 12:26 ` erik quanstrom
  2007-08-13  3:32   ` YAMANASHI Takeshi
  1 sibling, 1 reply; 18+ messages in thread
From: erik quanstrom @ 2007-08-10 12:26 UTC (permalink / raw)
  To: weigelt, 9fans

> I host a lot of web applications which share 99% of their code.
> Disk space is not the issue, but bandwidth for remote backup is.
> So my idea is to let a filesystem automatically link together
> identical files in storage, but present them as separate ones.
> Once a file gets changed, it will be unlinked/copied automatically.
> 
> Is there already such a filesystem?

no.

however there are updatedb/compactdb which can be used to
create a list of changed files and replica/applylog which can
be used to apply them.

i used these tools to copy history from one kenfs to a new one.
i actually used cphist (/n/sources/patch/saved/cphist) and not
applylog.

you could also use the log on the generating machine to build
a mkfs archive and compress that, ftp it and apply it on the
other end.
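
A rough sketch of the "list of changed files" step, assuming a snapshot is just a map from path to content hash. This is not how replica(1) keeps its database or log; `snapshot` and `changes` are hypothetical helpers that only illustrate the idea of shipping what changed rather than everything.

```python
import hashlib
import os
import tempfile


def snapshot(root):
    """Map each file's path (relative to root) to the SHA-1 of its contents."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                state[os.path.relpath(full, root)] = hashlib.sha1(f.read()).hexdigest()
    return state


def changes(old, new):
    """Return (added, modified, deleted) path lists between two snapshots."""
    added = sorted(p for p in new if p not in old)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    deleted = sorted(p for p in old if p not in new)
    return added, modified, deleted


with tempfile.TemporaryDirectory() as root:
    for name, body in [("index.php", b"v1"), ("lib.php", b"shared"), ("old.php", b"x")]:
        with open(os.path.join(root, name), "wb") as f:
            f.write(body)
    before = snapshot(root)

    # Simulate a day's work: one edit, one removal, one new file.
    with open(os.path.join(root, "index.php"), "wb") as f:
        f.write(b"v2")
    os.remove(os.path.join(root, "old.php"))
    with open(os.path.join(root, "new.php"), "wb") as f:
        f.write(b"y")
    after = snapshot(root)

added, modified, deleted = changes(before, after)
```

Only the files named in `added` and `modified` would need to go into the archive sent to the remote side; the untouched shared code stays put.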

- erik



* Re: [9fans] FS to skip/put-together duplicate files
  2007-08-10 12:26 ` erik quanstrom
@ 2007-08-13  3:32   ` YAMANASHI Takeshi
  2007-08-13 12:01     ` erik quanstrom
  2007-08-14 13:16     ` erik quanstrom
  0 siblings, 2 replies; 18+ messages in thread
From: YAMANASHI Takeshi @ 2007-08-13  3:32 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

how about mounting venti-backed fossil files from linux as AoE drives?
vblade on sources exports the plan 9 file and the aoe driver for linux
does the mounting.

Venti would compress and coalesce duplicate blocks into a single block.
I'm not sure whether identical files are split at the same block boundaries, though.
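
The boundary concern can be made concrete. A sketch assuming fixed 4 KiB blocks hashed with SHA-1 (`block_scores` is a hypothetical helper): appending to a file leaves the earlier blocks shared, but inserting a single byte at the front shifts every boundary so that no block matches any more.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def block_scores(data):
    """SHA-1 of each fixed-size block, in file order."""
    return [hashlib.sha1(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]


# 32 KiB of data in which every 4 KiB block is different
original = b"".join(i.to_bytes(4, "big") for i in range(8192))
appended = original + b"tail"   # change at the end of the file
prefixed = b"#" + original      # one byte inserted at the front

base = set(block_scores(original))
shared_after_append = len(base & set(block_scores(appended)))
shared_after_prefix = len(base & set(block_scores(prefixed)))
```

Here `shared_after_append` counts all eight original blocks, while `shared_after_prefix` is zero even though the two files differ by one byte.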




-- 
YAMANASHI Takeshi



* Re: [9fans] FS to skip/put-together duplicate files
  2007-08-13  3:32   ` YAMANASHI Takeshi
@ 2007-08-13 12:01     ` erik quanstrom
  2007-08-13 13:43       ` Francisco J Ballesteros
  2007-08-14 13:16     ` erik quanstrom
  1 sibling, 1 reply; 18+ messages in thread
From: erik quanstrom @ 2007-08-13 12:01 UTC (permalink / raw)
  To: 9fans

On Sun Aug 12 23:35:45 EDT 2007, 9.nashi@gmail.com wrote:
> how about mounting venti-backed fossil files from linux as AoE drives?
> vblade on sources exports the plan 9 file and the aoe driver for linux
> does the mounting.
> 
> Venti would compress and coalesce duplicate blocks into a single block.
> I'm not sure whether identical files are split at the same block boundaries, though.

we are going to do this with kenfs and aoe.  our main filesystem is going to look
something like

	cm0f{(m1m2m3)e99.0e100.1}

where e99.0 will be a local shelf and e100.1 will be remote.  there is no compression,
but only changed blocks are dumped and kenfs doesn't really care how long the
dump takes.

- erik



* Re: [9fans] FS to skip/put-together duplicate files
  2007-08-13 12:01     ` erik quanstrom
@ 2007-08-13 13:43       ` Francisco J Ballesteros
  2007-08-13 13:52         ` erik quanstrom
  0 siblings, 1 reply; 18+ messages in thread
From: Francisco J Ballesteros @ 2007-08-13 13:43 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

We use venti-fossil on Coraid's SR aoe drives just fine.
The frontend is a separate Plan 9 machine that uses fs(3) to partition the aoe
drives (which are raid-1 lblades). It works great.




* Re: [9fans] FS to skip/put-together duplicate files
  2007-08-13 13:43       ` Francisco J Ballesteros
@ 2007-08-13 13:52         ` erik quanstrom
  2007-08-13 14:40           ` Francisco J Ballesteros
  0 siblings, 1 reply; 18+ messages in thread
From: erik quanstrom @ 2007-08-13 13:52 UTC (permalink / raw)
  To: 9fans

if you're setting up a new venti+fossil+aoe fs, i would recommend using
sdaoe.

(i don't recommend fiddling with something that's already working, though.)

- erik

> We use venti-fossil on Coraid's SR aoe drives just fine.
> The frontend is a separate Plan 9 machine that uses fs(3) to partition the aoe
> drives (which are raid-1 lblades). It works great.



* Re: [9fans] FS to skip/put-together duplicate files
  2007-08-13 13:52         ` erik quanstrom
@ 2007-08-13 14:40           ` Francisco J Ballesteros
  0 siblings, 0 replies; 18+ messages in thread
From: Francisco J Ballesteros @ 2007-08-13 14:40 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Ours has been working since the day after we got the SR.
However, I'll try sdaoe just for fun, for a while, and then will go back to
our "production" scheme.

thanks for the hint



* Re: [9fans] FS to skip/put-together duplicate files
  2007-08-13  3:32   ` YAMANASHI Takeshi
  2007-08-13 12:01     ` erik quanstrom
@ 2007-08-14 13:16     ` erik quanstrom
  2007-08-14 13:57       ` Steve Simon
  2007-08-14 16:18       ` Uriel
  1 sibling, 2 replies; 18+ messages in thread
From: erik quanstrom @ 2007-08-14 13:16 UTC (permalink / raw)
  To: 9fans

> how about mounting venti-backed fossil files from linux as AoE drives?
> vblade on sources exports the plan 9 file and the aoe driver for linux
> does the mounting.
> 
> Venti would compress and coalesce duplicate blocks into a single block.
> I'm not sure whether identical files are split at the same block boundaries, though.
> 

this is always cited as the "killer functionality" of venti.  essentially it trades cpu time
for disk space.  however, the couple of times where a concrete venti solution was
discussed (separating attachments into separate files, e.g. for de-duping), it was
deemed to be slower because attachments need to be split out to be recognized
as the same (9fans.net/archive/2005/10 and 9fans.net/archive/2005/11/1).

i wonder about this today with such large disks.

does anyone have an example of a case where compression and uniquing are required?

- erik



* Re: [9fans] FS to skip/put-together duplicate files
  2007-08-14 13:16     ` erik quanstrom
@ 2007-08-14 13:57       ` Steve Simon
  2007-08-14 14:13         ` erik quanstrom
  2007-08-14 16:18       ` Uriel
  1 sibling, 1 reply; 18+ messages in thread
From: Steve Simon @ 2007-08-14 13:57 UTC (permalink / raw)
  To: 9fans

> does anyone have an example of a case where compression and uniquing are required?

the compression is nice to have of course but the uniquing is very
neat. I have always thought of it as plan9's answer to CVS et al.

When you do a release of a software package you copy the files to
a new directory with the name of the release (the equivalent of
tagging your release in CVS) - and continue working. this then takes
up only the space for the directory entries, and all releases are always
available. branching is trivial (dircp); only a pretty merge tool
is missing - I have diff3 from edition7 in my contrib area
but some sort of interactive differencing GUI tool would be very
handy sometimes.

have I drifted off topic I wonder...

-Steve



* Re: [9fans] FS to skip/put-together duplicate files
  2007-08-14 13:57       ` Steve Simon
@ 2007-08-14 14:13         ` erik quanstrom
  2007-08-14 15:02           ` David Leimbach
  0 siblings, 1 reply; 18+ messages in thread
From: erik quanstrom @ 2007-08-14 14:13 UTC (permalink / raw)
  To: 9fans

> > does anyone have an example of a case where compression and uniquing are required?
> 
> the compression is nice to have of course but the uniquing is very
> neat. I have always thought of it as plan9's answer to CVS et al.
> 
> When you do a release of a software package you copy the files to
> a new directory with the name of the release (the equivalent of
> tagging your release in CVS) - and continue working. this then takes
> up only the space for the directory entries, and all releases are always
> available. branching is trivial (dircp); only a pretty merge tool
> is missing - I have diff3 from edition7 in my contrib area
> but some sort of interactive differencing GUI tool would be very
> handy sometimes.
> 
> have I drifted off topic I wonder...

seems on topic to me.

the extra disk space used for a copy of source should be tiny. 
if you have a 500GB disk (< $150 at newegg), making several extra
copies of /sys/src would cost you 1/1000th of your disk space if not uniqued.
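
The arithmetic behind that estimate, using only the figures in the message (decimal units assumed; the variable names are illustrative):

```python
DISK_GB = 500                  # disk size quoted in the message
DISK_PRICE_CENTS = 150 * 100   # "< $150", taken as an upper bound
SHARE = 1000                   # extra copies take 1/1000th of the disk

extra_mb = DISK_GB * 1000 // SHARE        # space used by the extra copies, in MB
extra_cents = DISK_PRICE_CENTS // SHARE   # upper bound on what that space cost
print(f"{extra_mb} MB of extra copies, under {extra_cents} cents of disk")
```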

the old way to do versioning is to remember the date of the release
and use history.  this also doesn't use any disk space, except for the
deltas.

- erik



* Re: [9fans] FS to skip/put-together duplicate files
  2007-08-14 14:13         ` erik quanstrom
@ 2007-08-14 15:02           ` David Leimbach
  0 siblings, 0 replies; 18+ messages in thread
From: David Leimbach @ 2007-08-14 15:02 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


Anyone run Venti on flash or eeprom before?  It might be more suitable there
than on giant disks.
Dave



* Re: [9fans] FS to skip/put-together duplicate files
  2007-08-14 13:16     ` erik quanstrom
  2007-08-14 13:57       ` Steve Simon
@ 2007-08-14 16:18       ` Uriel
  2007-08-14 16:25         ` erik quanstrom
  1 sibling, 1 reply; 18+ messages in thread
From: Uriel @ 2007-08-14 16:18 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 8/14/07, erik quanstrom <quanstro@coraid.com> wrote:
> > how about mounting venti-backed fossil files from linux as AoE drives?
> > vblade on sources exports the plan 9 file and the aoe driver for linux
> > does the mounting.
> >
> > Venti would compress and coalesce duplicate blocks into a single block.
> > I'm not sure whether identical files are split at the same block boundaries, though.
> >
>
> this is always cited as the "killer functionality" of venti.  essentially it trades cpu time
> for disk space.  however, the couple of times where a concrete venti solution was
> discussed (separating attachments into separate files, e.g. for de-duping), it was
> deemed to be slower because attachments need to be split out to be recognized
> as the same (9fans.net/archive/2005/10 and 9fans.net/archive/2005/11/1).

I think Mechiel Lukkien's GSoC project might be helpful with such
issues, see http://gsoc.cat-v.org/people/mjl/blog//2007-08-06-1_Rabin_fingerprints
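
A sketch of the content-defined chunking that Rabin fingerprints make possible. Assumptions: a simple polynomial rolling hash stands in for a true Rabin fingerprint, and `WINDOW`, `BASE`, `MOD`, `MASK` and `chunks` are illustrative choices, not taken from the GSoC project. A chunk ends wherever the hash of the last `WINDOW` bytes has its low 10 bits zero, so boundaries depend on content rather than on offset.

```python
import hashlib
import random

WINDOW = 16                        # bytes covered by the rolling hash
BASE = 257
MOD = (1 << 31) - 1
MASK = (1 << 10) - 1               # low 10 bits zero: roughly 1 KiB average chunk
TOP = pow(BASE, WINDOW - 1, MOD)   # weight of the byte that leaves the window


def chunks(data):
    """Split data where the rolling hash of the last WINDOW bytes hits the mask."""
    out, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * TOP) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                   # shift in the newest byte
        if i + 1 >= WINDOW and (h & MASK) == 0:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])                      # trailing partial chunk
    return out


def scores(data):
    """SHA-1 of each content-defined chunk."""
    return [hashlib.sha1(c).hexdigest() for c in chunks(data)]


rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
prefixed = b"#" + data             # one byte inserted at the front

orig = scores(data)
shared = len(set(orig) & set(scores(prefixed)))
```

Because each boundary depends only on the preceding `WINDOW` bytes, the chunking resynchronizes after the inserted byte: every chunk except the first is found again in `prefixed`, whereas fixed-size blocks would all be shifted.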

uriel



* Re: [9fans] FS to skip/put-together duplicate files
  2007-08-14 16:18       ` Uriel
@ 2007-08-14 16:25         ` erik quanstrom
  2007-08-15  0:27           ` Uriel
  0 siblings, 1 reply; 18+ messages in thread
From: erik quanstrom @ 2007-08-14 16:25 UTC (permalink / raw)
  To: 9fans

> I think Mechiel Lukkien's GSoC project might be helpful with such
> issues, see http://gsoc.cat-v.org/people/mjl/blog//2007-08-06-1_Rabin_fingerprints
> 

; hget http://gsoc.cat-v.org/people/mjl/blog//2007-08-06-1_Rabin_fingerprints
hget: Not found on server

- erik



* Re: [9fans] FS to skip/put-together duplicate files
  2007-08-14 16:25         ` erik quanstrom
@ 2007-08-15  0:27           ` Uriel
  0 siblings, 0 replies; 18+ messages in thread
From: Uriel @ 2007-08-15  0:27 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Sorry, apparently my web server doesn't like HTTP 1.0 clients. You
know that the nice thing about standards is... *sigh*

I'll try to fix it, but I don't really have a clue what the problem
is, and HTTP stinks.

Thanks for the report anyway. Best wishes

uriel




* Re: [9fans] FS to skip/put-together duplicate files
@ 2007-08-15  0:49 YAMANASHI Takeshi
  0 siblings, 0 replies; 18+ messages in thread
From: YAMANASHI Takeshi @ 2007-08-15  0:49 UTC (permalink / raw)
  To: 9fans

By p2p overlaid venti, I meant something like this. 

	http://project-iris.net/isw-2003/papers/sit.pdf

-- 




* Re: [9fans] FS to skip/put-together duplicate files
  2007-08-15  0:42 YAMANASHI Takeshi
@ 2007-08-15  0:47 ` erik quanstrom
  0 siblings, 0 replies; 18+ messages in thread
From: erik quanstrom @ 2007-08-15  0:47 UTC (permalink / raw)
  To: 9fans

> > does anyone have an example of a case where compression and uniquing are required?
> 
> I'm not sure about compression, but uniquing must be a very neat feature
> when you want to build a P2P overlaid venti.

what do you mean by p2p overlaid venti?

- erik



* Re: [9fans] FS to skip/put-together duplicate files
@ 2007-08-15  0:42 YAMANASHI Takeshi
  2007-08-15  0:47 ` erik quanstrom
  0 siblings, 1 reply; 18+ messages in thread
From: YAMANASHI Takeshi @ 2007-08-15  0:42 UTC (permalink / raw)
  To: 9fans

> does anyone have an example of a case where compression and uniquing are required?

I'm not sure about compression, but uniquing must be a very neat feature
when you want to build a P2P overlaid venti.
-- 
"on travel, off the network ... and a fossil in my pocket"




Thread overview: 18+ messages
2007-08-10 11:39 [9fans] FS to skip/put-together duplicate files Enrico Weigelt
2007-08-10 11:51 ` Gabriel Diaz
2007-08-10 12:26 ` erik quanstrom
2007-08-13  3:32   ` YAMANASHI Takeshi
2007-08-13 12:01     ` erik quanstrom
2007-08-13 13:43       ` Francisco J Ballesteros
2007-08-13 13:52         ` erik quanstrom
2007-08-13 14:40           ` Francisco J Ballesteros
2007-08-14 13:16     ` erik quanstrom
2007-08-14 13:57       ` Steve Simon
2007-08-14 14:13         ` erik quanstrom
2007-08-14 15:02           ` David Leimbach
2007-08-14 16:18       ` Uriel
2007-08-14 16:25         ` erik quanstrom
2007-08-15  0:27           ` Uriel
2007-08-15  0:42 YAMANASHI Takeshi
2007-08-15  0:47 ` erik quanstrom
2007-08-15  0:49 YAMANASHI Takeshi
