9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] Venti over DHT
@ 2009-10-13 17:55 Roman Shaposhnik
  2009-10-14  1:20 ` Russ Cox
  0 siblings, 1 reply; 9+ messages in thread
From: Roman Shaposhnik @ 2009-10-13 17:55 UTC
  To: Fans of the OS Plan 9 from Bell Labs

Guys,

I remember Russ authoring a paper on running Venti over distributed hash tables,
but I can't find the pdf anymore. All Google gives me is this:
   http://74.125.155.132/scholar?q=cache:6Wu_j9JaaUcJ:scholar.google.com/&hl=en

Help?

Thanks,
Roman.




* Re: [9fans] Venti over DHT
  2009-10-13 17:55 [9fans] Venti over DHT Roman Shaposhnik
@ 2009-10-14  1:20 ` Russ Cox
       [not found]   ` <e763acc10910132148qbfd3a07q60d268a02d66c04e@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Russ Cox @ 2009-10-14  1:20 UTC
  To: Fans of the OS Plan 9 from Bell Labs

> I remember Russ authoring a paper on running Venti over distributed hash tables,
> but I can't find the pdf anymore. All Google gives me is this:
>    http://74.125.155.132/scholar?q=cache:6Wu_j9JaaUcJ:scholar.google.com/&hl=en

The paper you've found there was an internal MIT workshop submission,
a draft of a draft of a draft.

I never wrote any paper like you describe.
The only paper I've ever written about Venti was
http://swtch.com/~rsc/papers/fndn-usenix2008.pdf

Russ



* Re: [9fans] Venti over DHT
       [not found]   ` <e763acc10910132148qbfd3a07q60d268a02d66c04e@mail.gmail.com>
@ 2009-10-15 15:52     ` Roman Shaposhnik
  2009-10-16  3:32       ` ron minnich
  0 siblings, 1 reply; 9+ messages in thread
From: Roman Shaposhnik @ 2009-10-15 15:52 UTC
  To: Fans of the OS Plan 9 from Bell Labs

Well, since Russ is silent (and since this is not the first time this
question has come up: http://9fans.net/archive/2008/05/401) here's
a reliable link for anybody who might still be interested:
     http://web.archive.org/web/20060308015519/http://project-iris.net/isw-2003/papers/sit.pdf

Thanks,
Roman.




* Re: [9fans] Venti over DHT
  2009-10-15 15:52     ` Roman Shaposhnik
@ 2009-10-16  3:32       ` ron minnich
  2009-10-16  4:03         ` Russ Cox
  0 siblings, 1 reply; 9+ messages in thread
From: ron minnich @ 2009-10-16  3:32 UTC
  To: Fans of the OS Plan 9 from Bell Labs

Now I remember this paper. Was the code ever released anywhere?

ron




* Re: [9fans] Venti over DHT
  2009-10-16  3:32       ` ron minnich
@ 2009-10-16  4:03         ` Russ Cox
  2009-10-20 12:53           ` Enrico Weigelt
  0 siblings, 1 reply; 9+ messages in thread
From: Russ Cox @ 2009-10-16  4:03 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Oct 15, 2009 at 8:32 PM, ron minnich <rminnich@gmail.com> wrote:
> Now I remember this paper. Was the code ever released anywhere?

There was no real code to speak of.  It was a draft of a draft.
I did some calculations of block-level commonality using a
few trivial programs that hashed each block of every file in
a tree, but you could recreate that in 100 lines of C or shell script.
We never stored any blocks in the DHT.
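
For illustration, the kind of block-hashing program described above might
look like the following Python sketch. It is not the original code: the
8 KB block size, the choice of SHA-1 (the hash Venti uses for scores), and
the names block_scores and commonality are all assumptions.

```python
import hashlib
import os
from collections import Counter

BLOCK_SIZE = 8192  # assumed block size; the thread does not give one


def block_scores(path, block_size=BLOCK_SIZE):
    """Yield the SHA-1 hex digest of each fixed-size block of one file."""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield hashlib.sha1(block).hexdigest()


def commonality(root, block_size=BLOCK_SIZE):
    """Return (total_blocks, unique_blocks) over all regular files under root."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files so each file is read once.
            if os.path.isfile(path) and not os.path.islink(path):
                counts.update(block_scores(path, block_size))
    return sum(counts.values()), len(counts)
```

Calling commonality on a directory returns the total and unique block
counts; 1 - unique/total is then the fraction of blocks that a
content-addressed store would not have to store a second time.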

Russ



* Re: [9fans] Venti over DHT
  2009-10-16  4:03         ` Russ Cox
@ 2009-10-20 12:53           ` Enrico Weigelt
  2009-10-20 13:39             ` Eric Van Hensbergen
  2009-10-21 11:53             ` Roman Shaposhnik
  0 siblings, 2 replies; 9+ messages in thread
From: Enrico Weigelt @ 2009-10-20 12:53 UTC
  To: Fans of the OS Plan 9 from Bell Labs

Russ Cox wrote:

Hi,

> There was no real code to speak of.  It was a draft of a draft.
> I did some calculations of block-level commonality using a
> few trivial programs that hashed each block of every file in
> a tree, but you could recreate that in 100 lines of C or shell script.
> We never stored any blocks in the DHT.

I've also done some bits of work in that area
(nothing usable yet ;-o), but with different
requirements:

* storage near to the user (at least local mirrors)
* equal data should get equal score (even w/ encryption)
* automatic removal of stale blocks -> garbage collection
* efficient also on small data blocks

In particular, the distributed GC together w/ encryption turned out
to be not quite trivial ;-o
On the one hand we need things like timestamps; on the other hand
we need to trace the tree structures w/o decryption.

So I added several block types, e.g. blob (payload data) and inode
(holding the tree). Blobs are encrypted w/ their score, so
reading takes two scores (outer for retrieval, inner for decryption).
Inodes have a public area, listing just the outer scores, and an
encrypted one holding the inner scores. The tree formed by inodes
doesn't necessarily have to be fully balanced, so incremental writes
with special heuristics for preventing unnecessary writes are possible.
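
One way to get equal scores for equal data even under encryption is to
derive the key from the content itself (often called convergent
encryption). The Python sketch below assumes that the inner score is the
SHA-1 of the plaintext and doubles as the key, and that the outer score is
the SHA-1 of the ciphertext; the description above does not spell out these
details. The XOR keystream is a stand-in for a real cipher and is not
secure, the way the inode's private area is keyed is invented for the
example, and all function names are hypothetical.

```python
import hashlib


def score(data: bytes) -> bytes:
    """Venti-style score: the SHA-1 hash of a block."""
    return hashlib.sha1(data).digest()


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || counter) blocks.

    Illustration only; this is not a vetted cipher. XOR is its own
    inverse, so the same function encrypts and decrypts.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def put_blob(store: dict, plaintext: bytes):
    """Encrypt a blob with its own score; return (outer, inner) scores."""
    inner = score(plaintext)                  # needed for decryption
    ciphertext = keystream_xor(inner, plaintext)
    outer = score(ciphertext)                 # address used for retrieval
    store[outer] = ciphertext
    return outer, inner


def get_blob(store: dict, outer: bytes, inner: bytes) -> bytes:
    """Fetch by outer score, decrypt with inner score, and verify."""
    plaintext = keystream_xor(inner, store[outer])
    if score(plaintext) != inner:
        raise ValueError("inner score does not match decrypted data")
    return plaintext


def make_inode(pairs):
    """Build an inode from (outer, inner) score pairs.

    The public area lists outer scores in the clear, so the tree can be
    traced without any key. The inner scores are encrypted under a key
    derived from their own hash (an assumption of this sketch).
    """
    public = [outer for outer, _inner in pairs]
    inner_area = b"".join(inner for _outer, inner in pairs)
    inode_key = score(inner_area)
    inode = {"public": public, "private": keystream_xor(inode_key, inner_area)}
    return inode, inode_key
```

Because both the key and the ciphertext depend only on the plaintext,
storing the same data twice yields the same (outer, inner) pair and a
single stored block.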


If anyone would like to hear more about this, just let me know :)


cu
--
----------------------------------------------------------------------
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 cellphone: +49 174 7066481   email: info@metux.de   skype: nekrad666
----------------------------------------------------------------------
 Embedded Linux / Porting / Open-source QM / Distributed systems
----------------------------------------------------------------------





* Re: [9fans] Venti over DHT
  2009-10-20 12:53           ` Enrico Weigelt
@ 2009-10-20 13:39             ` Eric Van Hensbergen
  2009-10-21 11:53             ` Roman Shaposhnik
  1 sibling, 0 replies; 9+ messages in thread
From: Eric Van Hensbergen @ 2009-10-20 13:39 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On Oct 20, 2009, at 7:53 AM, Enrico Weigelt wrote:
>
> I've also done some bits of work in that area
> (nothing usable yet ;-o), but with different
> requirements:
>
> * storage near to the user (at least local mirrors)
> * equal data should get equal score (even w/ encryption)
> * automatic removal of stale blocks -> garbage collection
> * efficient also on small data blocks
>
> In particular, the distributed GC together w/ encryption turned out
> to be not quite trivial ;-o
> On the one hand we need things like timestamps; on the other hand
> we need to trace the tree structures w/o decryption.
>
> So I added several block types, e.g. blob (payload data) and inode
> (holding the tree). Blobs are encrypted w/ their score, so
> reading takes two scores (outer for retrieval, inner for decryption).
> Inodes have a public area, listing just the outer scores, and an
> encrypted one holding the inner scores. The tree formed by inodes
> doesn't necessarily have to be fully balanced, so incremental writes
> with special heuristics for preventing unnecessary writes are possible.
>
>
> If anyone would like to hear more about this, just let me know :)
>

I'm interested.  Particularly in the multi-layer caching/mirroring and
the encryption/same-score details.

        -eric




* Re: [9fans] Venti over DHT
  2009-10-20 12:53           ` Enrico Weigelt
  2009-10-20 13:39             ` Eric Van Hensbergen
@ 2009-10-21 11:53             ` Roman Shaposhnik
  2009-10-29 19:13               ` Enrico Weigelt
  1 sibling, 1 reply; 9+ messages in thread
From: Roman Shaposhnik @ 2009-10-21 11:53 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On Tue, Oct 20, 2009 at 8:53 PM, Enrico Weigelt <weigelt@metux.de> wrote:
> So I added several block types, e.g. blob (payload data) and inode
> (holding the tree).

From these I infer that you've built an object store, not just a block store.
How close was it to this:
   http://oceanstore.cs.berkeley.edu/publications/papers/pdf/asplos00.pdf

Thanks,
Roman.




* Re: [9fans] Venti over DHT
  2009-10-21 11:53             ` Roman Shaposhnik
@ 2009-10-29 19:13               ` Enrico Weigelt
  0 siblings, 0 replies; 9+ messages in thread
From: Enrico Weigelt @ 2009-10-29 19:13 UTC
  To: Fans of the OS Plan 9 from Bell Labs

Roman Shaposhnik wrote:
> On Tue, Oct 20, 2009 at 8:53 PM, Enrico Weigelt <weigelt@metux.de> wrote:
>> So I added several block types, e.g. blob (payload data) and inode
>> (holding the tree).
>
> From these I infer that you've built an object store, not just a block store.
> How close was it to this:
>    http://oceanstore.cs.berkeley.edu/publications/papers/pdf/asplos00.pdf

Not really; NB is somewhere in the middle between Venti and OceanStore.

The actual storage system is quite dumb: it knows nothing about
file/object lookups, routing, replicas or updates. All it does is
store several types of blocks and collect some bits of
statistical data. When a requested block is not found locally,
it simply asks its neighbors. Retrieved blocks are stored locally
for a while (until GC catches up with them).
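
As a rough Python sketch of that lookup path: the Node class, its method
names, and the hit/miss counters are hypothetical, and the sketch assumes
that a node asks only its direct neighbors, which answer from their own
local store without forwarding the request.

```python
import hashlib


class Node:
    """A storage node that knows only about blocks and its neighbors."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}      # score (hex string) -> block data held locally
        self.neighbors = []   # other Node objects to ask on a local miss
        self.hits = 0         # minimal stand-in for the collected statistics
        self.misses = 0

    def put(self, data: bytes) -> str:
        """Store a block locally and return its score."""
        s = hashlib.sha1(data).hexdigest()
        self.blocks[s] = data
        return s

    def get_local(self, s: str):
        """Return the block if this node holds it, else None."""
        return self.blocks.get(s)

    def get(self, s: str):
        """Return the block for score s, asking neighbors on a local miss."""
        data = self.get_local(s)
        if data is not None:
            self.hits += 1
            return data
        self.misses += 1
        for peer in self.neighbors:
            data = peer.get_local(s)
            # Only accept data whose hash matches the requested score.
            if data is not None and hashlib.sha1(data).hexdigest() == s:
                self.blocks[s] = data   # kept until local GC removes it
                return data
        return None
```

Since blocks are addressed by their hash, a node can check a neighbor's
answer against the requested score before caching it.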

Inode blocks are nothing more than score lists that hold a
bunch of data blocks (belonging to some bigger entity, e.g. a file)
together, so GC operates on that higher layer and doesn't have to
look at every single block (which might not even be present locally).

Each node keeps track of the inodes it's interested in (e.g. ones
that a local client or a friend neighbour node is operating on).
Here we collect things like TTLs, usage patterns, etc.

Compare it a bit to Git:

* data blocks = blobs
* inodes = commits
* inode-refs = commit-refs

Local GC goes like this:

1. purge old inode-refs (e.g. long-unused)
2. purge unref'ed inodes
3. purge unref'ed blobs

To stop GC from removing some inode: simply "touch" it (add it to
reflist / update atime in reflist).
To stop GC from removing data blocks: simply create an inode
referencing them.
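
The three passes and the "touch" operation can be sketched in Python as
follows. The data layout (plain dicts keyed by score), the max_age
threshold, and the function names are assumptions made for the example;
inodes are treated as flat lists of blob scores.

```python
import time


def local_gc(reflist, inodes, blobs, max_age, now=None):
    """Run the three GC passes in place.

    reflist: dict inode_score -> last access time (the inode-refs)
    inodes:  dict inode_score -> list of blob scores
    blobs:   dict blob_score  -> block data
    """
    now = time.time() if now is None else now

    # 1. purge old inode-refs (not touched for longer than max_age)
    for s in [s for s, atime in reflist.items() if now - atime > max_age]:
        del reflist[s]

    # 2. purge inodes that no inode-ref points to any more
    for s in [s for s in inodes if s not in reflist]:
        del inodes[s]

    # 3. purge blobs that no surviving inode lists
    live = {b for scores in inodes.values() for b in scores}
    for s in [s for s in blobs if s not in live]:
        del blobs[s]


def touch(reflist, inode_score, now=None):
    """Protect an inode from GC by adding or refreshing its reflist entry."""
    reflist[inode_score] = time.time() if now is None else now
```

Note that pass 3 only reads the score lists in the inodes, so it never has
to open a data block, and an inode may list blobs that are not present
locally.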


cu
--
----------------------------------------------------------------------
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 cellphone: +49 174 7066481   email: info@metux.de   skype: nekrad666
----------------------------------------------------------------------
 Embedded Linux / Porting / Open-source QM / Distributed systems
----------------------------------------------------------------------



