Re: [9fans] Different representations of the same file/resource in a synthetic FS

9fans - fans of the OS Plan 9 from Bell Labs

* Re: [9fans] Different representations of the same file/resource in a synthetic FS
@ 2009-06-09 19:41 erik quanstrom
  0 siblings, 0 replies; 11+ messages in thread
From: erik quanstrom @ 2009-06-09 19:41 UTC (permalink / raw)
  To: mirtchovski, 9fans

> On Tue, Jun 9, 2009 at 1:16 PM, erik quanstrom<quanstro@quanstro.net> wrote:
> >> still a hash. i'm not doing anything particularly clever for speed,
> >> and it shows in places.
>
> I lied a bit here: in some cases, for example where a particular query
> would involve going through several (up to a couple of thousand) files
> and subdirectories to compose, i provide a single file that gives me
> that information much faster and in only a fraction of the 9p queries
> it normally would take. it's by no means a general solution to the
> speed problem, but it does get me the data 30-50 times faster...
>
> but i digress...

interestingly, i considered mentioning the old upas/fs trick of the info
file, which i believe is approximately the same hack.  the "xxx" hack
i recently added to the file hash table of upas/fs is somewhat the
mirror image of the info file.

i suppose that, in the language of standardized assessment tests,
technique is to hack as 30x is to 5%.

- erik




* Re: [9fans] Different representations of the same file/resource in a synthetic FS
  2009-06-09 17:27 ` andrey mirtchovski
  2009-06-09 18:10   ` erik quanstrom
@ 2009-06-11  3:44   ` Roman V. Shaposhnik
  1 sibling, 0 replies; 11+ messages in thread
From: Roman V. Shaposhnik @ 2009-06-11  3:44 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Tue, 09/06/2009 at 11:27 -0600, andrey mirtchovski wrote:
> I think I've mentioned this before, but on a few of my synthetic file
> systems here I'm using what you describe to slice a database by
> specific orderings. For example, I have a (long) list of resources
> which I'm managing in a particular environment; each has an owner,
> type, status and a few static data containers. It's all backed by a
> relational database, but the file server presents different "slices"
> of that to external users, where the directory structure is rigidly
> defined as:
> 
> /
>  available/
>  by-type/
>  by-owner/
>  inuse/
>  ...
> 
> with all data to fill the directories being dynamically pulled from
> the database.

This looks like a slightly different use case than what I'm worried
about. Mainly it seems that you don't really have to deal with
the representations of the same resource; your problem is how to
group these resources in a reasonable fashion. Essentially you're
mapping a relational database to a tree hierarchy.

In your case, the sooner you have the fork of
  by-this/
  by-that/
  ....
in your hierarchy -- the better.

My case is a flip side of that. In fact, my worst case scenario is
that I can't really predict all the representations of existing
resources down the road, thus it appears that I have to push
that part of a filename as close to an actual file as possible:
   /long-path/file.<representation>
I'm almost tempted to consider "virtual extensions":
   /long-path/file ## default representation
   /long-path/file.gif
   ....
   /long-path/file.pdf
but at that point it becomes no more appealing than the content
negotiation techniques of HTTP.
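
A minimal sketch of how a server might interpret such "virtual extensions", assuming a fixed set of known renditions and a default one. The names split_representation, KNOWN and DEFAULT are hypothetical and appear nowhere in this thread:

```python
# Hypothetical helper: map a requested file name to (resource, representation).
KNOWN = {"gif", "png", "pdf"}  # assumed set of renditions the server can produce
DEFAULT = "png"                # assumed default representation


def split_representation(name, known=KNOWN, default=DEFAULT):
    """Split 'file.gif' into ('file', 'gif'); a bare 'file' gets the default."""
    base, dot, ext = name.rpartition(".")
    if dot and base and ext in known:
        return base, ext
    # No recognised extension: treat the whole name as the resource.
    return name, default
```

With these assumptions, "file.gif" resolves to the gif rendition of "file", while "file" and "file.txt" both fall through to the default. The server still has to decide what a directory read should list, which is where the scheme starts to resemble content negotiation again.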

Thanks,
Roman.


* Re: [9fans] Different representations of the same file/resource in a synthetic FS
       [not found] <eed9f9e37182c89c3e8a9982844f9d0f@quanstro.net>
@ 2009-06-09 19:31 ` andrey mirtchovski
  0 siblings, 0 replies; 11+ messages in thread
From: andrey mirtchovski @ 2009-06-09 19:31 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Tue, Jun 9, 2009 at 1:16 PM, erik quanstrom<quanstro@quanstro.net> wrote:
>> still a hash. i'm not doing anything particularly clever for speed,
>> and it shows in places.

I lied a bit here: in some cases, for example where a particular query
would involve going through several (up to a couple of thousand) files
and subdirectories to compose, i provide a single file that gives me
that information much faster and in only a fraction of the 9p queries
it normally would take. it's by no means a general solution to the
speed problem, but it does get me the data 30-50 times faster...
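
The effect can be illustrated with a toy model. This is only a sketch: it is not 9P, the class ToyServer and the file name "summary" are made up, and each read or directory listing is simply counted as one request.

```python
# Toy model, not 9P: each read or directory listing counts as one request.
class ToyServer:
    def __init__(self, resources):
        self.resources = resources  # {name: {"owner": str, "status": str}}
        self.requests = 0

    def listdir(self):
        self.requests += 1
        return sorted(self.resources)

    def read(self, path):
        self.requests += 1
        if path == "summary":  # hypothetical aggregate file
            return "".join(
                f"{name} {r['owner']} {r['status']}\n"
                for name, r in sorted(self.resources.items())
            )
        name, field = path.split("/")
        return self.resources[name][field]


def walk_everything(srv):
    """Compose the same report by visiting every per-resource file."""
    lines = []
    for name in srv.listdir():
        owner = srv.read(name + "/owner")
        status = srv.read(name + "/status")
        lines.append(f"{name} {owner} {status}\n")
    return "".join(lines)
```

For 1000 resources the walk issues 2001 requests in this model and the aggregate file issues one. A real 9P client also pays for walks, opens and clunks, so the toy ratio says nothing about the 30-50x measured above.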

but i digress...


* Re: [9fans] Different representations of the same file/resource in a synthetic FS
@ 2009-06-09 19:16 erik quanstrom
  0 siblings, 0 replies; 11+ messages in thread
From: erik quanstrom @ 2009-06-09 19:16 UTC (permalink / raw)
  To: mirtchovski, 9fans

> still a hash. i'm not doing anything particularly clever for speed,
> and it shows in places. listing large directories is the slowest
> operation by far, as it would be for most cases where several thousand
> "stat" structures would have to be dynamically created for each entry
> in a directory. i'm not pre-generating anything however, so in daily
> use, where each client knows exactly where to go, i'm not seeing
> slowdowns.

thanks!

> not that i'm worried: we recently discovered a few misconfigured
> clusters around here (names withheld) that were using ldap and no
> local nameservice caching. each stat on those boxes would take 0.05 ms
> (instead of 0.005) to complete because it needed to contact a server
> for username lookup. the wait became unbearable above a number of
> thousands of files in a particular directory, so people finally
> started complaining after waiting for minutes for 'ls -l' to finish.
> things could be way worse, i guess :)

i suppose we could all be forced to reimplement vi for the
apollo landing computer.

- erik




* Re: [9fans] Different representations of the same file/resource in a synthetic FS
  2009-06-09 18:10   ` erik quanstrom
@ 2009-06-09 19:01     ` andrey mirtchovski
  0 siblings, 0 replies; 11+ messages in thread
From: andrey mirtchovski @ 2009-06-09 19:01 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> how are the resultant files looked up?  it turns out that generating
> the file hash table was the single most expensive operation for
> upas/fs, given mailboxes with ~10k messages.
> (http://9fans.net/archive/2009/05/106)

still a hash. i'm not doing anything particularly clever for speed,
and it shows in places. listing large directories is the slowest
operation by far, as it would be for most cases where several thousand
"stat" structures would have to be dynamically created for each entry
in a directory. i'm not pre-generating anything however, so in daily
use, where each client knows exactly where to go, i'm not seeing
slowdowns.

not that i'm worried: we recently discovered a few misconfigured
clusters around here (names withheld) that were using ldap and no
local nameservice caching. each stat on those boxes would take 0.05 ms
(instead of 0.005) to complete because it needed to contact a server
for username lookup. the wait became unbearable above a number of
thousands of files in a particular directory, so people finally
started complaining after waiting for minutes for 'ls -l' to finish.
things could be way worse, i guess :)


* Re: [9fans] Different representations of the same file/resource in a synthetic FS
@ 2009-06-09 18:59 erik quanstrom
  0 siblings, 0 replies; 11+ messages in thread
From: erik quanstrom @ 2009-06-09 18:59 UTC (permalink / raw)
  To: nemo, 9fans

On Tue Jun  9 14:15:29 EDT 2009, nemo@lsub.org wrote:
> With mail2fs I leave messages alone and use all kinds of mail lists
> that contain just relative paths to actual messages. Perhaps nupas
> could do the same.
>

i think that essential strategy is a winner.  upas would use
.idx files.

so the general plan would be to never delete messages.
they're in the dump anyway.  just delete them from
the index.
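
a rough sketch of that plan, assuming a mailbox is nothing more than an ordered list of relative paths into a store that is never modified.  the class Mailbox is hypothetical and is not the upas .idx format, which this thread does not describe.

```python
class Mailbox:
    """Index of relative paths into a message store that is never modified."""

    def __init__(self, store):
        self.store = store  # {relative path: message text}; treated as read-only
        self.index = []     # ordered relative paths visible in this mailbox

    def add(self, relpath):
        if relpath not in self.store:
            raise KeyError(relpath)
        self.index.append(relpath)

    def delete(self, relpath):
        # Only the index entry goes away; the message stays in the store.
        self.index.remove(relpath)

    def messages(self):
        return [self.store[p] for p in self.index]
```

deleting from one mailbox leaves both the store and any other index that points at the same message untouched.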

so that seems simple, why isn't that done already?

first, i should point out that there are some complicating
assumptions i have.  our heavy users are receiving
about 1000 messages a day after spam filtering.  it would
be pretty easy to accumulate a million messages.  also,
it's important for imap support to be able to scan several
hundred mailboxes in short order.

hopefully that's enough context to understand the two
basic problems i see:
1.  the mdir format is limited by the underlying fs
in the number of messages that can be efficiently
stored.  (my previous tests with ken's fs on a pIII
machine showed that 100k was a lower upper bound.)
a million-message index would be ~600mb on-disk.

2.  upas/fs needs memory proportional to the number
of messages in the mailbox.  and clients need to read
directories that are sized in proportion to the number
of messages.

#1 seems straightforward to fix.  #2 seems more
fundamental.
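
a back-of-the-envelope reading of the figures above, assuming decimal megabytes and a steady 1000 messages a day (both are assumptions made only for this sketch):

```python
MB = 10**6                 # assumption: decimal megabytes
index_bytes = 600 * MB     # "~600mb on-disk" for a million-message index
messages = 1_000_000
per_day = 1000             # "about 1000 messages a day"

bytes_per_message = index_bytes / messages  # average index cost per message
days_to_million = messages / per_day        # time for one heavy user to get there
```

under those assumptions the index costs about 600 bytes per message, and a single heavy user reaches a million messages in roughly 1000 days.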

- erik


* Re: [9fans] Different representations of the same file/resource in a synthetic FS
@ 2009-06-09 18:15 Francisco J Ballesteros
  0 siblings, 0 replies; 11+ messages in thread
From: Francisco J Ballesteros @ 2009-06-09 18:15 UTC (permalink / raw)
  To: 9fans; +Cc: 9fans

With mail2fs I leave messages alone and use all kinds of mail lists  
that contain just relative paths to actual messages. Perhaps nupas  
could do the same.

On 09/06/2009, at 20:11, quanstro@quanstro.net wrote:

> On Tue Jun 9 13:28:55 EDT 2009, mirtchovski@gmail.com wrote:
>> I think I've mentioned this before, but on a few of my synthetic file
>> systems here I'm using what you describe to slice a database by
>> specific orderings. For example, I have a (long) list of resources
>> which I'm managing in a particular environment; each has an owner,
>> type, status and a few static data containers. It's all backed by a
>> relational database, but the file server presents different "slices"
>> of that to external users, where the directory structure is rigidly
>> defined [...]
>
> this is definitely a problem for upas/fs. it would be nice, for example,
> for upas/fs to have the option of sorting mailboxes in various ways.
> imap4d's requirements are not the same as nedmail's. by thread,
> by date, by order of arrival are all useful sortings. and of course,
> given the ability to manage giant piles of messages reasonably efficiently,
> it's tempting to replace the idea of different boxes with different
> views of the same giant pile of messages.
>
> how are the resultant files looked up? it turns out that generating
> the file hash table was the single most expensive operation for
> upas/fs, given mailboxes with ~10k messages.
> (http://9fans.net/archive/2009/05/106)
>
> - erik
>
> [/mail/box/nemo/msgs/200906/41850]




* Re: [9fans] Different representations of the same file/resource in a synthetic FS
  2009-06-09 17:27 ` andrey mirtchovski
@ 2009-06-09 18:10   ` erik quanstrom
  2009-06-09 19:01     ` andrey mirtchovski
  2009-06-11  3:44   ` Roman V. Shaposhnik
  1 sibling, 1 reply; 11+ messages in thread
From: erik quanstrom @ 2009-06-09 18:10 UTC (permalink / raw)
  To: 9fans

On Tue Jun  9 13:28:55 EDT 2009, mirtchovski@gmail.com wrote:
> I think I've mentioned this before, but on a few of my synthetic file
> systems here I'm using what you describe to slice a database by
> specific orderings. For example, I have a (long) list of resources
> which I'm managing in a particular environment; each has an owner,
> type, status and a few static data containers. It's all backed by a
> relational database, but the file server presents different "slices"
> of that to external users, where the directory structure is rigidly
> defined [...]

this is definitely a problem for upas/fs.  it would be nice, for example,
for upas/fs to have the option of sorting mailboxes in various ways.
imap4d's requirements are not the same as nedmail's.  by thread,
by date, by order of arrival are all useful sortings.  and of course,
given the ability to manage giant piles of messages reasonably efficiently,
it's tempting to replace the idea of different boxes with different
views of the same giant pile of messages.
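
one way to picture "views of the same pile" is a set of orderings over a single list of messages.  this is only a sketch; the Msg fields and the view function are made up for illustration and are not how upas/fs stores anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    arrival: int  # order of arrival in the pile
    date: int     # sender-supplied date as a number (assumed)
    thread: str   # thread identifier (assumed to be computed elsewhere)


def view(pile, key):
    """Return positions in the pile ordered by key; the pile is untouched."""
    return sorted(range(len(pile)), key=lambda i: key(pile[i]))


pile = [Msg(0, 30, "a"), Msg(1, 10, "b"), Msg(2, 20, "a")]
by_arrival = view(pile, lambda m: m.arrival)
by_date = view(pile, lambda m: m.date)
by_thread = view(pile, lambda m: (m.thread, m.date))
```

each view is just a list of positions, so adding another ordering costs one more list of integers rather than another copy of the messages.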

how are the resultant files looked up?  it turns out that generating
the file hash table was the single most expensive operation for
upas/fs, given mailboxes with ~10k messages.
(http://9fans.net/archive/2009/05/106)

- erik


* Re: [9fans] Different representations of the same file/resource in a synthetic FS
  2009-06-09 17:14 Roman V Shaposhnik
  2009-06-09 17:27 ` andrey mirtchovski
@ 2009-06-09 17:30 ` J.R. Mauro
  1 sibling, 0 replies; 11+ messages in thread
From: J.R. Mauro @ 2009-06-09 17:30 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Tue, Jun 9, 2009 at 1:14 PM, Roman V Shaposhnik<rvs@sun.com> wrote:
> Let's assume a classical example (modified slightly to fit 9P):
> a synthetic filesystem that serves images from a web cam.
> The very same frame can be asked for in different formats
> (.gif, .png, .pdf, etc.). Is serving
>   /<date>/<time>/<camera-id>/gif/frame
>   /<date>/<time>/<camera-id>/png/frame
>   ...
>   /<date>/<time>/<camera-id>/pdf/frame
> and relying on reading
>   /<date>/<time>/<camera-id>
> for the list of "supported" representations really better
> than what HTTP content negotiation offers?
>

Plan 9 does this a bit, in that you can ask a special file in /net for
how to dial a certain host across all protocols. You can then pick the
one that suits you, and get instructions on how to use that proto
inside /net. I think it's a good use.
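
A sketch of the client side of that pattern, assuming the special file answers with one line per choice of the form "<clone file> <address>". That format is recalled from the connection server's documentation and should be checked against the manual; the addresses in the usage note are made up, and parse_cs_reply and pick are hypothetical names.

```python
def parse_cs_reply(text):
    """Parse lines of the assumed form '<clone file> <address>' into tuples."""
    choices = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            clone, addr = parts
            choices.append((clone, addr))
    return choices


def pick(choices, preferred):
    """Return the first choice whose clone path names a preferred protocol."""
    for proto in preferred:
        for clone, addr in choices:
            # '/net/tcp/clone' -> the directory just above 'clone' is 'tcp'
            if clone.split("/")[-2:-1] == [proto]:
                return clone, addr
    return None
```

Given a reply such as "/net/il/clone 10.0.0.1!17008" followed by "/net/tcp/clone 10.0.0.1!564", a client that prefers tcp would pick the second line. The server enumerates what it can offer and the client chooses, which is the same division of labour as listing a directory of representations.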




* Re: [9fans] Different representations of the same file/resource in a synthetic FS
  2009-06-09 17:14 Roman V Shaposhnik
@ 2009-06-09 17:27 ` andrey mirtchovski
  2009-06-09 18:10   ` erik quanstrom
  2009-06-11  3:44   ` Roman V. Shaposhnik
  2009-06-09 17:30 ` J.R. Mauro
  1 sibling, 2 replies; 11+ messages in thread
From: andrey mirtchovski @ 2009-06-09 17:27 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I think I've mentioned this before, but on a few of my synthetic file
systems here I'm using what you describe to slice a database by
specific orderings. For example, I have a (long) list of resources
which I'm managing in a particular environment; each has an owner,
type, status and a few static data containers. It's all backed by a
relational database, but the file server presents different "slices"
of that to external users, where the directory structure is rigidly
defined as:

/
 available/
 by-type/
 by-owner/
 inuse/
 ...

with all data to fill the directories being dynamically pulled from
the database.

in this particular case it saves me having to implement a generic SQL
query mechanism, which is unsafe, as well as pushing the complexity of
knowing the underlying database structure onto the clients. in the
end, clients only know how to navigate to a particular resource and
'reserve' or 'release' it. this scheme could potentially be extended
(at least in my case) to match your "user-defined sets" by simply
enumerating every unique column in the database as subdirectories. a
user defines a "subset" of all available nodes of a particular type
foo which have owner bar by cd-ing to
/available/by-type/foo/by-owner/bar. (I admit this is a bit hasty, so
perhaps not what you're really after)...
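
a minimal sketch of how such a path could be turned into a filter, assuming rows with "status", "type" and "owner" columns held in memory rather than in a real database.  the function resolve and the sample rows are hypothetical.

```python
ROWS = [  # made-up sample data standing in for the database
    {"name": "n1", "type": "foo", "owner": "bar", "status": "available"},
    {"name": "n2", "type": "foo", "owner": "baz", "status": "available"},
    {"name": "n3", "type": "qux", "owner": "bar", "status": "inuse"},
    {"name": "n4", "type": "foo", "owner": "bar", "status": "inuse"},
]


def resolve(path, rows=ROWS):
    """Map a path like /available/by-type/foo/by-owner/bar to resource names."""
    parts = iter(p for p in path.split("/") if p)
    selected = list(rows)
    for part in parts:
        if part in ("available", "inuse"):
            selected = [r for r in selected if r["status"] == part]
        elif part.startswith("by-"):
            column = part[len("by-"):]
            value = next(parts, None)  # the component after by-<column>
            if value is None:
                raise ValueError("path ends in a by-<column> directory")
            selected = [r for r in selected if r.get(column) == value]
        else:
            raise FileNotFoundError(path)
    return [r["name"] for r in selected]
```

each path component narrows the selection, so the client never sees a query language, only directories.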

the reason i'm sticking with a file system is that if i have to do
this for many different resources which can't be easily stuck in the
same database, I'd have to design a protocol in order to avoid
replicating everything, and if I'm going to design a protocol I may as
well use 9p, something that's simple and I'm familiar with.


* [9fans] Different representations of the same file/resource in a synthetic FS
@ 2009-06-09 17:14 Roman V Shaposhnik
  2009-06-09 17:27 ` andrey mirtchovski
  2009-06-09 17:30 ` J.R. Mauro
  0 siblings, 2 replies; 11+ messages in thread
From: Roman V Shaposhnik @ 2009-06-09 17:14 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Working on a RESTful API lately (which is as close to working on a 9P
filesystem as I can get these days) I've been puzzling over this issue:
is content negotiation a good thing or a bad thing? Or to justify
posting to this list: what would be the proper 9P way of not only
representing different "renditions" of the same information in
a synthetic filesystem but also giving the consumer a chance to declare
*a set* of preferred ones.

Let's assume a classical example (modified slightly to fit 9P):
a synthetic filesystem that serves images from a web cam.
The very same frame can be asked for in different formats
(.gif, .png, .pdf, etc.). Is serving
   /<date>/<time>/<camera-id>/gif/frame
   /<date>/<time>/<camera-id>/png/frame
   ...
   /<date>/<time>/<camera-id>/pdf/frame
and relying on reading
   /<date>/<time>/<camera-id>
for the list of "supported" representations really better
than what HTTP content negotiation offers?
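
For concreteness, the client side of the directory approach might look like the sketch below. It assumes the client has already read the <camera-id> directory into a list of names; choose and frame_path are hypothetical helpers, not part of any existing library.

```python
def choose(available, preferred):
    """Return the first preferred representation the server lists, else None."""
    offered = set(available)
    for rep in preferred:
        if rep in offered:
            return rep
    return None


def frame_path(camera_dir, available, preferred):
    """Build the path of the frame in the best available representation."""
    rep = choose(available, preferred)
    return None if rep is None else f"{camera_dir}/{rep}/frame"
```

The "set of preferred ones" lives entirely in the client here: the server only lists what exists, and the preference order is applied after one directory read.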

Thanks,
Roman.
