From mboxrd@z Thu Jan 1 00:00:00 1970
To: 9fans@cse.psu.edu
Date: Thu, 29 Mar 2007 09:11:37 +0000
From: Amit Singh
Message-ID: <1175051369.282439.191740@e65g2000hsc.googlegroups.com>
Content-Type: text/plain; charset="iso-8859-1"
References: <1174682487.948169.40940@y66g2000hsf.googlegroups.com>
Subject: [9fans] Re: Fwd: Reading from FS with inaccurate file sizes?
Topicbox-Message-UUID: 3521d346-ead2-11e9-9d60-3106f5b1d025

On Mar 27, 6:20 am, r...@swtch.com (Russ Cox) wrote:
> To be fair, these are the kinds of mistakes I would expect any
> Unix-mindset implementation to make, and it surprised me quite
> a bit that Linux FUSE got so much of this right from the start
> (or at least from when I started using it). I wonder how many
> of these mistakes BSD FUSE makes.

You're assuming quite a bit here, especially in concluding that these
are "mistakes" that you "expect" because of a "Unix-mindset"
implementation.

BTW, I don't know when you started using FUSE on Linux, but it's been
there on Linux at least since 2001. MacFUSE came out in 2007, so your
surprise is surprising.

> Synthetic file systems tend not to care about the
> offset on writes anyway.

And the Mac OS X VFS kernel extension environment isn't exactly geared
towards synthetic file systems. OS X may have an open source kernel,
*but* it's not practical to write kernel extensions that require kernel
changes. Therefore, what a kernel extension can do is limited by the
interfaces/data that are available to the extension in a stock kernel.

In the case of reads/writes when the advertised size is 0, you run into
the unified buffer cache, which really wants to believe the file size.
To get around this, MacFUSE must explicitly implement separate
read/write paths from the vnode operations to user space and back.
Release 0.2.2 does this for reads if you use the 'direct_io' option. In
other words, if you add the 'direct_io' option while mounting, what you
are looking for should already work.
Note that you will have no buffer cache (which is what you'd want
anyway in this case).

'direct_io' doesn't do anything for writes in Release 0.2.2. It'd be
straightforward to expand the write implementation. A future release of
MacFUSE might have it.

> MacFUSE also seems to employ some subterfuge where fds
> do not map one-to-one with FUSE file handles. Another bug I've filed:
> http://code.google.com/p/macfuse/issues/detail?id=133

The subterfuge is intentional and necessary in the current design.

The open() and close() vnode operations of MacFUSE *do not* have access
to the file descriptor in question. The data structures involved are
opaque, so it'd be quite ugly and unmaintainable to try to get at the
descriptor by brute force. Given the lack of a descriptor, you can't
match opens and closes. Along the same lines, MacFUSE can only look at
the vnode, and *not* at file structures, which are inaccessible. You
can't track connections between file structures and FUSE file handles.

Therefore, as a matter of feasibility and simplicity, MacFUSE shares
file handles when possible, with reference counting. For multiple opens
of a single given file, you won't see every open invocation go up to
user space unless the open flags are different from a previous
invocation.

> On Linux apparently things happen the other way around:
> O_TRUNC is never sent, but O_APPEND is sent for >> opens.
> MacFUSE doesn't send either, which is another bug I've filed:
> http://code.google.com/p/macfuse/issues/detail?id=132

Right now, MacFUSE distinguishes between 3 types of open for a given
file: O_RDONLY, O_WRONLY, and O_RDWR. Since write handles could be
shared, adding O_APPEND to the mix means we essentially have two
additional types of open that MacFUSE must track. This isn't too big of
a deal eventually, but the extra complexity wasn't justified in
MacFUSE's nascent days, even though it meant sacrificing some arguably
contrived semantics.
I say contrived because O_APPEND is still handled correctly by the
kernel (if you report the correct file size)--it's just that the flag
is not passed to user space. So, like you said in your bug report, this
matters in cases like "shared append-only files on the server side".

Hope this clarifies things.
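
To make the handle sharing described above concrete, here is a rough sketch in Python. It is not MacFUSE's code: the class SharedHandles, the two upcall callbacks, and the track_append switch are all hypothetical names made up for illustration, and it assumes close() is told the same open flags as the matching open(). It only shows the idea of one reference-counted handle per type of open, and why tracking O_APPEND adds two more types.

```python
import os

# Mask for the access mode bits (O_RDONLY, O_WRONLY, O_RDWR).
ACCMODE = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


class SharedHandles:
    """Hypothetical per-file table of shared, reference-counted handles."""

    def __init__(self, open_upcall, release_upcall, track_append=False):
        self.open_upcall = open_upcall        # stands in for the open sent to user space
        self.release_upcall = release_upcall  # stands in for the release sent to user space
        self.track_append = track_append      # False: 3 types of open; True: 5
        self.slots = {}                       # key -> [handle, refcount]

    def _key(self, flags):
        mode = flags & ACCMODE
        # Append only matters for handles that can write.
        append = (self.track_append
                  and mode != os.O_RDONLY
                  and bool(flags & os.O_APPEND))
        return (mode, append)

    def open(self, flags):
        key = self._key(flags)
        slot = self.slots.get(key)
        if slot is None:
            # First open of this type: go up to user space once.
            mode, append = key
            upcall_flags = mode | (os.O_APPEND if append else 0)
            slot = self.slots[key] = [self.open_upcall(upcall_flags), 0]
        slot[1] += 1                          # later opens just share the handle
        return slot[0]

    def close(self, flags):
        key = self._key(flags)
        slot = self.slots[key]
        slot[1] -= 1
        if slot[1] == 0:
            # Last user of this type is gone: release the handle.
            self.release_upcall(slot[0])
            del self.slots[key]
```

With track_append left off, two opens with O_RDONLY produce a single upcall, and an open with O_WRONLY|O_APPEND shares the plain O_WRONLY handle, so user space never sees the append flag. Turning track_append on splits the two writable types, giving five kinds of open instead of three.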