From mboxrd@z Thu Jan 1 00:00:00 1970
From: FODEMESI Gergely
To: plan9 mailing list <9fans@cse.psu.edu>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Subject: [9fans] fossil/venti/manber
Date: Mon, 28 Apr 2003 22:17:21 +0200
Topicbox-Message-UUID: 99c01a6c-eacb-11e9-9e20-41e7f4b1d025

Hi,

in the venti paper there is a reference to Manber's algorithm as a
possible future development for venti/fossil.
(Udi Manber: Finding similar files in a large file system)

Has anyone given this possible development a second thought?

I'd like to elaborate on the possibility of using this algorithm with
venti. Could somebody correct me if any of the following comments are
false?

1. Anchors would be needed to synchronize to block boundaries.

2. To detect possibly similar bit-streams, venti must know more meta
   information about them (files/directories). By this I mean the venti
   format would have to be extended with meta information on files.

3. Venti would have to implement a method of generating candidate
   anchors that link "new" (i.e. freshly stored) bit-streams to possibly
   similar existing ones. This should probably be done in parallel with
   storing new blocks. "Lazy anchoring?"

4. Except for databases with dynamically changing sizes (are there
   any?), what kind of bit-streams could such a method be used for?

5. Depending on the answers to 4, could anybody imagine changing the
   venti format in order to provide such a seemingly marginally useful
   feature? See Russ's comment on possibly never changing the venti
   format.

thanks for listening:
gergo
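
To make point 1 concrete, here is a minimal Python sketch of anchor-based block boundaries, in the spirit of the fingerprinting idea in Manber's paper. It is not how venti or fossil work. The window size, mask, hash base and every function name below are assumptions chosen for illustration, and a simple polynomial rolling hash stands in for proper Rabin fingerprints. The only venti-specific fact used is that blocks are addressed by their SHA-1 score.

```python
import hashlib
import random

WINDOW = 16           # bytes per fingerprinted substring (assumed value)
MASK = (1 << 6) - 1   # anchor when the low 6 bits are zero (assumed value)
BASE = 257            # rolling-hash base (assumed value)
MOD = (1 << 31) - 1   # rolling-hash modulus (assumed value)

def anchors(data: bytes) -> list:
    """Offsets just past each WINDOW-byte substring whose hash marks an anchor."""
    if len(data) < WINDOW:
        return []
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    h = 0
    for b in data[:WINDOW]:
        h = (h * BASE + b) % MOD
    cuts = []
    i = WINDOW                         # the window currently covers data[i-WINDOW:i]
    while True:
        if h & MASK == 0:
            cuts.append(i)
        if i == len(data):
            break
        # Slide the window one byte: drop data[i-WINDOW], take in data[i].
        h = ((h - data[i - WINDOW] * top) * BASE + data[i]) % MOD
        i += 1
    return cuts

def split_blocks(data: bytes) -> list:
    """Cut data at every anchor, so boundaries depend on content, not offset."""
    blocks, start = [], 0
    for cut in anchors(data):
        blocks.append(data[start:cut])
        start = cut
    if start < len(data):
        blocks.append(data[start:])
    return blocks

def scores(data: bytes) -> set:
    """SHA-1 score of every block, as a venti-like store would address them."""
    return {hashlib.sha1(b).hexdigest() for b in split_blocks(data)}

if __name__ == "__main__":
    rng = random.Random(0)
    old = bytes(rng.randrange(256) for _ in range(4096))
    new = b"X" + old                   # one byte inserted at the front
    print("blocks in old:", len(scores(old)))
    print("old blocks missing from new:", len(scores(old) - scores(new)))
```

With fixed-size blocks, the one-byte insertion would shift every later block and change every score. With anchors, every window that does not contain the inserted byte still hashes to the same value, so the cuts resynchronize after the first block and at most one old block has to be stored again.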