From: rog@vitanuova.com
To: 9fans@cse.psu.edu
Subject: Re: [9fans] kfs un-removable file
Date: Mon, 27 Oct 2003 16:02:13 +0000
Message-ID: <3ff7aff7e446ebd1fe0a788c93d5d844@vitanuova.com>

> I doubt that. C2 80 is UTF for 0x80, the error rune.
> When any of the UTF routines process a bad UTF sequence, they
> replace it with the error rune. So what's really happening,
> probably, is that kfs is giving you bad data (not UTF) and
> ls is coping.

that's not necessarily the case. we've got some files on our
filesystem (an old-style fileserver) that have C2 80 sequences in
their names, and they seem to be unremovable. a direct stat on the
files still gives the C2 80 sequences (as far as i can see, convM2D
doesn't do any utf conversion, so it shouldn't be necessary to look
at the raw directory format).

not only are the files unremovable, several of them also have a few
duplicates.

after a little experimentation, it seems that the fileserver (and
presumably kfs too) doesn't check utf validity on input, but does
convert utf characters on output (mind you, that isn't obvious from
a quick look at the source). in the example i just tried, i did
something similar to:

	char buf[] = "/tmp/yyXz";

	buf[7] = 0xff;	/* overwrite the 'X': the name is no longer valid utf */
	create(buf, OWRITE, 0666);
	buf[7] = 0xfd;	/* a different byte, also not valid utf here */
	create(buf, OWRITE, 0666);

both creates succeeded, and i now have two unremovable files in my
/tmp (oops). cat /tmp | xd -c shows that the two filenames are
identical (in this case each is exactly "yy").

i'd suggest that it might be a good idea for the fileserver to
canonicalise names on creation. quite apart from invalid utf
sequences, aren't there several possible utf sequences that can
validly map to the same character?
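One way a fileserver might act on the canonicalisation suggestion is to refuse, at create time, any name that is not well-formed utf. The sketch below is written in portable ISO C rather than Plan 9 C, and `validname` is a hypothetical helper, not something that exists in kfs or the fileserver. It assumes the strict UTF-8 rules of RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF); Plan 9's own libc rules may differ. Rejecting overlong forms is what leaves each character with exactly one byte encoding, so two names that decode to the same text cannot both be created.

```c
#include <assert.h>
#include <stddef.h>

/*
 * validname: hypothetical check a fileserver could run before
 * accepting a new name (assumption: strict RFC 3629 UTF-8).
 * Returns 1 if name[0..n) is well-formed UTF-8 with no overlong
 * forms, no UTF-16 surrogates and nothing above U+10FFFF,
 * and 0 otherwise.
 */
int
validname(const char *name, size_t n)
{
	const unsigned char *s = (const unsigned char *)name;
	size_t i = 0;

	assert(name != NULL || n == 0);	/* NULL only allowed for an empty name */
	while (i < n) {
		unsigned char c = s[i];
		size_t len, k;
		unsigned long r, min;

		if (c < 0x80) {			/* plain ASCII byte */
			i++;
			continue;
		} else if ((c & 0xE0) == 0xC0) {
			len = 2; r = c & 0x1F; min = 0x80;
		} else if ((c & 0xF0) == 0xE0) {
			len = 3; r = c & 0x0F; min = 0x800;
		} else if ((c & 0xF8) == 0xF0) {
			len = 4; r = c & 0x07; min = 0x10000;
		} else {
			return 0;		/* stray continuation byte or 0xF8..0xFF */
		}
		if (n - i < len)
			return 0;		/* sequence truncated by end of name */
		for (k = 1; k < len; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return 0;	/* expected a continuation byte */
			r = (r << 6) | (s[i + k] & 0x3F);
		}
		if (r < min)
			return 0;		/* overlong: a shorter encoding exists */
		if (r >= 0xD800 && r <= 0xDFFF)
			return 0;		/* UTF-16 surrogate, not a character */
		if (r > 0x10FFFF)
			return 0;		/* beyond the Unicode range */
		i += len;
	}
	return 1;
}
```

Under these assumptions both names from the experiment ("yy" followed by byte 0xFF or 0xFD, then "z") would be refused, as would an overlong spelling of '/' such as C0 AF, while C2 80 on its own is accepted, since it is the one valid encoding of U+0080.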