From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/4394
Path: news.gmane.org!not-for-mail
From: Luca Barbato
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: validation of utf-8 strings passed as system call arguments
Date: Fri, 13 Dec 2013 13:11:38 +0100
Message-ID: <52AAF97A.1090505@gentoo.org>
References: <20131212213006.dc30d64f61e5ec441c34ffd4f788e58e.381c744cf1.wbe@email22.secureserver.net>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Trace: ger.gmane.org 1386936706 9859 80.91.229.3 (13 Dec 2013 12:11:46 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Fri, 13 Dec 2013 12:11:46 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-4398-gllmg-musl=m.gmane.org@lists.openwall.com Fri Dec 13 13:11:50 2013
Return-path:
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from )
	id 1VrRau-00056z-VX
	for gllmg-musl@plane.gmane.org; Fri, 13 Dec 2013 13:11:49 +0100
Original-Received: (qmail 3110 invoked by uid 550); 13 Dec 2013 12:11:46 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
Original-Received: (qmail 3102 invoked from network); 13 Dec 2013 12:11:46 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.1.1
In-Reply-To: <20131212213006.dc30d64f61e5ec441c34ffd4f788e58e.381c744cf1.wbe@email22.secureserver.net>
Xref: news.gmane.org gmane.linux.lib.musl.general:4394
Archived-At:

On 13/12/13 05:30, writeonce@midipix.org wrote:
> Hello,
>
> While working on code that converts arguments from utf-16 to utf-8, I found
> myself wondering about the "responsibility" for checking well-formedness of
> utf-8 strings that are passed to the kernel.
> As I suspected, validation of
> these strings takes place neither in the kernel, nor in the C library. The
> attached program demonstrates this by creating a file named <0xE0 0x9F 0x80>,
> which according to the Unicode Standard (6.2, p. 95) is an ill-formed byte sequence.
>
> I am not sure whether this can officially be considered a bug, and it is quite
> clear that fixing this is going to entail some performance penalty. That being
> said, after deleting this file from my Ubuntu desktop most (but not all)
> attempts to open the Trash folder made Nautilus crash, and it was only after
> deleting the file permanently from the shell that order had been restored...
>

It seems to me that rejecting any byte other than NUL and the path
separator would be more harmful, and even more dangerous, than the
status quo.

lu
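
For reference, the well-formedness rule the quoted message appeals to
(the table of well-formed UTF-8 byte sequences in chapter 3 of the
Unicode Standard) can be sketched in a few lines of C. This is only an
illustration of the check an application might run before treating a
filename as text; it is not something the kernel or musl does, and the
function name utf8_is_well_formed is made up for this example.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the n bytes at s are well-formed
 * UTF-8 per the Unicode Standard's byte-sequence table, 0 otherwise. */
int utf8_is_well_formed(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char b = s[i];
        /* Allowed range for the second byte; narrowed for some leads. */
        unsigned char lo = 0x80, hi = 0xBF;
        size_t len, k;

        if (b <= 0x7F) {                /* ASCII */
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            len = 2;                    /* C0 and C1 would be overlong */
        } else if (b >= 0xE0 && b <= 0xEF) {
            len = 3;
            if (b == 0xE0) lo = 0xA0;   /* reject overlong forms */
            if (b == 0xED) hi = 0x9F;   /* reject UTF-16 surrogates */
        } else if (b >= 0xF0 && b <= 0xF4) {
            len = 4;
            if (b == 0xF0) lo = 0x90;   /* reject overlong forms */
            if (b == 0xF4) hi = 0x8F;   /* reject values above U+10FFFF */
        } else {
            return 0;                   /* stray continuation or bad lead */
        }

        if (n - i < len)
            return 0;                   /* truncated sequence */
        if (s[i + 1] < lo || s[i + 1] > hi)
            return 0;
        for (k = 2; k < len; k++)
            if (s[i + k] < 0x80 || s[i + k] > 0xBF)
                return 0;
        i += len;
    }
    return 1;
}
```

With this sketch, the three bytes <0xE0 0x9F 0x80> are rejected because
a lead byte of 0xE0 only admits 0xA0..0xBF as its second byte, while
<0xE0 0xA0 0x80> (U+0800) is accepted.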