From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
Newsgroups: gmane.linux.lib.musl.general
Subject: validation of utf-8 strings passed as system call arguments
Date: Thu, 12 Dec 2013 21:30:06 -0700
Message-ID: <20131212213006.dc30d64f61e5ec441c34ffd4f788e58e.381c744cf1.wbe@email22.secureserver.net>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=_e3df83218951a98b726dc12ed4666ef2"

--=_e3df83218951a98b726dc12ed4666ef2
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset="utf-8"

Hello,

While working on code that converts arguments from UTF-16 to UTF-8, I found myself wondering about the "responsibility" for checking well-formedness of UTF-8 strings that are passed to the kernel. As I suspected, validation of these strings takes place neither in the kernel, nor in the C library. The attached program demonstrates this by creating a file named <0xE0 0x9F 0x80>, which according to the Unicode Standard (6.2, p. 95) is an ill-formed byte sequence.
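
For illustration, here is a minimal sketch of what such a check could look like in user space. It follows the well-formed byte ranges of Table 3-7 in the Unicode Standard; the function name utf8_is_well_formed is hypothetical and is not something the kernel or the C library provides.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of any libc): returns 1 if the len bytes
 * at s are well-formed UTF-8 per Table 3-7 of the Unicode Standard,
 * and 0 otherwise.
 */
int utf8_is_well_formed(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char b = s[i];
		unsigned char lo = 0x80, hi = 0xBF; /* allowed range of the 2nd byte */
		size_t trail;                       /* number of bytes after the lead */

		if (b <= 0x7F) {                    /* ASCII */
			i++;
			continue;
		} else if (b >= 0xC2 && b <= 0xDF) {
			trail = 1;
		} else if (b == 0xE0) {
			trail = 2; lo = 0xA0;           /* rejects overlong forms */
		} else if (b >= 0xE1 && b <= 0xEC) {
			trail = 2;
		} else if (b == 0xED) {
			trail = 2; hi = 0x9F;           /* rejects surrogates */
		} else if (b == 0xEE || b == 0xEF) {
			trail = 2;
		} else if (b == 0xF0) {
			trail = 3; lo = 0x90;           /* rejects overlong forms */
		} else if (b >= 0xF1 && b <= 0xF3) {
			trail = 3;
		} else if (b == 0xF4) {
			trail = 3; hi = 0x8F;           /* nothing above U+10FFFF */
		} else {
			return 0;                       /* 0x80..0xC1 and 0xF5..0xFF */
		}

		if (len - i <= trail)
			return 0;                       /* sequence is truncated */
		if (s[i + 1] < lo || s[i + 1] > hi)
			return 0;
		for (size_t k = 2; k <= trail; k++)
			if (s[i + k] < 0x80 || s[i + k] > 0xBF)
				return 0;

		i += trail + 1;
	}
	return 1;
}
```

Called on the three bytes <0xE0 0x9F 0x80> it returns 0, because a lead byte of 0xE0 must be followed by a byte in 0xA0..0xBF; <0xE0 0xA0 0x80> (U+0800) passes. Running a loop like this over every path argument is the kind of per-call overhead a fix would involve.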

I am not sure whether this can officially be considered a bug, and it is quite clear that fixing this is going to entail some performance penalty. That being said, after deleting this file from my Ubuntu desktop, most (but not all) attempts to open the Trash folder made Nautilus crash, and it was only after deleting the file permanently from the shell that order was restored...

Best regards,
zg

--=_e3df83218951a98b726dc12ed4666ef2
Content-Transfer-Encoding: 7bit
Content-Type: text/x-c; name="open__ill_formed_utf8.c"
Content-Disposition: attachment; filename="open__ill_formed_utf8.c"

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>

int main (int argc, char * argv[], char * envp[])
{
	char path[] = {0xE0, 0x9F, 0x80, 0x00};
	mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH;

	int fd = open (path, O_WRONLY | O_EXCL | O_CREAT, mode);

	if (fd == -1) {
		perror ("open");
		return 2;
	} else {
		printf("It worked! The file descriptor is %d.\n",fd);
	}

	return 0;
}

--=_e3df83218951a98b726dc12ed4666ef2--