mailing list of musl libc
From: Rich Felker <dalias@aerifal.cx>
To: musl@lists.openwall.com
Subject: Re: validation of utf-8 strings passed as system call arguments
Date: Thu, 12 Dec 2013 23:39:41 -0500
Message-ID: <20131213043941.GA24286@brightrain.aerifal.cx>
In-Reply-To: <20131212213006.dc30d64f61e5ec441c34ffd4f788e58e.381c744cf1.wbe@email22.secureserver.net>

On Thu, Dec 12, 2013 at 09:30:06PM -0700, writeonce@midipix.org wrote:
>    Hello,
> 
>    While working on code that converts arguments from utf-16 to utf-8, I
>    found myself wondering about the "responsibility" for checking
>    well-formedness of utf-8 strings that are passed to the kernel.  As I
>    suspected, validation of these strings takes place neither in the kernel,
>    nor in the C library.  The attached program demonstrates this by creating
>    a file named <0xE0 0x9F 0x80>, which according to the Unicode Standard
>    (6.2, p. 95) is an ill-formed byte sequence.
> 
>    I am not sure whether this can officially be considered a bug, and it is
>    quite clear that fixing this is going to entail some performance penalty. 
>    That being said, after deleting this file from my Ubuntu desktop most (but
>    not all) attempts to open the Trash folder made Nautilus crash, and it was
>    only after deleting the file permanently from the shell that order had
>    been restored...

There's nothing in POSIX that says that filenames have to be valid
strings in the current locale's encoding -- in fact, this is highly
problematic to enforce on implementations other than musl, such as
glibc, where the encoding might vary by locale and where different
users might be using locales with different encodings.

But there's also nothing that says arbitrary byte sequences (excluding
of course those containing '/' and NUL) have to be accepted as
filenames either. The historical _expectation_ and practice has been
that filenames can contain arbitrary byte sequences. And Linus in
particular is opposed to changing this, though there's been some
indication (I don't have references at hand) that he might be open
to optional restrictions at the kernel level.

What's clear to me is that restrictions at the libc level are not
useful. If your concern is that creating files with illegal sequences
in their names can confuse/break/crash some software, adding a
restriction on file creation in libc won't help. A malicious user can
just make the syscalls directly to make malicious filenames. On the
other hand, having the restriction in libc would be annoying because
it would _prevent_ you from renaming or deleting these bad filenames
using standard tools; you'd have to use special tools that make the
syscalls directly.

So if you want protection against illegal sequences in filenames
(personally, I want this too) the right place to lobby for it (and
propose an optional feature) is in the kernel, not in libc.
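
For reference, the check such an optional restriction would need can
be sketched as below. This is only an illustration based on the
well-formed byte ranges in Table 3-7 of the Unicode Standard, not code
from musl or the kernel; the function name utf8_well_formed is
hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the n bytes at s form well-formed
 * UTF-8 (per the byte ranges of Unicode Table 3-7), 0 otherwise. */
static int utf8_well_formed(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char b = s[i++];
        unsigned lo = 0x80, hi = 0xBF; /* allowed range of next byte */
        int more;                      /* trailing bytes expected */

        if (b < 0x80) {
            continue;                  /* ASCII */
        } else if (b >= 0xC2 && b <= 0xDF) {
            more = 1;
        } else if (b >= 0xE0 && b <= 0xEF) {
            more = 2;
            if (b == 0xE0) lo = 0xA0;  /* rejects overlong forms */
            if (b == 0xED) hi = 0x9F;  /* rejects surrogates */
        } else if (b >= 0xF0 && b <= 0xF4) {
            more = 3;
            if (b == 0xF0) lo = 0x90;  /* rejects overlong forms */
            if (b == 0xF4) hi = 0x8F;  /* rejects values > U+10FFFF */
        } else {
            return 0;                  /* 0x80..0xC1 and 0xF5..0xFF */
        }

        while (more--) {
            if (i >= n || s[i] < lo || s[i] > hi)
                return 0;
            i++;
            lo = 0x80;                 /* only the first trailing byte */
            hi = 0xBF;                 /* has a narrowed range */
        }
    }
    return 1;
}
```

With this sketch, <0xE0 0x9F 0x80> is rejected because the byte after
0xE0 must lie in 0xA0..0xBF, while <0xE0 0xA0 0x80> (U+0800) is
accepted.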

Rich

