Changes since v1:
* Reject UTF-16 surrogate range runes
* Remove locale override

This is from some discussion on IRC and while I agree that it's more "correct" in POSIX terms, I'm not particularly happy about having to explicitly enable UTF-8 support with setlocale. There might still be bugs and character ranges that need to be rejected.
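For reference, a minimal sketch of what the two changes amount to. This is not the code from the patch: rune_is_valid() and use_environment_locale() are hypothetical names, and the upper bound check against 0x10FFFF is my own assumption about another range worth rejecting. Only setlocale() and LC_CTYPE are standard C.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: decide whether a
 * decoded code point may be accepted as a rune. */
static bool
rune_is_valid(uint_least32_t r)
{
	/* U+D800..U+DFFF are UTF-16 surrogate halves and are not valid
	 * in well-formed UTF-8. */
	if (r >= 0xD800 && r <= 0xDFFF)
		return false;
	/* Assumption: also reject anything past the end of the Unicode
	 * code space. */
	if (r > 0x10FFFF)
		return false;
	return true;
}

/* Hypothetical helper: with the locale override removed, the program
 * has to adopt the character type locale from the environment itself.
 * Returns false if the requested locale cannot be honored. */
static bool
use_environment_locale(void)
{
	return setlocale(LC_CTYPE, "") != NULL;
}
```

With this shape, whether UTF-8 is actually in effect depends on the user's environment (LC_ALL, LC_CTYPE, LANG) rather than on a hardcoded locale, which is the POSIX-correct behaviour discussed above.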