From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: Re: Issues in mbsnrtowcs and wcsnrtombs
Date: Fri, 11 Aug 2017 20:31:07 -0400
Message-ID: <20170812003107.GN1627@brightrain.aerifal.cx>
In-Reply-To: <dfc33583-c665-05cd-9847-9c06c4708c8c@gmail.com>
On Wed, Aug 09, 2017 at 08:57:27PM +0300, Mikhail Kremnyov wrote:
> --- ./src/regression/mbsnrtowcs-overread.c 1970-01-01 03:00:00.000000000 +0300
> +++ ./src/regression/mbsnrtowcs-overread.c 2017-08-09 20:20:29.472003066 +0300
> @@ -0,0 +1,45 @@
> +// mbsnrtowcs issue, reported in www.openwall.com/lists/musl/2017/07/18/3
> +#include <locale.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <wchar.h>
> +#include "test.h"
> +
> +int main(void)
> +{
> + const char *const chr = "\u044B";
This should probably use \x escapes to write out the UTF-8 bytes rather
than assuming the compiler's execution charset is UTF-8.
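A minimal sketch of what that could look like, assuming the test character stays U+044B, whose UTF-8 encoding is the two bytes 0xD1 0x8B. The names yeru_utf8 and fill_with_yeru are made up for illustration and are not part of the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* U+044B (CYRILLIC SMALL LETTER YERU) spelled out as raw UTF-8 bytes,
 * so the literal does not depend on the compiler's execution charset. */
static const char yeru_utf8[] = "\xd1\x8b";

/* Two encoded bytes plus the terminating null byte. */
static_assert(sizeof yeru_utf8 == 3, "unexpected size for the UTF-8 literal");

/* Hypothetical helper mirroring the fill loop in the test: writes `count`
 * copies of the character into buf and null-terminates the result.
 * buf must have room for count * 2 + 1 bytes. */
static void fill_with_yeru(char *buf, size_t count)
{
	const size_t n = sizeof yeru_utf8 - 1;
	for (size_t i = 0; i < count; i++)
		memcpy(buf + i * n, yeru_utf8, n);
	buf[count * n] = '\0';
}
```

With this, strlen() on the literal yields 2 regardless of how the source file is encoded or which charset options the compiler is given.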
> + const int chr_size = strlen(chr);
> + // The passed length of the source string in bytes should be bigger than
> + // 32*4 to force mbsnrtowcs to use the optimization based on mbsrtowcs.
> + const int chr_count_to_convert = 1000;
> + // Make sure that the source string has more characters after the passed
> + // length.
> + const int chr_count = chr_count_to_convert + 10;
> +
> + char src[chr_count * chr_size + 1];
> + // dest should also have extra space
> + wchar_t dest[chr_count + 1];
> + size_t r;
> + const char *str_ptr = src;
> + mbstate_t mbs;
> +
> + for (int i = 0; i < chr_count; ++i)
> + {
> + memcpy(src + i * chr_size, chr, chr_size);
> + }
> + src[chr_count * chr_size] = 0;
> +
> + setlocale(LC_CTYPE, "en_US.UTF-8");
I think this should use t_setutf8(), added in commit
defcb8d354e052f2d6ba230e7e2983546429a583, so that the logic for
finding a UTF-8 locale is centralized and not dependent on en_US.
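To illustrate the idea of centralizing the locale lookup, here is a rough sketch of such a helper. This is not the t_setutf8() implementation from libc-test; the name try_set_utf8_locale and the candidate list are assumptions chosen for illustration:

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: try a few commonly available UTF-8 locale names
 * and report whether LC_CTYPE ended up with a UTF-8 codeset.
 * Returns 0 on success, -1 if none of the candidates worked. */
static int try_set_utf8_locale(void)
{
	static const char *const candidates[] = {
		"C.UTF-8", "POSIX.UTF-8", "en_US.UTF-8", "en_GB.UTF-8",
	};
	for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
		/* setlocale() returns NULL when the locale is unavailable. */
		if (setlocale(LC_CTYPE, candidates[i]) &&
		    strcmp(nl_langinfo(CODESET), "UTF-8") == 0)
			return 0;
	}
	return -1;
}
```

A test would call the helper once at the top of main() and bail out if it fails, instead of hard-coding a single locale name in every test file.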
> +
> + memset(&mbs, 0, sizeof(mbs));
> + r = mbsnrtowcs(dest, &str_ptr, chr_count_to_convert * chr_size,
> + sizeof(dest)/sizeof(dest[0]), &mbs);
> +
> + if (r != chr_count_to_convert)
> + {
> + t_error("Expected to convert %d characters, but converted %d\n",
> + chr_count_to_convert, r);
> + }
> +
> + return t_status;
> +}
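For reference, the property the test above checks can be sketched in isolation: mbsnrtowcs() should look at no more than the given number of source bytes, even when more valid input follows. The wrapper convert_prefix is a hypothetical name used only for this illustration:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical wrapper: convert at most nbytes bytes of src into dst,
 * which has room for dstlen wide characters.
 * Returns what mbsnrtowcs() returns: the number of wide characters
 * written, or (size_t)-1 on an invalid sequence.
 * *consumed receives the number of source bytes consumed. */
static size_t convert_prefix(const char *src, size_t nbytes,
                             wchar_t *dst, size_t dstlen, size_t *consumed)
{
	assert(src && dst && consumed);

	const char *p = src;
	mbstate_t st;
	memset(&st, 0, sizeof st);

	size_t r = mbsnrtowcs(dst, &p, nbytes, dstlen, &st);

	/* p is set to NULL only when the terminating null byte was reached
	 * within the first nbytes bytes; otherwise it points just past the
	 * last byte that was converted. */
	*consumed = p ? (size_t)(p - src) : strlen(src);
	return r;
}
```

For example, converting the first 3 bytes of "abcdef" should report 3 wide characters and 3 bytes consumed, leaving "def" untouched. The regression test exercises the same property with a much longer multibyte input so that the fast path inside mbsnrtowcs is taken.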
> --- ./src/regression/wcsnrtombs_underread.c 1970-01-01 03:00:00.000000000 +0300
> +++ ./src/regression/wcsnrtombs_underread.c 2017-08-09 20:24:57.575995227 +0300
> @@ -0,0 +1,46 @@
> +// wcsnrtombs issue, reported in www.openwall.com/lists/musl/2017/07/18/3
> +#include <locale.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <wchar.h>
> +#include "test.h"
> +
> +#define TEST_CHR "\u044B"
> +
> +#define CAT_IMPL(x, y) x##y
> +#define CAT(x, y) CAT_IMPL(x, y)
> +
> +int main(void)
> +{
> + const wchar_t *const chr = CAT(L, TEST_CHR);
> + const int chr_len_in_utf_8 = strlen(TEST_CHR);
> + const int chr_size = wcslen(chr);
> + // The number of characters should be greater than 32 to force wcsnrtombs
> + // to use the optimization based on wcsrtombs.
> + const int chr_count = 1000;
> + wchar_t src[chr_count];
> + char dest[chr_count * 4];
> + size_t r;
> + const wchar_t *str_ptr = src;
> + mbstate_t mbs;
> +
> + for (int i = 0; i < chr_count; ++i)
> + {
> + memcpy(src + i, chr, sizeof(*chr));
> + }
> + src[chr_count] = 0;
> +
> + setlocale(LC_CTYPE, "en_US.UTF-8");
Likewise.
> --- ./src/multibyte/mbsnrtowcs.c 2017-08-08 16:19:29.311584832 +0300
> +++ ./src/multibyte/mbsnrtowcs.c 2017-08-09 20:33:27.515980317 +0300
I haven't reviewed this part yet but it's on my radar. Thanks.
Rich
Thread overview:
2017-07-18 20:05 Mikhail Kremnyov
2017-08-09 17:57 ` Mikhail Kremnyov
2017-08-12 0:31 ` Rich Felker [this message]
2017-08-31 18:28 ` Rich Felker