mailing list of musl libc
From: "罗勇刚(Yonggang Luo) " <luoyonggang@gmail.com>
To: Rich Felker <dalias@libc.org>
Cc: John Sully <john@csquare.ca>, Karsten Blees <blees@dcon.de>,
	musl@lists.openwall.com,  dplakosh@cert.org,
	austin-group-l@opengroup.org, hsutter@microsoft.com,
	 Clang Dev <cfe-dev@cs.uiuc.edu>,
	James McNellis <james@jamesmcnellis.com>
Subject: Re: Re: [cfe-dev] Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?
Date: Sun, 10 May 2015 20:19:46 +0800
Message-ID: <CAE2XoE-C_Pi+i4YT3QKGana3oaWMKz6zUwSN94gnSamtbDxD5Q@mail.gmail.com>
In-Reply-To: <20150509200535.GK17573@brightrain.aerifal.cx>

2015-05-10 4:05 GMT+08:00 Rich Felker <dalias@libc.org>:
> On Sat, May 09, 2015 at 07:19:14PM +0800, 罗勇刚(Yonggang Luo)  wrote:
>> 2015-05-09 18:36 GMT+08:00 Szabolcs Nagy <nsz@port70.net>:
>> > * John Sully <john@csquare.ca> [2015-05-09 00:55:12 -0700]:
>> >> In my opinion you almost never want 32-bit wide characters once you learn
>> >> of their limitations.  Most people assume that if they use them they can
>> >> return to the one character -> one glyph idiom like ASCII.  But Unicode is
>> >
>> > wchar_t must be at least 21 bits on a system that supports unicode
>> > in any locale: it has to be able to represent all code points of the
>> > supported character set.
>> >
>> > in practice this means that the only conforming definition to iso c
>> > (and thus posix, c++ and other standards based on c) is a 32bit wchar_t
>> > (the signedness can be chosen freely).
>> >
>> > so the definition is not based on what "you almost never want" or what
>> > "most people assume".
>> >
>> > if the goal is to provide a posix implementation then 16bit wchar_t
>> > is not an option (assuming the system wants to be able to communicate
>> > with the external world that uses unicode text).
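A minimal C sketch of the 21-bit requirement quoted above, assuming a hosted C11 implementation; wchar_value_bits and wchar_holds_all_code_points are hypothetical helper names, not from any library. The largest code point, U+10FFFF, needs 21 bits, so counting the bits of WCHAR_MAX shows whether a single wchar_t can hold every code point. On x86-64 Linux, where wchar_t is a signed 32-bit int, the count is 31; on Windows, where wchar_t is a 16-bit unsigned type, it is 16.

```c
#include <assert.h>
#include <stdint.h>
#include <wchar.h>

/* The largest code point, U+10FFFF, needs exactly 21 bits. */
static_assert((0x10FFFF >> 20) == 1, "U+10FFFF should need 21 bits");

/* Hypothetical helper: count the value bits of wchar_t from WCHAR_MAX
   (a sign bit, if the type is signed, is not counted). */
static int wchar_value_bits(void)
{
    uintmax_t max = (uintmax_t)WCHAR_MAX;
    int bits = 0;

    while (max != 0) {
        bits++;
        max >>= 1;
    }
    return bits;
}

/* Nonzero if every Unicode code point fits in a single wchar_t. */
static int wchar_holds_all_code_points(void)
{
    return wchar_value_bits() >= 21;
}
```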
>> wchar_t is not the only way to communicate with the external world,
>> and it's also not suited to communicating with the external world.
>
> Of course it's not. UTF-8 is. But per both ISO C and POSIX, any
> character the locale supports has a representation as wchar_t. If
> wchar_t is only 16-bit, then you fundamentally can't support all of
> Unicode in the locale's encoding. mbrtowc has to fail with EILSEQ for
> 4-byte characters, regex functions cannot process 4-byte characters,
> etc. Such a system is conforming to the requirements for C and
> POSIX but does not support Unicode (in full) at the locale level.
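To make the mbrtowc point concrete, here is a minimal C sketch that is not mbrtowc itself and is not locale-aware: a hand-written UTF-8 decoder plus a hypothetical conversion into a 16-bit wide type. The names utf8_decode and utf8_to_16bit_wchar are illustrative assumptions. Any 4-byte UTF-8 sequence decodes to a code point above U+FFFF, so the 16-bit conversion has no value to store and fails with EILSEQ, as a conforming mbrtowc would have to with a 16-bit wchar_t.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 character from s (n bytes).
   Returns the number of bytes used and stores the code point in *cp,
   or returns (size_t)-1 and sets errno to EILSEQ. Deliberately minimal:
   it does not reject overlong forms or encoded surrogates, and unlike
   mbrtowc it treats a truncated sequence as an error instead of
   returning (size_t)-2. */
static size_t utf8_decode(uint32_t *cp, const unsigned char *s, size_t n)
{
    size_t len, i;
    uint32_t c;

    assert(cp != NULL && s != NULL);
    if (n == 0)
        goto bad;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2; c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; c = s[0] & 0x07;      /* 4-byte form: U+10000..U+10FFFF */
    } else {
        goto bad;
    }
    if (n < len)
        goto bad;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)     /* continuation bytes are 10xxxxxx */
            goto bad;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c > 0x10FFFF)
        goto bad;
    *cp = c;
    return len;
bad:
    errno = EILSEQ;
    return (size_t)-1;
}

/* Hypothetical model of a conversion whose wide type has only 16 bits:
   a code point above U+FFFF has no representation, so it must fail. */
static size_t utf8_to_16bit_wchar(uint16_t *wc, const unsigned char *s, size_t n)
{
    uint32_t cp;
    size_t len = utf8_decode(&cp, s, n);

    if (len == (size_t)-1)
        return len;
    if (cp > 0xFFFF) {
        errno = EILSEQ;
        return (size_t)-1;
    }
    *wc = (uint16_t)cp;
    return len;
}
```

For example, the four bytes F0 9F 98 80 decode to U+1F600, which utf8_to_16bit_wchar rejects with EILSEQ, while the two bytes C3 A9 (U+00E9) convert normally.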
>
>> The C11 standard never restricts the width of wchar_t, and in POSIX
>> most APIs work with UTF-8. Indeed, Windows needs a POSIX layer mainly
>> for those UTF-8-based APIs, not for the wchar_t APIs. For
>> communication purposes, making wchar_t 32-bit on Win32 is
>> not a good idea.
>>
>> And portable text-processing apps or libraries (including on Win32)
>> would not, and should not, depend on wchar_t being 32 bits wide.
>
> If __STDC_ISO_10646__ is defined, wchar_t must have at least 21 value
> bits. Applications which are portable only to systems where this macro
> is defined, or which have some fallback (like dropping multilingual
> text support) for systems where it's not defined, CAN make such
> assumptions.
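A minimal sketch of the kind of compile-time gate described above, assuming C11; HAVE_FULL_UNICODE_WCHAR and can_store_code_point are hypothetical names. Code that relies on a full-range wchar_t checks __STDC_ISO_10646__ together with WCHAR_MAX and otherwise takes a fallback path. Restricting text to the BMP here is only one assumed example of such a fallback.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical feature gate: trust wchar_t with arbitrary Unicode only if
   the implementation promises ISO 10646 code point values
   (__STDC_ISO_10646__) and the type is wide enough for U+10FFFF. */
#if defined(__STDC_ISO_10646__) && WCHAR_MAX >= 0x10FFFF
#define HAVE_FULL_UNICODE_WCHAR 1
static_assert(sizeof(wchar_t) * CHAR_BIT >= 21, "wchar_t narrower than 21 bits");
#else
#define HAVE_FULL_UNICODE_WCHAR 0
#endif

/* Returns nonzero if code point cp may be stored in a wchar_t. The
   fallback branch (an assumed policy) keeps only BMP characters. */
static int can_store_code_point(uint32_t cp)
{
#if HAVE_FULL_UNICODE_WCHAR
    return cp <= 0x10FFFF;
#else
    return cp <= 0xFFFF && cp <= (uint32_t)WCHAR_MAX;
#endif
}
```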
>
>> And C11/C++11 already provide uchar.h, with cross-platform char16_t
>> and char32_t, so there is no reason to make wchar_t 32-bit on Win32
>> in order to support POSIX on Win32.
>
> If wchar_t is 16-bit, you can't represent non-BMP characters in
> char32_t because they can't be part of the locale's character set. All
> char32_t buys you then is 16 wasted zero bits.
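Why a 16-bit wchar_t cannot hold a non-BMP character on its own can be shown with UTF-16, which needs two 16-bit units (a surrogate pair) for anything above U+FFFF. In the ISO C model each member of the extended character set is a single wchar_t value, so a character that needs two units cannot be part of the locale's wide character set. A minimal C11 sketch using char16_t from uchar.h; utf16_encode is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <uchar.h>

/* Hypothetical helper: encode one Unicode scalar value as UTF-16.
   Returns the number of char16_t units written (1 or 2). */
static int utf16_encode(uint32_t cp, char16_t out[2])
{
    /* Precondition: a scalar value, i.e. in range and not a surrogate. */
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp <= 0xFFFF) {
        out[0] = (char16_t)cp;                    /* BMP: one unit */
        return 1;
    }
    cp -= 0x10000;                                /* 20 bits remain */
    out[0] = (char16_t)(0xD800 + (cp >> 10));     /* high surrogate */
    out[1] = (char16_t)(0xDC00 + (cp & 0x3FF));   /* low surrogate */
    return 2;
}
```

For U+1F600 it writes the pair 0xD83D, 0xDE00 and returns 2; for U+00E9 it writes a single unit and returns 1.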
>
>> We intend to create a usable POSIX layer on Win32, not a theoretical
>> POSIX layer that would be useless. On Win32, we should consider the
>> de facto conventions of Win32.
>
> Uselessness is a big assumption you're making that's not supported by
> data. If you actually provide a working POSIX layer, you'll have
> pretty much any application that's currently working on Linux, BSDs,
> etc. (with actual portable code, not system-specific #ifdefs) working
> on Windows with few or no changes. If you do that with 32-bit wchar_t,
> they'll support Unicode fully. If you do it with 16-bit wchar_t, then
> the ones that are using the locale system for character handling will
> have to be refitted with extra layers to support more than the BMP,
> and those patches probably (hopefully) won't be accepted upstream.
>
> The only applications that would benefit from having 16-bit wchar_t
> are existing Windows applications that are not going to have much use
> for a POSIX layer anyway, and they can be fixed very easily with
> search-and-replace (no new code layers).
It's not as easy to search-and-replace as you say.

There are many incompatibilities between Windows and POSIX that won't
change. Otherwise we could just implement a virtual machine running on
Win32 that is compatible with all of POSIX, but that would be useless.

The purpose of a POSIX layer is to reduce the burden on developers who
intend to create cross-platform apps (including Windows), not on
developers who only intend to develop apps for Linux/POSIX.

So such a layer should preserve the usable parts of POSIX and drop
the parts that just create inconvenience.
A 32-bit wchar_t is obviously not suited to Win32.

My intention is not to develop a virtual-machine-like layer such as
Cygwin, but a native Win32 layer that provides most POSIX functions
with UTF-8 support. That would solve most portability issues, and
programs would work on Win32 just like Win32 apps rather than
Unix/Linux apps.
>
> Rich



-- 
Yours sincerely,
Yonggang Luo (罗勇刚)



Thread overview: 17+ messages
2015-05-09  3:16 罗勇刚(Yonggang Luo) 
2015-05-09  3:32 ` Rich Felker
2015-05-09  3:36   ` 罗勇刚(Yonggang Luo) 
2015-05-09  7:55   ` John Sully
2015-05-09 10:36     ` [cfe-dev] " Szabolcs Nagy
2015-05-09 11:19         ` 罗勇刚(Yonggang Luo) 
2015-05-09 20:05           ` Rich Felker
2015-05-10 12:19             ` 罗勇刚(Yonggang Luo)  [this message]
2015-05-10 12:31             ` 罗勇刚(Yonggang Luo) 
2015-05-10 13:42               ` Rich Felker
2015-05-10 14:15                   ` [musl] " 罗勇刚(Yonggang Luo) 
2015-05-10 15:30                     ` Rich Felker
2015-05-11  1:47                 ` [musl] " Mike Frysinger
2015-05-11  3:25                   ` 罗勇刚(Yonggang Luo) 
2015-05-11 10:27                   ` [musl] " Joerg Schilling
2015-05-12  3:21                     ` 罗勇刚(Yonggang Luo) 
2015-05-10 18:47 ` Karsten Blees

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
