From: John Sully
Newsgroups: gmane.comp.compilers.clang.devel,gmane.linux.lib.musl.general,gmane.comp.standards.posix.austin.general
Subject: Re: Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?
Date: Sat, 9 May 2015 00:55:12 -0700

wchar_t is also pretty common in the Win32 world. You shouldn't assume people use the Windows macros. Regardless of what you choose, someone is going to lose, so it might make more sense to think about what is more useful in the long term.

In my opinion, you almost never want 32-bit wide characters once you learn of their limitations. Most people assume that by using them they can return to the one-character-to-one-glyph idiom of ASCII. But Unicode is vastly more complex than that: while you avoid surrogates, you don't avoid things like combining characters and diacritics, so the idiom does not hold.
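
To make the combining-character point concrete: "é" can be stored as the single code point U+00E9 or as "e" followed by U+0301 COMBINING ACUTE ACCENT, and a char32_t string holds the second form as two elements. A minimal illustrative sketch in C11 follows; the function names are hypothetical, and is_combining_mark deliberately checks only the U+0300..U+036F block instead of the full Unicode character database, so count_glyphs_approx is only a rough stand-in for real grapheme-cluster segmentation.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

/* Number of code points in a NUL-terminated UTF-32 string.  With
   char32_t there are no surrogates, so each element is one code point. */
size_t count_code_points(const char32_t *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Simplified test: only the Combining Diacritical Marks block
   (U+0300..U+036F).  Real code needs the Unicode character database. */
int is_combining_mark(char32_t c)
{
    return c >= 0x0300 && c <= 0x036F;
}

/* Rough count of user-perceived characters: a combining mark attaches to
   the preceding base character, so it is not counted separately. */
size_t count_glyphs_approx(const char32_t *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (size_t i = 0; s[i] != 0; i++)
        if (!is_combining_mark(s[i]))
            n++;
    return n;
}
```

Both {0x00E9, 0} and {0x0065, 0x0301, 0} display as "é", yet count_code_points returns 1 and 2 for them, while count_glyphs_approx returns 1 for each. A fixed 32-bit unit therefore still does not give one element per visible character.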

Given that almost every character in frequent use around the world is in the Basic Multilingual Plane (BMP), 16-bit wide characters make the most sense for most applications.
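
For reference, this is what a 16-bit representation costs outside the BMP: any code point at U+10000 or above needs a surrogate pair, that is, two 16-bit units. A minimal sketch of encoding one code point as UTF-16, assuming C11 <uchar.h>; utf16_encode is a hypothetical name, not a Windows or libc API.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

/* Hypothetical helper: encode one code point as UTF-16 into out[0..1].
   Returns the number of 16-bit units written (1 or 2), or 0 if cp is a
   surrogate or lies above U+10FFFF and so is not a valid scalar value. */
size_t utf16_encode(char32_t cp, char16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {                          /* BMP: one unit */
        out[0] = (char16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits left to split */
    out[0] = (char16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (char16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

With it, U+4E2D (中, inside the BMP) encodes as the single unit 0x4E2D, while U+1F600 (outside the BMP) encodes as the pair 0xD83D 0xDE00.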


On Fri, May 8, 2015 at 8:16 PM, 罗勇刚 (Yonggang Luo) <luoyonggang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
Two solutions:
1. Change the width of wchar_t to 32 bits. I guess that would break a
lot of things that exist in the Win32 world.
2. Or we should preserve wchar_t as 16 bits on Win32, and add
char16_t and char32_t
variants for every API that has both a narrow and a wide version?


I support the second one, even if the second option may not be
applicable. The first option would cause a lot of problems. First,
all Windows APIs use wchar_t and depend on wchar_t being
2 bytes wide. Second, there are open source libraries that
depend on the de facto rule that wchar_t is 16 bits, such as Qt and
Git (maybe).
Almost all existing open source libraries that have been ported to Win32
depend on wchar_t being 16 bits. Cygwin also discussed
making wchar_t 32 bits on Win32:

https://www.cygwin.com/ml/cygwin/2011-02/msg00037.html
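
The second option (char16_t and char32_t variants alongside a 16-bit wchar_t) implies conversion code behind every char32_t variant, because the underlying Win32 wide functions still take and return UTF-16. As a hedged illustration only, here is a sketch of the decoding half such a wrapper would need, assuming C11 <uchar.h>; utf16_to_utf32 is a hypothetical helper, and replacing unpaired surrogates with U+FFFD is one policy choice among several.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

/* Hypothetical helper: decode n UTF-16 code units from in[] into UTF-32
   code points in out[], which must have room for n elements (the worst
   case, when no surrogate pairs occur).  Unpaired surrogates are replaced
   with U+FFFD REPLACEMENT CHARACTER.  Returns the number of code points. */
size_t utf16_to_utf32(const char16_t *in, size_t n, char32_t *out)
{
    size_t i = 0, o = 0;
    assert((in != NULL && out != NULL) || n == 0);
    while (i < n) {
        char32_t c = in[i++];
        if (c >= 0xD800 && c <= 0xDBFF && i < n &&
            in[i] >= 0xDC00 && in[i] <= 0xDFFF) {
            /* high surrogate followed by low surrogate: one code point */
            c = 0x10000 + ((c - 0xD800) << 10) + (char32_t)(in[i++] - 0xDC00);
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            c = 0xFFFD;  /* lone high or low surrogate */
        }
        out[o++] = c;
    }
    return o;
}
```

For the input {0x0041, 0xD83D, 0xDE00, 0xDC00} it produces three code points: U+0041, U+1F600, and U+FFFD for the stray low surrogate.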


> think no one would use
>>>>> wchar_t for cross-platform text processing, because on some systems wchar_t is
>>>>> just 8 bits wide!
>>>>
>>>> Anybody who cares about standard-conformant implementations
>>>> would use wchar_t.
>>>>
>>>> Non-standard broken platforms may get an unmaintained #ifdef
>>>> as usual.
>>>
>>> I think we (and midipix) have a different perspective from Yonggang
>>> Luo on portable development. Our view is that you write to a POSIX (or
>>> nearly-POSIX) target with fully working Unicode support and fix the
>>> small number of targets (i.e. just Windows) that don't already provide
>> Small is relative; if you count by distribution, well, Unix wins.
>>> these things. Yonggang Luo's perspective seems to be more of a
>>> traditional Windows approach with #ifdef and lots of OS-specific code,
>>> but just making the Windows branch of the #ifdefs less hideous than it
>>> was before. :)
>> If wchar_t becomes 32 bits on Win32, there really will be a lot of
>> #ifdefs. I am not sure
>> whether you have experience with Win32 API development; I hope we can discuss
>> the problems in a
>> more objective way.
>>
>
> One primary objective of code portability and posix-compatibility layer
> for win32 is to _remove_ the need for OS-specific code-paths. A wchar_t
> that is anything short (no pun intended) of a 32-bit integer will render
> it impossible to build out of the box many pieces of commonly-used
> software, including, but not limited to musl libc, the curses library,
> and anything that expects wchar_t to cover the entire unicode range.
>
> As for your suggested framework: there are currently at least three
> compilers that can produce optimized code for the target platform (gcc,
> clang, and cparser), and which work very well with most open-source
> software out there. As an aside, if you are interested in an 8-byte long
> on 64-bit windows then an open-source compiler is probably your only
> option. To compile musl with msvc, on the other hand, you'd have to make
> so many changes to the source code that you might as well write your own
> libc from scratch. To see why, please attempt to compile some ten or
> fifteen core libc headers (stdio.h, unistd.h, etc.) with msvc. If that
> goes well (spoiler: it won't), then the next step would be to compile a
> subset of the source files (src/pthread or src/stdio, for instance) and
> remove any remaining obstacles.
>
> m.
>
>
>>>
>>> Rich
>>
>>
>>
>
>
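
The requirement described in the quoted message, that wchar_t cover the entire Unicode range, can be stated as a compile-time check. A minimal sketch, assuming a C11 compiler (static_assert from <assert.h>); on a target with a 16-bit wchar_t, such as Win32, it deliberately fails to build. The two helper functions are hypothetical and only report type widths, for example to compare LP64 targets (64-bit long) with 64-bit Windows, which uses LLP64 (32-bit long).

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <wchar.h>    /* wchar_t, WCHAR_MAX */

/* Refuse to build where a single wchar_t cannot hold every Unicode code
   point (U+0000..U+10FFFF), e.g. with Win32's 16-bit wchar_t. */
static_assert(WCHAR_MAX >= 0x10FFFF,
              "wchar_t must be able to represent all Unicode code points");

/* Width of wchar_t in bits (at least 21 if the check above passed). */
unsigned wchar_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}

/* Width of long in bits: 64 on LP64 Unix-like targets, 32 on LLP64
   64-bit Windows. */
unsigned long_bits(void)
{
    return (unsigned)(sizeof(long) * CHAR_BIT);
}
```

Putting the check in a header makes an unsupported target fail at build time with a clear message, instead of silently truncating code points above U+FFFF at run time.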

_______________________________________________
cfe-dev mailing list
cfe-dev-Tmj1lob9twqVc3sceRu5cw@public.gmane.org
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
