From: JeanHeyd Meneide
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [ Guidance ] Potential New Routines; Requesting Help
Date: Mon, 30 Dec 2019 13:53:45 -0500
References: <87zhfg185y.fsf@mid.deneb.enyo.de> <20191226021354.GE30412@brightrain.aerifal.cx> <20191230172822.GH30412@brightrain.aerifal.cx>
In-Reply-To: <20191230172822.GH30412@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: Rich Felker
Cc: Florian Weimer, musl@lists.openwall.com

On Mon, Dec 30, 2019 at 12:28 PM Rich Felker wrote:
> I think you misunderstood my remarks here. I was not talking about
> invention of new charsets (which we seem to agree should not happen),
> but making it possible to use existing legacy charsets which were
> previously not usable as a locale's encoding due to limitations of the
> C APIs. I see making that possible as counter-productive. It does not
> serve to let users keep doing something they were already doing
> (compatibility), only to do something newly backwards.

My goal is to allow developers to go from an encoding they do not
control fully (the multibyte encoding) to an encoding they know and can
reason about in their program (c8, for example). This is why I am
providing the mb -> cNN and wc -> cNN functions in both
single-character and string forms.

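For reference, the single-character mb -> c16 direction already exists
in C11 as mbrtoc16 in <uchar.h>, and a string form can be built on top
of it. The following is only a minimal sketch, assuming a UTF-8 locale
is installed under the name C.UTF-8 or en_US.UTF-8 and that
__STDC_UTF_16__ is defined so char16_t holds UTF-16. The helper names
use_utf8_locale and mb_to_c16 are hypothetical and are not part of any
proposal:

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: try a couple of common UTF-8 locale names.
   Returns 1 on success, 0 if neither name is available. */
int use_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "C.UTF-8") != NULL
        || setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}

/* Hypothetical helper: convert a NUL-terminated multibyte string in the
   current locale's encoding into char16_t code units.
   Returns the number of units stored (the terminator is not stored and
   the output is truncated once dst_cap is reached), or (size_t)-1 on
   invalid input. */
size_t mb_to_c16(const char *src, char16_t *dst, size_t dst_cap)
{
    assert(src != NULL && dst != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);   /* initial conversion state */

    size_t left = strlen(src) + 1;     /* include the terminating NUL */
    size_t n = 0;

    while (n < dst_cap) {
        char16_t unit;
        size_t r = mbrtoc16(&unit, src, left, &state);

        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;         /* invalid or incomplete sequence */
        if (r == 0)
            break;                     /* reached the terminating NUL */

        dst[n++] = unit;
        if (r != (size_t)-3) {
            /* (size_t)-3 means the unit came from the saved state
               (e.g. a trailing surrogate) and no bytes were consumed. */
            src += r;
            left -= r;
        }
    }
    return n;
}
```

A caller would run use_utf8_locale() once and then pass a buffer such as
char16_t out[8] to mb_to_c16. A character outside the 16-bit range
comes back as two code units, the second delivered through the
(size_t)-3 return.
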
The hope is to make it easy to go from a statically known encoding
(modulo difficulties from __STDC_UTF_16__ / __STDC_UTF_32__ not being
defined) to the platform encoding, and vice-versa, using the same style
of functions as mb(s)(r)towc(s) and wc(s)(r)tomb(s).

> > ... I will, however, note that the paper
> > specifically wants to add the Restartable versions of "single unit" wc
> > and mb to/from functions.
>
> I don't follow. mbrtowc and wcrtomb already exist and have since at
> least C99.

Apologies, I meant doing wc <-> cNN and mb <-> cNN!

> > ...
> >
> > This means that while wcto* and *towc functions are broken, the
>
> I don't see them as broken. They support every encoding that has ever
> worked in the past as the encoding for a locale (tautologically). The
> only way they're "broken" is if you want to add new locale encodings
> that weren't previously supportable.

Apologies; this was in reference to wide characters being given a
non-UTF-32 interpretation on certain platforms, like Windows and
certain flavors of IBM. They chose a 16-bit wchar_t, which can't
accommodate Unicode without needing multiple wchar_t per character.
Unfortunately, this means that they were really out of luck before
DR488 was accepted: they had no means to return multiple wchar_t for
characters outside the 16-bit range. With DR488, restartable functions
have the potential to convert out properly (albeit the DR was only
applied to the char16_t functions, so while I have a hope and a wish we
can fix it for their platforms, it might not work out for the wcto* and
*towc functions anyway). The char16_t functions, though, should offer
those platforms a better way out (though not a perfect one: they'll
need to rely on platform knowledge and perform some casts).

> ...
>
> Conversion of arbitrary encodings other than the one in use by the
> locale requires a different API that takes encodings by name or some
> other identifier.
> The standard (POSIX) API for this is iconv, which
> has plenty of limitations of its own, some the same as what you've
> identified.

Absolutely agreed! I just want the encodings that the platform controls
(the wide character and multibyte character encodings) to have correct,
simple paths to static encodings that can be used for more rigorous
text processing.

Sincerely,
JeanHeyd Meneide