From: Roy
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: Re: Big5 "mostly" complete
Date: Wed, 28 Aug 2013 08:57:24 +0800
References: <20130817205757.GA32462@brightrain.aerifal.cx> <20130818073229.GE20515@brightrain.aerifal.cx> <20130827015349.GG20515@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Tue, 27 Aug 2013 09:53:49 +0800, Rich Felker wrote:

> On Sun, Aug 18, 2013 at 07:19:57PM +0800, Roy wrote:
>> On Sun, 18 Aug 2013 15:32:29 +0800, Rich Felker wrote:
>>
>> >On Sun, Aug 18, 2013 at 12:20:47PM +0800, Roy wrote:
>> >>Both Big5-UAO and Big5-HKSCS are needed for those Taiwan people and
>> >>Hong Kong people.
>> >>For Big5-UAO, there is some commonly used dingbats(for example "♡"
>> >>mark) and numeric representations(for example "①") are in Big5-UAO
>> >>but not in CP950.
>> >>and Big5-UAO is still being used not only in ptt.cc telnet BBS, but
>> >>also in text data files(file lists/cue sheets) because of
>> >>not-supporting UTF-8 in applications(for example, Perl File-system
>> >>I/O in windows, CD-Rippers).
>> >>for Big5-HKSCS, it is used for storing commonly used Cantonese
>> >>ideographs (for example, "𨋢" means "lift" in Cantonese) in Hong
>> >>Kong.
>> >
>> >HKSCS is supported as of yesterday's commit. I'm aware that it's
>> >needed for representing Cantonese language in Big5, and that it's
>> >widely used on the web.
>> >
>> >What I'm not clear on is the necessity of UAO. Keep in mind that iconv
>> >is an API for information interchange: things like interpreting web
>> >content, email, old text files, etc. The fact that UAO exists is not
>> >alone reason to support it; it has to actually have usefulness in
>> >situations where the iconv interface should be used. If you want to
>> >see it included, this is what you need to convince us of:
>> >
>> >- That it's in widespread use in large volumes of existing data (on
>> > the web, text files, etc.) or data that is being newly generated
>> > (e.g. as a default encoding of popular mail software).
>>
>> People are told *NOT* to publish file with Big5-UAO to the web(or
>> say, people, even the creator of UAO, appeal to people that not to
>> publish file with Big5-UAO to the web), but still there are some
>> that's in archive format.(Like I said before, for example cue-sheet
>> file of CD-ROM image, etc.)
>> But for local data processing, UAO does facilitate file managing to
>> windows users.
>
> Based on this, I think:
>
> (1) It's reasonable to omit UAO for now, and
> (2) Support for iconv to load user-defined character mappings would be
> a worthwhile feature to work on post-1.0.
>

That is good, but I have a few feature requests about this:

- A user-defined mapping can be overlaid on another encoding, the way
  HKSCS overlays Big5.
- A user-defined mapping can be embedded in a statically linked binary.

Also, conversion from Unicode to the CJK legacy encodings is a must (I
hope it is available before musl 1.0).

> My reasoning is that the goal of iconv in musl, at least for the
> built-in character set conversions, is to facilitate information
> interchange, particularly reading of data that may be received in
> email, as documents published on the web, via IRC or IM protocols,
> etc. An encoding whose creators specifically request that it NOT be
> used for publishing/interchange is well outside this scope.

Yes, it is discouraged for publishing because it is not a standard, and
people are not encouraged to install UAO blindly. But people do use it
for private interchange (such as sending files via FTP or instant
messaging).

>
> I agree with your examples (CD-ROM cue sheets, archived text files,
> that telnet BBS, etc.) that there is a need by some users to
> process/import data encoded in UAO, but most of these usages do not
> seem to require general applications, treating charsets in an
> abstract, MIME-style manner, to be able to handle it. For many of the
> examples, a command-line conversion utility (BTW, there are ones much
> more powerful than iconv out there) would be the logical choice. For
> the BBS, my understanding is that most of its users are using special
> telnet/terminal apps with the conversion built-in.
>
>> >- That it's necessary to represent linguistic content in languages
>> > used in Taiwan, not just as a substitute for Unicode to represent
>> > foreign languages.
>>
>> It does, some Chinese ideographs are used as part of name, but not
>> in CP950 mapping like "喆" and "堃".
>
> How do these users send email or enter their names in web-based apps?
> My guess would be that the email clients switch to UTF-8 when
> encountering a character they can't encode in Big5, and that,
> nowadays, most web apps are built on CMS that are Unicode-based. Is
> this correct?
>

Yes, most popular web apps use UTF-8 nowadays.
In the past, people entered (方方土) for 堃 and (吉吉) for 喆, and they
might also install charset extensions such as ChinaSea or UAO to get 堃
and 喆.

>> >- That failure to support it would put musl's iconv in a worse
>> > position of compatibility than other iconv implementations or
>> > software-specific (e.g. in-browser) character set conversions.
>>
>> Since people made Big5-UAO patch for libiconv and glibc(gconv)
>> unofficially to meet their uses, if musl libc have an optional
>> Big5-UAO mapping will be an advantage to Taiwan people.
>
> *nod*
>
> For what it's worth, how do those patches handle it? Do they add a new
> "Big5-UAO" charset name to iconv, or do they modify the existing Big5
> to treat it as UAO?

The original patches by Tiberius Teng modify Big5 itself to use the
Big5-UAO mappings. I'm trying to reach Tiberius to get the patch, if it
is still available.
There is also another libiconv patch that adds a separate big5-uao
encoding instead:
http://ku.myftp.org/goods/libiconv-1.11-uao.patch.bz2

>
> My feeling for now is to increase the priority of adding custom local
> charmap files to iconv after musl 1.0 is released. My main reason is
> that "intended for information interchange" vs "intended only for
> local use" seems to be the best guideline for whether an encoding is
> appropriate to include built-in.
>
> Rich
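
To make the two feature requests above more concrete (a user-defined
mapping overlaid on another encoding, and a mapping embedded in a
statically linked binary), here is a minimal sketch in C. It is not
musl's implementation and does not reflect any existing iconv
interface: struct map_entry, struct charmap, layer_lookup and
charmap_lookup are hypothetical names, and every table value is an
illustrative placeholder, not verified Big5 or UAO data. The idea is
only that a lookup tries the overlay first and falls back to the base
map, and that const tables compiled into the program need no file
loading at run time.

```c
#include <stddef.h>
#include <stdint.h>

/* One charmap entry: legacy double-byte code -> Unicode code point.
 * (Hypothetical layout, chosen only for this sketch.) */
struct map_entry {
    uint16_t legacy;   /* lead byte << 8 | trail byte */
    uint32_t unicode;  /* Unicode scalar value; 0 is never stored */
};

/* A charmap whose entries are sorted by legacy code. If base is
 * non-NULL, this map is an overlay on top of that base map. */
struct charmap {
    const struct map_entry *entries;
    size_t count;
    const struct charmap *base;
};

/* Binary search within a single layer; returns 0 if unmapped. */
static uint32_t layer_lookup(const struct charmap *m, uint16_t code)
{
    size_t lo = 0, hi = m->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (m->entries[mid].legacy == code)
            return m->entries[mid].unicode;
        if (m->entries[mid].legacy < code)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}

/* Try the overlay first, then each base layer in turn.
 * Returns 0 if no layer maps the code. */
uint32_t charmap_lookup(const struct charmap *m, uint16_t code)
{
    for (; m; m = m->base) {
        uint32_t u = layer_lookup(m, code);
        if (u)
            return u;
    }
    return 0;
}

/* Placeholder tables (NOT real Big5/UAO data). Being static const,
 * they are linked into the binary, which also works when the program
 * is statically linked. */
static const struct map_entry base_entries[] = {
    { 0xA440, 0x4E00 },
    { 0xA441, 0x4E59 },
};
static const struct map_entry overlay_entries[] = {
    { 0x8140, 0xE000 },  /* code that exists only in the overlay */
    { 0xA441, 0xE001 },  /* code that overrides the base mapping */
};

static const struct charmap base_map = {
    base_entries, sizeof base_entries / sizeof base_entries[0], NULL
};
static const struct charmap overlay_map = {
    overlay_entries, sizeof overlay_entries / sizeof overlay_entries[0],
    &base_map
};
```

With these placeholder tables, charmap_lookup(&overlay_map, 0xA440)
falls through to the base entry, charmap_lookup(&overlay_map, 0xA441)
returns the overriding overlay value, and charmap_lookup(&base_map,
0x8140) returns 0 because the base map alone does not know that code.
The reverse direction (Unicode to legacy) would need a second table
sorted by code point, which this sketch leaves out.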