From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm
X-Seq: 24833
Message-Id: <9F0DCF1B-F5FB-4150-A4FF-C441DE615404@kba.biglobe.ne.jp>
From: "Jun T."
To: zsh-workers@sunsite.dk
In-Reply-To: <20080413175442.0e95a241@pws-pc>
Subject: Re: PATCH: (large) initial support for combining characters in ZLE.
Date: Fri, 18 Apr 2008 03:33:36 +0900
References: <20080413175442.0e95a241@pws-pc>

Thank you for starting the combining character support!

At 17:54 +0100 08.4.13, Peter Stephenson wrote:
>the base character must be an alphanumeric (and
>I'm not sure about the numeric, I need to find a better definition), and

I think this is too restrictive, because in some Asian languages
(Japanese, Korean, Thai, etc.) the base character can be non-alphabetic.
For example, in Japanese, Hiragana/Katakana can be combined with
U+3099 (VOICED SOUND MARK) or U+309A (SEMI-VOICED SOUND MARK).

Example: U+3057 U+3099 = "じ"
The base character U+3057 = "し" is not an alphanumeric.

>the zero-width characters afterwards (I haven't imposed a limit on how
>many there are) must be punctuation.

I guess this is also too restrictive. I ran code like the following
on Fedora 7:

    /* needs <locale.h>, <stdio.h>, <wchar.h>, <wctype.h> */
    wchar_t w;
    setlocale(LC_ALL, "");
    for (w = 1; w < 0x2ffff; ++w) {
        /* zero-width, but not classified as punctuation */
        if (wcwidth(w) == 0 && iswpunct(w) == 0) {
            printf("%05x: %lc\n", (unsigned)w, (wint_t)w);
        }
    }

It listed 166 characters, all of which seem to be combining chars in
Thai or Korean (U+0e4e and U+1160 may not be combining, I'm not sure).

I think a strict definition of a combined character is virtually
impossible, because there are so many "nonsensical" combinations like
"Hiragana + umlaut". Even within the alphabet, a combination like
"x + U+0318" is almost as strange as "space + grave".

How about accepting any combination? If the terminal emulator displays
garbage, the user can turn off the option COMBINING_CHARS to see the
hex code.
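
To make the "accept any combination" suggestion concrete, here is a minimal sketch. It is not taken from the patch and is not how ZLE does it. The function names are hypothetical, and is_zero_width_demo() is only an illustrative stand-in for the wcwidth(w) == 0 test, limited to a few ranges so that the result does not depend on the locale.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for "wcwidth(w) == 0": it knows only the
 * Combining Diacritical Marks block (U+0300..U+036F) and the two
 * kana sound marks U+3099 and U+309A. Real code would ask the
 * C library instead. */
int is_zero_width_demo(wchar_t w)
{
    return (w >= 0x0300 && w <= 0x036F) || w == 0x3099 || w == 0x309A;
}

/* Length of the combined character starting at s[0]: one base
 * character of any class, followed by every zero-width character
 * after it. The first character is always taken as the base, and
 * no check is made that the pairing is sensible. */
size_t combined_len(const wchar_t *s, size_t n)
{
    size_t i;

    if (n == 0)
        return 0;
    for (i = 1; i < n && is_zero_width_demo(s[i]); i++)
        ;
    return i;
}

/* Number of combined characters in s[0..n-1]. */
size_t count_combined(const wchar_t *s, size_t n)
{
    size_t count = 0, pos = 0;

    while (pos < n) {
        pos += combined_len(s + pos, n - pos);
        count++;
    }
    return count;
}
```

With this rule, U+3057 U+3099 is one combined character, and so are "x + U+0318" and "space + U+0300"; whether the result looks right is left to the terminal emulator.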