From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 20 Mar 2006 22:00:01 +0200
From: Aharon Robbins
Subject: Re: [9fans] ports from GPL
In-reply-to: <20060320021808.91DE411FC1@dexter-peak.quanstro.net>
To: 9fans@cse.psu.edu
Cc:
Message-id: <200603202000.k2KK01Xu006954@skeeve.com>
Content-transfer-encoding: 7BIT
References: <454862c111d0f6eeb1b584a74ad6506a@quanstro.net>
Topicbox-Message-UUID: 1a9f2f60-ead1-11e9-9d60-3106f5b1d025

In article <20060320021808.91DE411FC1@dexter-peak.quanstro.net> you write:
>the gnu awk folks are doing a pretty good job, given their constraints.

Thanks! I try, I really do.

>i have not read the sed code (for a while, anyway), but i could imagine
>that it may have the same character set problems as newer versions of gnu grep.
>gnu grep calls mbtowc for each input character, even when not required.
>
>have you tried your test with LC_LANG=C?

Make that LC_ALL=C and you'll be on track. (FWIW, the CVS grep is much
better than the released version; they've been working on this problem.)

And yes, the locale stuff is a *N*I*G*H*T*M*A*R*E*. Much of the heavy
lifting was done by others for the dfa and regex code, but I've done my
share to get it working too, and I must admit it's often a PITA.

Almost always the differences in behavior from LC_ALL=C to
LC_ALL=xxx.UTF-8 are due to the locale definitions, not to gawk's
handling of UTF characters. That all happens in the (GNU) library,
below the level where I can do anything about it.

OTOH, when I get fan mail from people in China and other such places
who are able to *get their work done* using gawk, it makes things much
more worthwhile.

And, to completely change the subject, if anyone on this list wants to
hire a telecommuter who would LOVE to finally make the jump to Plan 9
without looking back, please drop me a line...

Arnold
--
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd.
arnold AT skeeve DOT com
P.O. Box 354         Home Phone: +972 8 979-0381   Fax: +1 206 350 8765
Nof Ayalon           Cell Phone: +972 50 729-7545
D.N. Shimshon 99785  ISRAEL