From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <1aad458c439864fdd227ffc52d1cf9fe@granite.cias.osakafu-u.ac.jp>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] awk
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-2022-JP"
Content-Transfer-Encoding: 7bit
From: okamoto@granite.cias.osakafu-u.ac.jp
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="upas-tmthlxvzkzzomvhnwqtpczltyj"
Date: Thu,  7 Nov 2002 18:56:44 +0900
Topicbox-Message-UUID: 17a2154e-eacb-11e9-9e20-41e7f4b1d025

This is a multi-part message in MIME format.
--upas-tmthlxvzkzzomvhnwqtpczltyj
Content-Disposition: inline

I'm not insulting you, but...

As has been seen here recently, we seem to have few developers now.
Furthermore, this is an example of an application bug, and it is
closely related to the consistent use of UTF-8 within an application.
Taking these facts into consideration, I think you had better report
the fix for it yourself, because I believe you can do it. I suppose this is
not a serious bug, probably just something in the match function or the like.
I have no concrete idea about it, though.

just my two cents,

Kenji

--upas-tmthlxvzkzzomvhnwqtpczltyj
Content-Type: message/rfc822
Content-Disposition: inline

Received: from granite.cias.osakafu-u.ac.jp ([192.168.1.3]) by diabase; Thu Nov  7 15:51:17 JST 2002
Received: from elmo.cias.osakafu-u.ac.jp (elmo.cias.osakafu-u.ac.jp [157.16.103.2])
	by granite.cias.osakafu-u.ac.jp (8.9.3/8.9.3) with ESMTP id PAA00935
	for <okamoto@granite.cias.osakafu-u.ac.jp>; Thu, 7 Nov 2002 15:47:15 +0900
Received: from mail.cse.psu.edu (psuvax1.cse.psu.edu [130.203.4.6])
	by elmo.cias.osakafu-u.ac.jp (8.9.3/3.7W-02110515) with ESMTP id PAA28312
	for <okamoto@granite.cias.osakafu-u.ac.jp>; Thu, 7 Nov 2002 15:47:18 +0900 (JST)
Received: from psuvax1.cse.psu.edu (psuvax1.cse.psu.edu [130.203.30.6])
	by mail.cse.psu.edu (CSE Mail Server) with ESMTP
	id D2303199BE; Thu,  7 Nov 2002 01:47:08 -0500 (EST)
Delivered-To: 9fans@cse.psu.edu
Received: from pc.aichi-u.ac.jp (a130035.usr.starcat.ne.jp [61.211.130.35])
	by mail.cse.psu.edu (CSE Mail Server) with SMTP id 4C02B19995
	for <9fans@cse.psu.edu>; Thu,  7 Nov 2002 01:46:32 -0500 (EST)
Message-ID: <d7e2337e18e882fc2b734291a9cc9365@ar.aichi-u.ac.jp>
From: "Kenji Arisawa" <arisawa@ar.aichi-u.ac.jp>
To: 9fans@cse.psu.edu
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Subject: [9fans] awk
Sender: 9fans-admin@cse.psu.edu
Errors-To: 9fans-admin@cse.psu.edu
X-BeenThere: 9fans@cse.psu.edu
X-Mailman-Version: 2.0.11
Precedence: bulk
Reply-To: 9fans@cse.psu.edu
X-Reply-To: "Kenji Arisawa" <arisawa@aichi-u.ac.jp>
List-Id: Fans of the OS Plan 9 from Bell Labs <9fans.cse.psu.edu>
List-Archive: <https://lists.cse.psu.edu/archives/9fans/>
Date: Thu, 7 Nov 2002 15:46:29 +0900
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by granite.cias.osakafu-u.ac.jp id PAA00935

I tested some awk string functions to see whether
they handle UTF-8 text correctly.
Below is my test code:
#!/bin/rc
#
#	Can awk functions handle UTF strings?
#
echo '=E3=83=99=E3=83=AB:=E7=A0=94=E7=A9=B6=E6=89=80' | awk '{
print $0	# =E3=83=99=E3=83=AB:=E7=A0=94=E7=A9=B6=E6=89=80
print length($0)	# 6
print index($0,":")	# 3
print match($0,":.*"),RSTART, RLENGTH	# 7	7 4
print substr($0,3)	# :=E7=A0=94=E7=A9=B6=E6=89=80
a=3D$0; sub(":.+", "alice", a); print a	# =E3=83=99=E3=83=ABalice
}'

The output is shown after `#' on each line.
The function `match' returns a byte position (7), which is inconsistent
with the other functions, which count characters. I believe this is a bug.
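
Until this is fixed, one possible workaround is sketched below. It is only
a sketch under stated assumptions: `charmatch' and `cmatch' are hypothetical
names, not part of awk; it is written for a POSIX sh rather than rc so that
it can be tried outside Plan 9; it assumes that `sub' and `index' behave as
in the output above, and that the input never contains awk's SUBSEP
character. The idea is to replace the first match with a marker and let
`index' find the marker, so the position comes out in the same units that
`index' and `substr' use.

```shell
#!/bin/sh
# charmatch STRING REGEX   (hypothetical helper, not a standard tool)
# Print the start of the first match of REGEX in STRING, measured in
# the units that index() and substr() use; print 0 if nothing matches.
charmatch() {
	awk -v s="$1" -v re="$2" '
	function cmatch(str, pat,    tmp) {
		tmp = str
		# Replace the first match with a marker, then locate the marker.
		# Assumption: str never contains SUBSEP ("\034").
		if (sub(pat, SUBSEP, tmp) == 0)
			return 0	# no match, same convention as match()
		return index(tmp, SUBSEP)
	}
	BEGIN { print cmatch(s, re) }'
}

# The string and pattern from the test above.
charmatch 'ベル:研究所' ':.*'
```

On an awk where `index' counts characters, the last line should give the
same position as `index' for the sample string. On a purely byte-oriented
awk it would give a byte position, but there `length' and `substr' count
bytes too, so the result stays consistent. Because the pattern is passed
with -v, backslashes in it go through awk's escape processing.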

Kenji Arisawa
--upas-tmthlxvzkzzomvhnwqtpczltyj--