From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID:
From: "Kenji Arisawa"
To: 9fans@cse.psu.edu
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Subject: [9fans] awk
Date: Thu, 7 Nov 2002 15:46:29 +0900
Topicbox-Message-UUID: 1782527c-eacb-11e9-9e20-41e7f4b1d025

I tested some awk string functions to see whether they handle UTF-8
text correctly. Below is my test code:

#!/bin/rc
#
# Can awk functions handle UTF strings?
#
echo 'ベル:研究所' | awk '{
    print $0                                # ベル:研究所
    print length($0)                        # 6
    print index($0,":")                     # 3
    print match($0,":.*"),RSTART, RLENGTH   # 7 7 4
    print substr($0,3)                      # :研究所
    a=$0; sub(":.+", "alice", a); print a   # ベルalice
}'

The output of each statement is shown in the comment after `#' on the
same line.

The function `match' returns a byte position (7), which is inconsistent
with the other functions, which all count in characters. I believe this
is a bug.

Kenji Arisawa
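
For reference, the two numbers reported for the `:' (3 from `index', 7 from `match') are what one would expect if the first is counted in characters and the second in UTF-8 bytes. The following is a minimal sketch in Python, not awk, that only illustrates the arithmetic for the test string; the helper name byte_to_char_pos is hypothetical and is not part of any awk implementation.

```python
# Sketch: character offsets vs. UTF-8 byte offsets for the test string.
text = "ベル:研究所"
raw = text.encode("utf-8")        # each kana/kanji here takes 3 bytes, ":" takes 1

char_pos = text.index(":") + 1    # 1-based position counted in characters
byte_pos = raw.index(b":") + 1    # 1-based position counted in bytes


def byte_to_char_pos(s, bpos):
    """Convert a 1-based UTF-8 byte position in s to a 1-based character position.

    Hypothetical helper; assumes bpos falls on a character boundary.
    """
    prefix = s.encode("utf-8")[:bpos - 1]
    return len(prefix.decode("utf-8")) + 1


print(len(text), len(raw))                 # lengths in characters and in bytes
print(char_pos, byte_pos)                  # position of ":" in each unit
print(byte_to_char_pos(text, byte_pos))    # byte position mapped back to characters
```

The two characters before the `:' occupy 3 bytes each, so the byte position is 2*3 + 1 = 7 while the character position is 3, which matches the values in the test output above.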