From: erik quanstrom
Date: Sun, 28 Feb 2010 17:34:43 -0500
To: 9fans@9fans.net
Message-ID: <65310acf780030ea3061084b956a4e97@ladd.quanstro.net>
Subject: [9fans] rc rune mishandling (and fix)

while trying to get rc working with 4-byte utf-8 sequences, i noticed
that rc has a few weak points in its handling of runes that have nothing
to do with rune size. for example, this script

	; cat badbq
	#!/bin/rc
	nl='
	'
	ifs=α$nl
	echo `{echo abαβ}

produces this output

	; cat /n/sources/contrib/quanstro/src/futharc/badbq | /n/sources/plan9/386/bin/rc
	ab �

instead of "ab β". this is because Xbackq reads and checks its input one
byte at a time, so the first byte of β's two-byte sequence matches the
first byte of α in ifs and is treated as a separator. we're left with a
garbage byte, the second byte of β's utf sequence, which rio displays as
Runeerror.

a second problem is in the lexer:

	; 8.badrune
	#!/bin/rc
	echo hel�;echo 2nd line
	[...]

notice that rc doesn't see the ';' in the echo:

	; /n/sources/contrib/quanstro/src/futharc/8.badrune | /n/sources/plan9/386/bin/rc
	hel�;echo 2nd line
	[...]

this is because rc assumes good input. since 0xc0 starts a two-byte
sequence, rc takes the next byte as its second byte without checking that
it is a valid continuation byte, so the ';' is swallowed into the word.
in fact, xd shows that rc emits bad utf:

	; /n/sources/contrib/quanstro/src/futharc/8.badrune | /n/sources/plan9/386/bin/rc | xd -c | sed 1q
	0000000  h  e  l c0  ;  e  c  h  o     2  n  d     l  i

both problems were addressed by adding a rutf() function to io.c. rutf
keeps enough bytes in the io buffer to deal with broken utf at any point
in the input. if a broken rune is detected, Runeerror is returned and 1
byte of input is consumed. Xbackq was modified to use rutf, and the
strstr is now safe, since only complete runes are tested.
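to make the rutf behavior described above concrete, here is a minimal sketch in portable C (not Plan 9 C), assuming nothing about how io.c is actually written. decode_rune, inifs, split_fields, Rune and RUNEERROR are all hypothetical names, not rc's. decode_rune checks every continuation byte and, on any malformed sequence, returns RUNEERROR (U+FFFD) and consumes exactly 1 byte, so "\xc0;" decodes as RUNEERROR followed by ';'. split_fields then compares whole runes against ifs, which is what keeps β from matching α.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* hypothetical names; this is a sketch, not rc's io.c */
typedef unsigned int Rune;
enum { RUNEERROR = 0xFFFD };

/*
 * decode one rune from s[0..n-1] into *r and return the bytes consumed.
 * a bad lead byte, a bad continuation byte, a truncated or overlong
 * sequence, a surrogate or a value above 0x10FFFF all yield RUNEERROR
 * with exactly 1 byte consumed, so decoding resyncs on the next byte.
 */
size_t
decode_rune(const unsigned char *s, size_t n, Rune *r)
{
	static const Rune min[] = {0, 0, 0x80, 0x800, 0x10000};
	size_t len, i;
	Rune c;

	assert(n > 0);
	if(s[0] < 0x80){
		*r = s[0];
		return 1;
	}
	if((s[0] & 0xE0) == 0xC0){
		len = 2;
		c = s[0] & 0x1F;
	}else if((s[0] & 0xF0) == 0xE0){
		len = 3;
		c = s[0] & 0x0F;
	}else if((s[0] & 0xF8) == 0xF0){
		len = 4;
		c = s[0] & 0x07;
	}else
		goto bad;
	if(n < len)
		goto bad;	/* sequence cut off by the end of the buffer */
	for(i = 1; i < len; i++){
		if((s[i] & 0xC0) != 0x80)
			goto bad;	/* e.g. the ';' after 0xc0 */
		c = c<<6 | (s[i] & 0x3F);
	}
	if(c < min[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		goto bad;
	*r = c;
	return len;
bad:
	*r = RUNEERROR;
	return 1;
}

/* report whether rune c is one of the runes in the utf-8 string ifs */
static int
inifs(Rune c, const char *ifs)
{
	const unsigned char *p = (const unsigned char *)ifs;
	size_t n = strlen(ifs), k;
	Rune r;

	for(; n > 0; p += k, n -= k){
		k = decode_rune(p, n, &r);
		if(r == c)
			return 1;
	}
	return 0;
}

/*
 * split in on the runes of ifs and join the fields with single spaces in
 * out, roughly what echo prints for `{...}. runs of separators make no
 * empty fields; each rune's bytes are copied unchanged.
 * out needs strlen(in)+1 bytes.
 */
void
split_fields(const char *in, const char *ifs, char *out)
{
	const unsigned char *p = (const unsigned char *)in;
	size_t n = strlen(in), k;
	int infield = 0, nfields = 0;
	Rune r;

	for(; n > 0; p += k, n -= k){
		k = decode_rune(p, n, &r);
		if(inifs(r, ifs)){
			infield = 0;
			continue;
		}
		if(!infield && nfields++ > 0)
			*out++ = ' ';
		infield = 1;
		memcpy(out, p, k);
		out += k;
	}
	*out = '\0';
}
```

with ifs set to α and newline, split_fields on "abαβ\n" yields "ab β" rather than "ab" plus a stray byte. unlike rutf, which keeps enough input buffered, this sketch assumes the caller passes every byte that is available, so a sequence cut off by the end of the buffer is treated as broken.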
lex was also modified to use rutf, and the byte sequence 0xc0 ';' is now
interpreted as Runeerror ';'.

the source is in /n/sources/contrib/quanstro/src/futharc with a
pre-compiled executable, 8.out.

- erik

p.s. old differences from standard rc:
1. support for a history file.
2. support for break within loops.

p.p.s. other new differences:
1. x=( 1 2 ) is now acceptable syntax.
2. Xbackq uses exponential allocation for good behavior on really long input.
3. print offending line number on syntax errors.
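the exponential allocation mentioned for Xbackq can be illustrated with a small growable buffer. this is a sketch under the assumption that Xbackq accumulates command output in one contiguous buffer; Growbuf and gbappend are hypothetical names, not taken from rc. doubling the capacity whenever it runs out means n bytes of input cost O(log n) reallocations and O(n) total copying, while growing by a fixed increment costs O(n²) copying on really long input.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical growable buffer; not rc's actual Xbackq code */
typedef struct {
	char	*buf;
	size_t	len;	/* bytes in use */
	size_t	cap;	/* bytes allocated */
} Growbuf;

/*
 * append n bytes from p, doubling the capacity until they fit.
 * returns 0 on success or -1 if realloc fails, leaving g unchanged.
 * overflow checks on the capacity are omitted for brevity.
 */
int
gbappend(Growbuf *g, const char *p, size_t n)
{
	size_t ncap;
	char *nbuf;

	assert(g->len <= g->cap);
	if(n == 0)
		return 0;
	if(g->cap - g->len < n){
		ncap = g->cap ? g->cap : 64;	/* first allocation: 64 bytes */
		while(ncap - g->len < n)
			ncap *= 2;
		nbuf = realloc(g->buf, ncap);
		if(nbuf == NULL)
			return -1;
		g->buf = nbuf;
		g->cap = ncap;
	}
	memcpy(g->buf + g->len, p, n);
	g->len += n;
	return 0;
}
```

appending 4000 bytes in 4-byte pieces to an empty Growbuf ends with a capacity of 4096, reached by doubling from 64 in six steps.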