9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] rc rune mishandling (and fix)
@ 2010-02-28 22:34 erik quanstrom
From: erik quanstrom @ 2010-02-28 22:34 UTC
  To: 9fans

in the process of trying to get rc working with 4-byte
utf-8 sequences, i noticed that rc has a few weak points in
its handling of runes that have nothing to do with rune
size.  for example, this script
	; cat badbq
	#!/bin/rc
	nl='
	'
	ifs=α$nl echo `{echo abαβ}

produces this output
	; cat /n/sources/contrib/quanstro/src/futharc/badbq |
		/n/sources/plan9/386/bin/rc
	ab �

this is because Xbackq reads and checks its input one byte
at a time.  α is ce b1 and β is ce b2 in utf-8, so the first byte
of β's two-byte sequence matches the first byte of α in the ifs.
we're left with a garbage byte (b2) that was the second byte in
β's utf sequence.  rio turns this into Runeerror.
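
to make the failure concrete, here is a minimal sketch in portable c. it is not rc's Xbackq; split() and seqlen() are hypothetical names, and seqlen() trusts the lead byte, which is good enough only because the demonstration input is well formed. the point is just the difference between testing one byte and testing one whole utf-8 sequence against the ifs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* length of the utf-8 sequence at s, judged from the lead byte alone
   (hypothetical helper; assumes well-formed input, never crosses a NUL) */
static size_t seqlen(const char *s)
{
	unsigned char c = (unsigned char)s[0];
	size_t n = 1, i;

	if ((c & 0xE0) == 0xC0) n = 2;
	else if ((c & 0xF0) == 0xE0) n = 3;
	else if ((c & 0xF8) == 0xF0) n = 4;
	for (i = 1; i < n; i++)
		if (s[i] == '\0')
			return i;	/* truncated sequence: stop at the NUL */
	return n;
}

/* split `in` on the separators in `ifs` and join the words with single
   spaces in `out`, the way echo would print them.  byrune=0 tests one
   byte at a time; byrune=1 tests one whole sequence at a time.
   returns the number of words found. */
static int split(const char *in, const char *ifs, int byrune, char *out, size_t outsz)
{
	int nword = 0, inword = 0;
	size_t o = 0;

	assert(out != NULL && outsz > 0);
	while (*in != '\0') {
		char unit[5] = {0};
		size_t n = byrune ? seqlen(in) : 1;

		memcpy(unit, in, n);
		in += n;
		if (strstr(ifs, unit) != NULL) {	/* separator ends the word */
			inword = 0;
			continue;
		}
		if (!inword) {
			if (nword++ > 0 && o + 1 < outsz)
				out[o++] = ' ';
			inword = 1;
		}
		if (o + n < outsz) {
			memcpy(out + o, unit, n);
			o += n;
		}
	}
	out[o] = '\0';
	return nword;
}
```

with ifs "\xCE\xB1\n" and input "ab\xCE\xB1\xCE\xB2\n", the byte-wise call leaves the lone byte b2 as the second word, while the rune-wise call leaves a complete β.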

a second problem is in the lexing:
	; 8.badrune
	#!/bin/rc
	echo hel�;echo 2nd line
	[...]

notice that rc doesn't see the ';' in the echo:
	; /n/sources/contrib/quanstro/src/futharc/8.badrune |
		/n/sources/plan9/386/bin/rc
	hel�;echo 2nd line
	[...]

this is because rc assumes good input.  since 0xc0 starts a
two-byte sequence, rc takes the next byte, here the ';', as the
rest of the rune without checking it.
in fact, xd shows that rc emits bad utf:
	; /n/sources/contrib/quanstro/src/futharc/8.badrune |
		/n/sources/plan9/386/bin/rc | xd -c | sed 1q
	0000000   h  e  l c0  ;  e  c  h  o     2  n  d     l  i
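
a rough sketch of how trusting the lead byte swallows the ';'. this is not rc's lexer; wordlen() and claimed() are hypothetical names, and the "word" here simply ends at ';' or NUL. with check=0 the bytes after a lead byte are taken on trust; with check=1 each must be a continuation byte (10xxxxxx), otherwise the lead byte stands alone.

```c
#include <assert.h>
#include <stddef.h>

/* number of bytes the lead byte c claims for its sequence */
static size_t claimed(unsigned char c)
{
	if ((c & 0xE0) == 0xC0) return 2;
	if ((c & 0xF0) == 0xE0) return 3;
	if ((c & 0xF8) == 0xF0) return 4;
	return 1;
}

/* length in bytes of the word at the start of s; a word ends at ';' or NUL.
   check=0: follow-on bytes are skipped unexamined.
   check=1: a follow-on byte that is not 10xxxxxx ends the sequence, and
   only the lead byte is consumed. */
static size_t wordlen(const char *s, int check)
{
	size_t i = 0;

	assert(s != NULL);
	while (s[i] != '\0' && s[i] != ';') {
		size_t n = claimed((unsigned char)s[i]), k = 1;

		while (k < n && s[i + k] != '\0') {
			if (check && ((unsigned char)s[i + k] & 0xC0) != 0x80)
				break;
			k++;
		}
		i += (check && k < n) ? 1 : k;
	}
	return i;
}
```

on "hel\xC0;echo 2nd line" the unchecked scan runs to the end of the string, while the checked scan stops at the ';'. note that this sketch checks only the continuation bytes; it does not reject overlong encodings.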

both these problems were addressed by adding a rutf()
function to io.c.  rutf keeps enough in the io buffer to
deal with broken utf at any point in the input.  if broken
runes are detected, Runeerror is returned and 1 byte of
input is consumed.  Xbackq was modified to use rutf and
the strstr is now safe, since only complete runes are tested.
lex was also modified to use rutf, and the byte sequence
0xc0 ';' is now interpreted as Runeerror ';'.
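
the decoding rule described above (bad input yields Runeerror and consumes 1 byte) can be sketched as follows. this is not the rutf() in futharc: getrune() is a hypothetical name, RUNEERROR is assumed to be 0xFFFD, the sketch works on a complete in-memory buffer rather than rc's io buffer, and rejecting surrogates and values above 0x10FFFF is this sketch's choice.

```c
#include <assert.h>
#include <stddef.h>

enum { RUNEERROR = 0xFFFD };	/* replacement rune; value assumed here */

/* decode one rune from the n bytes at p into *r and return the number of
   bytes consumed.  malformed, overlong, or truncated input yields
   RUNEERROR and consumes exactly 1 byte, so the caller resynchronises
   on the very next byte. */
static size_t getrune(const unsigned char *p, size_t n, long *r)
{
	static const long min[] = { 0, 0, 0x80, 0x800, 0x10000 };
	size_t len, i;
	long c;

	assert(p != NULL && r != NULL && n > 0);
	c = p[0];
	if (c < 0x80) {
		*r = c;
		return 1;
	}
	if ((c & 0xE0) == 0xC0) { len = 2; c &= 0x1F; }
	else if ((c & 0xF0) == 0xE0) { len = 3; c &= 0x0F; }
	else if ((c & 0xF8) == 0xF0) { len = 4; c &= 0x07; }
	else goto bad;			/* stray continuation or invalid lead byte */
	if (len > n)
		goto bad;		/* truncated sequence */
	for (i = 1; i < len; i++) {
		if ((p[i] & 0xC0) != 0x80)
			goto bad;	/* not a continuation byte */
		c = (c << 6) | (p[i] & 0x3F);
	}
	if (c < min[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		goto bad;		/* overlong, out of range, or surrogate */
	*r = c;
	return len;
bad:
	*r = RUNEERROR;
	return 1;
}
```

fed the bytes 0xc0 ';', the first call returns RUNEERROR with a length of 1 and the second call returns ';', which is the behaviour the lexer needs to see the separator.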

the source is in /n/sources/contrib/quanstro/src/futharc
with a pre-compiled executable, 8.out.

- erik

p.s. old differences from standard rc
1.  support for history file
2.  support for break within loops.

p.p.s. other new differences
1.  a list split across lines, such as
	x=(
		1
		2
		)
    is now acceptable syntax.
2.  Xbackq uses exponential allocation for good behavior on
really long input
3.  print offending line number on syntax errors.
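
for item 2, exponential allocation generally means doubling the buffer when it fills, so that appending n bytes costs O(n) copying in total instead of O(n²). a minimal sketch, not the code in futharc; Buf and bufadd() are hypothetical names, the starting size of 64 is arbitrary, and overflow of the doubled capacity is not guarded against.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
	char *p;		/* storage, or NULL while empty */
	size_t len, cap;	/* bytes used, bytes allocated */
} Buf;

/* append n bytes from s, doubling the capacity whenever it runs out.
   returns 0 on success, -1 if memory cannot be obtained, in which case
   the buffer is left unchanged. */
static int bufadd(Buf *b, const char *s, size_t n)
{
	assert(b != NULL && (s != NULL || n == 0));
	if (b->len + n > b->cap) {
		size_t cap = b->cap ? b->cap : 64;
		char *p;

		while (cap < b->len + n)
			cap *= 2;
		p = realloc(b->p, cap);
		if (p == NULL)
			return -1;
		b->p = p;
		b->cap = cap;
	}
	if (n > 0)
		memcpy(b->p + b->len, s, n);
	b->len += n;
	return 0;
}
```

start from a zeroed Buf, call bufadd() as input arrives, and free(b.p) when done.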



