9fans - fans of the OS Plan 9 from Bell Labs
From: erik quanstrom <quanstro@quanstro.net>
To: 9fans@9fans.net
Subject: [9fans] rc rune mishandling (and fix)
Date: Sun, 28 Feb 2010 17:34:43 -0500
Message-ID: <65310acf780030ea3061084b956a4e97@ladd.quanstro.net>

in the process of trying to get rc working with 4-byte
utf-8 sequences, i noticed that rc has a few weak points in
its rune handling that have nothing to do with rune size.
for example, this script
	; cat badbq
	#!/bin/rc
	nl='
	'
	ifs=α$nl echo `{echo abαβ}

produces this output
	; cat /n/sources/contrib/quanstro/src/futharc/badbq |
		/n/sources/plan9/386/bin/rc
	ab �

this is because Xbackq reads and checks its input one byte
at a time.  α is the bytes ce b1 and β is ce b2, so the first
byte of β's two-byte sequence matches the first byte of α in
the ifs and is taken as a separator.  we're left with a garbage
byte (b2) that was the second byte in β's utf sequence.  rio
turns this into Runeerror.
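
the difference between the two approaches can be sketched in
portable c.  this is not rc's code: splitbytes, splitrunes,
seqlen and inifs are hypothetical names made up for the
illustration, and separators are simply replaced by a space
rather than used to build a word list.

```c
#include <stddef.h>
#include <string.h>

/* length of the utf-8 sequence at s; 1 if the lead or a continuation byte is bad */
static size_t
seqlen(const char *s)
{
	unsigned char c = (unsigned char)s[0];
	size_t i, n;

	if(c < 0x80) n = 1;
	else if((c & 0xe0) == 0xc0) n = 2;
	else if((c & 0xf0) == 0xe0) n = 3;
	else if((c & 0xf8) == 0xf0) n = 4;
	else return 1;
	for(i = 1; i < n; i++)
		if(((unsigned char)s[i] & 0xc0) != 0x80)
			return 1;	/* also stops at the terminating NUL */
	return n;
}

/* does the n-byte sequence at s appear as a whole sequence in ifs? */
static int
inifs(const char *ifs, const char *s, size_t n)
{
	size_t m;

	for(; *ifs; ifs += m){
		m = seqlen(ifs);
		if(m == n && memcmp(ifs, s, n) == 0)
			return 1;
	}
	return 0;
}

/* byte at a time: any single byte of ifs counts as a separator */
void
splitbytes(const char *in, const char *ifs, char *out)
{
	for(; *in; in++)
		*out++ = strchr(ifs, *in) ? ' ' : *in;
	*out = 0;
}

/* sequence at a time: only a complete match counts as a separator */
void
splitrunes(const char *in, const char *ifs, char *out)
{
	size_t n;

	for(; *in; in += n){
		n = seqlen(in);
		if(inifs(ifs, in, n))
			*out++ = ' ';
		else{
			memcpy(out, in, n);
			out += n;
		}
	}
	*out = 0;
}
```

with the input "abαβ" and ifs "α\n", splitbytes leaves a
lone b2 byte after the separators, while splitrunes keeps β
intact.  out must be at least as large as in plus one byte.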

a second problem is in the lexing:
	; 8.badrune
	#!/bin/rc
	echo hel�;echo 2nd line
	[...]

notice that rc doesn't see the ';' in the echo:
	; /n/sources/contrib/quanstro/src/futharc/8.badrune |
		/n/sources/plan9/386/bin/rc
	hel�;echo 2nd line
	[...]

this is because rc assumes good input.  since 0xc0 looks like
the start of a two-byte sequence, rc takes the next byte as its
continuation without checking it, so the ';' is swallowed.
in fact, xd shows that rc emits bad utf:
	; /n/sources/contrib/quanstro/src/futharc/8.badrune |
		/n/sources/plan9/386/bin/rc | xd -c | sed 1q
	0000000   h  e  l c0  ;  e  c  h  o     2  n  d     l  i

both these problems were addressed by adding a rutf()
function to io.c.  rutf keeps enough in the io buffer to
deal with broken utf at any point in the input.  if broken
runes are detected, Runeerror is returned and 1 byte of
input is consumed.  Xbackq was modified to use rutf and
the strstr is now safe, since only complete runes are tested.
lex was also modified to use rutf, and the byte sequence
0xc0 ';' is now interpreted as Runeerror ';'.
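
the recovery rule described above (return Runeerror and consume
exactly one byte on bad input) might look roughly like this in
portable c.  this is only a sketch of the rule, not the rutf in
io.c: decoderune is a hypothetical name, it works on a plain
byte array instead of rc's io buffer, and it treats a sequence
cut off by the end of the buffer as bad rather than refilling.

```c
#include <stddef.h>

enum { Runeerror = 0xFFFD };

/*
 * decode one rune from the n bytes at p into *r.
 * returns the number of bytes consumed, or 0 if n is 0.
 * on malformed input, sets *r to Runeerror and consumes 1 byte.
 */
size_t
decoderune(const unsigned char *p, size_t n, long *r)
{
	size_t i, len;
	long c, min;

	if(n == 0)
		return 0;
	c = p[0];
	if(c < 0x80){
		*r = c;
		return 1;
	}
	if((c & 0xe0) == 0xc0){
		len = 2; c &= 0x1f; min = 0x80;
	}else if((c & 0xf0) == 0xe0){
		len = 3; c &= 0x0f; min = 0x800;
	}else if((c & 0xf8) == 0xf0){
		len = 4; c &= 0x07; min = 0x10000;
	}else
		goto bad;	/* stray continuation byte or invalid lead */
	if(len > n)
		goto bad;	/* truncated; a real reader would refill first */
	for(i = 1; i < len; i++){
		if((p[i] & 0xc0) != 0x80)
			goto bad;	/* next byte is not a continuation */
		c = c<<6 | (p[i] & 0x3f);
	}
	/* reject overlong forms, surrogates and out-of-range values */
	if(c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		goto bad;
	*r = c;
	return len;
bad:
	*r = Runeerror;
	return 1;
}
```

on the two bytes 0xc0 ';' the first call yields Runeerror and
consumes one byte, so the next call sees the ';' on its own.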

the source is in /n/sources/contrib/quanstro/src/futharc
with a pre-compiled executable, 8.out.

- erik

p.s. old differences from standard rc
1.  support for history file
2.  support for break within loops.

p.p.s. other new differences
1.
x=(
	1
	2
	)
is now acceptable syntax.
2.  Xbackq uses exponential allocation for good behavior on
really long input
3.  print offending line number on syntax errors.
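
for item 2, exponential allocation just means growing the
buffer by doubling instead of by a fixed increment, so reading
n bytes costs O(n) copying in total.  a minimal sketch, assuming
a hypothetical Buf type and bufappend helper (neither is from
rc) and an arbitrary starting size of 64:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct Buf Buf;
struct Buf {
	char *s;	/* storage, NULL when empty */
	size_t len;	/* bytes in use */
	size_t cap;	/* bytes allocated */
};

/* append n bytes from p, doubling the capacity as needed; -1 on failure */
int
bufappend(Buf *b, const char *p, size_t n)
{
	size_t cap;
	char *s;

	if(n == 0)
		return 0;
	if(n > b->cap - b->len){
		cap = b->cap ? b->cap : 64;
		while(cap - b->len < n){
			if(cap > SIZE_MAX/2)
				return -1;	/* doubling would overflow */
			cap *= 2;
		}
		s = realloc(b->s, cap);
		if(s == NULL)
			return -1;	/* old buffer is still valid */
		b->s = s;
		b->cap = cap;
	}
	memcpy(b->s + b->len, p, n);
	b->len += n;
	return 0;
}
```

start from a zeroed Buf and free b.s when done.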



