9fans - fans of the OS Plan 9 from Bell Labs
* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
@ 2009-03-17  0:40 geoff
  2009-03-17 22:16 ` Uriel
  0 siblings, 1 reply; 23+ messages in thread
From: geoff @ 2009-03-17  0:40 UTC (permalink / raw)
  To: 9fans

Setting ifs='' defeats rc's tokenisation, so the result
of `{} will be a series of rc `words', each limited to
Wordmax (8192) bytes and with the next byte of the input
stream after each word set to NUL.

Did you perhaps intend to write ifs=(), which has a different
meaning?




* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-17  0:40 [9fans] Strange rc bug for the 9fans bug-squashing squad geoff
@ 2009-03-17 22:16 ` Uriel
  2009-03-17 22:24   ` erik quanstrom
  0 siblings, 1 reply; 23+ messages in thread
From: Uriel @ 2009-03-17 22:16 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Thanks Geoff for the prompt explanation, but I'm getting the same
results with ifs=(). I'm not sure why, and I'm not sure I understand
the difference between setting ifs to '' and ().

Thanks again

uriel

On Tue, Mar 17, 2009 at 1:40 AM,  <geoff@plan9.bell-labs.com> wrote:
> Setting ifs='' defeats rc's tokenisation, so the result
> of `{} will be a series of rc `words', each limited to
> Wordmax (8192) bytes and with the next byte of the input
> stream after each word set to NUL.
>
> Did you perhaps intend to write ifs=(), which has a different
> meaning?
>
>




* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-17 22:16 ` Uriel
@ 2009-03-17 22:24   ` erik quanstrom
  2009-03-17 23:14     ` Uriel
  0 siblings, 1 reply; 23+ messages in thread
From: erik quanstrom @ 2009-03-17 22:24 UTC (permalink / raw)
  To: 9fans

On Tue Mar 17 18:17:53 EDT 2009, uriel99@gmail.com wrote:
> Thanks Geoff for the prompt explanation, but I'm getting the same
> results with ifs=(). I'm not sure why, and I'm not sure I understand
> the difference between setting ifs to '' and ().

in your test, try this

	echo $#x

- erik




* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-17 22:24   ` erik quanstrom
@ 2009-03-17 23:14     ` Uriel
  0 siblings, 0 replies; 23+ messages in thread
From: Uriel @ 2009-03-17 23:14 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> in your test, try this
>
>        echo $#x

I tried that too, I'm getting the same result for an ifs of '' or ().

% ifs=() {x=`{cat f}; echo $#x}
2
% ifs='' {x=`{cat f}; echo $#x}
2

Am I doing something else wrong?

uriel




* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-18 10:31                 ` maht
@ 2009-03-18 15:27                   ` erik quanstrom
  0 siblings, 0 replies; 23+ messages in thread
From: erik quanstrom @ 2009-03-18 15:27 UTC (permalink / raw)
  To: 9fans

On Wed Mar 18 06:33:49 EDT 2009, mattmobile@proweb.co.uk wrote:
> Using rc in werc neutralizes OS differences to a certain degree,
> obviously some things catch one out, such as this one. (and just wait
> until a \0 comes along!)

this is an easy problem to solve:

	tr '\0' '☺'

- erik




* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-18 14:16               ` erik quanstrom
@ 2009-03-18 14:36                 ` roger peppe
  0 siblings, 0 replies; 23+ messages in thread
From: roger peppe @ 2009-03-18 14:36 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2009/3/18 erik quanstrom <quanstro@coraid.com>:
> the total cost is O(maximum token length) for the
> whole input.  how could this be a problem?

well, if there's only one token (e.g. when ifs=''), it's actually
O(n^2), assuming
that realloc copies every time.

but your first argument is sufficient. i acquiesce.
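
The two growth policies under discussion can be compared with a small
model. This is a sketch under a pessimistic assumption taken from the
message above, namely that every realloc copies the whole old buffer;
real allocators often do better. The function names are invented here.
Note that the cost depends on the longest token, not on the size of the
input, because the buffer is reused from one token to the next.

```c
#include <stddef.h>

/*
 * Bytes copied while growing a buffer until it can hold n bytes,
 * adding a fixed increment each time (the "+100" policy).
 */
size_t
copied_linear(size_t n, size_t inc)
{
	size_t cap = 0, total = 0;

	while(cap < n){
		total += cap;	/* assume realloc copies the old contents */
		cap += inc;
	}
	return total;
}

/*
 * Same model, but growing by 50% each time from an initial size.
 */
size_t
copied_geometric(size_t n, size_t start)
{
	size_t cap = start, total = 0;

	while(cap < n){
		total += cap;	/* assume realloc copies the old contents */
		cap += cap/2;
	}
	return total;
}
```

For a single 10000-byte token this model gives 495000 bytes copied with
the +100 policy and 25650 with 50% growth (starting from 100). For
tokens of a few hundred bytes the two are close, which is consistent
with the first argument carrying the day.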




* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-18 13:52             ` roger peppe
@ 2009-03-18 14:16               ` erik quanstrom
  2009-03-18 14:36                 ` roger peppe
  0 siblings, 1 reply; 23+ messages in thread
From: erik quanstrom @ 2009-03-18 14:16 UTC (permalink / raw)
  To: 9fans

On Wed Mar 18 09:54:54 EDT 2009, rogpeppe@gmail.com wrote:
> 2009/3/18 erik quanstrom <quanstro@quanstro.net>:
> > -                               ewd = wd+l+100-1;
>
> one small comment, based on a totally superficial scan of that diff:
> might it not be better to grow the buffer by some multiplicative
> factor, to avoid linear behaviour when reading large files?
> i often (for no particularly good reason) use 50% as a growth
> factor - it doesn't seem as radical as *2, but will still work ok
> in the long run.

i have two arguments against doing exponential growth:
- other dynamically allocated buffers in rc are allocated
in increments of 100 bytes.

- the linear behavior would only be for long *tokens*.
the length of the input is irrelevant.  only in the case
of tokens >= 100 chars would there be a second call
to realloc.

the total cost is O(maximum token length) for the
whole input.  how could this be a problem?

- erik




* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-18 13:18           ` erik quanstrom
@ 2009-03-18 13:52             ` roger peppe
  2009-03-18 14:16               ` erik quanstrom
  0 siblings, 1 reply; 23+ messages in thread
From: roger peppe @ 2009-03-18 13:52 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2009/3/18 erik quanstrom <quanstro@quanstro.net>:
> -                               ewd = wd+l+100-1;

one small comment, based on a totally superficial scan of that diff:
might it not be better to grow the buffer by some multiplicative
factor, to avoid linear behaviour when reading large files?
i often (for no particularly good reason) use 50% as a growth
factor - it doesn't seem as radical as *2, but will still work ok
in the long run.

i've probably misread the code though...




* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-18 10:53         ` roger peppe
@ 2009-03-18 13:18           ` erik quanstrom
  2009-03-18 13:52             ` roger peppe
  0 siblings, 1 reply; 23+ messages in thread
From: erik quanstrom @ 2009-03-18 13:18 UTC (permalink / raw)
  To: 9fans

> 2009/3/17 erik quanstrom <quanstro@quanstro.net>:
> > it is unreasonable to expect to be able to generate tokens
> > that are bigger than 8k.
>
> i'm not sure i agree. they're not just tokens, they're strings,
> and there are lots of reasons why one might wish to
> have a string longer than 8k read from a file. i've certainly done so
> in inferno's sh, which doesn't have this restriction.

you win.

couple of notes -
* same changes apply to haventfork.c, omitted for clarity
* erealloc should be in subr.c and declared in rc.h
   and should be supported by Realloc in (plan9
   unix win32)^.c
* there are two other calls to realloc that should
   be addressed, too.
* the if guarding efree prevents a "free 0" whine.

havefork.c:67,81 - /n/dump/2009/0316/sys/src/cmd/rc/havefork.c:67,72
  	}
  }

- char*
- erealloc(char *p, long n)
- {
- 	p = realloc(p, n);		/* botch, should be Realloc */
- 	if(p==0)
- 		panic("Can't realloc %d bytes\n", n);
- 	return p;
- }
-
  /*
   * Who should wait for the exit from the fork?
   */
havefork.c:82,89 - /n/dump/2009/0316/sys/src/cmd/rc/havefork.c:73,81
  void
  Xbackq(void)
  {
- 	int c, l;
- 	char *s, *wd, *ewd, *stop;
+ 	char wd[8193];
+ 	int c;
+ 	char *s, *ewd=&wd[8192], *stop;
  	struct io *f;
  	var *ifs = vlook("ifs");
  	word *v, *nextv;
havefork.c:108,127 - /n/dump/2009/0316/sys/src/cmd/rc/havefork.c:100,115
  	default:
  		close(pfd[PWR]);
  		f = openfd(pfd[PRD]);
- 		s = wd = ewd = 0;
+ 		s = wd;
  		v = 0;
  		while((c = rchr(f))!=EOF){
- 			if(s==ewd){
- 				l = s-wd;
- 				wd = erealloc(wd, l+100);
- 				ewd = wd+l+100-1;
- 				s = wd+l;
+ 			if(strchr(stop, c) || s==ewd){
+ 				if(s!=wd){
+ 					*s='\0';
+ 					v = newword(wd, v);
+ 					s = wd;
+ 				}
  			}
- 			if(strchr(stop, c) && s!=wd){
- 				*s='\0';
- 				v = newword(wd, v);
- 				s = wd;
- 			}
  			else *s++=c;
  		}
  		if(s!=wd){
havefork.c:128,135 - /n/dump/2009/0316/sys/src/cmd/rc/havefork.c:116,121
  			*s='\0';
  			v = newword(wd, v);
  		}
- 		if(wd)
- 			efree(wd);
  		closeio(f);
  		Waitfor(pid, 0);
  		/* v points to reversed arglist -- reverse it onto argv */


- erik
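
A standalone sketch of the growing-buffer idea in the diff above, in
portable C rather than rc's own environment. Two things to keep in mind
when reading it. First, the diff is printed from the new havefork.c to
the dump copy, so the lines marked - are the new code. Second, the diff
as posted tests strchr(stop, c) && s!=wd with the store in the else
branch; on a straightforward reading that would store a separator
whenever the current word is empty (for example the second of two
adjacent separators). The sketch keeps the nested test of the earlier
loop instead. That is a choice made here, not something the message
states. split_dynamic, addword and GROW are invented names, and the
input is a C string instead of an io stream.

```c
#include <stdlib.h>
#include <string.h>

enum { GROW = 100 };	/* increment used in the diff */

/* Append word wd to out, space-separated as echo prints a list. */
static void
addword(char *out, const char *wd)
{
	if(out[0] != '\0')
		strcat(out, " ");
	strcat(out, wd);
}

/*
 * Split in on the characters in stop, with no limit on word length.
 * out must be large enough for the joined words.
 * Returns the word count, or -1 if memory runs out.
 */
int
split_dynamic(const char *in, const char *stop, char *out)
{
	char *wd = NULL, *p;
	size_t len = 0, cap = 0;
	int c, n = 0;

	out[0] = '\0';
	while((c = (unsigned char)*in++) != '\0'){
		if(len + 1 >= cap){	/* keep one byte free for the NUL */
			cap = len + GROW;
			p = realloc(wd, cap);
			if(p == NULL){
				free(wd);
				return -1;
			}
			wd = p;
		}
		if(strchr(stop, c)){
			if(len != 0){
				wd[len] = '\0';
				addword(out, wd);
				n++;
				len = 0;
			}
		}
		else
			wd[len++] = c;
	}
	if(len != 0){
		wd[len] = '\0';
		addword(out, wd);
		n++;
	}
	free(wd);
	return n;
}
```

With an empty separator set, a 250-byte input comes back as one
250-byte word; with stop set to a single space, the input "a  b" (two
spaces) yields the two words a and b.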




* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-18  1:25               ` erik quanstrom
@ 2009-03-18 11:30                 ` Uriel
  0 siblings, 0 replies; 23+ messages in thread
From: Uriel @ 2009-03-18 11:30 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

But parsing is not the big issue (thanks Charles for sending me a
small program that does just that), the issue here is sticking the
results in a variable (or variables), Russ' suggestion would work, but
only in native Plan 9 and not in p9p.

Still, I might be able to hack something up if characters don't get
deleted, which seems like a real bug to me.

But while I don't like arbitrary limits (especially when I hit them
;)), I can understand that for the sake of simplicity it makes sense
to have them and not fall into the 'let's handle every possibility
under the sun' dogma.

Which makes me wonder, would it be excessive to double the current
limit? While 8k is quite ample, 16k would be even more so :)

Thanks everyone for all the ideas and suggestions

uriel

On Wed, Mar 18, 2009 at 2:25 AM, erik quanstrom <quanstro@quanstro.net> wrote:
> On Tue Mar 17 20:29:50 EDT 2009, uriel99@gmail.com wrote:
>> > why can't you just let ifs = $newline (formatted to fit your screen) ?
>>
>> Unfortunately that doesn't work in this case, my input is HTTP post
>> data, which is a single line of URL-encoded text which I have to
>> decode into multiple parameters of arbitrary length.
>
> why not write a small program to crack the post data.
> might take ½ an hour, tops.
>
> - erik
>
>




* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-17 22:43       ` erik quanstrom
  2009-03-17 23:23         ` Uriel
@ 2009-03-18 10:53         ` roger peppe
  2009-03-18 13:18           ` erik quanstrom
  1 sibling, 1 reply; 23+ messages in thread
From: roger peppe @ 2009-03-18 10:53 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2009/3/17 erik quanstrom <quanstro@quanstro.net>:
> it is unreasonable to expect to be able to generate tokens
> that are bigger than 8k.

i'm not sure i agree. they're not just tokens, they're strings,
and there are lots of reasons why one might wish to
have a string longer than 8k read from a file. i've certainly done so
in inferno's sh, which doesn't have this restriction.




* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-18  1:23               ` Russ Cox
  2009-03-18  7:31                 ` Gabriel Díaz López de la Llave
@ 2009-03-18 10:31                 ` maht
  2009-03-18 15:27                   ` erik quanstrom
  1 sibling, 1 reply; 23+ messages in thread
From: maht @ 2009-03-18 10:31 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

not every environment is Plan 9 so /env is not an option.

Arbitrary limits seem a bit, well, arbitrary ! (not that I'm complaining).
With a flat memory address space and Gbs of memory chucking a realloc in
there is not totally out of technical bounds.

Using rc in werc neutralizes OS differences to a certain degree,
obviously some things catch one out, such as this one. (and just wait
until a \0 comes along!)

In this case it might make sense to inspect Content-Length and
Content-Type and awk it with FS="&" to individual files and then inspect
their size
And then someone will want to upload Mime !



> On Tue, Mar 17, 2009 at 5:26 PM, Uriel <uriel99@gmail.com> wrote:
>
>> Unfortunately that doesn't work in this case, my input is HTTP post
>> data, which is a single line of URL-encoded text which I have to
>> decode into multiple parameters of arbitrary length.
>>
>
> writing a shell script doesn't mean you have to
> write everything in the shell.  why not write a
> simple c program that reads stdin, decodes the
> key=value arguments, and writes each "value" to
> /env/form_key?
>
> russ
>
>
>





* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-18  1:23               ` Russ Cox
@ 2009-03-18  7:31                 ` Gabriel Díaz López de la Llave
  2009-03-18 10:31                 ` maht
  1 sibling, 0 replies; 23+ messages in thread
From: Gabriel Díaz López de la Llave @ 2009-03-18  7:31 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Hello

http://plan9.aichi-u.ac.jp/pegasus/cgitools/formparse.html

It works well for me; it seems that Kenji Arisawa thought about this
when developing his http tools pegasus and rit.

Take a look at it; maybe it helps you.

Slds.

Gabi



On 18/03/09 2:23, "Russ Cox" <rsc@swtch.com> wrote:

> On Tue, Mar 17, 2009 at 5:26 PM, Uriel <uriel99@gmail.com> wrote:
>> Unfortunately that doesn't work in this case, my input is HTTP post
>> data, which is a single line of URL-encoded text which I have to
>> decode into multiple parameters of arbitrary length.
>
> writing a shell script doesn't mean you have to
> write everything in the shell.  why not write a
> simple c program that reads stdin, decodes the
> key=value arguments, and writes each "value" to
> /env/form_key?
>
> russ
>
>






* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-18  0:26             ` Uriel
  2009-03-18  1:23               ` Russ Cox
@ 2009-03-18  1:25               ` erik quanstrom
  2009-03-18 11:30                 ` Uriel
  1 sibling, 1 reply; 23+ messages in thread
From: erik quanstrom @ 2009-03-18  1:25 UTC (permalink / raw)
  To: 9fans

On Tue Mar 17 20:29:50 EDT 2009, uriel99@gmail.com wrote:
> > why can't you just let ifs = $newline (formatted to fit your screen) ?
>
> Unfortunately that doesn't work in this case, my input is HTTP post
> data, which is a single line of URL-encoded text which I have to
> decode into multiple parameters of arbitrary length.

why not write a small program to crack the post data.
might take ½ an hour, tops.

- erik




* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-18  0:26             ` Uriel
@ 2009-03-18  1:23               ` Russ Cox
  2009-03-18  7:31                 ` Gabriel Díaz López de la Llave
  2009-03-18 10:31                 ` maht
  2009-03-18  1:25               ` erik quanstrom
  1 sibling, 2 replies; 23+ messages in thread
From: Russ Cox @ 2009-03-18  1:23 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Tue, Mar 17, 2009 at 5:26 PM, Uriel <uriel99@gmail.com> wrote:
> Unfortunately that doesn't work in this case, my input is HTTP post
> data, which is a single line of URL-encoded text which I have to
> decode into multiple parameters of arbitrary length.

writing a shell script doesn't mean you have to
write everything in the shell.  why not write a
simple c program that reads stdin, decodes the
key=value arguments, and writes each "value" to
/env/form_key?

russ
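
A minimal sketch of the decoding half of such a program, assuming the
usual URL-encoded form body (pairs joined by &, key and value joined by
=, + for space, %XX for other bytes). The function names are invented
here. Writing each value to /env/form_key, or to a file per key on
systems without /env, is left to the caller.

```c
#include <ctype.h>
#include <string.h>

/* Value of a hex digit, or -1 if c is not one. */
static int
hexval(int c)
{
	if(c >= '0' && c <= '9')
		return c - '0';
	c = tolower(c);
	if(c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/* Decode %XX escapes and '+' in place; returns s. */
char*
urldecode(char *s)
{
	char *r = s, *w = s;
	int hi, lo;

	while(*r != '\0'){
		if(*r == '+'){
			*w++ = ' ';
			r++;
		}else if(*r == '%'
		&& (hi = hexval((unsigned char)r[1])) >= 0
		&& (lo = hexval((unsigned char)r[2])) >= 0){
			*w++ = (char)(hi*16 + lo);
			r += 3;
		}else
			*w++ = *r++;	/* malformed escapes are copied as-is */
	}
	*w = '\0';
	return s;
}

/*
 * Split "k1=v1&k2=v2" in place and decode each part.
 * Stores at most max pairs in keys[] and vals[]; returns the count.
 */
int
formparse(char *body, char **keys, char **vals, int max)
{
	char *p, *amp, *eq;
	int n = 0;

	for(p = body; p != NULL && *p != '\0' && n < max; p = amp){
		amp = strchr(p, '&');
		if(amp != NULL)
			*amp++ = '\0';
		eq = strchr(p, '=');
		if(eq != NULL)
			*eq++ = '\0';
		else
			eq = p + strlen(p);	/* key without value: empty string */
		keys[n] = urldecode(p);
		vals[n] = urldecode(eq);
		n++;
	}
	return n;
}
```

Because the decoding happens outside the shell, the values never pass
through `{} and its word-size limit; rc only sees them when it reads
the variables or files back.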



* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-17 23:26           ` erik quanstrom
@ 2009-03-18  0:26             ` Uriel
  2009-03-18  1:23               ` Russ Cox
  2009-03-18  1:25               ` erik quanstrom
  0 siblings, 2 replies; 23+ messages in thread
From: Uriel @ 2009-03-18  0:26 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> why can't you just let ifs = $newline (formatted to fit your screen) ?

Unfortunately that doesn't work in this case, my input is HTTP post
data, which is a single line of URL-encoded text which I have to
decode into multiple parameters of arbitrary length.

Still, if no characters were getting lost, I could probably figure out
some way to work around the issue and stitch things together after they
get split.

uriel




* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-17 23:23         ` Uriel
@ 2009-03-17 23:26           ` erik quanstrom
  2009-03-18  0:26             ` Uriel
  0 siblings, 1 reply; 23+ messages in thread
From: erik quanstrom @ 2009-03-17 23:26 UTC (permalink / raw)
  To: 9fans

> >> Right now having the output of `{} corrupted can be quite inconvenient...
> >
> > it is unreasonable to expect to be able to generate tokens
> > that are bigger than 8k.
>
> Well, I would prefer if such limit didn't exist ;) But it doesn't seem
> like a totally unreasonable limit either.

why can't you just let ifs = $newline (formatted to fit your screen) ?

- erik




* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-17 22:43       ` erik quanstrom
@ 2009-03-17 23:23         ` Uriel
  2009-03-17 23:26           ` erik quanstrom
  2009-03-18 10:53         ` roger peppe
  1 sibling, 1 reply; 23+ messages in thread
From: Uriel @ 2009-03-17 23:23 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Tue, Mar 17, 2009 at 11:43 PM, erik quanstrom <quanstro@quanstro.net> wrote:
> On Tue Mar 17 18:29:14 EDT 2009, uriel99@gmail.com wrote:
>> Thanks martin for your analysis, this makes some sense to me, but as I
>> pointed out, even setting ifs to () doesn't solve the issue, so it
>> would be nice to find a solution to this.
>>
>> Right now having the output of `{} corrupted can be quite inconvenient...
>
> it is unreasonable to expect to be able to generate tokens
> that are bigger than 8k.

Well, I would prefer if such limit didn't exist ;) But it doesn't seem
like a totally unreasonable limit either.

>  however, the '8' should not be dropped.

Yes, this is the critical issue, at least if the tokens are just
split, one can join them up by hand if needed, but as things are now
the data gets corrupted in ways that at least at first are mystifying,
and which are hard to work around.

> i would think this small change would be worth
> consideration.

I will give it a try when I get a chance, but if it fixes the lost
chars, I'll be happy.

Thanks!

uriel

> ; diffy -c havefork.c
> /n/dump/2009/0317/sys/src/cmd/rc/havefork.c:74,80 - havefork.c:74,80
>  Xbackq(void)
>  {
>        char wd[8193];
> -       int c;
> +       int c, trunc;
>        char *s, *ewd=&wd[8192], *stop;
>        struct io *f;
>        var *ifs = vlook("ifs");
> /n/dump/2009/0317/sys/src/cmd/rc/havefork.c:105,113 - havefork.c:105,116
>                while((c = rchr(f))!=EOF){
>                        if(strchr(stop, c) || s==ewd){
>                                if(s!=wd){
> +                                       trunc = s == ewd;
>                                        *s='\0';
>                                        v = newword(wd, v);
>                                        s = wd;
> +                                       if(trunc)
> +                                               *s++ = c;
>                                }
>                        }
>                        else *s++=c;
>
> - erik
>
>


* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-17 22:27     ` Uriel
@ 2009-03-17 22:43       ` erik quanstrom
  2009-03-17 23:23         ` Uriel
  2009-03-18 10:53         ` roger peppe
  0 siblings, 2 replies; 23+ messages in thread
From: erik quanstrom @ 2009-03-17 22:43 UTC (permalink / raw)
  To: 9fans

On Tue Mar 17 18:29:14 EDT 2009, uriel99@gmail.com wrote:
> Thanks martin for your analysis, this makes some sense to me, but as I
> pointed out, even setting ifs to () doesn't solve the issue, so it
> would be nice to find a solution to this.
>
> Right now having the output of `{} corrupted can be quite inconvenient...

it is unreasonable to expect to be able to generate tokens
that are bigger than 8k.  however, the '8' should not
be dropped.  i would think this small change would be worth
consideration.

; diffy -c havefork.c
/n/dump/2009/0317/sys/src/cmd/rc/havefork.c:74,80 - havefork.c:74,80
  Xbackq(void)
  {
  	char wd[8193];
- 	int c;
+ 	int c, trunc;
  	char *s, *ewd=&wd[8192], *stop;
  	struct io *f;
  	var *ifs = vlook("ifs");
/n/dump/2009/0317/sys/src/cmd/rc/havefork.c:105,113 - havefork.c:105,116
  		while((c = rchr(f))!=EOF){
  			if(strchr(stop, c) || s==ewd){
  				if(s!=wd){
+ 					trunc = s == ewd;
  					*s='\0';
  					v = newword(wd, v);
  					s = wd;
+ 					if(trunc)
+ 						*s++ = c;
  				}
  			}
  			else *s++=c;

- erik
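
A toy model of the patched loop, for trying it on small inputs. It is a
sketch, not the rc source: CAP stands in for 8192, the input is a C
string, and split_trunc and addword are invented names. One case may be
worth checking before applying the patch: if the buffer is exactly full
when a separator arrives, trunc is set and the separator itself appears
to be carried into the next word. The model reproduces that reading.

```c
#include <string.h>

enum { CAP = 4 };	/* stands in for rc's 8192 */

/* Append word wd to out, space-separated as echo prints a list. */
static void
addword(char *out, const char *wd)
{
	if(out[0] != '\0')
		strcat(out, " ");
	strcat(out, wd);
}

/* Same control flow as the patched loop; returns the word count. */
int
split_trunc(const char *in, const char *stop, char *out)
{
	char wd[CAP+1], *s = wd, *ewd = &wd[CAP];
	int c, trunc, n = 0;

	out[0] = '\0';
	while((c = (unsigned char)*in++) != '\0'){
		if(strchr(stop, c) || s == ewd){
			if(s != wd){
				trunc = s == ewd;
				*s = '\0';
				addword(out, wd);
				n++;
				s = wd;
				if(trunc)
					*s++ = c;	/* carried over, even if c is a separator */
			}
		}
		else
			*s++ = c;
	}
	if(s != wd){
		*s = '\0';
		addword(out, wd);
		n++;
	}
	return n;
}
```

With an empty separator set, 0123456789 becomes the words 0123, 4567
and 89, so nothing is lost. With stop set to a single space, the input
"0123 45" becomes 0123 and " 45": the space that arrived on a full
buffer ends up at the front of the second word.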




* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-17  2:01   ` Martin Neubauer
@ 2009-03-17 22:27     ` Uriel
  2009-03-17 22:43       ` erik quanstrom
  0 siblings, 1 reply; 23+ messages in thread
From: Uriel @ 2009-03-17 22:27 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Thanks martin for your analysis, this makes some sense to me, but as I
pointed out, even setting ifs to () doesn't solve the issue, so it
would be nice to find a solution to this.

Right now having the output of `{} corrupted can be quite inconvenient...

Thanks

uriel

On Tue, Mar 17, 2009 at 3:01 AM, Martin Neubauer <m.ne@gmx.net> wrote:
> On second thought (and in the light of Geoff's reply) I probably won't.
> If you do care, the following change to the loop in question will at
> least preserve all input:
>
>                while((c = rchr(f))!=EOF){
>                        if(strchr(stop, c)){
>                                if(s!=wd){
>                                        *s='\0';
>                                        v = newword(wd, v);
>                                        s = wd;
>                                }
>                        }
>                        else if(s==ewd){
>                                *s='\0';
>                                v = newword(wd, v);
>                                s = wd;
>                                *s++=c;
>                        }
>                        else *s++=c;
>                }
>
> With a dynamic buffer the tokenisation could be prevented, but in your
> example the lexical scanner would quite likely bail afterwards.  (I
> remember a discussion some time ago about this.)
>
>


* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-17  1:31 ` Martin Neubauer
@ 2009-03-17  2:01   ` Martin Neubauer
  2009-03-17 22:27     ` Uriel
  0 siblings, 1 reply; 23+ messages in thread
From: Martin Neubauer @ 2009-03-17  2:01 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On second thought (and in the light of Geoff's reply) I probably won't.
If you do care, the following change to the loop in question will at
least preserve all input:

		while((c = rchr(f))!=EOF){
			if(strchr(stop, c)){
				if(s!=wd){
					*s='\0';
					v = newword(wd, v);
					s = wd;
				}
			}
			else if(s==ewd){
				*s='\0';
				v = newword(wd, v);
				s = wd;
				*s++=c;
			}
			else *s++=c;
		}

With a dynamic buffer the tokenisation could be prevented, but in your
example the lexical scanner would quite likely bail afterwards.  (I
remember a discussion some time ago about this.)
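
A toy model of the changed loop, to show that the overflow case now
splits without losing a byte. It is a sketch, not the rc source: CAP
stands in for 8192, the input is a C string rather than an io stream,
and split_preserving and addword are invented names.

```c
#include <string.h>

enum { CAP = 4 };	/* stands in for rc's 8192 */

/* Append word wd to out, space-separated as echo prints a list. */
static void
addword(char *out, const char *wd)
{
	if(out[0] != '\0')
		strcat(out, " ");
	strcat(out, wd);
}

/* Same control flow as the loop above; returns the word count. */
int
split_preserving(const char *in, const char *stop, char *out)
{
	char wd[CAP+1], *s = wd, *ewd = &wd[CAP];
	int c, n = 0;

	out[0] = '\0';
	while((c = (unsigned char)*in++) != '\0'){
		if(strchr(stop, c)){
			/* separator: finish the current word, drop the separator */
			if(s != wd){
				*s = '\0';
				addword(out, wd);
				n++;
				s = wd;
			}
		}
		else if(s == ewd){
			/* buffer full: finish the word and keep c for the next one */
			*s = '\0';
			addword(out, wd);
			n++;
			s = wd;
			*s++ = c;
		}
		else
			*s++ = c;
	}
	if(s != wd){
		*s = '\0';
		addword(out, wd);
		n++;
	}
	return n;
}
```

With an empty separator set, 0123456789 becomes 0123, 4567 and 89;
joining the words without spaces gives back the input. The extra word
boundaries are still there, which is the inserted space mentioned
earlier in the thread.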




* Re: [9fans] Strange rc bug for the 9fans bug-squashing squad
  2009-03-16 23:26 Uriel
@ 2009-03-17  1:31 ` Martin Neubauer
  2009-03-17  2:01   ` Martin Neubauer
  0 siblings, 1 reply; 23+ messages in thread
From: Martin Neubauer @ 2009-03-17  1:31 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Hi,

I think the following gives a clue:

	% cmp f f2
	f f2 differ: char 8193

The following snippet from the Xbackq code seems to be the culprit:

	char wd[8193];
	int c;
	char *s, *ewd=&wd[8192], *stop;

	...

		while((c = rchr(f))!=EOF){
			if(strchr(stop, c) || s==ewd){
				if(s!=wd){
					*s='\0';
					v = newword(wd, v);
					s = wd;
				}
			}
			else *s++=c;
		}

Keeping the loop from dropping characters is trivial.  Getting rid of
the inserted space probably requires a dynamic buffer.  I might give
it a shot.

Regards,
	Martin

* Uriel (uriel99@gmail.com) wrote:
> At first I thought very big rc variables seem to become strangely corrupted.
>
> % for(i in `{seq 1000}) { echo 0123456789 >> f }
> % ifs='' {x=`{cat f}}
> % echo -n $x > f2
> % diff f f2
> 745c745
> < 0123456789
> ---
> > 01234567 9
>
> But the bug seems to be in `{ } because replacing the use of the x var
> with simply:
>
> % ifs='' { echo -n `{cat f} > f2}
>
> Produces the same results.
>
> Longer strings get more random(?) characters 'blanked'.
>
> The results are identical in p9p and native plan9.
>
> I looked a bit around the rc source that seemed relevant, but didn't
> see any obvious errors, but I don't fully understand the code.
>
> Peace
>
> uriel




* [9fans] Strange rc bug for the 9fans bug-squashing squad
@ 2009-03-16 23:26 Uriel
  2009-03-17  1:31 ` Martin Neubauer
  0 siblings, 1 reply; 23+ messages in thread
From: Uriel @ 2009-03-16 23:26 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

At first I thought very big rc variables seem to become strangely corrupted.

% for(i in `{seq 1000}) { echo 0123456789 >> f }
% ifs='' {x=`{cat f}}
% echo -n $x > f2
% diff f f2
745c745
< 0123456789
---
> 01234567 9

But the bug seems to be in `{ } because replacing the use of the x var
with simply:

% ifs='' { echo -n `{cat f} > f2}

Produces the same results.

Longer strings get more random(?) characters 'blanked'.

The results are identical in p9p and native plan9.

I looked a bit around the rc source that seemed relevant, but didn't
see any obvious errors, but I don't fully understand the code.

Peace

uriel





Thread overview: 23+ messages
2009-03-17  0:40 [9fans] Strange rc bug for the 9fans bug-squashing squad geoff
2009-03-17 22:16 ` Uriel
2009-03-17 22:24   ` erik quanstrom
2009-03-17 23:14     ` Uriel
  -- strict thread matches above, loose matches on Subject: below --
2009-03-16 23:26 Uriel
2009-03-17  1:31 ` Martin Neubauer
2009-03-17  2:01   ` Martin Neubauer
2009-03-17 22:27     ` Uriel
2009-03-17 22:43       ` erik quanstrom
2009-03-17 23:23         ` Uriel
2009-03-17 23:26           ` erik quanstrom
2009-03-18  0:26             ` Uriel
2009-03-18  1:23               ` Russ Cox
2009-03-18  7:31                 ` Gabriel Díaz López de la Llave
2009-03-18 10:31                 ` maht
2009-03-18 15:27                   ` erik quanstrom
2009-03-18  1:25               ` erik quanstrom
2009-03-18 11:30                 ` Uriel
2009-03-18 10:53         ` roger peppe
2009-03-18 13:18           ` erik quanstrom
2009-03-18 13:52             ` roger peppe
2009-03-18 14:16               ` erik quanstrom
2009-03-18 14:36                 ` roger peppe
