rc-list - mailing list for the rc(1) shell
* Re: environment again
@ 2000-06-08  7:19 Carlo Strozzi
  2000-06-08 16:48 ` Decklin Foster
  0 siblings, 1 reply; 7+ messages in thread
From: Carlo Strozzi @ 2000-06-08  7:19 UTC
  To: fosterd; +Cc: rc

Decklin Foster wrote:

| > ; a=`{cat bigfile}
| > ; echo $a |wc -c
|
| Can you give a non-trivial example? This should be 'wc -c < bigfile',
| obviously; I'm wondering what the real problem you're working on is
| where you can't iron out the variable.

Oh, yes, of course that was just an example, but it is meant to
show that bloating the environment is bad. On the other hand, given
the fact that rc does not provide a 'read' builtin, I cannot really
see how to process a file line by line other than by swallowing
it into memory all in one go and then using $*(n) to reference each
line in turn.
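
For contrast, a minimal sketch of the line-oriented loop that a 'read'
builtin makes possible. It is written in POSIX sh, not rc, and the function
name number_lines is made up for illustration. Only one line is held in a
shell variable at any time.

```shell
# Hypothetical helper: print each line of file $1 prefixed with its number.
# The file is streamed through 'read'; it is never loaded as a whole.
number_lines() {
    n=0
    # '|| [ -n "$line" ]' also handles a last line without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        printf '%s: %s\n' "$n" "$line"
    done < "$1"
}
```

Usage would be something like 'number_lines bigfile'.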

The point is that I use the shell to build the outer layer of
relatively large and complex Web applications. You may argue that I
should resort to a different language, but I would disagree. When it
comes to delivering applications in virtually no time, nothing beats
the shell. I have usually used bourne-type shells for this, but now
that I know of rc I really like it and I would like to switch. Now,
this leads to the next, much more practical example of a real-world
problem that may lead to a bloated environment with rc:

Suppose you want to build a Web search engine like Altavista (not
that I built that particular one :-), that will show the results on
the output page in chunks of ten at a time. To render the results I
am using a web page template, that contains a special tag that marks
the point in the page where I want the final result to appear. This
is usually in the body of an html table, where each table row is one
search hit (like Altavista). Unlike Altavista though, suppose that
the output contains the complete hits, not just abstracts of them. As
you know, you can put such html structure with all of its formatting
tags all on one single line, as the final rendering will not depend on
physical newlines but rather on the html tags themselves. Furthermore
each output hit can contain other hyperlinks, embedded images,
formatting tags and so on, so the shell variable that is going to
hold it may become pretty large. Then say I want to use sed(1) to
substitute the special tag in the page template with such a result
string, I need to do something like:

  sed 's/__SPECIAL_TAG__/'^$my_big_var'/' page_template.html

which will send the final output to stdout, i.e. to the Web client.
Of course I must also provide for escaping any sed(1) special
characters in $my_big_var, so I need to make that variable known to
the shell for all this back-and-forth between different utilities
(that's what the shell is supposed to be: the "glue" between utilities).
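
The escaping step mentioned above can be sketched as follows. This is
POSIX sh rather than rc, the helper name sed_escape is hypothetical, and it
assumes the value is a single line (as described for the HTML hits) and that
'/' is the delimiter of the s command.

```shell
# Hypothetical helper: escape the characters that are special on the
# right-hand side of s/RE/replacement/: backslash, '&' and the '/' delimiter.
# '|' is used as delimiter here so that '/' can sit in the bracket expression.
sed_escape() {
    printf '%s\n' "$1" | sed 's|[\\/&]|\\&|g'
}

# Illustrative data, not taken from the thread.
my_big_var='<a href="/hit?a=1&b=2">first hit</a>'
printf '<td>__SPECIAL_TAG__</td>\n' |
    sed "s/__SPECIAL_TAG__/$(sed_escape "$my_big_var")/"
```

Without the escaping, the '/' characters in the value would end the s command
early and the '&' would be replaced by the matched tag.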

Although not easily, I could use an external file rather than
$my_big_var, and use 'sed -f', but that would require a few more i/o
operations in the CGI program. If the site handles a couple of million
hits a day I would really like to avoid that.

Back to the possibility that you suggest I should resort to a different
language for such things, apart from fast go-to-market considerations
I can demonstrate that a lightweight shell + well-chosen utilities
can provide a faster application than other self-contained approaches
(there are books on that).  Unless you want to re-code the whole system
into your application, the very few times you will run an external
utility (like sendmail(8) for instance) you will need a system(3),
which will run a shell (possibly bash(1), which is ten times slower
than rc), so you had better just use the shell in the first place.

Sorry for the length, but I think that explaining things works
better than posting a few lines of obscure code :-)

As I said I'm awful at C, but is it really that difficult to provide a
way of not exporting everything to the environment by default?
I think it would simply make rc a better interpreter.

Take care	--carlo




* Re: environment again
  2000-06-08  7:19 environment again Carlo Strozzi
@ 2000-06-08 16:48 ` Decklin Foster
  0 siblings, 0 replies; 7+ messages in thread
From: Decklin Foster @ 2000-06-08 16:48 UTC
  To: carlos; +Cc: rc

Carlo Strozzi writes:

>   sed 's/__SPECIAL_TAG__/'^$my_big_var'/' page_template.html

OK, but where is $my_big_var set? That's what I need to know to figure
out how to write it without the variable.

> Back to the possibility that you suggest I should resort to a different
> language for such things,

I didn't say that ;-)

> you will need a system(3),

Feh. fork/exec. But that's totally off-topic, so nevermind.

-- 
There is no TRUTH. There is no REALITY. There is no CONSISTENCY. There
are no ABSOLUTE STATEMENTS. I'm very probably wrong. -- BSD fortune(6)



* Re: environment again
  2000-06-12 15:23 ` Tim Goodwin
@ 2000-06-12 16:35   ` Carlo Strozzi
  0 siblings, 0 replies; 7+ messages in thread
From: Carlo Strozzi @ 2000-06-12 16:35 UTC
  To: Tim Goodwin; +Cc: rc

On Mon, Jun 12, 2000 at 04:23:55PM +0100, Tim Goodwin wrote:
| > { Col1='data1'; Col2='data2'; Col3='some-big-data' }
| > [...]
| >    sed 's/__SPECIAL_TAG__/'^$Col3'/' page_template.html
| 
| What's the problem with simply doing this?  Is Col3 so enormous that
| you're running into environment variable limits? Or are you worried
| about performance?

No, it is the former, i.e. size. Again, I'm not a C programmer but
I noticed that with ksh(1) if I keep a large variable local it
works fine, while exporting it to the environment produces the
same problem that I get with rc(1), as I showed in my original
posting on this thread. Anyway, having huge shell variables may
indeed not be a good idea. That very thing (combined with the fact
that it seems really hard to get you rc guys to change anything
in that shell :-)) suggests that it is probably better to write such
data out to a temporary file and handle it differently.

| Finally, if I've understood correctly what you're trying to achieve,
| there is a workaround, although it's rather ugly.
| 
|     eval 'Col3=() sed s/__SPECIAL_TAG__/'^$Col3^'/ page_template.html'

Hmm ... wouldn't the following be equivalent ?

    Col3=() { sed 's/__SPECIAL_TAG__/'^$Col3^'/' page_template.html }


bye	--carlo
-- 
I can read MIME or uuencoded e-mail attachments in PDF, Postscript, HTML,
RTF or text formats. Please do not send Word, Excel or PowerPoint files. 



* Re: environment again
  2000-06-08 19:02 Carlo Strozzi
@ 2000-06-12 15:23 ` Tim Goodwin
  2000-06-12 16:35   ` Carlo Strozzi
  0 siblings, 1 reply; 7+ messages in thread
From: Tim Goodwin @ 2000-06-12 15:23 UTC
  To: carlos; +Cc: rc

> { Col1='data1'; Col2='data2'; Col3='some-big-data' }
> [...]
>    sed 's/__SPECIAL_TAG__/'^$Col3'/' page_template.html

What's the problem with simply doing this?  Is Col3 so enormous that
you're running into environment variable limits?  Or are you worried
about performance?

If it's performance, "profile, don't speculate!"  It would be very
interesting to see some test results showing just how much it costs to
have some enormous variables in the environment.
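
One way such a measurement could be set up, as a rough sketch in POSIX sh
rather than rc. The 64 KiB size, the count of 200 and the helper name spawn_n
are arbitrary assumptions, and no results are implied here.

```shell
# 64 KiB of filler; an arbitrary size chosen so that exec still succeeds.
BIG=$(awk 'BEGIN { for (i = 0; i < 65536; i++) printf "x" }')

# Hypothetical helper: start /bin/echo $1 times, then print the CPU times
# accumulated by this (sub)shell and its children.
spawn_n() {
    i=0
    while [ "$i" -lt "$1" ]; do
        /bin/echo x > /dev/null
        i=$((i + 1))
    done
    times
}

# Each run is in its own subshell so the two sets of times are independent.
echo "BIG not exported:"; ( spawn_n 200 )
echo "BIG exported:";     ( export BIG; spawn_n 200 )
```

Comparing the two sets of figures for a realistic variable size and process
count would show whether the environment copy matters for a given workload.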

Finally, if I've understood correctly what you're trying to achieve,
there is a workaround, although it's rather ugly.

    eval 'Col3=() sed s/__SPECIAL_TAG__/'^$Col3^'/ page_template.html'

Tim.



* Re: environment again
@ 2000-06-08 19:02 Carlo Strozzi
  2000-06-12 15:23 ` Tim Goodwin
  0 siblings, 1 reply; 7+ messages in thread
From: Carlo Strozzi @ 2000-06-08 19:02 UTC
  To: fosterd; +Cc: rc

Decklin Foster wrote:

| Carlo Strozzi writes:
|
| >   sed 's/__SPECIAL_TAG__/'^$my_big_var'/' page_template.html
|
| OK, but where is $my_big_var set? That's what I need to know to figure
| out how to write it without the variable.

Hmm .. ok, I'll try and provide a reasonably simple example. I often
use flat-file ascii tables with the following sample structure:

; cat mytable

Col1     Col2     Col3
----     ----     ----
data1    data2    some-big-data

where the data parts may contain stuff that "bytes", especially if exposed
to an 'eval' statement.

Then suppose I need to use data1 and data2 in several places in my shell
script, and some-big-data needs to be passed to sed(1) for substitution
on the output page template (as per my previous message). Furthermore,
all this needs to be done with the fewest possible processes,
as we are talking about high traffic web sites. All the above can
actually be done in one single call to awk, like this:

eval `{awk 'awk program' mytable}

where 'awk program' is supposed to:

1) escape sed(1) special chars in some-big-data
2) further escape shell special chars in all the data parts
3) return the following fragment of rc code:

{ Col1='data1'; Col2='data2'; Col3='some-big-data' }

This makes it possible to eval awk's output and grab the results back
into the calling shell script for later use, for instance in the
aforementioned statement:

   sed 's/__SPECIAL_TAG__/'^$Col3'/' page_template.html
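
The awk-then-eval pattern described above can be sketched like this. It is a
POSIX sh illustration, not the actual 'awk program': the helper name
table_to_assignments is hypothetical, the table is assumed to be
tab-separated with valid identifiers as column names, and only the shell
quoting of step 2 is shown (the sed escaping of step 1 is left out). Note
that sh embeds a single quote by closing and reopening the quoted string,
whereas rc doubles the quote.

```shell
# Hypothetical helper: turn the first data row of a tab-separated table
# (header line, dashes line, data line) into NAME='value' assignments that
# are safe to pass to eval.
table_to_assignments() {
    awk -F '\t' -v q="'" -v dq='"' '
        NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }
        NR == 2 { next }    # skip the line of dashes
        NR == 3 {
            for (i = 1; i <= NF; i++) {
                v = $i
                # close the quoted string, add a double-quoted quote, reopen
                gsub(q, q dq q dq q, v)
                printf "%s=%s%s%s\n", name[i], q, v, q
            }
        }
    ' "$1"
}

# Usage, given a file mytable in that layout:
#   eval "$(table_to_assignments mytable)"
#   echo "$Col1 $Col2"
```

Because every value arrives inside single quotes, characters such as '$',
backquotes or ';' in the data are not interpreted by eval.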

Of course one way out could be to have 'awk program' only return

{ Col1='data1'; Col2='data2' }

while writing the sed(1) statement 's/__SPECIAL_TAG__/some-big-data/'
to a temporary file, to be later used as:

           sed -f tmpfile page_template.html

This would cost us:

- one more i/o from awk to write tmpfile
- one more i/o for sed to read tmpfile
- one more rm(1) process to remove tmpfile on exit

which multiplied by the two-million times our CGI gets hit every day ... :-)

Furthermore, 'awk program' would be less general as it would need to
treat some-big-data differently from data1/data2.
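
The temporary-file variant could look roughly like the sketch below, in
POSIX sh rather than rc. The helper name render_from_script is hypothetical,
mktemp(1) is assumed to be available, and here the shell writes the script
file where the message has awk doing it.

```shell
# Hypothetical helper: $1 is replacement text already escaped for sed,
# $2 is the page template.
render_from_script() {
    tmpfile=$(mktemp) || return 1
    # extra write: the sed script goes to a file instead of a variable
    printf 's/__SPECIAL_TAG__/%s/\n' "$1" > "$tmpfile"
    # extra read: sed loads the script, then processes the template
    sed -f "$tmpfile" "$2"
    status=$?
    # extra process: rm(1) to clean up
    rm -f "$tmpfile"
    return "$status"
}
```

The three commented steps correspond to the three extra costs listed above.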

| > you will need a system(3),
|
| Feh. fork/exec. But that's totally off-topic, so nevermind.

Oh, I wasn't talking about C, but rather PHP or other mainstream
interpreted languages for CGI scripting, or even awk, which runs
whatever external program it needs with '/bin/sh -c'.
This is true at least with mawk/gawk.
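
A small check of that behaviour, as a sketch: the command string below
contains shell arithmetic, which only a shell can evaluate.

```shell
# If awk passed the string straight to exec, "$((6 * 7))" would be printed
# literally; because system() goes through a shell, the result is printed.
awk 'BEGIN { system("echo $((6 * 7))") }'
# prints 42
```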

bye	--carlo




* Re: environment again
  2000-06-07  6:53 Carlo Strozzi
@ 2000-06-08  5:55 ` Decklin Foster
  0 siblings, 0 replies; 7+ messages in thread
From: Decklin Foster @ 2000-06-08  5:55 UTC
  To: carlos; +Cc: rc

Carlo Strozzi writes:

> ; a=`{cat bigfile}
> ; echo $a |wc -c

Can you give a non-trivial example? This should be 'wc -c < bigfile',
obviously; I'm wondering what the real problem you're working on is
where you can't iron out the variable.

I'd tentatively say that I think a shell should provide you with enough
facilities not to have to ever make variables that big, but you might
come back at me with a doozy...

-- 
There is no TRUTH. There is no REALITY. There is no CONSISTENCY. There
are no ABSOLUTE STATEMENTS. I'm very probably wrong. -- BSD fortune(6)



* environment again
@ 2000-06-07  6:53 Carlo Strozzi
  2000-06-08  5:55 ` Decklin Foster
  0 siblings, 1 reply; 7+ messages in thread
From: Carlo Strozzi @ 2000-06-07  6:53 UTC
  To: rc

Hi all,

a few weeks back I started a thread on the fact that rc should provide
a way of not exporting variables to the environment. There seemed to
be a general position that the current behaviour is ok. I think it
isn't, though:

; ls -l bigfile
-rw-r--r--   1 carlos   carlos     660226 May  8 14:13 bigfile

; a=`{cat bigfile}

; echo $a |wc -c
wc: Argument list too long

Any command after that fails with the same message, presumably because
of the bloated environment.

That was tested on AIX and Linux. I have tested it also with
a=``(){cat bigfile}, if that matters.

Doing the same with ksh:

$ a="`cat bigfile`"

$ echo "$a" | wc -c
 660226
$

So no problem, so far. Then:

$ export a

$ /bin/echo x
ksh: /bin/echo: Argument list too long

Again, a bloated environment does give problems, but the fact that ksh
does not export things by default is a plus here. "Well, why don't you
just use ksh then?" some of you may say. Because I rather like the
philosophy behind rc, I just would like it not to have those very few
defects. If I were a C programmer I would try to fix it myself, but
even so I would need to convince the rc people if I wanted to have
those changes included in the mainstream code.
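
The ksh session above can be reproduced without bigfile using the sketch
below (POSIX sh; the filler data is an assumption, only its size is taken
from the listing). Whether the exported case fails depends on the system's
limits, which 'getconf ARG_MAX' reports in part, so no outcome is asserted.

```shell
# Same size as bigfile in the listing; the contents are just filler.
a=$(awk 'BEGIN { for (i = 0; i < 660226; i++) printf "x" }')

# Shell-local variable with a builtin echo: the value is piped to wc and
# never passed through exec.
echo "$a" | wc -c

# Export only inside a subshell, so the parent shell stays usable even if
# the exec fails.
if ( export a; /bin/echo x ) > /dev/null 2>&1; then
    echo "exec with exported a: ok"
else
    echo "exec with exported a: failed"
fi
```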

As a final note, making rc swallow a large file all in one go would not
be necessary if it had a 'read' function, for line-oriented input, which
was the other thread of mine on this list.

	--carlo




end of thread, other threads:[~2000-06-13 21:35 UTC | newest]

Thread overview: 7+ messages
2000-06-08  7:19 environment again Carlo Strozzi
2000-06-08 16:48 ` Decklin Foster
  -- strict thread matches above, loose matches on Subject: below --
2000-06-08 19:02 Carlo Strozzi
2000-06-12 15:23 ` Tim Goodwin
2000-06-12 16:35   ` Carlo Strozzi
2000-06-07  6:53 Carlo Strozzi
2000-06-08  5:55 ` Decklin Foster
