rc-list - mailing list for the rc(1) shell
* environment again
@ 2000-06-07  6:53 Carlo Strozzi
  2000-06-08  5:55 ` Decklin Foster
  0 siblings, 1 reply; 7+ messages in thread
From: Carlo Strozzi @ 2000-06-07  6:53 UTC
  To: rc

Hi all,

a few weeks back I started a thread on the fact that rc should provide
a way of not exporting variables to the environment. There seemed to
be a general position that the current behaviour is ok. I think it
isn't, though:

; ls -l bigfile
-rw-r--r--   1 carlos   carlos     660226 May  8 14:13 bigfile

; a=`{cat bigfile}

; echo $a |wc -c
wc: Argument list too long

Any command after that fails with the same message, presumably because
of the bloated environment.

That was tested on AIX and Linux. I have tested it also with
a=``(){cat bigfile}, if that matters.

Doing the same with ksh:

$ a="`cat bigfile`"

$ echo "$a" | wc -c
 660226
$

So no problem, so far. Then:

$ export a

$ /bin/echo x
ksh: /bin/echo: Argument list too long

Again, a bloated environment does cause problems, but the fact that ksh
does not export things by default is a plus here. "Well, why don't you
just use ksh then?" some of you may say. Because I rather like the
philosophy behind rc, I would just like it not to have those very few
defects. If I were a C programmer I would try to fix it myself, but
even so I would need to convince the rc people if I wanted to have
those changes included in the mainstream code.
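
The difference between a shell-local and an exported variable can be
reproduced in any Bourne-type shell. The following is only a minimal
sketch: the 8 MiB size is an assumption, picked to be larger than the
exec limits of common systems, and the variable names are made up for
the illustration.

```sh
# Build an 8 MiB string. The size is an assumption, chosen to exceed
# typical per-exec limits on arguments plus environment.
big=$(dd if=/dev/zero bs=1024 count=8192 2>/dev/null | tr '\000' 'x')

# Not exported: the variable lives only in the shell's own memory,
# so starting a child process still works.
plain_status=0
/bin/sh -c ':' || plain_status=$?

# Exported, but only inside a subshell: the exec of the child is
# expected to fail (typically "Argument list too long"), while the
# parent shell's environment stays clean.
exported_status=0
( export big; /bin/sh -c ':' ) 2>/dev/null || exported_status=$?

echo "unexported: $plain_status, exported: $exported_status"
unset big
```

The point of the sketch is that the size of the variable alone is
harmless; only placing it in the environment of a child process
triggers the failure.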

As a final note, making rc swallow a large file all in one go would not
be necessary if it had a 'read' function, for line-oriented input, which
was the other thread of mine on this list.
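
For comparison, this is roughly what line-oriented input looks like in
a Bourne-type shell that has 'read'. It is a sketch in POSIX sh, not
rc, and the function name count_long_lines is hypothetical. Only one
line is held in memory at a time.

```sh
# count_long_lines FILE MIN
# Print how many lines of FILE are at least MIN characters long,
# reading one line at a time instead of loading the whole file.
count_long_lines() {
    n=0
    # IFS= and -r keep leading blanks and backslashes intact; the
    # "|| [ -n ... ]" part also handles a last line without newline.
    while IFS= read -r line || [ -n "$line" ]; do
        [ "${#line}" -ge "$2" ] && n=$((n + 1))
    done < "$1"
    printf '%s\n' "$n"
}
```

A call such as 'count_long_lines bigfile 80' never needs a variable
that holds the whole file, so nothing large can end up in the
environment.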

	--carlo



* Re: environment again
@ 2000-06-08  7:19 Carlo Strozzi
  2000-06-08 16:48 ` Decklin Foster
  0 siblings, 1 reply; 7+ messages in thread
From: Carlo Strozzi @ 2000-06-08  7:19 UTC
  To: fosterd; +Cc: rc

Decklin Foster wrote:

| > ; a=`{cat bigfile}
| > ; echo $a |wc -c
|
| Can you give a non-trivial example? This should be 'wc -c < bigfile',
| obviously; I'm wondering what the real problem you're working on is
| where you can't iron out the variable.

Oh, yes, of course that was just an example, but it is meant to
show that bloating the environment is bad. On the other hand, given
that rc does not provide a 'read' builtin, I cannot really see how
to process a file line by line other than by swallowing it into
memory all in one go and then using $*(n) to reference each line
in turn.

The point is that I use the shell to build the outer layer of
relatively large and complex Web applications. You may argue that I
should resort to a different language, but I would disagree. When it
comes to delivering applications in virtually no time, nothing beats
the shell. I have usually used bourne-type shells for this, but now
that I know of rc I really like it and I would like to switch. Now,
this leads to the next, much more practical example of a real-world
problem that may lead to a bloated environment with rc:

Suppose you want to build a Web search engine like Altavista (not
that I built that particular one :-), that will show the results on
the output page in chunks of ten at a time. To render the results I
am using a web page template that contains a special tag marking
the point in the page where I want the final result to appear. This
is usually in the body of an html table, where each table row is one
search hit (like Altavista). Unlike Altavista though, suppose that
the output contains the complete hits, not just abstracts of them. As
you know, you can put such an html structure with all of its formatting
tags on one single line, as the final rendering will not depend on
physical newlines but rather on the html tags themselves. Furthermore,
each output hit can contain other hyperlinks, embedded images,
formatting tags and so on, so the shell variable that is going to
hold it may become pretty large. Then, say I want to use sed(1) to
substitute the special tag in the page template with such a result
string; I need to do something like:

  sed 's/__SPECIAL_TAG__/'^$my_big_var'/' page_template.html

which will send the final output to stdout, i.e. to the Web client.
Of course I must also provide for escaping any sed(1) special
characters in $my_big_var, so I need to make that variable known to
the shell for all this back-and-forth between different utilities
(that's what the shell is supposed to be: the "glue" between utilities).
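
The escaping step mentioned above can be sketched as follows, in POSIX
sh rather than rc. The function name sed_escape_replacement is
hypothetical, and the sketch assumes that the value is a single line
and that '/' is the delimiter of the s command.

```sh
# sed_escape_replacement STRING
# Print STRING with every backslash, slash and ampersand preceded by
# a backslash, so it can be used as the replacement part of s/../../.
# '|' is used as the delimiter here so that '/' needs no escaping
# inside the bracket expression.
sed_escape_replacement() {
    printf '%s' "$1" | sed -e 's|[\\/&]|\\&|g'
}

# Usage: substitute a value containing sed special characters.
hit='<a href="/x?a=1&b=2">hit</a>'
esc=$(sed_escape_replacement "$hit")
printf '%s\n' '<td>__SPECIAL_TAG__</td>' | sed "s/__SPECIAL_TAG__/$esc/"
```

Only the three characters that are special in a replacement are
touched; a value with embedded newlines would need extra handling that
this sketch leaves out.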

Although not easily, I could use an external file rather than
$my_big_var, and use 'sed -f', but that would require a few more i/o
operations in the CGI program. If the site handles a couple of million
hits a day I would really like to avoid that.

Back to the possibility that you suggest I should resort to a different
language for such things: apart from fast go-to-market considerations,
I can demonstrate that a lightweight shell + well-chosen utilities
can provide a faster application than other self-contained approaches
(there are books on that).  Unless you want to re-code the whole system
into your application, the very few times you will run an external
utility (like sendmail(8) for instance) you will need a system(3),
which will run a shell (possibly bash(1), which is ten times slower
than rc), so you had better just use the shell in the first place.

Sorry for the length, but I think that explaining things works
better than posting a few lines of obscure code :-)

As I said, I'm awful at C, but is it really that difficult to provide a
way of not exporting everything to the environment by default?
I think it would simply make rc a better interpreter.

Take care	--carlo



* Re: environment again
@ 2000-06-08 19:02 Carlo Strozzi
  2000-06-12 15:23 ` Tim Goodwin
  0 siblings, 1 reply; 7+ messages in thread
From: Carlo Strozzi @ 2000-06-08 19:02 UTC
  To: fosterd; +Cc: rc

Decklin Foster wrote:

| Carlo Strozzi writes:
|
| >   sed 's/__SPECIAL_TAG__/'^$my_big_var'/' page_template.html
|
| OK, but where is $my_big_var set? That's what I need to know to figure
| out how to write it without the variable.

Hmm .. ok, I'll try and provide a reasonably simple example. I often
use flat-file ascii tables with the following sample structure:

; cat mytable

Col1     Col2     Col3
----     ----     ----
data1    data2    some-big-data

where the data parts may contain stuff that "bites", especially if exposed
to an 'eval' statement.

Then suppose I need to use data1 and data2 in several places in my shell
script, and some-big-data needs to be passed to sed(1) for substitution
on the output page template (as per my previous message). Furthermore,
all this needs to be done with the fewest possible processes,
as we are talking about high-traffic web sites. All the above can
actually be done in one single call to awk, like this:

eval `{awk 'awk program' mytable}

where 'awk program' is supposed to:

1) escape sed(1) special chars in some-big-data
2) further escape shell special chars in all the data parts
3) return the following fragment of rc code:

{ Col1='data1'; Col2='data2'; Col3='some-big-data' }

This makes it possible to eval awk's output and grab the results back
into the calling shell script for later use, for instance in the
aforementioned statement:

   sed 's/__SPECIAL_TAG__/'^$Col3'/' page_template.html
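
The eval round trip can be sketched in POSIX sh as below. This is not
the actual 'awk program': the function name table_to_assignments is
hypothetical, the table is assumed to be tab-separated with a header
line and a dashes line, and the sed(1) escaping step is left out. It
emits sh quoting, where an embedded single quote becomes '\''; rc
would instead need the quote doubled.

```sh
# table_to_assignments FILE
# Print one NAME='value' assignment per column of the first data row.
table_to_assignments() {
    awk -F '\t' '
        # Wrap s in single quotes, turning each embedded quote into
        # the four characters quote, backslash, quote, quote.
        function shquote(s) {
            gsub(/\047/, "\047\\\047\047", s)
            return "\047" s "\047"
        }
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                # Refuse column names that are not plain identifiers.
                if ($i !~ /^[A-Za-z_][A-Za-z0-9_]*$/) exit 1
                name[i] = $i
            }
            next
        }
        NR == 2 { next }    # skip the line of dashes
        NR == 3 {
            for (i = 1; i <= NF; i++)
                printf "%s=%s\n", name[i], shquote($i)
            exit
        }
    ' "$1"
}
```

It would be used as 'eval "$(table_to_assignments mytable)"', after
which $Col1, $Col2 and $Col3 are ordinary, unexported shell variables.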

Of course one way out could be to have 'awk program' only return

{ Col1='data1'; Col2='data2' }

while writing the sed(1) statement 's/__SPECIAL_TAG__/some-big-data/'
to a temporary file, to be later used as:

           sed -f tmpfile page_template.html

This would cost us:

- one more i/o from awk to write tmpfile
- one more i/o for sed to read tmpfile
- one more rm(1) process to remove tmpfile on exit

which multiplied by the two-million times our CGI gets hit every day ... :-)

Furthermore, 'awk program' would be less general as it would need to
treat some-big-data differently from data1/data2.
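
For reference, the temporary-file variant would look roughly like this
in POSIX sh. The function name render_with_script_file is
hypothetical, the replacement is assumed to be escaped already, and
mktemp(1) is assumed to be available although it is not part of POSIX.

```sh
# render_with_script_file ESCAPED_REPLACEMENT TEMPLATE
# Write the s command to a temporary sed script, apply it to TEMPLATE
# and remove the script again.
render_with_script_file() {
    tmpfile=$(mktemp) || return 1
    printf 's/__SPECIAL_TAG__/%s/\n' "$1" > "$tmpfile"
    sed -f "$tmpfile" "$2"
    status=$?
    rm -f "$tmpfile"    # the extra rm(1) process mentioned above
    return "$status"
}
```

The big value never appears on a command line or in the environment
here, at the price of the extra write, read and rm listed above.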

| > you will need a system(3),
|
| Feh. fork/exec. But that's totally off-topic, so nevermind.

Oh, I wasn't talking about C, but rather PHP or other mainstream
interpreted languages for CGI scripting, or even awk, which calls
whatever external program it needs to run with '/bin/sh -c'.
This is true at least with mawk/gawk.
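
A quick way to see that awk's system() goes through a shell is to hand
it a command that only a shell can interpret. This is a small sketch,
and the function name awk_uses_shell is hypothetical.

```sh
# awk_uses_shell
# Succeeds only if system() passes its argument to a shell: a
# variable assignment followed by a test is shell syntax, not the
# name of an executable.
awk_uses_shell() {
    awk 'BEGIN { exit system("v=glue; [ \"$v\" = glue ]") }'
}

if awk_uses_shell; then
    echo "system() ran the command through a shell"
fi
```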

bye	--carlo




Thread overview: 7+ messages
2000-06-07  6:53 environment again Carlo Strozzi
2000-06-08  5:55 ` Decklin Foster
2000-06-08  7:19 Carlo Strozzi
2000-06-08 16:48 ` Decklin Foster
2000-06-08 19:02 Carlo Strozzi
2000-06-12 15:23 ` Tim Goodwin
2000-06-12 16:35   ` Carlo Strozzi
