zsh-workers
* Re: PATCH: parser (was: Re: PATCH: Improved _mailboxes)
@ 2000-02-25  8:41 Sven Wischnowsky
  2000-02-25  9:44 ` Precompiled wordcode zsh functions Bart Schaefer
  2000-02-25  9:55 ` PATCH: parser (was: Re: PATCH: Improved _mailboxes) Andrej Borsenkow
  0 siblings, 2 replies; 15+ messages in thread
From: Sven Wischnowsky @ 2000-02-25  8:41 UTC (permalink / raw)
  To: zsh-workers


Bart Schaefer wrote:

> On Feb 24, 10:07am, Sven Wischnowsky wrote:
> } Subject: RE: PATCH: parser (was: Re: PATCH: Improved _mailboxes)
> }
> } 
> } Andrej Borsenkow wrote:
> } 
> } > zcodeload file
> 
> Let's not do that, shall we?  Let's stick with autoload and have a file
> suffix convention, like emacs' .el and .elc, or something.  Heck, there
> could even be separate fpath and compiled_fpath or ...

I was wondering what to do when the directory isn't writable... but a
$COMPILED_FPATH containing one directory would be enough. Hm. Do you
want to say that you actually like the idea? Making everything ready
for the mmap would be quite simple. The only problem I can see is that 
we would need to have a wordcode-verifier (but, of course, that can be 
done). That's yet another reason for having only a scalar containing
only one directory name (so $COMPILED_FDIR might be a better name) --
save compiled functions only if that is set and names an existing,
writable directory. Users would set it to a directory in their account 
so that others can't trick them into using evil code.

> } All this also makes me think about a way to allow multiple zsh's to
> } share other memory bits (like the command table and so on). How
> } portable is anonymous shared mmap or shared mmap on /dev/null?
> 
> Do we really want to go down the road of having e.g. zmodload in one
> zsh suddenly make new builtins available to another zsh?  I don't want
> the behavior of a script that's running in the background to change
> because of something I loaded into my foreground shell ...

Should be configurable, of course. And to be turned on explicitly. If
at all...

Bye
 Sven


--
Sven Wischnowsky                         wischnow@informatik.hu-berlin.de


* Re: Precompiled wordcode zsh functions
@ 2000-02-25 10:42 Sven Wischnowsky
  2000-02-25 17:35 ` Bart Schaefer
  0 siblings, 1 reply; 15+ messages in thread
From: Sven Wischnowsky @ 2000-02-25 10:42 UTC (permalink / raw)
  To: zsh-workers


Bart Schaefer wrote:

> ...
>
> } [...] we would need to have a wordcode-verifier [...]
> 
> How does emacs assure the integrity of .elc files?  Or does it?

Dunno. What I'm worried about is that the parser catches wrong shell
code, but for wordcode... (of course, modules have the same problem,
probably even worse).

> } That's yet another reason for having only a scalar containing
> } only one directory name (so $COMPILED_FDIR might be a better name) --
> } save compiled functions only if that is set and names an existing,
> } writable directory. Users would set it to a directory in their account 
> } so that others can't trick them into using evil code.
> 
> Zsh should probably already be more paranoid than it is about loading
> modules or functions from widely-writable directories or files.  But
> that has nothing to do with how many such directories or files are
> involved.  Where does "save compiled functions" come in?  I'd think
> we'd want an explicit "zcompile" builtin so functions can selectively
> be compiled or not.  I don't want it just automatically writing out
> wordcode for every function it ever loads.

In the light of Andrej's last comments, how about:

Add a builtin (`zcompile' if you wish), that gets a list of
filenames. The first one is used as the file to write the code for all 
functions named by the other filenames into. These have to name
existing function files (not necessarily in $fpath). So the generated
file is a kind of digest containing the code for multiple functions.
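
For illustration: later zsh releases ship a `zcompile' builtin that takes
its arguments in this order (output file first, then the function files).
A minimal sketch, assuming a zsh binary with that builtin is on PATH; the
wrapper name build_digest and the paths in the usage line are made up for
the example and are not part of zsh.

```sh
# build_digest OUT FILE...: compile the named function files into one digest.
# Assumption: `zsh` on PATH provides the zcompile builtin.
build_digest() {
    out=$1
    shift
    command -v zsh >/dev/null 2>&1 || { echo "zsh not found" >&2; return 127; }
    # -f skips the startup files so user configuration cannot interfere;
    # the first word after the command string becomes $0, the rest "$@".
    zsh -fc 'zcompile "$@"' zsh "$out" "$@"
}

# Hypothetical usage:
#   build_digest ~/.zfunc/all.zwc ~/.zfunc/_foo ~/.zfunc/_bar
```

Adding the resulting file to $fpath is then the second half of the
proposal.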

Then: $fpath may also contain names of such digest files. In
getfpfunc() (that's where we load autoloaded functions), if the name
of a digest file in $fpath is found, the file is searched for the
definition of the function we are seeking. If it contains this
function, the thing is mapped and the Eprog is set up. We would keep a 
list of already mapped files, of course, and if all functions used in
such a file are re-defined or unfunction'ed, we unmap it.

One problem: should there be some warning if the digest file is older
than the function file (if that is reachable through $fpath)? I.e. do
we have to test that?

Second problem: functions like _cvs that essentially just define lots
of functions and re-define themselves[1]. The mapped function would of 
course be the short lived function-defining one.

Bye
 Sven

[1] I was always against doing it that way ;-)

--
Sven Wischnowsky                         wischnow@informatik.hu-berlin.de


* Re: Precompiled wordcode zsh functions
@ 2000-02-25 11:31 Sven Wischnowsky
  0 siblings, 0 replies; 15+ messages in thread
From: Sven Wischnowsky @ 2000-02-25 11:31 UTC (permalink / raw)
  To: zsh-workers


I wrote:

> Second problem: functions like _cvs that essentially just define lots
> of functions and re-define themselves[1]. The mapped function would of 
> course be the short lived function-defining one.

Forget that. Function definitions are stored in a way that allows us
to use them directly for the function Eprog (without allocating
separate memory). So...

Bye
 Sven


--
Sven Wischnowsky                         wischnow@informatik.hu-berlin.de


* Re: Precompiled wordcode zsh functions
@ 2000-02-28 10:07 Sven Wischnowsky
  2000-02-28 14:50 ` Sven Wischnowsky
  0 siblings, 1 reply; 15+ messages in thread
From: Sven Wischnowsky @ 2000-02-28 10:07 UTC (permalink / raw)
  To: zsh-workers


[ I implemented my ideas at the weekend and got to read this mail
  today, so I'm withholding the patch for now... ]


Bart Schaefer wrote:

> On Feb 25, 11:42am, Sven Wischnowsky wrote:
> } Subject: Re: Precompiled wordcode zsh functions
> }
> } Add a builtin (`zcompile' if you wish), that gets a list of
> } filenames. The first one is used as the file to write the code for all 
> } functions named by the other filenames into. These have to name
> } existing function files (not necessarily in $fpath). So the generated
> } file is a kind of digest containing the code for multiple functions.
> } 
> } Then: $fpath may also contain names of such digest files.
> 
> So far so good, though I'd still prefer if such a file could just sit
> inside a directory in $fpath (or some other searched path) and be loaded
> like any other autoloaded function.  (Which means there needs to be some
> sort of convention for choosing the compiled file if both compiled and
> uncompiled functions are present.)

Hm. If we think about one file per function, we should certainly make
them be found in the directories in $fpath. But that would basically
give us two types of dump-files, unless we make them detectable
(e.g. by the .zwc extension you suggest) and make getfpfunc() search
all .zwc files for the function we are trying to load. Hm, or maybe two
different kinds of lookup: if getfpfunc() finds out that one of the
strings in $fpath isn't a directory containing a file with the name
searched, it tries to use it as a dump-file containing multiple
functions and checks if it contains the definition for the function
searched (that's basically how the stuff I wrote works). But if the
directory from $fpath just being handled is a directory and it
contains a file <name>.zwc, we use that (at this time we could compare 
the modification times for <name> and <name>.zwc, of course).
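
The per-directory half of that lookup can be sketched in plain shell. This
is only an illustration of the idea, not the getfpfunc() code: the name
pick_definition is invented, and the rule "use the .zwc unless it is older
than the text file" is an assumption, since the mail leaves the outcome of
the comparison open. The -ot test is not strictly POSIX but is understood
by bash, dash, ksh and zsh.

```sh
# pick_definition DIR NAME: print the file a loader would take from DIR.
pick_definition() {
    dir=$1
    name=$2
    src=$dir/$name
    zwc=$dir/$name.zwc
    if [ -f "$zwc" ] && { [ ! -f "$src" ] || [ ! "$zwc" -ot "$src" ]; }; then
        printf '%s\n' "$zwc"     # compiled file exists and is not stale
    elif [ -f "$src" ]; then
        printf '%s\n' "$src"     # fall back to the plain-text definition
    else
        return 1                 # DIR has nothing for NAME
    fi
}
```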

> } In getfpfunc() (that's where we load autoloaded functions), if the
> } name of a digest file in $fpath is found, the file is searched for
> } the definition of the function we are seeking. If it contains this
> } function, the thing is mapped and the Eprog is set up.
> 
> Hmm.  Probably there'd have to be a "directory" at the top of the file
> with the names and offsets (or some such) of all the functions therein.

That's what my implementation does. Since this is currently only
intended for files containing lots of functions, they are always
mapped. For now they are even mapped completely; that could probably be
changed to map them step by step as more and more functions from them
are used. Although I'm really not that concerned about memory usage
here. The completion functions (only the _* files) take up somewhat
less than 300KB, btw.

> That header could also contain some flags determined at compile time,
> such as whether the file should be mmap'd or merely read.  Such a flag
> would normally be computed by the compiler based on the size or some
> such criteria, but could be overridden by an option to the "zcompile"
> (or whatever) builtin.  Thus if one wanted to have a lot of small files
> with only one function each, the result would not be a zillion mmaps.

Hm, yes, hadn't thought about that. I'm not so sure about the
automatic detection of the flag since it would involve some kind of
threshold. It's always so difficult to find a good value (a page size?
per function or for the whole file if it contains more than one
function?).

> } One problem: should there be some warning if the digest file is older
> } than the function file (if that is reachable through $fpath)? I.e. do
> } we have to test that?
> 
> I *think* emacs detects that condition only when the .el and .elc are
> in the same directory.  Certainly we shouldn't go searching the entire
> fpath to verify every compiled function, particularly if there is more
> than one function in each wordcode file.

Yep. The implementation I have now does nothing about this, because it 
only thinks about `digest' files.

> } Second problem: functions like _cvs that essentially just define lots
> } of functions and re-define themselves[1].
> 
> I saw your follow-up, but one remark:  That technique would no longer be
> necessary because loading the wordcode file would immediately define all
> the functions therein without having to execute one of them first.

But it doesn't do any harm either -- it is very fast (with such a
dump-file in your fpath those initial completions, where the functions
were loaded and parsed, become a lot faster, of course, and defining
functions in functions from a dump-file is very fast, too).

Bye
 Sven


--
Sven Wischnowsky                         wischnow@informatik.hu-berlin.de


* Re: Precompiled wordcode zsh functions
@ 2000-02-29  7:45 Sven Wischnowsky
  2000-02-29  8:15 ` Andrej Borsenkow
  2000-02-29  8:21 ` Bart Schaefer
  0 siblings, 2 replies; 15+ messages in thread
From: Sven Wischnowsky @ 2000-02-29  7:45 UTC (permalink / raw)
  To: zsh-workers


Bart Schaefer wrote:

> On Feb 28, 11:07am, Sven Wischnowsky wrote:
> } Subject: Re: Precompiled wordcode zsh functions
> }
> } Hm. If we think about one file per function, we should certainly make
> } them be found in the directories in $fpath. [...] if getfpfunc() finds
> } out that one of the strings in $fpath isn't a directory containing
> } a file with the name searched, it tries to use it as a dump-file
> } containing multiple functions and checks if it contains the definition
> } [... b]ut if the directory from $fpath just being handled is a
> } directory and it contains a file <name>.zwc, we use that (at this time
> } we could compare the modification times for <name> and <name>.zwc, of
> } course).
> 
> Yes.  Note that I think the files should have the .zwc extension in both
> cases; the only difference is whether the loading code opens the file and
> searches its internal "directory," or simply matches on the file name.

I've hacked more yesterday, reaching the state Zefram talked about,
sans the endian-ness-independence. Like you two I favour the approach
with files containing two versions. I hope to find the time for that
this evening.
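
To make the byte-order issue concrete: the same 32-bit word is laid out
differently in a file depending on the machine that wrote it, which is why
a file that is to be mapped directly needs a version per byte order. A
small sketch (the helper name word_bytes is made up; it only does
arithmetic and does not read or write wordcode files):

```sh
# word_bytes ORDER N: print the four bytes of the 32-bit value N as they
# would appear in a file, for ORDER "be" (big-endian) or "le" (little-endian).
word_bytes() {
    order=$1
    n=$2
    b0=$(( n & 255 ))            # least significant byte
    b1=$(( (n >> 8) & 255 ))
    b2=$(( (n >> 16) & 255 ))
    b3=$(( (n >> 24) & 255 ))    # most significant byte
    case $order in
        le) printf '%02x %02x %02x %02x\n' "$b0" "$b1" "$b2" "$b3" ;;
        be) printf '%02x %02x %02x %02x\n' "$b3" "$b2" "$b1" "$b0" ;;
        *)  return 2 ;;
    esac
}

# 305419896 is 0x12345678:
#   word_bytes be 305419896   # 12 34 56 78
#   word_bytes le 305419896   # 78 56 34 12
```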

Oh, and currently the shell does not check the extension of a digest
file and it doesn't compare the file times for compiled/non-compiled
functions... yet.

> ...
> 
> } Have .zwcb and .zwcl suffixes.
> 
> We should be friendly to those who compile zsh under Cygwin, or to Amol
> if he decides to update his NT port, and use only three-letter suffixes.
> Perhaps .zbw and .zlw for 32-bit ints and .zbl and .zll for 64-bit?
> Though IMO it'd be better if we could stick to 32 bits and one suffix.

I definitely want to stay with 32 bits. Although currently it is
dependent on the size of integers, I hope to make that architecture
independent and took care to always use the type `wordcode' instead of 
`int'. I still have to check -- do we have a configure test for the
size of ints?

> } A .zwc file in a directory in $fpath acts exactly like a normal
> } textual function definition file, except that it is in wordcode
> } instead of text; it should take precedence over any file (of either
> } type) further down $fpath, but we may want to do a date comparison
> } if both textual and wordcode files exist in the same directory. A
> } digest file should actually be listed in $fpath; its definitions take
> } precedence over directories (and digest files) further down $fpath.
> 
> I'm a bit worried about functions getting redefined -- and about
> functions that *need* to get redefined, e.g. a .zwc file representing
> a "package" may contain a function whose name clashes with one that
> the user defined earlier in $fpath.  In the current state of the world
> (without wordcode files) the package clobbers the user's function
> unless the package author has made an effort to avoid it (as in
> Completion/User/_cvs).  Emacs .el and .elc have that same behavior.
> What Zefram has suggested for function digest files would behave more
> like standard path hashing.

Yep. In my implementation digest files are really only one-file-
directories. I.e. they are searched like normal directories by
getfpfunc() (more precisely a utility function used by it). It will
not define all functions in the digest file immediately. I really
prefer that behaviour because a user has to worry about nothing when,
for example, he wants to override one of the functions with his own
definition in a directory earlier in $fpath.
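
The override behaviour described here is ordinary first-match-wins
searching. A rough sketch for the directory case only (first_match is an
invented name; searching inside a digest file is left out because it needs
the wordcode reader):

```sh
# first_match NAME DIR...: print the first DIR that provides NAME, either
# as a plain function file or as NAME.zwc; elements are tried in order.
first_match() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$name" ] || [ -f "$dir/$name.zwc" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1                     # NAME is not provided by any element
}
```

A user's own directory listed before the distributed one therefore wins
without anything else having to be undefined first.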

> Do we need some way to express at compile time whether a digest is a
> package with internal dependencies vs. a mere collection of otherwise
> unrelated functions?

I don't think so, if we keep the current behaviour.


Oh, and, btw, for testing purposes I set the threshold (when a
function gets mapped instead of being read) to 4096 bytes. The result
was that only very few functions (around ten) would be mapped. If we
use two pages as the threshold (or one page on a box with page-size == 
8192), no function will be mapped. I don't really have an opinion
about this, because I'll use it with one big wordcode file for the
whole completion system (and other functions I have)... so I won't do
much testing there, leaving it to all of you to decide (once I have
the patch in presentable shape, so that you can play with it).
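
A sketch of such a size threshold, for experimenting with values. The
helper name load_mode is made up, the default of 4096 is the test value
mentioned above, and treating a file of exactly the threshold size as
"map" is an assumption; the real decision would be made per function
inside the shell, not per file.

```sh
# load_mode FILE [THRESHOLD]: print "map" if FILE is at least THRESHOLD
# bytes long (default 4096), otherwise "read".
load_mode() {
    file=$1
    threshold=${2:-4096}
    [ -f "$file" ] || return 1
    # The arithmetic expansion strips the padding some wc versions print.
    size=$(( $(wc -c < "$file") ))
    if [ "$size" -ge "$threshold" ]; then
        echo map
    else
        echo read
    fi
}
```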

Bye
 Sven


--
Sven Wischnowsky                         wischnow@informatik.hu-berlin.de


* Re: Precompiled wordcode zsh functions
@ 2000-02-29  7:52 Sven Wischnowsky
  0 siblings, 0 replies; 15+ messages in thread
From: Sven Wischnowsky @ 2000-02-29  7:52 UTC (permalink / raw)
  To: zsh-workers


I wrote:

> I definitely want to stay with 32 bits. Although currently it is
> dependent on the size of integers, I hope to make that architecture
> independent and took care to always use the type `wordcode' instead of 
> `int'. I still have to check -- do we have a configure test for the
> size of ints?

I forgot to ask: *are* there any machines with sizeof(int) == 8? And
if yes, do they have sizeof(short) == 4?


And about the threshold: when we make the wordcode files architecture
independent, we probably shouldn't make  it relative to the page size.

Bye
 Sven


--
Sven Wischnowsky                         wischnow@informatik.hu-berlin.de


* Re: Precompiled wordcode zsh functions
@ 2000-02-29 11:42 Sven Wischnowsky
  0 siblings, 0 replies; 15+ messages in thread
From: Sven Wischnowsky @ 2000-02-29 11:42 UTC (permalink / raw)
  To: zsh-workers


Bart Schaefer wrote:

> On Feb 29,  8:45am, Sven Wischnowsky wrote:
> } Subject: Re: Precompiled wordcode zsh functions
> }
> } [...]  In my implementation digest files are really only one-file-
> } directories. I.e. they are searched like normal directories by
> } getfpfunc() (more precisely a utility function used by it). It will
> } not define all functions in the digest file immediately. I really
> } prefer that behaviour because a user has to worry about nothing when,
> } for example, he wants to override one of the functions with his own
> } definition in a directory earlier in $fpath.
> 
> I'm concerned that we should at least have a way to produce a warning
> about it.  I mean, if I were to invent a function named `_files' that
> had nothing to do with completion, and put it in a directory early in
> my $fpath -- even PWS's guide recommends putting your own functions
> before distributed ones -- three-quarters of the completion system
> would be mysteriously broken for me.  If the whole completion system
> has been hidden inside one giant file, how do I find out what has gone
> wrong?

That's one of the reasons why I'm not too happy with the thought of
installing such a digest file by default. I mean, maybe we should
just leave it to the user to create his/her own digest files
containing the stuff (s)he really wants. The 400KB (yes, it's 400, the 
300 was a typo, sorry) isn't that much, is it? With that we would have 
the same situation as now. Also, the functions in the digest can, of
course, be listed, so it's the same problem as looking into the
directories in $fpath. Hm, maybe a function that checks everything in
$fpath to see which names are defined more than once? [1]
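
Such a checker is easy to sketch for the directory case. The name
dup_functions is invented; it takes the directories as arguments (in zsh
one could pass $fpath) and skips anything that is not a directory, so
names inside digest files are not covered by this version.

```sh
# dup_functions DIR...: print function names found in more than one DIR.
dup_functions() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue            # skip digests and missing dirs
        for f in "$dir"/*; do
            [ -f "$f" ] || continue
            name=${f##*/}
            printf '%s\n' "${name%.zwc}"     # foo and foo.zwc are one name
        done | sort -u                       # count a name once per directory
    done | sort | uniq -d                    # keep names seen in 2+ dirs
}
```

With a private `_files' early in the path and the distributed one later,
the output would contain `_files', which is exactly the clash described
in the quoted paragraph.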

> And lest you think this is farfetched, please note that I've had the
> following in my .zshenv for many years now[*]:
> 
>     alias calc="noglob _calc"
>     _calc() { awk "BEGIN {print $*}" < /dev/null }
> 
> So existing user functions with leading underscores are not out of the
> question.  

I had functions beginning with an underscore myself...

> Oh, and what's the handling with respect to kshautoload vs. a function
> like _cvs that wants to define other functions and then call itself?

Currently zcompile just puts the contents of the files into the
wordcode files. I.e. functions in them behave exactly like the files.


Bye
 Sven

[1] There are other interesting possibilities for functions wrt
    compilation: a `recompile' function that checks file dates and
    digest files. A function for syntax-checking a file -- that's
    possible because zcompile reports parse errors as usual and
    one can use /dev/null as the name of the target wordcode file.
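
    The date-checking part of a `recompile' helper might look like the
    sketch below. The name stale_zwc is made up, only per-function
    <name>.zwc files next to their sources are handled (digest files
    are not), and -ot is a common extension rather than strict POSIX.

```sh
# stale_zwc DIR: print every function file in DIR whose DIR/<name>.zwc
# exists but is older than the function file itself.
stale_zwc() {
    for zwc in "$1"/*.zwc; do
        [ -f "$zwc" ] || continue            # also covers "no match at all"
        src=${zwc%.zwc}
        [ -f "$src" ] && [ "$zwc" -ot "$src" ] && printf '%s\n' "$src"
    done
    return 0
}
```

    Each printed file could then be handed back to zcompile.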

--
Sven Wischnowsky                         wischnow@informatik.hu-berlin.de



end of thread, other threads:[~2000-02-29 11:42 UTC | newest]

Thread overview: 15+ messages
2000-02-25  8:41 PATCH: parser (was: Re: PATCH: Improved _mailboxes) Sven Wischnowsky
2000-02-25  9:44 ` Precompiled wordcode zsh functions Bart Schaefer
2000-02-25  9:55 ` PATCH: parser (was: Re: PATCH: Improved _mailboxes) Andrej Borsenkow
2000-02-25 10:42 Precompiled wordcode zsh functions Sven Wischnowsky
2000-02-25 17:35 ` Bart Schaefer
2000-02-25 11:31 Sven Wischnowsky
2000-02-28 10:07 Sven Wischnowsky
2000-02-28 14:50 ` Sven Wischnowsky
2000-02-28 18:18   ` Zefram
2000-02-29  4:22     ` Bart Schaefer
2000-02-29  7:45 Sven Wischnowsky
2000-02-29  8:15 ` Andrej Borsenkow
2000-02-29  8:21 ` Bart Schaefer
2000-02-29  7:52 Sven Wischnowsky
2000-02-29 11:42 Sven Wischnowsky
