From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from apollo.le.ac.uk ([143.210.16.125])
	by hawkwind.utcs.utoronto.ca with SMTP id <25294>;
	Wed, 5 Jan 2000 18:52:01 -0500
Received: from happy.star.le.ac.uk ([143.210.36.58])
	by apollo.le.ac.uk with smtp (Exim 3.03 #1) id 125p4V-0001Gl-00
	for rc@hawkwind.utcs.toronto.edu; Wed, 05 Jan 2000 11:57:47 +0000
Received: (qmail 1733 invoked from network); 5 Jan 2000 11:58:09 -0000
Received: from ltpcg.star.le.ac.uk (tjg@143.210.36.203)
	by happy.star.le.ac.uk with SMTP; 5 Jan 2000 11:58:09 -0000
In-Reply-To: <199912151632.LAA19961@smtp3.fas.harvard.edu>
To: rc@hawkwind.utcs.toronto.edu
Subject: Dynamically loading readline on demand (was Re: rc futures)
From: Tim Goodwin
Message-ID:
Date: Wed, 5 Jan 2000 06:59:36 -0500

> > 23. Dynamically load readline only when rc is about to read from a
> > terminal device.

> What operating system doesn't demand-load the binaries anyway?
> If you're not using the readline code, (the large majority of) it won't
> be resident in memory.

Sure, but it still slows down fork().  My guess is that this is because
of the extra page table entries to be managed.  I swear I'm not making
this up, and I have data to prove it.

But first, let's examine the status quo.  Specifying `--with-readline'
to the rc configure script ends up simply appending `-lreadline' to the
link command line (and maybe `-ltermcap' or similar).  This will link rc
against readline either dynamically, if the linker finds a
libreadline.so, or statically, if the linker only finds libreadline.a or
you said `-static' or similar.

The INSTALL documentation has this advice to offer.

    It is a good idea to specify static linking (unless you are linking
    against a dynamic readline library)---if you are using gcc, you can
    do this by setting `CFLAGS=-static'.

My assumption when I wrote this was that readline is big enough that
it's always worth linking against it dynamically if you can.
That was a clear violation of the rule: "profile, don't speculate!", so
now let me make amends...

I took an rc script that does nothing (makes no system calls) except
fork() and wait() 10000 times.  Dividing the total CPU time to run the
script by 10000 gives an average loop time.

There are two variables of interest: static versus dynamic linking, and
no-readline versus readline; giving four variants of rc to try.
(Remember that rc always needs the C library.)  Here are the results I
got on my 200MHz Linux PC (us = microseconds).

    rc.static         744us
    rc.rl.static      865us
    rc.dynamic       1071us
    rc.rl.dynamic    1442us

(The raw data, together with file sizes, are appended.  Lest you think
that this is a quirk of Linux, I have even more dramatic results from
SunOS 5.  Why does dynamic linking increase the user time?  I have no
idea, but it reliably does on both these platforms.)

My conclusion from this is that using readline is an insignificant
(although measurable) performance hit.  But dynamic linking is always a
bad idea.

I therefore intend to drop my suggestion that rc should itself load
readline on demand, and instead to make the following minor changes for
rc-1.6b3.

1. Modify the INSTALL documentation; we should always recommend linking
   statically.

2. Modify configure.in, so that, if we are using gcc, `-static' is
   specified.

Any comments?

Tim.
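
For anyone who wants to try a similar measurement, here is a minimal
sketch of a fork()/wait() timing loop.  It is written in Python rather
than rc, so it times the Python interpreter's fork() and not any of the
four rc binaries; the function name and the iteration count are made up
for illustration.  It is only meant to show the shape of the test: fork
a child that exits at once, reap it, and divide the total CPU time by
the number of loops.

```python
import os


def fork_wait_loop_us(iterations):
    """Fork and reap `iterations` children.

    Returns the average CPU time per loop in microseconds, counting
    user + system time for this process and for its reaped children.
    """
    before = os.times()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            os._exit(0)        # child: exit immediately, no other work
        os.waitpid(pid, 0)     # parent: reap the child before looping
    after = os.times()
    # Fields 0..3 are user, system, children_user, children_system.
    cpu_seconds = sum(after[:4]) - sum(before[:4])
    return cpu_seconds / iterations * 1e6


if __name__ == "__main__":
    # 500 is an arbitrary illustrative count, far smaller than the real run.
    print("%.0fus per fork/wait loop" % fork_wait_loop_us(500))
```

The absolute number this prints says nothing about rc; the comparison
in the message only makes sense between differently linked builds of
the same program.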
   text    data     bss     dec     hex filename
 347773   13204   16008  376985   5c099 rc.static
12.58user 61.78system 1:14.52elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (100040major+2150702minor)pagefaults 0swaps

   text    data     bss     dec     hex filename
 427989   25748   19500  473237   73895 rc.rl.static
13.07user 73.42system 1:26.66elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (200044major+2281366minor)pagefaults 0swaps

   text    data     bss     dec     hex filename
  65580    8492   11360   85432   14db8 rc.dynamic
21.16user 85.95system 1:47.22elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (200100major+2449872minor)pagefaults 0swaps

   text    data     bss     dec     hex filename
  67213    8532   11552   87297   15501 rc.rl.dynamic
30.07user 114.15system 2:24.65elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (200117major+2827443minor)pagefaults 0swaps
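
The per-loop figures quoted in the message can be checked against these
totals by adding user and system time and dividing by the loop count.
One caveat: the message says the loop ran 10000 times, but the totals
reproduce the quoted averages only when divided by 100000 (which would
also fit the roughly 100000 major page faults in the rc.static run).
The divisor below is therefore an inference from the numbers, not
something the message states.

```python
# (user seconds, system seconds) copied from the time output above.
RAW = {
    "rc.static": (12.58, 61.78),
    "rc.rl.static": (13.07, 73.42),
    "rc.dynamic": (21.16, 85.95),
    "rc.rl.dynamic": (30.07, 114.15),
}

# Assumption: inferred from the data, the message itself says 10000.
ITERATIONS = 100000


def per_loop_us(user, system, iterations=ITERATIONS):
    """Average CPU time per fork/wait loop, in microseconds."""
    return (user + system) / iterations * 1e6


if __name__ == "__main__":
    for name, (user, system) in RAW.items():
        print("%-14s %5.0fus" % (name, per_loop_us(user, system)))
```

Rounded to whole microseconds this gives 744, 865, 1071 and 1442, the
same values as the table in the message.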