caml-list - the Caml user's mailing list
From: Brian Hurt <bhurt@spnz.org>
To: John J Lee <jjl@pobox.com>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Executable size?
Date: Wed, 12 Nov 2003 14:40:56 -0600 (CST)
Message-ID: <Pine.LNX.4.44.0311121406060.5009-100000@localhost.localdomain>
In-Reply-To: <Pine.LNX.4.58.0311121837540.2472@alice>

On Wed, 12 Nov 2003, John J Lee wrote:

> On Wed, 12 Nov 2003, Brian Hurt wrote:
> 
> > On Wed, 12 Nov 2003, Richard Jones wrote:
> [...]
> > > This is not a criticism of OCaml, but the executables do tend to be
> > > quite large. This seems mainly down to the fact that OCaml links the
> > > runtime library in statically. There was previous discussion on this
> [...]
> > This isn't as bad as it sounds.  A simplistic "hello world!" application
> > in Ocaml weighs in at 112K, versus 11K for the equivalent (dynamically
> > linked) C program- almost entirely either statically linked standard
> > libraries and infrastructure (garbage collections, etc.)- stuff that
> > doesn't expand with larger programs.
> 
> OK.  Is that 100K difference for "hello world" (which doesn't necessarily
> stay the same for larger programs, as you say below) simply a result of
> the fact that C has the "unfair" advantage of already having its runtime
> sitting on everyone's hard drive already?

Actually, I think Ocaml uses C's runtime libraries and builds on top of
them.  For example, if I understand things correctly, Ocaml's printf is a
wrapper which calls C's printf.  Which is why I haven't bothered comparing
Ocaml's size to C programs being statically linked.  Ocaml is at least
nice enough to only link libraries you are actually using (see the
print_string v. printf results).

In addition to a more complicated and complete standard library and 
builtins, Ocaml also has garbage collection, which is non-trivial to 
implement.  I wouldn't be surprised if half or more of that 100K of 
overhead is just the GC.  Currying, exceptions, etc. also have small size 
penalties.

On the other hand, I would argue that these features, while bloating the 
smallest applications, let the code of larger applications be smaller, 
which is exactly the sort of thing small "benchmark" programs don't show.  
I don't know how many times I've read or written C code like:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int copy_file(const char * src, const char * dst) {
    char * buf;
    FILE * inf;
    FILE * outf;
    int err;

    if ((src == NULL) || (dst == NULL)) {
        return EINVAL;
    }

    inf = fopen(src, "rb");
    if (inf == NULL) {
        return errno;
    }

    outf = fopen(dst, "wb");
    if (outf == NULL) {
        err = errno;    /* save errno first: fclose may overwrite it */
        fclose(inf);
        return err;
    }

    buf = (char *) malloc(4096);
    if (buf == NULL) {
        err = errno;
        fclose(outf);
        fclose(inf);
        return err;
    }

    blah blah blah you get the idea

Versus the same code in Ocaml:

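let copyfile src dst =
    let inf = open_in_bin src in
    let outf = open_out_bin dst in
    (* mutable 4096-byte buffer shared by input and output *)
    let buf = Bytes.create 4096 in
    let rec loop () =
        (* input returns 0 at end of file *)
        let c = input inf buf 0 4096 in
        if (c > 0) then
            begin
                output outf buf 0 c;
                loop ()
            end
        else
            ()
    in
    loop ();
    (* close_out flushes any buffered output before closing *)
    close_out outf;
    close_in inf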

The Ocaml executable code for the copyfile function will be smaller than 
that of the C version, simply because the Ocaml version takes advantage of 
various features of the larger Ocaml library and infrastructure- especially 
(in this case) exceptions and garbage collection.  
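
To make the role of exceptions concrete, here is a minimal sketch rather than a definitive implementation.  It repeats the copy function so that it stands alone, and adds a wrapper, try_copy, which is a hypothetical name introduced only for this illustration.  It assumes a current OCaml with the Bytes module and the result type.  A failure in any of the I/O calls raises Sys_error, so one handler stands in for the check after every call in the C version.  For brevity the sketch does not close the channels when an exception is raised.

```ocaml
(* Same copy loop as in the post; no per-call error checks are needed. *)
let copyfile src dst =
  let inf = open_in_bin src in
  let outf = open_out_bin dst in
  let buf = Bytes.create 4096 in
  let rec loop () =
    let c = input inf buf 0 4096 in
    if c > 0 then begin
      output outf buf 0 c;
      loop ()
    end
  in
  loop ();
  close_out outf;
  close_in inf

(* try_copy is a hypothetical wrapper: a single handler catches a
   Sys_error raised by any of the I/O calls above. *)
let try_copy src dst =
  try
    copyfile src dst;
    Ok ()
  with Sys_error msg -> Error msg
```

A caller can then match on the result, for example printing the message carried by Error, instead of threading errno values back through every layer.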

> 
> 
> > A naive assumption would be that an Ocaml program is about 100K or so
> > larger than the equivalent C program.  Not much, considering how easy it
> > is to get executables multiple megabytes in size.
> 
> [...]
> > Ocaml gets a lot more code reuse, and thus can actually lead to smaller
> > executables.
> 
> I don't understand what you mean by that (probably my fault).  What do you
> mean by "code reuse" here?  I usually understand that phrase to mean using
> code written by people other than me, but you seem to mean it in a
> different sense.
> 

I was using it in the most literal sense- using code more than once, in
more than one way.  In general, it's much better to have only one copy of
a function, used in two places, than two copies of the function.  The 
trick is that generally the two copies are not exactly identical- if 
the functions compute, for example, the length of a linked list, one 
function might operate on a linked list of integers, another on a linked 
list of floats.  Ocaml encourages you to program in a generic way- you 
actually have to work at it to write a linked list length routine that 
*isn't* generic; the naive implementation is generic (and so is the 
optimized version).
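
As a small illustration of the list-length point (a sketch only; the names length and length_tr are chosen here and are not from the thread), the naive routine is inferred as 'a list -> int without any annotation, so one compiled copy serves lists of integers, floats, or anything else.  The tail-recursive variant stands in for "the optimized version" and gets the same generic type.

```ocaml
(* Naive length: OCaml infers 'a list -> int, so this single
   definition works for a list of any element type. *)
let rec length lst =
  match lst with
  | [] -> 0
  | _ :: rest -> 1 + length rest

(* Tail-recursive variant; its type is still 'a list -> int. *)
let length_tr lst =
  let rec go acc = function
    | [] -> acc
    | _ :: rest -> go (acc + 1) rest
  in
  go 0 lst

let () =
  (* The same function applied to an int list and a float list. *)
  Printf.printf "%d %d\n" (length [1; 2; 3]) (length [1.5; 2.5]);
  Printf.printf "%d\n" (length_tr ["a"; "b"; "c"; "d"])
```

The equivalent in plain C usually means either one routine per element type or a void-pointer list that gives up type checking.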

Again, this generally isn't a problem in small programs, which easily fit 
into your brain as a whole.  Code reuse becomes more of a trick in moderate 
to large programs, especially moderate to large programs with more than 
one programmer.  How many times have we reimplemented linked lists in C?

> 
> > Unless you have special constraints, the difference between C program
> > sizes and Ocaml program sizes are not enough to be worth worrying about.
> 
> I don't really agree that the problem of distributing simple (few lines of
> code) applications in small executables is all that "special".  Certainly
> there are *many* applications where you don't need that; equally, there
> are quite a few where you do need/want that.

I was thinking of special cases where a difference of 100K or 1M or so 
is the difference between working and not working.  If you are, for 
example, trying to fit your program on a 512K ROM, Ocaml's overhead might 
be a problem.  

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

