From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id IAA03511; Tue, 13 Apr 2004 08:15:23 +0200 (MET DST)
X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id IAA04868 for <caml-list@pauillac.inria.fr>; Tue, 13 Apr 2004 08:15:22 +0200 (MET DST)
Received: from bureau14.utcc.utoronto.ca (bureau14.utcc.utoronto.ca [128.100.132.42])
	by concorde.inria.fr (8.12.10/8.12.10) with ESMTP id i3D6FKYM027846
	for <caml-list@inria.fr>; Tue, 13 Apr 2004 08:15:21 +0200
Received: from user-292.gradstudents.utoronto.ca ([142.151.171.36] EHLO trevorkrny6zst ident: IDENT-NOT-QUERIED [port 4518]) by bureau14.utcc.utoronto.ca with ESMTP id <890037-19763>; Tue, 13 Apr 2004 02:15:13 -0400
From: "Trevor Andrade" <trevor.andrade@utoronto.ca>
To: "'John Goerzen'" <jgoerzen@complete.org>, <caml-list@inria.fr>
Subject: RE: [Caml-list] Dynamically evaluating OCaml code
Date: Tue, 13 Apr 2004 02:15:06 -0400
Message-ID: <000001c4211e$aa699c40$24ab978e@trevorkrny6zst>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook, Build 10.0.2616
In-Reply-To: <20040408155833.GG30763@excelhustler.com>
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
Importance: Normal
X-Miltered: at concorde by Joe's j-chkmail ("http://j-chkmail.ensmp.fr")!
X-Loop: caml-list@inria.fr
X-Spam: no; 0.00; andrade:01 andrade:01 caml-list:01 dynamically:01 ideally:01 smalltalk:01 python:01 perl's:01 cpan:01 raa:99 repositories:01 raa:99 monitored:99 repositories:01 cygwin's:01 
Sender: owner-caml-list@pauillac.inria.fr
Precedence: bulk
X-Status: 
X-Keywords:                  
X-UID: 266


I think when discussing this we have to separate means and ends.  The
end we would like is to download a Caml distribution from the internet
and get everything we want, like IPv6, sockets, directory functions
etc., and probably also a lot of stuff we will never use (internet
connections are fast and disk space is cheap, so who cares if it's
big).  Ideally there should be one download where you get everything.
This download should also be fairly standardized, so that most people
download the same library, and it should be of high quality.

Why is this necessary?  Because history has shown it works.  One of the
main reasons languages like Lisp and Smalltalk are not very popular is
that they don't have large standardized libraries.  I also think that
one reason C++ is dying is that it does not have a good standardized
library.  Languages like Python are very popular precisely because they
do have large standard libraries.

Now the question is what means can give people large standardized
distributions with everything they need.  One way is just to have a big
standard library.  Another might be to split the standard library into
two parts: a standard core library and a separate extended standard
library.  A third way is to have a centralized repository like Perl's
CPAN or Ruby's RAA.

I think you have to be careful with large centralized repositories.
First, the code quality can suck (e.g. Ruby's RAA) unless they are
monitored well.  Second, with a large central repository you often
don't get standardized tools, because several alternative versions of
the same thing exist.  Third, downloading and installing from a
centralized repository should be easy, so there should be a tool for
doing this, like Cygwin's installation tool.  I should mention that
Cygwin's installation tool is extremely nice: it handles dependencies
and versioning, and does remote installation.  MiKTeX uses the Cygwin
tool to access the TeX archives, and in this case the centralized
repository method works pretty well.  However, in TeX it often happens
that separate packages you download have some similar functions.

I think Python's big standard library model is the best.  With this
method you get a nice big standardized distribution with everything you
need, which makes your life very easy.  In Python there are also many
things outside the standard library, and as their quality gets better
they often get put into the standard library.

It should also be noted that it's not really necessary for the core
developers to manage everything in the standard library.  In Python,
most of the standard library is not managed by the core developers, who
work on things like the language and the interpreter.  Instead, each
part is managed by either its author or some maintainer.  I don't see
why core developers have to maintain the standard library.  They should
just decide what makes it into the standard library.

Regards,
Trevor


-----Original Message-----
From: owner-caml-list@pauillac.inria.fr
[mailto:owner-caml-list@pauillac.inria.fr] On Behalf Of John Goerzen
Sent: Thursday, April 08, 2004 11:59 AM
To: Richard Jones; Ocaml Mailing List
Subject: Re: [Caml-list] Dynamically evaluating OCaml code

On Thu, Apr 08, 2004 at 05:26:54PM +0200, Markus Mottl wrote:
> On Thu, 08 Apr 2004, Richard Jones wrote:
> > which returns the first n members of a list.  As for slicing the
> > middle from a list, I tend to think that the original poster should
> > probably be using a different, more suitable structure.  Perhaps an
> > Array if he wants random access.
> 
> That's also my opinion: it's usually an indication of a bad choice
> concerning datastructure if you need such list functions.

One does not always have the choice of data structure.  Sometimes, and
this happens to be the case in several programs I've worked on recently,
the inadequacies arise when I'm trying to get data from an older system
into an OCaml-based one that does use more suitable structures.

For instance, consider this problem, which is a simplified version of a
real problem...  I am reading a line-oriented file delimited by "|".
It's easy enough to split this apart into a list (even though I must go
to Str and regexps to do it, grr).  
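
A minimal sketch of the split step, assuming an OCaml newer than the
one discussed in this thread: String.split_on_char was added to the
standard library in 4.04, so Str is not needed for a single-character
delimiter.  The name split_fields and the sample line are hypothetical.

```ocaml
(* Split one "|"-delimited line into its fields.
   Assumes OCaml >= 4.04 for String.split_on_char.
   Empty fields are kept, so field positions stay stable. *)
let split_fields (line : string) : string list =
  String.split_on_char '|' line

let () =
  (* Print each field in brackets so empty fields are visible. *)
  split_fields "a|b||d"
  |> List.iter (fun field -> Printf.printf "[%s]\n" field)
```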

Now, I know that I must ignore the first two and last three elements in
that list.  I do not know in advance what size the list will be, and it
varies from line to line, but it always has at least five elements.  I
also do not necessarily know the values for "2" and "3" at compile time,
though they are constant throughout execution of the program.  The order
of the elements is significant and must not be altered.  Each element is
the same type (ie, a String).  There is nothing in the data itself that
differentiates it.

Now, which OCaml data structure gives me the ability to easily pull out
such a slice?  As far as I can tell, there are no standard functions for
any of Array or List to do that.  I could write a function for either of
them to loop over it and get me what I want, but the point is that this
functionality is useful.
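
The hand-written function alluded to above could look roughly like
this.  It is only a sketch, not a standard library function: the name
trim and its labels ~front and ~back are hypothetical, and both counts
are ordinary runtime values, so they need not be known at compile time.

```ocaml
(* Drop the first [front] and the last [back] elements of a list,
   keeping the remaining elements in their original order.
   Returns [] when the list has too few elements. *)
let trim ~front ~back lst =
  (* Skip up to n leading elements. *)
  let rec drop n l =
    match l with
    | _ :: tl when n > 0 -> drop (n - 1) tl
    | _ -> l
  in
  (* Keep up to n leading elements; n <= 0 yields []. *)
  let rec take n l =
    match l with
    | x :: tl when n > 0 -> x :: take (n - 1) tl
    | _ -> []
  in
  let rest = drop front lst in
  take (List.length rest - back) rest

let () =
  let fields = ["h1"; "h2"; "a"; "b"; "c"; "t1"; "t2"; "t3"] in
  trim ~front:2 ~back:3 fields
  |> String.concat ","
  |> print_endline   (* prints a,b,c *)
```

The same shape works for arrays via Array.sub, with the bounds check
done by hand.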

Now, there are arguably other ways to do this; I could use a complex
regular expression to pull out the data in the center... ie:

  ([^|]*\|){2}(.*)(\|[^|]*){3}

may do the trick in this particular example, but not always.  (What if,
instead of lines, I am dealing with packets coming in off the network?)

One does not always get data handed to oneself on a platter, already
nicely ordered and suitable for storing in a nice structure (in any
language).  Hell, half the time my first step is to -- yes -- convert to
ASCII.  (I sometimes have to work with data coming from an AS/400, an
EBCDIC system that is thankfully being phased out)

> I absolutely agree with you!  But the point is: is this really so
> important to have it in the _standard_ library?  You didn't mention
> the word "standard" so I suppose you'd be perfectly happy if somebody
> wrote a fully-featured library for this kind of functionality?  And
> you'd rather like to see better "social tools" for making use of such
> contributions?

I don't really care if it comes in ocaml.tar.gz or not, for some of
these things.  Some of them absolutely should, especially more powerful
string/array/list slicing, IPv6, and POSIX interfaces.  But many of the
rest don't exist at all or require contortions to use.  (For instance,
there are two libraries out there named Extlib, both of which implement
some of what I want with little overlap between them.  How annoying.)

I see these as critical omissions from the actual standard library:

 * IPv6 and other socket-related and DNS-related problems
 * Lack of support for read/write handles in Pervasives (relegated to
   Unix and with complex interactions between the two)
 * String/array/list indexing/slicing improvements
 * Standard C/POSIX date/time functions
 * Path manipulation functions (about 75% of these are already there;
   I have no idea why the other 25% aren't)
 * mknod() and similar functions  (we have mkfifo(), after all...)

These are all enhancements of functionality already present in the
standard library.  Why would I be able to use the standard library for
IPv4 but have to go elsewhere for IPv6?  That doesn't really make sense.

-- John

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives:
http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ:
http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
