Date: Thu, 29 Apr 2004 08:23:16 -0500
From: John Goerzen
To: Benjamin Geer
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Re: Common IO structure
Message-ID: <20040429132316.GB11323@excelhustler.com>

On Thu, Apr 29, 2004 at 12:23:03PM +0100, Benjamin Geer wrote:
> >InputStreamReader, OutputStream, OutputStreamWriter, RandomAccessFile,
> >Reader, or Writer. Really, I literally *do not know how to open a
> >simple file*. I would not call that intuitive.
>
> You actually have to *read* the documentation, not just glance at the
> class names. :) That's to be expected with a powerful API. Once you

We were talking about being intuitive here. I'd have to read maybe a
dozen different class descriptions, understand the differences between
them, and cross-reference back and forth between them to figure out how
to get the object I want. Or pay for a Java book that describes these
relationships.

> understand the key concepts governing the design of the API, it makes
> sense, and it becomes intuitive to select the classes you need. I tried
> to point out these concepts in the message you replied to.

Even when I lived and breathed Java every day for a year, its I/O API
was not intuitive. Actually, most of its APIs were not intuitive.
Everybody I worked with had the Javadoc for the API bookmarked and
referred to it constantly. In contrast, the API in Python, Perl, or
even C is very easy to use.

Here's one of the problems.
Java's API makes it complex to do simple things without simplifying
complex things. Recall my example -- being able to open a file
read/write and seek around in it? In C, I'd do:

    int file = open(filename, O_RDWR);

or

    FILE * file = fopen(filename, "r+");

In Perl, it is:

    open(FH, "+<", $filename);

Or, in Python:

    file = open(filename, "r+")

If I don't know my language's code for file modes, I have one simple
place to look.

Now, none of these examples do UTF-8 or other conversions. That's
fine. I usually don't need that. In fact, I dare say that any kind of
conversion like that is by far the minority case. Java requires me to
wade through and think about all of these things, plus the way the file
will eventually be used (do I want an array of bytes, an array of
chars, strings, etc.?), right when I open it. That's bad form. Make
the open a generic call, and let people build upon the file object from
there. This is how C and Python work. (Perl is a little wacko with
its open call, but it works that way too, mostly.)

> To read a file containing UTF-8 text, one line at a time:
>
> BufferedReader in =
>     new BufferedReader
>         (new InputStreamReader
>             (new FileInputStream(filename), "UTF8"));
>
> while (true)
> {
>     String line = in.readLine();
>
>     if (line == null)
>     {
>         break;
>     }
>
>     System.out.println(line);
> }

But the scary part is that this is about how hard it is to read a file
of ASCII text, one line at a time.

Whereas, with Python, I'd do:

    for line in open(filename, "r").xreadlines():
        print line

See what I mean about intuitive?

But what about UTF-8 in Python?

    import codecs
    file = codecs.open(filename, "r", "UTF-8")
    for line in file.xreadlines():
        print line

By all means, if we are going to emulate a design from another
language, let us emulate this one. It is far cleaner and more
sensible. For more info, see
file:/usr/share/doc/python2.3/html/lib/module-codecs.html.
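
As an aside, here is a minimal sketch of those two ideas in present-day
Python 3 rather than the Python 2.3 of this thread. That substitution is
an assumption, and the helper names overwrite_at and read_utf8_lines are
made up for illustration. The first helper uses one generic open call in
"r+" mode and seeks around in the file. The second layers UTF-8 decoding
on top of an ordinary binary file object with codecs.getreader, instead
of picking a reader class at open time.

```python
import codecs
import os
import tempfile


def overwrite_at(path, offset, data):
    """Open an existing file read/write, patch bytes at offset, return all bytes."""
    # One generic open call; "r+" means read/write, "b" means raw bytes.
    with open(path, "r+b") as f:
        f.seek(offset)   # jump to the position to patch
        f.write(data)    # overwrite in place
        f.seek(0)        # rewind and read everything back
        return f.read()


def read_utf8_lines(path):
    """Layer UTF-8 decoding on top of a plain binary file object."""
    with open(path, "rb") as raw:
        # codecs.getreader returns the StreamReader class for the codec;
        # instantiating it wraps the already-open file object.
        reader = codecs.getreader("utf-8")(raw)
        return [line.rstrip("\n") for line in reader]


if __name__ == "__main__":
    # Hypothetical scratch file, created only for this demonstration.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write("h\u00e9llo\nworld\n".encode("utf-8"))
        print(overwrite_at(path, 7, b"W"))
        print(read_utf8_lines(path))
    finally:
        os.remove(path)
```

Because the decoding wrapper is added after the open, code that only
needs bytes never has to think about encodings at all.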

Essentially, the codecs.open call opens a file handle and returns a
StreamReader object that has the file handle passed in to it.

But here's the key: this StreamReader object is itself a "file-like
object" in Python parlance. That means you can use it everywhere you
could have used a standard file object (assuming the code is capable of
handling Unicode strings, which it usually is.)

So you still have the helpful abstraction of Java without all the mess.

And before you say, "See, Java has a StreamReader too!", note that
codecs defines *4* classes: StreamWriter, StreamReader,
StreamReaderWriter, and StreamRecoder. I can handle that.

> functionality (like buffering). All the classes whose names end in
> 'Stream' deal with bytes only; the ones whose names end in 'Reader' or
> 'Writer' deal with characters. See? It's easy once you know the pattern.

But the point is, this distinction is at the wrong place.

-- John