caml-list - the Caml user's mailing list
From: Benjamin Geer <ben@socialtools.net>
To: Yamagata Yoriyuki <yoriyuki@mbg.ocn.ne.jp>
Cc: warplayer@free.fr, caml-list@inria.fr
Subject: Re: [Caml-list] Re: Common IO structure
Date: Wed, 28 Apr 2004 00:35:26 +0100
Message-ID: <408EEE3E.7050008@socialtools.net>
In-Reply-To: <20040428.015800.126758722.yoriyuki@mbg.ocn.ne.jp>

Yamagata Yoriyuki wrote:
> I doubt the benefit of
> buffered IO, as I stated in the previous mail.  Unless the operation is
> very simple and the atoms are very small (that is, character IO), the
> extra cost of element-wise IO is not important.

But there are times when you want to read one byte or character at a 
time, and in those cases, buffering saves the overhead of a function or 
method call per byte/char.

Buffering is also useful when you have to process a very large amount of 
data, and cannot keep it all in memory at once.

> I'm interested in (potential) users of IO libraries.  Could someone
> comment on the IO systems of Java, Perl, Python, for example?

In Java there are two I/O libraries, the original one (java.io)[1] and 
the new one (java.nio)[2].  The old one has the virtue of being easy to 
understand and use, and flexible enough for many situations.  The basic 
InputStream and OutputStream classes deal only with bytes, have 
Unix-like 'read' and 'write' methods, and do no buffering.  There are 
derived classes such as FileInputStream, and the stream returned by 
Socket.getInputStream() is an InputStream as well.  The API 
allows you to add functionality to a stream by using wrappers.  For 
example, to add buffering to any InputStream, you wrap it in a 
BufferedInputStream (which is a class derived from InputStream).  To 
marshal Java objects to a byte stream, you wrap an OutputStream in an 
ObjectOutputStream, and pass objects to the ObjectOutputStream.
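To make the wrapper idea concrete, here is a minimal sketch.  The class and method names (StreamWrappers, countBytes, marshal, unmarshal) are made up for illustration, and it uses try-with-resources, which is newer than the 1.4.2 API linked below; the java.io classes themselves are the ones described above.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class StreamWrappers {
    // Reads one byte at a time. The BufferedInputStream wrapper serves most
    // read() calls from its internal buffer rather than the underlying stream.
    static int countBytes(InputStream raw) throws IOException {
        try (InputStream in = new BufferedInputStream(raw)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }

    // Marshals a serializable object by wrapping a plain OutputStream
    // in an ObjectOutputStream.
    static byte[] marshal(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // The reverse direction: wrap an InputStream in an ObjectInputStream.
    static Object unmarshal(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countBytes(new ByteArrayInputStream(new byte[10000])));
        System.out.println(unmarshal(marshal("hello")));
    }
}
```

The in-memory ByteArrayInputStream stands in for a file or socket stream here; the wrappers do not care which kind of stream they are given.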

Classes derived from Reader and Writer deal with characters, and can be 
wrapped around streams to perform conversions between bytes and 
characters.  For example, to read bytes and convert them to characters, 
you wrap an InputStream in an InputStreamReader, which has a constructor 
that says which encoding to read, and 'read' methods that return 
(Unicode) characters.  Another example of a Reader is LineNumberReader, 
which counts lines in its input.
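As a rough illustration of the Reader wrappers, the sketch below decodes UTF-8 bytes and counts lines.  ReaderWrappers and countLines are hypothetical names, and StandardCharsets is a later convenience; in 1.4 you would pass the encoding name "UTF-8" as a string instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.nio.charset.StandardCharsets;

public class ReaderWrappers {
    // Wraps a byte stream in an InputStreamReader (bytes -> characters),
    // then in a LineNumberReader, which counts lines as they are read.
    static int countLines(InputStream bytes) throws IOException {
        try (LineNumberReader reader = new LineNumberReader(
                new InputStreamReader(bytes, StandardCharsets.UTF_8))) {
            while (reader.readLine() != null) {
                // Nothing to do: readLine() advances the line counter.
            }
            return reader.getLineNumber();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "un\ndeux\ntrois\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(countLines(new ByteArrayInputStream(data)));
    }
}
```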

This is all fine as far as it goes, but it turns out to be cumbersome, 
and in some cases impossible, to implement certain things efficiently 
using this API.  The java.nio API solves these problems, but it is much 
more complicated to use.

For example, suppose you have to read a large amount of text from a 
network connection, convert it to another encoding, and save it in a 
file.  There's too much text to store all of it in memory at once, and 
you're dealing with a lot of network requests at the same time, so in 
any case you want to minimise the amount of memory used by each request. 
You'd like to be able to read about 4K at a time, convert the bytes to 
the target encoding, and write them to the file.  You could make a 4K 
byte array and use it as a buffer, but what if the input encoding is 
UTF-8?  You might get an incomplete character at the end of the buffer; 
if the UTF-8 decoder is expecting a complete string, it will choke.

The solution in java.nio is to have two different kinds of buffer 
classes: ByteBuffer and CharBuffer.  You can fill up a ByteBuffer, and 
use a CharsetDecoder to convert the bytes to Unicode characters; the 
decoder will read as many complete characters as it can, and put them in 
a CharBuffer.  You then 'compact' the ByteBuffer, which moves any 
remaining bytes to the beginning of the buffer, and start again. 
(Similarly, you can use a CharsetEncoder to convert the characters to bytes in 
the target encoding, filling up a ByteBuffer which you can then write to 
an output channel.)
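The fill / decode / compact cycle might look roughly like the sketch below.  ChunkedDecode, decodeAll and drain are names invented for the example, the result is collected in a StringBuilder only to keep it short (a real program would re-encode and write each chunk instead), and it assumes a buffer of at least 4 bytes so that one complete UTF-8 character always fits.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class ChunkedDecode {
    // Decodes UTF-8 from a channel through a small fixed-size buffer.
    // Assumes bufferSize >= 4 (the longest UTF-8 sequence).
    static String decodeAll(ReadableByteChannel in, int bufferSize) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer bytes = ByteBuffer.allocate(bufferSize);
        CharBuffer chars = CharBuffer.allocate(bufferSize);
        StringBuilder out = new StringBuilder();

        boolean eof = false;
        while (!eof) {
            eof = in.read(bytes) == -1;   // fill the free part of the buffer
            bytes.flip();                 // switch the buffer to reading
            CoderResult result;
            do {
                // Decodes as many complete characters as possible; an
                // incomplete trailing character is left in 'bytes'.
                result = decoder.decode(bytes, chars, eof);
                if (result.isError()) {
                    result.throwException();
                }
                drain(chars, out);
            } while (result.isOverflow());
            bytes.compact();              // move leftover bytes to the front
        }
        CoderResult flushed;
        do {
            flushed = decoder.flush(chars);
            drain(chars, out);
        } while (flushed.isOverflow());
        return out.toString();
    }

    // Appends the decoded characters to 'out' and empties the CharBuffer.
    private static void drain(CharBuffer chars, StringBuilder out) {
        chars.flip();
        out.append(chars);
        chars.clear();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);
        // A 4-byte buffer forces multi-byte characters to straddle reads.
        System.out.println(decodeAll(Channels.newChannel(new ByteArrayInputStream(data)), 4));
    }
}
```

Even this small loop needs flip, compact and clear in exactly the right places, which gives a feel for how easy it is to get the buffer state wrong.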

Some other useful things java.nio provides are:

* 'Direct' byte buffers.  'Given a direct byte buffer, the Java virtual 
machine will make a best effort to perform native I/O operations 
directly upon it. That is, it will attempt to avoid copying the buffer's 
content to (or from) an intermediate buffer before (or after) each 
invocation of one of the underlying operating system's native I/O 
operations.'

* Buffers that correspond to a memory-mapped region of a file.  This can 
be useful for dealing with huge files; it takes advantage of the 
operating system's support for memory-mapped files, where available.

* 'Scattering' channels.  'A scattering read operation reads, in a 
single invocation, a sequence of bytes into one or more of a given 
sequence of buffers. Scattering reads are often useful when implementing 
network protocols or file formats that, for example, group data into 
segments consisting of one or more fixed-length headers followed by a 
variable-length body.  Similar gathering write operations are defined in 
the GatheringByteChannel interface.'
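A small sketch of a scattering read into direct buffers follows.  The record layout (a 4-byte length header followed by a body of at most 64 bytes) and the names ScatterRead, writeRecord and readRecord are assumptions made up for the example, and it uses the later java.nio.file API to open the file; in 1.4 you would get the channel from a FileInputStream.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ScatterRead {
    // Writes one record in the hypothetical layout: 4-byte big-endian
    // length, then the body bytes.
    static void writeRecord(Path file, String body) throws IOException {
        byte[] data = body.getBytes(StandardCharsets.UTF_8);
        ByteBuffer record = ByteBuffer.allocate(4 + data.length);
        record.putInt(data.length).put(data);
        Files.write(file, record.array());
    }

    // Reads the record back with scattering reads: the channel fills the
    // header buffer first and then continues into the body buffer.
    static String readRecord(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer header = ByteBuffer.allocateDirect(4);
            ByteBuffer body = ByteBuffer.allocateDirect(64); // assumed maximum body size
            ByteBuffer[] parts = { header, body };
            while (channel.read(parts) > 0) {
                // Keep reading until end of file or both buffers are full.
            }
            header.flip();
            body.flip();
            byte[] data = new byte[header.getInt()];
            body.get(data); // throws BufferUnderflowException on a short body
            return new String(data, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("record", ".bin");
        try {
            writeRecord(file, "hello");
            System.out.println(readRecord(file));
        } finally {
            Files.delete(file);
        }
    }
}
```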

My own view is that the flexibility and efficiency permitted by java.nio 
are valuable, but that its complexity is a problem.  The behaviour of 
the buffer classes[3] is tricky to understand and therefore error-prone.

Ben

[1] http://java.sun.com/j2se/1.4.2/docs/api/java/io/package-summary.html

[2] http://java.sun.com/j2se/1.4.2/docs/api/java/nio/package-summary.html

[3] http://java.sun.com/j2se/1.4.2/docs/api/java/nio/Buffer.html
