From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id SAA07674; Thu, 29 Apr 2004 18:39:51 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id SAA07672 for ; Thu, 29 Apr 2004 18:39:49 +0200 (MET DST) Received: from smtp.mbg.ocn.ne.jp (mbg.ocn.ne.jp [210.190.142.181]) by nez-perce.inria.fr (8.12.10/8.12.10) with ESMTP id i3TFVfjq004403 for ; Thu, 29 Apr 2004 17:31:41 +0200 Received: from localhost (p48100-adsau14honb7-acca.tokyo.ocn.ne.jp [221.187.101.100]) by smtp.mbg.ocn.ne.jp (Postfix) with ESMTP id 0653260FB; Fri, 30 Apr 2004 00:31:40 +0900 (JST) Date: Fri, 30 Apr 2004 00:31:22 +0900 (JST) Message-Id: <20040430.003122.28779093.yoriyuki@mbg.ocn.ne.jp> To: jgoerzen@complete.org Cc: ben@socialtools.net, caml-list@inria.fr Subject: Re: [Caml-list] Re: Common IO structure From: Yamagata Yoriyuki In-Reply-To: <20040429140240.GA12640@excelhustler.com> References: <20040429130335.GA11323@excelhustler.com> <20040429.224036.75426712.yoriyuki@mbg.ocn.ne.jp> <20040429140240.GA12640@excelhustler.com> X-Mailer: Mew version 3.3 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Miltered: at nez-perce by Joe's j-chkmail ("http://j-chkmail.ensmp.fr")! X-Loop: caml-list@inria.fr X-Spam: no; 0.00; caml-list:01 yamagata:01 yoriyuki:01 yoriyuki:01 caml-list:01 2004:99 2004:99 0900,:01 yamagata:01 readline:01 encodings:01 implemented:01 eol:01 stateful:01 eol:01 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk From: John Goerzen Subject: Re: [Caml-list] Re: Common IO structure Date: Thu, 29 Apr 2004 09:02:40 -0500 > On Thu, Apr 29, 2004 at 10:40:36PM +0900, Yamagata Yoriyuki wrote: > > > > > > OK, but then you can leave out readline(), readlines() and xreadlines(), > > > > > > because they don't make any sense unless you've already dealt with > > > > > > character encodings. > > > > > > > > > > No, they can simply be implemented in terms of read(). > > > > > > > > It will break when UTF-16/UTF-32 are used. The line separator should > > > > be handled after code conversion. At least that is the idea of > > > > Unicode standard. (But Since Unicode standard is challenged by > > > > reality in every aspect, maybe nobody cares.) > > > > > > You are missing the point. read() could handle the code conversion. > > > > No, what I wanted to say is that the line separator should be handled > > in the Unicode level, not the byte-character level. Your design > > assumes read() always returns new line characters as in ASCII. This > > would not hold when read() returns UTF-16/UTF-32. > > I don't see why that is the case. If read() returns UTF-16 data, > readlines() works with it, and would of course be scanning it for a > UTF-16 EOL character or string. I don't see where that's the problem. Encoding could be stateful, so there would be no single representation of EOL. (*) Ok, this is very unlikely case currently, but I think there is an interesting encoding for Unicode which is fully stateful. So, readlines() needs to fully aware of the encoding. My proposal is mainly for sharing common channel types among libraries, so that a user can pass a channel from a libraries to anonther withoug writing a glue code. Since parsing endline, or loading the whole file into the string mainly occurs in the endpoint of IO, I do not think standardizing them are necessary for this purpose. I do not think standardizing the endpoint API is important, because I think that in the end, we will use only one library as the endpoint of IO. (*) IIRC, RFC defines the endianness of UTF-16 is swapped in the middle of the stream, when "BOM" 0xfffe appears. -- Yamagata Yoriyuki ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners