From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id GAA08764; Thu, 11 Mar 2004 06:02:55 +0100 (MET) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id GAA08124 for ; Thu, 11 Mar 2004 06:02:54 +0100 (MET) Received: from gw01.mail.saunalahti.fi (gw01.mail.saunalahti.fi [195.197.172.115]) by concorde.inria.fr (8.12.10/8.12.10) with ESMTP id i2B52rHd032310 for ; Thu, 11 Mar 2004 06:02:53 +0100 Received: from aka.i.naked.iki.fi (naked.iki.fi [62.142.249.112]) by gw01.mail.saunalahti.fi (Postfix) with ESMTP id 015FC111D0A3; Thu, 11 Mar 2004 07:02:53 +0200 (EET) Received: from naked by aka.i.naked.iki.fi with local (Exim 3.36 #1 (Debian)) id 1B1ILM-0004LP-00; Thu, 11 Mar 2004 07:02:52 +0200 To: skaller@users.sourceforge.net Cc: caml-list@pauillac.inria.fr Subject: Re: [Caml-list] Bug with really_input under cygwin References: <51FE3429-7219-11D8-8BF5-000393914EAA@atcorp.com> <1078888018.2452.52.camel@pelican.wigram> <87hdww4tc9.fsf@aka.i.naked.iki.fi> <1078976542.2452.106.camel@pelican.wigram> From: Nuutti Kotivuori Date: Thu, 11 Mar 2004 07:02:52 +0200 Message-ID: <87znao0ydf.fsf@aka.i.naked.iki.fi> User-Agent: Gnus/5.1006 (Gnus v5.10.6) XEmacs/21.4 (Security Through Obscurity, linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Miltered: at concorde by Joe's j-chkmail ("http://j-chkmail.ensmp.fr")! X-Loop: caml-list@inria.fr X-Spam: no; 0.00; caml-list:01 bug:01 cygwin:01 sourceforge:01 2004:99 vastly:01 ucs-:01 endian:01 10646:01 10646:01 ackermann:01 non-portable:01 beast:01 portably:01 encodings:01 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk X-Status: X-Keywords: X-UID: 112 skaller@users.sourceforge.net wrote: > On Thu, 2004-03-11 at 02:25, Nuutti Kotivuori wrote: >> Luckily not everybody sees the world as glum :-) > > I'm not seeing it as glum. I'm pointing out that today the situation > is vastly more complex due to belated recognition of the need for > Standards to support I18N issues. > > Because of this the idea that \r\n <-> \n is the only real encoding > issue across platforms is wrong. If only that were the case today, > it would be a trivial problem to resolve. > > For example, text files may contain certain header bytes that > indicate if the file is UTF8 encoded, or UCS-2 with big or little > endian: these bytes if found must not be considered as 'text', > they're just encoding indicators. > > Even within Unicode/ISO-10646 there are myrriad 'encoding' problems, > the famous ones being the use of combining characters -- and that's > *after* you have found the ISO10646 code points :) > > So, if you want to handle *text* in a portable way, you have some > work ahead of you. Don't even try to render it correctly, the > required algorithm competes with Mr Ackermann in performance :D > > As long as these kinds of comments are labelled as 'rants' people > will continue to write non-portable software and fail to face up to > the issues. I have left the entire text here quoted to point out the difference in subjects. Sure, handling *text* is a really, really complex beast in today's world. I end up fighting with those problems almost daily. You are preaching to the choir. But - there's nothing ambiguous about slurping an entire file into a string. And there's nothing complex about doing that portably. Encodings, byte-order-marks, combining characters, text printing and all that do not enter into it. The \r\n <-> \n translation issue is the first portability hurdle, since it affects plain byte input and output, regardless of implications for text. String as an array of characters is a really complex beast to handle. String as an array of bytes is trivial to handle. And the encoding issues do not suddenly make 'md5sum' any less portable. Or 'rsync'. Or 'wget'. But the \r\n <-> \n issue does. -- Naked ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners