From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id EAA04786; Thu, 11 Mar 2004 04:38:33 +0100 (MET) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id EAA04407 for ; Thu, 11 Mar 2004 04:38:32 +0100 (MET) Received: from smtp1.adl2.internode.on.net (smtp1.adl2.internode.on.net [203.16.214.181]) by nez-perce.inria.fr (8.12.10/8.12.10) with ESMTP id i2B3cnKW009017 for ; Thu, 11 Mar 2004 04:38:50 +0100 Received: from [192.168.1.200] (ppp114-118.lns1.syd3.internode.on.net [150.101.114.118]) by smtp1.adl2.internode.on.net (8.12.9/8.12.9) with ESMTP id i2B3cDwn033000; Thu, 11 Mar 2004 14:08:14 +1030 (CST) Subject: Re: [Caml-list] Bug with really_input under cygwin From: skaller Reply-To: skaller@users.sourceforge.net To: Nuutti Kotivuori Cc: Eric Dahlman , skaller@users.sourceforge.net, caml-list@pauillac.inria.fr In-Reply-To: <87hdww4tc9.fsf@aka.i.naked.iki.fi> References: <51FE3429-7219-11D8-8BF5-000393914EAA@atcorp.com> <1078888018.2452.52.camel@pelican.wigram> <87hdww4tc9.fsf@aka.i.naked.iki.fi> Content-Type: text/plain Message-Id: <1078976542.2452.106.camel@pelican.wigram> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-4) Date: 11 Mar 2004 14:42:23 +1100 Content-Transfer-Encoding: 7bit X-Miltered: at nez-perce by Joe's j-chkmail ("http://j-chkmail.ensmp.fr")! X-Loop: caml-list@inria.fr X-Spam: no; 0.00; caml-list:01 bug:01 cygwin:01 sourceforge:01 2004:99 vastly:01 ucs-:01 endian:01 10646:01 10646:01 ackermann:01 non-portable:01 9660:01 glebe:01 unicode:01 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk X-Status: X-Keywords: X-UID: 110 On Thu, 2004-03-11 at 02:25, Nuutti Kotivuori wrote: > > even if you're processing text. Never depend on the > > language or OS conversion functions, its very unlikely > > they'll be right. Do all the conversions needed yourself. > > At least when you find a problem you're not handling > > correctly you can fix it. > > Luckily not everybody sees the world as glum :-) I'm not seeing it as glum. I'm pointing out that today the situation is vastly more complex due to belated recognition of the need for Standards to support I18N issues. Because of this the idea that \r\n <-> \n is the only real encoding issue across platforms is wrong. If only that were the case today, it would be a trivial problem to resolve. For example, text files may contain certain header bytes that indicate if the file is UTF8 encoded, or UCS-2 with big or little endian: these bytes if found must not be considered as 'text', they're just encoding indicators. Even within Unicode/ISO-10646 there are myrriad 'encoding' problems, the famous ones being the use of combining characters -- and that's *after* you have found the ISO10646 code points :) So, if you want to handle *text* in a portable way, you have some work ahead of you. Don't even try to render it correctly, the required algorithm competes with Mr Ackermann in performance :D As long as these kinds of comments are labelled as 'rants' people will continue to write non-portable software and fail to face up to the issues. -- John Skaller, mailto:skaller@users.sf.net voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners