caml-list - the Caml user's mailing list
From: Nuutti Kotivuori <naked+caml@naked.iki.fi>
To: skaller@users.sourceforge.net
Cc: caml-list@pauillac.inria.fr
Subject: Re: [Caml-list] Bug with really_input under cygwin
Date: Thu, 11 Mar 2004 07:02:52 +0200
Message-ID: <87znao0ydf.fsf@aka.i.naked.iki.fi>
In-Reply-To: <1078976542.2452.106.camel@pelican.wigram>

skaller@users.sourceforge.net wrote:
> On Thu, 2004-03-11 at 02:25, Nuutti Kotivuori wrote:
>> Luckily not everybody sees the world as glum :-)
>
> I'm not seeing it as glum. I'm pointing out that today the situation
> is vastly more complex due to belated recognition of the need for
> Standards to support I18N issues.
>
> Because of this the idea that \r\n <-> \n is the only real encoding
> issue across platforms is wrong.  If only that were the case today,
> it would be a trivial problem to resolve.
>
> For example, text files may contain certain header bytes that
> indicate if the file is UTF8 encoded, or UCS-2 with big or little
> endian: these bytes if found must not be considered as 'text',
> they're just encoding indicators.
>
> Even within Unicode/ISO-10646 there are myriad 'encoding' problems,
> the famous ones being the use of combining characters -- and that's
> *after* you have found the ISO10646 code points :)
>
> So, if you want to handle *text* in a portable way, you have some
> work ahead of you. Don't even try to render it correctly, the
> required algorithm competes with Mr Ackermann in performance :D
>
> As long as these kinds of comments are labelled as 'rants' people
> will continue to write non-portable software and fail to face up to
> the issues.

I have left the entire text here quoted to point out the difference in
subjects.

Sure, handling *text* is a really, really complex beast in today's
world. I end up fighting with those problems almost daily. You are
preaching to the choir.

But - there's nothing ambiguous about slurping an entire file into a
string. And there's nothing complex about doing that portably.
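
As a minimal sketch of such a slurp in OCaml, assuming a regular file and
a reasonably recent standard library (Fun.protect and
really_input_string are not from the original thread; the name slurp is
made up for illustration). The channel is opened in binary mode, so no
end-of-line translation is applied and the length reported by
in_channel_length matches the number of bytes that can be read:

```ocaml
(* Read a whole file into a string as raw bytes.
   Assumes [path] names a regular file; [slurp] is a hypothetical helper. *)
let slurp path =
  (* Binary mode: no \r\n <-> \n translation on any platform. *)
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () -> really_input_string ic (in_channel_length ic))
```

With a channel opened in text mode via open_in, the size returned by
in_channel_length does not account for end-of-line translation, so the
same read can come up short on platforms that perform it.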

Encodings, byte-order-marks, combining characters, text printing and
all that do not enter into it. The \r\n <-> \n translation issue is
the first portability hurdle, since it affects plain byte input and
output, regardless of implications for text. String as an array of
characters is a really complex beast to handle. String as an array of
bytes is trivial to handle.

And the encoding issues do not suddenly make 'md5sum' any less
portable. Or 'rsync'. Or 'wget'. But the \r\n <-> \n issue does.
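
To illustrate why a checksum tool is affected, here is a small sketch
using the standard Digest module (MD5). The strip_cr helper is
hypothetical: it only imitates what a text-mode read does on a platform
that maps \r\n to \n, and the sample data is made up.

```ocaml
(* Hypothetical helper: drop the '\r' of every "\r\n" pair, imitating a
   text-mode read on a platform that translates line endings. *)
let strip_cr s =
  let n = String.length s in
  let b = Buffer.create n in
  String.iteri
    (fun i c ->
      if c = '\r' && i + 1 < n && s.[i + 1] = '\n' then ()
      else Buffer.add_char b c)
    s;
  Buffer.contents b

(* Made-up sample contents with CRLF line endings. *)
let raw = "line one\r\nline two\r\n"
let translated = strip_cr raw

let () =
  (* Same "text", different bytes, hence different digests. *)
  Printf.printf "raw:        %s\n" (Digest.to_hex (Digest.string raw));
  Printf.printf "translated: %s\n" (Digest.to_hex (Digest.string translated))
```

The two digests differ because the byte sequences differ, whatever the
encoding of the text happens to be.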

-- Naked



