caml-list - the Caml user's mailing list
From: Nuutti Kotivuori <naked+caml@naked.iki.fi>
To: Eric Dahlman <edahlman@atcorp.com>
Cc: skaller@users.sourceforge.net, caml-list@pauillac.inria.fr
Subject: Re: [Caml-list] Bug with really_input under cygwin
Date: Wed, 10 Mar 2004 17:25:42 +0200
Message-ID: <87hdww4tc9.fsf@aka.i.naked.iki.fi>
In-Reply-To: <1078888018.2452.52.camel@pelican.wigram> (skaller@users.sourceforge.net's message of "10 Mar 2004 14:06:59 +1100")

skaller@users.sourceforge.net wrote:
> On Wed, 2004-03-10 at 09:30, Eric Dahlman wrote:
>> Howdy all,
>>
>> I have some code which reads in a whole file and returns it
>> as a string.

If you have a master's degree in reading between the lines of a rant,
you probably picked out the right answer from the text below. But here
it is as a simple answer:

  Loop doing 'input' on the file, until 'input' returns zero.

'really_input' is of course nice and easy, but since you have no
reliable way of knowing in advance how large the entire file is going
to be, you need to decide on a buffer size anyway.
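
The loop described above could be sketched roughly as follows. This is
only an illustration, not code from the thread: the names 'read_all',
'read_file' and the 4096-byte default chunk size are my own
assumptions, and it uses the Bytes, Buffer and Fun modules from a
current OCaml standard library rather than what was available in 2004.

```ocaml
(* Hypothetical helper: read everything remaining on [ic] by calling
   [input] repeatedly until it returns 0. *)
let read_all ?(chunk_size = 4096) ic =
  let chunk = Bytes.create chunk_size in
  let acc = Buffer.create chunk_size in
  let rec loop () =
    (* [input] may return fewer bytes than requested; only a return
       value of 0 signals end of file. *)
    let n = input ic chunk 0 chunk_size in
    if n > 0 then begin
      Buffer.add_subbytes acc chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents acc

(* Hypothetical wrapper: open [path] in binary mode, read it whole,
   and close the channel even if reading raises. *)
let read_file path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () -> read_all ic)
```

Because the result grows in a Buffer, the chunk size only affects how
many calls to 'input' are made, not whether the whole file is read.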

Binary or non-binary mode only affects the \r\n -> \n translation while
reading the file - and vice versa while writing.

> The only correct way to do this is to read a block at a time
> until you get a partial block.
>
> This is so EVEN in 'binary' mode, which is just another
> ill conceived Unix hack :-)

[...]

> It is unfortunate that C and Unix do not provide a coherent
> abstraction in this area. Even binary I/O is ill-conceived:

[...]

> C has been plagued by extremely ill considered functions.
> Even the basic IO operation is not correctly defined.

[...]

> There is no such thing as 'the number of characters
> in a file'. Perhaps there is a number of bytes in a file.

[...]

> In MS-DOS, files *always* consist of a number of 256
> byte blocks. It is impossible to have a file with
> a non-256 byte multiple size. Of course, text files
> uses an encoding with a Ctrl-Z at the end.

[...]

> Under Linux, the Standard for text encoding is UTF-8.

[...]

> I personally believe the easiest way to work around this
> quagmire of malspecification is to 
>
> (a) ONLY use 8 bit binary I/O
> (b) ALWAYS read and write bytes
>
> even if you're processing text. Never depend on the
> language or OS conversion functions, its very unlikely
> they'll be right. Do all the conversions needed yourself.
> At least when you find a problem you're not handling
> correctly you can fix it.

Luckily not everybody sees the world as glum :-)

-- Naked
