caml-list - the Caml user's mailing list
* [Caml-list] error opening large file
@ 2002-01-11 21:02 Garry Hodgson
  2002-01-11 21:51 ` Olivier Andrieu
  2002-01-15  9:56 ` Xavier Leroy
  0 siblings, 2 replies; 7+ messages in thread
From: Garry Hodgson @ 2002-01-11 21:02 UTC (permalink / raw)
  To: ocaml


i get the following error when i open a large file (2561435180 bytes):

   let chan = open_in( "com.zone" );;
   Uncaught exception:
   Sys_error
   "com.zone: Value too large for defined data type".

is there a 2G file size limitation?  if so, why?
i found the same bug in erlang, though C is ok with it.

this is on redhat 7.1 running ocaml 3.02

thanks

-- 
Garry Hodgson                   Let my inspiration flow
Senior Hacker                      in token rhyme suggesting rhythm
Software Innovation Services    that will not forsake me
AT&T Labs                          'til my tale is told and done.
garry@sage.att.com
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr



* Re: [Caml-list] error opening large file
  2002-01-11 21:02 [Caml-list] error opening large file Garry Hodgson
@ 2002-01-11 21:51 ` Olivier Andrieu
  2002-01-15  9:56 ` Xavier Leroy
  1 sibling, 0 replies; 7+ messages in thread
From: Olivier Andrieu @ 2002-01-11 21:51 UTC (permalink / raw)
  To: Garry Hodgson, caml-list

 Garry Hodgson [Friday 11 January 2002] :
 > 
 > i get the following error when i open a large file (2561435180 bytes):
 > 
 >    let chan = open_in( "com.zone" );;
 >    Uncaught exception:
 >    Sys_error
 >    "com.zone: Value too large for defined data type".
 > 
 > is there a 2G file size limitation? if so, why?

The problem might be that your C library doesn't use 64-bit
file offsets for the usual I/O functions (open, lseek ...). On my
redhat 7.2, opening a >2G file works OK.

BUT, looking at the ocaml source code, I see that the offset in the
file is stored in a C 'long', which is only 32 bits wide on a 32-bit
platform, so it cannot represent positions past 2G.

Maybe some I/O functions with int64 arguments could be added to the
standard library (at least for the low-level functions of module
Unix)?


        Objective Caml version 3.04

# let ic = open_in "hom.genom.fa" ;;
val ic : in_channel = <abstr>
# let s = String.make 20 ' ' ;;
val s : string = "                    "
# let _ = input ic s 0 20 in s ;;
- : string = ">chr10\nCCGTGGTGAAGAC"
# in_channel_length ic ;;
- : int = -848525384
#

-- 
   Olivier
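
The negative length in the session above is what a large size looks like
after it has been squeezed through a narrower signed integer. A minimal
sketch of that arithmetic follows, done in Int64 so it behaves the same on
any platform. `wrap_signed` is a hypothetical helper, and 3446441912 is only
one size that would reproduce the value shown; the thread does not give the
actual size of hom.genom.fa.

```ocaml
(* Hypothetical helper: keep the low [bits] bits of [x] and sign-extend
   them, i.e. the value a two's-complement integer of that width holds. *)
let wrap_signed ~bits (x : int64) : int64 =
  let shift = 64 - bits in
  Int64.shift_right (Int64.shift_left x shift) shift

let () =
  (* Garry's file size forced into a 32-bit C long: prints -1733532116 *)
  Printf.printf "%Ld\n" (wrap_signed ~bits:32 2561435180L);
  (* The same size in a 31-bit OCaml int: prints 413951532 *)
  Printf.printf "%Ld\n" (wrap_signed ~bits:31 2561435180L);
  (* Assumed size, not from the thread: prints -848525384 *)
  Printf.printf "%Ld\n" (wrap_signed ~bits:32 3446441912L)
```

Either way the reported length is useless for seeking, which is why the
thread turns to int64-based I/O functions.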

* Re: [Caml-list] error opening large file
  2002-01-11 21:02 [Caml-list] error opening large file Garry Hodgson
  2002-01-11 21:51 ` Olivier Andrieu
@ 2002-01-15  9:56 ` Xavier Leroy
  2002-01-16  8:53   ` Florian Douetteau
  1 sibling, 1 reply; 7+ messages in thread
From: Xavier Leroy @ 2002-01-15  9:56 UTC (permalink / raw)
  To: Garry Hodgson; +Cc: ocaml

> i get the following error when i open a large file (2561435180 bytes):
> 
>    let chan = open_in( "com.zone" );;
>    Uncaught exception:
>    Sys_error
>    "com.zone: Value too large for defined data type".
> 
> is there a 2G file size limitation?  if so, why?
> i found the same bug in erlang, though C is ok with it.

Actually, this limitation is in the kernel and C library.  To ensure
backward compatibility with old programs that assume that the size of
a file fits in a 32-bit signed integer, system calls come in two
versions and/or with special options, one to select 32-bit file sizes
(and fail on files larger than 2G), one to select 64-bit file sizes.
The choice between the two versions is made through compile-time
defines, and the default can be either 32 or 64 depending on the C library.

You could try to recompile the OCaml sources with the -D_FILE_OFFSET_BITS=64
flag.  That will let you open the large file, and read it sequentially,
but of course file positions and stats (as returned by
in_channel_length, seek_in, Unix.stat, etc) will be wrong, since they
wrap around at 2^30 on a 32-bit machine.

Now that I think I've figured it out, I plan to compile future
versions of OCaml in 64-bit-file-size mode, and add new library functions
to manipulate file positions and sizes as 64-bit integers (seek_in64,
Unix.stat64, etc).

- Xavier Leroy
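
The 64-bit functions planned above later appeared in the standard library
as the `LargeFile` submodule (`LargeFile.in_channel_length`,
`LargeFile.seek_in`, and so on) rather than under the `seek_in64` names
floated in this message. A minimal sketch of how they might be used is
below; `with_in`, `file_size_64` and `byte_at` are hypothetical helper
names, not part of any library.

```ocaml
(* Hypothetical helper: open [path] in binary mode, run [f] on the
   channel, and close the channel even if [f] raises. *)
let with_in path f =
  let ic = open_in_bin path in
  match f ic with
  | v -> close_in ic; v
  | exception e -> close_in_noerr ic; raise e

(* File size as an int64, so it does not wrap on 32-bit platforms. *)
let file_size_64 (path : string) : int64 =
  with_in path LargeFile.in_channel_length

(* Read the single byte stored at a 64-bit offset. *)
let byte_at (path : string) (pos : int64) : char =
  with_in path (fun ic ->
      LargeFile.seek_in ic pos;
      input_char ic)
```

Sequential reads with `input` or `input_line` need no change; only
positions and lengths have to go through the int64 variants.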

* Re: [Caml-list] error opening large file
  2002-01-15  9:56 ` Xavier Leroy
@ 2002-01-16  8:53   ` Florian Douetteau
  2002-01-23 15:49     ` Xavier Leroy
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Douetteau @ 2002-01-16  8:53 UTC (permalink / raw)
  To: ocaml

>
> Now that I think I've figured it out, I plan to compile future
> versions of OCaml in 64-bit-file-size mode, and add new library functions
> to manipulate file positions and sizes as 64-bit integers (seek_in64,
> Unix.stat64, etc).

Would it degrade arithmetic performance a lot if 'int' conformed to 64-bit
arithmetic on all platforms?
(on a 32-bit cpu, small integers would be unboxed, big integers would be
boxed)

--
Florian Douetteau


* Re: [Caml-list] error opening large file
  2002-01-16  8:53   ` Florian Douetteau
@ 2002-01-23 15:49     ` Xavier Leroy
  0 siblings, 0 replies; 7+ messages in thread
From: Xavier Leroy @ 2002-01-23 15:49 UTC (permalink / raw)
  To: FD; +Cc: ocaml

> Would it degrade arithmetic performance a lot if 'int' conformed to 64-bit
> arithmetic on all platforms?
> (on a 32-bit cpu, small integers would be unboxed, big integers would be
> boxed)

I think that the cost of a mixed arithmetic like you outline would be
quite high, say, a factor of 5 or 10 on integer-intensive code.

If you're ready to pay that cost, it might be worth using
arbitrary-precision arithmetic for the "big" integers.

- Xavier Leroy
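
To make the cost discussed above concrete, here is a library-level sketch
of such a mixed representation. It is an illustration only, not how the
OCaml runtime represents `int`: the `mixed` type and the `add`, `normalize`
and `to_int64` functions are hypothetical, and the variant itself
allocates, so the sketch shows the extra control flow per operation rather
than the real memory layout.

```ocaml
(* Hypothetical mixed integer: a native int when the value fits,
   an int64 otherwise. *)
type mixed =
  | Small of int
  | Big of int64

let to_int64 = function
  | Small n -> Int64.of_int n
  | Big n -> n

(* Fall back to Small whenever the value fits in a native int. *)
let normalize (n : int64) : mixed =
  if Int64.compare n (Int64.of_int min_int) >= 0
     && Int64.compare n (Int64.of_int max_int) <= 0
  then Small (Int64.to_int n)
  else Big n

let add (a : mixed) (b : mixed) : mixed =
  match a, b with
  | Small x, Small y ->
    let s = x + y in
    (* Native addition overflowed iff both operands share a sign
       that the wrapped sum does not have. *)
    if (x >= 0) = (y >= 0) && (s >= 0) <> (x >= 0)
    then Big (Int64.add (Int64.of_int x) (Int64.of_int y))
    else Small s
  | _ ->
    (* Slow path; Int64.add itself wraps silently past 64 bits. *)
    normalize (Int64.add (to_int64 a) (to_int64 b))
```

Even on the fast path every addition pays for a tag dispatch and an
overflow test, which is the kind of overhead the estimate above refers to.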

* Re: [Caml-list] error opening large file
  2002-01-11 23:32 Florian Douetteau
@ 2002-01-12  0:02 ` David Monniaux
  0 siblings, 0 replies; 7+ messages in thread
From: David Monniaux @ 2002-01-12  0:02 UTC (permalink / raw)
  To: Moi; +Cc: caml-list

On Sat, 12 Jan 2002, Florian Douetteau wrote:

> The maximal positive integer of type 'int' is about 2.1E9
> I guess the overflow causes the problem.

It may be a good idea to have another set of file handling functions using
the int64 type for sizes and offsets, using the appropriate 64-bit
system library functions.

David Monniaux            http://www.di.ens.fr/~monniaux
Laboratoire d'informatique de l'École Normale Supérieure,
Paris, France


* Re: [Caml-list] error opening large file
@ 2002-01-11 23:32 Florian Douetteau
  2002-01-12  0:02 ` David Monniaux
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Douetteau @ 2002-01-11 23:32 UTC (permalink / raw)
  To: caml-list


The maximal positive integer of type 'int' is about 2.1E9
I guess the overflow causes the problem.


