caml-list - the Caml user's mailing list
* [Caml-list] unmarshaling large data from string on 32 bits
@ 2015-02-02 10:32 Enrico Tassi
  2015-02-02 12:00 ` Gabriel Scherer
  2015-02-05  8:56 ` Alain Frisch
  0 siblings, 2 replies; 14+ messages in thread
From: Enrico Tassi @ 2015-02-02 10:32 UTC (permalink / raw)
  To: caml-list

Hello, I've just discovered that on 32-bit systems strings are
limited to 16 MB.  I'm using strings as buffers holding data to
be unmarshaled.  I could use another data structure, like a Buffer.t,
but I see no API for unmarshaling from a Buffer.t.
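
For reference, the limit follows from the block header layout: on a 32-bit system the size field has 22 bits and counts 4-byte words, and one byte of the last word is reserved for padding. The sketch below computes that bound and checks whether a payload of a given size can be held in a single string. The helper names `max_string_bytes` and `fits_in_string` are made up for this illustration, and the formula assumes a default runtime configuration.

```ocaml
(* Largest string length for a given word size, assuming the usual header
   layout: (word_size - 10) bits of size, counted in words, with one byte
   of the last word reserved for padding. *)
let max_string_bytes ~word_size =
  let bytes_per_word = word_size / 8 in
  let max_words = (1 lsl (word_size - 10)) - 1 in
  max_words * bytes_per_word - 1

(* True if [n] bytes can be stored in one string on the running system. *)
let fits_in_string n = n >= 0 && n <= Sys.max_string_length

let () =
  Printf.printf "32-bit bound: %d bytes, this system: %d bytes\n"
    (max_string_bytes ~word_size:32) Sys.max_string_length
```

A receiver could call `fits_in_string` on the announced message size before allocating, and switch to another strategy when it returns false.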

Is there another way? Is there code out there implementing that?

Best,
-- 
Enrico Tassi


* Re: [Caml-list] unmarshaling large data from string on 32 bits
  2015-02-02 10:32 [Caml-list] unmarshaling large data from string on 32 bits Enrico Tassi
@ 2015-02-02 12:00 ` Gabriel Scherer
  2015-02-02 13:08   ` Pierre-Marie Pédrot
  2015-02-04 16:47   ` Enrico Tassi
  2015-02-05  8:56 ` Alain Frisch
  1 sibling, 2 replies; 14+ messages in thread
From: Gabriel Scherer @ 2015-02-02 12:00 UTC (permalink / raw)
  To: Enrico Tassi; +Cc: caml users

If you don't mind going through a temporary file,
Marshal.{to,from}_channel should work fine.

You should consider opening a problem report with OCaml upstream (
http://caml.inria.fr/mantis/ ) explaining the use case and asking for
a large-string-safe API (e.g. taking and returning lists of strings).

Note that on 32-bit architectures, Buffer.t is *also* limited to 16 MiB.
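
A minimal sketch combining the two suggestions above, assuming the message arrives as a list of string chunks that each stay below the string limit. The name `from_string_list` is hypothetical (no such function exists in `Marshal`); it spools the chunks to a temporary file and lets `Marshal.from_channel` do the decoding, so the whole payload never has to live in one string.

```ocaml
(* Hypothetical helper, not part of the standard library. *)
let from_string_list (chunks : string list) =
  let path = Filename.temp_file "unmarshal" ".bin" in
  let cleanup () = try Sys.remove path with Sys_error _ -> () in
  try
    (* Binary mode matters on Windows: no newline translation. *)
    let oc = open_out_bin path in
    List.iter (output_string oc) chunks;
    close_out oc;
    let ic = open_in_bin path in
    let v =
      try Marshal.from_channel ic
      with e -> close_in_noerr ic; raise e
    in
    (* Close before removing: Windows refuses to delete an open file. *)
    close_in ic;
    cleanup ();
    v
  with e -> cleanup (); raise e
```

As with any `Marshal` function, the caller has to annotate the result type, for instance `(from_string_list chunks : int list)`.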


* Re: [Caml-list] unmarshaling large data from string on 32 bits
  2015-02-02 12:00 ` Gabriel Scherer
@ 2015-02-02 13:08   ` Pierre-Marie Pédrot
  2015-02-04 16:47   ` Enrico Tassi
  1 sibling, 0 replies; 14+ messages in thread
From: Pierre-Marie Pédrot @ 2015-02-02 13:08 UTC (permalink / raw)
  To: caml users; +Cc: Enrico Tassi

On 02/02/2015 13:00, Gabriel Scherer wrote:
> You should consider opening a problem report to OCaml upstream (
> http://caml.inria.fr/mantis/ ) explaining the use-case and asking for
> a large-string-safe API (eg. taking and returning lists of strings).

For the current use case of Coq 8.5, I believe we may just hack around
this temporarily by using char bigarrays and a dedicated C stub that
runs the demarshalling code on the bigarray's contents.

There is already a dedicated caml_input_value_from_malloc in the C code...
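
The OCaml side of such a hack could look like the sketch below, which only gathers incoming chunks into a char bigarray (bigarray data is allocated outside the OCaml heap, so it is not subject to the string size limit). The C stub that would decode the bigarray contents is not shown, and `bigarray_of_chunks` is a name invented for this illustration.

```ocaml
(* Copy a list of string chunks into one contiguous char bigarray. *)
let bigarray_of_chunks chunks =
  let total = List.fold_left (fun n s -> n + String.length s) 0 chunks in
  let ba = Bigarray.Array1.create Bigarray.char Bigarray.c_layout total in
  let pos = ref 0 in
  List.iter
    (fun s ->
      String.iteri (fun i c -> Bigarray.Array1.set ba (!pos + i) c) s;
      pos := !pos + String.length s)
    chunks;
  ba
```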

PMP



* Re: [Caml-list] unmarshaling large data from string on 32 bits
  2015-02-02 12:00 ` Gabriel Scherer
  2015-02-02 13:08   ` Pierre-Marie Pédrot
@ 2015-02-04 16:47   ` Enrico Tassi
  2015-02-04 23:51     ` Gerd Stolpmann
  1 sibling, 1 reply; 14+ messages in thread
From: Enrico Tassi @ 2015-02-04 16:47 UTC (permalink / raw)
  To: caml-list

On Mon, Feb 02, 2015 at 01:00:53PM +0100, Gabriel Scherer wrote:
> If you don't mind going through a temporary file,
> Marshal.{to,from}_channel should work fine.

Thanks for the suggestion, if Windows is as smart as Linux then a
tmpfile should work fine.  If not, well, better than nothing.

> You should consider opening a problem report to OCaml upstream (
> http://caml.inria.fr/mantis/ ) explaining the use-case and asking for
> a large-string-safe API (eg. taking and returning lists of strings).

The chain of workarounds that leads here is long and ugly :-/

1. I have a problem with threads on Windows and (rarely) on Linux.
   The model is simple: Coq sits between 1 user interface and many
   (usually only 1) worker processes.  Coq's main thread talks to the
   UI via a socket and does blocking calls; worker manager
   threads (1 per worker) do the same with their respective workers.
   At some point all threads are blocked reading. Then
   a worker process writes data but no thread is woken up.
   On Linux I need at least 2 worker manager threads to see the problem,
   on Windows 1 is enough.  All that using the channels API and Marshal.

   OK, I say, let's go back to good old Unix.select to read only when some
   data is there.

2. The Unix module lets you get the fd associated with the channel
   and you can use Unix.select with it.  And you can still use the channels
   API to Marshal.from_channel.  Looks good, but I still have a problem.
   I have LARGE and small messages.  The small ones fit easily in the
   channel's buffer.  Result: you have 2 "values" in the buffers of the
   OS.  Select tells you that you can read.  You Marshal.from_channel.
   Both values are moved into the channel buffer, but clearly
   "input_value" reads only the first one.  You select again, but this
   time the OS buffers are empty.  So you wait until the next message
   arrives to discover the one forgotten in the channel buffer.

   I can't bet all my money on the correctness of this diagnosis,
   but that seemed the cause at the time.  Artificially inflating
   messages was working, but this is not what you want.  There is no
   API, at least in 3.12, to peek at a channel and see if there is
   data (and if so, don't call select).  I tried with non-blocking
   channels, but I could not succeed using input_value there (I don't
   recall if input_value is always blocking or something else went
   wrong).

   OK, I say, let's not use the channels and do good old Unix select and
   read.  Unfortunately the size of buffers, strings, is limited and the
   LARGE messages I have do not fit.
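
The last approach (select and read, no channel) can be sketched as follows. `Marshal.header_size` and `Marshal.data_size` tell how many bytes the next value occupies, so exactly one value is consumed per call and nothing is left behind in a user-space buffer. The `read` parameter is an assumption of this sketch: it stands for any function behaving like `Unix.read fd`. The names `really_read` and `read_value` are illustrative only.

```ocaml
(* [read buf ofs len] must behave like [Unix.read fd buf ofs len]:
   it returns the number of bytes actually read, 0 at end of input. *)
let really_read read buf ofs len =
  let rec go ofs len =
    if len > 0 then begin
      let n = read buf ofs len in
      if n = 0 then raise End_of_file;
      go (ofs + n) (len - n)
    end
  in
  go ofs len

(* Read exactly one marshalled value: header first, then the data part. *)
let read_value read =
  let header = Bytes.create Marshal.header_size in
  really_read read header 0 Marshal.header_size;
  let data_len = Marshal.data_size header 0 in
  (* This allocation is where the string size limit bites: Bytes.create
     raises Invalid_argument beyond Sys.max_string_length. *)
  let buf = Bytes.create (Marshal.header_size + data_len) in
  Bytes.blit header 0 buf 0 Marshal.header_size;
  really_read read buf Marshal.header_size data_len;
  Marshal.from_bytes buf 0
```

With a real descriptor one would pass `Unix.read fd` as `read` after `Unix.select` reports the descriptor readable.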

So yes, Marshal.from_string_list would be an option here.

I still have around a simple example that locks up on Windows,
I'll open a bug for that.

Best,
-- 
Enrico Tassi


* Re: [Caml-list] unmarshaling large data from string on 32 bits
  2015-02-04 16:47   ` Enrico Tassi
@ 2015-02-04 23:51     ` Gerd Stolpmann
  0 siblings, 0 replies; 14+ messages in thread
From: Gerd Stolpmann @ 2015-02-04 23:51 UTC (permalink / raw)
  To: Enrico Tassi; +Cc: caml-list

What about this: you change the protocol so that there is a single
character, say an 'X', before any marshalled value. The 'X' is something
you can use for non-blocking reads. So (if ch is the input channel):

let try_read ch =
  let fd = Unix.descr_of_in_channel ch in
  Unix.set_nonblock fd;
  match input_char ch with   (* raises Sys_blocked_io if no data is available *)
  | exception Sys_blocked_io -> Unix.clear_nonblock fd; None
  | x ->
    Unix.clear_nonblock fd;
    assert (x = 'X');
    Some (Marshal.from_channel ch)

This will also work when there are several messages in the input buffer,
as input_char then simply succeeds. If you get a Sys_blocked_io, you can
even revert to using select() because you know that the buffer is empty
then.
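
The writer side of this framing is not spelled out in the message. A minimal sketch, assuming the same single 'X' byte before each value, with a plain blocking reader for comparison (`send_tagged` and `receive_tagged` are names invented here):

```ocaml
(* Writer: one 'X' byte, then the marshalled value. *)
let send_tagged oc v =
  output_char oc 'X';
  Marshal.to_channel oc v [];
  flush oc

(* Blocking reader for the same framing; raises End_of_file when the
   channel is exhausted. *)
let receive_tagged ic =
  match input_char ic with
  | 'X' -> Marshal.from_channel ic
  | c -> failwith (Printf.sprintf "unexpected tag %C" c)
```

Flushing after every message keeps a small message from sitting in the writer's channel buffer while the reader waits for it.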

Gerd



-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
My OCaml site:          http://www.camlcity.org
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------

* Re: [Caml-list] unmarshaling large data from string on 32 bits
  2015-02-02 10:32 [Caml-list] unmarshaling large data from string on 32 bits Enrico Tassi
  2015-02-02 12:00 ` Gabriel Scherer
@ 2015-02-05  8:56 ` Alain Frisch
  2015-02-05  9:01   ` Gabriel Scherer
  2015-02-05  9:58   ` Pierre-Marie Pédrot
  1 sibling, 2 replies; 14+ messages in thread
From: Alain Frisch @ 2015-02-05  8:56 UTC (permalink / raw)
  To: Enrico Tassi, caml-list

Hello,

Be aware that using the generic demarshaling on 32-bit systems with
large data (even when it fits in a string) will expand the heap
(adding more pages to it) on every demarshaling, and unless you arrange
for the compactor to run often enough (by manually calling Gc.compact,
for instance), you'll end up eating all the memory.

This is documented here:

http://caml.inria.fr/mantis/view.php?id=5813
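
A sketch of the manual compaction mentioned above, assuming messages are held in strings. The threshold value and the name `from_string_compacting` are arbitrary choices for this illustration, and a full compaction is expensive, so this is only a stopgap.

```ocaml
(* Hypothetical threshold: compact after decoding payloads of 1 MiB or more. *)
let compact_threshold = 1 lsl 20

(* Unmarshal [s], then force a heap compaction if the payload was large. *)
let from_string_compacting (s : string) =
  let v = Marshal.from_string s 0 in
  if String.length s >= compact_threshold then Gc.compact ();
  v
```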

One possible work-around is to use an alternative implementation of the 
demarshaler (there is such a pure OCaml implementation in Frama-C). 
Another is to avoid the generic marshaling, either by writing a manual 
version for your specific data type or by generating it from your type 
definitions (à la bin-prot, I assume).
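
To make the second work-around concrete, here is a hand-written codec for a toy message type. The type `msg`, the wire format (one tag byte, 4-byte big-endian integers) and all function names are assumptions made for this sketch; they do not come from bin-prot or any other library.

```ocaml
(* Toy message type, invented for the example. *)
type msg =
  | Ack of int
  | Text of string

(* Write [n] as 4 big-endian bytes; assumes 0 <= n < 2^30. *)
let add_int buf n =
  for shift = 3 downto 0 do
    Buffer.add_char buf (Char.chr ((n lsr (8 * shift)) land 0xff))
  done

(* Read back 4 big-endian bytes starting at [pos]. *)
let get_int s pos =
  let b i = Char.code s.[pos + i] in
  (b 0 lsl 24) lor (b 1 lsl 16) lor (b 2 lsl 8) lor b 3

let encode buf = function
  | Ack n -> Buffer.add_char buf 'A'; add_int buf n
  | Text t ->
    Buffer.add_char buf 'T';
    add_int buf (String.length t);
    Buffer.add_string buf t

(* Returns the decoded message and the position just past it. *)
let decode s pos =
  match s.[pos] with
  | 'A' -> (Ack (get_int s (pos + 1)), pos + 5)
  | 'T' ->
    let len = get_int s (pos + 1) in
    (Text (String.sub s (pos + 5) len), pos + 5 + len)
  | c -> invalid_arg (Printf.sprintf "decode: unknown tag %C" c)
```

Because the decoder builds ordinary values field by field, it can consume its input chunk by chunk and never needs the whole payload in one block.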


Alain

* Re: [Caml-list] unmarshaling large data from string on 32 bits
  2015-02-05  8:56 ` Alain Frisch
@ 2015-02-05  9:01   ` Gabriel Scherer
  2015-02-05  9:34     ` Alain Frisch
  2015-02-05  9:58   ` Pierre-Marie Pédrot
  1 sibling, 1 reply; 14+ messages in thread
From: Gabriel Scherer @ 2015-02-05  9:01 UTC (permalink / raw)
  To: Alain Frisch; +Cc: Enrico Tassi, caml users

> One possible work-around is to use an alternative implementation of the demarshaler (there is such a pure OCaml implementation in Frama-C).

Is this implementation publicly available somewhere?


* Re: [Caml-list] unmarshaling large data from string on 32 bits
  2015-02-05  9:01   ` Gabriel Scherer
@ 2015-02-05  9:34     ` Alain Frisch
  0 siblings, 0 replies; 14+ messages in thread
From: Alain Frisch @ 2015-02-05  9:34 UTC (permalink / raw)
  To: Gabriel Scherer; +Cc: Enrico Tassi, caml users

On 02/05/2015 10:01 AM, Gabriel Scherer wrote:
>> One possible work-around is to use an alternative implementation of the demarshaler (there is such a pure OCaml implementation in Frama-C).
>
> Is this implementation publicly available somewhere?

In the Frama-C distribution, external/unmarshal.ml

-- Alain


* Re: [Caml-list] unmarshaling large data from string on 32 bits
  2015-02-05  8:56 ` Alain Frisch
  2015-02-05  9:01   ` Gabriel Scherer
@ 2015-02-05  9:58   ` Pierre-Marie Pédrot
  2015-02-05 10:33     ` Enrico Tassi
  2015-02-05 10:50     ` Alain Frisch
  1 sibling, 2 replies; 14+ messages in thread
From: Pierre-Marie Pédrot @ 2015-02-05  9:58 UTC (permalink / raw)
  To: caml-list

On 05/02/2015 09:56, Alain Frisch wrote:
> One possible work-around is to use an alternative implementation of the
> demarshaler (there is such a pure OCaml implementation in Frama-C).
> Another is to avoid the generic marshaling, either by writing a manual
> version for your specific data type or by generating it from your type
> definitions (à la bin-prot, I assume).

Both workarounds would not work. Indeed, we use closure marshalling in
Coq, which is not supported by the two proposed implementations.

(Plus, that would be so slow I do not even want to think about it.)

PMP



* Re: [Caml-list] unmarshaling large data from string on 32 bits
  2015-02-05  9:58   ` Pierre-Marie Pédrot
@ 2015-02-05 10:33     ` Enrico Tassi
  2015-02-05 10:50     ` Alain Frisch
  1 sibling, 0 replies; 14+ messages in thread
From: Enrico Tassi @ 2015-02-05 10:33 UTC (permalink / raw)
  To: caml-list

On Thu, Feb 05, 2015 at 10:58:27AM +0100, Pierre-Marie Pédrot wrote:
> Both workarounds would not work. Indeed, we use closure marshalling in
> Coq, which is not supported by the two proposed implementations.

JFTR no, we don't marshal closures.

Still a custom unmarshaler is a bit pulp for me...

Best,
-- 
Enrico Tassi


* Re: [Caml-list] unmarshaling large data from string on 32 bits
  2015-02-05  9:58   ` Pierre-Marie Pédrot
  2015-02-05 10:33     ` Enrico Tassi
@ 2015-02-05 10:50     ` Alain Frisch
  2015-02-05 12:22       ` Fabrice Le Fessant
  2015-02-05 12:27       ` Enrico Tassi
  1 sibling, 2 replies; 14+ messages in thread
From: Alain Frisch @ 2015-02-05 10:50 UTC (permalink / raw)
  To: Pierre-Marie Pédrot, caml-list

On 02/05/2015 10:58 AM, Pierre-Marie Pédrot wrote:
> On 05/02/2015 09:56, Alain Frisch wrote:
>> One possible work-around is to use an alternative implementation of the
>> demarshaler (there is such a pure OCaml implementation in Frama-C).
>> Another is to avoid the generic marshaling, either by writing a manual
>> version for your specific data type or by generating it from your type
>> definitions (à la bin-prot, I assume).
>
> Both workarounds would not work. Indeed, we use closure marshalling in
> Coq, which is not supported by the two proposed implementations.

Waow, closure marshaling across processes: you live dangerously :-)  I 
hope you know precisely which values go into the closures or not 
(exception slots, global mutable data structures, etc)...


> (Plus, that would be so slow I do not even want to think about it.)

I'm not so sure.  Serialization/deserialization routines specialized for 
specific data types avoid some runtime checks required by the generic 
functions (for the block tags and sizes, for detecting possible sharing 
everywhere) and can use more specialized (and thus more compact) data 
representation.
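
The cost of sharing detection is visible from the `Marshal` API itself. In this small illustration (the values are made up), the default mode records already-seen blocks and emits a back-reference for the second occurrence, while `Marshal.No_sharing` skips that bookkeeping and writes the block twice:

```ocaml
let payload = String.make 1000 'a'

(* Default: the second occurrence of [payload] becomes a back-reference. *)
let with_sharing = Marshal.to_string (payload, payload) []

(* No_sharing: nothing is recorded, so [payload] is written twice. *)
let without_sharing =
  Marshal.to_string (payload, payload) [Marshal.No_sharing]

let () =
  Printf.printf "with sharing: %d bytes, without: %d bytes\n"
    (String.length with_sharing) (String.length without_sharing)
```

A serializer written for a type known to contain no sharing can skip this bookkeeping entirely.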

We have had that problem at LexiFi, where we used to rely on the generic
marshaling for exchanging messages between processes (hence the ticket I
referred to).  We finally decided to write (manually) specialized
functions for our type of messages (no closures, no sharing), and the
performance results were slower but reasonable compared to the generic
marshaling (without putting too much engineering effort into it).  Anyway:

  - our previous workaround was to trigger Gc.compact explicitly for big 
messages, which was much worse, of course;

  - it's clear that the OCaml implementation of the generic demarshaler 
would be slower;

  - since this only impacts 32-bit systems, nobody seems motivated 
enough to put energy into fixing the core issue.


Alain


* Re: [Caml-list] unmarshaling large data from string on 32 bits
  2015-02-05 10:50     ` Alain Frisch
@ 2015-02-05 12:22       ` Fabrice Le Fessant
  2015-02-05 12:24         ` Alain Frisch
  2015-02-05 12:27       ` Enrico Tassi
  1 sibling, 1 reply; 14+ messages in thread
From: Fabrice Le Fessant @ 2015-02-05 12:22 UTC (permalink / raw)
  To: Alain Frisch; +Cc: Pierre-Marie Pédrot, Ocaml Mailing List

Hi,

  A long time ago, we had a patch on 32 bit systems to transparently
extend the size of strings and arrays over the 16 MB limit:

http://www.ocamlpro.com/blog/2011/05/06/longval.html

At the time, there was little interest in this patch. If I remember
well, it was quite small, we could probably include it in the next
version of OCPWin32, if you think having it under Windows would help
you.

--Fabrice



-- 
Fabrice LE FESSANT
Chercheur en Informatique
INRIA Paris Rocquencourt -- OCamlPro
Programming Languages and Distributed Systems


* Re: [Caml-list] unmarshaling large data from string on 32 bits
  2015-02-05 12:22       ` Fabrice Le Fessant
@ 2015-02-05 12:24         ` Alain Frisch
  0 siblings, 0 replies; 14+ messages in thread
From: Alain Frisch @ 2015-02-05 12:24 UTC (permalink / raw)
  To: Fabrice Le Fessant; +Cc: Pierre-Marie Pédrot, Ocaml Mailing List

Hello Fabrice,

While it's certainly interesting to extend the size of strings (and it 
would solve the original problem), it would not address the problem 
related to unmarshaling large data on 32-bit systems.


Alain


* Re: [Caml-list] unmarshaling large data from string on 32 bits
  2015-02-05 10:50     ` Alain Frisch
  2015-02-05 12:22       ` Fabrice Le Fessant
@ 2015-02-05 12:27       ` Enrico Tassi
  1 sibling, 0 replies; 14+ messages in thread
From: Enrico Tassi @ 2015-02-05 12:27 UTC (permalink / raw)
  To: caml-list

On Thu, Feb 05, 2015 at 11:50:26AM +0100, Alain Frisch wrote:
> I hope you know precisely which values go into the closures or not (exception
> slots, global mutable data structures, etc)...

This is exactly why we _don't_ marshal closures in Coq.  As far as I
know "what goes into a closure" is not documented, so I removed all
closures from the state we send on the wire.

I don't know why PMP ended up thinking we do, but I've just checked, no
Marshal.Closures flag ;-)
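
A quick way to confirm this kind of claim from code: without the `Marshal.Closures` flag, marshaling a functional value raises `Invalid_argument`. The helper name `rejects_closures` is invented for this sketch.

```ocaml
(* True if marshaling [v] with no flags fails, which is what happens
   when [v] contains a functional value. *)
let rejects_closures v =
  match Marshal.to_string v [] with
  | _ -> false
  | exception Invalid_argument _ -> true

let () =
  Printf.printf "closure rejected: %b, plain list rejected: %b\n"
    (rejects_closures (fun x -> x + 1))
    (rejects_closures [1; 2; 3])
```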

Thanks for the heads up anyway,
-- 
Enrico Tassi
