caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] FP's and HyperThreading Processors
@ 2003-06-13  6:44 David McClain
  2003-06-13  8:06 ` John Max Skaller
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: David McClain @ 2003-06-13  6:44 UTC (permalink / raw)
  To: caml-list

Hi,

I have a massive application that performs nonlinear fitting to 170+
parameters; a phase retrieval problem for discerning the aberrations of an
optical system. This program is written largely in compiled OCaml closures,
along with a multithreaded vendor supplied FFT routine (presumably optimized
for their processor).

On an old P2 single processor machine at 350 MHz, I am seeing almost 95%+
CPU utilization. But on a new 3 GHz P4 with HyperThreading enabled (dual
register banks for fast context switching and minimum cache coherence
overhead), this same program provides much less than 5% CPU utilization. The
net result is that this program runs only twice as fast on the new 3 GHz P4
as it runs on the old 350 MHz P2.

I suspect, but have yet to prove, that the low utilization is due to a low
CPU to memory bandwidth and to the failure of the L1 and L2 caches to supply
needed operations and data to the CPU. This, I would hypothesize, is going
to be demonstrated by any language that prefers fresh memory allocation for
results, e.g., OCaml, ML, Lisp, Smalltalk, etc.

If I am correct, then it implies that our hardware friends are moving
rapidly in the opposite direction to our advanced software systems. I
mention this in order to tickle the imaginations of both camps.

Cheers,

- David McClain, Sr. Scientist, Raytheon, Tucson, AZ


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] FP's and HyperThreading Processors
  2003-06-13  6:44 [Caml-list] FP's and HyperThreading Processors David McClain
@ 2003-06-13  8:06 ` John Max Skaller
  2003-06-13 10:03   ` [Caml-list] Type safe affectation ? Christophe Raffalli
  2003-06-13 18:38 ` [Caml-list] FP's and HyperThreading Processors Kip Macy
  2003-06-13 19:07 ` Xavier Leroy
  2 siblings, 1 reply; 17+ messages in thread
From: John Max Skaller @ 2003-06-13  8:06 UTC (permalink / raw)
  To: David McClain; +Cc: caml-list

David McClain wrote:


> I suspect, but have yet to prove, that the low utilization is due to a low
> CPU to memory bandwidth and to the failure of the L1 and L2 caches to supply
> needed operations and data to the CPU. This, I would hypothesize, is going
> to be demonstrated by any language that prefers fresh memory allocation for
> results, e.g., OCaml, ML, Lisp, Smalltalk, etc.

Well, I did think Ocaml used a generational collector
that recycled 'young' memory fast. Perhaps the
'youthfulness' can be tuned down to the L2 cache size?

FYI: its my understanding there is a new trend.
Processor caches in multi-processor systems are
a bad idea. Instead, the memory chips should have
caches on them. I believe, quite a few do now:
DRAM with an SRAM cache.

Now that should actually mean that using
'fresh' memory is actually the *fastest* method,
since it tends to distribute the load over all
the memory chips. In particular .. writes become
much faster than reads (since all random writes
get cached, whilst some reads still have to wait
for the main store).

-- 
John Max Skaller, mailto:skaller@ozemail.com.au
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Caml-list] Type safe affectation ?
  2003-06-13  8:06 ` John Max Skaller
@ 2003-06-13 10:03   ` Christophe Raffalli
  2003-06-14 13:35     ` Xavier Leroy
  0 siblings, 1 reply; 17+ messages in thread
From: Christophe Raffalli @ 2003-06-13 10:03 UTC (permalink / raw)
  Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 1220 bytes --]


Using the Obj module you can do affectation of cons cell of a list (or 
any onther data type), which is usefull for instance to write a tail 
recursive Map funtion ... (see a previous thread)

But this is not typesafe !

Why not allow a typesafe way to mute immutable data (sum, products, 
immutable records and so on) ?

---

By the way, another step would be to infer for each function if it mutes 
its arguments instead of annoting record with "mutable" indication.

This is better because the same data may be considered as mutable by 
some functions (for instance when you construct the data) but then be 
used only by immutable functions and this way the type inference can 
help you insure that you do not mute the data anymore.

But this is research topic ?


-- 
Christophe Raffalli
Université de Savoie
Batiment Le Chablais, bureau 21
73376 Le Bourget-du-Lac Cedex

tél: (33) 4 79 75 81 03
fax: (33) 4 79 75 87 42
mail: Christophe.Raffalli@univ-savoie.fr
www: http://www.lama.univ-savoie.fr/~RAFFALLI
---------------------------------------------
IMPORTANT: this mail is signed using PGP/MIME
At least Enigmail/Mozilla, mutt or evolution
can check this signature
---------------------------------------------

[-- Attachment #2: Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] FP's and HyperThreading Processors
  2003-06-13  6:44 [Caml-list] FP's and HyperThreading Processors David McClain
  2003-06-13  8:06 ` John Max Skaller
@ 2003-06-13 18:38 ` Kip Macy
  2003-06-13 21:23   ` David McClain
  2003-06-13 19:07 ` Xavier Leroy
  2 siblings, 1 reply; 17+ messages in thread
From: Kip Macy @ 2003-06-13 18:38 UTC (permalink / raw)
  To: David McClain; +Cc: caml-list



> along with a multithreaded vendor supplied FFT routine (presumably optimized
> for their processor).
If it was optimized for the P2 it will by definition not be optimized for
the P4, being potentially penalized by a much deeper pipeline and the use
of a trace cache instead of a standard I-cache. For example loop unrolling
is *bad* when you have a limited number of pre-decoded ops. Writes to the
D-cache write 64 bytes, reads bring in a "sector" or 2 cache lines to try
and mask the increased latency of the memory bus. The hardware pre-fetcher
kicks in after you access 256 bytes sequentially. What this all translates
to is that perfectly healthy data access patterns on the P2 may be
pathological on the P4. And in may in part be due to the FFT.  Little if
any of this applies if you already have an appropriate version of the 
FFT for the P4.

It is also worth noting that with the small L1 cache sizes on the P4,
hyperthreading running data intensive programs could easily end up being a
net loss with competing processes kicking out each others cache entries.

As a side note you could end up being partly TLB limited if your access
patterns jump around. if you are running a more recent version of Linux
you might want to try putting your data on 4MB pages.
 
> net result is that this program runs only twice as fast on the new 3 GHz P4
> as it runs on the old 350 MHz P2.
> 

I suspect your analysis is correct, but I'd really have to try out the
performance counters before I came to any conclusions. This doesn't
neccessarily mean that ML is intrinsically on the wrong track with
allocating new memory. It does mean that more work needs to be
done to make the memory allocator and garbage collector more locality 
aware. There is some discussion of this in "Compiling with 
Continuations" by Appel.


					-Kip
 


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] FP's and HyperThreading Processors
  2003-06-13  6:44 [Caml-list] FP's and HyperThreading Processors David McClain
  2003-06-13  8:06 ` John Max Skaller
  2003-06-13 18:38 ` [Caml-list] FP's and HyperThreading Processors Kip Macy
@ 2003-06-13 19:07 ` Xavier Leroy
  2003-06-13 21:33   ` Jim Farrand
  2003-07-02 10:26   ` David Monniaux
  2 siblings, 2 replies; 17+ messages in thread
From: Xavier Leroy @ 2003-06-13 19:07 UTC (permalink / raw)
  To: David McClain; +Cc: caml-list

> On an old P2 single processor machine at 350 MHz, I am seeing almost 95%+
> CPU utilization. But on a new 3 GHz P4 with HyperThreading enabled (dual
> register banks for fast context switching and minimum cache coherence
> overhead), this same program provides much less than 5% CPU utilization. 

How do you measure "CPU utilization"?  Different tools have different
notions of CPU utilization.  For instance, OS-level tools such as
"top", "ps" or "uptime" treat the time spent by the CPU while stalled
by cache misses and the like as user CPU time, not as idle time.
Only processor-level performance monitor counters can distinguish
between active cycles and stalled cycles.

> The net result is that this program runs only twice as fast on the
> new 3 GHz P4 as it runs on the old 350 MHz P2.

Some operations take more cycles on the P4 than on the PPro/P2/P3,
e.g. shifts, integer multiplication and division.  I saw some programs
run slightly slower on a 2 Ghz P4 than on a 1 Ghz P3.  But that
doesn't suffice to explain the factor of 5 that you observed.

> I suspect, but have yet to prove, that the low utilization is due to a low
> CPU to memory bandwidth and to the failure of the L1 and L2 caches to supply
> needed operations and data to the CPU. This, I would hypothesize, is going
> to be demonstrated by any language that prefers fresh memory allocation for
> results, e.g., OCaml, ML, Lisp, Smalltalk, etc.
> 
> If I am correct, then it implies that our hardware friends are moving
> rapidly in the opposite direction to our advanced software systems. I
> mention this in order to tickle the imaginations of both camps.

This was a concern in the early to mid 90s, when processor speed
increased much more than the performances of the memory subsystems.
(I remember Caml benchmarks on an early Alpha system where the processor
was stalled 9 cycles out of 10.)

But since then the hardware manufacturers have done a very good job at
maintaining a decent balance between the memory subsystem and the ALU.
In particular, out-of-order execution is quite effective at hiding
cache miss latencies.  

My general feeling (although I have no proof of that) is that the
memory usage patterns of Caml programs aren't radically different from
those of more mainstream programs.

- Xavier Leroy

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] FP's and HyperThreading Processors
  2003-06-13 18:38 ` [Caml-list] FP's and HyperThreading Processors Kip Macy
@ 2003-06-13 21:23   ` David McClain
  2003-06-13 21:39     ` Kip Macy
  2003-06-14  6:11     ` Ville-Pertti Keinonen
  0 siblings, 2 replies; 17+ messages in thread
From: David McClain @ 2003-06-13 21:23 UTC (permalink / raw)
  To: caml-list

> If it was optimized for the P2 it will by definition not be optimized for
> the P4,

Yes, you are correct about this, but I have the Intel numerical libs for the
P4 linked into the program. On the old P2 system, this code spend roughly
60% of its time inside that vendor code for FFT's. This is no longer true,
as other tests show very high performance of just those vendor routines.

Whereas, on the P2 at 350 MHz, I have been able to process audio through 5
stereo pairs of heavy duty (1-4 K) FFT's per data block at rates of around
200 MSamples/sec, this new computer is even faster by a huge amount. I
haven't any numerical rates to give on this just yet, but I can certainly
produce these. The new processor is fast enough to do serious DSP coding on
the P4 directly from compiled OCaml.

What is different about the audio processing code versus my optical phase
retreival code, is that I took care to reuse audo memory buffers. I know
this violates the spirit of FP to some extent, but it was needed to gain
this kind of speedy throughput in OCaml. I did no such optimizations in the
optical analysis code. So clearly, data locality is an important performance
parameter.

By the way, I did not mean to indict any of the high level languages, OCaml,
Lisp, SML, etc. I love these to death! But I also have some experience in
designing memory subsystems aimed at making OO code more efficient.
Specifically, we developed a hardware assist to garbage reclamation. What
I'm really asking is that hardware now become more aware of higher level
language needs. They have shown that they can do a superb job of running
hand crafted efficient low-level code. But they turned their backs on our
more forward looking needs.

Cheers,

- DM


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] FP's and HyperThreading Processors
  2003-06-13 19:07 ` Xavier Leroy
@ 2003-06-13 21:33   ` Jim Farrand
  2003-06-13 21:39     ` David McClain
  2003-07-02 10:26   ` David Monniaux
  1 sibling, 1 reply; 17+ messages in thread
From: Jim Farrand @ 2003-06-13 21:33 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: David McClain, caml-list

On Fri, Jun 13, 2003 at 09:07:51PM +0200, Xavier Leroy wrote:

> How do you measure "CPU utilization"?  Different tools have different
> notions of CPU utilization.  For instance, OS-level tools such as
> "top", "ps" or "uptime" treat the time spent by the CPU while stalled
> by cache misses and the like as user CPU time, not as idle time.
> Only processor-level performance monitor counters can distinguish
> between active cycles and stalled cycles.

Isn't the point of hyperthreading that the CPU doesn't stall on cache
misses, but instead is able to run another process?  Or am I
misunderstanding the point of hyperthreading?
-- 
Jim Farrand
-- 

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] FP's and HyperThreading Processors
  2003-06-13 21:33   ` Jim Farrand
@ 2003-06-13 21:39     ` David McClain
  0 siblings, 0 replies; 17+ messages in thread
From: David McClain @ 2003-06-13 21:39 UTC (permalink / raw)
  To: caml-list

> Isn't the point of hyperthreading that the CPU doesn't stall on cache
> misses, but instead is able to run another process?  Or am I

True enough, if your computer has anything else to do... But if not, then
only one processor becomes involved in the running of the program thread.
But secondly, what happens when I so overwhelm the cache with my huge data
arrays that it flushes much of what might be needed by another thread?

- DM


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] FP's and HyperThreading Processors
  2003-06-13 21:23   ` David McClain
@ 2003-06-13 21:39     ` Kip Macy
  2003-06-13 21:56       ` David McClain
  2003-06-14  6:11     ` Ville-Pertti Keinonen
  1 sibling, 1 reply; 17+ messages in thread
From: Kip Macy @ 2003-06-13 21:39 UTC (permalink / raw)
  To: David McClain; +Cc: caml-list



On Fri, 13 Jun 2003, David McClain wrote:

> > If it was optimized for the P2 it will by definition not be optimized for
> > the P4,
> 
> Yes, you are correct about this, but I have the Intel numerical libs for the
> P4 linked into the program. On the old P2 system, this code spend roughly
> 60% of its time inside that vendor code for FFT's. This is no longer true,
> as other tests show very high performance of just those vendor routines.

Althought he FFT code itself probably has large influence on memory access
patterns.

> hand crafted efficient low-level code. But they turned their backs on our
> more forward looking needs.

When I look at the sheer number of want ads for java programmers, I can't
help but wonder how much of an issue performance really is for most 
people. I honestly think it would be much cheaper to make GC systems
locality aware than for processor vendors to incorporate GC support. 
Especially considering the apparent lack of commercial demand for
it.




-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] FP's and HyperThreading Processors
  2003-06-13 21:39     ` Kip Macy
@ 2003-06-13 21:56       ` David McClain
  2003-06-14 22:08         ` John Max Skaller
  0 siblings, 1 reply; 17+ messages in thread
From: David McClain @ 2003-06-13 21:56 UTC (permalink / raw)
  To: caml-list

> Althought he FFT code itself probably has large influence on memory access
> patterns.

Yes! Good point! My FFT's in the optical code are all rather large'ish 2-D
FFT's (numbering 256x256 on average). The vendor supplied code is a 1-D FFT
wrapped by a hand coded C orchestrator to split these large 2-D arrays into
multiple threads of concurrent 1-D FFT's. But the sheer size of these arrays
will most likely flush the cache of anything needed by the higher level
OCaml analysis routines.

This machine is a consumer grade P4 and so it still incorporates a PCI bus,
and the cache is probably minimally sized. Such is the cost of doing science
on a shoestring budget.... But I ought not complain... I have a machine on
my desk with nearly 10^6 as much memory, and a thousand times faster, than I
once had, and I don't have to wait around all day for the printed program
output from my tray of cards representing a Fortran program...

- DM


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] FP's and HyperThreading Processors
  2003-06-13 21:23   ` David McClain
  2003-06-13 21:39     ` Kip Macy
@ 2003-06-14  6:11     ` Ville-Pertti Keinonen
  1 sibling, 0 replies; 17+ messages in thread
From: Ville-Pertti Keinonen @ 2003-06-14  6:11 UTC (permalink / raw)
  To: David McClain; +Cc: caml-list


> P4 linked into the program. On the old P2 system, this code spend 
> roughly
> 60% of its time inside that vendor code for FFT's. This is no longer 
> true,
> as other tests show very high performance of just those vendor 
> routines.

Does calling from OCaml affect the performance of the FFTs?

You presumably use your own C wrappers to access the vendor code from 
your OCaml code - do you align the stack?

For floating point intensive code, lack of alignment can cause 
significant differences in performance.  I remember seeing 40% 
differences (worst case) in performance when I ran into this several 
years ago, and I don't think the situation has improved with more 
recent cpus.

OCaml currently doesn't ensure more than word alignment for stack or 
allocations on i386, so you are going to have suboptimal performance 
for any floating point code written in OCaml (as well as float arrays 
allocated in OCaml but manipulated elsewhere).  However, if you're 
calling external code (presumably compiled in an alignment-preserving 
way), you should at least try ensuring that the stack is aligned to a 
64-bit boundary (or 128-bit boundary if SSE is used for anything) in 
case that affects performance.

If you're actually making heavy use of hyperthreading, it may make 
memory access patterns less uniform and performance highly 
unpredictable.  Have you tried comparing performance with and without 
hyperthreading enabled?

Anyway, it sounds like you might be running into several issues.  The 
only certain way to find out which are the most significant ones is to 
do large amounts of profiling and testing.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Type safe affectation ?
  2003-06-13 10:03   ` [Caml-list] Type safe affectation ? Christophe Raffalli
@ 2003-06-14 13:35     ` Xavier Leroy
  2003-06-15 18:53       ` brogoff
  0 siblings, 1 reply; 17+ messages in thread
From: Xavier Leroy @ 2003-06-14 13:35 UTC (permalink / raw)
  To: Christophe Raffalli; +Cc: caml-list

> Using the Obj module you can do affectation of cons cell of a list (or 
> any onther data type), which is usefull for instance to write a tail 
> recursive Map funtion ... (see a previous thread)
> But this is not typesafe !

Like all uses of module Obj, you can make this type-safe via proper
type constraints:

let set_cdr (l: 'a list) (x: 'a list) =
  match l with
    [] -> invalid_arg "set_cdr"
  | _::_ -> Obj.set_field (Obj.repr l) 1 (Obj.repr x)

> Why not allow a typesafe way to mute immutable data (sum, products, 
> immutable records and so on) ?

Because that would make this data mutable :-)  Immutability isn't there
just to annoy the programmer: it's a major improvement for software
reliability and quality.  Proving the correctness of functions that
manipulate lists or trees is feasible if the data is immutable, but
becomes essentially impossible if arbitrary in-place modifications are
allowed anywhere in the program.

Even if you're not interested in proofs of programs, I'm ready to bet
that there aren't 1 programmer in 100 who can write *fully correct*
code that manipulate mutable balanced trees, for instance.  (I think I
can't.)

Also, keep in mind that the compiler can optimize based on
immutability assumptions.  For instance, the OCaml compiler performs
propagation of structured constants and code motion for accesses
inside data structures that are immutable.  You might very well break
something by using the set_cdr function above.

> By the way, another step would be to infer for each function if it mutes 
> its arguments instead of annoting record with "mutable" indication.
> 
> This is better because the same data may be considered as mutable by 
> some functions (for instance when you construct the data) but then be 
> used only by immutable functions and this way the type inference can 
> help you insure that you do not mute the data anymore.
> 
> But this is research topic ?

This sounds reminiscent of effect systems.

- Xavier Leroy

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] FP's and HyperThreading Processors
  2003-06-13 21:56       ` David McClain
@ 2003-06-14 22:08         ` John Max Skaller
  0 siblings, 0 replies; 17+ messages in thread
From: John Max Skaller @ 2003-06-14 22:08 UTC (permalink / raw)
  To: David McClain; +Cc: caml-list

David McClain wrote:

>>Althought he FFT code itself probably has large influence on memory access


> But I ought not complain... I have a machine on
> my desk with nearly 10^6 as much memory, and a thousand times faster, than I
> once had, and I don't have to wait around all day for the printed program
> output from my tray of cards representing a Fortran program...

So when do you find time to make coffee and buy doughnuts?
What excuse can you use to compete with 'oh, I tripped over
and dropped all the cards' when you're projects late?
What compares with masquerading as someone really important,
just so you can get into the 029 punch room ..?

-- 
John Max Skaller, mailto:skaller@ozemail.com.au
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Type safe affectation ?
  2003-06-14 13:35     ` Xavier Leroy
@ 2003-06-15 18:53       ` brogoff
  2003-06-15 19:49         ` Brian Hurt
  0 siblings, 1 reply; 17+ messages in thread
From: brogoff @ 2003-06-15 18:53 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Christophe Raffalli, caml-list

On Sat, 14 Jun 2003, Xavier Leroy wrote:
> > Why not allow a typesafe way to mute immutable data (sum, products, 
> > immutable records and so on) ?
> 
> Because that would make this data mutable :-) 

Agreed. However, as has been discussed on the list before, the specific 
example of list mutations where the mutation corresponds to filling 
"holes" or "one hole contexts" and allow more tail recursion optimizations 
is an important one that quite a few people would like to see addressed. 

> Even if you're not interested in proofs of programs, I'm ready to bet
> that there aren't 1 programmer in 100 who can write *fully correct*
> code that manipulate mutable balanced trees, for instance.  (I think I
> can't.)

It's interesting that the theory for two (or n) hole contexts isn't worked out 
yet, or wasn't at the time Minamide wrote his paper.  

> Also, keep in mind that the compiler can optimize based on
> immutability assumptions.  For instance, the OCaml compiler performs
> propagation of structured constants and code motion for accesses
> inside data structures that are immutable.  You might very well break
> something by using the set_cdr function above.

Is it the case that those functions which use it in a disciplined way, 
basically rewriting those functions transformable from "ML + Minamide style 
holes" to "ML + setcdr" are going to break something? I believe ExtLib is 
doing this, and probably a few others wrote such libaries after the previous 
round on this topic.

It might make sense to provide immutable views of records which have privately 
mutable fields. This was discussed on the list too (by you and Pierre Weis I 
believe), so I guess the right people are thinking about it. 

> > By the way, another step would be to infer for each function if it mutes 
> > its arguments instead of annoting record with "mutable" indication.
> > 
> > This is better because the same data may be considered as mutable by 
> > some functions (for instance when you construct the data) but then be 
> > used only by immutable functions and this way the type inference can 
> > help you insure that you do not mute the data anymore.
> > 
> > But this is research topic ?
> 
> This sounds reminiscent of effect systems.

On a related note, I'd like a way to make an immutable representation of the 
built in array by not exporting the mutators, *and then* making the type 
parameter covariant, say a signature like the following 

module type FUNCTIONAL_ARRAY = 
  sig
    type (+'a) t
    val init : int -> (int -> 'a) -> 'a t
    val get : 'a t -> int -> 'a 
    val length : 'a t -> int 
    val map : ('a -> 'b) -> 'a t -> 'b t
  end

The only safe ways to do this using array are to hide array in a class 
definition or a function closure. It seems that I should be able to 
just write 

	type 'a t = 'a array

and have the type system figure out if it's OK. Arrays are efficient, and there 
are quite a few cases in my coding where they are not mutable.

-- Brian


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Type safe affectation ?
  2003-06-15 18:53       ` brogoff
@ 2003-06-15 19:49         ` Brian Hurt
  2003-06-16  1:38           ` Jacques Garrigue
  0 siblings, 1 reply; 17+ messages in thread
From: Brian Hurt @ 2003-06-15 19:49 UTC (permalink / raw)
  To: brogoff; +Cc: Xavier Leroy, Christophe Raffalli, caml-list

On Sun, 15 Jun 2003 brogoff@speakeasy.net wrote:

> On Sat, 14 Jun 2003, Xavier Leroy wrote:
> > > Why not allow a typesafe way to mute immutable data (sum, products, 
> > > immutable records and so on) ?
> > 
> > Because that would make this data mutable :-) 
> 
> Agreed. However, as has been discussed on the list before, the specific 
> example of list mutations where the mutation corresponds to filling 
> "holes" or "one hole contexts" and allow more tail recursion optimizations 
> is an important one that quite a few people would like to see addressed. 

I, personally, would perfer to see the optimizer do this.  Any code that 
*correctly* uses holes is just as correct specified in a (non-tail-) 
recursive manner.  List.append should be written like:

let rec append a b =
    match a with
        [] -> b
        | h :: t -> h :: (append t b)

Everything else is just optimization (and not overflowing the stack on
large lists).  Which, if it *can* (reasonably) be done by the compiler,
*should* be done by the compilier.

> > Even if you're not interested in proofs of programs, I'm ready to bet
> > that there aren't 1 programmer in 100 who can write *fully correct*
> > code that manipulate mutable balanced trees, for instance.  (I think I
> > can't.)
> 
> It's interesting that the theory for two (or n) hole contexts isn't
> worked out yet, or wasn't at the time Minamide wrote his paper.

Hmm.  If the same result simply gets used in two or more places, then 
extending the optimization is obvious.  If two different functions are 
called to fill the two different holes, I don't see how you can run both 
functions "at once", without either rampant inlining or coroutines.

> 
> > Also, keep in mind that the compiler can optimize based on
> > immutability assumptions.  For instance, the OCaml compiler performs
> > propagation of structured constants and code motion for accesses
> > inside data structures that are immutable.  You might very well break
> > something by using the set_cdr function above.
> 
> Is it the case that those functions which use it in a disciplined way, 
> basically rewriting those functions transformable from "ML + Minamide style 
> holes" to "ML + setcdr" are going to break something? I believe ExtLib is 
> doing this, and probably a few others wrote such libaries after the previous 
> round on this topic.

If there is one thing I'm regretting with ExtLib, is that we've seemed to 
make using Obj.magic "cool", or at least "common and usefull".  Were holes 
added to the standard Ocaml compiler, I'd be re-rewritting ExtList to take 
those optimizations *out*.  There's other stuff in ExtLib which is usefull 
even without the new List code- Enum, for example.  So the project will 
survive regardless.

Or maybe I'm overstating ExtLib's effect- and there has always been a lot 
of Obj.magic going around.

> On a related note, I'd like a way to make an immutable representation of the 
> built in array by not exporting the mutators, *and then* making the type 
> parameter covariant, say a signature like the following 
> 
> module type FUNCTIONAL_ARRAY = 
>   sig
>     type (+'a) t
>     val init : int -> (int -> 'a) -> 'a t
>     val get : 'a t -> int -> 'a 
>     val length : 'a t -> int 
>     val map : ('a -> 'b) -> 'a t -> 'b t
>   end
> 
> The only safe ways to do this using array are to hide array in a class 
> definition or a function closure. It seems that I should be able to 
> just write 
> 
> 	type 'a t = 'a array
> 
> and have the type system figure out if it's OK. Arrays are efficient,
> and there are quite a few cases in my coding where they are not
> mutable.

All you have to do in this code is just not allow a mutation to occur in 
your code.  As I understand things, to everyone outside of the module 'a t 
is an abstract type- the only way to mutate it is to pass it to a function 
in the module that mutates it.

The only other problem is that you currently have to give up a.(i) 
addressing style.  I wish operator defining would be extended to allow 
array references.

Brian


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Type safe affectation ?
  2003-06-15 19:49         ` Brian Hurt
@ 2003-06-16  1:38           ` Jacques Garrigue
  0 siblings, 0 replies; 17+ messages in thread
From: Jacques Garrigue @ 2003-06-16  1:38 UTC (permalink / raw)
  To: brian.hurt; +Cc: caml-list

From: Brian Hurt <brian.hurt@qlogic.com>
> On Sun, 15 Jun 2003 brogoff@speakeasy.net wrote:
> > On Sat, 14 Jun 2003, Xavier Leroy wrote:
> > > Also, keep in mind that the compiler can optimize based on
> > > immutability assumptions.  For instance, the OCaml compiler performs
> > > propagation of structured constants and code motion for accesses
> > > inside data structures that are immutable.  You might very well break
> > > something by using the set_cdr function above.
> > 
> > Is it the case that those functions which use it in a disciplined way, 
> > basically rewriting those functions transformable from "ML +
> > Minamide style holes" to "ML + setcdr" are going to break
> > something? I believe ExtLib is doing this, and probably a few
> > others wrote such libaries after the previous round on this topic.

There are safe ways to do that: when you build such a structure with a
hole, first define a mutable type sharing the same representation with
your intended immutable type, and cast to it after you're fisnished.
Since ocaml doesn't do any fancy representation optimizations, this
will work, and if it changes all C libraries are broken anyway.

For instance, you can write for a list:

type 'a mut_list = {hd: 'a; mutable tl: 'a list}
external inj : 'a mut_list -> 'a list = "%identity"

Limitation: for sum types, this only works for the first parameterized
constructor (the tag has to be 0). For records there is no problem.

Also, you must be very careful about not letting a mut_list value leak
out (respect the linearity), otherwise inferred information such as
variance will be incorrect, and you can break the type system.

By the way, the above limitation is yet one more reason I think it
would be really cool to allow records inside sum-type definitions.

> If there is one thing I'm regretting with ExtLib, is that we've seemed to 
> make using Obj.magic "cool", or at least "common and usefull".  Were holes 
> added to the standard Ocaml compiler, I'd be re-rewritting ExtList to take 
> those optimizations *out*.  There's other stuff in ExtLib which is usefull 
> even without the new List code- Enum, for example.  So the project will 
> survive regardless.

The problem is that holes at the type level are a difficult feature to
offer: they require linear types in the compiler. As an optimization,
it is a rather high-level one, and maybe not so easy to know when it
will apply.
 
> Or maybe I'm overstating ExtLib's effect- and there has always been a lot 
> of Obj.magic going around.

I certainly hope this is not the case. Obj.magic is EVIL.
It is only acceptable in this case because we want to optimize some
well-known structure, and we can check the correctness for sure.
At the user level, it is certainly preferable to use a mutable
structure to start with.

> > On a related note, I'd like a way to make an immutable
> > representation of the built in array by not exporting the
> > mutators, *and then* making the type parameter covariant, say a
> > signature like the following  
> > 
> > module type FUNCTIONAL_ARRAY = 
> >   sig
> >     type (+'a) t
> >     val init : int -> (int -> 'a) -> 'a t
> >     val get : 'a t -> int -> 'a 
> >     val length : 'a t -> int 
> >     val map : ('a -> 'b) -> 'a t -> 'b t
> >   end
> > 
> > The only safe ways to do this using array are to hide array in a class 
> > definition or a function closure. It seems that I should be able to 
> > just write 
> > 
> > 	type 'a t = 'a array
> > 
> > and have the type system figure out if it's OK. Arrays are efficient,
> > and there are quite a few cases in my coding where they are not
> > mutable.

This one is just a typing problem. Nothing magic here, and it should
actually be possible to infer correctly the variance of an abstract
type, independently of its representation. But this is high-level and
potentially dangerous stuff, so don't hold your breath for this.

> All you have to do in this code is just not allow a mutation to occur in 
> your code.  As I understand things, to everyone outside of the module 'a t 
> is an abstract type- the only way to mutate it is to pass it to a function 
> in the module that mutates it.

Here the problem is variance. Currently ocaml does not allow the
above, because 'a array is not covariant.

Cheers,

        Jacques

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] FP's and HyperThreading Processors
  2003-06-13 19:07 ` Xavier Leroy
  2003-06-13 21:33   ` Jim Farrand
@ 2003-07-02 10:26   ` David Monniaux
  1 sibling, 0 replies; 17+ messages in thread
From: David Monniaux @ 2003-07-02 10:26 UTC (permalink / raw)
  To: Liste CAML

For all that's worth: we run a program doing lots of symbolic computations
and allocating lots of small memory blocks (< a few k) on a variety of
machines. The P4 performs roughly proportionally to its frequency compared
to the P3; the Athlon is slightly faster than the P3.

We ran the program under Linux + OProfile.

Roughly 12% of the time is spent within the GC, representing roughly 30%
of the cache faults. Surprisingly, it seems that accesses to unboxed
floats in some comparison procedures take up 10% of the time and 4% of the
page faults.

"Profile - don't speculate.". You should probably get OProfile or any
other kind of profiler that enables you to count events such as cache
faults.

David Monniaux            http://www.di.ens.fr/~monniaux
Laboratoire d'informatique de l'École Normale Supérieure,
Paris, France



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2003-07-02 10:26 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-06-13  6:44 [Caml-list] FP's and HyperThreading Processors David McClain
2003-06-13  8:06 ` John Max Skaller
2003-06-13 10:03   ` [Caml-list] Type safe affectation ? Christophe Raffalli
2003-06-14 13:35     ` Xavier Leroy
2003-06-15 18:53       ` brogoff
2003-06-15 19:49         ` Brian Hurt
2003-06-16  1:38           ` Jacques Garrigue
2003-06-13 18:38 ` [Caml-list] FP's and HyperThreading Processors Kip Macy
2003-06-13 21:23   ` David McClain
2003-06-13 21:39     ` Kip Macy
2003-06-13 21:56       ` David McClain
2003-06-14 22:08         ` John Max Skaller
2003-06-14  6:11     ` Ville-Pertti Keinonen
2003-06-13 19:07 ` Xavier Leroy
2003-06-13 21:33   ` Jim Farrand
2003-06-13 21:39     ` David McClain
2003-07-02 10:26   ` David Monniaux

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).