Note that if the OCaml compiler had a C backend, all these architecture-porting problems would disappear...
OCaml would have more than 30 targets[1].
In my opinion, trying to generate assembler directly is a bad idea, because modern CPUs require a lot of work to generate good assembler for.
Only the GCC and LLVM teams are big enough to do a good job.
In the Lisaac project, we were able to compete with C[2]. Lisaac is a compiler for a Smalltalk-like language: if/then/else is unknown to the compiler; it is defined in the true/false objects. So it is proof that a very high-level language can reach C performance. OCaml can do this too, because the compiler knows a lot of type information.
The Lisaac compiler uses strong flow analysis and, more importantly, generates C code. To reach this performance, Lisaac tailors the C code to help GCC generate highly optimized code.
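To make the if/then/else point concrete, here is a rough sketch in C of a conditional that is not a language construct but a method of the true/false objects. This is only an illustration: the names (bool_obj, TRUE_OBJ, FALSE_OBJ, choose) are made up for this example and this is not what Lisaac actually generates.

```c
/* A "block": a piece of code passed as an argument (no captured state here). */
typedef int (*block_fn)(void);

/* A boolean is an object whose only method picks one of two blocks.
   Hypothetical layout, for illustration only. */
typedef struct bool_obj {
    int (*if_then_else)(block_fn then_block, block_fn else_block);
} bool_obj;

/* The "true" object runs the then-block and ignores the else-block. */
static int true_if_then_else(block_fn then_block, block_fn else_block)
{
    (void)else_block;
    return then_block();
}

/* The "false" object runs the else-block and ignores the then-block. */
static int false_if_then_else(block_fn then_block, block_fn else_block)
{
    (void)then_block;
    return else_block();
}

static const bool_obj TRUE_OBJ  = { true_if_then_else };
static const bool_obj FALSE_OBJ = { false_if_then_else };

static int forty_two(void) { return 42; }
static int zero(void)      { return 0; }

/* The conditional is an ordinary dynamic call on the boolean object. */
int choose(const bool_obj *cond)
{
    return cond->if_then_else(forty_two, zero);
}
```

Compiled naively, every conditional costs an indirect call. The point is that a compiler which can prove which object cond is at a given call site can replace the call with a plain branch, or with no branch at all.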
For instance, GCC is able to produce MMX/SSE/AVX code when you write code like this:
http://gcc.gnu.org/projects/tree-ssa/vectorization.html#vectorizab
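As a minimal sketch of the kind of loop that page is about (the function name saxpy and its signature are my own choice for the example, not taken from Lisaac or from the GCC page):

```c
#include <stddef.h>

/* Hypothetical helper: y[i] = a * x[i] + y[i] for i in [0, n).
   'restrict' promises the compiler that x and y do not overlap, so it
   does not need a runtime aliasing check before using vector code.
   The loop has a simple counted form, unit stride and no branches,
   which is what an auto-vectorizer looks for. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

With a recent GCC, compiling with something like `gcc -O3 -march=native -fopt-info-vec` should report whether the loop was vectorized; the exact result depends on the GCC version and the target CPU. A compiler that emits C only has to emit loops of this shape to benefit.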
Auto-vectorization is just one example of what you can do with GCC (and certainly LLVM soon), and which would require a lot of work with your own asm generator.
[1]
http://en.wikipedia.org/wiki/GNU_Compiler_Collection#Architectures
[2]
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=lisaac&lang2=gcc