caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Ocaml and cryptography
@ 2011-03-24 12:17 Jehan Pagès
  2011-03-24 13:26 ` Gerd Stolpmann
  2011-03-24 16:51 ` Francis Dupont
  0 siblings, 2 replies; 8+ messages in thread
From: Jehan Pagès @ 2011-03-24 12:17 UTC (permalink / raw)
  To: caml-list

Hi,

I was just having a few thoughts/questions.

I have developed, in the context of another project, a Sha1
implementation (also a HMAC implementation and above this all a SASL
implementation with SCRAM-SHA1 mechanism support). That's not a
binding to any existent library and is fully native Ocaml. No C in
particular.

That's all working fine and is fast enough for being comfortable. But
that's definitely not as fast as a C implementation.
I made a small benchmark with OpenSSL's SHA1 functions and mine. Both
tests are loops running 10.000 Sha1 of the same string, then exiting.
Basically C goes around 6-10 times faster.

Note: my machine is a small notebook which is not powerful enough to
play most video games, nor even watch videos when they have a little
too high quality (and I am not talking of HD)! So on common desktop
machines, the test should go much faster for both version, but I guess
keep the same speed proportions. You might want to raise the loop to
100.000 though in order to see the difference.

Here are examples of such a run checked with time (there is some
variance between 2 runs, but the Ocaml version usually computes the
10.000 hashes in around 0.4 seconds while the C version computes them
in about 0.06 seconds):

~/SHA1$ time ./obench

real	0m0.416s
user	0m0.400s
sys	0m0.000s
~/SHA1$ time ./cbench

real	0m0.069s
user	0m0.060s
sys	0m0.004s

I checked the standard lib code and the MD5 code has been written in C
over there. Apparently the cryptokit library as well is writing core
cryptography in C.

I still think my code is pretty clean. I have made as much
optimization as I saw possible and though maybe other may be able to
still optimize it, I wonder up to which point now.

I am doing all computations over Int32 because the Sha1 algorithm
works over 4 octets words. At first, I was doing the "naive" approach
and was working on strings directly (going back and forth from a
character code to the character) but that was like extremely slow
(really). Still it allowed me to get the first working implementation.

Implementing with int, which is supposed to be much faster than int32,
would be nice (even though integers can be 64 bits on a 64 bits
platform, I can just mask the 4 higher octets in such case), but int
in OCaml is actually 31/63 bits because of the flag bit so I cannot
represent SHA1 words with int.

For those interested, you can download the benchmark (which includes
the code of Sha1) here:
http://download.tuxfamily.org/ocamlxmpp/sha1_benchmark.tar.gz (just
3ko).
Just run make in the SHA1 directory which will be uncompressed, it
will create cbench and obench (a SSL library is needed for the C
benchmark, likely OpenSSL or other with the same API. Nothing external
is needed for the Ocaml one).

Maybe anyone has suggestions to improve?
Or Ocaml simply cannot compete with C here? (I don't mean to do
better, but getting closer would be already nice)

Thanks.

Jehan

P.S.: don't tell me there are already libraries out there which binds
to C libraries, so why did I ever bother write this Module. I did
this:
1/ for fun of writing a full SHA1 implementation in Ocaml.
2/ because I was wondering how fast it could be.
3/ No reason.
4/ Because and that's all. ;-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Ocaml and cryptography
  2011-03-24 12:17 [Caml-list] Ocaml and cryptography Jehan Pagès
@ 2011-03-24 13:26 ` Gerd Stolpmann
  2011-03-26  6:44   ` Jehan Pagès
  2011-03-24 16:51 ` Francis Dupont
  1 sibling, 1 reply; 8+ messages in thread
From: Gerd Stolpmann @ 2011-03-24 13:26 UTC (permalink / raw)
  To: Jehan Pagès; +Cc: caml-list

Am Donnerstag, den 24.03.2011, 21:17 +0900 schrieb Jehan Pagès:
> Hi,
> 
> I was just having a few thoughts/questions.
> 
> I have developed, in the context of another project, a Sha1
> implementation (also a HMAC implementation and above this all a SASL
> implementation with SCRAM-SHA1 mechanism support). That's not a
> binding to any existent library and is fully native Ocaml. No C in
> particular.

Funnily I did a SCRAM implementation for GSS-API a few weeks ago:
https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netmech-scram/

It does not implement the crypto primitives in Ocaml, though, but simply
uses XL's cryptokit package - which is a quite complete C
implementation.

> That's all working fine and is fast enough for being comfortable. But
> that's definitely not as fast as a C implementation.
> I made a small benchmark with OpenSSL's SHA1 functions and mine. Both
> tests are loops running 10.000 Sha1 of the same string, then exiting.
> Basically C goes around 6-10 times faster.

This is something I observed already earlier for my cryptgps package
(which implements Blowfish and DES -
https://godirepo.camlcity.org/svn/lib-cryptgps/trunk/). Ocaml is not a
good compiler for this kind of code. I tried it both with int32 and with
normal ints (i.e. a 32 bit word is represented as two ints, where each
int gets 16 bits). Both approaches achieve about the same speed (on 32
bit platforms), and are a small factor slower than C (I think it was
around 4-5 times slower after endless optimizations).

One of the problems seems to be that you cannot enforce to keep the
int32 values in registers all the time (at least in the inner loops). So
there are constantly boxing and unboxing operations. Even worse, this
also causes that memory is allocated all the time, and is of course also
cleaned up all the time.

I haven't checked on a 64 bit platform (at that time - ~10 years ago - I
did not have access to one). 64 bit platforms have more registers, and
there is no need to use int32.

My recommendation would be to avoid Ocaml for this type of code. The
compiler does not recognize that there is a loop it could completely
translate in unboxed mode. As far as I understand, a lot of work would
be required to make the compiler better here, and it would only affect a
few types of code (cryptography, pixel graphics, inner loops of
numerical algos).

Gerd


> Note: my machine is a small notebook which is not powerful enough to
> play most video games, nor even watch videos when they have a little
> too high quality (and I am not talking of HD)! So on common desktop
> machines, the test should go much faster for both version, but I guess
> keep the same speed proportions. You might want to raise the loop to
> 100.000 though in order to see the difference.
> 
> Here are examples of such a run checked with time (there is some
> variance between 2 runs, but the Ocaml version usually computes the
> 10.000 hashes in around 0.4 seconds while the C version computes them
> in about 0.06 seconds):
> 
> ~/SHA1$ time ./obench
> 
> real	0m0.416s
> user	0m0.400s
> sys	0m0.000s
> ~/SHA1$ time ./cbench
> 
> real	0m0.069s
> user	0m0.060s
> sys	0m0.004s
> 
> I checked the standard lib code and the MD5 code has been written in C
> over there. Apparently the cryptokit library as well is writing core
> cryptography in C.
> 
> I still think my code is pretty clean. I have made as much
> optimization as I saw possible and though maybe other may be able to
> still optimize it, I wonder up to which point now.
> 
> I am doing all computations over Int32 because the Sha1 algorithm
> works over 4 octets words. At first, I was doing the "naive" approach
> and was working on strings directly (going back and forth from a
> character code to the character) but that was like extremely slow
> (really). Still it allowed me to get the first working implementation.
> 
> Implementing with int, which is supposed to be much faster than int32,
> would be nice (even though integers can be 64 bits on a 64 bits
> platform, I can just mask the 4 higher octets in such case), but int
> in OCaml is actually 31/63 bits because of the flag bit so I cannot
> represent SHA1 words with int.
> 
> For those interested, you can download the benchmark (which includes
> the code of Sha1) here:
> http://download.tuxfamily.org/ocamlxmpp/sha1_benchmark.tar.gz (just
> 3ko).
> Just run make in the SHA1 directory which will be uncompressed, it
> will create cbench and obench (a SSL library is needed for the C
> benchmark, likely OpenSSL or other with the same API. Nothing external
> is needed for the Ocaml one).
> 
> Maybe anyone has suggestions to improve?
> Or Ocaml simply cannot compete with C here? (I don't mean to do
> better, but getting closer would be already nice)
> 
> Thanks.
> 
> Jehan
> 
> P.S.: don't tell me there are already libraries out there which binds
> to C libraries, so why did I ever bother write this Module. I did
> this:
> 1/ for fun of writing a full SHA1 implementation in Ocaml.
> 2/ because I was wondering how fast it could be.
> 3/ No reason.
> 4/ Because and that's all. ;-)
> 


-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Ocaml and cryptography
  2011-03-24 12:17 [Caml-list] Ocaml and cryptography Jehan Pagès
  2011-03-24 13:26 ` Gerd Stolpmann
@ 2011-03-24 16:51 ` Francis Dupont
  2011-03-26  6:47   ` Jehan Pagès
  1 sibling, 1 reply; 8+ messages in thread
From: Francis Dupont @ 2011-03-24 16:51 UTC (permalink / raw)
  To: Jehan Pagès; +Cc: caml-list

This is not the first study about crypto implementation speeds.
Usually the winner for heavily used algorithms is OpenSSL,
BTW not because it is well written but simply because it is
optmized in assembly for common platforms (including SSE* & co
on x86).

Regards

Francis.Dupont@fdupont.fr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Ocaml and cryptography
  2011-03-24 13:26 ` Gerd Stolpmann
@ 2011-03-26  6:44   ` Jehan Pagès
  2011-03-28 19:13     ` Gerd Stolpmann
  0 siblings, 1 reply; 8+ messages in thread
From: Jehan Pagès @ 2011-03-26  6:44 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: caml-list

Hi,

2011/3/24 Gerd Stolpmann <info@gerd-stolpmann.de>:
> Am Donnerstag, den 24.03.2011, 21:17 +0900 schrieb Jehan Pagès:
>> Hi,
>>
>> I was just having a few thoughts/questions.
>>
>> I have developed, in the context of another project, a Sha1
>> implementation (also a HMAC implementation and above this all a SASL
>> implementation with SCRAM-SHA1 mechanism support). That's not a
>> binding to any existent library and is fully native Ocaml. No C in
>> particular.
>
> Funnily I did a SCRAM implementation for GSS-API a few weeks ago:
> https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netmech-scram/

Nice. What will be the use? (if there is any upper level project
unless adding a feature for the library was the only goal right now?)

> It does not implement the crypto primitives in Ocaml, though, but simply
> uses XL's cryptokit package - which is a quite complete C
> implementation.
>
>> That's all working fine and is fast enough for being comfortable. But
>> that's definitely not as fast as a C implementation.
>> I made a small benchmark with OpenSSL's SHA1 functions and mine. Both
>> tests are loops running 10.000 Sha1 of the same string, then exiting.
>> Basically C goes around 6-10 times faster.
>
> This is something I observed already earlier for my cryptgps package
> (which implements Blowfish and DES -
> https://godirepo.camlcity.org/svn/lib-cryptgps/trunk/). Ocaml is not a
> good compiler for this kind of code. I tried it both with int32 and with
> normal ints (i.e. a 32 bit word is represented as two ints, where each
> int gets 16 bits). Both approaches achieve about the same speed (on 32
> bit platforms), and are a small factor slower than C (I think it was
> around 4-5 times slower after endless optimizations).

Indeed. About these endless optimization... I had another response
(not made it to the list) with a link to uuidm code (also
implementation a native Ocaml SHA1). I saw it uses Char.unsafe_chr
instead of Char.chr. I didn't know this function, checked the source
of ocaml, saw it is indeed in the interface but hidden behind (**/**)
(so it did not appear in the doc).

In my benchmark, I think it improves a little, but barely (enough so
that I am not fully sure if the improvement I see is the randomness of
consecutive benchmark tests). Still I keep it (as you say, endless
optimizations: that's mostly when you don't know what else to do) as
my code already makes sure that the code I pass to make a char is
valid. But I wonder if this hidden function is meant to disappear some
day... Is there some official status about this function?

> One of the problems seems to be that you cannot enforce to keep the
> int32 values in registers all the time (at least in the inner loops). So
> there are constantly boxing and unboxing operations. Even worse, this
> also causes that memory is allocated all the time, and is of course also
> cleaned up all the time.

I see. So that's a limitation of the Ocaml compiler or is it anyway
very difficult to have control on this kind of thing?

> I haven't checked on a 64 bit platform (at that time - ~10 years ago - I
> did not have access to one). 64 bit platforms have more registers, and
> there is no need to use int32.

Yes I was thinking for quite some time (I mean, some days as this code
is a few days old anyway) to try a int implementation when the
platform is 64 bits (masking over 4 bytes). I just decided to do so
yesterday. And it divides the benchmark time by 2.

On the small 64 bits machine I have access (slow but slightly faster
than my netbook), my Int32 implementation was running my benchmark in
0.25 seconds, and the int implementation of the same code otherwise in
0.12 seconds! So now I make some conditional implementation with
macros so that I use int on 64 bits platform and the generic Int32 on
32 bits (or if I can't detect for sure I am on 64 bits).

But the C implementation must have made some crazy optimization in
assembly for 64 bits platform because they run in around 0.008
seconds! So they are like 15 times faster now from my Ocaml
implementation when it uses int!

Oh and to come back to macros (using pa_macro with camlp4o), why is
int validity checked by camlp4?!
I had some tests like this:

IFNDEF ARCH_64 THEN
        0x8F1BBCDCl
ELSE
        0x8F1BBCDC
ENDIF

Then when on a 32 bits machine, this stupid camlp4 ends in error
because 0x8F1BBCDC is over max_int! But that's why I do the IFDEF
test! Why does it ever bother checking this? I thought camlp4 was only
about syntax, not about code validation. If camlp4 was to only do its
job and pass the code to the compiler, this one would tell me if I
really made an int error (which was not the case).
Anyway that's not nice, I had to pass these int values as additional
macros instead which is not pretty in my opinion.

> My recommendation would be to avoid Ocaml for this type of code. The
> compiler does not recognize that there is a loop it could completely
> translate in unboxed mode. As far as I understand, a lot of work would
> be required to make the compiler better here, and it would only affect a
> few types of code (cryptography, pixel graphics, inner loops of
> numerical algos).

I see. Still I am pretty happy of my code right now. It is not as good
as OpenSSL, but that's still pretty good when you think of it. I am
still doing 10.000 computations now in around 0.1 second on very weak
machines (even my notebook which does not read most videos well, I
have now optimized down to around 0.35 seconds).

I think I'll stop here for my sha1 implementation (unless someone
points me to some really neat improvement I did not see). I am happy
with it. :-)

Thanks all!

Jehan

> Gerd
>
>
>> Note: my machine is a small notebook which is not powerful enough to
>> play most video games, nor even watch videos when they have a little
>> too high quality (and I am not talking of HD)! So on common desktop
>> machines, the test should go much faster for both version, but I guess
>> keep the same speed proportions. You might want to raise the loop to
>> 100.000 though in order to see the difference.
>>
>> Here are examples of such a run checked with time (there is some
>> variance between 2 runs, but the Ocaml version usually computes the
>> 10.000 hashes in around 0.4 seconds while the C version computes them
>> in about 0.06 seconds):
>>
>> ~/SHA1$ time ./obench
>>
>> real  0m0.416s
>> user  0m0.400s
>> sys   0m0.000s
>> ~/SHA1$ time ./cbench
>>
>> real  0m0.069s
>> user  0m0.060s
>> sys   0m0.004s
>>
>> I checked the standard lib code and the MD5 code has been written in C
>> over there. Apparently the cryptokit library as well is writing core
>> cryptography in C.
>>
>> I still think my code is pretty clean. I have made as much
>> optimization as I saw possible and though maybe other may be able to
>> still optimize it, I wonder up to which point now.
>>
>> I am doing all computations over Int32 because the Sha1 algorithm
>> works over 4 octets words. At first, I was doing the "naive" approach
>> and was working on strings directly (going back and forth from a
>> character code to the character) but that was like extremely slow
>> (really). Still it allowed me to get the first working implementation.
>>
>> Implementing with int, which is supposed to be much faster than int32,
>> would be nice (even though integers can be 64 bits on a 64 bits
>> platform, I can just mask the 4 higher octets in such case), but int
>> in OCaml is actually 31/63 bits because of the flag bit so I cannot
>> represent SHA1 words with int.
>>
>> For those interested, you can download the benchmark (which includes
>> the code of Sha1) here:
>> http://download.tuxfamily.org/ocamlxmpp/sha1_benchmark.tar.gz (just
>> 3ko).
>> Just run make in the SHA1 directory which will be uncompressed, it
>> will create cbench and obench (a SSL library is needed for the C
>> benchmark, likely OpenSSL or other with the same API. Nothing external
>> is needed for the Ocaml one).
>>
>> Maybe anyone has suggestions to improve?
>> Or Ocaml simply cannot compete with C here? (I don't mean to do
>> better, but getting closer would be already nice)
>>
>> Thanks.
>>
>> Jehan
>>
>> P.S.: don't tell me there are already libraries out there which binds
>> to C libraries, so why did I ever bother write this Module. I did
>> this:
>> 1/ for fun of writing a full SHA1 implementation in Ocaml.
>> 2/ because I was wondering how fast it could be.
>> 3/ No reason.
>> 4/ Because and that's all. ;-)
>>
>
>
> --
> ------------------------------------------------------------
> Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany
> gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
> Phone: +49-6151-153855                  Fax: +49-6151-997714
> ------------------------------------------------------------
>
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Ocaml and cryptography
  2011-03-24 16:51 ` Francis Dupont
@ 2011-03-26  6:47   ` Jehan Pagès
  2011-03-26  9:06     ` Anil Madhavapeddy
  0 siblings, 1 reply; 8+ messages in thread
From: Jehan Pagès @ 2011-03-26  6:47 UTC (permalink / raw)
  To: Francis Dupont; +Cc: caml-list

Hi,

I see. Indeed I checked OpenSSL, it is using massively assembly code
for SHA1 and many other (all?) crypto code...

And my tests on a 64 bits machine and the huge improvement for OpenSSL
there makes me think that they must be making very good use of the 64
bits register for saving the 32 bits words of SHA1 algorithm (they
probably make some calculation on 2 words at once thanks to this).

Jehan

2011/3/25 Francis Dupont <Francis.Dupont@fdupont.fr>:
> This is not the first study about crypto implementation speeds.
> Usually the winner for heavily used algorithms is OpenSSL,
> BTW not because it is well written but simply because it is
> optmized in assembly for common platforms (including SSE* & co
> on x86).
>
> Regards
>
> Francis.Dupont@fdupont.fr
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Ocaml and cryptography
  2011-03-26  6:47   ` Jehan Pagès
@ 2011-03-26  9:06     ` Anil Madhavapeddy
  2011-03-26 11:53       ` Jehan Pagès
  0 siblings, 1 reply; 8+ messages in thread
From: Anil Madhavapeddy @ 2011-03-26  9:06 UTC (permalink / raw)
  To: Jehan Pagès; +Cc: Francis Dupont, caml-list

OpenSSL may also be using direct hardware instructions to speed up SHA1/AES, rather doing the work itself. Check out the AES-NI in newer Intel chips, or Padlock SHA1 in Via chips.

This is quite handy, since it means you can also use them to speed up your library on modern chips, but fall back to a reasonably performant implementation if it's not available (without ever having to bind to the rather gnarly OpenSSL monster).

Anil

On 26 Mar 2011, at 06:47, Jehan Pagès <jehan.marmottard@gmail.com> wrote:

> Hi,
> 
> I see. Indeed I checked OpenSSL, it is using massively assembly code
> for SHA1 and many other (all?) crypto code...
> 
> And my tests on a 64 bits machine and the huge improvement for OpenSSL
> there makes me think that they must be making very good use of the 64
> bits register for saving the 32 bits words of SHA1 algorithm (they
> probably make some calculation on 2 words at once thanks to this).
> 
> Jehan
> 
> 2011/3/25 Francis Dupont <Francis.Dupont@fdupont.fr>:
>> This is not the first study about crypto implementation speeds.
>> Usually the winner for heavily used algorithms is OpenSSL,
>> BTW not because it is well written but simply because it is
>> optmized in assembly for common platforms (including SSE* & co
>> on x86).
>> 
>> Regards
>> 
>> Francis.Dupont@fdupont.fr
>> 
> 
> -- 
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Ocaml and cryptography
  2011-03-26  9:06     ` Anil Madhavapeddy
@ 2011-03-26 11:53       ` Jehan Pagès
  0 siblings, 0 replies; 8+ messages in thread
From: Jehan Pagès @ 2011-03-26 11:53 UTC (permalink / raw)
  To: Anil Madhavapeddy; +Cc: Francis Dupont, caml-list

Hi,

2011/3/26 Anil Madhavapeddy <anil@recoil.org>:
> OpenSSL may also be using direct hardware instructions to speed up SHA1/AES, rather doing the work itself. Check out the AES-NI in newer Intel chips, or Padlock SHA1 in Via chips.
>
> This is quite handy, since it means you can also use them to speed up your library on modern chips, but fall back to a reasonably performant implementation if it's not available (without ever having to bind to the rather gnarly OpenSSL monster).
>

That's very interesting. If I had such a material, I would improve my
library to support any of them, just for the fun of doing so and
testing the difference.
Plus that must be interesting to see how from Ocaml I can access this
kind of material.

Unfortunately I don't (have).

Jehan

> Anil
>
> On 26 Mar 2011, at 06:47, Jehan Pagès <jehan.marmottard@gmail.com> wrote:
>
>> Hi,
>>
>> I see. Indeed I checked OpenSSL, it is using massively assembly code
>> for SHA1 and many other (all?) crypto code...
>>
>> And my tests on a 64 bits machine and the huge improvement for OpenSSL
>> there makes me think that they must be making very good use of the 64
>> bits register for saving the 32 bits words of SHA1 algorithm (they
>> probably make some calculation on 2 words at once thanks to this).
>>
>> Jehan
>>
>> 2011/3/25 Francis Dupont <Francis.Dupont@fdupont.fr>:
>>> This is not the first study about crypto implementation speeds.
>>> Usually the winner for heavily used algorithms is OpenSSL,
>>> BTW not because it is well written but simply because it is
>>> optmized in assembly for common platforms (including SSE* & co
>>> on x86).
>>>
>>> Regards
>>>
>>> Francis.Dupont@fdupont.fr
>>>
>>
>> --
>> Caml-list mailing list.  Subscription management and archives:
>> https://sympa-roc.inria.fr/wws/info/caml-list
>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>> Bug reports: http://caml.inria.fr/bin/caml-bugs
>>
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Ocaml and cryptography
  2011-03-26  6:44   ` Jehan Pagès
@ 2011-03-28 19:13     ` Gerd Stolpmann
  0 siblings, 0 replies; 8+ messages in thread
From: Gerd Stolpmann @ 2011-03-28 19:13 UTC (permalink / raw)
  To: Jehan Pagès; +Cc: caml-list

Am Samstag, den 26.03.2011, 15:44 +0900 schrieb Jehan Pagès:
> Hi,
> 
> 2011/3/24 Gerd Stolpmann <info@gerd-stolpmann.de>:
> > Am Donnerstag, den 24.03.2011, 21:17 +0900 schrieb Jehan Pagès:
> >> Hi,
> >>
> >> I was just having a few thoughts/questions.
> >>
> >> I have developed, in the context of another project, a Sha1
> >> implementation (also a HMAC implementation and above this all a SASL
> >> implementation with SCRAM-SHA1 mechanism support). That's not a
> >> binding to any existent library and is fully native Ocaml. No C in
> >> particular.
> >
> > Funnily I did a SCRAM implementation for GSS-API a few weeks ago:
> > https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netmech-scram/
> 
> Nice. What will be the use? (if there is any upper level project
> unless adding a feature for the library was the only goal right now?)

The motivation is that I currently develop a map/reduce platform (Plasma
- plasma.camlcity.org), and I was looking for a way to secure the
communication channels between the various components. Plasma uses
ONC-RPC, and hence a GSS-API mechanism fits best. I chose SCRAM because
of its simplicity - the alternatives would have been Kerberos, or
SPNEGO. However, these alternatives would have complicated the
deployment of the platform.

For GSS-API, SCRAM also supports encryption and integrity, so it is a
full-featured solution.

> > It does not implement the crypto primitives in Ocaml, though, but simply
> > uses XL's cryptokit package - which is a quite complete C
> > implementation.
> >
> >> That's all working fine and is fast enough for being comfortable. But
> >> that's definitely not as fast as a C implementation.
> >> I made a small benchmark with OpenSSL's SHA1 functions and mine. Both
> >> tests are loops running 10.000 Sha1 of the same string, then exiting.
> >> Basically C goes around 6-10 times faster.
> >
> > This is something I observed already earlier for my cryptgps package
> > (which implements Blowfish and DES -
> > https://godirepo.camlcity.org/svn/lib-cryptgps/trunk/). Ocaml is not a
> > good compiler for this kind of code. I tried it both with int32 and with
> > normal ints (i.e. a 32 bit word is represented as two ints, where each
> > int gets 16 bits). Both approaches achieve about the same speed (on 32
> > bit platforms), and are a small factor slower than C (I think it was
> > around 4-5 times slower after endless optimizations).
> 
> Indeed. About these endless optimization... I had another response
> (not made it to the list) with a link to uuidm code (also
> implementation a native Ocaml SHA1). I saw it uses Char.unsafe_chr
> instead of Char.chr. I didn't know this function, checked the source
> of ocaml, saw it is indeed in the interface but hidden behind (**/**)
> (so it did not appear in the doc).
> 
> In my benchmark, I think it improves a little, but barely (enough so
> that I am not fully sure if the improvement I see is the randomness of
> consecutive benchmark tests). Still I keep it (as you say, endless
> optimizations: that's mostly when you don't know what else to do) as
> my code already makes sure that the code I pass to make a char is
> valid. But I wonder if this hidden function is meant to disappear some
> day... Is there some official status about this function?

I don't think so, but such functions tend not to disappear from the
Ocaml runtime (and I'm watching this for more than 10 years).

> > One of the problems seems to be that you cannot enforce to keep the
> > int32 values in registers all the time (at least in the inner loops). So
> > there are constantly boxing and unboxing operations. Even worse, this
> > also causes that memory is allocated all the time, and is of course also
> > cleaned up all the time.
> 
> I see. So that's a limitation of the Ocaml compiler or is it anyway
> very difficult to have control on this kind of thing?

It's a representation issue. If an int32 value cannot remain in a
register, it needs to be "boxed", i.e. it is stored in a small memory
block that is specially allocated for this purpose. Although Ocaml is
very good at managing short-living memory blocks, there is a measurable
slowdown.

There is also the analysis complexity in the compiler which has to find
out when it is possible to keep a value in a register. The compiler
seems not to be written for optimizing small imperative loops as they
occur in cryptography. I guess this is just a matter of how much effort
you put into this.

> > I haven't checked on a 64 bit platform (at that time - ~10 years ago - I
> > did not have access to one). 64 bit platforms have more registers, and
> > there is no need to use int32.
> 
> Yes I was thinking for quite some time (I mean, some days as this code
> is a few days old anyway) to try a int implementation when the
> platform is 64 bits (masking over 4 bytes). I just decided to do so
> yesterday. And it divides the benchmark time by 2.
> 
> On the small 64 bits machine I have access (slow but slightly faster
> than my netbook), my Int32 implementation was running my benchmark in
> 0.25 seconds, and the int implementation of the same code otherwise in
> 0.12 seconds!

This is the price tag for these boxing operations.

>  So now I make some conditional implementation with
> macros so that I use int on 64 bits platform and the generic Int32 on
> 32 bits (or if I can't detect for sure I am on 64 bits).
> 
> But the C implementation must have made some crazy optimization in
> assembly for 64 bits platform because they run in around 0.008
> seconds! So they are like 15 times faster now from my Ocaml
> implementation when it uses int!

There are probably some more optimizations you can do. For example, CPUs
have a feature called speculative execution - when a conditional jump is
found, they cannot easily continue filling their execution pipeline,
because they do not know whether the jump is done or not done. So what
they do is that they guess (and they are good at guessing, but not
perfect), and execute one of the outcomes of the condition speculatively
(so that the effects can be undone). Although this is an interesting
optimization in the CPU there is still some price to pay. The point is
now that it is best to avoid such jumps at all, because the normal
execution flow can then continue. If you write the assembly code
directly, and know the runtime characteristics of the CPU well, one can
greatly speedup the code by avoiding conditional jumps (e.g. by using
conditional moves, or by bit shifting tricks). Compilers have a hard
time generating such code.

Gerd

> Oh and to come back to macros (using pa_macro with camlp4o), why is
> int validity checked by camlp4?!
> I had some tests like this:
> 
> IFNDEF ARCH_64 THEN
>         0x8F1BBCDCl
> ELSE
>         0x8F1BBCDC
> ENDIF
> 
> Then when on a 32 bits machine, this stupid camlp4 ends in error
> because 0x8F1BBCDC is over max_int! But that's why I do the IFDEF
> test! Why does it ever bother checking this? I thought camlp4 was only
> about syntax, not about code validation. If camlp4 was to only do its
> job and pass the code to the compiler, this one would tell me if I
> really made an int error (which was not the case).
> Anyway that's not nice, I had to pass these int values as additional
> macros instead which is not pretty in my opinion.
> 
> > My recommendation would be to avoid Ocaml for this type of code. The
> > compiler does not recognize that there is a loop it could completely
> > translate in unboxed mode. As far as I understand, a lot of work would
> > be required to make the compiler better here, and it would only affect a
> > few types of code (cryptography, pixel graphics, inner loops of
> > numerical algos).
> 
> I see. Still I am pretty happy of my code right now. It is not as good
> as OpenSSL, but that's still pretty good when you think of it. I am
> still doing 10.000 computations now in around 0.1 second on very weak
> machines (even my notebook which does not read most videos well, I
> have now optimized down to around 0.35 seconds).
> 
> I think I'll stop here for my sha1 implementation (unless someone
> points me to some really neat improvement I did not see). I am happy
> with it. :-)
> 
> Thanks all!
> 
> Jehan
> 
> > Gerd
> >
> >
> >> Note: my machine is a small notebook which is not powerful enough to
> >> play most video games, nor even watch videos when they have a little
> >> too high quality (and I am not talking of HD)! So on common desktop
> >> machines, the test should go much faster for both version, but I guess
> >> keep the same speed proportions. You might want to raise the loop to
> >> 100.000 though in order to see the difference.
> >>
> >> Here are examples of such a run checked with time (there is some
> >> variance between 2 runs, but the Ocaml version usually computes the
> >> 10.000 hashes in around 0.4 seconds while the C version computes them
> >> in about 0.06 seconds):
> >>
> >> ~/SHA1$ time ./obench
> >>
> >> real  0m0.416s
> >> user  0m0.400s
> >> sys   0m0.000s
> >> ~/SHA1$ time ./cbench
> >>
> >> real  0m0.069s
> >> user  0m0.060s
> >> sys   0m0.004s
> >>
> >> I checked the standard lib code and the MD5 code has been written in C
> >> over there. Apparently the cryptokit library as well is writing core
> >> cryptography in C.
> >>
> >> I still think my code is pretty clean. I have made as much
> >> optimization as I saw possible and though maybe other may be able to
> >> still optimize it, I wonder up to which point now.
> >>
> >> I am doing all computations over Int32 because the Sha1 algorithm
> >> works over 4 octets words. At first, I was doing the "naive" approach
> >> and was working on strings directly (going back and forth from a
> >> character code to the character) but that was like extremely slow
> >> (really). Still it allowed me to get the first working implementation.
> >>
> >> Implementing with int, which is supposed to be much faster than int32,
> >> would be nice (even though integers can be 64 bits on a 64 bits
> >> platform, I can just mask the 4 higher octets in such case), but int
> >> in OCaml is actually 31/63 bits because of the flag bit so I cannot
> >> represent SHA1 words with int.
> >>
> >> For those interested, you can download the benchmark (which includes
> >> the code of Sha1) here:
> >> http://download.tuxfamily.org/ocamlxmpp/sha1_benchmark.tar.gz (just
> >> 3ko).
> >> Just run make in the SHA1 directory which will be uncompressed, it
> >> will create cbench and obench (a SSL library is needed for the C
> >> benchmark, likely OpenSSL or other with the same API. Nothing external
> >> is needed for the Ocaml one).
> >>
> >> Maybe anyone has suggestions to improve?
> >> Or Ocaml simply cannot compete with C here? (I don't mean to do
> >> better, but getting closer would be already nice)
> >>
> >> Thanks.
> >>
> >> Jehan
> >>
> >> P.S.: don't tell me there are already libraries out there which binds
> >> to C libraries, so why did I ever bother write this Module. I did
> >> this:
> >> 1/ for fun of writing a full SHA1 implementation in Ocaml.
> >> 2/ because I was wondering how fast it could be.
> >> 3/ No reason.
> >> 4/ Because and that's all. ;-)
> >>
> >
> >
> > --
> > ------------------------------------------------------------
> > Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany
> > gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
> > Phone: +49-6151-153855                  Fax: +49-6151-997714
> > ------------------------------------------------------------
> >
> >
> 


-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-03-28 19:13 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-03-24 12:17 [Caml-list] Ocaml and cryptography Jehan Pagès
2011-03-24 13:26 ` Gerd Stolpmann
2011-03-26  6:44   ` Jehan Pagès
2011-03-28 19:13     ` Gerd Stolpmann
2011-03-24 16:51 ` Francis Dupont
2011-03-26  6:47   ` Jehan Pagès
2011-03-26  9:06     ` Anil Madhavapeddy
2011-03-26 11:53       ` Jehan Pagès

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).