From: Goswin von Brederlow
To: caml-list@inria.fr
Subject: Re: [Caml-list] Re: How to read different ints from a Bigarray?
Date: Mon, 02 Nov 2009 21:48:09 +0100
In-Reply-To: <20091102163324.GH17061@NANA.localdomain> (Mauricio Fernandez's message of "Mon, 2 Nov 2009 17:33:24 +0100")
Message-ID: <87bpjkmz5i.fsf@frosties.localdomain>

Mauricio Fernandez writes:

> On Mon, Nov 02, 2009 at 05:11:27PM +0100, Goswin von Brederlow wrote:
>> Richard Jones writes:
>>
>> > On Sun, Nov 01, 2009 at 04:11:52PM +0100, Goswin von Brederlow wrote:
>> >> But C calls are still 33% slower than direct access in ocaml (if one
>> >> doesn't use the polymorphic functions).
>> >
>> > Are you using noalloc calls?
>> >
>> > http://camltastic.blogspot.com/2008/08/tip-calling-c-functions-directly-with.html
>>
>> Yes. And I looked at the bigarray module and couldn't figure out how
>> they differ from my own external function. Only difference I see is
>> the leading "%" on the external name. What does that do?
>
> That means that it is using a hardcoded OCaml primitive, whose code can be
> generated by the compiler via C--. See asmcomp/cmmgen.ml.
>
>> > I would love to see inline assembler supported by the compiler.
>
> It might be possible to hack support for C-- expressions in external
> declarations. That'd be a sort of portable assembler.

This brings me a lot closer to a fast buffer structure. I now have this
code:

(* buffer.ml: Buffer module for libaio-ocaml
 * Copyright (C) 2009 Goswin von Brederlow
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License as
 * published by the Free Software Foundation, either version 3 of the
 * License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program. If not, see .
 * Under Debian a copy can be found in /usr/share/common-licenses/LGPL-3.
 *)

open Bigarray

type buffer = (int, int8_unsigned_elt, c_layout) Array1.t

exception Unaligned

let create size =
  (Array1.create int8_unsigned c_layout size : buffer)

let unsafe_get_uint8 (buf : buffer) off =
  Array1.unsafe_get buf off

let unsafe_get_uint16 (buf : buffer) off =
  let off = off asr 1 in
  let buf = ((Obj.magic buf) : (int, int16_unsigned_elt, c_layout) Array1.t) in
  Array1.unsafe_get buf off

let unsafe_get_int31 (buf : buffer) off =
  let off = off asr 2 in
  let buf = ((Obj.magic buf) : (int32, int32, c_layout) Array1.t) in
  let x = Array1.unsafe_get buf off in
  Int32.to_int x

let unsafe_get_int63 (buf : buffer) off =
  let off = off asr 3 in
  let buf = ((Obj.magic buf) : (int, int, c_layout) Array1.t) in
  Array1.unsafe_get buf off

Looking at the generated code I see that this works nicely for 8 and
16 bit:

0000000000404a50 :
  404a50:  48 d1 fb              sar    %rbx
  404a53:  48 8b 40 08           mov    0x8(%rax),%rax
  404a57:  48 0f b6 04 18        movzbq (%rax,%rbx,1),%rax
  404a5c:  48 8d 44 00 01        lea    0x1(%rax,%rax,1),%rax
  404a61:  c3                    retq

0000000000404a90 :
  404a90:  48 d1 fb              sar    %rbx
  404a93:  48 83 cb 01           or     $0x1,%rbx
  404a97:  48 d1 fb              sar    %rbx
  404a9a:  48 8b 40 08           mov    0x8(%rax),%rax
  404a9e:  48 0f b7 04 58        movzwq (%rax,%rbx,2),%rax
  404aa3:  48 8d 44 00 01        lea    0x1(%rax,%rax,1),%rax
  404aa8:  c3                    retq

But for 31/63 bits I get:

0000000000404b90 :
  404b90:  48 83 ec 08           sub    $0x8,%rsp
  404b94:  48 c1 fb 02           sar    $0x2,%rbx
  404b98:  48 83 cb 01           or     $0x1,%rbx
  404b9c:  48 89 c7              mov    %rax,%rdi
  404b9f:  48 89 de              mov    %rbx,%rsi
  404ba2:  48 8b 05 5f bc 21 00  mov    0x21bc5f(%rip),%rax  # 620808 <_DYNAMIC+0x7e0>
  404ba9:  e8 92 2a 01 00        callq  417640
  404bae:  48 63 40 08           movslq 0x8(%rax),%rax
  404bb2:  48 d1 e0              shl    %rax
  404bb5:  48 83 c8 01           or     $0x1,%rax
  404bb9:  48 83 c4 08           add    $0x8,%rsp
  404bbd:  c3                    retq

0000000000404ca0 :
  404ca0:  48 83 ec 08           sub    $0x8,%rsp
  404ca4:  48 c1 fb 03           sar    $0x3,%rbx
  404ca8:  48 83 cb 01           or     $0x1,%rbx
  404cac:  48 89 c7              mov    %rax,%rdi
  404caf:  48 89 de              mov    %rbx,%rsi
  404cb2:  48 8b 05 4f bb 21 00  mov    0x21bb4f(%rip),%rax  # 620808 <_DYNAMIC+0x7e0>
  404cb9:  e8 82 29 01 00        callq  417640
  404cbe:  48 83 c4 08           add    $0x8,%rsp
  404cc2:  c3                    retq

At least in the int63 case I would have thought the compiler would
emit asm code to read the int instead of a function call. In the 31-bit
case I would have hoped it would optimize the intermediate int32
away. Is there something I can do to improve get_int31?

I was hoping for code like this:

0000000000404a90 :
  404c90:  48 c1 fb 03           sar    $0x3,%rbx
  404a94:  48 83 cb 01           or     $0x1,%rbx
  404a98:  48 d1 fb              sar    %rbx
  404a9b:  48 8b 40 08           mov    0x8(%rax),%rax
  404a9f:  xx xx xx xx xx        movzwq (%rax,%rbx,4),%rax
  404aa4:  48 8d 44 00 01        lea    0x1(%rax,%rax,1),%rax
  404aa9:  c3                    retq

MfG
        Goswin
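
For comparison with the Obj.magic views above, here is a minimal sketch
that assembles the values byte by byte from the uint8 buffer, without
reinterpreting the bigarray. It is only a reference to check the fast
accessors against, and probably slower than a direct load. Assumptions:
little-endian byte order, 63-bit native ints (as on the x86-64 box the
listings come from), and the names get_uint16_le and get_int32_le are
hypothetical, not part of libaio-ocaml.

```ocaml
open Bigarray

type buffer = (int, int8_unsigned_elt, c_layout) Array1.t

let create size : buffer = Array1.create int8_unsigned c_layout size

(* Hypothetical helper: unsigned 16-bit little-endian value starting
   at byte offset [off]. Uses the bounds-checked Array1.get. *)
let get_uint16_le (buf : buffer) off =
  (Array1.get buf off) lor ((Array1.get buf (off + 1)) lsl 8)

(* Hypothetical helper: signed 32-bit little-endian value starting at
   byte offset [off], returned as a native int.
   Assumes 63-bit ints so that all 32 bits fit without overflow. *)
let get_int32_le (buf : buffer) off =
  let b i = Array1.get buf (off + i) in
  let u = (b 0) lor ((b 1) lsl 8) lor ((b 2) lsl 16) lor ((b 3) lsl 24) in
  (* Sign-extend from bit 31. *)
  (u lxor 0x8000_0000) - 0x8000_0000
```

Unlike the typed views, these take a plain byte offset and do not
require it to be aligned, so results can be compared for any offset
that fits in the buffer.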