From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id 5C796BC37; Thu, 29 Oct 2009 18:05:55 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApcCAJNo6UrZSMDdi2dsb2JhbACBT5l5AQEBCgsKBxEFwm2EPQSBYg X-IronPort-AV: E=Sophos;i="4.44,647,1249250400"; d="scan'208";a="37218974" Received: from fmmailgate01.web.de ([217.72.192.221]) by mail3-smtp-sop.national.inria.fr with ESMTP; 29 Oct 2009 18:05:40 +0100 Received: from smtp08.web.de (fmsmtp08.dlan.cinetic.de [172.20.5.216]) by fmmailgate01.web.de (Postfix) with ESMTP id E87181345DDAE; Thu, 29 Oct 2009 18:05:39 +0100 (CET) Received: from [95.208.117.111] (helo=frosties.localdomain) by smtp08.web.de with asmtp (TLSv1:AES256-SHA:256) (WEB.DE 4.110 #314) id 1N3YRN-0004yj-00; Thu, 29 Oct 2009 18:05:38 +0100 Received: from mrvn by frosties.localdomain with local (Exim 4.69) (envelope-from ) id 1N3YRN-0004hz-I5; Thu, 29 Oct 2009 18:05:37 +0100 From: Goswin von Brederlow To: Xavier Leroy Cc: Goswin von Brederlow , caml-list@inria.fr Subject: Re: [Caml-list] How to read different ints from a Bigarray? References: <87eiond3of.fsf@frosties.localdomain> <4AE87AB9.5020607@inria.fr> Date: Thu, 29 Oct 2009 18:05:37 +0100 In-Reply-To: <4AE87AB9.5020607@inria.fr> (Xavier Leroy's message of "Wed, 28 Oct 2009 18:09:13 +0100") Message-ID: <87pr86b066.fsf@frosties.localdomain> User-Agent: Gnus/5.110006 (No Gnus v0.6) XEmacs/21.4.22 (linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: goswin-v-b@web.de X-Sender: goswin-v-b@web.de X-Provags-ID: V01U2FsdGVkX19VU+mEmuQeBfWn/t8hhSqS2IswNCW+ski1U4Uf riTgLrdRPMtNjEII9aSQza6sxaHUJGSiagnxpXg3qtLOl5LuIr 1x+Nlk8Ho= X-Spam: no; 0.00; bigarray:01 buffer:01 elt:01 bigarray:01 buf:01 buf:01 elt:01 ocaml's:01 typechecker:01 ocaml:01 ocaml:01 endian:01 overkill:01 byte:01 bigarrays:01 Xavier Leroy writes: > Goswin von Brederlow wrote: > >> I'm working on binding s for linux libaio library (asynchron IO) with >> a sharp eye on efficiency. That means no copying must be done on the >> data, which in turn means I can not use string as buffer type. >> >> The best type for this seems to be a (int, int8_unsigned_elt, >> c_layout) Bigarray.Array1.t. So far so good. > > That's a reasonable choice. > >> Now I define helper functions: >> >> let get_uint8 buf off = buf.{off} >> let set_uint8 buf off x = buf.{off} <- x >> >> But I want more: >> >> get/set_int8 - do I use Obj.magic to "convert" to int8_signed_elt? > > Not at all. If you ask OCaml's typechecker to infer the type of > get_uint8, you'll see that it returns a plain OCaml "int" (in the > 0...255 range). Likewise, the "x" parameter to "set_uint8" has type > "int" (of which only the 8 low bits are used). > > Repeat after me: "Obj.magic is not part of the OCaml language". > >> And endian correcting access for larger ints: >> >> get/set_big_uint16 >> get/set_big_int16 >> get/set_little_uint16 >> get/set_little_int16 >> get/set_big_uint24 >> ... >> get/set_little_int56 >> get/set_big_int64 >> get/set_little_int64 > > The "56" functions look like a bit of overkill to me :-) > >> What is the best way there? For uintXX I can get_uint8 each byte and >> shift and add them together. But that feels inefficient as each access >> will range check > > Not necessarily. OCaml 3.11 introduced unchecked accesses to > bigarrays, so you can range-check yourself once, then perform > unchecked accesses. Use with caution... > >> and the shifting generates a lot of code while cpus >> can usualy endian correct an int more elegantly. >> >> Is it worth the overhead of calling a C function to write optimized >> stubs for this? > > The only way to know is to benchmark both approaches :-( My guess is > that for 16-bit accesses, you're better off with a pure Caml solution, > but for 64-bit accesses, a C function could be faster. > > - Xavier Leroy Here are some benchmark results: get an int out of a string: C Ocaml uint8 le 19.496 17.433 int8 le 19.298 17.850 uint16 le 19.427 25.046 int16 le 19.383 27.664 uint16 be 20.502 23.200 int16 be 20.350 27.535 get an int out of a Bigarray.Array1.t: safe unsafe uint8 le 55.194s 54.508s uint64 le 80.51s 81.46s Now to be fair the C code is unsafe as it does no boundary check. I intend to get/set larger structures so I only need to check if all of the structure fits. So most of the time I want unsafe calls and String does not have any. And storing an int64, int32 does not need to check for overflow for every single byte written in char_of_int. The Bigarray unsafe_get is really disapointing. Note that uint64 is so much slower because of allocating the result (my guess). Array1.set runs the same speed for uint8 and uint64. Overall it looks like C calls just aren't that expensive and endian and sign conversions in ocaml plain suck. I can not use an ocaml string as my buffer must be aligned and unmovable (required by the linux kernel). A string manually created outside the GC heap will never be freeed by the GC so that is out of the question too. And Bigarray is plain too slow. So a well defined custom type with access functions from both C and Ocaml seems to be the way to go. As needed one can then also write a stub for get/set of e.g. struct { uint64_t kind : 8; unit64_t inode; uint64_t data; } <-> type key = Meta of int64 * int64 | Inode of inode_t | Block of inode_t * block_t. So much for the idea to get rid of the custom buffer type in libaio. MfG Goswin