caml-list - the Caml user's mailing list
* Internals details for cmmgen.ml
From: John Prevost @ 1999-12-10  8:12 UTC
  To: caml-list

Recently I posted about my work to add support for fast byte access to
regions of memory outside the O'Caml heap.  Since Francois Rouaix
helpfully pointed me at fcntl for being sure of the access rights on
file descriptors, I'm continuing to polish things up.

My main focus has been on adding small bits of code to
asmcomp/cmmgen.ml for the new functionality.  Two of the new
primitives (Pregionrefs and Pregionsets) need to check the protection
bits recorded for the region before trying to access it, and somehow
raise an exception if the access check fails.

I'm currently using the following kludge to make bad region access
checks fail:

let region_checkaccess exp = function
  | Reg_Read ->
    Cifthenelse(
      Cop(Cand, [region_prot exp; Cconst_int 1]),
      Cconst_pointer 1,
      Cop(Ccheckbound, [Cconst_int 0; Cconst_int 0]))
  | Reg_Write ->
    Cifthenelse(
      Cop(Cand, [region_prot exp; Cconst_int 2]),
      Cconst_pointer 1,
      Cop(Ccheckbound, [Cconst_int 0; Cconst_int 0]))
(* XXX Bounds check on 0 is a kludge to force exception *)

(Please pardon my code--I may not know enough about C-- to produce the
best possible output, even without trying to finagle this access check.)
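At the ML level, the test that region_checkaccess compiles is a bitwise AND of the region's protection word with 1 (read) or 2 (write). The sketch below models only that test; the `access` type, `bit_of_access` and `check_access` are hypothetical names, and raising Invalid_argument is an assumption standing in for whatever exception the compiled check should produce.

```ocaml
(* Hypothetical model of the protection bits used above:
   value 1 = readable, value 2 = writable. *)
type access = Reg_Read | Reg_Write

let bit_of_access = function
  | Reg_Read -> 1
  | Reg_Write -> 2

(* Raise Invalid_argument when the requested access bit is not set,
   mirroring the Cifthenelse in region_checkaccess. *)
let check_access prot acc =
  if prot land bit_of_access acc = 0
  then invalid_arg "Region: access not permitted"

let () =
  check_access 3 Reg_Read;   (* read-write region, read: allowed *)
  check_access 1 Reg_Read;   (* read-only region, read: allowed *)
  match check_access 1 Reg_Write with
  | () -> print_endline "unexpected: write allowed"
  | exception Invalid_argument msg -> print_endline msg
```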

Is there a better way for me to cause an exception to be thrown at
this point?  Do I need to fall back to a Cextcall to ask someone to
throw an exception for me?  Or could (and should) I actually use
Craise with arcane knowledge that a certain exception maps to a
certain integer value?

John


P.S.  Kudos to everyone who's involved with the O'Caml compiler.  Now
that I'm getting into the internals a bit, I'm enjoying myself
immensely.  I'm convinced I could never deal with this style of
hackery in the Perl source.  Not only is the code of the various parts
of the compiler fairly easy to follow, but static typing has saved me
from burning myself several times already while modifying cmmgen.ml
and the associated type definitions.  Again, thanks!

* Re: Internals details for cmmgen.ml
From: Xavier Leroy @ 1999-12-11 18:09 UTC
  To: John Prevost, caml-list

> My main focus has been on adding small bits of code to
> asmcomp/cmmgen.ml for the new functionality.  Two of the new
> primitives (Pregionrefs and Pregionsets) need to check the protection
> bits recorded for the region before trying to access it, and somehow
> raise an exception if the access check fails.
> 
> I'm currently using the following kludge to make bad region access
> checks fail:
> 
> let region_checkaccess exp = function
>   | Reg_Read ->
>     Cifthenelse(
>       Cop(Cand, [region_prot exp; Cconst_int 1]),
>       Cconst_pointer 1,
>       Cop(Ccheckbound, [Cconst_int 0; Cconst_int 0]))
>   | Reg_Write ->
>     Cifthenelse(
>       Cop(Cand, [region_prot exp; Cconst_int 2]),
>       Cconst_pointer 1,
>       Cop(Ccheckbound, [Cconst_int 0; Cconst_int 0]))
> (* XXX Bounds check on 0 is a kludge to force exception *)
> 
> Is there a better way for me to cause an exception to be thrown at
> this point?  Do I need to fall back to a Cextcall to ask someone to
> throw an exception for me?  Or could (and should) I actually use
> Craise with arcane knowledge that a certain exception maps to a
> certain integer value?

Using a "checkbound" is perhaps the simplest solution.  Otherwise,
some system-wide exceptions such as Invalid_argument are assigned
global symbols and you don't need to guess their integer index inside
their defining module: just emit the C-- code corresponding to

        (raise (symbol "Invalid_argument") (string "my message"))
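For comparison, the C-- fragment above is what ordinary source code raising the predefined exception compiles to; no module path or numeric exception index appears in the source either. A minimal sketch, where `fail_access` is a hypothetical name:

```ocaml
(* Source-level counterpart of
     (raise (symbol "Invalid_argument") (string "my message")).
   Invalid_argument is predefined, so it is named directly. *)
let fail_access () : 'a = raise (Invalid_argument "my message")

let () =
  try fail_access ()
  with Invalid_argument msg -> Printf.printf "caught: %s\n" msg
```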

If you're really into high-performance stuff, you could fold the
permission check and the bounds check in one "checkbound" instruction.
Just arrange the "write enable" flag to be (the Caml integer) 0 if
write is allowed, and -1 if it is not.  Then, generate something like

        (checkbound (or index (write_enable_flag region) (size region)))
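The folding trick can be modelled in plain OCaml. With the flag at 0, `idx lor 0 = idx`; with the flag at -1, `idx lor (-1) = -1`, which as an unsigned word exceeds any size, so one unsigned comparison rejects both bad indices and forbidden writes. The same identities hold for the tagged words the compiled code sees, since the Caml integers 0 and -1 are represented as 1 and -1 (all bits set), and a tagged index already has its low bit set. This is a sketch of the arithmetic only, not of the Cmm code; `folded_check` is a hypothetical name.

```ocaml
(* flag = 0: writes allowed; flag = -1: writes forbidden. *)
let folded_check ~size ~flag idx =
  let v = idx lor flag in
  (* Emulate the unsigned comparison "v < size" of checkbound:
     a negative v would be a huge unsigned value. *)
  if v < 0 || v >= size then invalid_arg "index out of bounds"

let () =
  folded_check ~size:10 ~flag:0 3;   (* in bounds, write allowed *)
  (match folded_check ~size:10 ~flag:(-1) 3 with
   | () -> print_endline "unexpected"
   | exception Invalid_argument _ -> print_endline "write refused");
  (match folded_check ~size:10 ~flag:0 10 with
   | () -> print_endline "unexpected"
   | exception Invalid_argument _ -> print_endline "index refused")
```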

Although hacking cmmgen.ml is fun, you could get a more portable
implementation by writing it in ML using unsafe string accesses.
Those will happily work on any char *, not necessarily on well-formed
Caml strings.  Something like:

        external mmap : ... -> string
        type t = { data: string; length: int }

        let read_char reg idx =
          if idx < 0 || idx >= reg.length
          then raise (Invalid_argument "Region.read_char")
          else String.unsafe_get reg.data idx

It will be a bit slower, but maybe not too much.
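A self-contained version of this pure-ML approach is sketched below. Because the `external mmap` binding is left unspecified, a Bytes.t buffer stands in for the mapped memory (an assumption; current OCaml strings are immutable, so Bytes replaces the mutable strings used here). `of_bytes` and `write_char` are hypothetical additions.

```ocaml
(* The buffer plays the role of the mmap'd region. *)
type t = { data : Bytes.t; length : int }

let of_bytes b = { data = b; length = Bytes.length b }

(* Explicit bounds check, then an unchecked access. *)
let read_char reg idx =
  if idx < 0 || idx >= reg.length
  then raise (Invalid_argument "Region.read_char")
  else Bytes.unsafe_get reg.data idx

let write_char reg idx c =
  if idx < 0 || idx >= reg.length
  then raise (Invalid_argument "Region.write_char")
  else Bytes.unsafe_set reg.data idx c

let () =
  let r = of_bytes (Bytes.of_string "hello") in
  write_char r 0 'j';
  print_char (read_char r 0);
  print_newline ()
```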

- Xavier Leroy

* Re: Internals details for cmmgen.ml
From: John Prevost @ 1999-12-18  0:51 UTC
  To: Xavier Leroy; +Cc: caml-list

Xavier Leroy <Xavier.Leroy@inria.fr> writes:

> Using a "checkbound" is perhaps the simplest solution.  Otherwise,
> some system-wide exceptions such as Invalid_argument are assigned
> global symbols and you don't need to guess their integer index inside
> their defining module: just emit the C-- code corresponding to
> 
>         (raise (symbol "Invalid_argument") (string "my message"))

I'll try this.

> If you're really into high-performance stuff, you could fold the
> permission check and the bounds check in one "checkbound" instruction.
> Just arrange the "write enable" flag to be (the Caml integer) 0 if
> write is allowed, and -1 if it is not.  Then, generate something like
> 
>         (checkbound (or index (write_enable_flag region) (size region)))

I don't think this is necessary--especially since both reading and
writing are things that could fail (so I need more than one bit).
When you're going for extreme speed, you'll probably use the unsafe
versions.

(How does -unsafe work, by the way?  Does it make the C-- "checkbound"
stuff work differently?)

> Although hacking cmmgen.ml is fun, you could get a more portable
> implementation by writing it in ML using unsafe string accesses.
> Those will happily work on any char *, not necessarily on well-formed
> Caml strings.  Something like:
> 
>         external mmap : ... -> string
>         type t = { data: string; length: int }
> 
>         let read_char reg idx =
>           if idx < 0 || idx >= reg.length
>           then raise (Invalid_argument "Region.read_char")
>           else String.unsafe_get reg.data idx
> 
> It will be a bit slower, but maybe not too much.

Hmm.  What do you mean by "a more portable implementation"?  One which
doesn't require compiler modifications, or one which works with
bytecode?  I believe that with bytecode, the C functions are
sufficient.

As for unsafe string access: but doesn't the pointer point to an
O'Caml block, which includes a tag and length information?

John.

* Re: Internals details for cmmgen.ml
From: Jerome Vouillon @ 1999-12-18 18:56 UTC
  To: John Prevost, Xavier Leroy; +Cc: caml-list

On Fri, Dec 17, 1999 at 07:51:02PM -0500, John Prevost wrote:
> > If you're really into high-performance stuff, you could fold the
> > permission check and the bounds check in one "checkbound" instruction.
> > Just arrange the "write enable" flag to be (the Caml integer) 0 if
> > write is allowed, and -1 if it is not.  Then, generate something like

You could also probably make the permission check only once, when the
region is created, and use the type system to enforce safety.  For
instance, the interface could look something like this:

    type 'a perms
    type 'a t

    val read_only : <read:unit> perms
    val write_only : <write:unit> perms
    val read_write : <read:unit;write:unit> perms

    val create : Unix.file_descr -> 'a perms -> int -> int -> 'a t
    val unsafe_get : <read:unit;..> t -> int -> char
    val unsafe_set : <write:unit;..> t -> int -> char -> unit
    val get : <read:unit;..> t -> int -> char
    val set : <write:unit;..> t -> int -> char -> unit

The implementation would then be something like this:

    type 'a perms = int
    type 'a t = string * int

    let read_only = 1
    let write_only = 2
    let read_write = 3

    let create fd perms offset count =
      (* Check that the permissions of "fd" and "perms" match *)
      (* Mmap the region and return it *)

    let check_bound n i = if i < 0 || i >= n then raise ...
    let unsafe_get (r, _) i = String.unsafe_get r i
    let unsafe_set (r, _) i v = String.unsafe_set r i v
    let get (r, n) i = check_bound n i; String.unsafe_get r i
    let set (r, n) i v = check_bound n i; String.unsafe_set r i v
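A runnable variant of this interface might look like the following. It assumes a Bytes.t buffer in place of the mmap'd region and gives `create` a length instead of a file descriptor, offset and count; the unsafe_get/unsafe_set variants are omitted for brevity. The permission check happens only in the types: the phantom parameter records it, and nothing is checked at run time.

```ocaml
module Region : sig
  type 'a perms
  type 'a t
  val read_only : < read : unit > perms
  val write_only : < write : unit > perms
  val read_write : < read : unit; write : unit > perms
  val create : 'a perms -> int -> 'a t
  val get : < read : unit; .. > t -> int -> char
  val set : < write : unit; .. > t -> int -> char -> unit
end = struct
  type 'a perms = int
  type 'a t = Bytes.t * int

  let read_only = 1
  let write_only = 2
  let read_write = 3

  (* The permission value is unused at run time in this sketch;
     the phantom type parameter carries it. *)
  let create _perms len = (Bytes.make len '\000', len)

  let check_bound n i =
    if i < 0 || i >= n then invalid_arg "Region: index out of bounds"

  let get (r, n) i = check_bound n i; Bytes.unsafe_get r i
  let set (r, n) i v = check_bound n i; Bytes.unsafe_set r i v
end

let () =
  let rw = Region.create Region.read_write 4 in
  Region.set rw 0 'x';
  print_char (Region.get rw 0);
  print_newline ()
  (* Region.set (Region.create Region.read_only 4) 0 'x' is rejected
     by the type checker: < read : unit > has no write field. *)
```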

> (How does -unsafe work, by the way?  Does it make the C-- "checkbound"
> stuff work differently?)

No, this flag is taken into account much earlier: it makes the parser
translate "e.(e')" as "Array.unsafe_get e e'" rather than "Array.get e
e'" (similarly, it makes use of the unsafe variant instead of the safe
one for the other array and string operations).

> Hmm.  What do you mean by "a more portable implementation"?  One which
> doesn't require compiler modifications, or one which works with
> bytecode?  I believe that with bytecode, the C functions are
> sufficient.

He probably means an implementation that does not require compiler
modifications.

> As for unsafe string access: but doesn't the pointer point to an
> O'Caml block, which includes a tag and length information?

The pointer points to the beginning of the data, just after the
header.  Moreover, this header is ignored for the unsafe string
access.

-- Jérôme

* Re: Internals details for cmmgen.ml
From: John Prevost @ 1999-12-23  5:26 UTC
  To: Jerome Vouillon; +Cc: Xavier Leroy, caml-list

Jerome Vouillon <Jerome.Vouillon@inria.fr> writes:

> You could also probably make the permission check only once, when the
> region is created, and use the type system to enforce safety.  For
> instance, the interface could look something like this:
> 
>     type 'a perms
>     type 'a t
> 
>     val read_only : <read:unit> perms
>     val write_only : <write:unit> perms
>     val read_write : <read:unit;write:unit> perms

Oho!  Nice trick.  I'd thought of using the type system, but it hadn't
occurred to me that I could use object types in such a nefarious way!
Nice!

John.
