caml-list - the Caml user's mailing list
* [Caml-list] Error with and Proper Library Usage
@ 2015-03-07  5:28 Kenneth Adam Miller
  2015-03-07  5:50 ` Kenneth Adam Miller
  2015-03-07  5:56 ` Ivan Gotovchits
  0 siblings, 2 replies; 4+ messages in thread
From: Kenneth Adam Miller @ 2015-03-07  5:28 UTC (permalink / raw)
  To: caml users

So, I want to use CMU's BAP to do some internal processing for a task that
I have been assigned. One of the pertinent parts is transforming assembler
representations of CPU instructions into the BAP Intermediate Language, or
BIL. It's kind of difficult, because there is little documentation beyond
the MLI interfaces and the ocamldoc-generated pages. I have a lot of
questions about how to proceed, but before I lay out the problem, let me
explain how I got where I am.

You can install BAP through opam, but I don't think you get the
documentation that way. So,

git clone https://github.com/BinaryAnalysisPlatform/bap/

and then just follow the build instructions. It's not hard at all; I got
it going on Ubuntu 14.04. The only thing I ran into was an error on an LLVM
dependency, which required editing the opam file to pass
"--with-llvm-version=3.4" as an option on the configure command line. After
that everything ran smoothly.

Once you run bapbuild and make and all that, if you read the Makefile you
can see that you can generate all the documentation with:

make doc

which will place the HTML files at:

_build/bap.docdir

Opening up the index file at _build/bap.docdir/index.html, you can see that
the documentation starts off with a note about using Bap.Std as everything
else is interface files. What confused me is the seeming repetition of the
documentation that is generated. It seems that some of the documentation on
some of the very same pages is duplicated for certain sections. Why does it
do so much duplication?

The next question I have has to do with code organization. It seems that
ocamldoc derives the documentation from MLI files, and I know, I know: you
can use an MLI to limit what your ml files expose, so that they form
modules that control access from the outside. But I don't see how to
combine the modules the way the BAP authors intend. (I haven't read the BAP
ml code itself; I have just searched around, consumed the documentation and
the rather meager contents of the examples directory, and read over
readbin.ml and bap_mc.ml.)

Most important for consuming the library properly are the two avenues that
I thought would make it easiest for me to use.

First, using the toplevel I tried to construct a set of BIL statements. But
the way the code works, you actually have to create a disassembler that is
specific to your architecture (x86 or x86-64, ARM, or whatever). You then
have to construct memory, and from that memory construct an Insn type,
which is meant to be the canonical, cross-disassembler representation of an
instruction. I can see how module use makes for great reusability of code.
The problem is that the type definitions the toplevel (baptop) reports
often seem to differ from those in the documentation. TL;DR: I tried to
stay as close as I could to the front page's description of how to use
module Disasm, which meant the Disasm.insn_at_mem function, but I had a
hard time navigating the modules to create what I wanted. It seems like
each thing depends on some other portion of the library, and at one point I
hit a dead end. The documentation shows the same functions exposed in many
places, but that's where the type definitions wouldn't match up.
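
[Editor's sketch] One common reason a toplevel and the generated pages show
different names for the same type is that an umbrella module re-exports
internal modules and adds short aliases. The following is a minimal,
self-contained illustration of that pattern. Every name in it (Lib_insn,
Std, insn, describe) is hypothetical and chosen for the example; it is not
BAP's actual layout.

```ocaml
(* Hypothetical names throughout; this is not BAP's actual layout. *)

(* An "internal" module, as it might appear in its own .ml/.mli pair. *)
module Lib_insn : sig
  type t                       (* abstract: the representation is hidden *)
  val create : string -> t
  val name : t -> string
end = struct
  type t = { name : string }
  let create name = { name }
  let name i = i.name
end

(* An umbrella module that re-exports the internals under short names. *)
module Std = struct
  module Insn = Lib_insn
  type insn = Insn.t           (* alias: insn and Insn.t are the same type *)
end

open Std

(* Takes the short alias and calls through the long name; both typecheck
   because they denote one and the same type. *)
let describe (i : insn) : string = Lib_insn.name i

let () = print_endline (describe (Insn.create "ret"))
```

With a layout like this, a tool may print the type as Lib_insn.t, Std.Insn.t,
or Std.insn depending on which path it picks, even though all three are
interchangeable.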

Lastly, and ultimately even more confusing, is bap_mc.ml, which I saw as my
second-easiest avenue for using the BAP library. I saw bap_mc.ml line 55 as
my chance:

https://github.com/BinaryAnalysisPlatform/bap/blob/master/src/bap_mc/bap_mc.ml#L55

If I modified it so that, instead of flattening the instruction into a
string, it piped the insn object to a BIL constructor and then applied the
sexp_of_bil transformer, I could convert the result to a string and print
it.

Naturally, I tried the bil constructors of several different modules. Most
notably, I think the Std bil constructor blew up, so here's what I replaced
that line with:

  let s = (* Sexp.to_string @@ Sexp.List (List.rev res) in*)
    sexp_of_bil (Bap.Std.Insn.bil insn) in

But then even that blew up with:

Error: This expression has type ('a, 'b) Insn.t
       but an expression was expected of type insn

Anyway, that describes the past two days of jumping into the biggest OCaml
project that I've ever been in. I'm really excited to become more
proficient, but I think there's something I'm missing about OCaml library
organization strategies writ large.


* Re: [Caml-list] Error with and Proper Library Usage
  2015-03-07  5:28 [Caml-list] Error with and Proper Library Usage Kenneth Adam Miller
@ 2015-03-07  5:50 ` Kenneth Adam Miller
  2015-03-07  5:56 ` Ivan Gotovchits
  1 sibling, 0 replies; 4+ messages in thread
From: Kenneth Adam Miller @ 2015-03-07  5:50 UTC (permalink / raw)
  To: caml users

I actually got it down to as little as this:

let x =
  Bap_memory.create Bap_common.LittleEndian
    (Bap_types.Std.Word.of_int 32 0)
    (Core_kernel.Bigstring.of_string "\xc3")
in
let y = ok_exn x in
let d = Bap_disasm.disassemble `x86_64 y in
Bap_disasm.Disasm.insn_at_mem d y;;

- : insn option = None

I don't understand; c3 is the opcode for return. I tried it with both `x86
and `x86_64, as in the example. Both return None...


* Re: [Caml-list] Error with and Proper Library Usage
  2015-03-07  5:28 [Caml-list] Error with and Proper Library Usage Kenneth Adam Miller
  2015-03-07  5:50 ` Kenneth Adam Miller
@ 2015-03-07  5:56 ` Ivan Gotovchits
  2015-03-07  6:21   ` Kenneth Adam Miller
  1 sibling, 1 reply; 4+ messages in thread
From: Ivan Gotovchits @ 2015-03-07  5:56 UTC (permalink / raw)
  To: Kenneth Adam Miller; +Cc: caml users


On Mar 7, 2015, at 12:28 AM, Kenneth Adam Miller <kennethadammiller@gmail.com> wrote:

> So, I want to use CMU's BAP to do some internal processing for a task that I have been assigned. One of the pertinent parts is transforming assembler representations of CPU instructions into the BAP Intermediate Language, or BIL. 

BAP is more about disassembling, so it can easily lift a binary string into BIL. But if you have assembly, you need to compile it into machine code first, so you need to find an assembler. (For example, you can use `llvm-mc` from the LLVM toolkit.)


> It's kind of difficult, because there's only so much documentation that is really anything more than just the MLI interface and the OCaml Doc generated stuff. I have a lot of questions about how to proceed, but before I begin eliciting the problem and all, let me explain about how I got where I am.

Yes, unfortunately that’s true. First of all, BAP is currently under active development. Next, there is no properly working doc generator for OCaml right now that will handle a complex project containing many modules and sub-libraries. I’m looking at odoc with great hope.

 
> 
> You can install BAP through opam, but you don't get the documentation I don't think. So,
> 
> git clone github.com/BinaryAnalysisPlatform/bap/
> 
> and then just follow the instructions on how to build it, it's not hard at all, I got it going on Ubuntu 14.04. The only thing I ran into was an error on a llvm dependency, which required that I edit the opam file so that I do "--with-llvm-version=3.4" on the configure command line as an option. After that everything ran smoothly.

Actually, when you install BAP with opam, the documentation is installed as well. It is automatically installed at `~/.opam/???/doc/bap`. You can query the path to the documentation with the following command:

     opam config var bap:doc

Moreover, we provide compiled API documentation on GitHub Pages. I will update the main site with the link.
Also, you may find this [1] page interesting.

[1]: https://github.com/BinaryAnalysisPlatform/bap/wiki/Build-tips-and-tricks


> Once you run bapbuild and make and all that….

You don’t need to run bapbuild at all. It is not a tool for building BAP; it is a tool for building applications and plugins that use BAP.

> Opening up the index file at _build/bap.docdir/index.html, you can see that the documentation starts off with a note about using Bap.Std as everything else is interface files. What confused me is the seeming repetition of the documentation that is generated. It seems that some of the documentation on some of the very same pages is duplicated for certain sections. Why does it do so much duplication?

That’s how ocamldoc works. Actually, the auto-generated documentation is of very low quality. I personally suggest that you set up your Emacs environment with merlin and everything else. Then you can navigate through the project using `C-c C-l` (jump to definition). Look here [2] for instructions on how to configure Emacs.

[2]: https://github.com/BinaryAnalysisPlatform/bap/wiki/Emacs

> 
> First, using the toplevel I tried to construct a BIL set of statements. But the way the code works, you actually have to compose a disassembler that is specific to your architecture (x32/64 and ARM vs Intel or whatever). You then have to construct memory, and from that memory construct an Insn type, which is meant to be the canonical, cross disassembler type representation of an instruction. I can see how module use makes for great reusability of code. Problem is, the type definitions that the toplevel reports (baptop) and those of which are reported in the documentation seem to differ often. TL;DR here, I tried to get as close to the front page mention of how to use module Disasm, which meant Disasm.insn_at_mem function, but I had a hard time navigating the modules to create what I wanted. It seems like each one thing depends on some other portion of the library, and at one point I hit a dead end. The documentation mentions the same functions being exposed copiously, but that's when the type definitions wouldn't match up or something.

I’m not sure that I understand you correctly. If you have just bytes, then use the function `disassemble`, which accepts memory and an arch. You can use `Memory.create` to make the memory, and `Bigstring.of_string` to create a bigstring from a string.



> Lastly, and ultimately even more confusing is that of bap_mc.ml, which I saw as my second easiest avenue for usage of the BAP library. I saw bap_mc.ml line 55 as my chance;
> 
> https://github.com/BinaryAnalysisPlatform/bap/blob/master/src/bap_mc/bap_mc.ml#L55
> 
> If I just were to modify it so that it, instead of watering down the string constructed, were to just pipe the insn object to a BIL constructor, and then use the sexp_of_bil transformer, then I could just drop it from there to be printed or converted to string and then printed.
> 
> Naturally, I tried with several different module's bil constructor. But most notably I think that the Std bil constructor blew up, so here's what I replaced that line with:

Oh, please, don’t use bap_mc as an example, as it is very low level. It is intended for debugging the underlying disassembly and uses a very low-level interface, with lots of hard-to-understand phantom types. Please try to stay with the convenient Disasm module.
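
[Editor's sketch] For readers unfamiliar with phantom types, here is a minimal, self-contained illustration of how a low-level interface that uses them can reject code that looks reasonable, much like the `('a, 'b) Insn.t` versus `insn` error quoted earlier in the thread. The modules `Low` and `High` and all of their functions are hypothetical; they only mimic the general shape of such an interface and are not taken from BAP.

```ocaml
(* Hypothetical sketch; none of these names come from BAP. *)

module Low : sig
  (* 'asm and 'kinds are phantom: they occur in the type, never in the data. *)
  type ('asm, 'kinds) t
  type present
  type absent
  val decode_with_asm : string -> (present, absent) t
  val decode_plain : string -> (absent, absent) t
  (* Only callable when the assembly text was requested at decode time. *)
  val asm : (present, 'k) t -> string
  val bytes : ('a, 'k) t -> string
end = struct
  type ('asm, 'kinds) t = { raw : string; text : string }
  type present
  type absent
  let decode_with_asm raw =
    { raw; text = "<asm for " ^ String.escaped raw ^ ">" }
  let decode_plain raw = { raw; text = "" }
  let asm i = i.text
  let bytes i = i.raw
end

(* A simpler, high-level instruction type with no type parameters. *)
module High : sig
  type t
  val of_low : ('a, 'k) Low.t -> t
  val length : t -> int
end = struct
  type t = { size : int }
  let of_low i = { size = String.length (Low.bytes i) }
  let length i = i.size
end

let low = Low.decode_plain "\xc3"

(* High.length low  is rejected: a ('a, 'k) Low.t is not a High.t.     *)
(* Low.asm low      is rejected: its 'asm parameter is absent.         *)

(* An explicit conversion makes the high-level function applicable. *)
let n = High.length (High.of_low low)
let () = Printf.printf "%d byte(s)\n" n
```

The point of the design is that mistakes such as asking for assembly text that was never produced are caught at compile time; the cost is that the low-level values cannot be passed directly to functions written for the high-level type.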




* Re: [Caml-list] Error with and Proper Library Usage
  2015-03-07  5:56 ` Ivan Gotovchits
@ 2015-03-07  6:21   ` Kenneth Adam Miller
  0 siblings, 0 replies; 4+ messages in thread
From: Kenneth Adam Miller @ 2015-03-07  6:21 UTC (permalink / raw)
  To: caml users

Epic! Library author!

Were you influenced by recent research from UTD concerning high accuracy
disassembly using machine learning techniques for distinguishing data &
code?

On Sat, Mar 7, 2015 at 12:56 AM, Ivan Gotovchits <ivg@ieee.org> wrote:

>
>
> On Mar 7, 2015, at 12:28 AM, Kenneth Adam Miller <
> kennethadammiller@gmail.com> wrote:
>
> So, I want to use CMU's BAP to do some internal processing for a task that
> I have been assigned. One of the pertinent parts is transforming assembler
> representations of CPU instructions into the BAP Intermediate Language, or
> BIL.
>
>
> BAP is more about disassembling. So it can easily lift binary string into
> the BIL. But if you have assembly you need to compile it into machine code
> first. So, you need to find assembler. (For example, you can use `llvm-mc`
> from llvm toolkit)
>
>
>

Right, but shouldn't "\x90" or "\xc3" be interpreted by the toplevel as a
hex escape, so that it is disassembled properly? Our current use case is
primitive, but it involves passing each instruction individually on the
command line to a binary that transforms it to BIL.
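
[Editor's sketch] In standard OCaml, "\xc3" inside a string literal is a hex
escape that denotes a single byte, both in the toplevel and in compiled
code. A command-line argument is different: it arrives as plain text such
as "c3" and has to be decoded by the program. The snippet below checks the
first point and sketches one way to do the second. The helper bytes_of_hex
is hypothetical and uses only the standard library; it is not part of BAP.

```ocaml
(* "\xc3" is a one-byte string whose only byte is 0xC3. *)
let ret = "\xc3"
let () = assert (String.length ret = 1 && Char.code ret.[0] = 0xc3)

(* Hypothetical helper: turn a command-line argument such as "c3" or
   "48 89 e5" into the raw bytes it spells out. Spaces are ignored.
   Raises Invalid_argument on an odd number of digits and Failure on
   characters that are not hex digits. *)
let bytes_of_hex (s : string) : string =
  let digits = String.concat "" (String.split_on_char ' ' s) in
  if String.length digits mod 2 <> 0 then
    invalid_arg "bytes_of_hex: odd number of hex digits";
  String.init (String.length digits / 2) (fun i ->
      Char.chr (int_of_string ("0x" ^ String.sub digits (2 * i) 2)))

let () =
  let raw = bytes_of_hex "48 89 e5" in
  (* Print each decoded byte back as two hex digits. *)
  String.iter (fun c -> Printf.printf "%02x " (Char.code c)) raw;
  print_newline ()
```

The resulting string could then be handed to whatever constructor expects
raw bytes, for instance the Bigstring.of_string call used earlier in this
thread.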

> It's kind of difficult, because there's only so much documentation that is
> really anything more than just the MLI interface and the OCaml Doc
> generated stuff. I have a lot of questions about how to proceed, but before
> I begin eliciting the problem and all, let me explain about how I got where
> I am.
>
>
> Yes, unfortunately that’s true. First off all BAP is currently under
> active development. Next, there is no properly working doc generator for
> OCaml right now, that will handle complex project containing many modules
> and sub libraries. I’m looking at odoc with a great hope.
>
>
No loss. I'm happy to dive in with BAP either way; documentation just
makes things easier. Possibly I could do some write-ups about what I learn
and pass them back to help?


>
>
>
> You can install BAP through opam, but you don't get the documentation I
> don't think. So,
>
> git clone github.com/BinaryAnalysisPlatform/bap/
>
> and then just follow the instructions on how to build it, it's not hard at
> all, I got it going on Ubuntu 14.04. The only thing I ran into was an error
> on a llvm dependency, which required that I edit the opam file so that I do
> "--with-llvm-version=3.4" on the configure command line as an option. After
> that everything ran smoothly.
>
>
> Actually when you install BAP with opam you will have the documentation
> installed also. It is automatically installed at `~/.opam/???/doc/bap`. You
> can query the path to the documentation with the following command:
>
>      opam config var bap:doc
>
>
Yeah, I didn't research the directions carefully; writing up those two days
was a pell-mell effort.


> Moreover, we provide a compiled API documentation on github pages. I will
> update the main site with the link.
> Also, you may find this [1] page interesting.
>
> [1]:
> https://github.com/BinaryAnalysisPlatform/bap/wiki/Build-tips-and-tricks
>
>
>
Cool :) I'm just happy I got it built on my machine. BAP is looking really
decent. Several of the things I had sitting on a private machine and had
thought about giving back, I see already done very well: BAP as a service
(ZMQ too!), dependency segregation, baptop, OCamlMakefile elimination (I
had only even wished for this one).


> Once you run bapbuild and make and all that….
>
>
> You don’t need to run bapbuild at all, this is not a tool to build BAP,
> this is a tool to build applications and plugins that use BAP.
>
> Opening up the index file at _build/bap.docdir/index.html, you can see
> that the documentation starts off with a note about using Bap.Std as
> everything else is interface files. What confused me is the seeming
> repetition of the documentation that is generated. It seems that some of
> the documentation on some of the very same pages is duplicated for certain
> sections. Why does it do so much duplication?
>
>
> Thats how ocamldoc works. Actually, the auto generated documentation is of
> very low quality. I personally suggest you to setup your Emacs environment,
> with merlin and everything else.  Then you can navigate through the project
> using `C-c C-l` (jump to definition). Look here [2] for instructions about
> how to configure Emacs
>
> [2]: https://github.com/BinaryAnalysisPlatform/bap/wiki/Emacs
>
>
> First, using the toplevel I tried to construct a BIL set of statements.
> But the way the code works, you actually have to compose a disassembler
> that is specific to your architecture (x32/64 and ARM vs Intel or
> whatever). You then have to construct memory, and from that memory
> construct an Insn type, which is meant to be the canonical, cross
> disassembler type representation of an instruction. I can see how module
> use makes for great reusability of code. Problem is, the type definitions
> that the toplevel reports (baptop) and those of which are reported in the
> documentation seem to differ often. TL;DR here, I tried to get as close to
> the front page mention of how to use module Disasm, which meant
> Disasm.insn_at_mem function, but I had a hard time navigating the modules
> to create what I wanted. It seems like each one thing depends on some other
> portion of the library, and at one point I hit a dead end. The
> documentation mentions the same functions being exposed copiously, but
> that's when the type definitions wouldn't match up or something.
>
>
> I’m not sure that I understand you correctly. If you have just bytes, the
> use function `disassemble` that accepts memory and arch.  You can use
> `Memory.create` to make memory, and `Bigstring.of_string` to create a
> bigstring of string/
>
>
>
I re-read the documentation on the front page and went back to that, as per
my most recent email. :)


> Lastly, and ultimately even more confusing is that of bap_mc.ml, which I
> saw as my second easiest avenue for usage of the BAP library. I saw
> bap_mc.ml line 55 as my chance;
>
>
> https://github.com/BinaryAnalysisPlatform/bap/blob/master/src/bap_mc/bap_mc.ml#L55
>
> If I just were to modify it so that it, instead of watering down the
> string constructed, were to just pipe the insn object to a BIL constructor,
> and then use the sexp_of_bil transformer, then I could just drop it from
> there to be printed or converted to string and then printed.
>
> Naturally, I tried with several different module's bil constructor. But
> most notably I think that the Std bil constructor blew up, so here's what I
> replaced that line with:
>
>
> Oh, please, don’t use bap_mc as an example, as it is very low level. It is
> intended for debugging the underlying disassembly and uses very low-level
> interface, with lots of hard to understand phantom types. Please, try to
> stay with convenient Disasm module.
>
>
Perhaps there are subtleties of the module language that I'm missing, and
my lack of understanding of how the modules fit together, combined with the
overlapping vocabulary between modules, would be why it's more difficult.
Ok, sure; after interacting more with the intended interface, I can see a
bit better how BAP is meant to be used.
