Hello Kenneth,

Sorry for the late reply; I have several deadlines this weekend.

To answer your question, our current approach cannot guarantee that the "repositioning" is 100% correct.
The most challenging part is identifying code pointers in global data sections. As we discussed
in our paper, this is quite difficult to handle even with static analysis techniques
(type inference, for instance). We do have some false positives, as shown in the appendix of our paper [1].
We will do more research to eliminate these false positives.
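
To make the code-pointer problem concrete, here is a minimal Python sketch of a range-based heuristic. It is only an illustration, not necessarily the exact rule Uroboros uses: the function name, the addresses, and the optional instruction-boundary filter are all assumptions made up for this example.

```python
import struct

def candidate_code_pointers(data, data_base, text_start, text_end,
                            insn_starts=None, word_size=4):
    """Return (address_in_data, value) pairs for aligned words in a data
    section whose value falls inside the code section's address range.
    Hypothetical helper for illustration only."""
    fmt = "<I" if word_size == 4 else "<Q"   # little-endian 32/64-bit word
    hits = []
    for off in range(0, len(data) - word_size + 1, word_size):
        (value,) = struct.unpack_from(fmt, data, off)
        if not (text_start <= value < text_end):
            continue                          # cannot point into the code section
        if insn_starts is not None and value not in insn_starts:
            continue                          # not on a known instruction boundary
        hits.append((data_base + off, value))
    return hits

# Made-up layout: code section at [0x08048000, 0x08049000), data at 0x0804A000.
TEXT_START, TEXT_END, DATA_BASE = 0x08048000, 0x08049000, 0x0804A000
data = struct.pack("<4I", 0x08048010, 42, 0x08048013, 0x0804A000)

# Range check only: the third word is just an integer that happens to fall
# inside the code range, so it is reported as well (a false positive).
loose = candidate_code_pointers(data, DATA_BASE, TEXT_START, TEXT_END)

# Also requiring the value to be a known instruction start filters it out.
known_starts = {0x08048000, 0x08048010, 0x08048020}
strict = candidate_code_pointers(data, DATA_BASE, TEXT_START, TEXT_END,
                                 insn_starts=known_starts)
```

The stricter filter removes this particular false positive, but an integer can still coincide with a real instruction address, which is why the heuristic cannot be sound in general.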

I believe it is feasible to present a sound solution. It does require some additional
trampolines to be inserted into the binary code. You may refer to this paper for some insights [2].
 
As for the disassembly challenges, we directly adopt the disassembly approach proposed
in an excellent piece of work [3]. You can check their evaluation section, which reports that their approach
can correctly disassemble large applications without any errors. In my experience, Linux ELF
binaries are indeed easier to disassemble, and typical compilers (gcc, icc, llvm) do not
insert data into code sections (embedded data can cause a lot of trouble for a linear disassembler).
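
As a small illustration of why embedded data hurts a linear disassembler, here is a Python sketch over a made-up toy instruction set. The opcodes, instruction lengths, and byte sequence are all hypothetical and were chosen only to show the effect; this is not how [3] or Uroboros decodes real x86 code.

```python
# Toy ISA (hypothetical): opcode -> (mnemonic, total length in bytes)
TOY_ISA = {
    0x01: ("nop", 1),
    0x02: ("jmp", 2),   # opcode + 1-byte relative offset
    0x03: ("mov", 3),   # opcode + 2-byte immediate
    0xC3: ("ret", 1),
}

def linear_sweep(code):
    """Decode instructions back to back from offset 0, never following jumps."""
    decoded = []
    pc = 0
    while pc < len(code):
        mnemonic, length = TOY_ISA.get(code[pc], ("(bad)", 1))
        decoded.append((pc, mnemonic))
        pc += length
    return decoded

# Layout: nop; jmp +1 (jumps over one data byte); <data byte 0x03>; nop; ret
code = bytes([0x01, 0x02, 0x01, 0x03, 0x01, 0xC3])
true_starts = [0, 1, 4, 5]          # offsets that actually execute

swept = linear_sweep(code)
swept_starts = {pc for pc, _ in swept}

# The data byte at offset 3 is decoded as a 3-byte "mov", which swallows
# the real "nop" at offset 4 and the real "ret" at offset 5.
missed = [pc for pc in true_starts if pc not in swept_starts]
```

When the code section contains no embedded data, as is typical for the compilers mentioned above, this kind of desynchronization does not arise.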

However, if I were asked to work on PE binaries, I would probably start from IDA Pro.
We consider the disassembly challenge to be orthogonal to our research.

IMHO, our research reveals the (important) fact that even though the relocation issue is theoretically
hard to solve with 100% accuracy, it might not be as troublesome as previous work assumed.
Simple solutions can achieve good results.

I hope this answers your questions. If not, please let me know :)

Best,
Shuai

[1] Shuai Wang, Pei Wang, Dinghao Wu. Reassembleable Disassembling.
[2] Zhui Deng, Xiangyu Zhang, Dongyan Xu. BISTRO: Binary Component Extraction and Embedding for Software Security Applications.
[3] Mingwei Zhang, R. Sekar. Control Flow Integrity for COTS Binaries.







On Fri, Oct 23, 2015 at 6:31 PM, Kenneth Adam Miller <kennethadammiller@gmail.com> wrote:
Well it's interesting that you've gone with a binary recompilation approach. How do you ensure that, statically, for any given edit, you reposition all the jump targets correctly? How do you deal with the difficulty of disassembly reducing to the halting problem?

On Fri, Oct 23, 2015 at 4:59 PM, Shuai Wang <wangshuai901@gmail.com> wrote:
Hi guys,

I am glad that you are interested in our work!! 

Actually, this project started over 1.5 years ago, and at that time BAP (version 0.7, I believe?) was still a research prototype.

I chose to implement it from scratch because I wanted a nice tool for my own research projects, and it also gave me an opportunity
to learn OCaml... :)

Yes, I definitely would like to unite our efforts!! 

Best,
Shuai




On Fri, Oct 23, 2015 at 1:30 PM, Ivan Gotovchits <ivg@ieee.org> wrote:
Hi Shuai,

Nice work! But I'm curious, why didn't you use [bap][1] as a disassembler? 

Did you know that we have low-level disassembly interfaces, such as [linear_sweep][2] or the even
lower-level [Disasm_expert.Basic][3] interface, which can disassemble at instruction-level granularity?

It would be very interesting if we could unite our efforts.

Best wishes,
Ivan Gotovchits





On Fri, Oct 23, 2015 at 1:05 PM, Shuai Wang <wangshuai901@gmail.com> wrote:
Dear List,

I’m glad to announce the first release of Uroboros: an infrastructure for reassembleable disassembling and transformation.

You can find the code here: https://github.com/s3team/uroboros 
You can find our research paper which describes the core technique implemented in Uroboros here: 

We will provide a project home page, as well as more detailed documentation, in the near future. Issues and pull requests are welcome.

Happy hacking!

Sincerely,
Shuai