Replied inline

On Sun, Oct 25, 2015 at 3:04 PM, Shuai Wang <wangshuai901@gmail.com> wrote:

> Hello Kenneth,
>
> Sorry for the late reply. I have several deadlines during this weekend.
>
> To answer your question, our current approach cannot ensure 100%
> "reposition" correct.
> The most challenging part is to identify code pointers in global data
> sections, as we discussed
> in our paper, it is quite difficult to handle even with some static
> analysis techniques
> (type inference, for instance). We do have some false positive, as shown
> in the appendix of our paper [1].
> We will research more to eliminate the false positive.
>
> I believe it is doable to present a sound solution. It indeed requires
> some additional
> trampolines inserted in the binary code. You may refer to this paper for
> some enlightens [2].
>
> As for the disassembling challenges, we directly adopt a disassembly
> approach proposed
> by an excellent work [3]. You can check out their evaluation section, and
> find that their approach
> can correctly disassemble large-size applications without any error. My
> experience is that Linux ELF
> binaries are indeed easier to disassemble, and typical compilers (gcc;
> icc; llvm) would not
> insert data into code sections (the embedded data can trouble linear
> disassembler a lot).
>
>
I remember reading about [3] when it came out. That was a year after the
original REINS system came out that proposed re-writing binaries, along
with it's companion STIR. Shingled disassembly originated with Wartell et
al.'s seminal Distinguishing Code and Data PhD thesis. I'm currently
working on the integration of a sheering and PFSM enhanced Shingled
Disassembler into BAP. But if you've already implemented something like
that, what would be really valuable is if you were to review my shingled
disassembler implementation and I review yours that way we have some cross
review feedback.

Regarding the need for 100% accuracy, in the REINS and STIR papers, the
approach taken is to obtain very very high classification accuracy, but in
the case that correctness cannot be established, to simply retain each
interpretation of a byte sequence, so you are still correct in the instance
that it's code by treating it as such. Then, a companion technique is
introduced wherein the code section is retained in order that should such a
data reference in code instance occur and interpretation was incorrect,
such reference can read and write into the kept section. But if it's code,
it has been rewritten in the new section. Then it should remain correct in
any scenario.


> However, if I am asked to work on PE binaries, then I will probably start
> from IDA-Pro.
> We consider the disassembling challenge is orthogonal to our research.
>

It is good to have good interoperabiblity with IDA as a guided disassembler
and the actual new research tools. One of the most valuable things I can
think of is to write some plugin that will mechanize data extraction as
needed in order to accelerate manual intervention with the newer tools,
such as in the case of training.


>
> IMHO, our research reveals the (important) fact that even though
> theoretically relocation issue
> is hard to solve with 100% accuracy, it might not be as troublesome as it
> was assumed by previous work.
> Simple solutions can achieve good results.
>

Agreed; there are failback stratagems.


>
> I hope it answers your questions, otherwise, please let me know :)
>
> Best,
> Shuai
>
> [1] Shuai Wang, Pei Wang, Dinghao Wu, Reassembleable Disassembling.
> [2] Zhui Deng, Xiangyu Zhang, Dongyan Xu, BISTRO: Binary Component
> Extraction and Embedding for Software Security Applications
> [3] Mingwei Zhang, Sekar, R, Control Flow Integrity for COTS Binaries.
>
>
>
There's a good utility for working with white papers and interacting with
colleagues; mendeley.


>
>
>
>
>
> On Fri, Oct 23, 2015 at 6:31 PM, Kenneth Adam Miller <
> kennethadammiller@gmail.com> wrote:
>
>> Well it's interesting that you've gone with a binary recompilation
>> approach. How do you ensure that, statically, for any given edit, you
>> reposition all the jump targets correctly? How do you deal with the
>> difficulty of disassembly reducing to the halting problem?
>>
>> On Fri, Oct 23, 2015 at 4:59 PM, Shuai Wang <wangshuai901@gmail.com>
>> wrote:
>>
>>> Hi guys,
>>>
>>> I am glad that you are interested in our work!!
>>>
>>> Actually this project starts over 1.5 years ago, and I believe at that
>>> time, BAP (version 0.7 I believe?) is still a research prototype..
>>>
>>> I choose to implement from the stretch is because I want to have a nice
>>> tool for my own research projects, also I can have an opportunity
>>> to learn OCaml... :)
>>>
>>> Yes, I definitely would like to unite our efforts!!
>>>
>>> Best,
>>> Shuai
>>>
>>>
>>>
>>>
>>> On Fri, Oct 23, 2015 at 1:30 PM, Ivan Gotovchits <ivg@ieee.org> wrote:
>>>
>>>> Hi Shuai,
>>>>
>>>> Nice work! But I'm curious, why didn't you use [bap][1] as a
>>>> disassembler?
>>>>
>>>> Do you know, that we have a low-level interface to disassembling, like
>>>> [linear_sweep][2] or even
>>>> lower [Disasm_expert.Basic][3] interface, that can disassemble on
>>>> instruction level granularity.
>>>>
>>>> It will be very interesting, if we can unite our efforts.
>>>>
>>>> Best wishes,
>>>> Ivan Gotovchits
>>>>
>>>> [1]: https://github.com/BinaryAnalysisPlatform/bap
>>>> [2]:
>>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.html#VALlinear_sweep
>>>> [3]:
>>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.Disasm_expert.Basic.html
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Oct 23, 2015 at 1:05 PM, Shuai Wang <wangshuai901@gmail.com>
>>>> wrote:
>>>>
>>>>> Dear List,
>>>>>
>>>>> I’m glad to announce the first release of Uroboros:  an infrastructure
>>>>> for reassembleable disassembling and transformation.
>>>>>
>>>>> You can find the code here: https://github.com/s3team/uroboros
>>>>> You can find our research paper which describes the core technique
>>>>> implemented in Uroboros here:
>>>>>
>>>>> https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-wang-shuai.pdf
>>>>>
>>>>> We will provide a project home page, as well as more detailed
>>>>> documents in the near future.  Issues and pull requests welcomed.
>>>>>
>>>>> Happy hacking!
>>>>>
>>>>> Sincerely,
>>>>> Shuai
>>>>>
>>>>
>>>>
>>>
>>
>