caml-list - the Caml user's mailing list
* [Caml-list]  [ANN] Uroboros 0.1
@ 2015-10-23 17:05 Shuai Wang
  2015-10-23 17:30 ` Ivan Gotovchits
  0 siblings, 1 reply; 13+ messages in thread
From: Shuai Wang @ 2015-10-23 17:05 UTC (permalink / raw)
  To: caml users

[-- Attachment #1: Type: text/plain, Size: 570 bytes --]

Dear List,

I’m glad to announce the first release of Uroboros, an infrastructure for
reassembleable disassembling and transformation.

You can find the code here: https://github.com/s3team/uroboros
You can find our research paper which describes the core technique
implemented in Uroboros here:
https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-wang-shuai.pdf

We will provide a project home page, as well as more detailed documentation,
in the near future. Issues and pull requests are welcome.

Happy hacking!

Sincerely,
Shuai

[-- Attachment #2: Type: text/html, Size: 1659 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] [ANN] Uroboros 0.1
  2015-10-23 17:05 [Caml-list] [ANN] Uroboros 0.1 Shuai Wang
@ 2015-10-23 17:30 ` Ivan Gotovchits
  2015-10-23 17:45   ` Kenneth Adam Miller
  2015-10-23 20:59   ` Shuai Wang
  0 siblings, 2 replies; 13+ messages in thread
From: Ivan Gotovchits @ 2015-10-23 17:30 UTC (permalink / raw)
  To: Shuai Wang; +Cc: caml users

[-- Attachment #1: Type: text/plain, Size: 1315 bytes --]

Hi Shuai,

Nice work! But I'm curious: why didn't you use [bap][1] as a disassembler?

Did you know that we have a low-level interface to disassembling, such as
[linear_sweep][2], or the even lower-level [Disasm_expert.Basic][3]
interface, which can disassemble at instruction-level granularity?

It would be very interesting if we could unite our efforts.

Best wishes,
Ivan Gotovchits

[1]: https://github.com/BinaryAnalysisPlatform/bap
[2]:
http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.html#VALlinear_sweep
[3]:
http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.Disasm_expert.Basic.html





[-- Attachment #2: Type: text/html, Size: 3139 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] [ANN] Uroboros 0.1
  2015-10-23 17:30 ` Ivan Gotovchits
@ 2015-10-23 17:45   ` Kenneth Adam Miller
  2015-10-26 17:04     ` Eric Cooper
  2015-10-23 20:59   ` Shuai Wang
  1 sibling, 1 reply; 13+ messages in thread
From: Kenneth Adam Miller @ 2015-10-23 17:45 UTC (permalink / raw)
  To: Ivan Gotovchits; +Cc: Shuai Wang, caml users

[-- Attachment #1: Type: text/plain, Size: 1725 bytes --]

I agree. I immediately saw this and thought the same thing.

Rewriting binaries is going to be extremely hard. We should rely on one
another, and discussion will be critical when it comes to addressing the more
formal aspects of it, like enforcement computability.


[-- Attachment #2: Type: text/html, Size: 3882 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] [ANN] Uroboros 0.1
  2015-10-23 17:30 ` Ivan Gotovchits
  2015-10-23 17:45   ` Kenneth Adam Miller
@ 2015-10-23 20:59   ` Shuai Wang
  2015-10-23 22:31     ` Kenneth Adam Miller
  1 sibling, 1 reply; 13+ messages in thread
From: Shuai Wang @ 2015-10-23 20:59 UTC (permalink / raw)
  To: Ivan Gotovchits; +Cc: caml users, Kenneth Miller

[-- Attachment #1: Type: text/plain, Size: 1903 bytes --]

Hi guys,

I am glad that you are interested in our work!

Actually, this project started over 1.5 years ago, and I believe that at that
time BAP (version 0.7, I believe?) was still a research prototype.

I chose to implement it from scratch because I wanted to have a nice
tool for my own research projects, and it also gave me an opportunity
to learn OCaml... :)

Yes, I would definitely like to unite our efforts!

Best,
Shuai





[-- Attachment #2: Type: text/html, Size: 4208 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] [ANN] Uroboros 0.1
  2015-10-23 20:59   ` Shuai Wang
@ 2015-10-23 22:31     ` Kenneth Adam Miller
  2015-10-25 19:04       ` Shuai Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Kenneth Adam Miller @ 2015-10-23 22:31 UTC (permalink / raw)
  To: Shuai Wang; +Cc: Ivan Gotovchits, caml users

[-- Attachment #1: Type: text/plain, Size: 2338 bytes --]

Well it's interesting that you've gone with a binary recompilation
approach. How do you ensure that, statically, for any given edit, you
reposition all the jump targets correctly? How do you deal with the
difficulty of disassembly reducing to the halting problem?


[-- Attachment #2: Type: text/html, Size: 4884 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] [ANN] Uroboros 0.1
  2015-10-23 22:31     ` Kenneth Adam Miller
@ 2015-10-25 19:04       ` Shuai Wang
  2015-10-25 19:25         ` Kenneth Adam Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Shuai Wang @ 2015-10-25 19:04 UTC (permalink / raw)
  To: Kenneth Adam Miller; +Cc: Ivan Gotovchits, caml users

[-- Attachment #1: Type: text/plain, Size: 4510 bytes --]

Hello Kenneth,

Sorry for the late reply. I had several deadlines this weekend.

To answer your question: our current approach cannot guarantee that
"repositioning" is 100% correct.
The most challenging part is identifying code pointers in global data
sections. As we discuss in our paper, this is quite difficult to handle even
with static analysis techniques (type inference, for instance). We do have
some false positives, as shown in the appendix of our paper [1].
We will do more research to eliminate these false positives.
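
To make the difficulty concrete, here is a minimal sketch of the kind of
data-section scan being discussed. It is not taken from Uroboros: the function
name, the addresses, the 4-byte alignment, and the optional check against
known instruction starts are all assumptions made for illustration.

```python
import struct

def find_code_pointer_candidates(data, text_start, text_end, insn_starts=None):
    """Scan a data section for 4-byte little-endian words whose value lies
    inside [text_start, text_end). Every hit is only a *candidate* code
    pointer: a plain integer with the same value looks identical."""
    candidates = []
    for offset in range(0, len(data) - 3, 4):  # assumption: 4-byte alignment
        (value,) = struct.unpack_from("<I", data, offset)
        if text_start <= value < text_end:
            # Optional filter: keep only values that hit an instruction start.
            if insn_starts is None or value in insn_starts:
                candidates.append((offset, value))
    return candidates

# Hypothetical layout: .text occupies 0x8048000..0x8048100.
TEXT_START, TEXT_END = 0x8048000, 0x8048100
INSN_STARTS = {0x8048000, 0x8048010, 0x8048024}

# Three data words: a function pointer, an unrelated small integer, and an
# integer constant that merely happens to equal a text address.
data = struct.pack("<III", 0x8048010, 42, 0x8048024)

hits = find_code_pointer_candidates(data, TEXT_START, TEXT_END, INSN_STARTS)
for offset, value in hits:
    print(f"data+{offset:#x}: {value:#x} -> would be rewritten as a label")
```

The scan reports both the first and the third word. Nothing in the bytes says
whether the third word is a pointer or a constant, and rewriting a constant as
a label is exactly the false positive mentioned above.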

I believe it is feasible to present a sound solution. It does require some
additional trampolines inserted into the binary code. You may refer to this
paper for some insights [2].

As for the disassembly challenges, we directly adopt the disassembly
approach proposed in an excellent piece of work [3]. You can check their
evaluation section, which reports that their approach correctly disassembles
large applications without any errors. My experience is that Linux ELF
binaries are indeed easier to disassemble, and typical compilers (gcc, icc,
llvm) do not insert data into code sections (embedded data can cause a
linear disassembler a lot of trouble).
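
As a small illustration of why embedded data troubles a linear disassembler,
here is a sketch over a made-up instruction set. The opcodes and lengths in
TOY_ISA are hypothetical and do not correspond to x86 or to any real tool.

```python
# Toy instruction set (hypothetical): opcode byte -> (mnemonic, total length).
TOY_ISA = {0x01: ("nop", 1), 0x02: ("jmp", 2), 0x03: ("mov", 3), 0x04: ("ret", 1)}

def linear_sweep(code):
    """Decode instructions back to back, starting at offset 0."""
    offset, out = 0, []
    while offset < len(code):
        mnemonic, length = TOY_ISA.get(code[offset], ("(bad)", 1))
        out.append((offset, mnemonic))
        offset += length
    return out

# Pure code: nop; mov (3 bytes); ret.
clean = bytes([0x01, 0x03, 0x00, 0x00, 0x04])

# A jmp that skips one embedded data byte (0x03), followed by nop; ret.
mixed = bytes([0x02, 0x01, 0x03, 0x01, 0x04])

print(linear_sweep(clean))
print(linear_sweep(mixed))
```

On the second input the sweep decodes the data byte 0x03 as a 3-byte mov,
which swallows the real nop and ret that follow it. When the compiler keeps
data out of the code section, as in the first input, the sweep stays in sync.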

However, if I were asked to work on PE binaries, I would probably start
from IDA Pro.
We consider the disassembly challenge to be orthogonal to our research.

IMHO, our research reveals the (important) fact that even though the
relocation issue is theoretically hard to solve with 100% accuracy, it might
not be as troublesome as previous work assumed.
Simple solutions can achieve good results.
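
For readers wondering how jump targets get "repositioned" at all once
references are symbolized, here is a minimal sketch. It assumes a toy
three-instruction encoding; the SIZES table, the label name, and the
assemble_layout helper are hypothetical and are not Uroboros code.

```python
# Hypothetical encodings: mnemonic -> size in bytes.
SIZES = {"nop": 1, "jmp": 2, "ret": 1}

def assemble_layout(program):
    """Assign an address to every label, then resolve each jump target.
    Each program entry is (label or None, mnemonic, target label or None)."""
    addresses, pc = {}, 0
    for label, mnemonic, _ in program:
        if label:
            addresses[label] = pc
        pc += SIZES[mnemonic]
    return [(mnemonic, addresses[target] if target else None)
            for _, mnemonic, target in program]

program = [
    (None, "jmp", "L_exit"),   # the target is a label, not a fixed address
    (None, "nop", None),
    ("L_exit", "ret", None),
]
before = assemble_layout(program)

# Insert three nops after the jump, as an edit or instrumentation might.
patched = program[:1] + [(None, "nop", None)] * 3 + program[1:]
after = assemble_layout(patched)

print(before[0], after[0])
```

Because the jump refers to a label, reassembling after the edit moves its
target from address 3 to address 6 with no extra bookkeeping. The hard part
remains deciding which values in the original binary should become labels.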

I hope this answers your questions; otherwise, please let me know :)

Best,
Shuai

[1] Shuai Wang, Pei Wang, Dinghao Wu, Reassembleable Disassembling.
[2] Zhui Deng, Xiangyu Zhang, Dongyan Xu, BISTRO: Binary Component
Extraction and Embedding for Software Security Applications.
[3] Mingwei Zhang, R. Sekar, Control Flow Integrity for COTS Binaries.








[-- Attachment #2: Type: text/html, Size: 8780 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] [ANN] Uroboros 0.1
  2015-10-25 19:04       ` Shuai Wang
@ 2015-10-25 19:25         ` Kenneth Adam Miller
  2015-10-25 20:49           ` Shuai Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Kenneth Adam Miller @ 2015-10-25 19:25 UTC (permalink / raw)
  To: Shuai Wang; +Cc: Ivan Gotovchits, caml users

[-- Attachment #1: Type: text/plain, Size: 6581 bytes --]

Replied inline

On Sun, Oct 25, 2015 at 3:04 PM, Shuai Wang <wangshuai901@gmail.com> wrote:

> Hello Kenneth,
>
> Sorry for the late reply. I have several deadlines during this weekend.
>
> To answer your question, our current approach cannot ensure 100%
> "reposition" correct.
> The most challenging part is to identify code pointers in global data
> sections, as we discussed
> in our paper, it is quite difficult to handle even with some static
> analysis techniques
> (type inference, for instance). We do have some false positive, as shown
> in the appendix of our paper [1].
> We will research more to eliminate the false positive.
>
> I believe it is doable to present a sound solution. It indeed requires
> some additional
> trampolines inserted in the binary code. You may refer to this paper for
> some enlightens [2].
>
> As for the disassembling challenges, we directly adopt a disassembly
> approach proposed
> by an excellent work [3]. You can check out their evaluation section, and
> find that their approach
> can correctly disassemble large-size applications without any error. My
> experience is that Linux ELF
> binaries are indeed easier to disassemble, and typical compilers (gcc;
> icc; llvm) would not
> insert data into code sections (the embedded data can trouble linear
> disassembler a lot).
>
>
I remember reading about [3] when it came out. That was a year after the
original REINS system came out that proposed re-writing binaries, along
with its companion STIR. Shingled disassembly originated with Wartell et
al.'s seminal Distinguishing Code and Data PhD thesis. I'm currently
working on the integration of a sheering and PFSM enhanced Shingled
Disassembler into BAP. But if you've already implemented something like
that, it would be really valuable if you were to review my shingled
disassembler implementation and I reviewed yours, so that we get some
cross-review feedback.

Regarding the need for 100% accuracy: in the REINS and STIR papers, the
approach taken is to obtain very high classification accuracy, but where
correctness cannot be established, to simply retain each interpretation of
a byte sequence, so that you are still correct if the bytes turn out to be
code, because they were treated as such. Then, a companion technique is
introduced wherein the original code section is retained, so that if data
referenced from within the code section was misinterpreted, such a reference
can still read and write into the kept section. But if it's code, it has
been rewritten in the new section. The result should then remain correct in
any scenario.
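
A minimal sketch of the "retain each interpretation" idea, under stated
assumptions: the instruction set below is made up, and the function is only
an illustration of decoding at every byte offset. It is not the REINS, STIR,
or BAP implementation.

```python
# Toy instruction set (hypothetical): opcode byte -> (mnemonic, total length).
TOY_ISA = {0x01: ("nop", 1), 0x02: ("jmp", 2), 0x03: ("mov", 3), 0x04: ("ret", 1)}

def shingled_disassemble(code):
    """Decode at every byte offset and keep each decoding that fits inside
    the buffer, instead of committing to a single instruction stream."""
    shingles = {}
    for offset in range(len(code)):
        entry = TOY_ISA.get(code[offset])
        if entry is None:
            continue  # not a valid opcode at this offset
        mnemonic, length = entry
        if offset + length <= len(code):
            shingles[offset] = (mnemonic, length)
    return shingles

# A jmp that skips one embedded data byte (0x03), followed by nop; ret.
mixed = bytes([0x02, 0x01, 0x03, 0x01, 0x04])
shingles = shingled_disassemble(mixed)
for offset in sorted(shingles):
    print(offset, shingles[offset])
```

Both readings of the bytes survive: offset 2 decodes as a 3-byte mov, while
offsets 3 and 4 decode as the nop and ret that actually execute. A later
classification step, or simply keeping every reading, decides what to do
with the overlap.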


> However, if I am asked to work on PE binaries, then I will probably start
> from IDA-Pro.
> We consider the disassembling challenge is orthogonal to our research.
>

It is good to have good interoperability between IDA as a guided
disassembler and the actual new research tools. One of the most valuable
things I can think of is to write a plugin that mechanizes data extraction
as needed, in order to accelerate manual intervention with the newer tools,
such as in the case of training.


>
> IMHO, our research reveals the (important) fact that even though
> theoretically relocation issue
> is hard to solve with 100% accuracy, it might not be as troublesome as it
> was assumed by previous work.
> Simple solutions can achieve good results.
>

Agreed; there are failback stratagems.


>
> I hope it answers your questions, otherwise, please let me know :)
>
> Best,
> Shuai
>
> [1] Shuai Wang, Pei Wang, Dinghao Wu, Reassembleable Disassembling.
> [2] Zhui Deng, Xiangyu Zhang, Dongyan Xu, BISTRO: Binary Component
> Extraction and Embedding for Software Security Applications
> [3] Mingwei Zhang, Sekar, R, Control Flow Integrity for COTS Binaries.
>
>
>
There's a good utility for working with white papers and interacting with
colleagues: Mendeley.



[-- Attachment #2: Type: text/html, Size: 11727 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] [ANN] Uroboros 0.1
  2015-10-25 19:25         ` Kenneth Adam Miller
@ 2015-10-25 20:49           ` Shuai Wang
  2015-10-25 21:23             ` Kenneth Adam Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Shuai Wang @ 2015-10-25 20:49 UTC (permalink / raw)
  To: Kenneth Adam Miller; +Cc: Ivan Gotovchits, caml users

[-- Attachment #1: Type: text/plain, Size: 8144 bytes --]

Hello Kenneth,

Yes, I agree that the Binary Stirring system can eliminate symbolization
false positives as well. Actually, I believe many research projects and
tools (DynInst, for instance) have implemented this so-called
"replica-based" binary instrumentation framework. It is a quite robust way
to instrument binary code, although the size expansion and performance
penalty of the instrumented output cannot be ignored.

However, I found those solutions are all quite complex and difficult to
understand. And it might not be inaccurate to assume that "aggressive"
instrumentation methods can break functionality, due either to design
limitations or to the challenges of a bug-free implementation. I even found
that some state-of-the-art binary instrumentation tools cannot preserve
correct functionality when used to instrument some SPEC2006 test cases.

I personally would like to find cleaner solutions that introduce very
little overhead in terms of binary size and execution. Besides, some
research reveals that binary security applications built on top of previous
instrumentation frameworks do leave certain exploitable vulnerabilities due
to design limitations.

Sincerely,
Shuai







[-- Attachment #2: Type: text/html, Size: 13731 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] [ANN] Uroboros 0.1
  2015-10-25 20:49           ` Shuai Wang
@ 2015-10-25 21:23             ` Kenneth Adam Miller
  2015-10-25 23:11               ` Shuai Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Kenneth Adam Miller @ 2015-10-25 21:23 UTC (permalink / raw)
  To: Shuai Wang; +Cc: caml users

[-- Attachment #1: Type: text/plain, Size: 9263 bytes --]

I'm quite sure we are thinking of different things.

STIR is a binary randomization technique used to mitigate ROP, and was
developed in situ with the binary rewriting techniques. The technique of
retaining the original code section is a failback to guard against errors
in rewriting, but to my knowledge it doesn't impose a performance penalty.
The size required is a constant multiple, so I don't consider it an adoption
hurdle. But everybody has different use scenarios.

Right. Correctness is critical. I think co program proof methodologies with
tools like Coq will shine here, in proofs that remove the required trust
that a rewritten binary conforms to certain execution properties.

I hadn't known static rewriters even existed. I presume you are talking
about dynamic tools.
On Oct 25, 2015 4:49 PM, "Shuai Wang" <wangshuai901@gmail.com> wrote:

> Hello Kenneth,
>
> Yes, I agree the Binary Stirring system can eliminate symbolization false
> positives as well. Actually, I believe
> many research works and tools (DynInst, for instance) have implemented
> this so-called "replica-based"
> binary instrumentation framework. This is quite a robust way to
> instrument binary code, although the size expansion and
> performance penalty in the instrumentation outputs cannot be ignored.
>
> However, I found those solutions are all quite complex and difficult to
> understand. And it might not be inaccurate
> to assume that "aggressive" instrumentation methods can break functionality
> due to design limitations
> or the challenges of a bug-free implementation. I even found that some
> state-of-the-art binary instrumentation tools
> cannot preserve correct functionality when they are used to
> instrument some SPEC2006 test cases.
>
> I personally would like to find some cleaner solutions, which
> introduce very little overhead in terms of binary
> size and execution. Besides, some research work reveals that binary
> security applications built on top of previous
> instrumentation frameworks do leave certain exploitable vulnerabilities due
> to the design limitations.
>
> Sincerely,
> Shuai

[-- Attachment #2: Type: text/html, Size: 14847 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] [ANN] Uroboros 0.1
  2015-10-25 21:23             ` Kenneth Adam Miller
@ 2015-10-25 23:11               ` Shuai Wang
  2015-10-25 23:46                 ` Kenneth Adam Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Shuai Wang @ 2015-10-25 23:11 UTC (permalink / raw)
  To: Kenneth Adam Miller; +Cc: caml users

[-- Attachment #1: Type: text/plain, Size: 10026 bytes --]

I thought "STIR" refers to Binary Stirring [1].

Leveraging Coq to verify instrumentation correctness sounds
very interesting to me, although I am not aware of any existing related
work.
Do you know of any existing work?

I think some static rewriters do exist, such as DynInst [2].

[1] Binary Stirring: http://www.utdallas.edu/~hamlen/wartell12ccs.pdf
[2] DynInst: http://www.dyninst.org/



On Sun, Oct 25, 2015 at 5:23 PM, Kenneth Adam Miller <
kennethadammiller@gmail.com> wrote:

> I'm quite sure we are thinking of different things.
>
> STIR is a binary randomization technique used to mitigate ROP, and was
> developed in situ with the binary rewriting techniques. The technique of
> retaining the original code section is a fallback to guard against errors
> in rewriting, but to my knowledge doesn't impose a performance penalty.
> The size required is a constant multiple, so I don't consider it an adoption
> hurdle. But everybody has different use scenarios.
>
> Right. Correctness is critical. I think co program proof methodologies
> with tools like Coq will shine here in proofs to remove required trust that
> a rewritten binary is conformant to certain execution properties.
>
> I hadn't known static rewriters even existed. I presume you are talking
> about dynamic tools.

[-- Attachment #2: Type: text/html, Size: 15940 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] [ANN] Uroboros 0.1
  2015-10-25 23:11               ` Shuai Wang
@ 2015-10-25 23:46                 ` Kenneth Adam Miller
  0 siblings, 0 replies; 13+ messages in thread
From: Kenneth Adam Miller @ 2015-10-25 23:46 UTC (permalink / raw)
  To: Shuai Wang; +Cc: caml users

[-- Attachment #1: Type: text/plain, Size: 10653 bytes --]

No, that's right, but I was drawing a distinction between the two techniques,
and I originally only mentioned STIR because that and its companion papers
are where the technique of retaining the original code region made its
debut.

Thanks, I hadn't heard about DynInst.
On Oct 25, 2015 7:11 PM, "Shuai Wang" <wangshuai901@gmail.com> wrote:

> I thought "STIR" refers to Binary Stirring [1].
>
> Leveraging Coq to verify instrumentation correctness sounds
> very interesting to me, although I am not aware of any existing related
> work.
> Do you know of any existing work?
>
> I think some static rewriters do exist, such as DynInst [2].
>
> [1] Binary Stirring: http://www.utdallas.edu/~hamlen/wartell12ccs.pdf
> [2] DynInst: http://www.dyninst.org/

[-- Attachment #2: Type: text/html, Size: 16557 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] [ANN] Uroboros 0.1
  2015-10-23 17:45   ` Kenneth Adam Miller
@ 2015-10-26 17:04     ` Eric Cooper
  2015-10-26 17:05       ` Kenneth Adam Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Cooper @ 2015-10-26 17:04 UTC (permalink / raw)
  To: caml-list

On Fri, Oct 23, 2015 at 01:45:27PM -0400, Kenneth Adam Miller wrote:
> Rewriting binaries is going to be retarded hard.

Please don't use the term "retarded" in this offensive fashion.

--
Eric Cooper             e c c @ c m u . e d u

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] [ANN] Uroboros 0.1
  2015-10-26 17:04     ` Eric Cooper
@ 2015-10-26 17:05       ` Kenneth Adam Miller
  0 siblings, 0 replies; 13+ messages in thread
From: Kenneth Adam Miller @ 2015-10-26 17:05 UTC (permalink / raw)
  To: Eric Cooper, caml users

[-- Attachment #1: Type: text/plain, Size: 590 bytes --]

On Mon, Oct 26, 2015 at 1:04 PM, Eric Cooper <ecc@cmu.edu> wrote:

> On Fri, Oct 23, 2015 at 01:45:27PM -0400, Kenneth Adam Miller wrote:
> > Rewriting binaries is going to be retarded hard.
>
> Please don't use the term "retarded" in this offensive fashion.
>

Ok, it's going to be very very hard.


>
> --
> Eric Cooper             e c c @ c m u . e d u
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

[-- Attachment #2: Type: text/html, Size: 1447 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-10-26 17:05 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-23 17:05 [Caml-list] [ANN] Uroboros 0.1 Shuai Wang
2015-10-23 17:30 ` Ivan Gotovchits
2015-10-23 17:45   ` Kenneth Adam Miller
2015-10-26 17:04     ` Eric Cooper
2015-10-26 17:05       ` Kenneth Adam Miller
2015-10-23 20:59   ` Shuai Wang
2015-10-23 22:31     ` Kenneth Adam Miller
2015-10-25 19:04       ` Shuai Wang
2015-10-25 19:25         ` Kenneth Adam Miller
2015-10-25 20:49           ` Shuai Wang
2015-10-25 21:23             ` Kenneth Adam Miller
2015-10-25 23:11               ` Shuai Wang
2015-10-25 23:46                 ` Kenneth Adam Miller
