From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 347207FB5F for ; Sun, 25 Oct 2015 22:23:18 +0100 (CET) IronPort-PHdr: 9a23:BEunehXrkdm6GwnphIsSnerttX3V8LGtZVwlr6E/grcLSJyIuqrYZhyDt8tkgFKBZ4jH8fUM07OQ6PC9HzRYqb+681k8M7V0HycfjssXmwFySOWkMmbcaMDQUiohAc5ZX0Vk9XzoeWJcGcL5ekGA6ibqtW1aJBzzOEJPK/jvHcaK1oLsh730o8WbSj4LrQT+SIs6FA+xowTVu5teqqpZAYF19CH0pGBVcf9d32JiKAHbtR/94sCt4MwrqHwI6LoJvvRNWqTifqk+UacQTHF/azh0t4XXskz4TRaG5zMjW2MZ2k5XCg7K9xHnV5ag6nLSue902S3cNsrzG+MaQzOnuoRmThnllCdPHjIw9Snyi8h0gbgT9BGsoRpy347dbIiQMft6eq7HVdwfTGtFGM1WUnoSUcuHc4ITAr9Zbq5jpI7nqg5L9EPmCA== Authentication-Results: mail2-smtp-roc.national.inria.fr; spf=None smtp.pra=kennethadammiller@gmail.com; spf=Pass smtp.mailfrom=kennethadammiller@gmail.com; spf=None smtp.helo=postmaster@mail-yk0-f181.google.com Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of kennethadammiller@gmail.com) identity=pra; client-ip=209.85.160.181; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="kennethadammiller@gmail.com"; x-sender="kennethadammiller@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail2-smtp-roc.national.inria.fr: domain of kennethadammiller@gmail.com designates 209.85.160.181 as permitted sender) identity=mailfrom; client-ip=209.85.160.181; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="kennethadammiller@gmail.com"; x-sender="kennethadammiller@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-yk0-f181.google.com) identity=helo; client-ip=209.85.160.181; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="kennethadammiller@gmail.com"; x-sender="postmaster@mail-yk0-f181.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A0BRAADWRi1WlLWgVdFUCoJpgSFgDwaELahvkCJ4AQ2BWiOCQYFggVkCgRAHOBQBAQEBAQEBARABAQEBBwsLCR8wgiuCBwEBAQIBARIRHQEbDAsGAQMBCwYFCw0qAgIhAQERAQUBHAYTCBMBBod4AQMKCA2kFYExPjGLSYFsgnmHXAoZJw1WhAABAQEBAQUBAQEBAQEBAQEUAQUOhECCKoN3gQaCUIFgWQQHgmmBRQWSYoNUhRyCcIMhgXWBWYQ/gySLJINfgiQSI4EXEQ4BAYIiJCOBeCI0AQEBAYcUAQEB X-IPAS-Result: A0BRAADWRi1WlLWgVdFUCoJpgSFgDwaELahvkCJ4AQ2BWiOCQYFggVkCgRAHOBQBAQEBAQEBARABAQEBBwsLCR8wgiuCBwEBAQIBARIRHQEbDAsGAQMBCwYFCw0qAgIhAQERAQUBHAYTCBMBBod4AQMKCA2kFYExPjGLSYFsgnmHXAoZJw1WhAABAQEBAQUBAQEBAQEBAQEUAQUOhECCKoN3gQaCUIFgWQQHgmmBRQWSYoNUhRyCcIMhgXWBWYQ/gySLJINfgiQSI4EXEQ4BAYIiJCOBeCI0AQEBAYcUAQEB X-IronPort-AV: E=Sophos;i="5.20,197,1444687200"; d="scan'208";a="184387115" Received: from mail-yk0-f181.google.com ([209.85.160.181]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/AES128-GCM-SHA256; 25 Oct 2015 22:23:16 +0100 Received: by yknn9 with SMTP id n9so166271640ykn.0 for ; Sun, 25 Oct 2015 14:23:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=+2h1bhiMCAEgT5vqpU/bC9KzBU3hinmNfk4GSIoLgHE=; b=ZgSev4KucH2DzKVDP84k1mjAe6pTc0PXDWWniXQeMweF3BK+x0DwUzzMxpNwuc1FSv FBG8C2IBUZFOKcJm+biEV8MU+/HSPpnTkRPEu4sb4iSr0JVidnhtcfGW2YDbFYe88SNi QbW4dBokETZ8YFTR+7ZWY7kOacchclG0SQ2Gj1sEJSVKZCDDIjHDtnJbuVKMdGKHFuZi 1enu0IeKW5uQqFAwibLoMSL4goGiqB/Cu3DypkEripO5kumzgZ1D3KXNgF5xZkaWlTud bqZDrN3k4v8qpC/emcq1tCFaOPaYbV7hQ18jRr+jr8fBv1zACkoPLv4/dAfedoOBmqbP S2bA== MIME-Version: 1.0 X-Received: by 10.13.192.66 with SMTP id b63mr23099060ywd.80.1445808195238; Sun, 25 Oct 2015 14:23:15 -0700 (PDT) Received: by 10.37.65.143 with HTTP; Sun, 25 Oct 2015 14:23:15 -0700 (PDT) Received: by 10.37.65.143 with HTTP; Sun, 25 Oct 2015 14:23:15 -0700 (PDT) In-Reply-To: References: Date: Sun, 25 Oct 2015 17:23:15 -0400 Message-ID: From: Kenneth Adam Miller To: Shuai Wang Cc: caml users Content-Type: multipart/alternative; boundary=001a114e650ee402a00522f47313 Subject: Re: [Caml-list] [ANN] Uroboros 0.1 --001a114e650ee402a00522f47313 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable I'm quite sure we are thinking of different things. STIR is a binary randomization technique used to mitigate rop, and was developed in sitsu with the binary rewriting techniques. The technique of retaining the original code section is a failback to guard against errors in rewriting, but to my knowledge doesn't impose a performance penalty. Size required is a constant multiple, so I don't consider it an adoption hurdle. But everybody has different use scenarios. Right. Correctness is critical. I think co program proof methodologies with tools like coq will shine here in proofs to remove required trust that a rewrtten binary is conformant to certain execution properties. I hadn't know static rewriters even existed. I presume you are you talking about dynamic tools. On Oct 25, 2015 4:49 PM, "Shuai Wang" wrote: > Hello Kenneth, > > Yes, I agree Binary Stirring system can eliminate symbolization false > positive as well. Actually I believe > many research work, and tools (DynInst, for instance) have implemented > this a so-called "replica-based" > binary instrumentation framework. This is a quite robust way to > instrument binary code, although size expansion and > performance penalty cannot be ignored in the instrumentation outputs. > > However, I found those solutions are all quite complex, and difficult to > understand. And it might not be inaccurate > to assume "aggressive" instrumentation methods can break the functionality > due to the limitation of design, > or challenges in bug-less implementation. I even found that some > the-state-of-the-art binary instrumentation tools > cannot preserve the correct functionality when employing them to > instrument some SPEC2006 test cases. > > I personally would like to find some cleaner solutions, which can > introduce very little overhead in terms of binary > size and execution. Besides, some research work reveals that binary > security applications built on top of previous > instrumentation framework do leave certain exploitable vulnerabilities due > to the design limitations. > > Sincerely, > Shuai > > > > > > > On Sun, Oct 25, 2015 at 3:25 PM, Kenneth Adam Miller < > kennethadammiller@gmail.com> wrote: > >> Replied inline >> >> On Sun, Oct 25, 2015 at 3:04 PM, Shuai Wang >> wrote: >> >>> Hello Kenneth, >>> >>> Sorry for the late reply. I have several deadlines during this weekend. >>> >>> To answer your question, our current approach cannot ensure 100% >>> "reposition" correct. >>> The most challenging part is to identify code pointers in global data >>> sections, as we discussed >>> in our paper, it is quite difficult to handle even with some static >>> analysis techniques >>> (type inference, for instance). We do have some false positive, as shown >>> in the appendix of our paper [1]. >>> We will research more to eliminate the false positive. >>> >>> I believe it is doable to present a sound solution. It indeed requires >>> some additional >>> trampolines inserted in the binary code. You may refer to this paper for >>> some enlightens [2]. >>> >>> As for the disassembling challenges, we directly adopt a disassembly >>> approach proposed >>> by an excellent work [3]. You can check out their evaluation section, >>> and find that their approach >>> can correctly disassemble large-size applications without any error. My >>> experience is that Linux ELF >>> binaries are indeed easier to disassemble, and typical compilers (gcc; >>> icc; llvm) would not >>> insert data into code sections (the embedded data can trouble linear >>> disassembler a lot). >>> >>> >> I remember reading about [3] when it came out. That was a year after the >> original REINS system came out that proposed re-writing binaries, along >> with it's companion STIR. Shingled disassembly originated with Wartell et >> al.'s seminal Distinguishing Code and Data PhD thesis. I'm currently >> working on the integration of a sheering and PFSM enhanced Shingled >> Disassembler into BAP. But if you've already implemented something like >> that, what would be really valuable is if you were to review my shingled >> disassembler implementation and I review yours that way we have some cro= ss >> review feedback. >> >> Regarding the need for 100% accuracy, in the REINS and STIR papers, the >> approach taken is to obtain very very high classification accuracy, but = in >> the case that correctness cannot be established, to simply retain each >> interpretation of a byte sequence, so you are still correct in the insta= nce >> that it's code by treating it as such. Then, a companion technique is >> introduced wherein the code section is retained in order that should suc= h a >> data reference in code instance occur and interpretation was incorrect, >> such reference can read and write into the kept section. But if it's cod= e, >> it has been rewritten in the new section. Then it should remain correct = in >> any scenario. >> >> >>> However, if I am asked to work on PE binaries, then I will probably >>> start from IDA-Pro. >>> We consider the disassembling challenge is orthogonal to our research. >>> >> >> It is good to have good interoperabiblity with IDA as a guided >> disassembler and the actual new research tools. One of the most valuable >> things I can think of is to write some plugin that will mechanize data >> extraction as needed in order to accelerate manual intervention with the >> newer tools, such as in the case of training. >> >> >>> >>> IMHO, our research reveals the (important) fact that even though >>> theoretically relocation issue >>> is hard to solve with 100% accuracy, it might not be as troublesome as >>> it was assumed by previous work. >>> Simple solutions can achieve good results. >>> >> >> Agreed; there are failback stratagems. >> >> >>> >>> I hope it answers your questions, otherwise, please let me know :) >>> >>> Best, >>> Shuai >>> >>> [1] Shuai Wang, Pei Wang, Dinghao Wu, Reassembleable Disassembling. >>> [2] Zhui Deng, Xiangyu Zhang, Dongyan Xu, BISTRO: Binary Component >>> Extraction and Embedding for Software Security Applications >>> [3] Mingwei Zhang, Sekar, R, Control Flow Integrity for COTS Binaries. >>> >>> >>> >> There's a good utility for working with white papers and interacting with >> colleagues; mendeley. >> >> >>> >>> >>> >>> >>> >>> On Fri, Oct 23, 2015 at 6:31 PM, Kenneth Adam Miller < >>> kennethadammiller@gmail.com> wrote: >>> >>>> Well it's interesting that you've gone with a binary recompilation >>>> approach. How do you ensure that, statically, for any given edit, you >>>> reposition all the jump targets correctly? How do you deal with the >>>> difficulty of disassembly reducing to the halting problem? >>>> >>>> On Fri, Oct 23, 2015 at 4:59 PM, Shuai Wang >>>> wrote: >>>> >>>>> Hi guys, >>>>> >>>>> I am glad that you are interested in our work!! >>>>> >>>>> Actually this project starts over 1.5 years ago, and I believe at that >>>>> time, BAP (version 0.7 I believe?) is still a research prototype.. >>>>> >>>>> I choose to implement from the stretch is because I want to have a >>>>> nice tool for my own research projects, also I can have an opportunity >>>>> to learn OCaml... :) >>>>> >>>>> Yes, I definitely would like to unite our efforts!! >>>>> >>>>> Best, >>>>> Shuai >>>>> >>>>> >>>>> >>>>> >>>>> On Fri, Oct 23, 2015 at 1:30 PM, Ivan Gotovchits wrote: >>>>> >>>>>> Hi Shuai, >>>>>> >>>>>> Nice work! But I'm curious, why didn't you use [bap][1] as a >>>>>> disassembler? >>>>>> >>>>>> Do you know, that we have a low-level interface to disassembling, >>>>>> like [linear_sweep][2] or even >>>>>> lower [Disasm_expert.Basic][3] interface, that can disassemble on >>>>>> instruction level granularity. >>>>>> >>>>>> It will be very interesting, if we can unite our efforts. >>>>>> >>>>>> Best wishes, >>>>>> Ivan Gotovchits >>>>>> >>>>>> [1]: https://github.com/BinaryAnalysisPlatform/bap >>>>>> [2]: >>>>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.html#= VALlinear_sweep >>>>>> [3]: >>>>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.Disas= m_expert.Basic.html >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Oct 23, 2015 at 1:05 PM, Shuai Wang >>>>>> wrote: >>>>>> >>>>>>> Dear List, >>>>>>> >>>>>>> I=E2=80=99m glad to announce the first release of Uroboros: an inf= rastructure >>>>>>> for reassembleable disassembling and transformation. >>>>>>> >>>>>>> You can find the code here: https://github.com/s3team/uroboros >>>>>>> You can find our research paper which describes the core technique >>>>>>> implemented in Uroboros here: >>>>>>> >>>>>>> https://www.usenix.org/system/files/conference/usenixsecurity15/sec= 15-paper-wang-shuai.pdf >>>>>>> >>>>>>> We will provide a project home page, as well as more detailed >>>>>>> documents in the near future. Issues and pull requests welcomed. >>>>>>> >>>>>>> Happy hacking! >>>>>>> >>>>>>> Sincerely, >>>>>>> Shuai >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >> > --001a114e650ee402a00522f47313 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

I'm quite sure we are thinking of different things.

STIR is a binary randomization technique used to mitigate ro= p, and was developed in sitsu with the binary rewriting techniques. The tec= hnique of retaining the original code section is a failback to guard agains= t errors in rewriting, but to my knowledge doesn't impose a performance= penalty. Size required is a constant multiple, so I don't consider it = an adoption hurdle. But everybody has different use scenarios.

Right. Correctness is critical. I think co program proof met= hodologies with tools like coq will shine here in proofs to remove required= trust that a rewrtten binary is conformant to certain execution properties= .

I hadn't know static rewriters even existed. I presume y= ou are you talking about dynamic tools.

On Oct 25, 2015 4:49 PM, "Shuai Wang" = <wangshuai901@gmail.com>= ; wrote:
Hello Kenneth,

Yes, I agree Binary Stirring system= can eliminate symbolization false positive=C2=A0as well. Actually I believ= e=C2=A0
many research work, and =C2=A0tools (DynInst, for instanc= e) have implemented this a so-called "replica-based"=C2=A0
<= div>binary instrumentation framework. This is a quite robust way to instrum= ent=C2=A0binary code, although size expansion and=C2=A0
performan= ce penalty cannot be ignored in the instrumentation outputs. =C2=A0

However, I found those solutions are all quite complex, a= nd difficult to understand. And it might not be inaccurate=C2=A0
= to assume "aggressive" instrumentation methods can break the func= tionality due to the limitation of design,=C2=A0
or challenges in= bug-less implementation. I even found that some the-state-of-the-art binar= y instrumentation tools=C2=A0
cannot preserve the correct functio= nality when employing them to instrument some SPEC2006 test cases.=C2=A0

I personally would like to find some cleaner solutio= ns, which can introduce very little overhead in terms of binary=C2=A0
=
size and execution. Besides, some research work reveals that binary se= curity applications built on top of previous=C2=A0
instrumentatio= n framework do leave certain exploitable vulnerabilities due to the design = limitations.=C2=A0

Sincerely,
Shuai






On Sun, Oc= t 25, 2015 at 3:25 PM, Kenneth Adam Miller <kennethadammiller@gm= ail.com> wrote:
Replied inline

On Sun, Oct 25, 2015 at 3:04 PM, Shuai Wang <= wangshuai901@gm= ail.com> wrote:
Hello Kenneth,

Sorry for the late reply. I have se= veral deadlines during this weekend.=C2=A0

To answ= er your question, our current approach cannot ensure 100% "reposition&= quot; correct.=C2=A0
The most challenging part is to identify cod= e pointers in global data sections, as we discussed=C2=A0
in our = paper, it is quite difficult to handle even with some static analysis techn= iques=C2=A0
(type inference, for instance). We do have some false= positive, as shown in the appendix of our paper [1].=C2=A0
We wi= ll research more to eliminate the false positive.=C2=A0

I believe it is doable to present a sound solution. It indeed require= s some additional
trampolines inserted in the binary code. You ma= y refer to this paper for some enlightens [2].=C2=A0
=C2=A0
=
As for the disassembling challenges, we directly adopt a disassembly a= pproach proposed=C2=A0
by an excellent work [3]. You can chec= k out their evaluation section, and find that their approach=C2=A0
can correctly disassemble large-size applications without any error. My e= xperience is that Linux ELF=C2=A0
binaries are indeed easier to d= isassemble, and typical compilers (gcc; icc; llvm) would not=C2=A0
insert data into code sections (the embedded data can trouble linear disa= ssembler a lot).=C2=A0


I remember reading about [3] when it came out. That was a yea= r after the original REINS system came out that proposed re-writing binarie= s, along with it's companion STIR. Shingled disassembly originated with= Wartell et al.'s seminal Distinguishing Code and Data PhD thesis. I= 9;m currently working on the integration of a sheering and PFSM enhanced Sh= ingled Disassembler into BAP. But if you've already implemented somethi= ng like that, what would be really valuable is if you were to review my shi= ngled disassembler implementation and I review yours that way we have some = cross review feedback.

Regarding the need for 100%= accuracy, in the REINS and STIR papers, the approach taken is to obtain ve= ry very high classification accuracy, but in the case that correctness cann= ot be established, to simply retain each interpretation of a byte sequence,= so you are still correct in the instance that it's code by treating it= as such. Then, a companion technique is introduced wherein the code sectio= n is retained in order that should such a data reference in code instance o= ccur and interpretation was incorrect, such reference can read and write in= to the kept section. But if it's code, it has been rewritten in the new= section. Then it should remain correct in any scenario.
= =C2=A0
However, if I am asked to work on PE binaries, then I will probably start = from IDA-Pro.=C2=A0
We consider the disassembling challenge is or= thogonal to our research.=C2=A0

It is good to have good interoperabiblity with IDA as a guided disa= ssembler and the actual new research tools. One of the most valuable things= I can think of is to write some plugin that will mechanize data extraction= as needed in order to accelerate manual intervention with the newer tools,= such as in the case of training.
=C2=A0

IMHO, our research r= eveals the (important) fact that even though theoretically relocation issue= =C2=A0
is hard to solve with 100% accuracy, it might not be as tr= oublesome as it was assumed by previous work.
Simple solutions ca= n achieve good results.=C2=A0

Agreed; there are failback stratagems.
=C2=A0

I hope it= answers your questions, otherwise, please let me know :)=C2=A0
<= br>
Best,
Shuai

[1] Shuai Wang= , Pei Wang, Dinghao Wu, Reassembleable Disassembling.
[2]=C2=A0Zhui Deng, Xiangyu Zha= ng, Dongyan Xu, BISTRO: Binary Component Ext= raction and Embedding for Software Security Applications
[= 3]=C2=A0Mingwei Zhan= g, Sekar, R, Control Flow Integrity for COTS= Binaries.



There's a good utility for working with white p= apers and interacting with colleagues; mendeley.
= =C2=A0





On Fri, Oct 23, 2015= at 6:31 PM, Kenneth Adam Miller <kennethadammiller@gmail.com> wrote:
Well= it's interesting that you've gone with a binary recompilation appr= oach. How do you ensure that, statically, for any given edit, you repositio= n all the jump targets correctly? How do you deal with the difficulty of di= sassembly reducing to the halting problem?

On Fri, Oct 23, 2015 at 4:59 PM, S= huai Wang <wangshuai901@gmail.com> wrote:
Hi guys,

I am glad= that you are interested in our work!!=C2=A0

Actua= lly this project starts over 1.5 years ago, and I believe at that time, BAP= (version 0.7 I believe?) is still a research prototype..

I choose to implement from the stretch=C2=A0is because I want to ha= ve a nice tool for my own research projects, also I can have an opportunity=
to learn OCaml... :)

Yes, I definitely = would like to unite our efforts!!=C2=A0

Best,
Shuai




On Fri, Oct 23, 20= 15 at 1:30 PM, Ivan Gotovchits <ivg@ieee.org> wrote:
Hi Shuai,

<= /div>
Nice work! But I'm curious, why didn't you use [bap][1] a= s a disassembler?=C2=A0

Do you know, that we have = a low-level interface to disassembling, like [linear_sweep][2] or even
lower [Disasm_expert.Basic][3] interface, that can disassemble on ins= truction level granularity.

It will be very intere= sting, if we can unite our efforts.


On Fri, Oct 23, 2015 at 1:05 PM, Shuai Wang &= lt;wangshuai901= @gmail.com> wrote:
Dear List,

I=E2=80=99m glad to=C2=A0announce=C2=A0t= he first release of Uroboros: =C2=A0an infrastructure for reassemb= leable disassembling and transformation.

You c= an find the code here:=C2=A0https://github.com/s3team/uroboros=C2=A0
You can find our research paper which = describes the core technique implemented in Uroboros here:=C2=A0

We will provide a project home page, as well as more detailed = documents in the near future. =C2=A0Issues and pull requests welcomed.

Happy hacking!

Sincerely,
Shuai






--001a114e650ee402a00522f47313--