From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 641CF7FB5F for ; Sun, 25 Oct 2015 21:49:28 +0100 (CET) IronPort-PHdr: 9a23:NI9eaRQwQnl/NLb48HT9/uy2J9psv+yvbD5Q0YIujvd0So/mwa64bBCN2/xhgRfzUJnB7Loc0qyN4/2mATRIyK3CmU5BWaQEbwUCh8QSkl5oK+++Imq/EsTXaTcnFt9JTl5v8iLzG0FUHMHjew+a+SXqvnYsExnyfTB4Ov7yUtaLyZ/niqbqo9X6WEZhunmUWftKNhK4rAHc5IE9oLBJDeIP8CbPuWZCYO9MxGlldhq5lhf44dqsrtY4q3wD89pozcNLUL37cqIkVvQYSW1+ayFmrPHs4DrOSw2C+ntUe2kfl1JtAgzB4QuyCpT8tC33qup01CCfOMzySb0ucTun5qZvDhTvjXFUGSQ+9TTsitF5jepkqRSu70hgyojbe4GIPfsvJvr1ctYTRG4HVcFUAX8SSrigZpcCWrJSdd1TqJPw8h5X9UOz Authentication-Results: mail2-smtp-roc.national.inria.fr; spf=None smtp.pra=wangshuai901@gmail.com; spf=Pass smtp.mailfrom=wangshuai901@gmail.com; spf=None smtp.helo=postmaster@mail-oi0-f49.google.com Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of wangshuai901@gmail.com) identity=pra; client-ip=209.85.218.49; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="wangshuai901@gmail.com"; x-sender="wangshuai901@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail2-smtp-roc.national.inria.fr: domain of wangshuai901@gmail.com designates 209.85.218.49 as permitted sender) identity=mailfrom; client-ip=209.85.218.49; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="wangshuai901@gmail.com"; x-sender="wangshuai901@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-oi0-f49.google.com) identity=helo; client-ip=209.85.218.49; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="wangshuai901@gmail.com"; x-sender="postmaster@mail-oi0-f49.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A0DTAADQPy1WmzHaVdFUCoJpgSFgDwaELahvkCKBBoFaI4JBgzkCgRAHORMBAQEBAQEBARABAQEBAQYLCwkhLoIrggcBAQECAQESER0BGwwLBgEDAQsGBQsNKgICIQEBEQEFARwGExsBBod4AQMKCA2kEoExPjGLSYFsgnmHXAoZJw1WhAABAQEHAQEBAQEBARYBBQ6EQIYhgQaCUIFgWQQHgmmBRQWSYoNNB4UcgnCDIYF1gVmEP4MkiySDX4IkEiOBFxESAYJDI4F4IjQBAQEBhxQBAQE X-IPAS-Result: A0DTAADQPy1WmzHaVdFUCoJpgSFgDwaELahvkCKBBoFaI4JBgzkCgRAHORMBAQEBAQEBARABAQEBAQYLCwkhLoIrggcBAQECAQESER0BGwwLBgEDAQsGBQsNKgICIQEBEQEFARwGExsBBod4AQMKCA2kEoExPjGLSYFsgnmHXAoZJw1WhAABAQEHAQEBAQEBARYBBQ6EQIYhgQaCUIFgWQQHgmmBRQWSYoNNB4UcgnCDIYF1gVmEP4MkiySDX4IkEiOBFxESAYJDI4F4IjQBAQEBhxQBAQE X-IronPort-AV: E=Sophos;i="5.20,197,1444687200"; d="scan'208";a="184385268" Received: from mail-oi0-f49.google.com ([209.85.218.49]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/AES128-GCM-SHA256; 25 Oct 2015 21:49:26 +0100 Received: by oies66 with SMTP id s66so90472478oie.1 for ; Sun, 25 Oct 2015 13:49:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=8XVgWBFAXCGOMj31RjFIPp01GZlQ5+tLsduGGj9kXeo=; b=u6F46rph7+kkr3ZgWxtlGjWlotUTtNfr0edPrfSI6Q6qhLcFIpMruQZGYBASNFs6xF LsanY7mfo3M0MLMM16CP993ARZtJVDoAXLiPuWpI5oj07pa0d9bkk+BByq8FPc/8/hJD 47OnSpIofwoQJH87fdyIKD/Q3ttj6CAoiEpnVZYvhTeu5KmhvV8ilsbBXMuLCvKKwXFu PHbeH1l6sTG/QV0/SsXUFFs+qp4b1knKsiAv473pgcD5fvTzohzUAUOvvYkAeMs6NFNa u9kaeoP1i66G0FjrVFEC0XfcLZtdaGrAfYIIKHEsq+1SguJtk+lAYaj3iMOpnVgClzI3 r4dw== MIME-Version: 1.0 X-Received: by 10.202.241.196 with SMTP id p187mr9788277oih.51.1445806164985; Sun, 25 Oct 2015 13:49:24 -0700 (PDT) Received: by 10.202.102.77 with HTTP; Sun, 25 Oct 2015 13:49:24 -0700 (PDT) In-Reply-To: References: Date: Sun, 25 Oct 2015 16:49:24 -0400 Message-ID: From: Shuai Wang To: Kenneth Adam Miller Cc: Ivan Gotovchits , caml users Content-Type: multipart/alternative; boundary=94eb2c096d98e0cc710522f3fa07 Subject: Re: [Caml-list] [ANN] Uroboros 0.1 --94eb2c096d98e0cc710522f3fa07 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hello Kenneth, Yes, I agree Binary Stirring system can eliminate symbolization false positive as well. Actually I believe many research work, and tools (DynInst, for instance) have implemented this a so-called "replica-based" binary instrumentation framework. This is a quite robust way to instrument binary code, although size expansion and performance penalty cannot be ignored in the instrumentation outputs. However, I found those solutions are all quite complex, and difficult to understand. And it might not be inaccurate to assume "aggressive" instrumentation methods can break the functionality due to the limitation of design, or challenges in bug-less implementation. I even found that some the-state-of-the-art binary instrumentation tools cannot preserve the correct functionality when employing them to instrument some SPEC2006 test cases. I personally would like to find some cleaner solutions, which can introduce very little overhead in terms of binary size and execution. Besides, some research work reveals that binary security applications built on top of previous instrumentation framework do leave certain exploitable vulnerabilities due to the design limitations. Sincerely, Shuai On Sun, Oct 25, 2015 at 3:25 PM, Kenneth Adam Miller < kennethadammiller@gmail.com> wrote: > Replied inline > > On Sun, Oct 25, 2015 at 3:04 PM, Shuai Wang > wrote: > >> Hello Kenneth, >> >> Sorry for the late reply. I have several deadlines during this weekend. >> >> To answer your question, our current approach cannot ensure 100% >> "reposition" correct. >> The most challenging part is to identify code pointers in global data >> sections, as we discussed >> in our paper, it is quite difficult to handle even with some static >> analysis techniques >> (type inference, for instance). We do have some false positive, as shown >> in the appendix of our paper [1]. >> We will research more to eliminate the false positive. >> >> I believe it is doable to present a sound solution. It indeed requires >> some additional >> trampolines inserted in the binary code. You may refer to this paper for >> some enlightens [2]. >> >> As for the disassembling challenges, we directly adopt a disassembly >> approach proposed >> by an excellent work [3]. You can check out their evaluation section, and >> find that their approach >> can correctly disassemble large-size applications without any error. My >> experience is that Linux ELF >> binaries are indeed easier to disassemble, and typical compilers (gcc; >> icc; llvm) would not >> insert data into code sections (the embedded data can trouble linear >> disassembler a lot). >> >> > I remember reading about [3] when it came out. That was a year after the > original REINS system came out that proposed re-writing binaries, along > with it's companion STIR. Shingled disassembly originated with Wartell et > al.'s seminal Distinguishing Code and Data PhD thesis. I'm currently > working on the integration of a sheering and PFSM enhanced Shingled > Disassembler into BAP. But if you've already implemented something like > that, what would be really valuable is if you were to review my shingled > disassembler implementation and I review yours that way we have some cross > review feedback. > > Regarding the need for 100% accuracy, in the REINS and STIR papers, the > approach taken is to obtain very very high classification accuracy, but in > the case that correctness cannot be established, to simply retain each > interpretation of a byte sequence, so you are still correct in the instan= ce > that it's code by treating it as such. Then, a companion technique is > introduced wherein the code section is retained in order that should such= a > data reference in code instance occur and interpretation was incorrect, > such reference can read and write into the kept section. But if it's code, > it has been rewritten in the new section. Then it should remain correct in > any scenario. > > >> However, if I am asked to work on PE binaries, then I will probably start >> from IDA-Pro. >> We consider the disassembling challenge is orthogonal to our research. >> > > It is good to have good interoperabiblity with IDA as a guided > disassembler and the actual new research tools. One of the most valuable > things I can think of is to write some plugin that will mechanize data > extraction as needed in order to accelerate manual intervention with the > newer tools, such as in the case of training. > > >> >> IMHO, our research reveals the (important) fact that even though >> theoretically relocation issue >> is hard to solve with 100% accuracy, it might not be as troublesome as it >> was assumed by previous work. >> Simple solutions can achieve good results. >> > > Agreed; there are failback stratagems. > > >> >> I hope it answers your questions, otherwise, please let me know :) >> >> Best, >> Shuai >> >> [1] Shuai Wang, Pei Wang, Dinghao Wu, Reassembleable Disassembling. >> [2] Zhui Deng, Xiangyu Zhang, Dongyan Xu, BISTRO: Binary Component >> Extraction and Embedding for Software Security Applications >> [3] Mingwei Zhang, Sekar, R, Control Flow Integrity for COTS Binaries. >> >> >> > There's a good utility for working with white papers and interacting with > colleagues; mendeley. > > >> >> >> >> >> >> On Fri, Oct 23, 2015 at 6:31 PM, Kenneth Adam Miller < >> kennethadammiller@gmail.com> wrote: >> >>> Well it's interesting that you've gone with a binary recompilation >>> approach. How do you ensure that, statically, for any given edit, you >>> reposition all the jump targets correctly? How do you deal with the >>> difficulty of disassembly reducing to the halting problem? >>> >>> On Fri, Oct 23, 2015 at 4:59 PM, Shuai Wang >>> wrote: >>> >>>> Hi guys, >>>> >>>> I am glad that you are interested in our work!! >>>> >>>> Actually this project starts over 1.5 years ago, and I believe at that >>>> time, BAP (version 0.7 I believe?) is still a research prototype.. >>>> >>>> I choose to implement from the stretch is because I want to have a nice >>>> tool for my own research projects, also I can have an opportunity >>>> to learn OCaml... :) >>>> >>>> Yes, I definitely would like to unite our efforts!! >>>> >>>> Best, >>>> Shuai >>>> >>>> >>>> >>>> >>>> On Fri, Oct 23, 2015 at 1:30 PM, Ivan Gotovchits wrote: >>>> >>>>> Hi Shuai, >>>>> >>>>> Nice work! But I'm curious, why didn't you use [bap][1] as a >>>>> disassembler? >>>>> >>>>> Do you know, that we have a low-level interface to disassembling, like >>>>> [linear_sweep][2] or even >>>>> lower [Disasm_expert.Basic][3] interface, that can disassemble on >>>>> instruction level granularity. >>>>> >>>>> It will be very interesting, if we can unite our efforts. >>>>> >>>>> Best wishes, >>>>> Ivan Gotovchits >>>>> >>>>> [1]: https://github.com/BinaryAnalysisPlatform/bap >>>>> [2]: >>>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.html#V= ALlinear_sweep >>>>> [3]: >>>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.Disasm= _expert.Basic.html >>>>> >>>>> >>>>> >>>>> >>>>> On Fri, Oct 23, 2015 at 1:05 PM, Shuai Wang >>>>> wrote: >>>>> >>>>>> Dear List, >>>>>> >>>>>> I=E2=80=99m glad to announce the first release of Uroboros: an infr= astructure >>>>>> for reassembleable disassembling and transformation. >>>>>> >>>>>> You can find the code here: https://github.com/s3team/uroboros >>>>>> You can find our research paper which describes the core technique >>>>>> implemented in Uroboros here: >>>>>> >>>>>> https://www.usenix.org/system/files/conference/usenixsecurity15/sec1= 5-paper-wang-shuai.pdf >>>>>> >>>>>> We will provide a project home page, as well as more detailed >>>>>> documents in the near future. Issues and pull requests welcomed. >>>>>> >>>>>> Happy hacking! >>>>>> >>>>>> Sincerely, >>>>>> Shuai >>>>>> >>>>> >>>>> >>>> >>> >> > --94eb2c096d98e0cc710522f3fa07 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hello Kenneth,

Yes, I agree Binary Stir= ring system can eliminate symbolization false positive=C2=A0as well. Actual= ly I believe=C2=A0
many research work, and =C2=A0tools (DynInst, = for instance) have implemented this a so-called "replica-based"= =C2=A0
binary instrumentation framework. This is a quite robust w= ay to instrument=C2=A0binary code, although size expansion and=C2=A0
<= div>performance penalty cannot be ignored in the instrumentation outputs. = =C2=A0

However, I found those solutions are all qu= ite complex, and difficult to understand. And it might not be inaccurate=C2= =A0
to assume "aggressive" instrumentation methods can = break the functionality due to the limitation of design,=C2=A0
or= challenges in bug-less implementation. I even found that some the-state-of= -the-art binary instrumentation tools=C2=A0
cannot preserve the c= orrect functionality when employing them to instrument some SPEC2006 test c= ases.=C2=A0

I personally would like to find some c= leaner solutions, which can introduce very little overhead in terms of bina= ry=C2=A0
size and execution. Besides, some research work reveals = that binary security applications built on top of previous=C2=A0
= instrumentation framework do leave certain exploitable vulnerabilities due = to the design limitations.=C2=A0

Sincerely,
<= div>Shuai




=


On Sun, Oct 25, 2015 at 3:25 PM, Kenneth Adam Miller = <kennet= hadammiller@gmail.com> wrote:
Replied inline

On Sun, Oct 25, 2015 at 3:04 PM, Shuai Wa= ng <wangshuai901@gmail.com> wrote:
Hello Kenneth,

Sorry fo= r the late reply. I have several deadlines during this weekend.=C2=A0
=

To answer your question, our current approach cannot en= sure 100% "reposition" correct.=C2=A0
The most challeng= ing part is to identify code pointers in global data sections, as we discus= sed=C2=A0
in our paper, it is quite difficult to handle even with= some static analysis techniques=C2=A0
(type inference, for insta= nce). We do have some false positive, as shown in the appendix of our paper= [1].=C2=A0
We will research more to eliminate the false positive= .=C2=A0

I believe it is doable to present a sound = solution. It indeed requires some additional
trampolines inserted= in the binary code. You may refer to this paper for some enlightens [2].= =C2=A0
=C2=A0
As for the disassembling challenges, we d= irectly adopt a disassembly approach proposed=C2=A0
by an exc= ellent work [3]. You can check out their evaluation section, and find that = their approach=C2=A0
can correctly disassemble large-size applica= tions without any error. My experience is that Linux ELF=C2=A0
bi= naries are indeed easier to disassemble, and typical compilers (gcc; icc; l= lvm) would not=C2=A0
insert data into code sections (the embedded= data can trouble linear disassembler a lot).=C2=A0


I remember reading about [3] whe= n it came out. That was a year after the original REINS system came out tha= t proposed re-writing binaries, along with it's companion STIR. Shingle= d disassembly originated with Wartell et al.'s seminal Distinguishing C= ode and Data PhD thesis. I'm currently working on the integration of a = sheering and PFSM enhanced Shingled Disassembler into BAP. But if you'v= e already implemented something like that, what would be really valuable is= if you were to review my shingled disassembler implementation and I review= yours that way we have some cross review feedback.

Regarding the need for 100% accuracy, in the REINS and STIR papers, the a= pproach taken is to obtain very very high classification accuracy, but in t= he case that correctness cannot be established, to simply retain each inter= pretation of a byte sequence, so you are still correct in the instance that= it's code by treating it as such. Then, a companion technique is intro= duced wherein the code section is retained in order that should such a data= reference in code instance occur and interpretation was incorrect, such re= ference can read and write into the kept section. But if it's code, it = has been rewritten in the new section. Then it should remain correct in any= scenario.
=C2=A0
However, if I am asked to work on PE= binaries, then I will probably start from IDA-Pro.=C2=A0
We cons= ider the disassembling challenge is orthogonal to our research.=C2=A0
=

It is good to have good inter= operabiblity with IDA as a guided disassembler and the actual new research = tools. One of the most valuable things I can think of is to write some plug= in that will mechanize data extraction as needed in order to accelerate man= ual intervention with the newer tools, such as in the case of training.
=C2=A0

IMHO, our research reveals the (important) fac= t that even though theoretically relocation issue=C2=A0
is hard t= o solve with 100% accuracy, it might not be as troublesome as it was assume= d by previous work.
Simple solutions can achieve good results.=C2= =A0

Agreed; there are fa= ilback stratagems.
=C2=A0

I hope it answers your q= uestions, otherwise, please let me know :)=C2=A0

B= est,
Shuai

[1] Shuai Wang, Pei Wang, Din= ghao Wu, Reassembleable Disassembling.
[2]=C2=A0Zhui Deng, Xiangyu Zhang, Dongyan Xu,= BISTRO: Binary Component Extraction and Emb= edding for Software Security Applications
[3]=C2=A0Mingwei Zhang, Sekar, R, Control Flow Integrity for COTS Binaries.



There's a good utility for working with white papers and inter= acting with colleagues; mendeley.
=C2= =A0




On Fri, Oct 23, 2015 at= 6:31 PM, Kenneth Adam Miller <kennethadammiller@gmail.com&g= t; wrote:
Well it= 's interesting that you've gone with a binary recompilation approac= h. How do you ensure that, statically, for any given edit, you reposition a= ll the jump targets correctly? How do you deal with the difficulty of disas= sembly reducing to the halting problem?

On Fri, Oct 23, 2015 at 4:59 PM, Shua= i Wang <wangshuai901@gmail.com> wrote:
Hi guys,

I am glad th= at you are interested in our work!!=C2=A0

Actually= this project starts over 1.5 years ago, and I believe at that time, BAP (v= ersion 0.7 I believe?) is still a research prototype..

=
I choose to implement from the stretch=C2=A0is because I want to have = a nice tool for my own research projects, also I can have an opportunity
to learn OCaml... :)

Yes, I definitely wou= ld like to unite our efforts!!=C2=A0

Best,
Shuai




On Fri, Oct 23, 2015 = at 1:30 PM, Ivan Gotovchits <ivg@ieee.org> wrote:
=
Hi Shuai,

Nice work! But I'm curious, why didn't you use [bap][1] as a= disassembler?=C2=A0

Do you know, that we have a l= ow-level interface to disassembling, like [linear_sweep][2] or even
lower [Disasm_expert.Basic][3] interface, that can disassemble on instru= ction level granularity.

It will be very interesti= ng, if we can unite our efforts.


On Fri, Oct 23, 2015 at 1:05 PM, Shuai Wang <= wangshuai901@gm= ail.com> wrote:
Dear List,

I=E2=80=99m glad to=C2=A0announce=C2=A0the = first release of Uroboros: =C2=A0an infrastructure for reassemblea= ble disassembling and transformation.

You can = find the code here:=C2=A0https://github.com/s3team/uroboros=C2=A0
You can find our research paper which des= cribes the core technique implemented in Uroboros here:=C2=A0

We will provide a project home page, as well as more detailed doc= uments in the near future. =C2=A0Issues and pull requests welcomed.

Happy hacking!

Sincerely,
Shuai






--94eb2c096d98e0cc710522f3fa07--