From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id DE4E87FB5F for ; Sun, 25 Oct 2015 20:25:20 +0100 (CET) IronPort-PHdr: 9a23:popI/x/qeOeT6v9uRHKM819IXTAuvvDOBiVQ1KB92+4cTK2v8tzYMVDF4r011RmSDdids6oMotGVmp6jcFRI2YyGvnEGfc4EfD4+ouJSoTYdBtWYA1bwNv/gYn9yNs1DUFh44yPzahANS47AblHf6ke/8SQVUk2mc1Ele6KtQsb7tIee6aObw9XreQJGhT6wM/tZDS6dikHvjPQQmpZoMa0ryxHE8TNicuVSwn50dxrIx06vru/5xpNo8jxRtvQ97IYAFPyiJ+VrBYBfWQ8mLmk0rPLisxaLGRSG4HQHUngfk0sQWiDK6Rj7WtH6tS6s5cRn3yzPHsDwS70oWXyL465uADrpjCMKLXZt82zRjMFsjKtXqRekphh7zpT8b4ScNf44daTYK4BJDVFdV9pcAnQSSri3aJECWq9YZb5V Authentication-Results: mail2-smtp-roc.national.inria.fr; spf=None smtp.pra=kennethadammiller@gmail.com; spf=Pass smtp.mailfrom=kennethadammiller@gmail.com; spf=None smtp.helo=postmaster@mail-yk0-f182.google.com Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of kennethadammiller@gmail.com) identity=pra; client-ip=209.85.160.182; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="kennethadammiller@gmail.com"; x-sender="kennethadammiller@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail2-smtp-roc.national.inria.fr: domain of kennethadammiller@gmail.com designates 209.85.160.182 as permitted sender) identity=mailfrom; client-ip=209.85.160.182; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="kennethadammiller@gmail.com"; x-sender="kennethadammiller@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-yk0-f182.google.com) identity=helo; client-ip=209.85.160.182; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="kennethadammiller@gmail.com"; x-sender="postmaster@mail-yk0-f182.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A0AvAACmKy1Wm7agVdFUCoJpgSFgDwaELahvkCJ4AQ2BWiOCQYM5AoEQBzgUAQEBAQEBAQEQAQEBAQEGCwsJIS6CK4IHAQEBAgEBEhEdARsMCwYBAwELBgULDSoCAiEBAREBBQEcBhMbAQaHeAEDCggNpCSBMT4xi0mBbIJ5h3cKGScNVoQAAQEBAQYBAQEBAQEBFgEFDoRAgiqDd4EGglCBYFkEB4JpgUUFkmKDVIUcgnCDIYF1gVmEP4MkiySDX4IkEiOBFxEOAQGCRiOBeCI0AQEBAYcUAQEB X-IPAS-Result: A0AvAACmKy1Wm7agVdFUCoJpgSFgDwaELahvkCJ4AQ2BWiOCQYM5AoEQBzgUAQEBAQEBAQEQAQEBAQEGCwsJIS6CK4IHAQEBAgEBEhEdARsMCwYBAwELBgULDSoCAiEBAREBBQEcBhMbAQaHeAEDCggNpCSBMT4xi0mBbIJ5h3cKGScNVoQAAQEBAQYBAQEBAQEBFgEFDoRAgiqDd4EGglCBYFkEB4JpgUUFkmKDVIUcgnCDIYF1gVmEP4MkiySDX4IkEiOBFxEOAQGCRiOBeCI0AQEBAYcUAQEB X-IronPort-AV: E=Sophos;i="5.20,197,1444687200"; d="scan'208";a="184381632" Received: from mail-yk0-f182.google.com ([209.85.160.182]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/AES128-GCM-SHA256; 25 Oct 2015 20:25:19 +0100 Received: by ykdr3 with SMTP id r3so165094720ykd.1 for ; Sun, 25 Oct 2015 12:25:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=erPcpp0lr3GvI5nvqZIGNrjAXFYpIZKUs2TAdxf7pF8=; b=eooM2S1s0RpxUyObNXFD7jn89LGvZC2DhYv9qcV9AR9L5IEfe+ePMsMAcr5mbXavK+ xMH8FGqC8O4dbzYzmSgO6aOdLgt5G3xaaL6eqQyjYvIfmI00wNiplfpnk4jyXduAK5Sj 3fVLG4Yw9BZsh6FP4O3h4IbKyE0lHk5Df14BkQ1IhseZvzddMT9kTw4D8y/fmC49ucs5 lp/sAuXnOuqbPsEvhzQA6gqlBDjR74zLyNNhBRScjdq/3TwVnjnxpwyG6/PNIKVTh6C8 dTxN5gLl/Iyu/liveJAP6cK9MPsj5jweRVifj5G8Ywc9Q+ixYRLDzEMyQIjz8whZ33dK bFVQ== MIME-Version: 1.0 X-Received: by 10.129.83.195 with SMTP id h186mr22866222ywb.191.1445801118252; Sun, 25 Oct 2015 12:25:18 -0700 (PDT) Received: by 10.37.65.143 with HTTP; Sun, 25 Oct 2015 12:25:18 -0700 (PDT) In-Reply-To: References: Date: Sun, 25 Oct 2015 15:25:18 -0400 Message-ID: From: Kenneth Adam Miller To: Shuai Wang Cc: Ivan Gotovchits , caml users Content-Type: multipart/alternative; boundary=001a114d729211c6f40522f2ce89 Subject: Re: [Caml-list] [ANN] Uroboros 0.1 --001a114d729211c6f40522f2ce89 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Replied inline On Sun, Oct 25, 2015 at 3:04 PM, Shuai Wang wrote: > Hello Kenneth, > > Sorry for the late reply. I have several deadlines during this weekend. > > To answer your question, our current approach cannot ensure 100% > "reposition" correct. > The most challenging part is to identify code pointers in global data > sections, as we discussed > in our paper, it is quite difficult to handle even with some static > analysis techniques > (type inference, for instance). We do have some false positive, as shown > in the appendix of our paper [1]. > We will research more to eliminate the false positive. > > I believe it is doable to present a sound solution. It indeed requires > some additional > trampolines inserted in the binary code. You may refer to this paper for > some enlightens [2]. > > As for the disassembling challenges, we directly adopt a disassembly > approach proposed > by an excellent work [3]. You can check out their evaluation section, and > find that their approach > can correctly disassemble large-size applications without any error. My > experience is that Linux ELF > binaries are indeed easier to disassemble, and typical compilers (gcc; > icc; llvm) would not > insert data into code sections (the embedded data can trouble linear > disassembler a lot). > > I remember reading about [3] when it came out. That was a year after the original REINS system came out that proposed re-writing binaries, along with it's companion STIR. Shingled disassembly originated with Wartell et al.'s seminal Distinguishing Code and Data PhD thesis. I'm currently working on the integration of a sheering and PFSM enhanced Shingled Disassembler into BAP. But if you've already implemented something like that, what would be really valuable is if you were to review my shingled disassembler implementation and I review yours that way we have some cross review feedback. Regarding the need for 100% accuracy, in the REINS and STIR papers, the approach taken is to obtain very very high classification accuracy, but in the case that correctness cannot be established, to simply retain each interpretation of a byte sequence, so you are still correct in the instance that it's code by treating it as such. Then, a companion technique is introduced wherein the code section is retained in order that should such a data reference in code instance occur and interpretation was incorrect, such reference can read and write into the kept section. But if it's code, it has been rewritten in the new section. Then it should remain correct in any scenario. > However, if I am asked to work on PE binaries, then I will probably start > from IDA-Pro. > We consider the disassembling challenge is orthogonal to our research. > It is good to have good interoperabiblity with IDA as a guided disassembler and the actual new research tools. One of the most valuable things I can think of is to write some plugin that will mechanize data extraction as needed in order to accelerate manual intervention with the newer tools, such as in the case of training. > > IMHO, our research reveals the (important) fact that even though > theoretically relocation issue > is hard to solve with 100% accuracy, it might not be as troublesome as it > was assumed by previous work. > Simple solutions can achieve good results. > Agreed; there are failback stratagems. > > I hope it answers your questions, otherwise, please let me know :) > > Best, > Shuai > > [1] Shuai Wang, Pei Wang, Dinghao Wu, Reassembleable Disassembling. > [2] Zhui Deng, Xiangyu Zhang, Dongyan Xu, BISTRO: Binary Component > Extraction and Embedding for Software Security Applications > [3] Mingwei Zhang, Sekar, R, Control Flow Integrity for COTS Binaries. > > > There's a good utility for working with white papers and interacting with colleagues; mendeley. > > > > > > On Fri, Oct 23, 2015 at 6:31 PM, Kenneth Adam Miller < > kennethadammiller@gmail.com> wrote: > >> Well it's interesting that you've gone with a binary recompilation >> approach. How do you ensure that, statically, for any given edit, you >> reposition all the jump targets correctly? How do you deal with the >> difficulty of disassembly reducing to the halting problem? >> >> On Fri, Oct 23, 2015 at 4:59 PM, Shuai Wang >> wrote: >> >>> Hi guys, >>> >>> I am glad that you are interested in our work!! >>> >>> Actually this project starts over 1.5 years ago, and I believe at that >>> time, BAP (version 0.7 I believe?) is still a research prototype.. >>> >>> I choose to implement from the stretch is because I want to have a nice >>> tool for my own research projects, also I can have an opportunity >>> to learn OCaml... :) >>> >>> Yes, I definitely would like to unite our efforts!! >>> >>> Best, >>> Shuai >>> >>> >>> >>> >>> On Fri, Oct 23, 2015 at 1:30 PM, Ivan Gotovchits wrote: >>> >>>> Hi Shuai, >>>> >>>> Nice work! But I'm curious, why didn't you use [bap][1] as a >>>> disassembler? >>>> >>>> Do you know, that we have a low-level interface to disassembling, like >>>> [linear_sweep][2] or even >>>> lower [Disasm_expert.Basic][3] interface, that can disassemble on >>>> instruction level granularity. >>>> >>>> It will be very interesting, if we can unite our efforts. >>>> >>>> Best wishes, >>>> Ivan Gotovchits >>>> >>>> [1]: https://github.com/BinaryAnalysisPlatform/bap >>>> [2]: >>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.html#VA= Llinear_sweep >>>> [3]: >>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.Disasm_= expert.Basic.html >>>> >>>> >>>> >>>> >>>> On Fri, Oct 23, 2015 at 1:05 PM, Shuai Wang >>>> wrote: >>>> >>>>> Dear List, >>>>> >>>>> I=E2=80=99m glad to announce the first release of Uroboros: an infra= structure >>>>> for reassembleable disassembling and transformation. >>>>> >>>>> You can find the code here: https://github.com/s3team/uroboros >>>>> You can find our research paper which describes the core technique >>>>> implemented in Uroboros here: >>>>> >>>>> https://www.usenix.org/system/files/conference/usenixsecurity15/sec15= -paper-wang-shuai.pdf >>>>> >>>>> We will provide a project home page, as well as more detailed >>>>> documents in the near future. Issues and pull requests welcomed. >>>>> >>>>> Happy hacking! >>>>> >>>>> Sincerely, >>>>> Shuai >>>>> >>>> >>>> >>> >> > --001a114d729211c6f40522f2ce89 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Replied inline

On Sun, Oct 25, 2015 at 3:04 PM, Shuai Wang <wangshuai9= 01@gmail.com> wrote:
Hello Kenneth,

Sorry for the late reply. I ha= ve several deadlines during this weekend.=C2=A0

To= answer your question, our current approach cannot ensure 100% "reposi= tion" correct.=C2=A0
The most challenging part is to identif= y code pointers in global data sections, as we discussed=C2=A0
in= our paper, it is quite difficult to handle even with some static analysis = techniques=C2=A0
(type inference, for instance). We do have some = false positive, as shown in the appendix of our paper [1].=C2=A0
= We will research more to eliminate the false positive.=C2=A0

=
I believe it is doable to present a sound solution. It indeed re= quires some additional
trampolines inserted in the binary code. Y= ou may refer to this paper for some enlightens [2].=C2=A0
=C2=A0<= /div>
As for the disassembling challenges, we directly adopt a disassem= bly approach proposed=C2=A0
by an excellent work [3]. You can= check out their evaluation section, and find that their approach=C2=A0
can correctly disassemble large-size applications without any error.= My experience is that Linux ELF=C2=A0
binaries are indeed easier= to disassemble, and typical compilers (gcc; icc; llvm) would not=C2=A0
insert data into code sections (the embedded data can trouble linear= disassembler a lot).=C2=A0

I remember reading about [3] when it came out. That was a year = after the original REINS system came out that proposed re-writing binaries,= along with it's companion STIR. Shingled disassembly originated with W= artell et al.'s seminal Distinguishing Code and Data PhD thesis. I'= m currently working on the integration of a sheering and PFSM enhanced Shin= gled Disassembler into BAP. But if you've already implemented something= like that, what would be really valuable is if you were to review my shing= led disassembler implementation and I review yours that way we have some cr= oss review feedback.

Regarding the need for 100% a= ccuracy, in the REINS and STIR papers, the approach taken is to obtain very= very high classification accuracy, but in the case that correctness cannot= be established, to simply retain each interpretation of a byte sequence, s= o you are still correct in the instance that it's code by treating it a= s such. Then, a companion technique is introduced wherein the code section = is retained in order that should such a data reference in code instance occ= ur and interpretation was incorrect, such reference can read and write into= the kept section. But if it's code, it has been rewritten in the new s= ection. Then it should remain correct in any scenario.
=C2=A0
However, = if I am asked to work on PE binaries, then I will probably start from IDA-P= ro.=C2=A0
We consider the disassembling challenge is orthogonal t= o our research.=C2=A0

It is goo= d to have good interoperabiblity with IDA as a guided disassembler and the = actual new research tools. One of the most valuable things I can think of i= s to write some plugin that will mechanize data extraction as needed in ord= er to accelerate manual intervention with the newer tools, such as in the c= ase of training.
=C2=A0

IMHO, our research reveals the (important) = fact that even though theoretically relocation issue=C2=A0
is har= d to solve with 100% accuracy, it might not be as troublesome as it was ass= umed by previous work.
Simple solutions can achieve good results.= =C2=A0

Agreed; there are failba= ck stratagems.
=C2=A0

I hope it answers your questions, otherwise, = please let me know :)=C2=A0

Best,
Shuai<= /div>

[1] Shuai Wang, Pei Wang, Dinghao Wu, Reassembleab= le Disassembling.
[2]=C2=A0Zhui Deng, Xiangyu Zhang, Dongyan Xu, BISTRO: Binary Component Extraction and Embedding for Software= Security Applications
[3]=C2=A0Mi= ngwei Zhang, Sekar, R, Control Flow Integrity for COTS Binaries.



There's a goo= d utility for working with white papers and interacting with colleagues; me= ndeley.
=C2=A0




<= /div>

On Fri, Oct 23, 2015 at 6:31 PM, Kenneth = Adam Miller <kennethadammiller@gmail.com> wrote:
Well it's interesting= that you've gone with a binary recompilation approach. How do you ensu= re that, statically, for any given edit, you reposition all the jump target= s correctly? How do you deal with the difficulty of disassembly reducing to= the halting problem?

On Fri, Oct 23, 2015 at 4:59 PM, Shuai Wang <wa= ngshuai901@gmail.com> wrote:
Hi guys,

I am glad that you are inter= ested in our work!!=C2=A0

Actually this project st= arts over 1.5 years ago, and I believe at that time, BAP (version 0.7 I bel= ieve?) is still a research prototype..

I choose to= implement from the stretch=C2=A0is because I want to have a nice tool for = my own research projects, also I can have an opportunity
to learn= OCaml... :)

Yes, I definitely would like to unite= our efforts!!=C2=A0

Best,
Shuai




On Fri, Oct 23, 2015 at 1:30 PM, Ivan= Gotovchits <ivg@ieee.org> wrote:
Hi Shuai,

Nice work= ! But I'm curious, why didn't you use [bap][1] as a disassembler?= =C2=A0

Do you know, that we have a low-level inter= face to disassembling, like [linear_sweep][2] or even
lower [Disa= sm_expert.Basic][3] interface, that can disassemble on instruction level gr= anularity.

It will be very interesting, if we can = unite our efforts.

Best wishes,
Ivan Got= ovchits





On Fr= i, Oct 23, 2015 at 1:05 PM, Shuai Wang <wangshuai901@gmail.com>= ; wrote:
Dear List,

<= /div>I=E2=80=99m glad to=C2=A0announce=C2=A0the first release = of Uroboros: =C2=A0an infrastructure for reassembleable disassembl= ing and transformation.
<= br>
You can find the code = here:=C2=A0https://github.com/s3team/uroboros=C2=A0
You can find our research paper which describes the cor= e technique implemented in Uroboros here:=C2=A0

We w= ill provide a project home page, as well as more detailed documents in the = near future. =C2=A0Issues and = pull requests welcomed.

Happy hacking!

Sincerely,
Shuai





--001a114d729211c6f40522f2ce89--