Re: [Caml-list] [ANN] Uroboros 0.1

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: Kenneth Adam Miller <kennethadammiller@gmail.com>
To: Shuai Wang <wangshuai901@gmail.com>
Cc: caml users <caml-list@inria.fr>
Subject: Re: [Caml-list] [ANN] Uroboros 0.1
Date: Sun, 25 Oct 2015 19:46:39 -0400	[thread overview]
Message-ID: <CAK7rcp-E-JmLVFpvz7zBL3bwGX=Q3hC7myrVbqCvc3m9P_jiSQ@mail.gmail.com> (raw)
In-Reply-To: <CAEQMQomZ09w_X=HCpu-JFe0q_aj09okh9aaVkH=hA2pUCLaLXw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 10653 bytes --]

No, that's right, but I was making a distinguishment in the two techniques,
and I originally only mentioned stir because that and its companion papers
is where the technique of retaining the original code region made its
debut.

Thanks I hadn't heard about dyn inst.
On Oct 25, 2015 7:11 PM, "Shuai Wang" <wangshuai901@gmail.com> wrote:

> I though "STIR" refers to Binary Stirring [1].
>
> Leveraging coq to verify the instrumentation correctness sounds
> very interesting to me, although I am not aware of any existing related
> work.
> Do you any existing work?
>
> I think there do exist some static rewriter, such as DynInst.
>
> [1] Binary Stirring: http://www.utdallas.edu/~hamlen/wartell12ccs.pdf
> [2] DynInst : http://www.dyninst.org/
>
>
>
> On Sun, Oct 25, 2015 at 5:23 PM, Kenneth Adam Miller <
> kennethadammiller@gmail.com> wrote:
>
>> I'm quite sure we are thinking of different things.
>>
>> STIR is a binary randomization technique used to mitigate rop, and was
>> developed in sitsu with the binary rewriting techniques. The technique of
>> retaining the original code section is a failback to guard against errors
>> in rewriting, but to my knowledge doesn't impose a performance penalty.
>> Size required is a constant multiple, so I don't consider it an adoption
>> hurdle. But everybody has different use scenarios.
>>
>> Right. Correctness is critical. I think co program proof methodologies
>> with tools like coq will shine here in proofs to remove required trust that
>> a rewrtten binary is conformant to certain execution properties.
>>
>> I hadn't know static rewriters even existed. I presume you are you
>> talking about dynamic tools.
>> On Oct 25, 2015 4:49 PM, "Shuai Wang" <wangshuai901@gmail.com> wrote:
>>
>>> Hello Kenneth,
>>>
>>> Yes, I agree Binary Stirring system can eliminate symbolization false
>>> positive as well. Actually I believe
>>> many research work, and  tools (DynInst, for instance) have implemented
>>> this a so-called "replica-based"
>>> binary instrumentation framework. This is a quite robust way to
>>> instrument binary code, although size expansion and
>>> performance penalty cannot be ignored in the instrumentation outputs.
>>>
>>> However, I found those solutions are all quite complex, and difficult to
>>> understand. And it might not be inaccurate
>>> to assume "aggressive" instrumentation methods can break the
>>> functionality due to the limitation of design,
>>> or challenges in bug-less implementation. I even found that some
>>> the-state-of-the-art binary instrumentation tools
>>> cannot preserve the correct functionality when employing them to
>>> instrument some SPEC2006 test cases.
>>>
>>> I personally would like to find some cleaner solutions, which can
>>> introduce very little overhead in terms of binary
>>> size and execution. Besides, some research work reveals that binary
>>> security applications built on top of previous
>>> instrumentation framework do leave certain exploitable vulnerabilities
>>> due to the design limitations.
>>>
>>> Sincerely,
>>> Shuai
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Oct 25, 2015 at 3:25 PM, Kenneth Adam Miller <
>>> kennethadammiller@gmail.com> wrote:
>>>
>>>> Replied inline
>>>>
>>>> On Sun, Oct 25, 2015 at 3:04 PM, Shuai Wang <wangshuai901@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello Kenneth,
>>>>>
>>>>> Sorry for the late reply. I have several deadlines during this
>>>>> weekend.
>>>>>
>>>>> To answer your question, our current approach cannot ensure 100%
>>>>> "reposition" correct.
>>>>> The most challenging part is to identify code pointers in global data
>>>>> sections, as we discussed
>>>>> in our paper, it is quite difficult to handle even with some static
>>>>> analysis techniques
>>>>> (type inference, for instance). We do have some false positive, as
>>>>> shown in the appendix of our paper [1].
>>>>> We will research more to eliminate the false positive.
>>>>>
>>>>> I believe it is doable to present a sound solution. It indeed requires
>>>>> some additional
>>>>> trampolines inserted in the binary code. You may refer to this paper
>>>>> for some enlightens [2].
>>>>>
>>>>> As for the disassembling challenges, we directly adopt a disassembly
>>>>> approach proposed
>>>>> by an excellent work [3]. You can check out their evaluation section,
>>>>> and find that their approach
>>>>> can correctly disassemble large-size applications without any error.
>>>>> My experience is that Linux ELF
>>>>> binaries are indeed easier to disassemble, and typical compilers (gcc;
>>>>> icc; llvm) would not
>>>>> insert data into code sections (the embedded data can trouble linear
>>>>> disassembler a lot).
>>>>>
>>>>>
>>>> I remember reading about [3] when it came out. That was a year after
>>>> the original REINS system came out that proposed re-writing binaries, along
>>>> with it's companion STIR. Shingled disassembly originated with Wartell et
>>>> al.'s seminal Distinguishing Code and Data PhD thesis. I'm currently
>>>> working on the integration of a sheering and PFSM enhanced Shingled
>>>> Disassembler into BAP. But if you've already implemented something like
>>>> that, what would be really valuable is if you were to review my shingled
>>>> disassembler implementation and I review yours that way we have some cross
>>>> review feedback.
>>>>
>>>> Regarding the need for 100% accuracy, in the REINS and STIR papers, the
>>>> approach taken is to obtain very very high classification accuracy, but in
>>>> the case that correctness cannot be established, to simply retain each
>>>> interpretation of a byte sequence, so you are still correct in the instance
>>>> that it's code by treating it as such. Then, a companion technique is
>>>> introduced wherein the code section is retained in order that should such a
>>>> data reference in code instance occur and interpretation was incorrect,
>>>> such reference can read and write into the kept section. But if it's code,
>>>> it has been rewritten in the new section. Then it should remain correct in
>>>> any scenario.
>>>>
>>>>
>>>>> However, if I am asked to work on PE binaries, then I will probably
>>>>> start from IDA-Pro.
>>>>> We consider the disassembling challenge is orthogonal to our research.
>>>>>
>>>>
>>>> It is good to have good interoperabiblity with IDA as a guided
>>>> disassembler and the actual new research tools. One of the most valuable
>>>> things I can think of is to write some plugin that will mechanize data
>>>> extraction as needed in order to accelerate manual intervention with the
>>>> newer tools, such as in the case of training.
>>>>
>>>>
>>>>>
>>>>> IMHO, our research reveals the (important) fact that even though
>>>>> theoretically relocation issue
>>>>> is hard to solve with 100% accuracy, it might not be as troublesome as
>>>>> it was assumed by previous work.
>>>>> Simple solutions can achieve good results.
>>>>>
>>>>
>>>> Agreed; there are failback stratagems.
>>>>
>>>>
>>>>>
>>>>> I hope it answers your questions, otherwise, please let me know :)
>>>>>
>>>>> Best,
>>>>> Shuai
>>>>>
>>>>> [1] Shuai Wang, Pei Wang, Dinghao Wu, Reassembleable Disassembling.
>>>>> [2] Zhui Deng, Xiangyu Zhang, Dongyan Xu, BISTRO: Binary Component
>>>>> Extraction and Embedding for Software Security Applications
>>>>> [3] Mingwei Zhang, Sekar, R, Control Flow Integrity for COTS Binaries.
>>>>>
>>>>>
>>>>>
>>>> There's a good utility for working with white papers and interacting
>>>> with colleagues; mendeley.
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Oct 23, 2015 at 6:31 PM, Kenneth Adam Miller <
>>>>> kennethadammiller@gmail.com> wrote:
>>>>>
>>>>>> Well it's interesting that you've gone with a binary recompilation
>>>>>> approach. How do you ensure that, statically, for any given edit, you
>>>>>> reposition all the jump targets correctly? How do you deal with the
>>>>>> difficulty of disassembly reducing to the halting problem?
>>>>>>
>>>>>> On Fri, Oct 23, 2015 at 4:59 PM, Shuai Wang <wangshuai901@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> I am glad that you are interested in our work!!
>>>>>>>
>>>>>>> Actually this project starts over 1.5 years ago, and I believe at
>>>>>>> that time, BAP (version 0.7 I believe?) is still a research prototype..
>>>>>>>
>>>>>>> I choose to implement from the stretch is because I want to have a
>>>>>>> nice tool for my own research projects, also I can have an opportunity
>>>>>>> to learn OCaml... :)
>>>>>>>
>>>>>>> Yes, I definitely would like to unite our efforts!!
>>>>>>>
>>>>>>> Best,
>>>>>>> Shuai
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Oct 23, 2015 at 1:30 PM, Ivan Gotovchits <ivg@ieee.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Shuai,
>>>>>>>>
>>>>>>>> Nice work! But I'm curious, why didn't you use [bap][1] as a
>>>>>>>> disassembler?
>>>>>>>>
>>>>>>>> Do you know, that we have a low-level interface to disassembling,
>>>>>>>> like [linear_sweep][2] or even
>>>>>>>> lower [Disasm_expert.Basic][3] interface, that can disassemble on
>>>>>>>> instruction level granularity.
>>>>>>>>
>>>>>>>> It will be very interesting, if we can unite our efforts.
>>>>>>>>
>>>>>>>> Best wishes,
>>>>>>>> Ivan Gotovchits
>>>>>>>>
>>>>>>>> [1]: https://github.com/BinaryAnalysisPlatform/bap
>>>>>>>> [2]:
>>>>>>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.html#VALlinear_sweep
>>>>>>>> [3]:
>>>>>>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.Disasm_expert.Basic.html
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Oct 23, 2015 at 1:05 PM, Shuai Wang <wangshuai901@gmail.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> Dear List,
>>>>>>>>>
>>>>>>>>> I’m glad to announce the first release of Uroboros:  an infrastructure
>>>>>>>>> for reassembleable disassembling and transformation.
>>>>>>>>>
>>>>>>>>> You can find the code here: https://github.com/s3team/uroboros
>>>>>>>>> You can find our research paper which describes the core technique
>>>>>>>>> implemented in Uroboros here:
>>>>>>>>>
>>>>>>>>> https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-wang-shuai.pdf
>>>>>>>>>
>>>>>>>>> We will provide a project home page, as well as more detailed
>>>>>>>>> documents in the near future.  Issues and pull requests welcomed.
>>>>>>>>>
>>>>>>>>> Happy hacking!
>>>>>>>>>
>>>>>>>>> Sincerely,
>>>>>>>>> Shuai
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>

[-- Attachment #2: Type: text/html, Size: 16557 bytes --]

     prev parent reply	other threads:[~2015-10-25 23:46 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-23 17:05 Shuai Wang
2015-10-23 17:30 ` Ivan Gotovchits
2015-10-23 17:45   ` Kenneth Adam Miller
2015-10-26 17:04     ` Eric Cooper
2015-10-26 17:05       ` Kenneth Adam Miller
2015-10-23 20:59   ` Shuai Wang
2015-10-23 22:31     ` Kenneth Adam Miller
2015-10-25 19:04       ` Shuai Wang
2015-10-25 19:25         ` Kenneth Adam Miller
2015-10-25 20:49           ` Shuai Wang
2015-10-25 21:23             ` Kenneth Adam Miller
2015-10-25 23:11               ` Shuai Wang
2015-10-25 23:46                 ` Kenneth Adam Miller [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAK7rcp-E-JmLVFpvz7zBL3bwGX=Q3hC7myrVbqCvc3m9P_jiSQ@mail.gmail.com' \
    --to=kennethadammiller@gmail.com \
    --cc=caml-list@inria.fr \
    --cc=wangshuai901@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).