No, that's right, but I was drawing a distinction between the two techniques, and I originally only mentioned STIR because that and its companion papers are where the technique of retaining the original code region made its debut. Thanks, I hadn't heard about DynInst. On Oct 25, 2015 7:11 PM, "Shuai Wang" wrote: > I thought "STIR" refers to Binary Stirring [1]. > > Leveraging Coq to verify instrumentation correctness sounds > very interesting to me, although I am not aware of any existing related > work. > Do you know of any existing work? > > I think there do exist some static rewriters, such as DynInst. > > [1] Binary Stirring: http://www.utdallas.edu/~hamlen/wartell12ccs.pdf > [2] DynInst: http://www.dyninst.org/ > > > > On Sun, Oct 25, 2015 at 5:23 PM, Kenneth Adam Miller < > kennethadammiller@gmail.com> wrote: > >> I'm quite sure we are thinking of different things. >> >> STIR is a binary randomization technique used to mitigate ROP, and was >> developed in situ with the binary rewriting techniques. The technique of >> retaining the original code section is a fallback to guard against errors >> in rewriting, but to my knowledge it doesn't impose a performance penalty. >> The size required is a constant multiple, so I don't consider it an adoption >> hurdle. But everybody has different use scenarios. >> >> Right. Correctness is critical. I think program proof methodologies >> with tools like Coq will shine here, in proofs that remove the trust required that >> a rewritten binary conforms to certain execution properties. >> >> I hadn't known static rewriters even existed. I presume you are >> talking about dynamic tools. >> On Oct 25, 2015 4:49 PM, "Shuai Wang" wrote: >> >>> Hello Kenneth, >>> >>> Yes, I agree the Binary Stirring system can eliminate symbolization false >>> positives as well. Actually, I believe >>> many research works and tools (DynInst, for instance) have implemented >>> this so-called "replica-based" >>> binary instrumentation framework. 
This is a quite robust way to >>> instrument binary code, although the size expansion and >>> performance penalty of the instrumented outputs cannot be ignored. >>> >>> However, I found those solutions are all quite complex and difficult to >>> understand. And it is not unreasonable >>> to assume that "aggressive" instrumentation methods can break >>> functionality, due to limitations of the design >>> or challenges in bug-free implementation. I even found that some >>> state-of-the-art binary instrumentation tools >>> cannot preserve correct functionality when employed to >>> instrument some SPEC2006 test cases. >>> >>> I personally would like to find some cleaner solutions, which >>> introduce very little overhead in terms of binary >>> size and execution. Besides, some research work reveals that binary >>> security applications built on top of previous >>> instrumentation frameworks do leave certain exploitable vulnerabilities >>> due to design limitations. >>> >>> Sincerely, >>> Shuai >>> >>> >>> >>> >>> >>> On Sun, Oct 25, 2015 at 3:25 PM, Kenneth Adam Miller < >>> kennethadammiller@gmail.com> wrote: >>> >>>> Replied inline >>>> >>>> On Sun, Oct 25, 2015 at 3:04 PM, Shuai Wang >>>> wrote: >>>> >>>>> Hello Kenneth, >>>>> >>>>> Sorry for the late reply. I have several deadlines during this >>>>> weekend. >>>>> >>>>> To answer your question, our current approach cannot ensure 100% >>>>> correct "repositioning". >>>>> The most challenging part is to identify code pointers in global data >>>>> sections; as we discussed >>>>> in our paper, they are quite difficult to handle even with some static >>>>> analysis techniques >>>>> (type inference, for instance). We do have some false positives, as >>>>> shown in the appendix of our paper [1]. >>>>> We will research more to eliminate the false positives. >>>>> >>>>> I believe it is doable to present a sound solution. 
It indeed requires >>>>> some additional >>>>> trampolines inserted into the binary code. You may refer to this paper >>>>> for some insight [2]. >>>>> >>>>> As for the disassembling challenges, we directly adopt a disassembly >>>>> approach proposed >>>>> by an excellent work [3]. You can check out their evaluation section >>>>> and find that their approach >>>>> can correctly disassemble large applications without any error. >>>>> My experience is that Linux ELF >>>>> binaries are indeed easier to disassemble, and typical compilers (gcc, >>>>> icc, llvm) do not >>>>> insert data into code sections (embedded data can trouble a linear >>>>> disassembler a lot). >>>>> >>>>> >>>> I remember reading about [3] when it came out. That was a year after >>>> the original REINS system came out, which proposed rewriting binaries, along >>>> with its companion STIR. Shingled disassembly originated with Wartell et >>>> al.'s seminal Distinguishing Code and Data PhD thesis. I'm currently >>>> working on the integration of a shearing- and PFSM-enhanced shingled >>>> disassembler into BAP. But if you've already implemented something like >>>> that, what would be really valuable is if you were to review my shingled >>>> disassembler implementation and I review yours, so that we have some cross- >>>> review feedback. >>>> >>>> Regarding the need for 100% accuracy: in the REINS and STIR papers, the >>>> approach taken is to obtain very high classification accuracy, but, in >>>> the case that correctness cannot be established, to simply retain each >>>> interpretation of a byte sequence, so that if it is code, you are still correct >>>> by treating it as such. Then a companion technique is >>>> introduced wherein the original code section is retained, so that should such a >>>> data-reference-in-code instance occur where the interpretation was incorrect, >>>> the reference can still read and write the kept section. 
But if it's code, >>>> it has been rewritten in the new section, so it should remain correct in >>>> either scenario. >>>> >>>> >>>>> However, if I were asked to work on PE binaries, then I would probably >>>>> start from IDA Pro. >>>>> We consider the disassembling challenge orthogonal to our research. >>>>> >>>> >>>> It is good to have interoperability between IDA, as a guided >>>> disassembler, and the actual new research tools. One of the most valuable >>>> things I can think of is to write a plugin that mechanizes data >>>> extraction as needed, in order to accelerate manual intervention with the >>>> newer tools, such as in the case of training. >>>> >>>> >>>>> >>>>> IMHO, our research reveals the (important) fact that even though >>>>> theoretically the relocation issue >>>>> is hard to solve with 100% accuracy, it might not be as troublesome as >>>>> previous work assumed. >>>>> Simple solutions can achieve good results. >>>>> >>>> >>>> Agreed; there are fallback strategies. >>>> >>>> >>>>> >>>>> I hope this answers your questions; otherwise, please let me know :) >>>>> >>>>> Best, >>>>> Shuai >>>>> >>>>> [1] Shuai Wang, Pei Wang, Dinghao Wu, Reassembleable Disassembling. >>>>> [2] Zhui Deng, Xiangyu Zhang, Dongyan Xu, BISTRO: Binary Component >>>>> Extraction and Embedding for Software Security Applications. >>>>> [3] Mingwei Zhang, R. Sekar, Control Flow Integrity for COTS Binaries. >>>>> >>>>> >>>>> >>>> There's a good utility for working with white papers and interacting >>>> with colleagues: Mendeley. >>>> >>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Fri, Oct 23, 2015 at 6:31 PM, Kenneth Adam Miller < >>>>> kennethadammiller@gmail.com> wrote: >>>>> >>>>>> Well, it's interesting that you've gone with a binary recompilation >>>>>> approach. How do you ensure that, statically, for any given edit, you >>>>>> reposition all the jump targets correctly? How do you deal with the >>>>>> difficulty of disassembly reducing to the halting problem? 
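As an aside, the "replica-based" idea discussed above (keep the original code section in place for data accesses, redirect execution into the rewritten copy) can be sketched with a toy model. This is purely illustrative, not how REINS/STIR or any real rewriter is implemented; the addresses and mapping table are made up:

```python
# Toy model of "replica-based" binary rewriting (illustrative only).
# The original code section is retained at its old addresses so that
# data references into it still resolve to the original bytes, while
# control flow is redirected into the rewritten copy via a lookup table.

# Pretend "binary": the old code section as a map of address -> byte.
old_section = {0x1000: 0x55, 0x1001: 0x89, 0x1002: 0xE5}

# Rewriter output: old instruction address -> address in the new section.
# A real rewriter builds this while copying instructions into the new
# section and patching jump/call targets through it.
new_addr_of = {0x1000: 0x5000, 0x1001: 0x5003, 0x1002: 0x5007}

def translate_branch_target(old_target):
    """Redirect a control-flow transfer into the rewritten section."""
    return new_addr_of[old_target]

def read_data(addr):
    """A data read into the old code region still sees original bytes."""
    return old_section[addr]

# A branch to old address 0x1001 lands in the rewritten section...
assert translate_branch_target(0x1001) == 0x5003
# ...while a data read of the same address sees the retained byte,
# so the binary stays correct under either interpretation.
assert read_data(0x1001) == 0x89
```

The point of the sketch is the asymmetry: branches go through the translation table, loads and stores do not, which is why an ambiguous byte sequence can safely be kept under both interpretations.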
>>>>>> >>>>>> On Fri, Oct 23, 2015 at 4:59 PM, Shuai Wang >>>>>> wrote: >>>>>> >>>>>>> Hi guys, >>>>>>> >>>>>>> I am glad that you are interested in our work!! >>>>>>> >>>>>>> Actually, this project started over 1.5 years ago, and I believe at >>>>>>> that time BAP (version 0.7, I believe?) was still a research prototype. >>>>>>> >>>>>>> I chose to implement from scratch because I wanted to have a >>>>>>> nice tool for my own research projects, and also to have an opportunity >>>>>>> to learn OCaml... :) >>>>>>> >>>>>>> Yes, I definitely would like to unite our efforts!! >>>>>>> >>>>>>> Best, >>>>>>> Shuai >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Oct 23, 2015 at 1:30 PM, Ivan Gotovchits >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Shuai, >>>>>>>> >>>>>>>> Nice work! But I'm curious: why didn't you use [bap][1] as a >>>>>>>> disassembler? >>>>>>>> >>>>>>>> Did you know that we have a low-level interface to disassembling, >>>>>>>> like [linear_sweep][2], or the even >>>>>>>> lower [Disasm_expert.Basic][3] interface, which can disassemble at >>>>>>>> instruction-level granularity? >>>>>>>> >>>>>>>> It would be very interesting if we could unite our efforts. >>>>>>>> >>>>>>>> Best wishes, >>>>>>>> Ivan Gotovchits >>>>>>>> >>>>>>>> [1]: https://github.com/BinaryAnalysisPlatform/bap >>>>>>>> [2]: >>>>>>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.html#VALlinear_sweep >>>>>>>> [3]: >>>>>>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.Disasm_expert.Basic.html >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Oct 23, 2015 at 1:05 PM, Shuai Wang >>>>>>> > wrote: >>>>>>>> >>>>>>>>> Dear List, >>>>>>>>> >>>>>>>>> I'm glad to announce the first release of Uroboros: an infrastructure >>>>>>>>> for reassembleable disassembling and transformation. 
>>>>>>>>> >>>>>>>>> You can find the code here: https://github.com/s3team/uroboros >>>>>>>>> You can find our research paper, which describes the core technique >>>>>>>>> implemented in Uroboros, here: >>>>>>>>> >>>>>>>>> https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-wang-shuai.pdf >>>>>>>>> >>>>>>>>> We will provide a project home page, as well as more detailed >>>>>>>>> documentation, in the near future. Issues and pull requests are welcome. >>>>>>>>> >>>>>>>>> Happy hacking! >>>>>>>>> >>>>>>>>> Sincerely, >>>>>>>>> Shuai >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >
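[Editor's note appended to the thread] The "shingled" disassembly mentioned upthread, where decoding is attempted at every byte offset and all overlapping interpretations are retained rather than committing to one, can be sketched with a toy instruction set. The opcodes below are entirely made up (nothing here models x86 or any of the tools discussed):

```python
# Toy sketch of shingled disassembly over a hypothetical ISA.
# Real shingled disassembly (Wartell et al.) over x86 is far more
# involved; this only shows the core idea of retaining every
# overlapping ("shingled") decoding instead of choosing a single one.

# Hypothetical ISA: opcode byte -> (mnemonic, total length in bytes).
ISA = {0x01: ("inc", 1), 0x02: ("dec", 1), 0x10: ("mov", 2), 0x20: ("jmp", 2)}

def shingled_disassemble(code):
    """Return {offset: (mnemonic, length)} for every decodable offset."""
    shingles = {}
    for off in range(len(code)):
        entry = ISA.get(code[off])
        # Keep the decoding only if the full instruction fits.
        if entry and off + entry[1] <= len(code):
            shingles[off] = entry
    return shingles

code = bytes([0x10, 0x01, 0x20, 0x02])
shingles = shingled_disassemble(code)
# Offset 0 decodes as a 2-byte mov whose operand byte is 0x01; offset 1
# decodes that same byte as inc. Both overlapping views are retained,
# deferring the code-vs-data decision to a later classification stage.
```

A linear-sweep or recursive-descent disassembler would commit to one of these views; the shingled approach defers the decision, which is what makes the retain-both-interpretations fallback discussed earlier possible.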