I though "STIR" refers to Binary Stirring [1]. Leveraging coq to verify the instrumentation correctness sounds very interesting to me, although I am not aware of any existing related work. Do you any existing work? I think there do exist some static rewriter, such as DynInst. [1] Binary Stirring: http://www.utdallas.edu/~hamlen/wartell12ccs.pdf [2] DynInst : http://www.dyninst.org/ On Sun, Oct 25, 2015 at 5:23 PM, Kenneth Adam Miller < kennethadammiller@gmail.com> wrote: > I'm quite sure we are thinking of different things. > > STIR is a binary randomization technique used to mitigate rop, and was > developed in sitsu with the binary rewriting techniques. The technique of > retaining the original code section is a failback to guard against errors > in rewriting, but to my knowledge doesn't impose a performance penalty. > Size required is a constant multiple, so I don't consider it an adoption > hurdle. But everybody has different use scenarios. > > Right. Correctness is critical. I think co program proof methodologies > with tools like coq will shine here in proofs to remove required trust that > a rewrtten binary is conformant to certain execution properties. > > I hadn't know static rewriters even existed. I presume you are you talking > about dynamic tools. > On Oct 25, 2015 4:49 PM, "Shuai Wang" wrote: > >> Hello Kenneth, >> >> Yes, I agree Binary Stirring system can eliminate symbolization false >> positive as well. Actually I believe >> many research work, and tools (DynInst, for instance) have implemented >> this a so-called "replica-based" >> binary instrumentation framework. This is a quite robust way to >> instrument binary code, although size expansion and >> performance penalty cannot be ignored in the instrumentation outputs. >> >> However, I found those solutions are all quite complex, and difficult to >> understand. And it might not be inaccurate >> to assume "aggressive" instrumentation methods can break the >> functionality due to the limitation of design, >> or challenges in bug-less implementation. I even found that some >> the-state-of-the-art binary instrumentation tools >> cannot preserve the correct functionality when employing them to >> instrument some SPEC2006 test cases. >> >> I personally would like to find some cleaner solutions, which can >> introduce very little overhead in terms of binary >> size and execution. Besides, some research work reveals that binary >> security applications built on top of previous >> instrumentation framework do leave certain exploitable vulnerabilities >> due to the design limitations. >> >> Sincerely, >> Shuai >> >> >> >> >> >> >> On Sun, Oct 25, 2015 at 3:25 PM, Kenneth Adam Miller < >> kennethadammiller@gmail.com> wrote: >> >>> Replied inline >>> >>> On Sun, Oct 25, 2015 at 3:04 PM, Shuai Wang >>> wrote: >>> >>>> Hello Kenneth, >>>> >>>> Sorry for the late reply. I have several deadlines during this weekend. >>>> >>>> To answer your question, our current approach cannot ensure 100% >>>> "reposition" correct. >>>> The most challenging part is to identify code pointers in global data >>>> sections, as we discussed >>>> in our paper, it is quite difficult to handle even with some static >>>> analysis techniques >>>> (type inference, for instance). We do have some false positive, as >>>> shown in the appendix of our paper [1]. >>>> We will research more to eliminate the false positive. >>>> >>>> I believe it is doable to present a sound solution. It indeed requires >>>> some additional >>>> trampolines inserted in the binary code. 
>>>> As for the disassembly challenges, we directly adopt the disassembly
>>>> approach proposed by an excellent work [3]. You can check out their
>>>> evaluation section and find that their approach can correctly
>>>> disassemble large applications without any error. My experience is
>>>> that Linux ELF binaries are indeed easier to disassemble, and typical
>>>> compilers (gcc, icc, llvm) do not insert data into code sections
>>>> (embedded data can trouble a linear disassembler a lot).
>>>>
>>> I remember reading about [3] when it came out. That was a year after
>>> the original REINS system, which proposed rewriting binaries, came out
>>> along with its companion STIR. Shingled disassembly originated with
>>> Wartell et al.'s seminal Distinguishing Code and Data PhD thesis. I'm
>>> currently working on integrating a shearing- and PFSM-enhanced shingled
>>> disassembler into BAP. But if you've already implemented something like
>>> that, what would be really valuable is if you were to review my
>>> shingled disassembler implementation and I review yours, so that we
>>> have some cross-review feedback.
>>>
>>> Regarding the need for 100% accuracy: in the REINS and STIR papers, the
>>> approach taken is to obtain very high classification accuracy but, in
>>> cases where correctness cannot be established, to simply retain each
>>> interpretation of a byte sequence, so you are still correct in the case
>>> that it is code by treating it as such. Then a companion technique is
>>> introduced wherein the original code section is retained, so that if a
>>> reference to data embedded in the code section occurs and the
>>> interpretation was incorrect, that reference can still read and write
>>> into the kept section. But if it is code, it has been rewritten in the
>>> new section. It should then remain correct in either scenario.
>>>
>>>> However, if I were asked to work on PE binaries, then I would probably
>>>> start from IDA Pro. We consider the disassembly challenge orthogonal
>>>> to our research.
>>>
>>> It is good to have interoperability between IDA, as a guided
>>> disassembler, and the actual new research tools. One of the most
>>> valuable things I can think of is to write a plugin that mechanizes
>>> data extraction as needed, in order to accelerate manual intervention
>>> with the newer tools, such as in the case of training.
>>>
>>>> IMHO, our research reveals the (important) fact that even though the
>>>> relocation issue is theoretically hard to solve with 100% accuracy, it
>>>> might not be as troublesome as previous work assumed. Simple solutions
>>>> can achieve good results.
>>>
>>> Agreed; there are fallback stratagems.
>>>
>>>> I hope this answers your questions; otherwise, please let me know :)
>>>>
>>>> Best,
>>>> Shuai
>>>>
>>>> [1] Shuai Wang, Pei Wang, Dinghao Wu. Reassembleable Disassembling.
>>>> [2] Zhui Deng, Xiangyu Zhang, Dongyan Xu. BISTRO: Binary Component
>>>> Extraction and Embedding for Software Security Applications.
>>>> [3] Mingwei Zhang, R. Sekar. Control Flow Integrity for COTS Binaries.
>>>
>>> There's a good utility for working with white papers and interacting
>>> with colleagues: Mendeley.
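(The retained-section fallback Kenneth describes above can be made concrete
with a minimal sketch. It assumes a simple map from original to rewritten
instruction addresses; the names and the decision rule are illustrative, not
the actual REINS/STIR implementation.)

    (* Sketch of the "keep the old code section" fallback: immediates that
       are known code addresses are redirected into the rewritten section;
       anything uncertain is left untouched and still resolves, because the
       original code section stays mapped (as non-executable data). *)

    module AddrMap = Map.Make (Int64)

    (* Old instruction address -> its address in the rewritten section,
       built while the code is relocated. *)
    type layout = int64 AddrMap.t

    let rewrite_immediate (l : layout) (imm : int64) : int64 =
      match AddrMap.find_opt imm l with
      | Some new_addr -> new_addr  (* known code pointer: retarget it      *)
      | None          -> imm       (* possibly data in the old .text: keep
                                      it; the retained section serves it   *)

The point of the scheme is that a misclassification costs only the
constant-factor size overhead of carrying both sections, not correctness.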
>>>> On Fri, Oct 23, 2015 at 6:31 PM, Kenneth Adam Miller <
>>>> kennethadammiller@gmail.com> wrote:
>>>>
>>>>> Well, it's interesting that you've gone with a binary recompilation
>>>>> approach. How do you ensure that, statically, for any given edit, you
>>>>> reposition all the jump targets correctly? How do you deal with the
>>>>> difficulty of disassembly reducing to the halting problem?
>>>>>
>>>>> On Fri, Oct 23, 2015 at 4:59 PM, Shuai Wang wrote:
>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> I am glad that you are interested in our work!
>>>>>>
>>>>>> Actually, this project started over 1.5 years ago, and I believe at
>>>>>> that time BAP (version 0.7, I believe?) was still a research
>>>>>> prototype.
>>>>>>
>>>>>> I chose to implement it from scratch because I wanted to have a nice
>>>>>> tool for my own research projects, and also to have an opportunity
>>>>>> to learn OCaml... :)
>>>>>>
>>>>>> Yes, I definitely would like to unite our efforts!
>>>>>>
>>>>>> Best,
>>>>>> Shuai
>>>>>>
>>>>>> On Fri, Oct 23, 2015 at 1:30 PM, Ivan Gotovchits wrote:
>>>>>>
>>>>>>> Hi Shuai,
>>>>>>>
>>>>>>> Nice work! But I'm curious, why didn't you use [bap][1] as a
>>>>>>> disassembler?
>>>>>>>
>>>>>>> Did you know that we have a low-level interface to disassembling,
>>>>>>> like [linear_sweep][2], or the even lower [Disasm_expert.Basic][3]
>>>>>>> interface, which can disassemble at instruction-level granularity?
>>>>>>>
>>>>>>> It would be very interesting if we could unite our efforts.
>>>>>>>
>>>>>>> Best wishes,
>>>>>>> Ivan Gotovchits
>>>>>>>
>>>>>>> [1]: https://github.com/BinaryAnalysisPlatform/bap
>>>>>>> [2]:
>>>>>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.html#VALlinear_sweep
>>>>>>> [3]:
>>>>>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.Disasm_expert.Basic.html
>>>>>>>
>>>>>>> On Fri, Oct 23, 2015 at 1:05 PM, Shuai Wang wrote:
>>>>>>>
>>>>>>>> Dear List,
>>>>>>>>
>>>>>>>> I'm glad to announce the first release of Uroboros: an
>>>>>>>> infrastructure for reassembleable disassembling and
>>>>>>>> transformation.
>>>>>>>>
>>>>>>>> You can find the code here: https://github.com/s3team/uroboros
>>>>>>>> You can find our research paper, which describes the core
>>>>>>>> technique implemented in Uroboros, here:
>>>>>>>> https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-wang-shuai.pdf
>>>>>>>>
>>>>>>>> We will provide a project home page, as well as more detailed
>>>>>>>> documentation, in the near future. Issues and pull requests are
>>>>>>>> welcome.
>>>>>>>>
>>>>>>>> Happy hacking!
>>>>>>>>
>>>>>>>> Sincerely,
>>>>>>>> Shuai
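(As a closing footnote to Ivan's pointer to instruction-granularity
disassembly and to the shingled-disassembly discussion earlier in the thread,
the two styles can be contrasted in a short sketch. The `decode` function is
a stand-in assumption, not a real BAP call; in BAP that role would be played
by the interfaces Ivan links, such as Disasm_expert.Basic.)

    (* [decode code off] is assumed to return the length of the instruction
       starting at [off], or None if the bytes do not decode. *)
    type decoder = Bytes.t -> int -> int option

    (* Linear sweep: decode back-to-back from offset 0, skipping a byte on
       failure.  Embedded data is silently swallowed as bogus instructions,
       which is the hazard Shuai mentions for linear disassembly. *)
    let linear_sweep (decode : decoder) (code : Bytes.t) : int list =
      let n = Bytes.length code in
      let rec go off acc =
        if off >= n then List.rev acc
        else match decode code off with
          | Some len -> go (off + len) (off :: acc)
          | None     -> go (off + 1) acc
      in
      go 0 []

    (* Shingled view: treat every offset as a potential instruction start
       and keep all that decode; a later analysis prunes the inconsistent
       interpretations instead of committing to one up front. *)
    let shingles (decode : decoder) (code : Bytes.t) : int list =
      List.filter (fun off -> decode code off <> None)
        (List.init (Bytes.length code) (fun i -> i))

The first style matches Shuai's observation that compiler-generated ELF code
without embedded data is easy to handle; the second keeps every
interpretation alive, which is the property the REINS/STIR fallback discussed
above relies on.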