I though "STIR" refers to Binary Stirring [1]. Leveraging coq to verify the instrumentation correctness sounds very interesting to me, although I am not aware of any existing related work. Do you any existing work? I think there do exist some static rewriter, such as DynInst. [1] Binary Stirring: http://www.utdallas.edu/~hamlen/wartell12ccs.pdf [2] DynInst : http://www.dyninst.org/ On Sun, Oct 25, 2015 at 5:23 PM, Kenneth Adam Miller < kennethadammiller@gmail.com> wrote: > I'm quite sure we are thinking of different things. > > STIR is a binary randomization technique used to mitigate rop, and was > developed in sitsu with the binary rewriting techniques. The technique of > retaining the original code section is a failback to guard against errors > in rewriting, but to my knowledge doesn't impose a performance penalty. > Size required is a constant multiple, so I don't consider it an adoption > hurdle. But everybody has different use scenarios. > > Right. Correctness is critical. I think co program proof methodologies > with tools like coq will shine here in proofs to remove required trust that > a rewrtten binary is conformant to certain execution properties. > > I hadn't know static rewriters even existed. I presume you are you talking > about dynamic tools. > On Oct 25, 2015 4:49 PM, "Shuai Wang" wrote: > >> Hello Kenneth, >> >> Yes, I agree Binary Stirring system can eliminate symbolization false >> positive as well. Actually I believe >> many research work, and tools (DynInst, for instance) have implemented >> this a so-called "replica-based" >> binary instrumentation framework. This is a quite robust way to >> instrument binary code, although size expansion and >> performance penalty cannot be ignored in the instrumentation outputs. >> >> However, I found those solutions are all quite complex, and difficult to >> understand. And it might not be inaccurate >> to assume "aggressive" instrumentation methods can break the >> functionality due to the limitation of design, >> or challenges in bug-less implementation. I even found that some >> the-state-of-the-art binary instrumentation tools >> cannot preserve the correct functionality when employing them to >> instrument some SPEC2006 test cases. >> >> I personally would like to find some cleaner solutions, which can >> introduce very little overhead in terms of binary >> size and execution. Besides, some research work reveals that binary >> security applications built on top of previous >> instrumentation framework do leave certain exploitable vulnerabilities >> due to the design limitations. >> >> Sincerely, >> Shuai >> >> >> >> >> >> >> On Sun, Oct 25, 2015 at 3:25 PM, Kenneth Adam Miller < >> kennethadammiller@gmail.com> wrote: >> >>> Replied inline >>> >>> On Sun, Oct 25, 2015 at 3:04 PM, Shuai Wang >>> wrote: >>> >>>> Hello Kenneth, >>>> >>>> Sorry for the late reply. I have several deadlines during this weekend. >>>> >>>> To answer your question, our current approach cannot ensure 100% >>>> "reposition" correct. >>>> The most challenging part is to identify code pointers in global data >>>> sections, as we discussed >>>> in our paper, it is quite difficult to handle even with some static >>>> analysis techniques >>>> (type inference, for instance). We do have some false positive, as >>>> shown in the appendix of our paper [1]. >>>> We will research more to eliminate the false positive. >>>> >>>> I believe it is doable to present a sound solution. It indeed requires >>>> some additional >>>> trampolines inserted in the binary code. 
>>>> As for the disassembly challenges, we directly adopt the disassembly
>>>> approach proposed by an excellent work [3]. You can check out their
>>>> evaluation section and find that their approach can correctly
>>>> disassemble large applications without any error. My experience is
>>>> that Linux ELF binaries are indeed easier to disassemble, and typical
>>>> compilers (gcc, icc, llvm) do not insert data into code sections
>>>> (embedded data can trouble a linear disassembler a lot).
>>>>
>>> I remember reading about [3] when it came out. That was a year after
>>> the original REINS system, which proposed rewriting binaries, came out
>>> along with its companion STIR. Shingled disassembly originated with
>>> Wartell et al.'s seminal Distinguishing Code and Data PhD thesis. I'm
>>> currently working on integrating a shearing- and PFSM-enhanced shingled
>>> disassembler into BAP. But if you've already implemented something like
>>> that, what would be really valuable is if you were to review my
>>> shingled disassembler implementation and I review yours, so that we
>>> have some cross-review feedback.
>>>
>>> Regarding the need for 100% accuracy: in the REINS and STIR papers, the
>>> approach taken is to obtain very high classification accuracy but, in
>>> cases where correctness cannot be established, to simply retain each
>>> interpretation of a byte sequence, so you are still correct in the case
>>> that it is code by treating it as such. Then a companion technique is
>>> introduced wherein the original code section is retained, so that if a
>>> reference to data embedded in the code section occurs and the
>>> interpretation was incorrect, that reference can still read and write
>>> into the kept section. But if it is code, it has been rewritten in the
>>> new section. It should then remain correct in either scenario.
>>>
>>>> However, if I were asked to work on PE binaries, then I would probably
>>>> start from IDA Pro. We consider the disassembly challenge orthogonal
>>>> to our research.
>>>
>>> It is good to have interoperability between IDA, as a guided
>>> disassembler, and the actual new research tools. One of the most
>>> valuable things I can think of is to write a plugin that mechanizes
>>> data extraction as needed, in order to accelerate manual intervention
>>> with the newer tools, such as in the case of training.
>>>
>>>> IMHO, our research reveals the (important) fact that even though the
>>>> relocation issue is theoretically hard to solve with 100% accuracy, it
>>>> might not be as troublesome as previous work assumed. Simple solutions
>>>> can achieve good results.
>>>
>>> Agreed; there are fallback stratagems.
>>>
>>>> I hope this answers your questions; otherwise, please let me know :)
>>>>
>>>> Best,
>>>> Shuai
>>>>
>>>> [1] Shuai Wang, Pei Wang, Dinghao Wu. Reassembleable Disassembling.
>>>> [2] Zhui Deng, Xiangyu Zhang, Dongyan Xu. BISTRO: Binary Component
>>>> Extraction and Embedding for Software Security Applications.
>>>> [3] Mingwei Zhang, R. Sekar. Control Flow Integrity for COTS Binaries.
>>>
>>> There's a good utility for working with white papers and interacting
>>> with colleagues: Mendeley.
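(The retained-section fallback Kenneth describes above can be made concrete
with a minimal sketch. It assumes a simple map from original to rewritten
instruction addresses; the names and the decision rule are illustrative, not
the actual REINS/STIR implementation.)

    (* Sketch of the "keep the old code section" fallback: immediates that
       are known code addresses are redirected into the rewritten section;
       anything uncertain is left untouched and still resolves, because the
       original code section stays mapped (as non-executable data). *)

    module AddrMap = Map.Make (Int64)

    (* Old instruction address -> its address in the rewritten section,
       built while the code is relocated. *)
    type layout = int64 AddrMap.t

    let rewrite_immediate (l : layout) (imm : int64) : int64 =
      match AddrMap.find_opt imm l with
      | Some new_addr -> new_addr  (* known code pointer: retarget it      *)
      | None          -> imm       (* possibly data in the old .text: keep
                                      it; the retained section serves it   *)

The point of the scheme is that a misclassification costs only the
constant-factor size overhead of carrying both sections, not correctness.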
>>>> On Fri, Oct 23, 2015 at 6:31 PM, Kenneth Adam Miller <
>>>> kennethadammiller@gmail.com> wrote:
>>>>
>>>>> Well, it's interesting that you've gone with a binary recompilation
>>>>> approach. How do you ensure that, statically, for any given edit, you
>>>>> reposition all the jump targets correctly? How do you deal with the
>>>>> difficulty of disassembly reducing to the halting problem?
>>>>>
>>>>> On Fri, Oct 23, 2015 at 4:59 PM, Shuai Wang wrote:
>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> I am glad that you are interested in our work!
>>>>>>
>>>>>> Actually, this project started over 1.5 years ago, and I believe at
>>>>>> that time BAP (version 0.7, I believe?) was still a research
>>>>>> prototype.
>>>>>>
>>>>>> I chose to implement it from scratch because I wanted to have a nice
>>>>>> tool for my own research projects, and also to have an opportunity
>>>>>> to learn OCaml... :)
>>>>>>
>>>>>> Yes, I definitely would like to unite our efforts!
>>>>>>
>>>>>> Best,
>>>>>> Shuai
>>>>>>
>>>>>> On Fri, Oct 23, 2015 at 1:30 PM, Ivan Gotovchits wrote:
>>>>>>
>>>>>>> Hi Shuai,
>>>>>>>
>>>>>>> Nice work! But I'm curious, why didn't you use [bap][1] as a
>>>>>>> disassembler?
>>>>>>>
>>>>>>> Did you know that we have a low-level interface to disassembling,
>>>>>>> like [linear_sweep][2], or the even lower [Disasm_expert.Basic][3]
>>>>>>> interface, which can disassemble at instruction-level granularity?
>>>>>>>
>>>>>>> It would be very interesting if we could unite our efforts.
>>>>>>>
>>>>>>> Best wishes,
>>>>>>> Ivan Gotovchits
>>>>>>>
>>>>>>> [1]: https://github.com/BinaryAnalysisPlatform/bap
>>>>>>> [2]:
>>>>>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.html#VALlinear_sweep
>>>>>>> [3]:
>>>>>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.Disasm_expert.Basic.html
>>>>>>>
>>>>>>> On Fri, Oct 23, 2015 at 1:05 PM, Shuai Wang wrote:
>>>>>>>
>>>>>>>> Dear List,
>>>>>>>>
>>>>>>>> I'm glad to announce the first release of Uroboros: an
>>>>>>>> infrastructure for reassembleable disassembling and
>>>>>>>> transformation.
>>>>>>>>
>>>>>>>> You can find the code here: https://github.com/s3team/uroboros
>>>>>>>> You can find our research paper, which describes the core
>>>>>>>> technique implemented in Uroboros, here:
>>>>>>>> https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-wang-shuai.pdf
>>>>>>>>
>>>>>>>> We will provide a project home page, as well as more detailed
>>>>>>>> documentation, in the near future. Issues and pull requests are
>>>>>>>> welcome.
>>>>>>>>
>>>>>>>> Happy hacking!
>>>>>>>>
>>>>>>>> Sincerely,
>>>>>>>> Shuai
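(As a closing footnote to Ivan's pointer to instruction-granularity
disassembly and to the shingled-disassembly discussion earlier in the thread,
the two styles can be contrasted in a short sketch. The `decode` function is
a stand-in assumption, not a real BAP call; in BAP that role would be played
by the interfaces Ivan links, such as Disasm_expert.Basic.)

    (* [decode code off] is assumed to return the length of the instruction
       starting at [off], or None if the bytes do not decode. *)
    type decoder = Bytes.t -> int -> int option

    (* Linear sweep: decode back-to-back from offset 0, skipping a byte on
       failure.  Embedded data is silently swallowed as bogus instructions,
       which is the hazard Shuai mentions for linear disassembly. *)
    let linear_sweep (decode : decoder) (code : Bytes.t) : int list =
      let n = Bytes.length code in
      let rec go off acc =
        if off >= n then List.rev acc
        else match decode code off with
          | Some len -> go (off + len) (off :: acc)
          | None     -> go (off + 1) acc
      in
      go 0 []

    (* Shingled view: treat every offset as a potential instruction start
       and keep all that decode; a later analysis prunes the inconsistent
       interpretations instead of committing to one up front. *)
    let shingles (decode : decoder) (code : Bytes.t) : int list =
      List.filter (fun off -> decode code off <> None)
        (List.init (Bytes.length code) (fun i -> i))

The first style matches Shuai's observation that compiler-generated ELF code
without embedded data is easy to handle; the second keeps every
interpretation alive, which is the property the REINS/STIR fallback discussed
above relies on.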