I'm quite sure we are thinking of different things.

STIR is a binary randomization technique used to mitigate rop, and was
developed in sitsu with the binary rewriting techniques. The technique of
retaining the original code section is a failback to guard against errors
in rewriting, but to my knowledge doesn't impose a performance penalty.
Size required is a constant multiple, so I don't consider it an adoption
hurdle. But everybody has different use scenarios.

Right. Correctness is critical. I think co program proof methodologies with
tools like coq will shine here in proofs to remove required trust that a
rewrtten binary is conformant to certain execution properties.

I hadn't know static rewriters even existed. I presume you are you talking
about dynamic tools.
On Oct 25, 2015 4:49 PM, "Shuai Wang" <wangshuai901@gmail.com> wrote:

> Hello Kenneth,
>
> Yes, I agree Binary Stirring system can eliminate symbolization false
> positive as well. Actually I believe
> many research work, and  tools (DynInst, for instance) have implemented
> this a so-called "replica-based"
> binary instrumentation framework. This is a quite robust way to
> instrument binary code, although size expansion and
> performance penalty cannot be ignored in the instrumentation outputs.
>
> However, I found those solutions are all quite complex, and difficult to
> understand. And it might not be inaccurate
> to assume "aggressive" instrumentation methods can break the functionality
> due to the limitation of design,
> or challenges in bug-less implementation. I even found that some
> the-state-of-the-art binary instrumentation tools
> cannot preserve the correct functionality when employing them to
> instrument some SPEC2006 test cases.
>
> I personally would like to find some cleaner solutions, which can
> introduce very little overhead in terms of binary
> size and execution. Besides, some research work reveals that binary
> security applications built on top of previous
> instrumentation framework do leave certain exploitable vulnerabilities due
> to the design limitations.
>
> Sincerely,
> Shuai
>
>
>
>
>
>
> On Sun, Oct 25, 2015 at 3:25 PM, Kenneth Adam Miller <
> kennethadammiller@gmail.com> wrote:
>
>> Replied inline
>>
>> On Sun, Oct 25, 2015 at 3:04 PM, Shuai Wang <wangshuai901@gmail.com>
>> wrote:
>>
>>> Hello Kenneth,
>>>
>>> Sorry for the late reply. I have several deadlines during this weekend.
>>>
>>> To answer your question, our current approach cannot ensure 100%
>>> "reposition" correct.
>>> The most challenging part is to identify code pointers in global data
>>> sections, as we discussed
>>> in our paper, it is quite difficult to handle even with some static
>>> analysis techniques
>>> (type inference, for instance). We do have some false positive, as shown
>>> in the appendix of our paper [1].
>>> We will research more to eliminate the false positive.
>>>
>>> I believe it is doable to present a sound solution. It indeed requires
>>> some additional
>>> trampolines inserted in the binary code. You may refer to this paper for
>>> some enlightens [2].
>>>
>>> As for the disassembling challenges, we directly adopt a disassembly
>>> approach proposed
>>> by an excellent work [3]. You can check out their evaluation section,
>>> and find that their approach
>>> can correctly disassemble large-size applications without any error. My
>>> experience is that Linux ELF
>>> binaries are indeed easier to disassemble, and typical compilers (gcc;
>>> icc; llvm) would not
>>> insert data into code sections (the embedded data can trouble linear
>>> disassembler a lot).
>>>
>>>
>> I remember reading about [3] when it came out. That was a year after the
>> original REINS system came out that proposed re-writing binaries, along
>> with it's companion STIR. Shingled disassembly originated with Wartell et
>> al.'s seminal Distinguishing Code and Data PhD thesis. I'm currently
>> working on the integration of a sheering and PFSM enhanced Shingled
>> Disassembler into BAP. But if you've already implemented something like
>> that, what would be really valuable is if you were to review my shingled
>> disassembler implementation and I review yours that way we have some cross
>> review feedback.
>>
>> Regarding the need for 100% accuracy, in the REINS and STIR papers, the
>> approach taken is to obtain very very high classification accuracy, but in
>> the case that correctness cannot be established, to simply retain each
>> interpretation of a byte sequence, so you are still correct in the instance
>> that it's code by treating it as such. Then, a companion technique is
>> introduced wherein the code section is retained in order that should such a
>> data reference in code instance occur and interpretation was incorrect,
>> such reference can read and write into the kept section. But if it's code,
>> it has been rewritten in the new section. Then it should remain correct in
>> any scenario.
>>
>>
>>> However, if I am asked to work on PE binaries, then I will probably
>>> start from IDA-Pro.
>>> We consider the disassembling challenge is orthogonal to our research.
>>>
>>
>> It is good to have good interoperabiblity with IDA as a guided
>> disassembler and the actual new research tools. One of the most valuable
>> things I can think of is to write some plugin that will mechanize data
>> extraction as needed in order to accelerate manual intervention with the
>> newer tools, such as in the case of training.
>>
>>
>>>
>>> IMHO, our research reveals the (important) fact that even though
>>> theoretically relocation issue
>>> is hard to solve with 100% accuracy, it might not be as troublesome as
>>> it was assumed by previous work.
>>> Simple solutions can achieve good results.
>>>
>>
>> Agreed; there are failback stratagems.
>>
>>
>>>
>>> I hope it answers your questions, otherwise, please let me know :)
>>>
>>> Best,
>>> Shuai
>>>
>>> [1] Shuai Wang, Pei Wang, Dinghao Wu, Reassembleable Disassembling.
>>> [2] Zhui Deng, Xiangyu Zhang, Dongyan Xu, BISTRO: Binary Component
>>> Extraction and Embedding for Software Security Applications
>>> [3] Mingwei Zhang, Sekar, R, Control Flow Integrity for COTS Binaries.
>>>
>>>
>>>
>> There's a good utility for working with white papers and interacting with
>> colleagues; mendeley.
>>
>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Oct 23, 2015 at 6:31 PM, Kenneth Adam Miller <
>>> kennethadammiller@gmail.com> wrote:
>>>
>>>> Well it's interesting that you've gone with a binary recompilation
>>>> approach. How do you ensure that, statically, for any given edit, you
>>>> reposition all the jump targets correctly? How do you deal with the
>>>> difficulty of disassembly reducing to the halting problem?
>>>>
>>>> On Fri, Oct 23, 2015 at 4:59 PM, Shuai Wang <wangshuai901@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> I am glad that you are interested in our work!!
>>>>>
>>>>> Actually this project starts over 1.5 years ago, and I believe at that
>>>>> time, BAP (version 0.7 I believe?) is still a research prototype..
>>>>>
>>>>> I choose to implement from the stretch is because I want to have a
>>>>> nice tool for my own research projects, also I can have an opportunity
>>>>> to learn OCaml... :)
>>>>>
>>>>> Yes, I definitely would like to unite our efforts!!
>>>>>
>>>>> Best,
>>>>> Shuai
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Oct 23, 2015 at 1:30 PM, Ivan Gotovchits <ivg@ieee.org> wrote:
>>>>>
>>>>>> Hi Shuai,
>>>>>>
>>>>>> Nice work! But I'm curious, why didn't you use [bap][1] as a
>>>>>> disassembler?
>>>>>>
>>>>>> Do you know, that we have a low-level interface to disassembling,
>>>>>> like [linear_sweep][2] or even
>>>>>> lower [Disasm_expert.Basic][3] interface, that can disassemble on
>>>>>> instruction level granularity.
>>>>>>
>>>>>> It will be very interesting, if we can unite our efforts.
>>>>>>
>>>>>> Best wishes,
>>>>>> Ivan Gotovchits
>>>>>>
>>>>>> [1]: https://github.com/BinaryAnalysisPlatform/bap
>>>>>> [2]:
>>>>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.html#VALlinear_sweep
>>>>>> [3]:
>>>>>> http://binaryanalysisplatform.github.io/bap/api/master/Bap.Std.Disasm_expert.Basic.html
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 23, 2015 at 1:05 PM, Shuai Wang <wangshuai901@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Dear List,
>>>>>>>
>>>>>>> I’m glad to announce the first release of Uroboros:  an infrastructure
>>>>>>> for reassembleable disassembling and transformation.
>>>>>>>
>>>>>>> You can find the code here: https://github.com/s3team/uroboros
>>>>>>> You can find our research paper which describes the core technique
>>>>>>> implemented in Uroboros here:
>>>>>>>
>>>>>>> https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-wang-shuai.pdf
>>>>>>>
>>>>>>> We will provide a project home page, as well as more detailed
>>>>>>> documents in the near future.  Issues and pull requests welcomed.
>>>>>>>
>>>>>>> Happy hacking!
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Shuai
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>