Hello Kenneth,
Sorry for the late reply. I had several deadlines this weekend.
To answer your question: our current approach cannot guarantee 100% correct "repositioning".
The most challenging part is identifying code pointers in global data sections. As we discussed
in our paper, this is quite difficult to handle even with static analysis techniques
(type inference, for instance). We do have some false positives, as shown in the appendix of our paper [1].
We will do more research to eliminate these false positives.
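To make the data-section problem concrete, here is a rough sketch of the kind of
heuristic involved. It is only an illustration, not the implementation from the paper:
the function name, the parameters, the little-endian 4-byte word assumption and the
example addresses are all made up for this sketch. It scans aligned words in a data
section and keeps those whose value falls inside the code section, optionally
requiring that the value is a known instruction start.

```python
import struct

def find_code_pointer_candidates(data, data_base, text_start, text_end,
                                 insn_starts=None):
    """Return (address_of_word, value) pairs that look like code pointers.

    Hypothetical helper: assumes 4-byte, little-endian, 4-byte-aligned words.
    """
    candidates = []
    for off in range(0, len(data) - 4 + 1, 4):
        (value,) = struct.unpack_from("<I", data, off)
        # Keep only values that land inside the code section.
        if not (text_start <= value < text_end):
            continue
        # Optional filter: the target must be a known instruction boundary.
        if insn_starts is not None and value not in insn_starts:
            continue
        candidates.append((data_base + off, value))
    return candidates

# Made-up example: a real pointer, a small integer, and an integer that
# merely happens to fall inside the code address range.
TEXT_START, TEXT_END = 0x08048000, 0x08049000
DATA_BASE = 0x0804A000
data = struct.pack("<III", 0x08048010, 42, 0x08048123)

unfiltered = find_code_pointer_candidates(data, DATA_BASE, TEXT_START, TEXT_END)
filtered = find_code_pointer_candidates(
    data, DATA_BASE, TEXT_START, TEXT_END,
    insn_starts={0x08048010, 0x08048020},
)
```

In this example the range check alone also accepts 0x08048123, which is the kind of
false positive mentioned above. The instruction-boundary filter removes it here, but
an ordinary integer can still coincide with a valid instruction address, so a
heuristic like this cannot be sound.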
I believe it is feasible to build a sound solution. It does require inserting some additional
trampolines into the binary code. You may refer to this paper for some insights [2].
As for the disassembly challenges, we directly adopt the disassembly approach proposed
in an excellent piece of prior work [3]. You can check their evaluation section, which shows that their approach
correctly disassembles large applications without any errors. In my experience, Linux ELF
binaries are indeed easier to disassemble, since typical compilers (gcc, icc, llvm) do not
insert data into code sections (embedded data causes a lot of trouble for linear disassemblers).
However, if I were asked to work on PE binaries, I would probably start from IDA Pro.
We consider the disassembly challenge orthogonal to our research.
IMHO, our research reveals the (important) fact that even though the relocation problem
is theoretically hard to solve with 100% accuracy, it may not be as troublesome as previous work assumed.
Simple solutions can achieve good results.
I hope this answers your questions. If not, please let me know :)
Best,
Shuai
[1] Shuai Wang, Pei Wang, Dinghao Wu. Reassembleable Disassembling.
[2] Zhui Deng, Xiangyu Zhang, Dongyan Xu. BISTRO: Binary Component Extraction and Embedding for Software Security Applications.
[3] Mingwei Zhang, R. Sekar. Control Flow Integrity for COTS Binaries.