Taint-Based Return Oriented Programming
This is an implementation of the taint-based gadget management approach presented at SSTIC (paper, slides, talk, in french) and REcon in 2018.
This is still a PoC, and the code is still ugly as can be attested by the various badges at the top of this README.
There are roughly two kinds of tools for return oriented programming (ROP): syntactic tools that return the disassembly of gadgets and sometimes perform template based automatic chaining, and symbolic tools that compute a symbolic representation of the output state for each gadget and allow more powerful manipulations. The former are very fast but only allow regex queries, the latter allow symbolic queries but are much slower.
Taint-based ROP is an alternative approach, faster than symbolic tools and allowing more expressive queries than syntactic tools. TBrop uses a coarse semantic of instructions. Instead of a precise symbolic I/O relationship, it only relies on a dependency matrix reflecting how a taint would be propagated by a given gadget.
As an example, from the following gadget:
xchg rax, rbx
xor rcx, rcx
add rcx, rax
jmp rcxTBrop compute the following (sparse boolean) matrix:
| ... | rax | rbx | rcx | ... | |
|---|---|---|---|---|---|
| ... | |||||
| rax | . | <┘ | . | ||
| rbx | <┘ | . | . | ||
| rcx | . | <┘ | . | ||
| ... |
which reads as rax is influenced by rbx, rbx is influenced by rax, and rcx is influenced by rbx. The matrix also contains indices corresponding to rip (chain condition), some of the stack cells, and dereferencing among others.
To build with docker:
sudo docker build -t tbrop .
Otherwise, just pip install the dependencies:
- capstone to disassemble opcodes;
- lief to load various executable formats;
- numpy and scipy for sparse boolean matrices;
- ipython if you want to use the TBrop script, as opposed to just using TBrop as a lib.
To analyse /FULL/LOCAL/PATH/FILE:
sudo docker run --rm -it -v /FULL/LOCAL/PATH/FILE:/app/FILE:ro tbrop /app/FILE
It should (eventually) bring you to an ipython shell where you can do stuff like:
for g in gdgtCollection.gadgets:
if g.gadgetMatrix.matrix[X86_REG_RSP,X86_REG_RAX] \
and g.gadgetMatrix.chainCond[0,X86_REG_RCX]:
print(hex(g.getAddress()),g)All the gadgets are in the gadgets attribute of the gdgtCollection object (some refactoring is needed...). Each gadget as a gadgetMatrix attribute that can be queried in the following way:
g.gadgetMatrix.matrix[X86_REG_RSP,X86_REG_RAX]isTrueif and only if there is a dependency fromX86_REG_RAXbefore the execution of the gadget toX86_REG_RSPafter its executiong.gadgetMatrix.chainCond[0,X86_REG_RCX]isTrueif and only if there is a dependency fromX86_REG_RCXbefore the execution of the gadget toripafter its execution. In other word: if you controlrcxbefore the execution of the gadget, you might be able to control its destination and chain it with other gadgets or code.
Thus, the previous snippet of code will print all gadgets that overwrite rsp to a value influenced by rax and jump to an address influenced by rcx.
Since we import x86_const, all the X86_REG_* can be directly used as indices to gadgetMatrix.matrix and gadgetMatrix.chainCond. Other indices can be used:
deref: as an exampleg.gadgetMatrix.matrix[deref,X86_REG_RAX]selects gadgets for which a value influenced byraxis dereferenced;memR: as an exampleg.gadgetMatrix.matrix[X86_REG_RBX, memR]selects gadgets that modify the value ofrbxwith something taken from memory (together withg.gadgetMatrix.matrix[deref,X86_REG_RAX]andg.gadgetMatrix.matrix[X86_REG_RBX,X86_REG_RAX], it would select gadgets that modify the value ofrbxwith something pointed to byrax);stackTop + <integer>: as an exampleg.gadgetMatrix.matrix[X86_REG_RCX, stackTop+2]selects gadgets that modify the value ofrcxwith something influenced by the second cell of the stack (i.e.[rsp+0X10]). Just make sure thatstackTop+xis betweensFandsLas the size of the pseudo-stack is limited. Negative values ofxrefer to cells out of the stack.
When you feel desperate. Seriously, start with rp++ or ROPGenerator.
If you cannot grep your way out of a huge gadgets listing, or cannot express your constraints, or load your target binary, with a semantic tool, then you might want to give TBrop a try.
As an example, if you control rip and the buffer pointed to by rsp+8, TBrop might help you find a stack pivot (along with a few silly suggestions):
mov rcx, qword ptr [rsp + 8]; # rcx now points to the buffer you control
mov byte ptr [rsp], dl;
mov rax, qword ptr [rcx]; # thus you control rax
mov rdi, rcx;
call qword ptr [rax + 0x48]; # and can jump wherever you want, while rcx points to your buffer
mov rsp, rcx;
retAnother use case is if you find yourself working with an exotic architecture and do not want to encode the whole precise semantic of instructions, you can try to retrieve the taint propagation rules of instructions (with TaintInduce or another approach) and implement the corresponding architecture for TBrop (good luck with that).
Originally developped by @iNod3 and @clslgrnc at @DGA-MI-SSI.