Simulating Slices of iOS Apps
In 2019, I built a work-for-hobby iOS simulator on a strict diet of weekends and evenings. While the full details of this project will stay in-house, there’s enough I can share to hopefully be interesting!
First up, here’s what this looks like running against a simple demo app. On the right is the bona fide iOS simulator from Xcode, and on the left is the simulator I built.
Simulating Opcodes
The heart of this simulator is a classic interpreter runloop. Each tick, the next opcode is parsed from the binary and interpreted to update the virtual machine state. Here’s a simple opcode handler:
exec_context.py
@mnemonic_handler(["add"])
def add(exec_context: ExecContext, instr: CsInsn) -> None:
    dest = AArch64Interpreter._op0_storage(exec_context, instr, Register)
    offset_op = instr.operands[2]
    if offset_op.type == ARM64_OP_IMM:
        offset_imm = offset_op.value.imm
    elif offset_op.type == ARM64_OP_REG:
        offset_reg = AArch64Interpreter._op2_storage(
            exec_context,
            instr,
            expected_type=Register,
        )
        offset_imm = offset_reg.read(ConstantValue).value()
    else:
        raise NotImplementedError(f'Unknown operand type: {offset_op.type}')
    source = AArch64Interpreter._op1_storage(
        exec_context,
        instr,
        expected_type=Register,
    )
    source_val = source.read(ConstantValue).value()
    exec_context.set_imm(dest, source_val + offset_imm)
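To give a sense of the shape of that runloop, here’s my own illustrative sketch rather than the project’s code; only the Capstone calls are the real library API, and the TinyExecContext and handler registry are stand-ins. Each tick decodes the instruction at the current program counter and dispatches to a registered mnemonic handler:
# Illustrative interpreter runloop sketch (not the project's real code).
from capstone import Cs, CS_ARCH_ARM64, CS_MODE_ARM

MNEMONIC_HANDLERS = {}  # mnemonic -> handler(exec_context, instr)

def mnemonic_handler(mnemonics):
    def wrap(fn):
        for m in mnemonics:
            MNEMONIC_HANDLERS[m] = fn
        return fn
    return wrap

class TinyExecContext:
    """Stand-in for the simulator's real machine state."""
    def __init__(self, entry_point: int) -> None:
        self.pc = entry_point
        self.registers = {f"x{i}": 0 for i in range(31)}

def run(code: bytes, base_address: int, entry_point: int, max_ticks: int = 10_000) -> TinyExecContext:
    cs = Cs(CS_ARCH_ARM64, CS_MODE_ARM)
    cs.detail = True
    ctx = TinyExecContext(entry_point)
    for _ in range(max_ticks):
        # Each tick: decode the 4-byte instruction at the current PC...
        offset = ctx.pc - base_address
        instr = next(cs.disasm(code[offset:offset + 4], ctx.pc))
        handler = MNEMONIC_HANDLERS.get(instr.mnemonic)
        if handler is None:
            raise NotImplementedError(f"No handler for {instr.mnemonic}")
        next_pc = ctx.pc + 4
        # ...and let the handler mutate the machine state.
        handler(ctx, instr)
        # A branch handler may have redirected control flow; otherwise fall through.
        if ctx.pc == instr.address:
            ctx.pc = next_pc
    return ctx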
The ldr and str instruction families were by far the ugliest to pin down, as they both come in a wide variety of different flavors and modes. The simulator needs to handle a slew of load and store variants: immediate, pre-indexed, pre-indexed writeback, and post-indexed writeback, just to name a handful. These implementations were the largest of the instruction handlers, and they led to subtle bugs when the implementations were a bit off.
test_aarch64_load_instructions.py
# Load/store addressing rules (discovered by experimentation)
# * You can only use writeback if there is an integer offset
# * Pre-index addressing mode must use an integer offset
# * You cannot use pre- and post-indexed addressing in the same instruction
# * You cannot write back in pre-indexed addressing
# * You cannot write back without an int offset
# * The destination register may not be the stack pointer
# * ldp may not have a register offset in either addressing mode
#
# The tests below cover these `ldr` variants:
# ldr x0, #0x10000
# ldr x0, [x1]
# ldr x0, [x1], #0x10
# ldr x0, [x1, #0x10]
# ldr x0, [x1, #0x10]!
# ldr x0, [x1, x2]
# ldr x0, [sp]
# ldr x0, [sp], #0x10
# ldr x0, [sp, #0x10]
# ldr x0, [sp, #0x10]!
# ldr x0, [sp, x1]
# ldr x0, [x1, x2, lsl #3]
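To give a flavor of why these handlers get large, here’s a hedged sketch of how an ldr handler might dispatch across those addressing modes. The operand inspection mirrors Capstone’s ARM64 detail API, but the read_reg/write_reg/read_mem helpers are stand-ins rather than the simulator’s real interface, and shifted register offsets are left out:
# Hedged sketch of ldr addressing-mode dispatch (illustrative only).
from capstone.arm64 import ARM64_OP_IMM, ARM64_OP_MEM

def handle_ldr(ctx, instr) -> None:
    dest = instr.reg_name(instr.operands[0].reg)
    mem_op = instr.operands[1]
    assert mem_op.type == ARM64_OP_MEM
    base_reg = instr.reg_name(mem_op.mem.base)
    base = ctx.read_reg(base_reg)

    if len(instr.operands) == 3 and instr.operands[2].type == ARM64_OP_IMM:
        # Post-indexed: ldr x0, [x1], #0x10 -- load from base, then bump base
        ctx.write_reg(dest, ctx.read_mem(base))
        ctx.write_reg(base_reg, base + instr.operands[2].imm)
        return

    # Immediate or register offset: ldr x0, [x1, #0x10] / ldr x0, [x1, x2]
    offset = mem_op.mem.disp
    if mem_op.mem.index:
        offset += ctx.read_reg(instr.reg_name(mem_op.mem.index))
    address = base + offset
    ctx.write_reg(dest, ctx.read_mem(address))

    if instr.writeback:
        # Pre-indexed writeback: ldr x0, [x1, #0x10]!
        ctx.write_reg(base_reg, address)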
To get real-world code to work, I needed to implement all sorts of wacky opcodes interacting with floating point and SIMD registers.
exec_context.py
@mnemonic_handler(["scvtf", "ucvtf"])
def scvtf(exec_context: ExecContext, instr: CsInsn) -> None:
    dest = AArch64Interpreter._op0_storage(exec_context, instr, expected_type=SIMDRegister)
    source = AArch64Interpreter._op1_storage(exec_context, instr, expected_type=Register)
    source_val = source.read(ConstantValue).value()
    converted_val = int.from_bytes(struct.pack("d", source_val), "little")
    dest.write(ConstantValue(converted_val))
VM Architecture
The simulator was built on a general approximation of the von Neumann architecture: every piece of data was a Variable, and a Variable is always held in a VariableStorage. A CPU register is one kind of VariableStorage, and a memory cell is another.
This design was a misstep that I’d rework if I came back to the project: modelling each memory word as an object’s entire storage cell precludes any sensible possibility of operating on a buffer of bytes and modifying data across word boundaries. The strategy I chose, however, does work quite well for application code that simply stores pointers to heap-allocated objects in memory words and sends messages to them.
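For concreteness, here’s a toy rendering of that storage model (my own reconstruction, not the project’s actual classes). Each storage location holds exactly one word-sized Variable, which is exactly what makes byte-granular buffer manipulation so awkward:
# Toy reconstruction of the Variable / VariableStorage model (not the real code).
from typing import Optional

class Variable:
    """Any piece of data the simulator can reason about."""

class ConstantValue(Variable):
    def __init__(self, value: int) -> None:
        self._value = value

    def value(self) -> int:
        return self._value

class VariableStorage:
    """A location that holds exactly one Variable at a time."""
    def __init__(self) -> None:
        self._contents: Optional[Variable] = None

    def write(self, var: Variable) -> None:
        self._contents = var

    def read(self) -> Optional[Variable]:
        return self._contents

class Register(VariableStorage):
    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name

class MemoryCell(VariableStorage):
    # One whole word per cell: fine for "a pointer lives here",
    # hopeless for byte-granular buffer manipulation.
    def __init__(self, address: int) -> None:
        super().__init__()
        self.address = address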
Symbol Modeling
Eventually, the simulated code is going to branch and, worse yet, branch to something imported from another binary. Instead of building a full dynamic linker, I added special support to the bl mnemonic handler to perform bespoke operations when certain branches were taken.
For example, if the simulator saw that a bl was being performed to the _random imported symbol, it could trap into its own in-house random() implementation.
modelled_functions.py
@modelled_function(["_random"])
def _random(simulator: "Simulator", instr: ObjcUnconditionalBranchInstruction) -> None:
    r = random.randint(0, (2 ** 31) - 1)
    simulator.current_exec_context.register("x0").write(ConstantValue(r))
Much more interesting, though, is _objc_msgSend.
modelled_functions.py
@modelled_function(["_objc_msgSend"])
def _objc_msgSend(simulator: "Simulator", instr: ObjcUnconditionalBranchInstruction) -> None:
    selname_ptr = simulator.current_exec_context.register("x1").read(ConstantValue).value()
    selector_name = simulator.binary.read_string_at_address(selname_ptr)
    if not selector_name:
        raise ValueError(f"Failed to find messaged selector {selname_ptr}")
    receiver = simulator.current_exec_context.deref_reg("x0")
    is_classmethod = isinstance(receiver, ObjcMetaclass)
    # More below...
This implementation would produce fake objects on a virtual heap, instead of simulating the real Objective-C runtime. I ended up with my own itty bitty standard library.
objc_class.py
class NSNumber(NSObject):
    CLASS_NAME = "_OBJC_CLASS_$_NSNumber"

    def __init__(self, binary: MachoBinary, machine: ExecContext, selector_name: str) -> None:
        super().__init__(
            binary,
            machine,
            class_name=self.CLASS_NAME,
            selector_name=selector_name,
        )
        self.number: Optional[int] = None

    def __eq__(self, other: Any) -> bool:
        if not issubclass(type(other), NSNumber):
            return False
        return self.number == cast(NSNumber, other).number

    def __repr__(self) -> str:
        return f"[@{self.number}]"

    @classmethod
    @implements_objc_class_methods([
        "numberWithInt:",
        "numberWithBool:",
        "numberWithUnsignedInt:",
        "numberWithUnsignedInteger:",
    ])
    def number_with_int(
        cls,
        binary: MachoBinary,
        selector_name: str,
        machine: ExecContext,
    ) -> "NSNumber":
        num = cls(binary, machine, selector_name)
        val = machine.register("x2").read(ConstantValue).value()
        # Validate that we've got a number literal and not something unexpected
        if machine.is_val_pointer(val):
            raise WrongVariableClassError(
                f"+[NSNumber numberWithInt:{hex(val)}] "
                f"({machine.deref_mem(VirtualMemoryPointer(val))})"
            )
        num.number = machine.register("x2").read(ConstantValue).value()
        return num
I also wrote implementations of funny constructors like +[NSDictionary dictionaryWithObjectsAndKeys:]. It was interesting to see how these kinds of call sites work under the hood!
In this case, the compiler will place the first argument of the variadic list in x2. The compiler will arrange the rest of the arguments on the stack, starting with x2’s corresponding key. It’s the implementation’s responsibility to iterate the list on the stack, alternately popping off values and keys, until a NULL is reached.
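Here’s roughly how a modelled handler could walk that variadic list. The read_reg and read_stack_word helpers are hypothetical names I’m using for illustration; the real implementation certainly differs in detail:
# Hedged sketch of walking the variadic list described above
# (read_reg / read_stack_word are hypothetical helpers, not the simulator's API).
from typing import Dict

def modelled_dictionary_with_objects_and_keys(ctx) -> Dict[int, int]:
    dictionary: Dict[int, int] = {}   # key pointer -> object pointer
    # Per the calling convention above: the first value arrives in x2...
    value = ctx.read_reg("x2")
    # ...and everything else, starting with x2's key, lives on the stack.
    stack_slot = 0
    while value != 0:  # the list is NULL-terminated
        key = ctx.read_stack_word(stack_slot)
        dictionary[key] = value
        # Pop the next value/key pair off the stack
        value = ctx.read_stack_word(stack_slot + 1)
        stack_slot += 2
    return dictionary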
The Mach-O VM loader never got quite up to the standards of the real thing. Instead of faithfully following whatever was described in the Mach-O, I implemented special support for mapping various bits of the binary as the need arose:
exec_context.py
# Map Objective-C selector strings
logging.debug("VM mapping __objc_selrefs...")
selref_sect = macho_binary.section_with_name("__objc_selrefs", "__DATA")
if selref_sect:
    for raw_addr in range(selref_sect.address, selref_sect.end_address, sizeof(c_uint64)):
        addr = VirtualMemoryPointer(raw_addr)
        # Just copy the value directly from the real binary into VM memory
        self.map_static_binary_data(addr, ConstantValue(macho_binary.read_word(addr)))
        self.memory(addr).set_readonly()
Debugger
Simulators and emulators are notoriously difficult to debug, as often the errors only become visible in the higher-level logic of whatever you’re simulating.
I built a debugger that allowed me to run lldb on a real iOS device on one side, and the simulator on the other, and run each forwards until the register or memory states of the simulator diverged from the real thing.
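The comparison loop itself is simple in spirit. Here’s a sketch of the idea, where the step/read helpers stand in for a wrapper over lldb’s Python API on one side and the simulator on the other; none of these names are the real tool’s:
# Sketch of the lockstep comparison (all helper names are stand-ins).
GPRS = [f"x{i}" for i in range(31)] + ["sp", "pc"]

def run_lockstep(simulator, lldb_session, max_steps: int = 100_000) -> None:
    for step in range(max_steps):
        simulator.step_one_instruction()
        lldb_session.step_one_instruction()

        sim_regs = {r: simulator.read_register(r) for r in GPRS}
        dev_regs = {r: lldb_session.read_register(r) for r in GPRS}

        diverged = {r for r in GPRS if sim_regs[r] != dev_regs[r]}
        if diverged:
            # Stop at the first instruction where the two machines disagree,
            # which is usually within a few instructions of the actual bug.
            print(f"Diverged after {step} steps at pc={hex(dev_regs['pc'])}")
            for r in sorted(diverged):
                print(f"  {r}: simulator={hex(sim_regs[r])} device={hex(dev_regs[r])}")
            return
    print("No divergence observed")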
Since this is a virtual environment, it was also easy for me to snapshot the machine state at every instruction, which facilitated reverse debugging (‘time travelling backwards’) to any earlier execution point. I called this ‘visit mode’, since the REPL allowed you to run all the normal inspection commands (read, examine, print, etc.) as if execution was paused at an earlier instruction pointer value.
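Per-instruction snapshotting can be as simple as copying the machine state each tick. A rough sketch, with deepcopy standing in for whatever cheaper snapshotting strategy the real simulator used:
# Rough sketch of per-instruction snapshots enabling 'visit mode'
# (deepcopy and the context methods stand in for the real mechanism).
import copy

class TimeTravellingSimulator:
    def __init__(self, exec_context) -> None:
        self.exec_context = exec_context
        self.snapshots = []  # one machine state per retired instruction

    def step(self) -> None:
        self.snapshots.append(copy.deepcopy(self.exec_context))
        self.exec_context.execute_next_instruction()

    def visit(self, instruction_index: int):
        # 'Visit mode': inspect registers/memory as they were at any
        # earlier instruction, without mutating the current state.
        return self.snapshots[instruction_index]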
Dynamic Linker
Eventually, I did branch out to actually mapping and invoking other binaries! In this tiny demo, a binary loads a framework and successfully branches to one of its exported symbols:
I then moved on to CoreFoundation, and wrote a pile of hacks to get it working. In this demo, I’m dynamically loading CoreFoundation, and the runtime is creating real ObjC strings and arrays!
My lldb comparison tool was essential here. I found that I needed to execute the routines specified by LC_ROUTINES_64 so that CoreFoundation had an opportunity to create its Objective-C classes and populate them in __CFRuntimeObjCClassTable. CoreFoundation queried this table when trying to use _CFAllocatorAllocate.
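A sketch of that bootstrapping step, under stated assumptions: the routines_command_64 layout comes from <mach-o/loader.h>, while the load_commands and call_function accessors are hypothetical names used for illustration:
# Sketch: run LC_ROUTINES_64 initializers before simulating library code.
# (binary.load_commands() and simulator.call_function() are hypothetical names.)
import struct

LC_ROUTINES_64 = 0x1A  # struct routines_command_64

def run_image_initializers(simulator, binary) -> None:
    for cmd_bytes in binary.load_commands():          # raw bytes of each load command
        cmd, cmdsize = struct.unpack_from("<II", cmd_bytes, 0)
        if cmd != LC_ROUTINES_64:
            continue
        # routines_command_64: cmd, cmdsize, init_address, init_module, reserved[6]
        init_address, = struct.unpack_from("<Q", cmd_bytes, 8)
        # Give CoreFoundation a chance to register its classes in
        # __CFRuntimeObjCClassTable before anything allocates through
        # _CFAllocatorAllocate.
        simulator.call_function(init_address)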
It was incredible to watch the simulated CoreFoundation bootstrap the runtime by calling things like object_setClass() on __NSCFArray! I also found that I could force CoreFoundation to do everything in-house, instead of shelling out to Foundation, by patching the memory in CoreFoundation’s ___FoundationPresent.current flag to 0.
Testing the Simulator
I wrote a unit testing harness that allowed me to thoroughly test the implementations of these mnemonics, particularly trickier ones like ldr and str.
test_aarch64_branch_instructions.py
# Given an unconditional branch-with-link to a label
source = """
; Move a sentinel value into x30 so we can verify it's restored after returning from a subroutine call
mov x30, 0xbeef
; Branch to a subroutine
bl SubroutineLabel
; Raise a breakpoint after the above call returns
brk #0x1
SubroutineLabel:
; Move some sentinel data so we can verify that this subroutine ran
mov x0, 0xcafe
; Return to caller
ret
; This should not run
brk #0x2
"""
# When I simulate the code and encounter a breakpoint
breakpoint_exc = simulate_until_breakpoint(source)
# Then an exception with code 1 has been raised
assert breakpoint_exc.exception_code == 1
# And the subroutine has run
assert breakpoint_exc.exec_context.register("x0").read(ConstantValue).value() == 0xcafe
# And the link register has been modified
assert breakpoint_exc.exec_context.register("x30").read(ConstantValue).value() != 0xbeef
I locked down all sorts of behavior, such as the behavior of comparison flags:
test_aarch64_compare_instructions.py
# Given I evaluate a negative comparison of a number and itself in another register
source = """
mov x2, #0x800
mov x3, #-0x800
cmn x2, x3
"""
# When I simulate the code
with simulate_assembly(source) as ctxs:
    # The correct status flags are set
    assert ctxs[0].condition_flags == {
        ConditionFlag.EQUAL,
        ConditionFlag.LESS_EQUAL,
        ConditionFlag.GREATER_EQUAL,
    }
Propagating Unknown Data
The goal of this project wasn’t to simulate an application from start to finish, but rather to simulate a particular tree of execution to make decisions about the code’s behavior.
This means that the simulator natively works with a lot of unknown (‘unscoped’) data. For example, any arguments to the simulator’s chosen entry point will definitely be unscoped by the simulation.
This unscoped data is represented by one of a few special types, such as FunctionArgumentObject and NonLocalVariable. These objects proliferate themselves when the simulated code tries to use them. For example, sending a message to, or accessing a field of, a NonLocalVariable will spawn a NonLocalDataLoad as an output. It’s all a pile of hacks, but it works well enough.
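The proliferation rule is easy to picture. Here’s a toy version (my own reconstruction, not the project’s code) in which any operation on unscoped data just produces more unscoped data that remembers its provenance:
# Toy reconstruction of unscoped-data propagation (not the real code).
class NonLocalVariable:
    """Data whose value lies outside the simulated slice."""
    def __init__(self, origin: str) -> None:
        self.origin = origin

class NonLocalDataLoad(NonLocalVariable):
    """The result of reading a field of / sending a message to unscoped data."""

def load_field(receiver, field_name: str):
    if isinstance(receiver, NonLocalVariable):
        # We can't know the value, but we can keep simulating with a
        # placeholder that remembers where it came from.
        return NonLocalDataLoad(f"{receiver.origin}.{field_name}")
    return receiver.fields[field_name]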
Sometimes, the simulated code will, fairly, try to access an ivar of an object that was created by the simulator’s fake Objective-C runtime. This would yield an UninitializedVariable instance if we didn’t do anything else, which is no fun and generally causes the simulated code to complain. So, the simulator will walk the ivar table and instantiate dummy NSObjects to pop into these fields.
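The shape of that trick, as a hedged sketch with hypothetical helper names: when the fake runtime allocates an instance, every ivar slot is filled with a placeholder object rather than left uninitialized:
# Sketch of pre-populating ivars with dummy objects (hypothetical helpers).
def allocate_fake_instance(objc_class, heap):
    instance = heap.allocate_object(objc_class)
    # Walk the class's ivar table and drop a dummy NSObject into each slot,
    # so later ivar reads never surface an UninitializedVariable.
    for ivar in objc_class.ivar_table():
        instance.set_ivar(ivar.name, heap.allocate_object_of_class("NSObject"))
    return instance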
It’s difficult to say what to do when the simulated code performs a conditional branch that relies on unscoped data. So, I made the simulator split into two trees of execution: one where the condition was true, and one where it was false. Both paths would then be followed.
This caused a lot of wasted work, because every path where if ((self = [super init])) {} fails was simulated. I eventually improved things such that execution was only split in two when necessary. This allowed me to correctly follow conditional branches when all the implicated data was available.
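Conceptually, the split is just forking the machine state at the branch. A rough sketch, assuming a copyable exec context and a worklist of pending paths; these names are illustrative, not the simulator’s real API:
# Rough sketch of splitting execution on a conditional that depends on
# unscoped data (copy.deepcopy and the context methods are stand-ins).
import copy

def handle_conditional_branch(exec_context, branch_target, worklist) -> None:
    condition = exec_context.evaluate_branch_condition()
    if condition.is_known():
        # All implicated data is available: follow the single correct path.
        exec_context.pc = branch_target if condition.value() else exec_context.pc + 4
        return
    # Otherwise fork: one tree where the branch is taken...
    taken = copy.deepcopy(exec_context)
    taken.pc = branch_target
    worklist.append(taken)
    # ...and one where it falls through.
    exec_context.pc += 4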
Observing Results
A simulator isn’t much good without some way to observe what the simulated code is doing. The simulated code’s output is obvious in the GUI-centric demo up top, but it’s less clear how this works when we’re simulating code without any UI.
For my use case, I wanted to observe the system state at various different instruction pointer values. This allowed me to assemble human-consumable stack traces with all the dynamic arguments to each function filled in.
The simulator API allowed the programmer to specify all the instruction pointer values that they were interested in observing. The simulator would then follow all the execution trees, splitting off into subtrees with different sets of constraints when the simulator wasn’t sure which direction of a conditional was correct. At the end, all the machine snapshots across all the possible execution trees were returned to the programmer, who could then inspect the register and memory state at each snapshot.
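In use, that API looked roughly like this; the names and addresses below are illustrative rather than the real signatures:
# Illustrative usage of the observation API (hypothetical names and addresses).
interesting_addresses = {0x100004A10, 0x100004B3C}   # call sites I care about

snapshots = simulator.run(
    entry_point=0x100003F00,
    observe_instruction_pointers=interesting_addresses,
)
for snapshot in snapshots:
    # One snapshot per observed address, per execution subtree; x0..x7 hold
    # the dynamic arguments at that call site.
    print(hex(snapshot.instruction_pointer), snapshot.register_value("x2"))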
Fun Bugs
Since the simulator runs untrusted code, I wanted to make sure that it could gracefully handle infinite loops in the simulated code. I added a basic loop detector to ensure that the simulator always terminated.
One day, though, the simulator got stuck despite my initial efforts.
Like I mentioned above, the simulator had special support for certain functions that I had specifically modeled. For functions unknown to the simulator, though, it would simply pop a NonLocalVariable into x0 and carry on.
This works fine, unless the function being called is abort()! In this case, a code path tried to abort(), and the code that happened to immediately follow ended up performing a backwards jump. The result was the world’s most polite infinite loop, in which the simulated code asked the simulator to stop on every iteration, but the request fell on deaf ears.
The fix here was simple: I modeled abort() to terminate the current execution path, and improved my loop detector.
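A sketch of that fix, reusing the modelled-function decorator style from earlier; ExecutionPathTerminated is an illustrative name of my own, not the real one:
# Sketch of modelling abort(): end this execution path instead of
# falling through into whatever code happens to come next.
class ExecutionPathTerminated(Exception):
    """Illustrative: signals that the current execution path should stop."""

@modelled_function(["_abort"])
def _abort(simulator: "Simulator", instr: ObjcUnconditionalBranchInstruction) -> None:
    # Don't pop a NonLocalVariable into x0 and carry on; this path is over.
    raise ExecutionPathTerminated("simulated code called abort()")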
This project offered many satisfying problems along the way. It’s always fun to paint yourself into a corner inside a big system with its own quirks and constraints, then find your way back out. Thanks for following along!