Simulating Slices of iOS Apps

2024-01-15 11:57:51

In 2019, I built a work-for-hobby iOS simulator on a strict regimen of weekends and coffee. While the full details of this project will stay in-house, there’s enough I can share to hopefully be interesting!

First up, here’s what this looks like running against a simple demo app. On the right is the bona-fide iOS simulator from Xcode, and on the left is the simulator I built.

Simulating Opcodes

The heart of this simulator is a classic interpreter runloop. Every tick, the next opcode is parsed from the binary and interpreted to update the virtual machine state. Here’s a simple opcode handler:
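The listing below is not the simulator’s actual handler. It is a minimal sketch of the fetch-and-dispatch loop described above, assuming instructions have already been decoded into (mnemonic, operands) tuples. Machine, HANDLERS, run and the handle_* functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical pre-decoded instruction: (mnemonic, operands).
Instruction = Tuple[str, Tuple]


@dataclass
class Machine:
    regs: Dict[str, int] = field(default_factory=dict)
    pc: int = 0          # index into the program, not a byte address
    halted: bool = False


def handle_mov(m: Machine, ops: Tuple) -> None:
    # mov xd, #imm: copy an immediate into the destination register.
    dst, imm = ops
    m.regs[dst] = imm


def handle_add(m: Machine, ops: Tuple) -> None:
    # add xd, xn, #imm: wrap the result to 64 bits.
    dst, src, imm = ops
    m.regs[dst] = (m.regs.get(src, 0) + imm) & 0xFFFFFFFFFFFFFFFF


def handle_ret(m: Machine, ops: Tuple) -> None:
    # In this sketch, ret simply stops the runloop.
    m.halted = True


HANDLERS: Dict[str, Callable[[Machine, Tuple], None]] = {
    "mov": handle_mov,
    "add": handle_add,
    "ret": handle_ret,
}


def run(program: List[Instruction]) -> Machine:
    m = Machine()
    while not m.halted and m.pc < len(program):
        mnemonic, ops = program[m.pc]
        m.pc += 1  # advance first so a branch handler could overwrite pc
        HANDLERS[mnemonic](m, ops)
    return m


machine = run([("mov", ("x0", 40)), ("add", ("x0", "x0", 2)), ("ret", ())])
```

Keeping one small function per mnemonic makes each handler easy to test in isolation, which matters once the trickier instruction families show up.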

The ldr and str instruction families were by far the most unpleasant to pin down, as they both come in a variety of different flavors and modes. The simulator needs to handle a slew of load and store variants: immediate, pre-indexed, pre-indexed writeback, and post-indexed writeback, just to name a handful. These implementations were the largest of the instruction handlers, and they led to subtle bugs when the implementations were a bit off.
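To show why these variants are easy to get subtly wrong, here is a rough sketch of how three common ldr addressing forms differ in when the base register is updated. The ldr function, its mode strings, and the dictionary-backed registers and memory are illustrative assumptions, not the simulator’s implementation.

```python
from typing import Dict


def ldr(regs: Dict[str, int], mem: Dict[int, int],
        dst: str, base: str, offset: int = 0, mode: str = "offset") -> None:
    """Load a word into `dst` using one of three addressing modes.

    offset:     ldr dst, [base, #offset]    base is left untouched
    pre_index:  ldr dst, [base, #offset]!   base is updated, then used
    post_index: ldr dst, [base], #offset    base is used, then updated
    """
    if mode == "offset":
        address = regs[base] + offset
    elif mode == "pre_index":
        regs[base] += offset
        address = regs[base]
    elif mode == "post_index":
        address = regs[base]
        regs[base] += offset
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    regs[dst] = mem[address]


mem = {0x1000: 11, 0x1008: 22, 0x1010: 33}

plain = {"x0": 0, "x1": 0x1000}
ldr(plain, mem, "x0", "x1", 8)                     # reads 0x1008, x1 unchanged

pre = {"x0": 0, "x1": 0x1000}
ldr(pre, mem, "x0", "x1", 8, mode="pre_index")     # x1 becomes 0x1008, then read

post = {"x0": 0, "x1": 0x1000}
ldr(post, mem, "x0", "x1", 8, mode="post_index")   # reads 0x1000, then x1 += 8
```

The pre-indexed and plain-offset forms load the same value here; only the writeback to x1 tells them apart, which is exactly the kind of difference that stays hidden until later code depends on the base register.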

To get real-world code to work, I needed to implement all sorts of wacky opcodes interacting with floating point and SIMD registers.

VM Architecture

The simulator was built on a classic approximation of the von Neumann architecture: every piece of data was a Variable, and a Variable is always held in a VariableStorage. A CPU register is one kind of VariableStorage, and a memory cell is another.
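A minimal sketch of that object model might look like the following. Variable and VariableStorage are the names used above; Register, MemoryCell, Memory, and every method name are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Variable:
    value: object


class VariableStorage:
    """Anything that can hold exactly one Variable."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.contents: Optional[Variable] = None

    def write(self, variable: Optional[Variable]) -> None:
        self.contents = variable

    def read(self) -> Optional[Variable]:
        return self.contents


class Register(VariableStorage):
    pass


class MemoryCell(VariableStorage):
    def __init__(self, address: int) -> None:
        super().__init__(hex(address))
        self.address = address


class Memory:
    WORD = 8  # one cell per 64-bit word

    def __init__(self) -> None:
        self.cells: Dict[int, MemoryCell] = {}

    def cell(self, address: int) -> MemoryCell:
        # Cells only exist on word boundaries in this model.
        if address % self.WORD:
            raise ValueError(f"unaligned access at {hex(address)}")
        return self.cells.setdefault(address, MemoryCell(address))


# A store is then just moving a Variable between two storages.
x0 = Register("x0")
x0.write(Variable("pointer to a heap object"))
memory = Memory()
memory.cell(0x2000).write(x0.read())
```

The ValueError on unaligned addresses is deliberate in this sketch: with one object per word, there is no natural way to express a byte-level write that straddles two cells.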

This design was a misstep that I’d rework if I came back to the project: modelling each memory word as an object’s entire storage cell precludes any sensible possibility of operating on a buffer of bytes and modifying data across word boundaries. The strategy I chose, however, does work quite well for application code that just stores pointers to heap-allocated objects in memory words and sends messages to them.

Symbol Modeling

Eventually, the simulated code is going to branch and, worse yet, branch to something imported from another binary. Instead of building a full dynamic linker, I added special support to the bl mnemonic handler to perform bespoke operations when certain branches were performed.

For example, if the simulator saw that a bl was being performed to the _random imported symbol, it could trap into its own in-house random() implementation.
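One way to picture this trap is a lookup table consulted by the bl handler before it branches. The sketch below assumes a hypothetical Machine class, a SYMBOL_MODELS table, and an imported_symbols map from branch target to symbol name; none of these come from the real simulator.

```python
import random
from typing import Callable, Dict


class Machine:
    def __init__(self) -> None:
        self.regs: Dict[str, int] = {"x0": 0, "lr": 0}
        self.pc = 0


def model_random(m: Machine) -> None:
    # Stand-in for libc random(): the return value goes in x0.
    m.regs["x0"] = random.randrange(2**31)


# Imported symbols that get an in-house model instead of a real branch.
SYMBOL_MODELS: Dict[str, Callable[[Machine], None]] = {
    "_random": model_random,
}


def handle_bl(m: Machine, target: int, imported_symbols: Dict[int, str]) -> None:
    return_address = m.pc + 4
    model = SYMBOL_MODELS.get(imported_symbols.get(target, ""))
    if model is not None:
        # Run the model, then continue as if the callee had returned.
        model(m)
        m.pc = return_address
        return
    # Ordinary call: set the link register and branch.
    m.regs["lr"] = return_address
    m.pc = target


trapped = Machine()
trapped.pc = 0x100
handle_bl(trapped, 0x9000, {0x9000: "_random"})

local = Machine()
local.pc = 0x100
handle_bl(local, 0x200, {})
```

Adding support for another imported function is then a matter of registering one more entry in the table.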

Much more interesting, though, is _objc_msgSend.

This implementation would produce fake objects on a virtual heap, instead of simulating the real Objective-C runtime. I ended up with my own itty bitty standard library.

I also wrote implementations of funny constructors like +[NSDictionary dictionaryWithObjectsAndKeys:]. It was interesting to see how these kinds of call sites work under the hood!

In this case, the compiler will place the first argument of the variadic list in x2. The compiler will arrange the rest of the arguments on the stack, starting with x2’s corresponding key. It’s the implementation’s responsibility to iterate the list on the stack, alternately popping off values and keys, until a NULL is reached.
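The iteration described above can be sketched as follows, with pointers reduced to plain integers and the stack reduced to a list read from the top down. The function name and the NULL constant are assumptions for illustration.

```python
from typing import Dict, List

NULL = 0


def dictionary_with_objects_and_keys(x2: int, stack: List[int]) -> Dict[int, int]:
    """Rebuild the dictionary from the variadic argument layout.

    `x2` holds the first object. `stack` holds the remaining arguments in
    order: key, object, key, ..., terminated by NULL.
    """
    result: Dict[int, int] = {}
    remaining = iter(stack)
    value = x2
    while value != NULL:
        key = next(remaining)    # every object is followed by its key
        result[key] = value
        value = next(remaining)  # next object, or the NULL terminator
    return result


# [NSDictionary dictionaryWithObjectsAndKeys: obj1, key1, obj2, key2, nil]
obj1, key1, obj2, key2 = 0xA0, 0xB0, 0xA8, 0xB8
built = dictionary_with_objects_and_keys(obj1, [key1, obj2, key2, NULL])
empty = dictionary_with_objects_and_keys(NULL, [])
```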

The Mach-O VM loader never got quite up to the standards of the real thing. Instead of faithfully following whatever was described in the Mach-O, I implemented special support for mapping various bits of the binary as the need arose:

Debugger

Simulators and emulators are notoriously difficult to debug, as often the errors only become visible in the higher-level logic of whatever you’re simulating.

I built a debugger that allowed me to run lldb on a real iOS device on one side, and the simulator on another, and run each forwards until the register or memory states of the simulator diverged from the real thing.
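The core of such a comparison is a lockstep diff. The sketch below assumes the per-instruction register states from both sides have already been collected into sequences of dictionaries; how they are gathered from lldb is out of scope here, and first_divergence is a hypothetical name.

```python
from typing import Dict, Iterable, Optional, Tuple

RegisterState = Dict[str, int]


def first_divergence(
    device_states: Iterable[RegisterState],
    simulator_states: Iterable[RegisterState],
) -> Optional[Tuple[int, Dict[str, tuple]]]:
    """Return (step, {register: (device, simulator)}) for the first mismatch."""
    for step, (expected, actual) in enumerate(zip(device_states, simulator_states)):
        diff = {
            name: (expected.get(name), actual.get(name))
            for name in sorted(expected.keys() | actual.keys())
            if expected.get(name) != actual.get(name)
        }
        if diff:
            return step, diff
    return None


device = [{"x0": 1, "x1": 0}, {"x0": 2, "x1": 0}, {"x0": 3, "x1": 8}]
simulator = [{"x0": 1, "x1": 0}, {"x0": 2, "x1": 0}, {"x0": 3, "x1": 0}]
divergence = first_divergence(device, simulator)
```

Reporting the first step at which the states differ points straight at the instruction handler that misbehaved, rather than at the downstream crash it eventually causes.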

Since this is a virtual environment, it was also straightforward for me to snapshot the machine state at every instruction, which facilitated reverse debugging (‘time travelling backwards’) to any previous execution point. I called this ‘visit mode’, since the REPL allowed you to run all the normal inspection commands (read, examine, print, and so on) as if execution was paused at a previous instruction pointer value.
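As a rough illustration of the snapshotting idea, a history can be as simple as a list of deep copies taken before each instruction. The History class and its record and visit methods are hypothetical; the real tool’s REPL commands are not reproduced here.

```python
import copy
from typing import Dict, List, Tuple


class History:
    def __init__(self) -> None:
        self.snapshots: List[Tuple[int, Dict[str, int]]] = []

    def record(self, pc: int, regs: Dict[str, int]) -> None:
        # Deep copy so later execution cannot mutate the saved state.
        self.snapshots.append((pc, copy.deepcopy(regs)))

    def visit(self, index: int) -> Tuple[int, Dict[str, int]]:
        # Hand back a copy so inspection cannot corrupt the history either.
        pc, regs = self.snapshots[index]
        return pc, copy.deepcopy(regs)


history = History()
regs = {"x0": 0}
for pc in range(0, 12, 4):
    history.record(pc, regs)
    regs["x0"] += 1  # pretend each instruction increments x0

visited_pc, visited_regs = history.visit(1)
```

Copying the whole state on every instruction is wasteful, but for short slices of execution it keeps reverse inspection trivial.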

Dynamic Linker

Eventually, I did branch out to really mapping and invoking other binaries! In this tiny demo, a binary loads a framework and successfully branches to one of its exported symbols:

I then moved on to CoreFoundation, and wrote a pile of hacks to get it working. In this demo, I’m dynamically loading CoreFoundation, and the runtime is creating real ObjC strings and arrays!

My lldb comparison tool was essential here. I found that I needed to execute the routines specified by LC_ROUTINES_64 so that CoreFoundation had a chance to create its Objective-C classes and populate them in __CFRuntimeObjCClassTable. CoreFoundation queried this table when trying to use _CFAllocatorAllocate.

It was incredible to watch the simulated CoreFoundation bootstrap the runtime by calling things like object_setClass() on __NSCFArray! I also found that I could force CoreFoundation to do everything in-house, instead of shelling out to Foundation, by patching the memory in CoreFoundation’s ___FoundationPresent.present flag to 0.

Testing the Simulator

I wrote a unit testing harness that allowed me to thoroughly test the implementations of these mnemonics, particularly trickier ones like ldr and str.

I locked down all sorts of behavior, such as the behavior of comparison flags:
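The tests themselves are not reproduced here. As a stand-in, the sketch below computes the NZCV flags that an AArch64 cmp (a subtraction whose result is discarded) would set, and pins a few cases down with assertions. cmp_flags and test_cmp_flags are hypothetical names.

```python
from typing import Dict

MASK64 = (1 << 64) - 1
SIGN_BIT = 1 << 63


def cmp_flags(a: int, b: int) -> Dict[str, bool]:
    """NZCV flags for `cmp a, b` on 64-bit registers."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64
    return {
        "n": bool(result & SIGN_BIT),                   # result is negative
        "z": result == 0,                               # operands are equal
        "c": a >= b,                                    # no unsigned borrow
        "v": bool((a ^ b) & (a ^ result) & SIGN_BIT),   # signed overflow
    }


def test_cmp_flags() -> None:
    assert cmp_flags(5, 5) == {"n": False, "z": True, "c": True, "v": False}
    assert cmp_flags(3, 5) == {"n": True, "z": False, "c": False, "v": False}
    # INT64_MIN - 1 overflows the signed range.
    assert cmp_flags(SIGN_BIT, 1) == {"n": False, "z": False, "c": True, "v": True}


test_cmp_flags()
```

Flag bugs are a good target for this kind of test because every conditional branch downstream silently depends on them.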

Propagating Unknown Data

The goal of this project wasn’t to simulate an application from start to finish, but rather to simulate a specific tree of execution to make decisions about the code’s behavior.

This means that the simulator natively works with a lot of unknown (‘unscoped’) data. For example, any arguments to the simulator’s chosen entry point will definitely be unscoped by the simulation.

This unscoped data is represented by one of a few special types, such as FunctionArgumentObject and NonLocalVariable. These objects propagate themselves when the simulated code tries to use them. For example, sending a message or accessing a field of a NonLocalVariable will spawn a NonLocalDataLoad as an output. It’s all a pile of hacks, but it works well enough.
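The propagation can be pictured with a small sketch. The three class names come from the text above, but the inheritance relationship between them, the origin attribute, and the send_message and load_field methods are guesses made purely for illustration.

```python
class NonLocalVariable:
    """A value the simulation knows nothing about, tagged with its origin."""

    def __init__(self, origin: str) -> None:
        self.origin = origin

    def send_message(self, selector: str) -> "NonLocalDataLoad":
        # The result of messaging unknown data is itself unknown.
        return NonLocalDataLoad(f"[{self.origin} {selector}]")

    def load_field(self, name: str) -> "NonLocalDataLoad":
        return NonLocalDataLoad(f"{self.origin}->{name}")


class NonLocalDataLoad(NonLocalVariable):
    # Assumption: modelled as a subclass so that loads can chain further.
    pass


class FunctionArgumentObject(NonLocalVariable):
    def __init__(self, index: int) -> None:
        super().__init__(f"arg{index}")


arg = FunctionArgumentObject(0)
length = arg.send_message("name").send_message("length")
field_load = arg.load_field("_delegate")
```

Recording where each unknown value came from keeps later output readable, since an unknown can be described as the result of a specific chain of messages rather than as an opaque placeholder.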

Sometimes, the simulated code will, fairly, try to access an ivar from an object that was created by the simulator’s fake Objective-C runtime. This would yield an UninitializedVariable instance if we didn’t do anything else, which is no fun and generally causes the simulated code to complain. So, the simulator will walk the ivar table and instantiate dummy NSObjects to pop into these fields.

It’s difficult to say what to do when the simulated code performs a conditional branch that depends on unscoped data. So, I made the simulator split into two trees of execution: one where the condition was true, and one where it was false. Both paths would then be followed.

This caused a lot of wasted work, because every path where if ((self = [super init])) {} fails was simulated. I eventually improved things such that execution was only split in two when necessary. This allowed me to correctly follow conditional branches when all the implicated data was available.
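A minimal sketch of splitting only when necessary is shown below. It assumes a condition is either a known boolean or None for unscoped data; Path, branch_if, and the constraint tuples are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

UNKNOWN = None  # stands in for a condition that depends on unscoped data


@dataclass(frozen=True)
class Path:
    pc: int
    constraints: Tuple[Tuple[str, bool], ...] = ()


def branch_if(path: Path, name: str, condition: Optional[bool],
              taken_pc: int, fallthrough_pc: int) -> List[Path]:
    if condition is not UNKNOWN:
        # All implicated data is available: follow the one real outcome.
        return [Path(taken_pc if condition else fallthrough_pc, path.constraints)]
    # Otherwise fork, remembering what each subtree assumed.
    return [
        Path(taken_pc, path.constraints + ((name, True),)),
        Path(fallthrough_pc, path.constraints + ((name, False),)),
    ]


start = Path(pc=0x100)
known = branch_if(start, "self != nil", True, 0x120, 0x104)
forked = branch_if(start, "arg0 != nil", UNKNOWN, 0x120, 0x104)
```

With this shape, a check on an object the simulator created itself resolves to a single path, and only checks on truly unknown data double the work.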

Observing Outcomes

A simulator isn’t much good without some way to observe what the simulated code is doing. The simulated code’s output is obvious in the GUI-centric demo up top, but it’s less clear how this works when we’re simulating code without any UI.

For my use case, I wanted to observe the system state at various different instruction pointer values. This allowed me to assemble human-consumable stack traces with all the dynamic arguments to each function filled in.

The simulator API allowed the programmer to specify all the instruction pointer values that they were interested in observing. The simulator would then follow all the execution trees, splitting off into subtrees with different sets of constraints when the simulator wasn’t sure which direction of a conditional was correct. At the end, all the machine snapshots across all the possible execution trees were returned to the programmer, who could then inspect the register and memory state at each snapshot.
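The shape of such an API might resemble the toy below, which is not the real interface. It assumes a three-instruction toy ISA (mov, cbz, ret), a program keyed by address, None as the marker for unscoped register contents, and no loop detection; simulate and its parameters are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Regs = Dict[str, Optional[int]]               # None marks unscoped data
Snapshot = Tuple[int, Regs, Tuple[str, ...]]  # (pc, registers, constraints)


def simulate(program: Dict[int, tuple], entry: int,
             regs: Regs, watched: Set[int]) -> List[Snapshot]:
    snapshots: List[Snapshot] = []
    worklist = [(entry, dict(regs), ())]
    while worklist:
        pc, state, constraints = worklist.pop()
        while pc in program:
            if pc in watched:
                snapshots.append((pc, dict(state), constraints))
            op = program[pc]
            if op[0] == "mov":
                state[op[1]] = op[2]
                pc += 4
            elif op[0] == "cbz":
                value = state[op[1]]
                if value is None:
                    # Unknown: queue the zero case, continue with non-zero.
                    worklist.append((op[2], {**state, op[1]: 0},
                                     constraints + (f"{op[1]} == 0",)))
                    constraints += (f"{op[1]} != 0",)
                    pc += 4
                else:
                    pc = op[2] if value == 0 else pc + 4
            else:  # ret ends this execution path
                break
    return snapshots


program = {
    0x00: ("cbz", "x0", 0x0C),
    0x04: ("mov", "x1", 1),
    0x08: ("ret",),
    0x0C: ("mov", "x1", 2),
    0x10: ("ret",),
}
results = simulate(program, 0x00, {"x0": None, "x1": 0}, watched={0x08, 0x10})
```

Each returned snapshot carries the constraints of the subtree it came from, so the caller can tell which assumptions led to the observed register state.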

Fun Bugs

Since the simulator runs untrusted code, I wanted to make sure that the simulator could gracefully handle infinite loops in the simulated code. I added a basic loop detector to ensure that the simulator always terminated.

One day, though, the simulator got stuck despite my initial efforts.

Like I mentioned above, the simulator had special support for certain functions that I had specifically modeled. For functions undefined by the simulator, though, the simulator would just pop a NonLocalVariable into x0 and carry on.

This works fine, unless the function being called is abort()! In this case, a code path tried to abort(), then the code that happened to immediately follow ended up doing a backwards jump. The result was the world’s most polite infinite loop, in which the user code asked the simulator to stop on every iteration, but the request fell on deaf ears.

The fix here was simple: I modeled abort() to terminate the current execution path, and improved my loop detector.
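Both halves of the fix can be sketched on a toy program. The loop detector below is a crude per-address visit counter, which is an assumption and not necessarily how the real detector works; PathTerminated, model_abort, and run_path are hypothetical names.

```python
from typing import Dict


class PathTerminated(Exception):
    """Raised by a model to end the current execution path."""


def model_abort() -> None:
    raise PathTerminated("abort() called")


def run_path(program: Dict[int, tuple], entry: int, max_visits: int = 1000) -> str:
    visits: Dict[int, int] = {}
    pc = entry
    try:
        while pc in program:
            visits[pc] = visits.get(pc, 0) + 1
            if visits[pc] > max_visits:
                return "loop detected"
            op = program[pc]
            if op[0] == "bl":
                if op[1] == "_abort":
                    model_abort()
                pc += 4  # unmodeled call: pretend it returned and carry on
            elif op[0] == "b":
                pc = op[1]
            else:
                pc += 4
    except PathTerminated:
        return "terminated"
    return "finished"


# abort() followed by a backwards jump, as in the bug described above.
polite_loop = {0x0: ("bl", "_abort"), 0x4: ("b", 0x0)}
# The same shape with an unmodeled call: only the loop detector saves us.
real_loop = {0x0: ("bl", "_puts"), 0x4: ("b", 0x0)}

abort_outcome = run_path(polite_loop, 0x0)
loop_outcome = run_path(real_loop, 0x0)
```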


This project offered many satisfying problems along the way. It’s always fun to paint yourself into a big system with its own quirks and constraints, then find ways out of them. Thanks for following along!
