Glitching the Olimex LPC-P1343

2023-01-17 10:14:46

Back in the summer I was fortunate enough to finagle my boss into letting me take Dmitry Nedospasov (@nedos)’s hardware hacking training. In it I cut my teeth on using an FPGA to interface with target hardware. After implementing a UART, we implemented a module that could parse part of Apple’s OneWire, used to negotiate power exchange, among other things, with your iPhone over the Lightning cable. Our ‘final project’ was to build a UART-controllable glitcher and use it to try to glitch a development board.

While we got it working, it was with a bit of hand-holding from Dmitry, including setting up the scope, the power supply, etc. To cement the knowledge, I decided when I got home that I wanted to do it again on my own.

If you haven’t noticed, hobbyist FPGAs have flooded the market. In 2015 Lattice’s iCE40 series FPGA had its bitstream format reverse engineered, spawning an explosion of open source tooling for synthesis, place and route, and simulation. Altera’s (Intel’s) and Xilinx’s bitstream formats haven’t been reverse engineered, so you are stuck using their tools if you decide to develop on those boards. I’ve played around with a few boards, and the iCEBreaker is my current favourite. The people on their Discord are super helpful, the toolchain is great, and the board itself is great for the price.

Porting it over wasn’t too much work. The only real difference was that the FPGA we used in training was a Digilent Arty, which has a 100MHz clock, whereas the iCEBreaker has a 12MHz clock. This requires that we modify anything that counts cycles to account for the 8.33x slower clock, and we lose some granularity in anything we want to time (since each clock cycle has a longer duration). This is also an opportunity to generalize some of the code so that it makes fewer assumptions about the FPGA it’s running on. Because I was rusty, I chose to try to ‘blindly’ rewrite some of the modules instead of using the code Dmitry has on his GitHub.
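To make the clock arithmetic concrete, here is a minimal sketch of rescaling a cycle count from one clock to the other. The helper names are made up for illustration and are not taken from the actual design; only the two clock frequencies come from the text above.

```python
ARTY_HZ = 100_000_000       # Digilent Arty clock used in the training
ICEBREAKER_HZ = 12_000_000  # iCEBreaker clock


def rescale_cycles(cycles: int, from_hz: int, to_hz: int) -> int:
    """Return the cycle count at to_hz that covers roughly the same time span."""
    if cycles <= 0:
        return 0
    # Integer round-to-nearest; a nonzero duration never collapses to 0 cycles.
    scaled = (cycles * to_hz + from_hz // 2) // from_hz
    return max(scaled, 1)


def cycle_ns(hz: int) -> float:
    """Duration of a single clock cycle in nanoseconds."""
    return 1e9 / hz


if __name__ == "__main__":
    for arty_cycles in (1, 30, 100):
        print(arty_cycles, "->", rescale_cycles(arty_cycles, ARTY_HZ, ICEBREAKER_HZ))
    print(f"{cycle_ns(ICEBREAKER_HZ):.1f} ns per iCEBreaker cycle")
```

Anything shorter than one 12MHz cycle rounds up to a single cycle, which is exactly the loss of granularity mentioned above.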

glitcher setup

I can’t really give a better background to this than Dmitry does in his blog post about it. In short, when the target board boots up, the boot ROM reads the flash, and depending on the value it reads from address 0x2FC and the state of some pins, it determines whether the UART goes to a sort of shell, and whether you can use this shell to read out the flash. This is intended so that you can develop your firmware and debug in the bootloader, but then flash a version that sets this value when your firmware is production ready, hopefully preventing the end user from dumping it from the flash. The vulnerability here is that it’s a 4-byte value, and only a specific value (0x12345678) will lock the bootloader in the expected way. That means that if any of the 32 bits read here are read incorrectly, the boot ROM will consider the device unlocked. This is as opposed to, for example, requiring a specific value to unlock the bootloader and having the other 4 billion values lock it.
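As a toy model of why this check is fragile, the sketch below treats the lock decision as an exact 32-bit comparison, as described above. It is a simplification for illustration, not the real boot ROM logic, and the function names are invented.

```python
LOCK_VALUE = 0x12345678  # the value that, per the description above, locks the bootloader


def is_locked(word_read: int) -> bool:
    """Simplified model: the device is locked only on an exact 32-bit match."""
    return (word_read & 0xFFFFFFFF) == LOCK_VALUE


def single_bit_flips(word: int):
    """Yield the 32 values that differ from word by exactly one bit."""
    for bit in range(32):
        yield word ^ (1 << bit)


# Every possible single-bit misread of the lock value leaves the device unlocked.
unlocked_after_flip = sum(not is_locked(v) for v in single_bit_flips(LOCK_VALUE))

if __name__ == "__main__":
    print(f"{unlocked_after_flip} of 32 single-bit misreads unlock the device")
```

A design that required a magic value to unlock would behave the opposite way: a corrupted read would almost always fail closed.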

If we can get the CPU to misread the flash at the very moment it happens to be reading that value, then we can have it jump to the bootloader in the unlocked state. It’s as easy as that! (Famous last words)

The idea here is that we will use the FPGA as a tool that sits between my host machine and the target board. We communicate with it over UART: certain specific bytes are interpreted as commands for the FPGA, while other values are simply passed through to the target board (to talk to its bootloader). The values sent back from the target board are passed directly through to the host machine. The FPGA supports configuring the delay between resetting the target board and pulsing a ‘glitch voltage’, and how long that glitch pulse lasts. It also supports sending multiple pulses, and of course can reset the target board and trigger the glitch.
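The command/passthrough split can be sketched in a few lines of Python. The command byte values and argument sizes below are purely hypothetical placeholders; the real encoding lives in the Verilog and is not spelled out in this post.

```python
# Hypothetical command bytes and argument sizes, chosen only for illustration.
CMD_SET_DELAY = 0x01  # assumed: followed by a 2-byte little-endian cycle count
CMD_SET_WIDTH = 0x02  # assumed: followed by a 1-byte cycle count
CMD_GLITCH = 0x03     # assumed: no argument
ARG_BYTES = {CMD_SET_DELAY: 2, CMD_SET_WIDTH: 1, CMD_GLITCH: 0}


def route(host_bytes: bytes):
    """Split a host byte stream into FPGA commands and target passthrough bytes."""
    commands = []
    passthrough = bytearray()
    i = 0
    while i < len(host_bytes):
        first = host_bytes[i]
        i += 1
        if first in ARG_BYTES:
            n = ARG_BYTES[first]
            arg = int.from_bytes(host_bytes[i:i + n], "little")
            commands.append((first, arg))
            i += n
        else:
            # Anything that is not a command byte goes straight to the target.
            passthrough.append(first)
    return commands, bytes(passthrough)


if __name__ == "__main__":
    stream = bytes([CMD_SET_DELAY, 0xA8, 0x02, CMD_SET_WIDTH, 15, CMD_GLITCH]) + b"?"
    print(route(stream))
```

One consequence of this kind of in-band scheme is that the command bytes must be values the host never needs to send to the target's bootloader.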

We use an FPGA here instead of a microcontroller for a few reasons:

  • First, because we can configure things at the clock level, we can have very precise timing (1s/12000000 = 83.3ns precision).
  • Second, for the same reason, we don’t have to worry about jitter. With a Raspberry Pi we’d worry about the OS scheduling other processes and such, contributing to inconsistency between runs. Even with an Arduino or other microcontroller with no operating system, we’d have to worry about interrupts messing up the timing.
  • Third, I wanted to get more practice writing Verilog and using an FPGA.

For the toolchain, I mostly took everything from WTFpga, which is a beginner’s lab that uses the iCEBreaker board. It uses Yosys, nextpnr, and a few tools from Project IceStorm. These are all open source tools that you can invoke from the command line. You don’t need a GUI (so I don’t have to run Vivado in a VM), and the build time is much, much faster than the couple of minutes it takes to get a synthesis to fail with obscure errors in Vivado. This really sped me up because I’m not disciplined: instead of inspecting my code for errors ahead of time, I tend to compile, patch, and iterate until it builds.

For debugging I used PulseView (part of sigrok) to look at real signals, and GTKWave to look at my simulated waveforms.

For hardware, I obviously used the iCEBreaker, as well as the Olimex target board. My benchtop supply is a DC50V5A, a cheap but functional configurable buck converter I got on AliExpress. While I have a Saleae, I prefer sigrok, and at the speeds I was running things at, a cheap 24MHz logic analyzer was enough. In the end I had some trouble debugging something using that alone and borrowed an oscilloscope, but in hindsight it wasn’t necessary, just nice to have.

Again, I’m just porting over Dmitry’s design, so here’s the block diagram stolen from his blog:

glitcher block diagram

The tl;dr:

  • The cmd module intercepts everything the host computer sends over UART. Based on the first byte it either interprets it as a command for the FPGA, or passes it through to the target.
  • The resetter simply holds the reset line low long enough for the target to reset (instead of one cycle).
  • The delay module starts counting on reset and waits the configured number of cycles before sending its own signal.
  • The trigger module waits for the delay to finish and then tells the pulse module to send a pulse.
  • The pulse module is a lot like the delay module, except that it uses a different config, and its output is connected to the power multiplexer.
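The resetter, delay, and pulse stages chain into a simple timeline, which can be modelled cycle by cycle. This is a behavioural sketch only, under two assumptions that should be checked against the real Verilog: the delay is counted from the release of reset, and both reset and vout idle high and are active low.

```python
def glitch_timeline(reset_cycles: int, delay: int, width: int):
    """Return a list of (reset_n, vout) levels, one tuple per clock cycle."""
    timeline = []
    timeline += [(0, 1)] * reset_cycles  # resetter holds reset low
    timeline += [(1, 1)] * delay         # delay module counts, nothing else happens
    timeline += [(1, 0)] * width         # pulse module drives vout low
    timeline.append((1, 1))              # back to idle
    return timeline


if __name__ == "__main__":
    t = glitch_timeline(reset_cycles=60, delay=680, width=15)
    first_glitch_cycle = t.index((1, 0))
    print("glitch starts at cycle", first_glitch_cycle, "and lasts", t.count((1, 0)), "cycles")
```

A model like this is handy as a reference when checking simulated or captured waveforms against the configured delay and width.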

This is all controlled by a Python script that talks to the UART, first configuring the FPGA, then triggering a glitch, and then communicating with the target. It determines whether it can read out the flag, and if it can’t, it adjusts the delay and pulse width configs and tries again.
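The overall shape of such a control loop might look like the sketch below. It is not the actual script: `try_glitch` is a hypothetical callback standing in for "configure the FPGA, fire the glitch, and check whether the target answers as if unlocked", and the loop order (all widths for one delay, then the next delay) is an assumption.

```python
from typing import Callable, Iterable, Optional, Tuple


def sweep(delays: Iterable[int], widths: Iterable[int],
          try_glitch: Callable[[int, int], bool]) -> Optional[Tuple[int, int]]:
    """Try every (delay, width) pair until try_glitch reports success."""
    widths = list(widths)  # reused for every delay value
    for delay in delays:
        for width in widths:
            if try_glitch(delay, width):
                return delay, width
    return None


if __name__ == "__main__":
    # Stand-in target that only "unlocks" for one made-up parameter pair.
    def fake_target(delay: int, width: int) -> bool:
        return (delay, width) == (100, 7)

    print(sweep(range(60, 200), range(1, 26), fake_target))
```

Keeping the hardware access behind a single callback makes it easy to dry-run the search logic against a fake target before touching the board.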

The FPGA has two inputs: the UART from the host and the UART from the target board. It has four outputs: the UART to the host, the UART to the target board, a reset line to reset the target, and a vout that is used to control the analogue multiplexer, to quickly drop the voltage powering the target board.

While you can certainly test on a real FPGA, it’s very difficult to see what’s going on inside it. You can blink LEDs or, if you have the hardware, use seven-segment displays to output whatever value is relevant. I found this very helpful and ended up buying a second one just so that I could tell both my pulse width and delay count at a glance.

That said, even with the rather quick tools, flashing and debugging with a logic analyzer is a lot slower than ideal. I suggest setting up a good test bench and running simulations.

Simulation

I used Icarus Verilog to simulate the various modules, and GTKWave to look at the waveforms it generates. The basic idea is that you write extra Verilog that simulates the inputs to your top module, and then verify that the internal signals behave as expected. As opposed to running on real hardware, it’s easy to inspect any internal value at any point in time. You can also write test benches for any individual module to make sure that each part behaves as expected before combining them.

Here we see my simulation of the whole glitcher. I send a few configurations, and then some commands that should be passed through to the target.

glitcher commands simulation

And here is what happens when the glitch command is sent. We see the reset line go low, then there is a delay based on the delay configuration we previously set, and then the vout line goes low for a period of time determined by the pulse width configuration.

glitcher pulse simulation

While a simulation is nice, the real test is when you see it work on real hardware, which I was able to see here with my logic analyzer:

glitcher logic analyzer dump

The Olimex board actually runs faster than my FPGA, and in training our FPGA was more than 8 times faster. In practice our successful glitch had a very short pulse, in the tens of cycles (at 100MHz). With a 12MHz clock (I thought) I was in trouble. There is significantly less granularity in pulse widths, and we risk the ideal pulse width being somewhere between n cycles and (n+1) cycles.

I tried to fix this by using a PLL, which allows you to generate a clock that is faster than the input clock by some multiple. Again, IceStorm to the rescue here: I was able to use icepll to generate most of the code needed to generate a 48MHz clock from a 12MHz input.

I kept most modules on the main clock, but fed my new fast_clk to the pulse module. I adjusted my testbench to generate the faster clock and was able to verify that I could generate shorter pulses (with four times the granularity). I actually found a bug in my pulse module here. Since my pulse module was running on a clock four times faster than the rest of the system, including the module that enables the pulse, with short width values my pulse was ending before the enable signal was deasserted. This caused the pulse module to immediately start a second pulse. I fixed this by adding an extra state to the module that made it wait until after the enable signal was deasserted before returning to the waiting state.
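The double-pulse bug is easy to reproduce in a small behavioural model. The Python below is not the Verilog module, just a sketch of the same idea: the state names are invented, and the enable signal is assumed to stay high for four fast-clock ticks (one slow-clock cycle).

```python
def run_pulse_fsm(enable, width, wait_for_release):
    """Simulate the pulse state machine on fast-clock ticks; return the output per tick."""
    state = "IDLE"
    remaining = 0
    out = []
    for en in enable:
        if state == "IDLE":
            if en:
                state, remaining = "PULSE", width
        elif state == "PULSE":
            remaining -= 1
            if remaining == 0:
                # The fix: park in WAIT until enable drops instead of going idle.
                state = "WAIT" if wait_for_release else "IDLE"
        elif state == "WAIT":
            if not en:
                state = "IDLE"
        out.append(1 if state == "PULSE" else 0)
    return out


def count_pulses(out):
    """Count rising edges in the output trace."""
    return sum(1 for prev, cur in zip([0] + out, out) if cur and not prev)


if __name__ == "__main__":
    enable = [1] * 4 + [0] * 12  # one slow-clock cycle seen from the 4x clock
    print("buggy:", count_pulses(run_pulse_fsm(enable, 1, wait_for_release=False)))
    print("fixed:", count_pulses(run_pulse_fsm(enable, 1, wait_for_release=True)))
```

With a width shorter than the enable window, the version without the extra state retriggers, while the version that waits for enable to drop produces a single pulse.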

Unfortunately, when I ran this on real hardware, I found that my pulse width seemed to be consistently the same value. I was unable to debug this and decided to see if I could get the glitch to work without the finer granularity (spoiler: I could).

Once I had my simulation looking good, and I was able to see that the behaviour in the real world looked like what I was expecting, it was time to actually glitch the board.

Modifying the board

To have the CPU misread the lock value we want the voltage to drop at precisely the time when it’s reading the appropriate address from flash. Because there are decoupling capacitors on the board, which would smooth out any abrupt voltage change and make this much more difficult, I had to remove those capacitors. I also cut the traces between the board’s voltage regulator and VCC and VCCIO, so that it’s powered only from the output of my analogue multiplexer. This is all well documented in the third part of Dmitry’s blog series.

Determining the supply voltage

Because we don’t live in a frictionless vacuum where infinitely fast changes in voltage are possible, even without decoupling capacitors the CPU won’t see an instantaneous change in voltage when we toggle vout. Because of this, we want to minimize the time the voltage change takes, and so we run the board at the lowest voltage we can find where the board behaves normally.

For me this was 2.30V, which I determined by running a loop where I continuously reset the board and then tried to communicate with the bootloader. I adjusted the voltage on my benchtop supply while this was running until it was just barely high enough to reliably read “Synchronized” after sending the “?” bootloader command. The glitch voltage for my setup was simply 0V, but this was mostly due to me only having a single-channel power supply. You might be able to reproduce the glitch more reliably if, instead of glitching between X and 0, you glitch between X and Y, but if I got it to work with 0V, you can, too.
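A loop like the one described could be structured as below. The transport is deliberately left abstract: `send_and_read` and `reset` are hypothetical hooks you would wire up to your own serial port and reset line. Only the “?” request and the “Synchronized” reply come from the text above.

```python
def is_synchronized(send_and_read) -> bool:
    """Send the '?' sync request and check for the 'Synchronized' reply."""
    reply = send_and_read(b"?")
    return b"Synchronized" in reply


def success_rate(send_and_read, reset, attempts: int = 20) -> float:
    """Reset the board repeatedly and report how often the bootloader syncs."""
    ok = 0
    for _ in range(attempts):
        reset()
        if is_synchronized(send_and_read):
            ok += 1
    return ok / attempts


if __name__ == "__main__":
    # Fake hooks so the sketch runs without hardware attached.
    healthy = lambda tx: b"Synchronized\r\n"
    browned_out = lambda tx: b""
    print(success_rate(healthy, reset=lambda: None))
    print(success_rate(browned_out, reset=lambda: None))
```

Watching this rate while turning the supply down gives a rough feel for where the board stops being reliable.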

The moment of truth

I connected the reset line of my FPGA to the reset on the board, connected vout to the control pin of the multiplexer, tied Vglitch to ground, and then tied Vcc to the 2.3V I determined to be the minimum stable voltage.

glitcher setup

A friend from work let me borrow his oscilloscope, so I was able to see in real time the reset signal followed by the vout signal pulsing and the corresponding voltage drop in Vcc.

oscilloscope view

The yellow line is the reset signal, which is held low for 5µs, and then vout, the cleaner purple signal, is toggled delay cycles later, for pulse width cycles. The cyan signal is the actual voltage seen, which we can see is ‘dirtier’ than the purely logic-level vout signal.

Given enough time you can basically brute force any combination of delay and width, but knowing that the board boots in under 100µs, and knowing that with pulses that are too wide we never get a stable system, I played with delays between 60 and 1200 cycles (5µs to 100µs) and pulses between 1 and 25 cycles.
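The numbers above are easy to sanity-check. The sketch below converts the microsecond bounds to 12MHz cycles and counts how many (delay, width) combinations the search space contains, assuming both ranges are inclusive; the helper name is made up.

```python
CLOCK_HZ = 12_000_000  # iCEBreaker clock


def us_to_cycles(us: float) -> int:
    """Convert a duration in microseconds to a whole number of clock cycles."""
    return round(us * CLOCK_HZ / 1_000_000)


delays = range(us_to_cycles(5), us_to_cycles(100) + 1)  # 60..1200 cycles
widths = range(1, 26)                                    # 1..25 cycles
attempts = len(delays) * len(widths)

if __name__ == "__main__":
    print(f"delays {delays.start}..{delays.stop - 1}, widths {widths.start}..{widths.stop - 1}")
    print(f"{attempts} combinations to try")
```

A few tens of thousands of attempts is a manageable brute force, which is why narrowing the ranges this way pays off.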

I then ran my script, which brute forced through the range of delay and pulse width values until I was treated to a nice dump of the flash I was not supposed to read! You can see in the video below that the width of the pulse gets wider until it reaches its max, at which point the delay is incremented and the width values are all tried again.


And with a certain width and delay (15 and 680 in my case) our script dumps out the flash!

glitcher pulse simulation

If we look at the lock value at 0x2fc, we see that it’s 0x12345678 as we expect:

glitcher pulse simulation

We don’t see a glitched-out value because this dump is from after we’ve glitched the bootloader, with the voltage back at its stable value. At this point in time the bootloader has already (incorrectly) determined that the bootloader is unlocked, and so all subsequent reads succeed. We don’t know exactly what happened during the glitch: the read could have misread the value from flash, the comparator could have misbehaved, the conditional flag could have not been set, the jump instruction could have been skipped, or any number of other weird things. What we do know, however, is that we got what we want: the bootloader running stably in a state where it thinks the flash isn’t locked down.

Of course in real life things didn’t go this smoothly. While testing it out I found a bunch of bugs, including that if my delay was too short it would send a pulse while the reset was still low, and that my pulse width counter had an off-by-one in counting its cycles, among other things. So don’t worry if you try this and have issues as well: isolate the issue, make sure your simulation works, compare to real life, and debug.

Conclusions

I found this to be a really fun personal project. I find repetition key to remembering things, and so rewriting modules I had written months ago, relearning Verilog syntax, and hitting the same issues while debugging (hopefully) helped me to remember it for next time.

On the FPGA side, the open source toolchain I used was very easy to use, and being able to quickly build both simulations and the real bitstream helped me iterate much more quickly. It was well worth my time to get my Makefile running well.

On the glitching side, I found that it really demystified the whole thing for me. If I can whip up something that’s apparently precise enough to (sort of) reliably glitch a target board running in the dozens of MHz, I must be doing something right.

I wouldn’t have been able to do this without a lot of help from Dmitry. Even now, months after taking his training, he is willing to help me with whatever dumb questions I have. Seriously consider taking his training. I had Dmitry look at this post before publishing and he offered a discount code! So if you sign up, use the code grazfather for 5% off!

Since I was simultaneously trying to figure out glitching stuff while also relearning Verilog and trying to do it in a new ecosystem, I needed and got a lot of help from the great people on the 1bitsquared Discord, so big thanks to them! If you’re looking for your own FPGA board, I can’t recommend the iCEBreaker enough!

My code is available on GitHub, which includes the Verilog to configure the FPGA, as well as the Python script used to orchestrate everything. It’s heavily based on nedos’s own version.


Last modified on 2019-12-08


