Advanced PIO equipment – Dmitry.GR

2023-04-09 19:43:24

Using RP2040 PIO to drive a poorly-designed display

Table of Contents

  1. The givens
    1. Initially…
    2. The problems
    3. PIO
  2. Towards a solution
    1. The math
    2. The 16bpp mode
    3. Touch
    4. The 4bpp mode
    5. 1bpp and 2bpp modes
    6. The 8bpp indexed-colour mode
  3. Some more details
    1. DMA chaining woes
    2. Why no nPENIRQ
    3. SD card access
    4. Clock speeds
    5. Setup & use
  4. Comments…

The givens

Initially…

Demo of the 16 bpp mode

I was looking for a display with at least 160×160 resolution and a resistive touch screen that would mate easily to a Raspberry Pi Pico for my rePalm project. There were not many, actually. But I did come across this screen at Waveshare: a 2.8inch 320×240 full-colour LCD. Without much further looking, I ordered a few and started planning the code to drive it. The goal was to have a framebuffer supporting 1, 2, and 4 bits per pixel greyscale, 8 bits per pixel indexed colour, and 16 bits per pixel full colour modes, while getting touch data in realtime. As this display does not support indexed colour or greyscale, that would need to be locally synthesized somehow. Additionally, I saw a few different articles out there about how people drive this display, and they were all atrociously inefficient, so I figured I would publish a way to do it right and save everyone else the trouble and pain.

Presented here is a very fast driver for this display supporting 1, 2, and 4 bpp greyscale, 8-bit indexed colour, and 16-bit full-colour modes on the whole display or any rectangular subset of it. Touch data is automatically copied to a memory location for you to peruse at your leisure, and if you wish, you can even do tear-free page flipping and get VSync interrupts. No CPU cycles are used at all! A real, proper driver. So, why is driving this display properly a pain? Well…

The problems

Waveshare provides sample code for this device. It is beyond bad. The code assumes that you want to actively draw to the display: actually send pixels or rectangles of data to it manually, whenever you want to draw. I have no idea who would actually do this sort of thing. It is pure insanity both in terms of code size and in terms of speed. The various articles I found from others on using this display did similarly insane things, like DMA-ing a line of data at a time, and using the CPU to set up the next xfer. Oof… Luckily the datasheet for the LCD controller, the datasheet for the touch controller, and the schematic for this board are available.

Now, normally you would just set up a repeating DMA to the SPI controller to send the data to the screen repeatedly and call it done, but not this time. The jokers who designed this board apparently never considered that someone might want to actually use it. While there is a vast abundance of pins to use, they stuck the display, touch, and SD card all on the same SPI bus. They did provide some solder bridges that can be used to switch the SD card to its own bus, but the touch controller and the LCD are stuck sharing the bus. So, if I were to set up a repeating DMA to the screen, there would be no time to talk to the touch controller. I could use a timer to schedule the DMA, and do touch sampling, but that requires CPU involvement. I wanted a solution that used no CPU at all.

PIO

As I was using the RP2040, I decided to see if I could use PIO to solve this problem. Each PIO state machine is rather simple, containing only 2 general purpose registers, 2 shift registers, and space for at most 32 instructions (actually the 32 are shared among each group of 4 state machines). They lack any ability to do math. The state machines can, however, send and receive interrupts, including from each other, and can send and receive data via DMA. I figured that with some effort I might be able to cobble together a working CPU-free driver for this display and touch screen.

Towards a solution

The math

First, some math. The stated maximum SPI clock frequency that this display controller chip supports is 62.5MHz. It takes 16 bits to send a single full-colour RGB565 pixel, and we have 320×240 of them. This means the maximum possible framerate in ideal conditions is 62.5e6/320/240/16 = 50.86fps. We’ll not reach this. In ideal conditions, sampling X, Y, and Z from the touch controller takes 51 cycles, and the maximum possible SPI clock rate when talking to the touch controller is 2.5MHz. This means a complete X, Y, Z sample takes 51/2.5e6 = 20.4 microseconds. Since touch is noisy, it needs to be sampled rather often to provide enough samples for smoothing. I targeted 400Hz. So, per second, touch sampling would steal 20.4µs*400 = 8.16ms. With that removed, our new maximum possible frame rate for screen updating is (1-.00816)*62.5e6/320/240/16 = 50.45fps. This accounts for 16bpp mode as well as the 8bpp indexed colour mode (since we’ll be sending it 16bpp data). For the greyscale modes, we’ll put the display into 12bpp mode to save on SPI traffic. For that, our maximum possible framerate will be (1-.00816)*62.5e6/320/240/12 = 67.26fps. Not bad. In theory… Now, we just need to sort out how to make it all work.
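The arithmetic above is easy to re-run for other clock rates or touch sampling targets. The following is only a back-of-the-envelope sketch in Python; the constant and function names are made up for illustration and are not part of the driver.

```python
# Frame-rate budget using only the figures quoted in this section.
DISPLAY_SPI_HZ = 62.5e6    # stated maximum SPI clock of the display controller
TOUCH_SPI_HZ = 2.5e6       # stated maximum SPI clock of the touch controller
WIDTH, HEIGHT = 320, 240
TOUCH_CYCLES = 51          # clocks needed for one X, Y, Z sample
TOUCH_RATE_HZ = 400        # targeted touch sampling rate


def touch_overhead():
    """Fraction of every second spent talking to the touch controller."""
    return TOUCH_CYCLES / TOUCH_SPI_HZ * TOUCH_RATE_HZ


def max_fps(bits_per_pixel, with_touch=True):
    """Best-case frames per second for a given on-the-wire pixel size."""
    ideal = DISPLAY_SPI_HZ / (WIDTH * HEIGHT * bits_per_pixel)
    return ideal * (1 - touch_overhead()) if with_touch else ideal


if __name__ == "__main__":
    for bpp in (16, 12):
        print(f"{bpp} bits/pixel: {max_fps(bpp):.2f} fps")
```

These are ceilings under ideal conditions, so any real implementation should be expected to land somewhat below them.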

The 16bpp mode

Flowchart of the 16 bpp mode

The 16bpp mode is the simplest, since the display natively supports RGB565 full-colour mode, so it made sense to start there. Things are pretty simple here. State machine 0 (SM0) here will ingest data, 16 bits at a time, and shift it out MSB first SPI-like to the display. But, remember, we need to sample touch. New plan. We set the X register to some value, say 2560. We then lower the display’s nCS, and send it data. We send pixels until we’ve sent X of them (this is approx 1/30 of a screen’s worth of pixels). After this we raise the display’s nCS and signal an irq to State machine 2 (SM2). We then wait for an irq back from it. SM2 was waiting all along. It lowers the touch controller’s nCS, gathers a sample, raises the touch controller’s nCS, and signals an irq to SM0, which continues sending data to the screen. This should work and it does.

We’ll need some DMA channels to support this. Channel 0 will send raw screen data to SM0; when done, it will trigger channel 1, whose only job will be to re-start and re-trigger channel 0. Thus the display is constantly refreshing from our framebuffer. Touch is a bit more complex. To gather our samples we need to send three commands to the chip, and get three 12-bit responses. DMA channel 3 will program channel 2, in sequence, to first send the three commands to SM2’s TX buffer, then receive the three data points from SM2’s RX buffer, then reprogram channel 3 so that we can do this again in a loop. In reality we’ll be receiving four data points, not three. More on why later. In any case, with these four DMA channels and two state machines, the 16 bit mode works. We get touch data DMAed to RAM and display image DMAed from RAM, all with no CPU involvement at all. Channel 2’s completion interrupt can be used to tell us when a new sample has arrived, if desired.

Touch

Capture of the machinery on a logic analyzer, zoomed out

The touch controller, in the best case, can produce a new sample every 15 cycles. We’ll try to go for that mode. This does not mean that we can get three samples in 45 cycles. It means that in an infinite stream of samples being done, a new one is ready every 15. In reality, to gather three samples, 51 cycles are needed. A new command starts every 15 cycles, and after a delay of 9 cycles, a 12-bit reply (sample) may be read, the next will be 15 cycles after that one, and so on. All this counting is getting rather complicated, though. This would waste a lot of PIO instructions. But there is a better way. 51 = 17 * 3, so if I prepare the command bitstream, slice it into three 17-bit words, and enable auto-pull, the PIO code need not do any counting in the TX path, just OUT a bit at a time. This will not work for receive, since each sample would end up shifted by a variable amount. Instead, we really want to sample 15 bits at a time. So we note that 51 = 15 * 4 - 9. Thus, we can configure our PIO code to auto-push every 15 bits, pre-load ISR with 9 bits of zeros up front, and IN a bit at a time. This will produce four 15-bit samples, the first containing all zeroes, and the next three containing the data we wanted (X, Y, Z). Since each PIO SM has a 4-word FIFO for TX and same for RX, we can first load the TX FIFO with our commands, then wait for RX using the same DMA channel, as described above. Awesome! Here on the right you can see how it looks when zoomed out: the display data is occasionally interrupted to sample touch, but most of the SPI time is used to send the data to the display, as intended.
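The 17-bit and 15-bit framing trick is easier to trust once you see it modelled in software. Below is a small Python sketch of both directions. It is not the PIO program itself: all function names are invented for illustration, the command byte values used in the usage note are placeholders, and the data line is assumed to read as zero whenever no reply is being sent.

```python
CYCLES = 51          # clocks for one full X, Y, Z burst
CMD_SPACING = 15     # a new command starts every 15 clocks
REPLY_DELAY = 9      # a reply starts 9 clocks after its command
REPLY_BITS = 12


def build_tx_words(commands):
    """Lay three 8-bit commands into a 51-bit stream, cut into three 17-bit words."""
    stream = [0] * CYCLES
    for i, cmd in enumerate(commands):
        for b in range(8):
            stream[i * CMD_SPACING + b] = (cmd >> (7 - b)) & 1   # MSB first
    words = [
        int("".join(str(bit) for bit in stream[w * 17:(w + 1) * 17]), 2)
        for w in range(3)
    ]
    return stream, words


def fake_touch_line(samples):
    """Hypothetical controller output: 12-bit replies, MSB first, zeros elsewhere."""
    line = [0] * CYCLES
    for i, sample in enumerate(samples):
        start = i * CMD_SPACING + REPLY_DELAY
        for b in range(REPLY_BITS):
            line[start + b] = (sample >> (REPLY_BITS - 1 - b)) & 1
    return line


def receive(rx_bits):
    """Model an ISR pre-loaded with 9 zero bits and auto-push every 15 bits."""
    isr, count, fifo = 0, 9, []
    for bit in rx_bits:
        isr = (isr << 1) | bit       # IN one bit at a time
        count += 1
        if count == 15:              # auto-push threshold reached
            fifo.append(isr)
            isr, count = 0, 0
    return fifo
```

Feeding `receive(fake_touch_line([x, y, z]))` yields four words: a zero, then x, y and z each sitting in the low 12 bits of its own word, which is the alignment the paragraph above relies on.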

The 4bpp mode

Flowchart of the 1/2/4 bpp mode

This display does not support any greyscale modes. The lowest bits-per-pixel that it can support is 12, in RGB444 format. That mode is precisely what we’ll have to use to create our 1bpp (B&W), 2bpp (4 greys), and 4bpp (16 greys) modes. We could, of course, just do this by taking our framebuffer, fetching every 4 bits sequentially, calculating the proper 12-bit colour, and sending that via SPI. However, the goal is to do all this with no CPU involvement. Luckily, we can. Here is what we’ll do.

SM0 will be fed (by DMA) the 4bpp data from RAM, 32 bits at a time. It will shift off 4 bits from the low end into a temporary variable. Shift that variable in 3 times, and output those twelve bits. Thus, SM0 simply expands the incoming pixel data into 12-bit display data. We can then set up a DMA channel to feed SM0’s output to SM1, which will shift out 12-bit words to the display. After some number of pixels it will give the touch controller some time to sample touch, and repeat forever. There are a couple of issues to consider here. First of all, despite each pixel being 12 bits in size, the display will not accept non-integer-number-of-bytes-long writes, so the number of pixels we send between touch samples must be even. OK. Easy enough.
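As a software model of what SM0 does to each 32-bit word, here is a minimal Python sketch. The function names are invented for illustration, and input inversion is deliberately left out to keep the focus on the nibble expansion.

```python
def expand_4bpp_word(word):
    """Expand one 32-bit word of 4bpp pixels into eight 12-bit RGB444 values."""
    out = []
    for _ in range(8):
        nibble = word & 0xF          # shift 4 bits off the low end
        word >>= 4
        rgb444 = 0
        for _ in range(3):           # shift the nibble in three times: R, G, B
            rgb444 = (rgb444 << 4) | nibble
        out.append(rgb444)
    return out


def is_whole_bytes(pixel_count):
    """True when a run of 12-bit pixels ends on a byte boundary."""
    return (pixel_count * 12) % 8 == 0
```

The second helper simply restates the even-pixel-count rule: two 12-bit pixels make exactly three bytes, while an odd count leaves half a byte dangling.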

The last issue is pushback. DMA can be triggered by one event (data request, aka DREQ). For example, the DMA that feeds SM0 with our raw pixel data is triggered by SM0 having space in its input FIFO. So how do we properly trigger the DMA that feeds SM0’s output to SM1’s input? If we trigger it based on SM0’s output having data, we might overflow SM1’s input FIFO. If we trigger it based on SM1’s input having space, we might read garbage from SM0 as it has not yet produced data. You could try to solve this by claiming that SM0 will always be faster to produce data than SM1 can consume it. This is usually true, but what if there is a delay on the input data to SM0? Really we want a DMA channel that we can trigger based on both SM0’s non-empty output and SM1’s non-full input. Sadly, such DMA trigger mechanisms do not exist on RP2040 (or any other chip I’ve ever seen). The solution I settled on was this: the DMA will be triggered by SM1 having space in the input buffer. But to prevent it overflowing when SM0 is too fast, SM1 will signal an IRQ to SM0, and SM0 will wait for an IRQ before producing another output sample. To prime this system, we’ll force-issue one IRQ at the start. This works!

One tiny last thing to remember: most monochrome screens treat zero as white and all ones as black, since the natural relaxed state of an LCD is clear. But for our colour display all ones is white and all zeroes is black. Thus, SM0, on input, will invert its data to provide for the proper colour mapping.

1bpp and 2bpp modes

Building on the success of the 4bpp mode, supporting 2bpp and 1bpp is not hard. Let’s take a look at 2bpp first. As before, input data is inverted. But now we bump into the question of properly mapping the 2-bit brightness value onto a 4-bit brightness value. It is tempting to just append two zeroes, that is 00 -> 0000, 01 -> 0100, 10 -> 1000, 11 -> 1100, but clearly this does not take us to the whitest white. OK, we can try appending ones: 00 -> 0011, 01 -> 0111, 10 -> 1011, 11 -> 1111. OK, we now have the whitest white, but lack the blackest black. We really want an even distribution of brightnesses. Doing a little math, we find that the proper mapping is: 00 -> 0000, 01 -> 0101, 10 -> 1010, 11 -> 1111. This mapping is uniform and covers the entire range. Looking at it closely, it is also quite easy to produce, since we just take the two input bits, and duplicate them.

Remember how SM0 in 4bpp mode would read in 4 bits of data, repeat them 3 times, and provide those 12 bits as output? Well, it seems that now, for 2bpp, we can simply read in 2 bits of input, repeat them 6 times (to produce 4 bits for R, then 4 for G, then 4 for B), and provide that as output. Yup! That works. In fact, this trivially extends to 1bpp, where we’ll read in one bit, repeat it 12 times, and send that out. The rest of the machinery does not need to change at all, which is why 1/2/4bpp mode code is all grouped together.
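The whole family of greyscale expansions, including the inversion, fits in a few lines when modelled in software. This Python sketch only illustrates the bit replication; the function names are mine, not the driver's.

```python
def replicate_2bit_to_4bit(value):
    """Duplicate a 2-bit brightness into 4 bits: 00->0000, 01->0101, 10->1010, 11->1111."""
    return (value << 2) | value


def grey_to_rgb444(value, bpp):
    """Map a 1, 2 or 4-bit framebuffer value to a 12-bit RGB444 grey."""
    assert bpp in (1, 2, 4)
    mask = (1 << bpp) - 1
    value = ~value & mask            # framebuffer 0 is white, display 0 is black
    out = 0
    for _ in range(12 // bpp):       # 12, 6 or 3 repetitions fill all 12 bits
        out = (out << bpp) | value
    return out
```

Duplicating the two bits is the same as multiplying by 5, which is why the four levels land on 0, 5, 10 and 15: evenly spaced and covering the full 4-bit range.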

The 8bpp indexed-colour mode

Flowchart of the 8 bpp mode

I left the indexed-colour mode for last, as it is the hardest. Ideally, the way it works is that there is a colour lookup table (CLUT) somewhere, with 256 entries, each representing a full RGB565 colour. The framebuffer is one byte per pixel, representing an index into the colour table. This mode was commonly used on old PCs, but is also great for UIs that want to be coloured but don’t want to waste the framebuffer space on 2 bytes per pixel. This display controller does not support indexed colour mode and has no memory for a CLUT, so it will have to live in RAM, somehow.

PIO machinery cannot do math well, nor can the DMA units. But PIO can concatenate bit strings. An idea starts to form. Let’s say that our CLUT will contain 16-bit RGB565 entries and live in RAM at an address divisible by 512. This means that to get the address of the Nth entry we do not need to add. We can just insert “N” shifted left by 1 into the address, since its lower 9 bits are guaranteed to be zero. This we CAN do in PIO. So SM0 will input pixel data, 8 bits at a time. One of its registers will be pre-programmed with the address of the CLUT, right shifted by 9. It will shift that into the ISR, shift in the pixel value, and shift in a single zero bit. Now it can output this 32-bit value, which is an address, in RAM, of the CLUT entry with the actual RGB565 colour of this pixel.
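To convince yourself that concatenation really does stand in for addition here, the shifts can be modelled directly. This is a Python sketch of the bit manipulation only; the function name is invented, and the base address used in the usage note is an arbitrary 512-aligned value chosen for illustration.

```python
def clut_entry_address(clut_base, index):
    """Build the address of a CLUT entry using shifts only, no addition."""
    assert clut_base % 512 == 0, "CLUT must sit at an address divisible by 512"
    assert 0 <= index <= 0xFF
    isr = clut_base >> 9             # 23 bits, pre-programmed into a register
    isr = (isr << 8) | index         # shift in the 8-bit pixel value
    isr = isr << 1                   # shift in one zero bit (entries are 2 bytes)
    return isr & 0xFFFFFFFF
```

For any aligned base, say `0x20001000`, the result equals `base + 2 * index` for every index from 0 to 255, because the index and the trailing zero bit land exactly in the nine low bits that the alignment guarantees are empty.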

Cool, but we cannot feed a RAM address to our display: it has no idea what our RAM contains. We can, however, use one DMA channel to program another… Yes, we’ll DMA out every output from SM0 into the “read address” register of another DMA channel. That channel will be programmed to transfer a single 16-bit value from there to SM1’s input. Thus it will copy the actual RGB565 value to SM1’s input. SM1 then simply sends it, as it did for 16bpp mode. We apply the same kind of bidirectional pushback mechanism here as we do for 1/2/4 bpp mode, and the touch sampling works the same here as in all other cases. It is pretty convoluted, but it works! The colour table can be updated at any time and takes effect immediately, due to how it is used. And yes, I am using a PIO state machine to … add. Cool, eh?

With some minor work, this mode can be modified to use RGB666 mode for slightly better colour accuracy, but due to needing to send more bits over the SPI bus (24 bits sent per pixel vs 16 for RGB565), the framerate will suffer by 33%. This is why I compromised on RGB565 for this mode: it is just not worth it for 2 more bits.

Some more details

DMA chaining woes

While working on this project I ran into an issue whereby things would work well, until the CPU’s internal AHBlite bus was very loaded, at which point things would break down and a DMA channel would end up misconfigured somehow. After breaking my head over it for a few days and finding no way in which my code was wrong, I filed bug 321. It turns out that RP2040’s DMA chaining, in some cases, may trigger the chained channel before the last write has completed. It seems the chain signal is fired when the last write is issued, not when it is completed. In most cases this would be perfectly fine, since almost certainly nobody depends on cycle-exact DMA chaining, and doing it this way makes the signal appear sooner. In my case, when one channel reconfigures another, and then chains to it, it mattered, since sometimes the second channel would get triggered to start before the last configuration word had been written. Luckily, the fix is simple. Besides chaining, one may trigger a channel by writing to certain register addresses. I changed my DMA configuration to always trigger using such writes and now the whole thing is rock stable. All in all, this driver needs six DMA channels and completely takes over PIO0, as it uses 30/32 of its instruction slots and 3/4 of its state machines.

Why no nPENIRQ

Why not use the nPENIRQ output from the touch controller? The sad part is that the signal is borderline useless. Why? Ostensibly it should go low when the display is touched, and stay high when it is not. Two obstacles are in the way of using it as specified. First, it is noisy. While the display is not touched, sometimes the signal will go low for a little, randomly. Secondly, while actually sampling, it goes low as per spec. This means properly sampling it requires both smoothing and proper timing. It turns out to not be worth it.

SD card access

Whoever designed this board had one more nasty surprise up his sleeve. The SD card is also wired into the same SPI bus. Indeed, the same system I use to share the SPI bus between the display and touch could be used to also give some time to the SD card, but since SD can be a rather high-bandwidth peripheral, I decided that this is not worth it. The board has provisions to wire the SD card to different pins (they call it SDIO mode). I recommend doing that, since two high-bandwidth peripherals sharing one SPI bus is a recipe for disappointment.

Clock speeds

The touch controller has a 2.5MHz maximum clock speed, while the display supports up to 62.5MHz. When you use the provided code, make sure to set your clock divider for SM1 appropriately such that the data is not sent faster than 62.5MHz and SM2 is not talking faster than 2.5MHz. The screen will accept data a little over 62.5MHz, but as the spec does not allow it, you get no promises how well it will work. Ditto for touch. It is not worth it.
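Picking the divider is simple arithmetic, sketched below in Python. This is an illustration under stated assumptions rather than anything taken from the provided code: the helper names are invented, and `cycles_per_bit` (how many state machine cycles the PIO program spends per SPI bit) is a hypothetical parameter that you would have to read off the actual PIO program. The only hardware fact relied upon is that the RP2040 PIO clock divider is a fixed-point value with a 16-bit integer part and an 8-bit fractional part.

```python
import math


def pio_clkdiv(sys_clk_hz, max_bit_rate_hz, cycles_per_bit):
    """Smallest 16.8 fixed-point divider that keeps the bit rate at or below the limit."""
    raw = sys_clk_hz / (max_bit_rate_hz * cycles_per_bit)
    div_256 = max(256, math.ceil(raw * 256))   # in 1/256 steps, never below 1.0
    if div_256 >= 65536 * 256:
        raise ValueError("required divider does not fit in 16.8 fixed point")
    return div_256 // 256, div_256 % 256       # (integer part, fractional part)


def bit_rate(sys_clk_hz, div_int, div_frac, cycles_per_bit):
    """Resulting SPI bit rate for a given divider setting."""
    return sys_clk_hz / (div_int + div_frac / 256) / cycles_per_bit
```

Rounding the divider up rather than to nearest is the point of the exercise: it errs on the slow side, so the result never exceeds the datasheet limit.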

Setup & use

The discussion so far has been about the steady-state “working” situation. How do we get there? Lazily. I did not bother using PIO at all: just bit-bang SPI using GPIO until we are ready to enable all of this machinery. As setup is rare, I do not at all feel bad about this.

I usually enable the interrupt on DMA complete for touch, so that the touch coordinate can be processed, smoothed, etc. This can also be disabled if you prefer to just read the location and get the latest touch point instead.

If you don’t wish to use the entire display, the dimensions of the used area can be set in the source code; only that area will then be drawn from the framebuffer. Page-flipping is easy: just modify the mFb variable, and the next screen refresh will use the new base address. It does not matter when you make the modification, as the variable is only read once per frame (by DMA). No tearing will ever happen on such a pageflip. If you wish, you can enable the “xfer complete” interrupt on the DMA channel that sends the screen data to get an interrupt when the flipping is completed.

The API provided is: dispSetDepth() to set the current depth, dispSetClut() to set one or more colour table entries for 8bpp mode, dispInit() to do one-time display init, and dispOn()/dispOff() for turning the display on and off.

Download: [HERE]. License is BSD 2-clause. I’m too lazy (and disgusted) to turn this into some kind of an Arduino or a MicroPython library, but I’m sure someone else will. My provided code will build standalone with no dependency on anything. Enjoy


Comments…
