ECE4760 rp2040 DMA machine

2023-01-21 09:20:35







Cornell University ECE4760

Direct Memory Access computing machine


RP2040

DMA on RP2040

DMA uses memory controllers separate from the CPU to accelerate data movement between memory locations, or between peripherals and memory. The RP2040 has 12 DMA channels which can stream an aggregate of over 100 megabytes/sec without affecting CPU performance, in many cases. There are a huge number of options available to set up a DMA transfer. You can think of a DMA channel controller as a separate, programmable processor with the main job of moving data. Memory on the rp2040 is organized as a bus matrix with separate memory bus control masters for each ARM core and for the DMA system, and several memory bus targets accessed by the masters. Each bus target can be accessed on each machine cycle.

Here we use the DMA subsystem to produce a complete computing system, independent of the main ARM cpus. The DMA machine uses memory-copy ability, transport-triggered operations, and self-modifying code. The code consists of a sequence of DMA block descriptors stored in an array. The implemented operations are Turing Complete, and run at about the speed of an Arduino. About 8 million DMA blocks/second can be fetched/executed. There is a history of using only memory-moves to build a general cpu. In 2013 Stephen Dolan published x86 mov is Turing-Complete, describing an example of a one-opcode machine. The paper Run-DMA by Michael Rushanan and Stephen Checkoway shows how to do this with one version (Raspberry Pi 2) of ARM DMA. The DMA system on the RP2040 has more transport-triggered functions and is a little easier to build. Joseph Primmer and I built a DMA processor using the Microchip PIC32 DMA system. Addition and branching had to be based on table-lookup. See DMA Weird machine.

The DMA machine is a fetch-execute cpu where the fetch function is done by one DMA channel, which loads DMA control block images from RAM into another (execute) DMA channel. The 'program' which is loaded consists of a carefully crafted series of DMA control blocks which together act as a general purpose computer. By using DMA1 blocks to modify following DMA1 control block images in the array, just before they are transferred to the hardware DMA1 control registers, we can perform addition, increments, conditional jumps, and/or/not logic operations, and any other operations required. The design is made easier by several transport-triggered actions in the DMA subsystem. These include an adder in the 'channel sniffer' and atomic SET/CLEAR/XOR write functions on all SFRs.

The basic fetch/execute machine uses the channel DMA0 read address as a program counter. Every fetch that occurs leaves the read address pointing to the next block location. DMA0 reads the next block from the RAM array and copies it to the DMA1 channel hardware control registers, then chains to the newly loaded DMA1 channel. The DMA1 channel performs whatever data move is specified, then chains to DMA2. DMA2 resets the DMA0 write_address to point to the DMA1 control registers. Program branching is implemented by using DMA1 to load a new DMA0 read address into the DMA0 control registers. Writing a program of DMA blocks is very much like programming in some strange assembly language for a machine with one accumulator register and only memory-to-accumulator operations.
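As a rough host-side illustration of the fetch/execute cycle just described, the sketch below models memory as an array of words and treats word 0 as the fetch channel's read address (the program counter). It is not the course code and is not register-accurate for the RP2040: `dma_machine`, `run_machine`, `load_demo`, `PC_ADDR`, and `CTRL_EN` are hypothetical names, addresses are word indices, and a block with `ctrl == 0` is a simulation-only halt.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_WORDS 64
#define PC_ADDR   0   /* hypothetical: mem[0] stands in for the DMA0 read address */
#define CTRL_EN   1u  /* hypothetical enable bit; ctrl == 0 halts the simulation  */

typedef struct {
    uint32_t mem[MEM_WORDS];
} dma_machine;

/* Fetch and execute blocks until a halt block or max_blocks is reached.
   Returns the number of blocks executed. */
static int run_machine(dma_machine *m, int max_blocks)
{
    int executed = 0;
    while (executed < max_blocks) {
        uint32_t pc = m->mem[PC_ADDR];
        assert(pc + 3 < MEM_WORDS);
        /* Fetch: copy one 4-word block image, leaving the PC at the next block. */
        uint32_t read_addr  = m->mem[pc];
        uint32_t write_addr = m->mem[pc + 1];
        uint32_t count      = m->mem[pc + 2];
        uint32_t ctrl       = m->mem[pc + 3];
        m->mem[PC_ADDR] = pc + 4;
        if (!(ctrl & CTRL_EN)) {
            break;
        }
        /* Execute: a plain data move with no address increment. */
        assert(read_addr < MEM_WORDS && write_addr < MEM_WORDS);
        for (uint32_t i = 0; i < count; i++) {
            m->mem[write_addr] = m->mem[read_addr];
        }
        executed++;
    }
    return executed;
}

/* Load a four-block demo program starting at word 8. */
static void load_demo(dma_machine *m)
{
    uint32_t prog[] = {
        40, 41,      1, CTRL_EN,   /* words 8..11:  copy mem[40] to mem[41]  */
        42, PC_ADDR, 1, CTRL_EN,   /* words 12..15: jump by writing the PC   */
        40, 43,      1, CTRL_EN,   /* words 16..19: skipped by the jump      */
        0,  0,       0, 0          /* words 20..23: halt (simulation only)   */
    };
    memset(m, 0, sizeof *m);
    m->mem[40] = 1234;             /* source data                            */
    m->mem[42] = 20;               /* jump target: word index of halt block  */
    memcpy(&m->mem[8], prog, sizeof prog);
    m->mem[PC_ADDR] = 8;
}
```

Calling `load_demo` and then `run_machine` executes two blocks, the copy and the jump. The third block never runs, because the jump block overwrote the program counter before the next fetch.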

The following diagram is an attempt to summarize all this madness.

Black arrows are data flow. Blue arrows represent chaining between channels.

The control block array is just an array of ints which are read by DMA0 in sets of four.

In addition to a straight copy operation, there are a few transport-triggered operations in the RP2040 DMA system which happen as a side effect of reading or writing a specific address:

  • Writing to certain shadow registers associated with each special function register (SFR) clears, sets, or XORs bits in the SFR.

    The DMA sniffer data register is one SFR that can be used this way.

    This allows general logic operations in each SFR as a result of transferring data to the register.

    Every peripheral control register (ports, timers, DMA, everything) is an SFR with three shadow registers!

    For two logic bits A and B:
    • load B then SET bits using A as a mask implements A OR B
    • load B then CLR bits using NOT(A) as a mask implements A AND B
    • load B then XOR bits using A implements A XOR B

  • The DMA sniffer system itself supports computing a CRC32 on-the-fly while a channel is transferring data.

    The sniffer can also do a running add while transferring data. The 32-bit add capability makes other operations much easier to implement.
  • Data being copied out of the sniffer data register can be logically inverted or bit-reversed.

  • Any DMA channel transfer can be byte-swapped.
  • Of course, writing certain SFRs may have system functions.

    For instance, writing to an i/o port sets the value of output pins.

    Or setting an interrupt flag will force a cpu interrupt.
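The SET/CLR/XOR recipes in the first bullet can be checked on a host with a small model of one register and its write aliases. This is a sketch, not hardware code: `sfr_write` and the `logic_*` helpers are hypothetical names, and the only hardware fact assumed is the RP2040 atomic-alias layout (XOR at +0x1000, SET at +0x2000, CLR at +0x3000 from the normal register address).

```c
#include <assert.h>
#include <stdint.h>

/* RP2040 atomic alias offsets, relative to the normal register address. */
#define ALIAS_XOR 0x1000u
#define ALIAS_SET 0x2000u
#define ALIAS_CLR 0x3000u

/* Host model of one SFR: the alias chosen decides how the write is applied. */
static void sfr_write(uint32_t *reg, uint32_t alias, uint32_t value)
{
    switch (alias) {
    case 0:         *reg  = value;  break;  /* normal write      */
    case ALIAS_XOR: *reg ^= value;  break;  /* toggle mask bits  */
    case ALIAS_SET: *reg |= value;  break;  /* set mask bits     */
    case ALIAS_CLR: *reg &= ~value; break;  /* clear mask bits   */
    default:        assert(0 && "unknown alias");
    }
}

/* A OR B: load B, then SET using A as the mask. */
static uint32_t logic_or(uint32_t a, uint32_t b)
{
    uint32_t reg = 0;
    sfr_write(&reg, 0, b);
    sfr_write(&reg, ALIAS_SET, a);
    return reg;
}

/* A AND B: load B, then CLR using NOT(A) as the mask. */
static uint32_t logic_and(uint32_t a, uint32_t b)
{
    uint32_t reg = 0;
    sfr_write(&reg, 0, b);
    sfr_write(&reg, ALIAS_CLR, ~a);
    return reg;
}

/* A XOR B: load B, then XOR using A. */
static uint32_t logic_xor(uint32_t a, uint32_t b)
{
    uint32_t reg = 0;
    sfr_write(&reg, 0, b);
    sfr_write(&reg, ALIAS_XOR, a);
    return reg;
}
```

Each helper is two data moves on the model register, which mirrors how the DMA machine gets logic out of nothing but writes.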

The programming process has to map these unusual primitive operations into familiar mathematical and logical operations, and some sort of conditional branch or jump. The sniffer add operation and the bitwise SFR operations mean we can directly implement these functions.

But remember that every basic operation is just a data move.

For add, a sequence of DMA blocks would be:

  1. move one operand to the sniffer_data register
  2. move the other operand to the bit_bucket (discard) with sniffer enabled (this does the add)
  3. move sniffer_data register to the result address.

For shift-left we just do an ADD of a variable to itself (multiply by two).
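A minimal host-side sketch of the add and shift-left recipes above, assuming the sniffer is in its summation mode. The `sniffer` struct and the function names are hypothetical; the point is only that the add happens as a side effect of a word passing through the sniffer on its way to a discard location.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the sniffer data register in summation mode. */
typedef struct {
    uint32_t data;
} sniffer;

/* One DMA move with the sniffer enabled: the word is delivered unchanged,
   and as a side effect it is added to the sniffer data register. */
static uint32_t move_sniffed(sniffer *s, uint32_t word)
{
    s->data += word;   /* wraps modulo 2^32, like the 32-bit hardware add */
    return word;
}

/* The three-block add: load, pass through sniffer, read back. */
static uint32_t dma_add(uint32_t a, uint32_t b)
{
    sniffer s;
    uint32_t bit_bucket;

    s.data = a;                       /* block 1: operand to sniffer data   */
    bit_bucket = move_sniffed(&s, b); /* block 2: other operand, discarded  */
    (void)bit_bucket;
    return s.data;                    /* block 3: result out of the sniffer */
}

/* Shift-left by one is an add of a value to itself. */
static uint32_t dma_shift_left(uint32_t x)
{
    return dma_add(x, x);
}
```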

For a logic operation (OR, AND, XOR, etc):

  1. move one operand to the sniffer_data register
  2. move the other operand to the sniffer SET, CLR, or XOR write address (e.g. DMA_SNIFF_DATA_CLR)
  3. move sniffer_data register to the result address.

For subtract of (A-B) we have to explicitly compute the two's complement negative of B:

  1. move B to the sniffer_data register
  2. move 0xFFFFFFFF to the XOR write address (DMA_SNIFF_DATA_XOR) to invert bits
  3. move unity to the bit_bucket (discard) with sniffer enabled (this adds 1 to form the two's complement)
  4. move the A operand to the bit_bucket (discard) with sniffer enabled (this does an add)
  5. move sniffer_data register to the result address.
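The five subtract steps map directly onto ordinary 32-bit arithmetic, which makes them easy to check on a host. In this sketch `dma_subtract` is a hypothetical name and the local `sniff` stands in for the sniffer data register; each line corresponds to one DMA block.

```c
#include <assert.h>
#include <stdint.h>

/* Compute a - b using only the moves described above. */
static uint32_t dma_subtract(uint32_t a, uint32_t b)
{
    uint32_t sniff;

    sniff = b;               /* 1: B to the sniffer data register         */
    sniff ^= 0xFFFFFFFFu;    /* 2: write all-ones to the XOR alias        */
    sniff += 1u;             /* 3: pass 1 through the sniffer (-B formed) */
    sniff += a;              /* 4: pass A through the sniffer             */
    return sniff;            /* 5: result read out of the sniffer         */
}
```

A negative result comes back as the usual two's complement bit pattern, for example 3 - 10 gives 0xFFFFFFF9.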

For a shift-right the process is much more annoying. A right-shift is a bit-reversed left-shift:

  1. move the variable to the sniffer_data register
  2. move the sniff_rev_mask to the DMA_SNIFF_CTRL_SET.

    This will cause a read from the sniff_data register to reverse the order of the bits in the word.
  3. move the sniff_data register to a temp_register (with bits in reverse order)
  4. move the temp_register back to sniff_data
  5. move the temp_register to the bit_bucket (discard) with sniffer enabled (doubling it; left-shift)
  6. move the sniff_data register to the result address (with bits in reverse order, restoring the correct order)
  7. move the sniff_rev_mask to the DMA_SNIFF_CTRL_CLR.

    This turns off the bit-reverse option
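The reverse, double, reverse trick can be sanity-checked with a host-side sketch. `bit_reverse32` and `dma_shift_right` are hypothetical names; the model assumes the reversal is applied whenever the sniffer data register is read while the reverse option is on, and that writes into the register are not reversed.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the order of the 32 bits in a word. */
static uint32_t bit_reverse32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Logical shift right by one, following the seven steps above. */
static uint32_t dma_shift_right(uint32_t x)
{
    uint32_t sniff, temp, result;

    sniff = x;                      /* 1: variable to sniffer data            */
                                    /* 2: bit-reverse-on-read switched on     */
    temp = bit_reverse32(sniff);    /* 3: read out reversed into temp         */
    sniff = temp;                   /* 4: temp back into sniffer data         */
    sniff += temp;                  /* 5: temp through the sniffer (doubling) */
    result = bit_reverse32(sniff);  /* 6: read out reversed again             */
                                    /* 7: bit-reverse option switched off     */
    return result;
}
```

Because the doubling discards the top bit of the reversed word, the original bit 0 is dropped and a zero enters at bit 31, so this is a logical (unsigned) shift.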

An unconditional jump is easy.

One step: move the jump target address to the DMA0 hardware read address control word.

The hardest operation to get right is a conditional jump. Every jump condition (e.g. jump on negative number) must be converted into an absolute address and all data possibilities (e.g. positive, zero, negative) MUST JUMP! This is because the last step of setting up the conditional jump is to push data to the DMA0 hardware. This weird constraint means that jump conditions must be converted to small integers representing block addresses. I will outline the jump-on-negative-number-in-variable scheme.

  1. move the variable to be tested to the sniffer_data register with DMA byte-swap turned on.

    This moves the sign-bit to bit 7. Bits 4-6 will be the same as the sign bit, as long as the absolute value of the register is less than pow(2,28).
  2. move 0xFFFFFFEF to the CLR write address (DMA_SNIFF_DATA_CLR) to isolate bit 4.

    (Or any bit from 4 to 7.) The result will be zero for a positive number (or zero) and 16 for a negative number, if you chose bit 4.
  3. move the desired address ADDR of a jump for positive input to the bit_bucket (discard) with sniffer enabled.

    The result will be an address of either ADDR or ADDR+16, with 16 being the size of one block in the program array.

    Each of these addresses may contain an unconditional jump to anywhere else in the program.
  4. move the sniff_data register to the DMA0 hardware read address control word to force the actual jump to one of the two locations.
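The jump-on-negative address arithmetic is compact enough to verify on a host. In the sketch below, `byte_swap32` and `jump_on_negative_target` are hypothetical names, and the example addresses used to exercise it are arbitrary. The function returns the value that step 4 would push into the fetch channel's read address.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the DMA channel byte-swap on a 32-bit transfer. */
static uint32_t byte_swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Returns addr_if_nonneg for value >= 0 and addr_if_nonneg + 16 otherwise. */
static uint32_t jump_on_negative_target(int32_t value, uint32_t addr_if_nonneg)
{
    uint32_t sniff;

    /* The scheme relies on bits 28..31 all matching the sign bit. */
    assert(value > -(1 << 28) && value < (1 << 28));

    sniff = byte_swap32((uint32_t)value); /* 1: sign bit lands in bit 7     */
    sniff &= ~0xFFFFFFEFu;                /* 2: CLR write keeps only bit 4  */
    sniff += addr_if_nonneg;              /* 3: add the base jump address   */
    return sniff;                         /* 4: becomes the next fetch addr */
}
```

The two possible results are exactly one 16-byte block apart, so the program array needs two adjacent blocks at the base address, each typically holding an unconditional jump.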


The programs below are in blog-style reverse time order, newest stuff at the top.

The following program list is in time-order.

  1. Test program to validate the basic execution model and test GPIO output, add, OR operation, conditional branch, and unconditional jump. (23dec2022)
  2. Direct Digital Synthesis is used to test timer-regulated execution speed, SPI output, and combining the DMA channel byte-swap function and CLR-masking to isolate the top byte of the 32-bit accumulator to use as an index into a sine-table. Insertion of the pointer to the sine table requires self-modifying code. Performance is good enough to use for audio synthesis rates. (28dec2022)
  3. Updated test program which implements add, subtract, shift-left, shift-right, and a couple of different ways of generating a conditional jump. (2jan2023)
  4. Refactored and generalized version. DMA channel dependencies are cleaned up for compatibility with other software (e.g. VGA generation). The fetch/execute architecture is separated from the DMA program definition. (3jan2023)
  5. Use the DMAcpu machine to read the ROSC random-bit, shift it into the sniffer, then use the result to compute a CRC32 value using the sniffer hardware, and then output that to an SPI channel to make audio white noise. (7jan2023)
  6. The white noise generator was low-pass filtered using a 1-pole IIR, mostly just to see if the DMAcpu could do the arithmetic. It took a 38 step program about 4 uSec to compute a low-passed sample. (11jan2023)
  7. Merging the DMAcpu with VGA generation. Since both use the DMA system heavily, a test was necessary to see if either one broke when merged. Video also gave a way to visually test the random number generation quality. (11jan2023)
  8. Refining the DMAcpu random number generator and simulating Diffusion-Limited Aggregation (DLA). While testing the DLA, I noticed that there is some serial correlation in the DMAcpu random number generation. This code eliminates the correlation. (13jan2023)


DMAcpu and DLA, with refined random number generation. (13jan2023)

DLA runs for 100s of millions of random number evaluations, rather than the 100s of thousands used to generate the distributions in the projects below. The older random generator produced a slightly biased DLA, so I wrote a test program that just plotted sequential rands as points in 2D. A clear, but unusual, diagonal line was produced. Introducing a slight delay decorrelated the ROSC bit, but slows down the random number generator to about one uSec. This speed is about the same as the C rand() function, but it is a true (as opposed to pseudorandom) random number generator. The two images use the improved DMAcpu random function. The left image has a one-pixel seed in the center of the screen. The image on the right has the text "ECE 4760" as a seed.

DLA code, ZIP (also correlation check code, random distributions code)

DISCLAIMER! The ROSC is not shown by the manufacturer or by me to have any reliable level of random generation. Further, of the three rp2040's I have tested, each gives a somewhat different oscillator speed, and different distributions of bits. Do not use this for any critical project without doing your own tests!


DMAcpu and 256 color VGA: distribution testing. (11jan2023)

Test programs from the random number generation page were converted to use the DMAcpu-generated random numbers. The ROSC ring oscillator rnd_reg is used to drive a CRC32 in the DMAcpu. The tests chosen were a 20-coin toss binomial distribution and summing several uniform random numbers to form a normal distribution. The serial interface allows the user to choose normal/binomial, a scale factor, and (for normal) the number of uniform numbers to be added to make one approximate normal sample. The images below show a binomial distribution with 1.3 million total events, with each event being the number of heads from 20 coin tosses. The normal distribution is made from 3 million events, each consisting of the sum of 12 uniformly distributed numbers. The red dot and blue dots are the expected distributions. There may be slightly too few samples near the peak in the normal distribution. The DMAcpu program is just four blocks which perform a CRC operation on the ROSC random bit and the last CRC result, store the result, signal the thread that a new value is ready, then jump back to the beginning block.

DISCLAIMER! The ROSC is not shown by the manufacturer or by me to have any reliable level of random generation. Further, of the three rp2040's I have tested, each gives a somewhat different oscillator speed, and different distributions of bits. Do not use this for any critical project without doing your own tests!

Code, ZIP


Filtering white noise using DMAcpu. (11jan2023)

The white noise generator was low-pass filtered using a 1-pole IIR, mostly just to see if the DMAcpu could do the arithmetic. I do not think this is a practical use for the DMAcpu, but it did test several functions. The filter implemented is a simple one-pole IIR filter with a filter coefficient set by right shifting:

output = old_output + [(input - old_output) >> n]

It took a 38 DMA block program about 4 uSec to compute a low-passed sample. Both channels are sent to the DAC, with the filtered output on channel A and unfiltered on channel B. Since my right-shift function only works with positive numbers, the actual function computed was:

output = old_output + (input >> n) - (old_output >> n)

Shown below are time domain and spectra waveforms for n=4.

Top trace is the unfiltered noise in the time domain, and the magenta spectrum on the right.
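As a plain-C restatement of the filter update actually computed (not the 38-block DMA program itself), the sketch below keeps both operands of the shift non-negative by shifting input and old output separately. The name `lowpass_step` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One update of the one-pole low-pass:
   output = old_output + (input >> n) - (old_output >> n)
   Shifting each term separately avoids right-shifting a negative
   difference, at the cost of a little extra truncation error. */
static uint32_t lowpass_step(uint32_t old_output, uint32_t input, unsigned n)
{
    assert(n < 32);
    return old_output + (input >> n) - (old_output >> n);
}
```

With n=4 and a constant 12-bit full-scale input of 4095, repeated updates starting from zero settle at 4080 rather than 4095, which shows the truncation introduced by the two separate shifts.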

Code, ZIP


Generating white noise from the DMAcpu. (7jan2023)

The program reads the ROSC random-bit and shifts it into the sniffer data register. This shift register is then used as a seed for the CRC32 hardware computation, and the resulting scrambled bits are truncated to 12 bits for the SPI DAC to produce nice sounding white noise. The machine runs at a sample rate of 50 KHz in the example code, but will run as fast as 500 KHz. The image below shows the spectrum at a sample rate of 100 KHz. The spectrum is down about 6 db at 50 KHz, but flat through the audio spectrum. The time to generate a new audio sample is about 0.9 uSec. (The timing code in the linked program, but not shown below, adds 0.3 uSec.)

The code shows the simplified block syntax.

Note that the sniffer function alternates between add and CRC32.

Also, sending a value through the sniffer twice, in add mode, doubles it (shift-left).


// dma_sniffer_ set to add: add function code is 0xf in the calc field
build_block(&sniff_calc_mask, DMA_SNIFF_CTRL_SET, 1, STANDARD_CTRL);

// load a random bit from ROSC to sniff data reg: dma_hw->sniff_data
build_block(rnd_reg, &dma_hw->sniff_data, 1, STANDARD_CTRL);

// pass shift-var through the sniffer twice to the bit_bucket
build_block(&dma_noise_temp, &bit_bucket, 2, STANDARD_CTRL | SNIFF_EN) ;

// store back to shift-var
build_block( &dma_hw->sniff_data, &dma_noise_temp, 1, STANDARD_CTRL);

// dma_sniffer_ set to CRC32: CRC32 function code is 0x0 in the calc field
build_block(&sniff_calc_mask, DMA_SNIFF_CTRL_CLR, 1, STANDARD_CTRL);

// compute CRC32
build_block(&dma_noise_temp, &bit_bucket, 1, STANDARD_CTRL | SNIFF_EN) ;

// dma_sniffer_ set to add
build_block(&sniff_calc_mask, DMA_SNIFF_CTRL_SET, 1, STANDARD_CTRL);

// limit to 12 bit data: mask value is 0xffff000
build_block( &sniff_dac_data_mask, DMA_SNIFF_DATA_CLR, 1, STANDARD_CTRL);

// OR in the DAC control word
build_block( &dac_config_mask, DMA_SNIFF_DATA_SET, 1, STANDARD_CTRL);

// send to DAC
build_block(&dma_hw->sniff_data, &spi0_hw->dr, 1, (DMA_CHAIN_TO(fix_chan) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_16) | DMA_IRQ_QUIET | DMA_EN));

// unconditional jump to start of program
// push the DMA_blocks[0] address into the program counter (fetch channel read pointer)
// !!NOTE that this block throttles the machine to the frequency of Timer 3 !!
// To run at full DMA speed, change to DMA_TREQ(DREQ_FORCE)
build_block(&DMA_blocks_addr, &dma_hw->ch[fetch_chan].read_addr, 1, DMA_CHAIN_TO(fix_chan) | DMA_TREQ( DREQ_DMA_TIMER3) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_IRQ_QUIET | DMA_EN) ;
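Two pieces of arithmetic in the listing above are easy to miss, so here is a host-side sketch of them with the CRC32 step left out. `shift_in_bit` models loading the random bit and then passing the shift variable through the sniffer twice; `dac_word` models the CLR mask, the OR of a control word, and the 16-bit transfer to the SPI data register. The function names are hypothetical, the mask value is the one quoted in the listing's comment, and `DAC_CONFIG_HYPOTHETICAL` is an invented placeholder because the real `dac_config_mask` value is not given here.

```c
#include <assert.h>
#include <stdint.h>

/* Mask value as quoted in the listing's comment (7 hex digits). */
#define DAC_DATA_CLR_MASK       0x0ffff000u
/* Placeholder only: the actual DAC control word is not shown in the text. */
#define DAC_CONFIG_HYPOTHETICAL 0x3000u

/* Load one random bit, then add the shift variable twice:
   the result is (shift_var << 1) + bit, i.e. the bit is shifted in. */
static uint32_t shift_in_bit(uint32_t shift_var, uint32_t rosc_bit)
{
    uint32_t sniff = rosc_bit & 1u; /* random bit into sniffer data       */
    sniff += shift_var;             /* transfer count 2: first pass       */
    sniff += shift_var;             /* transfer count 2: second pass      */
    return sniff;
}

/* Mask to 12 data bits, OR in a control word, keep the low halfword. */
static uint16_t dac_word(uint32_t scrambled)
{
    uint32_t sniff = scrambled;
    sniff &= ~DAC_DATA_CLR_MASK;        /* write mask to the CLR alias    */
    sniff |= DAC_CONFIG_HYPOTHETICAL;   /* write config to the SET alias  */
    return (uint16_t)sniff;             /* 16-bit transfer width          */
}
```

With the 7-digit mask the top four bits of the 32-bit word survive the CLR step, but they are dropped by the 16-bit transfer, so only the low 12 data bits plus the control bits reach the DAC in this model.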

Code, ZIP


Improved organization of the test program (3jan2023)

The explicit DMA channel numbers were replaced with macros defining the actual channels so that the machine can be more easily used with other DMA-based protocols. The fetch/execute architecture is separated from the DMA program definition to make it easier to modify for new DMA programs. A macro is added to make the DMA channel control register specification more compact and easier to read.

Code, ZIP


More operations and better conditional jumps (2jan2023)

The program asks the user for two numerical values, then computes the sum, difference, shifted values, and the sign of the first input value. Most of the explanation of this program is above, describing how each operation is decomposed into a series of data-moves.

Test program, project ZIP


An application: Direct Digital Synthesis (12/28/2022)

DDS is a nice example because it requires fast, precise timing, but is essentially a pointer increment followed by a table-lookup, then a 16-bit SPI load. In other words, mostly data movement. The DMA program takes the form of a linear set of instructions with no branching, just a loop back to the beginning of the program. Rather than running the DMAcpu at full speed, the loop-back block is paced by one of the high precision DMA timers set to a fixed frequency. The program ran correctly up to the limit of the SPI DAC chip, 500,000 samples/sec, but for audio synthesis I used a lower pace of 200,000 loops (samples)/sec. At that rate, the DMA machine ran about 25% of the time. The generated frequency matched the math to within the accuracy of my scope (about 0.1%).

Algorithm:

  1. dds_accum += dds_inc (32 bits), where dds_accum is the DDS phase accumulator

    and dds_inc is the incremental speed of rotation of the phasor (proportional to the frequency),

    where: dds_inc = Fout * pow(2,32) / Fs ; with Fs = 2e5 and Fout the desired sinewave frequency
  2. The high byte of dds_accum becomes the index into the sine table:

    Use DMA BSWAP to move it to the low byte of the sniffer data register

    Clear the upper bytes using the transport-triggered CLR write with mask 0xffffff00

    Multiply by 2 to convert the index into a byte-count of short ints
  3. Add the byte-count to the base address of the sine_table to form a pointer to the next entry
  4. Shove the pointer just computed into the NEXT BLOCK read address, so it can copy the table value to the SPI channel
  5. Do a 2-byte transfer from the sine table to the SPI_data register.

    The SPI transfer takes about 0.8 uSec, but if the pacing timer is set to 200 KHz no wait is necessary here.
  6. Stall waiting for the pacing timer, then jump back to step 1
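Steps 1 to 3 of the algorithm reduce to a few lines of integer arithmetic, sketched here for a host. `dds_increment`, `byte_swap32`, and `sine_entry_address` are hypothetical names, the table base address used to exercise them is arbitrary, and a 256-entry table of 16-bit values is assumed from the "high byte" and "short ints" wording above.

```c
#include <assert.h>
#include <stdint.h>

/* dds_inc = Fout * 2^32 / Fs, truncated to 32 bits. */
static uint32_t dds_increment(double f_out, double f_sample)
{
    assert(f_out >= 0.0 && f_out < f_sample);
    return (uint32_t)(f_out * 4294967296.0 / f_sample);
}

/* Model of the DMA channel byte-swap on a 32-bit transfer. */
static uint32_t byte_swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Form the address of the sine-table entry selected by the accumulator. */
static uint32_t sine_entry_address(uint32_t dds_accum, uint32_t table_base)
{
    uint32_t sniff;

    sniff = byte_swap32(dds_accum); /* BSWAP: high byte moves to low byte  */
    sniff &= ~0xffffff00u;          /* CLR write: keep only the low byte   */
    sniff += sniff;                 /* times 2: index to byte offset       */
    sniff += table_base;            /* add the sine table base address     */
    return sniff;
}
```

For Fs = 2e5, an output of 50 kHz gives an increment of exactly 2^30, so the accumulator wraps once every four samples.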

The DMA-machine program:

  1. Send a timing pulse to GPIO2. The length of the pulse will be the execution time of the loop.

    build_block(&pin_on, &iobank0_hw->io[2].ctrl, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;

  2. ADD the increment to the accumulator. This is the phasor used to look up a sine value

    // === add dds_accum and dds_inc by transport-triggered operation in sniff reg
    build_block(&dds_accum, &dma_hw->sniff_data, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;

    // pass dds_inc through the sniffer to the bit_bucket
    build_block(&dds_inc, &bit_bucket, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN | SNIFF_EN) ;

    // store the sniff data reg back to dds_accum
    build_block( &dma_hw->sniff_data, &dds_accum, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;
  3. Form the pointer to the next sine-table entry from the accumulator

    // load dds_accum to sniffer BUT byte reversed! see BSWAP
    build_block(&dds_accum, &dma_hw->sniff_data, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN | BSWAP) ;

    // clear high bytes -- leave low byte alone -- the clear_high_bytes mask is 0xffffff00
    build_block(&clear_high_bytes, DMA_SNIFF_DATA_CLR, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN ) ;

    // mult by 2 for 'short' array pointer by adding sniffer to itself
    build_block( &dma_hw->sniff_data, &bit_bucket, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN | SNIFF_EN) ;

    // add to sine table base address
    build_block(&sine_table_addr, &bit_bucket, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN | SNIFF_EN) ;
  4. Move the just-formed sine table pointer into the NEXT BLOCK read address

    build_block(&dma_hw->sniff_data, next_block_addr, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN ) ;
  5. Move the sine table entry to the SPI connected to the DAC: spi0_hw->dr

    // NOTE that the read address is just a place-holder for the previous block to overwrite.
    // NOTE that the SPI CS line is driven automatically by the write to the SPI data reg
    build_block(sine_table_addr, &spi0_hw->dr, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_16) | DMA_EN ) ;
  6. Clear the timing GPIO pin

    build_block(&pin_off, &iobank0_hw->io[2].ctrl, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;
  7. Jump back to the beginning, but wait for the DMA pacing timer.

    // push the DMA_blocks[0] address into the program counter (DMA0 read pointer)
    // !!NOTE that this block throttles the machine to the frequency of Timer 3 !!
    // To run at full DMA speed, change TREQ to DREQ_FORCE
    build_block(&DMA_blocks_addr, &dma_hw->ch[0].read_addr, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_DMA_TIMER3) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_IRQ_QUIET | DMA_EN) ;

DDS program, ZIP


First DMA test program. (12/23/2022)

This program just toggles an i/o pin to trigger an oscilloscope, then runs through basic proof-of-concept constructions. It performs several operations using only DMA-logic. The DMA machine is completely asynchronous and independent of MAIN, once started. MAIN sets up the DMA machine program, defines variables for the machine, then prints the results of an ADD and OR operation on the serial console. No other microcontroller resources are needed (except memory, of course) to make the machine run. The execution speed is about 8 million blocks/sec. To make life easier I defined a macro to insert DMA control blocks into the array defining the program.

build_block(read_addr, write_addr, count, ctrl)

builds a DMA control block according to the specifications in the data sheet.
Remember that control blocks are pulled one-at-a-time from the array, placed in the DMA1 hardware registers by DMA0, then triggered to perform the desired data move. After the move, DMA1 chains to DMA2 to reset the DMA0 write address to point to the DMA1 control registers.
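The macro itself is in the linked code and is not reproduced here. As a hedged illustration of the idea only, the host-side sketch below appends four 32-bit words per block in the order read address, write address, transfer count, control, which matches the register order of an RP2040 DMA channel's first alias. The names `dma_block`, `dma_program`, `build_block_sketch`, and `block_offset_bytes` are hypothetical, and the addresses are treated as plain numbers rather than real pointers.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLOCKS 32

/* One control block image: four words, 16 bytes. */
typedef struct {
    uint32_t read_addr;
    uint32_t write_addr;
    uint32_t count;
    uint32_t ctrl;
} dma_block;

static dma_block dma_program[MAX_BLOCKS];
static int block_count = 0;

/* Append one block to the program array and return its block number. */
static int build_block_sketch(uint32_t read_addr, uint32_t write_addr,
                              uint32_t count, uint32_t ctrl)
{
    assert(block_count < MAX_BLOCKS);
    dma_program[block_count] =
        (dma_block){ read_addr, write_addr, count, ctrl };
    return block_count++;
}

/* Byte offset of a block from the start of the program array. */
static uint32_t block_offset_bytes(int index)
{
    assert(index >= 0);
    return (uint32_t)index * (uint32_t)sizeof(dma_block);
}
```

Returning the block number from the builder is a convenience of this sketch: it gives jump targets a name instead of requiring blocks to be counted by hand.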


The DMA-machine program:

  1. Sends two trigger pulses to GPIO2, for an oscilloscope, and to time the machine.

    // === set the GPIO2 pin by transferring a control word directly to the pad control register.

    The parameters on line 2 configure the DMA control register so that the channel runs as soon
    as possible, with a width of 32 bits, no increments, and chaining to channel DMA2 when done.

    The word transferred is 0x3300, which enables output and writes a 1.

    build_block(&pin_on, &iobank0_hw->io[2].ctrl, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;

    // === clear the pin

    The word transferred is 0x3200, which enables output and writes a 0.

    build_block(&pin_off, &iobank0_hw->io[2].ctrl, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;

    // === set pin -- repeat
    build_block(&pin_on, &iobank0_hw->io[2].ctrl, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;

    // === clear the pin -- repeat
    build_block(&pin_off, &iobank0_hw->io[2].ctrl, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;


  2. Adds two 32-bit variables and stores the result back to a variable.

    Three DMA1 blocks:

    // === add two variables by transport-triggered operation in sniff reg
    // assumes: dma_sniffer_enable(1, sniffer_add, true);

    // === load a var to sniff data reg: dma_hw->sniff_data
    build_block(&dma_var_1, &dma_hw->sniff_data, 1, DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;

    // == pass another var through the sniffer to the bit_bucket -- data has to pass through for the add to work.
    // the bit_bucket is just a dummy variable to discard transferred data. the add occurs as the data passes
    // through the sniffer.
    // result is in sniffer_data register
    // notice that SNIFF_EN is set to turn on the add function for one block
    build_block(&dma_var_2, &bit_bucket, 1, DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN | SNIFF_EN) ;

    // == store the sniff data reg back to var_2
    build_block( &dma_hw->sniff_data, &dma_var_2, 1, DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;
  3. Computes the OR of two 32-bit variables and stores the result

    These operations use transport-triggered operations built into the SFR to implement logic.

    Three DMA1 blocks:

    // === OR two variables by transport-triggered operation in sniff reg

    // == load a var to sniff data reg: dma_hw->sniff_data
    build_block(&dma_var_3, &dma_hw->sniff_data, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;

    // == load another var to the SET reg. EVERY SFR has SET, CLR, XOR
    build_block(&dma_var_4, DMA_SNIFF_DATA_SET, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;

    // == store the sniff data reg back to var_5
    build_block( &dma_hw->sniff_data, &dma_var_5, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;
  4. Multiplies a variable by a constant.

    Three blocks:

    // === mult a variable by a constant in sniff reg
    // in this case, times 4
    // by replacing the '4' with a variable, you can do general mult

    // == clear sniff data reg: dma_hw->sniff_data (without a clear you get a MAC operation)
    build_block(&dma_var_0, &dma_hw->sniff_data, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;

    // == pass the var through the sniffer to the bit_bucket 4 times. note SNIFF_EN is on
    build_block(&dma_var_6, &bit_bucket, 4,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN | SNIFF_EN) ;

    // == store the sniff data reg back to var_7
    build_block( &dma_hw->sniff_data, &dma_var_7, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;
  5. Computes a conditional skip based on user input from a thread.

    Three DMA blocks to compute the branch:

    // === conditional skip
    // the dma_flag variable can take only values 0, 16, 32, or 48 as set by the user thread
    // these numbers correspond to jumping 0, 1, 2, or 3 blocks ahead.

    // == read flag to sniffer
    build_block(&dma_flag, &dma_hw->sniff_data, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;

    // == form target block address by adding the jump-on-zero address; jump_zero_addr = block_addr(16) ;
    // had to count the blocks to find out that the next one AFTER the DMA0 load was block number 16.
    build_block( &jump_zero_addr, &bit_bucket, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN | SNIFF_EN) ;

    // move sniffer data to read addr of DMA0 to force next read from new location
    // sniffer contains zero_jump_address + offset to one_jump
    // == push the new block address to DMA0 block
    build_block(&dma_hw->sniff_data, &dma_hw->ch[0].read_addr, 1,
    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_IRQ_QUIET | DMA_EN) ;
  6. The skip targets are in steps of 16 bytes/DMA block saved within the array.

    The targets simply change the size of a pulse on GPIO2.

    // === TARGET if dma_flag == 0 — THIS is block quantity 16 in this system checklist

    // == set pin

    build_block(&pin_on, &iobank0_hw->io[2].ctrl, 1,

    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;

    // TARGET if dma_flag == 16

    // == set pin

    build_block(&pin_on, &iobank0_hw->io[2].ctrl, 1,

    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;

    // TARGET if dma_flag== 32

    // === clear the pin

    build_block(&pin_off, &iobank0_hw->io[2].ctrl, 1,

    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_EN) ;

    // === TARGET if dma_flag == 48

    // === unconditional jump to start of program (#1 in this list)

    // push the DMA_blocks[0] address into the program counter (DMA0 read pointer)

    build_block(&DMA_blocks_addr, &dma_hw->ch[0].read_addr, 1,

    DMA_CHAIN_TO(2) | DMA_TREQ( DREQ_FORCE) | DMA_DATA_WIDTH(DMA_SIZE_32) | DMA_IRQ_QUIET | DMA_EN) ;
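
The address arithmetic behind this computed jump can be sketched in ordinary C. This is a host-side illustration only, not code that runs on the DMA machine: it assumes the sniffer accumulates by plain 32-bit addition, and the helper names jump_target and target_block_index are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_BYTES 16u  /* each stored DMA block occupies 16 bytes */

/* Models the three blocks of the computed jump:
   1) dma_flag is copied into the sniffer data register,
   2) a transfer of jump_zero_addr is sniffed, which (assumed here)
      adds that word to the sniffer data,
   3) the sniffer data is written to the DMA0 read address. */
static uint32_t jump_target(uint32_t jump_zero_addr, uint32_t dma_flag)
{
    /* the user thread may only set 0, 16, 32, or 48 */
    assert(dma_flag % BLOCK_BYTES == 0u && dma_flag <= 3u * BLOCK_BYTES);
    uint32_t sniff_data = dma_flag;   /* step 1 */
    sniff_data += jump_zero_addr;     /* step 2 */
    return sniff_data;                /* step 3: new DMA0 read pointer */
}

/* Converts a target address back to a block number in the program list. */
static uint32_t target_block_index(uint32_t blocks_base, uint32_t target)
{
    return (target - blocks_base) / BLOCK_BYTES;
}
```

With a made-up base address of 0x20001000, block 16 sits at 0x20001100, so a flag of 32 gives jump_target(0x20001100, 32) == 0x20001120, which is block 18.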

Test code, ZIP


 

 


Not current below this line!

The DMA test program performs several operations using only DMA logic. The DMA machine is completely asynchronous and independent of MAIN, once started. MAIN sets up the DMA machine program, defines tables for the machine, then prints the results of an ADD and a NOR operation on the serial console. No other microcontroller resources are needed (except memory, of course) to make the machine run. The execution speed is about 200,000 blocks/sec.

The DMA weird-machine program:

  1. Sends a trigger pulse to PortA, for an oscilloscope

    One DMA2 block to transfer one byte to LATA.
  2. Increments a variable to be used in the arithmetic below

    Two DMA2 blocks:

    - move the variable value to the low byte of the source address field (increment array) of the next DMA block.

    - move the value at the source address to the variable. The content of the increment array is (source addr low-byte value)+1.
  3. Adds two 8-bit variables and stores the result (terminal image showing sum, modulo 256, and result of NOR operation below)

    Four DMA2 blocks:

    - move the variable_1 value to the low byte of the source address field (increment array) of the DMA block 3 blocks later.

    - move the variable_2 value to the low byte of the source size field (increment array) of the DMA block 2 blocks later.

    - move the variable_2 value to the low byte of the cell size field (increment array) of the DMA block 1 block later.

    - increment through the table specified by the previous three blocks and store the result into variable_4.
  4. Computes the NOR of the two 8-bit variables and stores the result

    These operations use transport-triggered operations built into SFRs to implement logic.

    Four DMA2 blocks:

    - move the variable1 value to an SFR that supports (CLEAR, SET, INVERT) write locations.

    - move the variable2 value to the SFR SET write location.

    - move the value 0xff to the SFR INVERT write location.

    - move the NOR value in the SFR to variable_3
  5. Sets a print strobe, to be cleared by MAIN when the variables are printed.

    This is necessary because the DMA machine is completely asynchronous with the CPU

    - One DMA2 block to move a 0x01 to the print strobe, which is cleared by the CPU in MAIN.
  6. Computes a conditional branch to see if the print strobe is cleared, and loops until it is cleared.

    Five DMA blocks to compute the branch:

    - move print_strobe to the low byte of the source address field (offset array) of the next DMA block.

    This effectively multiplies the logical 0/1 to 0/4 because the jump address is 4 bytes

    - move the offset to the low byte of the source address field (jump array) of the next DMA block.

    This will select the jump address entry from the jump table.

    - move the actual target block address to the low two bytes of the source address field of the DMA0 block two blocks ahead.

    - move the next block to the DMA0 control registers.

    - define the DMA0 block to be moved by the previous block.
  7. Increments a variable, modulo 3, to choose one of three output waveforms to send to PortB
  8. Computes a conditional branch to one of three waveform generators based on the mod 3 variable:

    - 1 microsec pulse

    - 2 microsec pulse

    - 8 microsec pulse
  9. Unconditional jump back to the beginning of the program (item 1 in this list)

Fetch/execute machine details:

The syntax given below assumes that DMA0 or DMA2 block images can be built by defining them using PLIB DMA commands, then copying the blocks (in the preparation stage) to the large array or arrays of block images. Macros hide the actual preparation and reduce it to specifying the source address, destination address, source size (in bytes), destination size (in bytes), and the cell transfer size (in bytes). For example:

make_DMA2_block(LED_pattern2, (void*)&LATB, 64, 1, 64);

constructs and stores a block destined to be copied to the DMA2 control block, which moves 64 bytes from a memory array, in a burst of 64, to the 1-byte port B output latch. The effect is to generate a burst of output transitions on the port when the block is later loaded into the DMA2 control registers and executed.

  • Output to a port:

    There are at least two ways to make a pulse on an output pin. One way, described above, is to move an array to a port.

    The other way is to use the PIC32 special-function-register shadow registers explained in a separate bullet below:

    make_DMA2_block(&test_var_0, (void*)&PORTB, 1, 1, 1); // test_var_0 = 0x01

    make_DMA2_block(&inv_mask, (void*)&PORTBINV, 1, 1, 1); // inv_mask = 0xff


    The first block turns on port B, bit zero. The second block inverts the entire latch byte to turn off bit zero.
  • Increment a variable:

    Increment uses the value of a variable as an index into an array whose contents are equal to index+1. For this to work you have to be able to copy the variable into the low byte of the source address of the next block. You also need to align the array in memory so that the base address of the array has a zero-valued low-order byte. Preparation includes the array definition, which in this case causes the byte-aligned variable to cycle through three values. The macro next_blk_src_addr calculates the memory address of the source field of the next defined block.

    unsigned char jmp_inc_array[] __attribute__ ((aligned(256))) = {1, 2, 0} ;

    make_DMA2_block(&inc_value, next_blk_src_addr, 1, 1, 1); // load to low byte of source address in next block

    make_DMA2_block(jmp_inc_array, &inc_value, 1, 1, 1); // read array entry back into variable
  • Logic operations:

    Special function registers (e.g. DMA control blocks, or i/o ports) which are writable all have three shadow registers. The shadow registers are write-only addresses and act as bit-mask modifiers of the main register. The three registers allow you to set, clear, or invert individual bits in the main register. For example, writing 0x04 to PORTBINV inverts the third bit of PORTB. The general naming scheme is sfrSET, sfrCLR and sfrINV. I chose to use SFRs which are normally used to set compare data for DMA transfer termination. The actual termination function is not turned on, so these registers are not otherwise used by the weird machine. For example, to compute the bit-wise NOR:

    make_DMA2_block(&test_var_1, scratch_sfr, 1, 1, 1); //scratch_sfr load

    make_DMA2_block(&test_var_2, scratch_sfr_set, 1, 1, 1); // OR in another variable

    make_DMA2_block(&inv_mask, scratch_sfr_inv, 1, 1, 1); // invert to make NOR operation

    make_DMA2_block(scratch_sfr, &test_var_3, 1, 1, 1); // store to third variable
  • Add two variables:

    To add two variables you need to supply three pieces of information to one block for one table lookup. The table lookup uses the fact that moving a series of bytes is exactly the counting operation which is necessary to add. Variable test_var_1 is used as the offset into an array to start the count. Variable test_var_2 is used as both the size of the source and the number of bytes to move, but all the bytes move to one target byte, essentially making a counter. There is an edge condition when test_var_2=0, which means that the size of the transfer needs to be at least 256 bytes, hence the 0x100 size in the last block. As usual, the actual value of a table lookup is substituted into the low-order byte of the next block’s source field, or size field. Note that the inc_array is 768 bytes long (256×3) to account for the zero edge case and the longest possible increment sequence.

    // -- load first operand into block+3 source addr

    make_DMA2_block(&test_var_1, (void*)(DMA_blocks+length_of_block*(N+3)+DCH0SSA_OFFSET), 1, 1, 1);

    // -- load second operand into block+2 source SIZE !!!!CANNOT BE ZERO!!!

    // Hence the 0x100 offset two blocks down

    make_DMA2_block(&test_var_2, (void*)(DMA_blocks+length_of_block*(N+2)+DCH0SSIZ_OFFSET), 1, 1, 1);

    // -- load second operand into block+1 cell SIZE !!!!CANNOT BE ZERO!!!

    // Hence the 0x100 offset in the next block

    make_DMA2_block(&test_var_2, (void*)(DMA_blocks+length_of_block*(N+1)+DCH0CSIZ_OFFSET), 1, 1, 1);

    // -- read sum array entry into variable

    make_DMA2_block(inc_array, &test_var_4, 0x100, 1, 0x100);
  • Conditional jump:

    Since every operation is a memory transfer, each branch of a computed jump must jump, so that the operations are uniform. There are several steps required.

    1. Assuming that the branch depends on the value of a byte variable, the branch scheme needs a lookup table in which each entry is 4*(byte value).

      example of a modulo 3 multiply table:

      unsigned char offset_array[] __attribute__ ((aligned(256))) = {0, 4, 8} ;

      The multiply table will be used to generate a 4-byte offset for each possible increment value.

      The table must be aligned in memory so that the lowest byte of the address is zero.
    2. The branch scheme also needs a lookup table in which each entry is the actual memory address of the next block to execute,

      depending on the index, which is the value of the incremented variable.

      The address will be moved into the new DMA0 block.

      example of jump table: unsigned int jmp_array[3] __attribute__ ((aligned(256))) ;

      later in the code, set

      io_jmp_array[0] = DMA_blocks + one_short_pulse_label*length_of_block ;

      io_jmp_array[1] = DMA_blocks + two_short_pulse_label*length_of_block ;

      io_jmp_array[2] = DMA_blocks + one_long_pulse_label*length_of_block ;


      But NOTE that these are virtual addresses which must be converted to physical addresses.

      The conversion is done by using only the lower two bytes of the array value.
    3. DMA block N copies the 1-byte increment variable into the low byte of the source address field of block N+1,

      which contains the base address of the offset array.
    4. DMA block N+1 copies the modified offset array address contents into the low byte of the source address field of block N+2,

      which contains the base address of the jump array.
    5. DMA block N+2 copies the modified jump array address contents into the low 2 bytes of the source address field of block N+4.
    6. DMA block N+3 copies block N+4 into the DMA0 control registers.
    7. DMA block N+4 is the block (with new source pointer) to actually copy into the DMA0 control registers,

      thus implementing a jump by updating the source pointer to the DMA block list in memory.


    make_DMA2_block(&inc_value, next_blk_src_addr, 1, 1, 1);

    make_DMA2_block(jmp_offset_array, next_blk_src_addr, 1, 1, 1);

    make_DMA2_block(io_jmp_array, (next_blk_src_addr+length_of_block), 2, 2, 2);

    make_DMA2_block(next_blk_addr, DMA0_addr_2, length_of_block, length_of_block, length_of_block);

    make_DMA0_block(DMA_blocks, DMA2_addr_2, number_of_blocks*length_of_block, length_of_block, length_of_block);

  • Unconditional jump:

    An unconditional jump requires two blocks. The first block moves the second block to the DMA0 control registers.

    DMA0_addr is the location of the DMA0 control registers, and DMA_blocks is the address of the jump target.

    The second block is the one moved to DMA0 to force a jump to the beginning of the program and start loading blocks into DMA2.

    make_DMA2_block(next_blk_addr, DMA0_addr, length_of_block, length_of_block, length_of_block);

    make_DMA0_block(DMA_blocks, DMA2_addr, number_of_blocks*length_of_block, length_of_block, length_of_block);
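
The table-lookup arithmetic described in the bullets above (increment, add, and NOR) can be checked on a host computer with a small C model. This is only a sketch under stated assumptions: ordinary loops and operators stand in for the DMA transfers and the SET/INV shadow registers, the full 768-byte table is used for increment rather than the 3-entry modulo table, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Entry i holds (i + 1) mod 256. The table is 768 bytes (256 x 3) so a run
   that starts at any offset 0..255 can still walk 256 + 255 bytes. */
static uint8_t inc_array[768];

static void init_inc_array(void)
{
    for (size_t i = 0; i < sizeof inc_array; i++)
        inc_array[i] = (uint8_t)(i + 1);
}

/* Increment: the value selects the table entry, which is read back. */
static uint8_t dma_increment(uint8_t v)
{
    return inc_array[v];
}

/* Add: a replaces the low byte of the source address, b replaces the low
   byte of the 0x100 size fields. Every byte of the run lands on the same
   destination byte, so only the last byte moved survives. */
static uint8_t dma_add(uint8_t a, uint8_t b)
{
    size_t count = 0x100u + b;            /* never zero, even when b == 0 */
    assert(a + count <= sizeof inc_array);
    uint8_t dest = 0;
    for (size_t i = 0; i < count; i++)
        dest = inc_array[a + i];          /* 1-byte destination */
    return dest;                          /* (a + b) mod 256 */
}

/* NOR: load the scratch register, OR through SET, flip through INV. */
static uint8_t dma_nor(uint8_t a, uint8_t b)
{
    uint8_t sfr = a;      /* write to the scratch SFR */
    sfr |= b;             /* write b to sfrSET */
    sfr ^= 0xFFu;         /* write 0xff to sfrINV */
    return sfr;
}
```

After calling init_inc_array(), the last byte moved by dma_add(a, b) is inc_array[a + 255 + b], which is (a + b) modulo 256. That is why the 0x100 base size handles the b = 0 edge case without changing the sum.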

Direct Digital Synthesis: a possible practical use for the DMA machine (and optimizing execution)

DDS uses a table lookup to send sine values to an SPI-attached DAC. It is possible to do DMA transfer to the SPI using framed mode, which autogenerates a chip select on the channel slave-select line. However, the chip select is limited to one pin and there can only be one peripheral on the channel. The serial DMA machine allows you to define an arbitrary chip-select pin and manipulate it. The downside is that the maximum speed for the transfer is around 11.4 Ksamples/sec (when using the standard 192-byte full DMA block definition). The example code waits for a timer event, toggles the chip select, sends two bytes through SPI to the DAC, increments an array pointer, then auto-loops back to the beginning to wait for a timer event. To turn off the machine, just freeze timer3 so that another SPI device can access the bus. The demo code does this with a serial command.


The rate-limiting step in the DMA weird machine execution is loading the 192-byte blocks for every operation. Careful consideration of the contents of the DMA control block suggests that the last two words are not needed for this machine (unless you try to use transport-triggered compare). This shaves 32 bytes off. Another 12 bytes can be pulled off the end because each control register has three shadow registers for transport-triggered logic operations. The first word of the block is constant and can be set once, saving 16 bytes. The net result is 132-byte transfers, which speeds up execution by about 1.5 times. The sample rate jumps to 18 Ksamples/second. Code.

- Just running the DAC transfer as fast as possible with NO time-trigger control speeds up the sample rate to 23 Ksamples/sec. The speedup occurs because the block size is cut to 100 bytes (minimum). The minimum size does not include the ability to set up a time trigger using the DMA block interrupt detect hardware. Code.

- Changing the code to use 2-byte transfers to the SPI channel requires a modified increment table which limits the maximum sine resolution to 128 samples/cycle. The scheme makes a table in which the increment is 2, rather than one. The effect is to remove two blocks from the DMA-block DDS loop, raising the maximum synthesis frequency to 23.6 KHz (still with timer control). For the DDS sinewave this gives a frequency range from 2.95 KHz for an 8-sample sine (23.6 KHz / 8) to 184 Hz for a 128-sample sine. The 23.6 KHz synthesis rate corresponds to a timer interval of 1700 cycles. This means that changing the sample rate allows frequency control of better than 0.1%. Changing the length of the sine table by one sample yields frequency control of 1/(sine_table_size).

Code. <<use this version for DDS>>
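
The frequency figures quoted for the 2-byte DDS version follow from dividing the synthesis rate by the sine table length. The short C sketch below reproduces that arithmetic. The function names are hypothetical and the numbers are taken from the text, not measured.

```c
#include <assert.h>

/* Output frequency of a table-lookup sine when one table entry is sent
   per sample: one full table pass is one output cycle. */
static double dds_output_hz(double sample_rate_hz, unsigned table_len)
{
    assert(table_len > 0u);
    return sample_rate_hz / (double)table_len;
}

/* Smallest fractional change in sample rate obtainable by changing the
   timer period by one count. */
static double timer_step_fraction(unsigned timer_period_cycles)
{
    assert(timer_period_cycles > 0u);
    return 1.0 / (double)timer_period_cycles;
}
```

For a 23.6 KHz synthesis rate, dds_output_hz(23600.0, 8) gives 2950 Hz and dds_output_hz(23600.0, 128) gives 184.375 Hz. A 1700-cycle timer period gives a step of about 0.06%, consistent with the "better than 0.1%" figure.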

Pseudorandom or random sequence generation

This example uses the CRC hardware module to generate a pseudorandom 16-bit number sequence. Optionally, reading a floating ADC input adds some entropy to make the sequence truly random, but not of cryptographic quality. The sequence is output through the SPI DAC interface for spectral analysis. If the ADC is used, it is read every eighth iteration of the LFSR, with 8 bits of the ADC reading XORed with the lower 8 bits of the LFSR seed. Running the CRC LFSR, emitting the SPI data, and computing the conditional ADC read all run at about 10 KHz. The code needs a 16-bit SFR to use as a 16-bit ALU. The OCR5 set/reset registers were used. This version of the code optimizes for speed by eliminating possible timer control, so the system just runs as fast as it can. Eliminating SPI output would speed up random number generation by about 30%. Eliminating the ADC read would speed it up by about 25%, but makes the sequence completely repeatable and dependent on the initial seed chosen. The output noise spectrum drops with a 3 dB point at about 25% of the sample frequency and a minimum at the sample frequency at least 30 dB down.

Code (with ADC read every three LFSR operations)

Spectrum of DAC output with no ADC reads. Sample rate is about 16.8 KHz.
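
For readers unfamiliar with LFSR sequences, the following C sketch shows a software 16-bit LFSR with an optional entropy XOR on every eighth step. It is not the CRC-hardware configuration used by the DMA code: the tap mask 0xB400 is a commonly used maximal-length choice assumed here for illustration, and the function names and the zero-state guard are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LFSR_TAPS 0xB400u  /* assumed polynomial, not taken from the source */

/* One step of a 16-bit Galois LFSR. */
static uint16_t lfsr_step(uint16_t s)
{
    uint16_t lsb = (uint16_t)(s & 1u);
    s >>= 1;
    if (lsb)
        s ^= LFSR_TAPS;
    return s;
}

/* Counts steps until the state returns to the seed. */
static uint32_t lfsr_period(uint16_t seed)
{
    assert(seed != 0u);          /* the all-zero state never changes */
    uint16_t s = seed;
    uint32_t n = 0;
    do {
        s = lfsr_step(s);
        n++;
    } while (s != seed);
    return n;
}

/* Step, then on every eighth iteration XOR 8 bits of ADC noise into the
   low byte of the state, as described in the text. */
static uint16_t lfsr_step_with_entropy(uint16_t s, uint32_t iteration,
                                       uint8_t adc_low8)
{
    s = lfsr_step(s);
    if ((iteration & 7u) == 7u)
        s ^= adc_low8;
    if (s == 0u)
        s = 1u;                  /* guard added for this sketch only */
    return s;
}
```

Without the ADC term the sequence repeats after 65535 steps for any nonzero seed, which is the "completely repeatable" behavior mentioned above.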


Older versions:

Time synced operation:

It is possible to sync overall machine operation to a timer by modifying one DMA2 block definition to trigger a transfer on a timer event. Note that this is a blocking wait, which halts DMA execution until the timer event. This could be useful for a small program that, for example, sends a word to the SPI channel on a regular schedule to run a DAC. The sequential machine would wait for a timer event, drop the chip-select line, transfer a word to the SPI buffer, raise the chip-select line, then loop to wait for the next timer event. The DMA2 block definition which waits, then executes a NOP, could be:

DmaChnOpen(2, 0, DMA_OPEN_AUTO);

DmaChnSetTxfer(2, &inc_value, &bit_bucket, 1, 1, 1);

DmaChnSetEventControl(2, DMA_EV_START_IRQ(_TIMER_3_IRQ));

DmaChnSetEvEnableFlags(2, DMA_EV_CELL_DONE);

DmaChnEnable(2);

memcpy(DMA_blocks+length_of_block*N, &DCH2CON, length_of_block);

N++;


This code runs the main DMA loop at 100 Hz by waiting for a timer3 event.

Optimizing test code execution speed

- The execution speed of the DMA machine is limited by the need to load a 192-byte control block for each operation. By reducing the flexibility of the machine, certain chunks of the DMA2 block do not need to be reloaded each time. An optimized version with a speed-up of about 1.4 times minimizes DMA2 block updates, but still allows the full capabilities described above. Optimized code.
The minimum execution time for one block dropped from 10 µsec to 7 µsec because the bytes per block were reduced from 192 to 132.


It is possible to optimize further, but the ability to trigger a block from an outside source (perhaps a timer) is lost. By eliminating the copy of the interrupt control registers, the copy count drops to 100 bytes, and the minimum block execution time drops to 5.5 µsec. The overall test code above still runs, but time sync is much harder. The DCHxSSA (source address) register is the first address copied and the DCHxCSIZ (cell size) register is the last (see datasheet page 52).

A different (and probably inferior) way to run the Fetch/Execute cycle

The method used above is optimal in terms of wasting no cycles because the fetch/execute cycle is asynchronous. As soon as an operation finishes, the next one can start. However, all four DMA channels are needed to make the machine run. One channel is the fetch unit, another is the execute unit, and two others are just used to clear interrupt flags in the first two channels.
If a timer and output compare unit are used to generate two time-synced interrupt flags, then the two (fetch and execute) DMA channels can be triggered by the interrupt flags. The upside of this scheme is that it frees up two DMA channels. The downside is that the slowest operation determines the execution rate of the machine. Most operations are fairly fast, but add is much slower and branch is a little slower. Including the add operation drops performance by a factor of 10. The branch operation drops performance by a factor of 2.5. Tuning becomes quite difficult. But for reference, a running code (without the add operation) is included which runs about 0.4 times as fast as the async code. Code.


Copyright Cornell University
January 13, 2023

 

