There are several ways for the "magic" to happen:
- blocking mode - the CPU does no useful work; it just keeps checking the peripheral's status until the transfer is done
- interrupt mode - the CPU hands off the data and then gets on with whatever else it has to do. It gets interrupted once the transfer is complete and can then act upon it (store to memory, parse or whatever)
- direct memory access (DMA) mode - the CPU hands off the data and proceeds. Meanwhile the peripheral sends and receives data directly from/to the defined memory region and notifies the CPU only when the transfer is done. At that point the CPU can access the data already in memory; it does not have to be fetched from the peripheral anymore
ST provides application note AN4031, which describes the DMA functionality. The STM32F4 has 2 DMA controllers, each responsible for its own set of peripherals. Each controller has 8 streams, and each stream is bound to specific peripherals. Tables 1 and 2 show the stream/peripheral mappings for each controller. We are interested in SPI5, since that's where the on-board gyro is connected. SPI5 is available on DMA2 Streams 3 and 4 (channel 2) and Streams 5 and 6 (channel 7). Transmit (TX) and receive (RX) are available as separate streams. Since streams have a hardware priority that is the inverse of their number (stream 0 has higher priority than stream 2), care must be taken when distributing streams in an application to avoid race conditions and bad states. We don't care much about that now, because our application is tiny and we are not using any other streams that could affect our lives.
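The priority rule can be sketched in plain C that runs on a host. The names `dma_request_t` and `dma_pick_winner` are made up for illustration and are not part of HAL. The sketch also assumes the software priority level that each stream can be given in addition to its number: the higher level wins, and the stream number only breaks ties.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one DMA stream request (not a HAL type). */
typedef struct {
    bool    pending;      /* stream has an outstanding request */
    uint8_t sw_priority;  /* 0 = low ... 3 = very high */
} dma_request_t;

/* Returns the index (0..7) of the stream that is served first,
 * or -1 if no stream has a pending request. */
int dma_pick_winner(const dma_request_t streams[8])
{
    int winner = -1;
    for (int i = 0; i < 8; i++) {
        if (!streams[i].pending) {
            continue;
        }
        /* Strictly greater: on equal software priority the stream
         * with the lower number, found earlier, stays the winner. */
        if (winner < 0 || streams[i].sw_priority > streams[winner].sw_priority) {
            winner = i;
        }
    }
    return winner;
}
```

With both SPI5 streams left at the same software priority, stream 3 is served before stream 4 whenever both have a request pending.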
OK, let's get to coding. First, we need to initialize DMA by enabling its clock and defining the streams. We'll do that in the same file as the SPI5 configuration (spi.c):
What is happening here is that we define 2 handles for the DMA configuration. We leave the actual SPI peripheral initialization code as is, but in HAL_SPI_MspInit() we enable the clock for the DMA2 controller and then configure the streams themselves. DMA on STM32 is quite flexible, so you can have it working on transmit only or on receive only. We want both, so we have to set up two DMA streams - one for RX and one for TX. Once a stream is configured, we initialize it and link it to the SPI handle itself - it now holds a reference to the DMA handle, which is used internally for transfer management.
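A minimal sketch of what such a setup can look like with the STM32F4 HAL is given here. It only builds for the target. The handle names, the choice of Stream 3 for RX and Stream 4 for TX, byte-wide transfers, the low DMA priority and the NVIC priority value are all assumptions, not the original listing; check the stream mapping against the reference manual for your part.

```c
#include "stm32f4xx_hal.h"

/* DMA handles for the two SPI5 directions (names are assumptions). */
DMA_HandleTypeDef hdma_spi5_rx;
DMA_HandleTypeDef hdma_spi5_tx;

/* Assumed NVIC priority for both DMA stream interrupts. */
#define SPI5_DMA_IRQ_PRIORITY 5

void HAL_SPI_MspInit(SPI_HandleTypeDef *hspi)
{
    if (hspi->Instance != SPI5) {
        return;
    }

    /* SPI5 clock and GPIO setup stays as it was and is omitted here. */

    __HAL_RCC_DMA2_CLK_ENABLE();

    /* RX: assumed DMA2 Stream 3, channel 2, peripheral -> memory. */
    hdma_spi5_rx.Instance                 = DMA2_Stream3;
    hdma_spi5_rx.Init.Channel             = DMA_CHANNEL_2;
    hdma_spi5_rx.Init.Direction           = DMA_PERIPH_TO_MEMORY;
    hdma_spi5_rx.Init.PeriphInc           = DMA_PINC_DISABLE;
    hdma_spi5_rx.Init.MemInc              = DMA_MINC_ENABLE;
    hdma_spi5_rx.Init.PeriphDataAlignment = DMA_PDATAALIGN_BYTE;
    hdma_spi5_rx.Init.MemDataAlignment    = DMA_MDATAALIGN_BYTE;
    hdma_spi5_rx.Init.Mode                = DMA_NORMAL;
    hdma_spi5_rx.Init.Priority            = DMA_PRIORITY_LOW;
    hdma_spi5_rx.Init.FIFOMode            = DMA_FIFOMODE_DISABLE;
    if (HAL_DMA_Init(&hdma_spi5_rx) != HAL_OK) {
        /* Error handling is application specific. */
    }
    /* Store the DMA handle in hspi->hdmarx and set its parent. */
    __HAL_LINKDMA(hspi, hdmarx, hdma_spi5_rx);

    /* TX: same settings, assumed DMA2 Stream 4, memory -> peripheral. */
    hdma_spi5_tx.Init           = hdma_spi5_rx.Init;
    hdma_spi5_tx.Instance       = DMA2_Stream4;
    hdma_spi5_tx.Init.Direction = DMA_MEMORY_TO_PERIPH;
    if (HAL_DMA_Init(&hdma_spi5_tx) != HAL_OK) {
        /* Error handling is application specific. */
    }
    __HAL_LINKDMA(hspi, hdmatx, hdma_spi5_tx);

    /* Enable the stream interrupts so HAL can report completion. */
    HAL_NVIC_SetPriority(DMA2_Stream3_IRQn, SPI5_DMA_IRQ_PRIORITY, 0);
    HAL_NVIC_EnableIRQ(DMA2_Stream3_IRQn);
    HAL_NVIC_SetPriority(DMA2_Stream4_IRQn, SPI5_DMA_IRQ_PRIORITY, 0);
    HAL_NVIC_EnableIRQ(DMA2_Stream4_IRQn);
}
```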
Last but not least, we should enable the interrupts, so that we get some feedback when the transfer has completed.
In the cleanup function we should also stop the DMA2 clock (if it's not used anywhere else) and deinitialize the DMA streams we were using.
HAL also provides callback functions for TX complete, RX complete and TXRX (transceive) complete. They are called once the interrupt housekeeping (clearing flags, etc.) has been taken care of. Don't worry, HAL does that internally.
You might notice some weirdness happening in the callback function. What I have done is define a custom structure, which I fill with data in the callback functions. This structure is defined in spi.h:
The SPI_QueueItem_t structure has two members - a uint32 value and a pointer to a character sequence, which I'll use to store a string describing the origin of the structure.
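A structure matching that description could look like the sketch below. The member names `value` and `source` and the helper `spi_item_fill` are assumptions chosen for illustration, since only the member types are described above.

```c
#include <stdint.h>

/* Item passed from the SPI callbacks to the processing task. */
typedef struct {
    uint32_t    value;   /* payload, e.g. a received value */
    const char *source;  /* string naming where the item came from */
} SPI_QueueItem_t;

/* Hypothetical helper: fill an item in one call. The string is not
 * copied, so it must outlive the item (a string literal is fine). */
void spi_item_fill(SPI_QueueItem_t *item, uint32_t value, const char *source)
{
    item->value  = value;
    item->source = source;
}
```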
Now it's time for a brief theoretical intermission.
Queues
I will use this structure to pass received data from the interrupt to a processing task using a queue. A queue is just that - a fixed-length list of items. A task can be assigned to monitor a queue and take items from the list. Items are put in the list by other tasks. Normally, an item can be taken only once. Initially the list is empty and the monitoring task is suspended; it gets woken up (transitions into the Ready state) once an item is available in the queue. As soon as the processing task gets some CPU time (there are no tasks of higher priority blocking it), it takes one item from the queue and processes it, thus freeing a slot in the queue. Queues are by nature first-in-first-out (FIFO), so care must be taken to leave some time for data processing, otherwise the data can get stale.
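To make the fixed-length FIFO behaviour concrete, here is a host-side sketch of a ring buffer holding uint32 messages. It is not how FreeRTOS implements its queues: there is no blocking, no task wake-up and no protection against concurrent access, and the names `fifo_t`, `fifo_put` and `fifo_get` are invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 4u

typedef struct {
    uint32_t items[QUEUE_DEPTH];
    size_t   head;   /* index of the oldest item */
    size_t   count;  /* number of items currently stored */
} fifo_t;

/* Append a message; returns false if the queue is full. */
bool fifo_put(fifo_t *q, uint32_t msg)
{
    if (q->count == QUEUE_DEPTH) {
        return false;
    }
    q->items[(q->head + q->count) % QUEUE_DEPTH] = msg;
    q->count++;
    return true;
}

/* Take the oldest message; returns false if the queue is empty. */
bool fifo_get(fifo_t *q, uint32_t *msg)
{
    if (q->count == 0) {
        return false;
    }
    *msg = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}
```

Taking an item frees its slot, so a producer that was refused because the queue was full can succeed again right after the consumer has run.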
By default a queue can pass a single uint32 value, which, conveniently enough, is just the right amount of space to hold a pointer to a memory location, which can store whatever - from a series of bits to bitmapped images. The latter is what we are going to do - we will pass a pointer instead of a value, since I want not just a value, but also the source of the data. For this we need a mechanism for managing memory, and this is where memory pooling comes in.
Memory pooling
A memory pool is a chunk of memory that is set aside for storing a number of objects of a particular type. Usually the application developer is able to predict the types and amount of data, and thus the amount of memory required for storing temporary data. Like a queue, a pool is fixed-size, except that one can retrieve any value, in any order, using a reference (pointer). An object stays in memory until it is explicitly freed.
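The idea can be sketched on a host with a small fixed-size pool. This is only an illustration of the concept, not the CMSIS implementation behind osPoolAlloc(); `pool_t`, `item_t`, `pool_alloc` and `pool_free` are names made up for the example, and there is no locking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_DEPTH 4u

typedef struct {
    uint32_t    value;
    const char *source;
} item_t;

typedef struct {
    item_t slots[POOL_DEPTH];  /* storage for the objects */
    bool   used[POOL_DEPTH];   /* which slots are handed out */
} pool_t;

/* Hand out the first free slot, or NULL if the pool is exhausted. */
item_t *pool_alloc(pool_t *p)
{
    for (size_t i = 0; i < POOL_DEPTH; i++) {
        if (!p->used[i]) {
            p->used[i] = true;
            return &p->slots[i];
        }
    }
    return NULL;
}

/* Return a slot to the pool; false if the pointer is not an
 * allocated slot of this pool. */
bool pool_free(pool_t *p, const item_t *item)
{
    for (size_t i = 0; i < POOL_DEPTH; i++) {
        if (&p->slots[i] == item && p->used[i]) {
            p->used[i] = false;
            return true;
        }
    }
    return false;
}
```

Unlike the queue, slots can be freed in any order, and a freed slot is simply reused by a later allocation.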
So, back to code. In our spi.h we declare that we'll use an external queue and memory pool, as well as TX/RX buffers. In the spi.c callback function we request a slot for a new object from our spi_pool pool by calling osPoolAlloc(spi_pool). This function returns a pointer to the allocated space. We cast this pointer to a pointer to our SPI_QueueItem structure and then fill it with values. Once the structure is populated, we put the pointer to it into the queue.
Let's look at how we define the memory pool and queue for our data in freertos.c:
First, we use the osPoolDef() macro to define the pool: its name, depth and content type. Then we define a global variable for passing it around. In MX_FREERTOS_Init() we create the actual pool before using it. To create an object in the pool, we once again use osPoolAlloc() to get a pointer and then use that pointer to assign values. Here we create a test item with dummy data, just to see that the pool and queue processing work correctly.
So far we can fill the pool, but have no way of retrieving anything from it. For this we'll create a queue to pass around pointers to the items in our pool. osMessageQDef() and osMessageCreate() deal with that. We create the queue with the same size as the pool itself and assign it to a global for use elsewhere.
vGyroPrinterTask() deals with processing the queue and freeing the pool. By default it sits in the suspended state (waiting forever) until a message comes in. Messages in CMSIS OS are implemented as a subtype of events, so we have to check the event type before we start processing it. Once it's clear that it actually is a message, we can check the number of messages waiting in the queue and then process them until the queue is empty. Otherwise our thread would process a single message and then wait until the next tick to process the next one.
In the processing loop we fish out the pointer and, since we don't have any other message types, assign it to a pointer of our SPI_QueueItem type. At that point we get access to the members of the structure and can print them out. Once we are done with the object, we throw it out of the pool by telling memory management to free the item at this location via osPoolFree().
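Put together, the pool, the queue and the processing task might look roughly like the sketch below, written against the CMSIS-RTOS v1 API and buildable only for the target. The pool depth of 8, the member names and the dummy values are assumptions. One deliberate difference from the description above: instead of asking how many messages are waiting, the sketch drains the queue with non-blocking osMessageGet() calls until no message is returned.

```c
#include <stdint.h>
#include <stdio.h>
#include "cmsis_os.h"

/* Normally declared in spi.h; repeated here to keep the sketch whole. */
typedef struct {
    uint32_t    value;
    const char *source;
} SPI_QueueItem_t;

#define SPI_POOL_DEPTH 8  /* assumed depth, same for pool and queue */

osPoolDef(spi_pool, SPI_POOL_DEPTH, SPI_QueueItem_t);
osPoolId spi_pool;

osMessageQDef(spi_queue, SPI_POOL_DEPTH, uint32_t);
osMessageQId spi_queue;

void MX_FREERTOS_Init(void)
{
    spi_pool  = osPoolCreate(osPool(spi_pool));
    spi_queue = osMessageCreate(osMessageQ(spi_queue), NULL);

    /* Dummy item to check that pool and queue processing work. */
    SPI_QueueItem_t *item = (SPI_QueueItem_t *)osPoolAlloc(spi_pool);
    if (item != NULL) {
        item->value  = 42;
        item->source = "init";
        osMessagePut(spi_queue, (uint32_t)item, 0);
    }
}

void vGyroPrinterTask(void const *argument)
{
    (void)argument;
    for (;;) {
        /* Sleep until the first message arrives. */
        osEvent evt = osMessageGet(spi_queue, osWaitForever);
        while (evt.status == osEventMessage) {
            SPI_QueueItem_t *item = (SPI_QueueItem_t *)evt.value.p;
            printf("%s: %lu\r\n", item->source, (unsigned long)item->value);
            osPoolFree(spi_pool, item);
            /* Timeout 0: fetch the next message without blocking. */
            evt = osMessageGet(spi_queue, 0);
        }
    }
}
```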
Now we would like to populate the queue from DMA interrupts. The callback function defined in spi.c can be used for that. Alternatively, we can pass the data from DMA in the ISRs themselves, in particular the interrupt processing in stm32f4xx_it.c:
Here we once again create an item in the memory pool and send a pointer to it to the queue. I have also defined a handler for interrupt-based transmit, in case anybody wants it, but it is not required for dealing with DMA.
NB! To get access to queues (or any other FreeRTOS API functionality) from within interrupts, I had to change configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY in FreeRTOSConfig.h from 5 to 3. Interrupt priorities are inverted, i.e. smaller numbers mean higher priority, so this raises the highest priority at which an interrupt may still call the FreeRTOS API. I spent a bit of time debugging this before I found out about the issue. It turns out STM32CubeMX does not assign these values correctly when generating code.
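The rule behind this setting can be checked with a little host-side arithmetic. On Cortex-M ports of FreeRTOS, an interrupt may use the FromISR API only if its priority number is greater than or equal to configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY. The sketch assumes 4 implemented priority bits, as on the STM32F4; the function names are invented, and an ISR priority of 3 is used purely as an example, since the priority actually used in the project is not stated here.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRIO_BITS 4u  /* assumed number of implemented priority bits */

/* The library priority shifted into the top bits of the 8-bit
 * priority field, as FreeRTOSConfig.h does for the kernel. */
uint8_t max_syscall_raw(uint8_t library_priority)
{
    return (uint8_t)(library_priority << (8u - PRIO_BITS));
}

/* True if an ISR at isr_priority may call FreeRTOS FromISR functions.
 * A larger number means a lower priority. */
bool isr_may_call_freertos(uint8_t isr_priority, uint8_t library_max_syscall)
{
    return isr_priority >= library_max_syscall;
}
```

With the limit at 5, an interrupt running at priority 3 is too urgent to call the API; lowering the limit to 3 makes it legal.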
Now the last touch - actually using all this setup in freertos.c:
Here we still send data in blocking mode first. Afterwards we do the same in DMA mode. While DMA is working in the background, we print the peripheral state in a loop without delays (well, actually the delay is the time UART spends sending, since UART is working in blocking mode right now). Anyhow, the first printout of the state should be "5" or "busy"; once the transfer is done, the printouts should stop. Memset is used to clear the receive buffer to show that we are actually receiving the data, not just reusing values already there.
This setup should show the callback hierarchy - the DMA IRQ should be printed first, and the general SPI callback afterwards. And the result is as expected:
- Blocking mode still works
- DMA is transferring in background, while UART is blocking for printing
- Once UART stops blocking, data is available in the same thread
- Sending in non-blocking mode with interrupts gets executed and the interrupt gets called (but with no data, for some reason). I'm too lazy to debug it right now and not particularly interested in this use case. Might return to it eventually
- Messages are put from both ISR and callback in this order.
- Lowest priority tasks (queue processor and blinky) get executed last
- The message queue shows its FIFO nature
As usual, sources are available on GitHub