STM32 HAL I2C IT/DMA gotcha

I2C is one of the most popular buses for sensors. It is not particularly fast or robust, but it is simple and supported by most microcontrollers. It is also reasonably easy to bit-bang. The problem is that most of the implementation examples I found online (as well as the reported issues) deal with polling-mode communication. I wanted to use DMA and couldn't understand why it didn't work out of the box.

STM32F4 external SDRAM with HAL

STM32F427/9 has this nice FMC module, which stands for Flexible Memory Controller. It is able to deal with all kinds of memories. For now I want to add external SDRAM to my project to hold a large(ish) amount of data captured by sensors (images) for later retrieval over a slower UART. The 8 MB (64 Mbit) on the Discovery board is quite a bit, but it can hold only one raw 5-megapixel image, or perhaps multiple JPEGs. I would like to have a bit more, but let's figure out the setup on a DISCO board first, before trying to improve on that.

Debugging STM32 with JLink and Eclipse

Debugging with JLINK

J-Link is a rather nice general-purpose ARM debugger and offers a more standardized GDB implementation than the ST-Link. Also, the EDU version is priced at about 50 euros, while the basic full version is not that much more expensive at about 300 euros. The super-duper Ultra version goes for about 600, or 800 with Ethernet. Since I use it for research and education, the non-commercial EDU version suits me just fine. Although it does have some license limitations, it is otherwise identical to the J-Link Base.

10+ Mbit UART part II

Previously I wrote about my need for a high-speed RS-485 interface. I finally had both the parts and the time to set it up and test it. I assembled the whole thing on a breadboard to simulate a noisy environment (definitely not because I didn't want to draw a board which might not work).

STM32F4 interrupt status register gotcha

I have been trying to implement UART line idle detection. The line is considered idle when no start bit arrives within the time frame of the next byte.

It seemed reasonably simple: enable the RXNE (receive buffer not empty) and IDLE interrupts. Sometimes it would work, and sometimes it wouldn't. Most of the time I had USART peripheral memory reading enabled in the debugger, and it would show that the IDLE bit was set in the status register, but the code would read the value without the bit. And, of course, that would break all the ISR logic. After banging my head against it for 2 days, I realized that the IDLE flag is reset by reading the register. And then it dawned on me that my debugger was reading the register before the code got there and, hence, resetting the flag.
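
The effect is easy to reproduce off-target. Below is a minimal sketch with a mock register block; mock_usart_t and all the helper names are hypothetical and not part of HAL. It assumes the STM32F4 behaviour where IDLE is cleared by a read of the status register followed by a read of the data register, which is the same pair of reads a debugger's live peripheral view performs.

```c
#include <stdbool.h>
#include <stdint.h>

#define MOCK_SR_IDLE (1u << 4) /* same bit position as IDLE in USART_SR */
#define MOCK_SR_RXNE (1u << 5) /* same bit position as RXNE in USART_SR */

/* Hypothetical stand-in for the USART register block. */
typedef struct {
  uint32_t sr;
  uint32_t dr;
  bool sr_read_pending; /* set by an SR read, consumed by a DR read */
} mock_usart_t;

/* Reading SR arms the clear sequence but leaves the flags set. */
uint32_t mock_read_sr(mock_usart_t *u) {
  u->sr_read_pending = true;
  return u->sr;
}

/* Reading DR always clears RXNE; after an SR read it also clears IDLE. */
uint32_t mock_read_dr(mock_usart_t *u) {
  u->sr &= ~MOCK_SR_RXNE;
  if (u->sr_read_pending) {
    u->sr &= ~MOCK_SR_IDLE;
    u->sr_read_pending = false;
  }
  return u->dr;
}

/* What a debugger's peripheral view does on every refresh. */
void mock_debugger_refresh(mock_usart_t *u) {
  (void)mock_read_sr(u);
  (void)mock_read_dr(u);
}

/* ISR-style check: take one snapshot of SR and test bits on the copy. */
bool isr_sees_idle(mock_usart_t *u) {
  uint32_t sr = mock_read_sr(u);
  (void)mock_read_dr(u); /* completes the clear sequence */
  return (sr & MOCK_SR_IDLE) != 0u;
}
```

If mock_debugger_refresh() runs before isr_sees_idle(), the ISR no longer sees the flag. The practical takeaway is to read the status register once into a local variable inside the ISR and to keep the debugger's peripheral view closed while chasing flag-related bugs.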


10+ Mbit USART signal on STM32F429

Previously I wrote about the need for high-speed UART signal to feed into RS-485 line. Today I decided to try getting the required baud rate out of STM32F429-Discovery board.
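
For a rough idea of what the peripheral can reach: on the STM32F4 the USART baud rate is the peripheral clock divided by 8 × (2 - OVER8) × USARTDIV, so the ceiling is f_PCLK/16 with 16x oversampling and f_PCLK/8 with 8x oversampling. A small sketch of that arithmetic follows; the function name is hypothetical, and the 90 MHz clock mentioned afterwards is an assumption (USART1 on APB2 of an F429 running at 180 MHz).

```c
#include <stdint.h>

/* Highest baud rate a USART can generate, i.e. with USARTDIV = 1:
 * baud = pclk / (8 * (2 - over8)). over8 is 0 (16x) or 1 (8x oversampling). */
uint32_t usart_max_baud(uint32_t pclk_hz, unsigned over8) {
  uint32_t samples_per_bit = over8 ? 8u : 16u;
  return pclk_hz / samples_per_bit;
}
```

With a 90 MHz peripheral clock this gives 5.625 Mbit/s at 16x oversampling and 11.25 Mbit/s at 8x, so anything above roughly 5.6 Mbit/s needs OVER8 set.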

The quest for 10+ Mbit RS-485 for STM32

I would like to move relatively large amounts of data from my STM32F427 device to a remote PC. Speed wouldn't be an issue if not for the distance, which is not known at this point but is assumed to be anywhere from 15-20 meters up to a couple of hundred meters.
For my project I decided that 4 to 10 megabits per second should be sufficient to send away all the generated data before the next measurement cycle.

Custom STM32 board and USB re-enumeration issue

Lately I've been extensively using ST's USB Virtual Communications Port (VCP) - a serial port implementation over USB. It works reasonably well with the Discovery board I used for prototyping. But not so much with my custom board.

FreeRTOS and Semihosting issues

Semihosting is a way of providing missing I/O resources to the development platform. Its most popular use is as a debug output console using standard printf(). Turns out, there are certain limitations to it.

STM32 + HAL + FreeRTOS Part V: SPI (with DMA)

The main flow of SPI (or any other communications for that matter) is such that the CPU generates data to send, passes it along to the peripheral (or bit-banging logic, but that's out of scope) and then waits for magic to happen.
There are multiple ways of "magic happening":
  • blocking mode - CPU actively does nothing, but checks the peripheral's status until transmit is done
  • interrupt mode - CPU hands off data and then proceeds with dealing with whatever it has to do. It gets interrupted, once transfer is complete and then can act upon it (store to memory, parse or whatever)
  • direct memory access (DMA) mode - CPU hands off the data and proceeds. Peripheral meanwhile sends and receives data directly from/to the memory region defined and notifies the CPU only when the transfer is done. At that point CPU can access data already in memory, it does not have to be fetched from peripheral anymore
DMA provides quite a few benefits - CPU does not have to worry about arranging transmission or data storage. It can even go to sleep, thus saving power. Or in data-intensive applications it can process the data batch, while another is on its way. So, what we get is extra clock cycles and possible power saving.

ST provides application note AN4031, in which the DMA functionality is described. STM32F4 has 2 DMA controllers, each responsible for its own set of peripherals. Each controller has 8 streams, and each stream can be bound to specific peripherals. Tables 1 and 2 show the stream/peripheral mappings of each controller. We are interested in SPI5, since that's where the on-board gyro is connected. SPI5 is available on DMA2 Streams 3 and 4 (channel 2) and Streams 5 and 6 (channel 7). Transmit (TX) and receive (RX) requests are available on separate streams. Each stream has a configurable software priority; when two streams share the same software priority, the hardware favours the lower stream number (stream0 beats stream2). Care must therefore be taken when distributing streams in an application, so that a busy stream does not starve a more important one. But we don't care much about it now, because our application is tiny and we are not using any other streams that could affect our lives.
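
The mapping and the arbitration rule can be written down as a small host-side sketch. The type and function names are hypothetical. Streams 3 (RX) and 4 (TX) match the code in this post; the RX/TX split of streams 5 and 6 is from my reading of the reference manual's DMA2 request table and is worth double-checking. Note that the stream number only acts as a tie-breaker between streams of equal software priority.

```c
#include <stdint.h>

typedef enum { SPI5_DIR_RX, SPI5_DIR_TX } spi5_dir_t;

/* One row of the DMA2 request mapping that can serve SPI5. */
typedef struct {
  uint8_t stream;
  uint8_t channel;
  spi5_dir_t dir;
} spi5_dma_option_t;

const spi5_dma_option_t spi5_dma_options[] = {
  { 3u, 2u, SPI5_DIR_RX },
  { 4u, 2u, SPI5_DIR_TX },
  { 5u, 7u, SPI5_DIR_RX }, /* direction assumed, see note above */
  { 6u, 7u, SPI5_DIR_TX }, /* direction assumed, see note above */
};

/* Arbitration between two pending streams: the higher software priority
 * (0..3, 3 = very high) wins; on a tie the lower stream number wins. */
uint8_t dma_arbitrate(uint8_t stream_a, uint8_t prio_a,
                      uint8_t stream_b, uint8_t prio_b) {
  if (prio_a != prio_b) {
    return (prio_a > prio_b) ? stream_a : stream_b;
  }
  return (stream_a < stream_b) ? stream_a : stream_b;
}
```

Since both SPI5 streams in this post are configured as "very high", stream 3 (RX) would be served before stream 4 (TX) whenever both requests are pending.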

OK, let's get to coding. First, we need to initialize DMA, by enabling its clock and defining streams. We'll do that in the same SPI5 configuration (spi.c):
#include "spi.h"
#include "gpio.h"

SPI_HandleTypeDef hspi5;
DMA_HandleTypeDef hdma_rx;
DMA_HandleTypeDef hdma_tx;

/* SPI5 init function */
void MX_SPI5_Init(void) {

  hspi5.Instance = SPI5;
  hspi5.Init.Mode = SPI_MODE_MASTER;
  hspi5.Init.Direction = SPI_DIRECTION_2LINES;
  hspi5.Init.DataSize = SPI_DATASIZE_8BIT;
  hspi5.Init.CLKPolarity = SPI_POLARITY_HIGH;
  hspi5.Init.CLKPhase = SPI_PHASE_2EDGE;
  hspi5.Init.NSS = SPI_NSS_SOFT;
  hspi5.Init.BaudRatePrescaler = SPI_BAUDRATEPRESCALER_16;
  hspi5.Init.FirstBit = SPI_FIRSTBIT_MSB;
  hspi5.Init.TIMode = SPI_TIMODE_DISABLE;
  hspi5.Init.CRCPolynomial = 10;
  if (HAL_SPI_Init(&hspi5) != HAL_OK) {
    _Error_Handler(__FILE__, __LINE__);
  }
}

void HAL_SPI_MspInit(SPI_HandleTypeDef* spiHandle) {

  GPIO_InitTypeDef GPIO_InitStruct;

  if (spiHandle->Instance == SPI5) {

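    /* SPI5 clock enable */
    __HAL_RCC_SPI5_CLK_ENABLE();
    /* DMA2 clock enable, must happen before any DMA register is written */
    __HAL_RCC_DMA2_CLK_ENABLE();

    /**SPI5 GPIO Configuration
    PF7     ------> SPI5_SCK
    PF8     ------> SPI5_MISO
    PF9     ------> SPI5_MOSI
    */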
    GPIO_InitStruct.Pin = SPI5_SCK_Pin | SPI5_MISO_Pin | SPI5_MOSI_Pin;
    GPIO_InitStruct.Mode = GPIO_MODE_AF_PP;
    GPIO_InitStruct.Pull = GPIO_NOPULL;
    GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_LOW;
    GPIO_InitStruct.Alternate = GPIO_AF5_SPI5;
    HAL_GPIO_Init(GPIOF, &GPIO_InitStruct);

    hdma_rx.Instance = DMA2_Stream3;
    hdma_rx.Init.Channel = DMA_CHANNEL_2;
    hdma_rx.Init.Direction = DMA_PERIPH_TO_MEMORY;
    hdma_rx.Init.PeriphInc = DMA_PINC_DISABLE;
    hdma_rx.Init.MemInc = DMA_MINC_ENABLE;
    hdma_rx.Init.PeriphDataAlignment = DMA_PDATAALIGN_BYTE;
    hdma_rx.Init.MemDataAlignment = DMA_MDATAALIGN_BYTE;
    hdma_rx.Init.Mode = DMA_NORMAL;
    hdma_rx.Init.Priority = DMA_PRIORITY_VERY_HIGH;
    hdma_rx.Init.FIFOMode = DMA_FIFOMODE_DISABLE;
    if (HAL_DMA_Init(&hdma_rx) != HAL_OK) {
        _Error_Handler(__FILE__, __LINE__);
    }

    __HAL_LINKDMA(&hspi5, hdmarx, hdma_rx);

    hdma_tx.Instance = DMA2_Stream4;
    hdma_tx.Init.Channel = DMA_CHANNEL_2;
    hdma_tx.Init.Direction = DMA_MEMORY_TO_PERIPH;
    hdma_tx.Init.PeriphInc = DMA_PINC_DISABLE;
    hdma_tx.Init.MemInc = DMA_MINC_ENABLE;
    hdma_tx.Init.PeriphDataAlignment = DMA_PDATAALIGN_BYTE;
    hdma_tx.Init.MemDataAlignment = DMA_MDATAALIGN_BYTE;
    hdma_tx.Init.Mode = DMA_NORMAL;
    hdma_tx.Init.Priority = DMA_PRIORITY_VERY_HIGH;
    hdma_tx.Init.FIFOMode = DMA_FIFOMODE_DISABLE;
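    if (HAL_DMA_Init(&hdma_tx) != HAL_OK) {
        _Error_Handler(__FILE__, __LINE__);
    }

    __HAL_LINKDMA(&hspi5, hdmatx, hdma_tx);

    /* Enable the interrupts, so that we hear about completed transfers */
    HAL_NVIC_SetPriority(DMA2_Stream3_IRQn, 3, 0);
    HAL_NVIC_EnableIRQ(DMA2_Stream3_IRQn);

    HAL_NVIC_SetPriority(DMA2_Stream4_IRQn, 4, 0);
    HAL_NVIC_EnableIRQ(DMA2_Stream4_IRQn);

    HAL_NVIC_SetPriority(SPI5_IRQn, 1, 0);
    HAL_NVIC_EnableIRQ(SPI5_IRQn);
  }
}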

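void HAL_SPI_MspDeInit(SPI_HandleTypeDef* spiHandle) {

  if(spiHandle->Instance == SPI5) {
    /* Peripheral clock disable */
    __HAL_RCC_SPI5_CLK_DISABLE();
    /**SPI5 GPIO Configuration
    PF7     ------> SPI5_SCK
    PF8     ------> SPI5_MISO
    PF9     ------> SPI5_MOSI
    */
    HAL_GPIO_DeInit(GPIOF, SPI5_SCK_Pin | SPI5_MISO_Pin | SPI5_MOSI_Pin);
    /* SPI5 DMA DeInit, done while the DMA2 clock is still running */
    HAL_DMA_DeInit(spiHandle->hdmarx);
    HAL_DMA_DeInit(spiHandle->hdmatx);
    /* DMA2 clock disable (only if nothing else uses DMA2) */
    __HAL_RCC_DMA2_CLK_DISABLE();
  }
}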

void HAL_SPI_TxRxCpltCallback(SPI_HandleTypeDef *hspi) {
    SPI_QueueItem_t *item = (SPI_QueueItem_t*)osPoolAlloc(spi_pool);
    if (item == NULL) {
        return; /* pool exhausted, drop this sample */
    }
    item->value = rxbuf[1];
    item->source = "CALLBACK";
    osMessagePut(xGyroQueue, (uint32_t) item, 0);
}
What is happening here is that we define 2 handles for DMA configuration. We leave the actual SPI peripheral initialization code as is, but in HAL_SPI_MspInit() we enable the clock of the DMA2 controller and then configure the streams themselves. Since DMA on STM32 is quite flexible, you can have it working on transmit only, on receive only, or on both. We want both, so we have to set up two DMA streams - one for RX and one for TX. Once a stream is configured, we initialize it and link it to the SPI handle - the handle now holds a reference to the DMA handle, which is used internally for transfer management.
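
To make the linking step less magical, here is a host-side sketch of what the link macro does, using stripped-down mock types. The mock_* names are hypothetical; only the two assignments mirror the shape of the HAL macro as I understand it.

```c
/* Hypothetical, stripped-down stand-ins for the HAL handle types. */
typedef struct mock_dma {
  void *Parent; /* back-pointer to the peripheral handle */
} mock_dma_t;

typedef struct mock_spi {
  mock_dma_t *hdmarx;
  mock_dma_t *hdmatx;
} mock_spi_t;

/* Same shape as the HAL macro: peripheral -> DMA and DMA -> peripheral. */
#define MOCK_LINKDMA(handle, field, dma_handle) \
  do {                                          \
    (handle)->field = &(dma_handle);            \
    (dma_handle).Parent = (handle);             \
  } while (0)

mock_spi_t mock_spi;
mock_dma_t mock_rx;
mock_dma_t mock_tx;

/* Link both directions, the same way HAL_SPI_MspInit() does it. */
void mock_link_all(void) {
  MOCK_LINKDMA(&mock_spi, hdmarx, mock_rx);
  MOCK_LINKDMA(&mock_spi, hdmatx, mock_tx);
}
```

The back-pointer is the interesting half: it lets a DMA interrupt find its way from the stream handle to the SPI handle, which is how the SPI completion callbacks end up being called.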

Last, but not least, we should enable interrupts, so that we get some feedback on when the transfer has been completed.
In the cleanup function we should also deinitialize the DMA streams we were using and stop the DMA2 clock (if it's not used anywhere else).
HAL also provides callback functions for TX completed, RX completed and TXRX (transceiving) completed. They are called once the interrupt service routine has done its housekeeping (clearing flags, etc). Don't worry, that part is done by HAL internally.

You might notice some weirdness happening in the callback function. What I have done, is defined a custom structure, which I fill with data in callback functions. This structure is defined in spi.h:
extern SPI_HandleTypeDef hspi5;
extern osMessageQId xGyroQueue;
extern uint8_t rxbuf[3];
extern uint8_t txbuf[3];
extern osPoolId spi_pool;

extern void _Error_Handler(char *, int);

typedef struct __SPI_QueueItem_t {
  uint32_t value;
  char *source;
} SPI_QueueItem_t;
The SPI_QueueItem_t structure has two members - a uint32 value and a pointer to a character sequence, which I'll use to store a string with info on the origin of the structure.
Now it's time for a brief theoretical intermission.


I will use this structure, to pass around received data from interrupt to a processing task using a queue. A queue is just that - a fixed-length list of items. A task can be assigned to monitor a queue and take items from the list. Items are put in the list by some other tasks. Normally, items can be taken only once. Initially this list is empty and monitoring task is in suspended state and gets woken up (transitions into Ready state) once an item is available in queue. As soon as processing task gets some CPU time (there are no tasks of higher priority blocking it), it'll take one item from the queue and process it, thus freeing a slot in the queue. Queues by nature are first-in-first-out (FIFO), so care must be taken to give some time slot for data processing, otherwise the data can get stale.
By default a queue passes a single uint32 value, which, conveniently enough, is just the right amount of space to hold a pointer to a memory location, which can store whatever - from a series of bits to bitmapped images. The latter is what we are going to do - we will pass a pointer instead of a value, since I want not just a value, but also the source of the data. For this we need a mechanism for managing memory, and this is where memory pooling comes in.

Memory pooling

A memory pool is a bunch of memory that is assigned for storing a number of particular objects. Usually application developer is able to predict types and amount of data and thus the amount of memory required for storing temporary data. As with a queue, it is fixed-size, except one can retrieve any value and in any order, using a reference (pointer). Object is stored in memory until it is specifically freed.
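
To illustrate the idea outside of the RTOS, here is a minimal fixed-block pool in plain C. This is a sketch of the concept, not how the CMSIS pool is implemented; all demo_* names are hypothetical, and there is no locking, so it is not safe to share between tasks and interrupts as written.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_POOL_SIZE 4u

typedef struct {
  uint32_t value;
  const char *source;
} demo_item_t;

typedef struct {
  demo_item_t slots[DEMO_POOL_SIZE];
  uint8_t in_use[DEMO_POOL_SIZE];
} demo_pool_t;

/* Return a free slot, or NULL when the pool is exhausted. */
demo_item_t *demo_pool_alloc(demo_pool_t *pool) {
  for (size_t i = 0; i < DEMO_POOL_SIZE; i++) {
    if (!pool->in_use[i]) {
      pool->in_use[i] = 1u;
      return &pool->slots[i];
    }
  }
  return NULL;
}

/* Release a slot; returns 0 on success, -1 if item is not an
 * allocated slot of this pool. */
int demo_pool_free(demo_pool_t *pool, demo_item_t *item) {
  for (size_t i = 0; i < DEMO_POOL_SIZE; i++) {
    if (item == &pool->slots[i] && pool->in_use[i]) {
      pool->in_use[i] = 0u;
      return 0;
    }
  }
  return -1;
}
```

Because every slot has the same size, allocation never fragments memory and either succeeds or fails immediately, which is why the return value of the allocation call is worth checking.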

So, back to code. In our spi.h we define that we'll use external queue and a memory pool, as well as TX/RX buffers. In spi.c callback function we request a slot for a new object from our spi_pool pool by calling osPoolAlloc(spi_pool). This function returns a pointer to allocated space. We cast this pointer to our SPI_QueueItem structure pointer and then fill it with values. Once structure is populated, we put pointer to it into the queue.
Let's look at how we define the memory pool and the queue for our data in freertos.c:
/* Variables -----------------------------------------------------------------*/
osThreadId defaultTaskHandle, blinkyTaskHandle, gyroTaskHandle, gyroPrinterHandle;
osMessageQId xGyroQueue;
uint8_t rxbuf[3] = {0x00, 0x00};
uint8_t txbuf[3] = {0x0F | 0x80, 0x00}; // 0x0F is WHO_AM_I register, 0x80 read bit, should return 0b11010100 or 0xD4
osPoolDef(spi_pool, 10, SPI_QueueItem_t);
osPoolId spi_pool;

/* Function prototypes -------------------------------------------------------*/
void StartDefaultTask(void const * argument);
void vBlinkyTask(void const * argument);
void vGyroTesterTask(void const * argument);
void vGyroPrinterTask(void const * argument);
void MX_FREERTOS_Init(void); /* (MISRA C 2004 rule 8.1) */
/* Init FreeRTOS */
void MX_FREERTOS_Init(void) {

 spi_pool = osPoolCreate(osPool(spi_pool));
 osThreadDef(defaultTask, StartDefaultTask, osPriorityLow, 0, 1000);
 defaultTaskHandle = osThreadCreate(osThread(defaultTask), NULL);

 osThreadDef(blinkyTask, vBlinkyTask, osPriorityHigh, 4, 1000);
 blinkyTaskHandle = osThreadCreate(osThread(blinkyTask), NULL);

 osThreadDef(gyroPrinterTask, vGyroPrinterTask, osPriorityLow, 1, 1000);
 gyroPrinterHandle = osThreadCreate(osThread(gyroPrinterTask), NULL);

 osMessageQDef(gyroPrinterQueue, 10, uint32_t); // 10 pointers, each passed as a uint32_t message
 xGyroQueue = osMessageCreate(osMessageQ(gyroPrinterQueue), NULL);

 // Put test data into the queue
 SPI_QueueItem_t *item = (SPI_QueueItem_t*)osPoolAlloc(spi_pool);
 item->value = 0x33;
 item->source = "TEST";
 osMessagePut(xGyroQueue, (uint32_t) item, 0);

 osThreadDef(gyroTask, vGyroTesterTask, osPriorityHigh, 0, 1000);
 gyroTaskHandle = osThreadCreate(osThread(gyroTask), NULL);
}



void vGyroPrinterTask(void const * argument) {
 osEvent event;
 uint8_t count = 0;
 SPI_QueueItem_t *item;
 while(1) {
  event = osMessageGet(xGyroQueue, osWaitForever);
  printf("Got %lu messages in queue\r\n", (unsigned long)osMessageWaiting(xGyroQueue));
  while (event.status == osEventMessage) {
   item = (SPI_QueueItem_t *)event.value.p;
   printf("Message %d: from %s: %lx\r\n", count, item->source, item->value);
   osPoolFree(spi_pool, item);
   count++; // number the messages within one batch
   event = osMessageGet(xGyroQueue, 1);
  }
  count = 0;
 }
}
First, we use the osPoolDef() macro to define the pool: its name, depth and content type. Then we define a global variable for passing it around. In MX_FREERTOS_Init() we create the actual pool before using it. To create an object in the pool, we once again use osPoolAlloc() to get a pointer and use the pointer to assign values. Here we create a test item with dummy data, just to see that the pool and queue processing work correctly.

Up to now we can fill the pool, but have no way of handing its items over to another task. For this we'll create a queue to pass around pointers to items in our pool. osMessageQDef() and osMessageCreate() deal with that. We create the queue with the same size as the pool itself and assign it to a global for use elsewhere.

vGyroPrinterTask() deals with processing the queue and freeing the pool. By default it sits in a suspended state (waiting forever) until a message comes in. Messages in CMSIS OS are implemented as a subtype of events, so we have to check the event type before we start processing it. Once it's clear that it actually is a message, we can check the number of messages waiting in the queue and then process them until the queue is empty. Otherwise our thread would process a single message and then wait until the next tick to process the next one.
In the processing loop we fish out the pointer and, since we don't have any other message types, we assign it to our SPI_QueueItem_t pointer. At that point we get access to the members of the structure and can print them out. Once we are done with the object, we throw it out of the pool by telling memory management to free the item at this location via osPoolFree().
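
The queue side of this can be sketched on a host machine as a small ring buffer of pointer-sized words that is drained until empty. All names here are hypothetical, and uintptr_t is used so the sketch also works on a 64-bit PC; on the Cortex-M4 a pointer fits into the 32-bit message word.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_QUEUE_LEN 4u

typedef struct {
  uintptr_t slots[DEMO_QUEUE_LEN];
  size_t head;  /* index of the oldest entry */
  size_t count; /* number of stored entries */
} demo_queue_t;

/* Append a word; returns 0 on success, -1 if the queue is full. */
int demo_queue_put(demo_queue_t *q, uintptr_t word) {
  if (q->count == DEMO_QUEUE_LEN) {
    return -1;
  }
  q->slots[(q->head + q->count) % DEMO_QUEUE_LEN] = word;
  q->count++;
  return 0;
}

/* Remove the oldest word; returns 0 on success, -1 if the queue is empty. */
int demo_queue_get(demo_queue_t *q, uintptr_t *word) {
  if (q->count == 0u) {
    return -1;
  }
  *word = q->slots[q->head];
  q->head = (q->head + 1u) % DEMO_QUEUE_LEN;
  q->count--;
  return 0;
}

/* Example handler: remembers the values in the order they arrive. */
uint32_t demo_seen[DEMO_QUEUE_LEN];
size_t demo_seen_count;

void demo_record(void *item) {
  if (demo_seen_count < DEMO_QUEUE_LEN) {
    demo_seen[demo_seen_count++] = *(uint32_t *)item;
  }
}

/* Drain everything that is waiting, oldest first; returns items handled. */
size_t demo_queue_drain(demo_queue_t *q, void (*handle)(void *item)) {
  size_t handled = 0;
  uintptr_t word;
  while (demo_queue_get(q, &word) == 0) {
    handle((void *)word);
    handled++;
  }
  return handled;
}
```

Draining in a loop mirrors the inner while loop of the printer task: one wake-up handles the whole backlog, in the order the items were queued.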

Now we would like to populate the queue from DMA interrupts. The callback function defined in spi.c can be used for that. Alternatively, we can pass the data on directly from the ISRs, in particular from the interrupt handlers in stm32f4xx_it.c:
extern TIM_HandleTypeDef htim6;
extern SPI_HandleTypeDef hspi5;
extern osMessageQId xGyroQueue;
extern uint8_t rxbuf[3];
extern uint8_t txbuf[3];
extern osPoolId spi_pool; 
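// SPI5 DMA receive done
void DMA2_Stream3_IRQHandler(void) {
 SPI_QueueItem_t *item = (SPI_QueueItem_t*)osPoolAlloc(spi_pool);
 if (item != NULL) {
  item->value = rxbuf[1];
  item->source = "DMA IRQ";
  osMessagePut(xGyroQueue, (uint32_t) item, 0);
 }
 // Let HAL clear the flags and run the SPI callbacks
 HAL_DMA_IRQHandler(hspi5.hdmarx);
}

// SPI5 DMA transmit done
void DMA2_Stream4_IRQHandler(void) {
 // Nothing of our own to do here, HAL only has to clear the flags
 HAL_DMA_IRQHandler(hspi5.hdmatx);
}

void SPI5_IRQHandler(void) {
 HAL_SPI_IRQHandler(&hspi5);
 spiDone = 1;
}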
Here we once again create an item in the memory pool and send pointer to the queue. I have also defined a handler for interrupt-based transmit, if anybody wants it, but it is not required for dealing with DMA.

NB! To get access to queues (or any other FreeRTOS API functionality) from within interrupts, the interrupt priority must not be logically higher than configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY. Our DMA interrupts run at priorities 3 and 4, so I had to change this setting in FreeRTOSConfig.h from 5 to 3 (interrupt priorities are inverted, i.e. smaller numbers mean higher priority). I spent a bit of time debugging this, until I found out about this issue. Turns out, STM32CubeMX does not assign these values correctly while generating code.
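
The rule boils down to a single comparison, sketched below with hypothetical helper names. The sketch assumes 4 implemented priority bits, as on the STM32F4, and the usual FreeRTOSConfig.h convention of shifting the library-level number into the upper bits of the 8-bit priority field.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PRIO_BITS 4u /* STM32F4 implements 4 priority bits */

/* Raw value the kernel works with: the library-level number shifted
 * into the upper bits of the 8-bit priority field. */
uint8_t demo_max_syscall_raw(uint8_t library_prio) {
  return (uint8_t)(library_prio << (8u - DEMO_PRIO_BITS));
}

/* An ISR may call the interrupt-safe FreeRTOS API only if its priority
 * number is numerically >= the ceiling (logically equal or lower). */
bool demo_isr_may_use_api(uint8_t irq_prio, uint8_t library_max_syscall) {
  return irq_prio >= library_max_syscall;
}
```

With the ceiling at 5, the DMA interrupts at priorities 3 and 4 fail the check; with the ceiling at 3 they pass. The SPI5 interrupt at priority 1 fails either way, so it must not touch queues or pools, which is fine here because it only sets a flag.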

Now last touch - actually using all this setup in freertos.c:
void vGyroTesterTask(void const * argument) {
 HAL_StatusTypeDef response = HAL_ERROR; // default to error, so we can see, if value actually gets updated by HAL

 /* Transceive data with gyro in blocking mode, CS is handled as in Part III */
 HAL_GPIO_WritePin(NCS_MEMS_SPI_GPIO_Port, NCS_MEMS_SPI_Pin, GPIO_PIN_RESET);
 response = HAL_SPI_TransmitReceive(&hspi5, txbuf, rxbuf, 2, 1000);
 HAL_GPIO_WritePin(NCS_MEMS_SPI_GPIO_Port, NCS_MEMS_SPI_Pin, GPIO_PIN_SET);
 if (response == HAL_OK) {
  printf("Sent: %02x %02x Got: %02x %02x\r\n", txbuf[0], txbuf[1], rxbuf[0], rxbuf[1]);
 } else {
  printf("Got error response as %d\r\n", response);
 }

 /* Now do the same in DMA mode */
 memset(rxbuf, 0x00, sizeof rxbuf);
 printf("RX buffer reset to %02x %02x\r\n", rxbuf[0], rxbuf[1]);
 HAL_GPIO_WritePin(NCS_MEMS_SPI_GPIO_Port, NCS_MEMS_SPI_Pin, GPIO_PIN_RESET);
 response = HAL_SPI_TransmitReceive_DMA(&hspi5, txbuf, rxbuf, 2);
 if (response != HAL_OK) {
  printf("Got error response as %d\r\n", response);
 }

 /* Print some stuff, just to keep CPU busy to show that it's actually DMA performing transmit */
 HAL_SPI_StateTypeDef state = HAL_SPI_GetState(&hspi5);
 while (state != HAL_SPI_STATE_READY) {
  state = HAL_SPI_GetState(&hspi5);
  printf("State is: %d\r\n", state);
 }
 HAL_GPIO_WritePin(NCS_MEMS_SPI_GPIO_Port, NCS_MEMS_SPI_Pin, GPIO_PIN_SET);
 printf("Sent via DMA: %02x %02x Got: %02x %02x\r\n", txbuf[0], txbuf[1], rxbuf[0], rxbuf[1]);

 /* Again, this time using interrupts */
 memset(rxbuf, 0x00, sizeof rxbuf);
 printf("RX buffer reset to %02x %02x\r\n", rxbuf[0], rxbuf[1]);
 spiDone = 0;
 HAL_GPIO_WritePin(NCS_MEMS_SPI_GPIO_Port, NCS_MEMS_SPI_Pin, GPIO_PIN_RESET);
 response = HAL_SPI_TransmitReceive_IT(&hspi5, txbuf, rxbuf, 2);
 if (response != HAL_OK) {
  printf("Got error response as %d\r\n", response);
 }

 while (spiDone != 1) {
  printf("Not done yet!\r\n");
 }
 HAL_GPIO_WritePin(NCS_MEMS_SPI_GPIO_Port, NCS_MEMS_SPI_Pin, GPIO_PIN_SET);
 printf("Sent via IT: %02x %02x Got: %02x %02x\r\n", txbuf[0], txbuf[1], rxbuf[0], rxbuf[1]);

 /* Single-shot task: remove ourselves once all three transfers are done */
 osThreadTerminate(NULL);
}

Here we still send data in blocking mode. Afterwards we do the same in DMA mode. While DMA is working in the background, we print the peripheral state in a loop without delays (well, actually delay is while UART is sending, since UART right now is working in blocking mode). Anyhow, first printout of the state should be "5" or "busy", once it's done sending, printouts should stop. Memset is used to clear the receive buffer to show, that we are actually receiving the data, not just reusing values already there.
This setup should show callback hierarchy - first DMA IRQ should be printed, and afterwards SPI general callback. And the result is as expected:
What we see in this screenshot is:
  1. Blocking mode still works
  2. DMA is transferring in background, while UART is blocking for printing 
  3. Once UART stops blocking, data is available in the same thread
  4. Sending in non-blocking mode with interrupts gets executed and the interrupt gets called (but no data for some reason). I'm too lazy to debug it right now and not particularly interested in this use case. Might return to it eventually
  5. Messages are put from both ISR and callback in this order.
  6. Lowest priority tasks (queue processor and blinky) get executed last
  7. FIFO nature of message queue.
I'll end on this for now, post is becoming a bit bloated already.
As usual, sources are available on GitHub

STM32 + HAL + FreeRTOS Part IV: IDE (Eclipse) setup

Maybe I should've started with this a bit earlier, but IMO getting a project to build from Eclipse is pretty straightforward - just import the project as a Makefile project. Even though Eclipse can't resolve some symbols, make is aware of them and compiles just fine.

So, for starters you'll need Eclipse itself, which is available on their downloads page. What you'll need is "Eclipse IDE for C/C++ developers". I have an old Mars2 version, but it shouldn't matter that much.

To run Eclipse, you also need a Java Runtime (JRE) available on your system. The JRE has to have the same architecture (x86, x64) as Eclipse itself (and preferably the OS).

Once installed and started up, you have to set up your workspace (directory, where it'll store all the project-related data). It does not have to be your projects folder, I prefer to keep it separately from actual code I commit, to avoid cluttering. Existing code can be easily imported afterwards.

Once the workspace is set up, all you have to do is import the Makefile project by clicking File/New/Makefile Project with Existing Code. Click Browse... and browse into the directory with our project source code, where the Makefile is located. It should fill in the project name based on the folder name. Toolchain selection can be left as <none> for now. Click Finish and enjoy your project.

It should be able to build as-is now. Open Src/main.c file in project explorer and hit CTRL+B. If you see .hex and .bin file names in output console, you're dandy.

Now there are some issues with unresolved symbols (which annoyingly get reported as bugs). Let's try to fix that.

The following steps configure the CDT build output parser to automatically discover symbols, include paths and compiler settings based on the output produced by the Makefile.
  • Right-click on the project and open Properties/C/C++ General/Preprocessor Include Paths, Macros etc./Providers to get to the configuration window we're interested in
  • Click on CDT GCC Build Output Parser and change the compiler command pattern from (gcc)|([gc]\+\+)|(clang) to (.*gcc)|(.*[gc]\+\+) then apply changes.
  • Click on CDT Built-in Compiler Settings and replace ${COMMAND} with $toolchain-path\arm-none-eabi-gcc and click Apply. Here $toolchain-path is path to the toolchain binary folder:
    # Windows
    C:\tools\arm-gcc\7-2017-q4-major\bin\arm-none-eabi-gcc ${FLAGS} -E -P -v -dD "${INPUTS}"
    # Linux
    ~/tools/arm-gcc/7-2017-q4-major/bin/arm-none-eabi-gcc ${FLAGS} -E -P -v -dD "${INPUTS}"
Note that the full path to the toolchain is needed only if you don't have it in the system PATH.

Now do Project/Clean and then rebuild the project, so that the Indexer can read the console output and index everything mentioned there. Most of the warnings should go away now.

You can ignore make errors on performing make clean, it's usually Windows complaining that it can't find .dep directory. It should still clean up build.

Debugging STM32 applications with Eclipse via ST-Link is a bit more convoluted setup, so that will be reviewed some other time.

STM32 + HAL + FreeRTOS Part III: SPI (blocking)

Serial Peripheral Interface (SPI) is quite widely used in embedded systems for connecting all kinds of ICs - sensors, memories, screens, you name it. It seems to be somewhat less popular among the beginners/Arduino crowd than I2C, because of relatively more complicated setup. But it does provide higher speeds and possibility to have more of the same type sensors on a single bus without addressing issues.

STM32F429I-Discovery board has an L3GD20 3-axis gyroscope onboard connected to SPI channel 5. In this article we'll try to get it up and running. I will continue where we left off last time - a working example with blinking LEDs and UART.

So, let us look at the devkit connections, as defined in Discovery board documentation:
  • MOSI is on PF9
  • MISO is on PF8
  • SCK is on PF7
  • Chip-select (CS) is on PC1
  • Interrupt 1 is on PA1
  • Interrupt 2 is on PA2
When we initially generated the code in STM32CubeMX in the first part of this series, I said to include SPI5 in the list of peripherals for which to generate initialization code. If we take a look at the generated main.h header file, we can find that the SPI5_XXX_Pin and SPI5_XXX_GPIO_Port definitions are already there and defined as the ones we need:

#define SPI5_SCK_Pin GPIO_PIN_7
#define SPI5_SCK_GPIO_Port GPIOF
#define SPI5_MISO_Pin GPIO_PIN_8
#define SPI5_MOSI_Pin GPIO_PIN_9
#define MEMS_INT1_Pin GPIO_PIN_1
#define MEMS_INT2_Pin GPIO_PIN_2
In addition to that, NCS_MEMS_SPI has been defined. That would be our CS. And last bunch are interrupt pins. Neat.

Another thing we should worry about (it's a pretty common source of headaches) is the SPI bus setup. The generated setup in spi.c states:

  hspi5.Instance = SPI5;
  hspi5.Init.Mode = SPI_MODE_MASTER;
  hspi5.Init.Direction = SPI_DIRECTION_2LINES;
  hspi5.Init.DataSize = SPI_DATASIZE_8BIT;
  hspi5.Init.CLKPolarity = SPI_POLARITY_HIGH;
  hspi5.Init.CLKPhase = SPI_PHASE_2EDGE;
  hspi5.Init.NSS = SPI_NSS_SOFT;
  hspi5.Init.BaudRatePrescaler = SPI_BAUDRATEPRESCALER_16;
  hspi5.Init.FirstBit = SPI_FIRSTBIT_MSB;
  hspi5.Init.TIMode = SPI_TIMODE_DISABLE;
Let's review it and compare it to the data in the datasheet. Sadly, the datasheet doesn't just tell you which mode the SPI is working in, so you have to read the signal timing chart and figure it out on your own:
  1. Instance: SPI5 is what we want, yes
  2. Mode: Master mode - we will be initiating the communications
  3. Direction: 2 Lines. That's what we have and want. Apparently, there's a support for single-wire half-duplex comms as well, but that's not for us
  4. Datasize: 8bit. True. All the registers are 8bit, wider words are sent as pairs of high/low bytes
  5. Clock polarity (CPOL): Idle state is high, so polarity is high
  6. Clock phase (CPHA): Sampling seems to be done on the trailing edge, so this should be SPI_PHASE_2EDGE
  7. NSS (Slave select) will be done in software for now, by manually pulling CS pin low before transfer, with subsequent pull back up once transfer is done. It is possible to use hardware NSS, but it requires disabling the SPI master after transfer to pull it back up (just an implementation in STM32 HAL)
  8. Baud Rate Prescaler is a clock divider, which sets the transfer speed. Leave it be for now
  9. First Bit should be the most significant bit (MSB) as per gyro datasheet
  10. We are not interested in TI mode for now, so leave it disabled
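
Items 5 and 6 together pin down the conventional SPI mode number, which is what many other datasheets and tools quote instead of separate CPOL/CPHA values. A tiny sketch of the usual encoding (the function name is hypothetical): polarity high corresponds to CPOL = 1, and sampling on the second edge corresponds to CPHA = 1.

```c
/* Conventional SPI mode number from clock polarity and phase (each 0 or 1):
 * mode = CPOL * 2 + CPHA. */
unsigned spi_mode_number(unsigned cpol, unsigned cpha) {
  return ((cpol & 1u) << 1) | (cpha & 1u);
}
```

For the settings above, spi_mode_number(1, 1) yields mode 3.
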
Now that we have set up our comms with the chip, we can start abusing it. I created a small single-shot task, which reads the gyroscope WHO_AM_I register and terminates itself. Let's add it to our freertos.c, where other tasks are already defined:
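void vGyroTesterTask(void const * argument) {
 HAL_StatusTypeDef response = HAL_ERROR; // default to error
 // 0x0F is WHO_AM_I register, 0x80 read bit, should return 0b11010100 or 0xD4
 uint8_t txbuf[3] = {0x0F | 0x80, 0x00};
 uint8_t rxbuf[3] = {0x00, 0x00};

 // Pull CS low to select the gyro (pin names follow the CubeMX NCS_MEMS_SPI label)
 HAL_GPIO_WritePin(NCS_MEMS_SPI_GPIO_Port, NCS_MEMS_SPI_Pin, GPIO_PIN_RESET);
 response = HAL_SPI_TransmitReceive(&hspi5, txbuf, rxbuf, 2, 1000);
 // Pull CS back up, we are done with the chip
 HAL_GPIO_WritePin(NCS_MEMS_SPI_GPIO_Port, NCS_MEMS_SPI_Pin, GPIO_PIN_SET);

 if (response == HAL_OK) {
  printf("Sent: %02x %02x Got: %02x %02x\r\n", txbuf[0], txbuf[1], rxbuf[0], rxbuf[1]);
 } else {
  printf("Got error response as %d\r\n", response);
 }

 // Kill the thread, for we are done with it
 osThreadTerminate(NULL);
}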
What is happening here, is:
  1. We define default response as error, just to be sure, that it gets changed to HAL_OK
  2. define the transmit buffer. Two of its three bytes are used (the spare last byte stays zero). The first is the command byte we send: we want to read (0x80) register 0x0F. The second is a dummy zero byte, which keeps the clock running to give the chip a chance to respond.
  3. define the receive buffer, again with two of the three bytes used. The first byte is a dummy receive, normally filled with 0xFF (no data); the second one will contain the response from the chip
  4. pull down our CS pin, to indicate to the slave, that we are talking to it
  5. Do the actual transceiving of the data
  6. pull back up the CS pin, to tell the chip, that we are done with it
  7. check the response status code and print out the data or code received
  8. kill the thread, for we are done with it
We could (and should) clean up resources as well, by de-initializing the SPI bus, but that's not necessary right now.
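
The command byte from step 2 can be built with a small helper instead of hand-written constants. This is a sketch with hypothetical names; the read bit and the WHO_AM_I address come from the text above, while the auto-increment bit (bit 6) is from my reading of the gyro datasheet and should be checked against it.

```c
#include <stdbool.h>
#include <stdint.h>

#define GYRO_CMD_READ     0x80u /* bit 7: 1 = read, 0 = write */
#define GYRO_CMD_AUTO_INC 0x40u /* bit 6: auto-increment address (assumed, see lead-in) */
#define GYRO_ADDR_MASK    0x3Fu /* bits 5..0: register address */
#define GYRO_REG_WHO_AM_I 0x0Fu

/* Build the first byte of an SPI transaction with the gyro. */
uint8_t gyro_command(uint8_t reg, bool read, bool auto_increment) {
  uint8_t cmd = (uint8_t)(reg & GYRO_ADDR_MASK);
  if (read) {
    cmd |= GYRO_CMD_READ;
  }
  if (auto_increment) {
    cmd |= GYRO_CMD_AUTO_INC;
  }
  return cmd;
}
```

gyro_command(GYRO_REG_WHO_AM_I, true, false) produces 0x8F, the same byte the task above puts into txbuf[0].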

All that remains is to start the task and let it run. I will add a new handle and reduce the priority of the blinky task, so that it doesn't interrupt our transfer. Since the transfer is in blocking mode, an interruption might mess things up:

osThreadId defaultTaskHandle, blinkyTaskHandle, gyroTaskHandle;


void MX_FREERTOS_Init(void) {

 osThreadDef(defaultTask, StartDefaultTask, osPriorityLow, 0, 1000);
 defaultTaskHandle = osThreadCreate(osThread(defaultTask), NULL);

 osThreadDef(blinkyTask, vBlinkyTask, osPriorityLow, 0, 1000);
 blinkyTaskHandle = osThreadCreate(osThread(blinkyTask), NULL);

 osThreadDef(gyroTask, vGyroTesterTask, osPriorityHigh, 0, 1000);
 gyroTaskHandle = osThreadCreate(osThread(gyroTask), NULL);
}
And voila:

As expected, we get our notification that the system is up. The high-priority SPI task goes first, and then the system continues with low-priority blinking. The printout shows that the MCU sent 0x8F to the gyro and got back its chip identification byte 0xD4, just as expected.

I'm not particularly interested in implementing full driver for the gyro, that can be left as an exercise for the reader. There should be plenty of those already available online. 

As usual, sources on GitHub have been updated to include all of the above. Next up: Part V: SPI with DMA

STM32 + HAL + FreeRTOS Part II: UART

Previously we started a blinky project on STM32F429-Discovery board with HAL and FreeRTOS. I will continue to build up on it with Universal Asynchronous Receiver-Transmitter or UART.

If you remember well, during code generation, I instructed to leave USART1 in the list of peripherals to initialize in generated code. This was done so that we wouldn't have to figure out how to initialize it and provide it with clocks manually.

So STM32CubeMx generated usart.c/.h files for us, which contain USART1 initialization function, which defines UART parameters and two functions, which HAL expects user code to implement - HAL_UART_MspInit() and HAL_UART_MspDeInit() for setting up and teardown of hardware resources required for peripheral.

With this setup, UART should already be working and you could try to transmit data by using HAL_UART_Transmit():

 HAL_UART_Transmit(&huart1, &charbuffer, size, timeout);

But most of the time we don't want to deal with bytes and buffers, we want a convenient printf() function, which does all the formatting for us. So what we have to do, is called retargeting printf(). Meaning, that we tell it to send to USART1 as a default console. To accomplish that, all we have to do is (re)implement __io_putchar() and _write() functions:
#include "usart.h"

/* (Re)Define stdio functionality, so printf would output to USART1 */
int __io_putchar(int ch) {
 uint8_t c[1];
 c[0] = ch & 0x00FF;
 HAL_UART_Transmit(&huart1, &c[0], 1, 10);
 return ch;
}

int _write(int file,char *ptr, int len) {
 int DataIdx;
 for(DataIdx= 0; DataIdx< len; DataIdx++) {
  __io_putchar(*ptr++); /* send the buffer out one character at a time */
 }
 return len;
}

We shall call it printf_retarget.c. We'll make a companion header file, where we'll add include guards and prototypes:

int __io_putchar(int ch);
int _write(int file,char *ptr, int len);



For this bit of code to be used, we have to include it after usart.c in C_SOURCES definition in the Makefile and in main.c we can replace '#include "usart.h"' with '#include "printf_retarget.h"', for usart.h becomes a nested include.

And now we can use printf() function to our own pleasure. Let us add some output to the startup sequence:

  printf("System configured!\r\n");

  /* Call init function for freertos objects (in freertos.c) */
  MX_FREERTOS_Init();

and to our task as well, so it would inform us, that it is turning the LED on:
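void StartDefaultTask(void const * argument) {
  for(;;) {
   HAL_GPIO_TogglePin(GPIOG, GPIO_PIN_14);
   printf("LED toggled\r\n"); /* message text is illustrative, the original line is missing */
   osDelay(500); /* delay value is an assumption, without a delay the loop never yields */
  }
}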

Now, if we connect USART1 to our PC (I use an FTDI232 for that; the STM32 I/O pins used here are 5V tolerant, so the serial converter does not necessarily have to run in 3V3 mode) by wiring PA9 (USART1 TX) to the converter's RX and PA10 (USART1 RX) to the converter's TX, we can get our printouts:
It speaks!

The serial port settings are defined in usart.c:
  • 115200 baud
  • 8bit word length, 1 stop bit
  • Parity: none
  • Hardware control: none
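
These settings also determine how much payload actually fits through the port: every byte travels with one start bit and the configured stop bits, so 8N1 costs 10 bit times per byte. A small sketch of that arithmetic (the function name is hypothetical):

```c
#include <stdint.h>

/* Payload bytes per second for an asynchronous frame:
 * 1 start bit + data bits + optional parity bit + stop bits. */
uint32_t uart_bytes_per_second(uint32_t baud, unsigned data_bits,
                               unsigned parity_bits, unsigned stop_bits) {
  unsigned frame_bits = 1u + data_bits + parity_bits + stop_bits;
  return baud / frame_bits;
}
```

At 115200 baud with 8 data bits, no parity and 1 stop bit this comes out at 11520 bytes per second, worth keeping in mind since the blocking printf() above stalls the calling task for that long.
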
That would be it for now. As before, sources are available on GitHub, notes and comments are welcome.

Next up: Part III: SPI in blocking mode