10+ Mbit UART part II

Previously I wrote about my need for a high-speed RS-485 interface. I finally had both the parts and the time to set it up and test it. I assembled the whole thing on a breadboard to simulate a noisy environment (definitely not because I didn't want to draw a board that might not work).

STM32F4 interrupt status register gotcha

I have been trying to implement UART idle line detection. The line is considered idle when no new start bit arrives within one character time after the last received byte, i.e. the line stays high for a whole frame.

It seemed reasonably simple: enable the RXNE (receive buffer not empty) and IDLE interrupts. Sometimes it would work, and sometimes it wouldn't. Most of the time I had the USART peripheral register view open in the debugger, and it would show that the IDLE bit was set in the status register, yet the code would read the value without the bit. And, of course, that broke all the ISR logic. After banging my head against it for 2 days, I realized that the IDLE flag is cleared simply by reading the registers (a read of the status register followed by a read of the data register). And then it dawned on me: my debugger was reading those registers before the code got there, and hence the flag was already cleared.
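To make the failure mode concrete, here is a host-side model of the clear-on-read behaviour. It is a hypothetical simulation, not driver code: `usart_model_t` and the helper functions are made up for illustration, and only the SR-then-DR clearing sequence and the IDLE bit position follow the STM32F4 reference manual. A debugger refresh that reads SR and then DR consumes the flag before the ISR ever sees it:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the USART registers involved in IDLE detection. */
typedef struct {
    uint32_t sr;          /* status register */
    uint32_t dr;          /* data register */
    bool sr_read_pending; /* SR has been read, a DR read would clear IDLE */
} usart_model_t;

#define MODEL_SR_IDLE (1u << 4) /* IDLE is bit 4 of USART_SR on STM32F4 */

static uint32_t model_read_sr(usart_model_t *u) {
    u->sr_read_pending = true;
    return u->sr;
}

static uint32_t model_read_dr(usart_model_t *u) {
    if (u->sr_read_pending) {
        u->sr &= ~MODEL_SR_IDLE; /* SR read followed by DR read clears IDLE */
        u->sr_read_pending = false;
    }
    return u->dr;
}

/* Hardware side: the line stayed high for a whole frame. */
static void model_line_goes_idle(usart_model_t *u) {
    u->sr |= MODEL_SR_IDLE;
}

/* What a debugger register view does on every refresh. */
static void debugger_refresh(usart_model_t *u) {
    (void)model_read_sr(u);
    (void)model_read_dr(u);
}

/* What the ISR does: returns true if it saw the IDLE flag. */
static bool isr_sees_idle(usart_model_t *u) {
    bool idle = (model_read_sr(u) & MODEL_SR_IDLE) != 0u;
    (void)model_read_dr(u); /* the ISR's own DR read clears the flag too */
    return idle;
}
```

Without the debugger the ISR sees the flag; after a refresh it does not. On the target, the practical takeaway is probably to close the peripheral register view while debugging idle detection.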


10+ Mbit USART signal on STM32F429

Previously I wrote about the need for a high-speed UART signal to feed into an RS-485 line. Today I decided to try getting the required baud rate out of the STM32F429-Discovery board.
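As a starting point, the achievable rate follows from the USART divider formula in the reference manual: baud = f_CK / (8 * (2 - OVER8) * USARTDIV). The sketch below assumes USART1 is clocked from APB2 at 90 MHz (the F429's maximum; a board running the core at 168 MHz would have 84 MHz) and computes the BRR value and the baud rate actually produced. `usart_calc_brr()` is a hypothetical helper, not a HAL function:

```c
#include <stdint.h>

typedef struct {
    uint16_t brr;         /* value for USART_BRR */
    uint32_t actual_baud; /* baud rate the quantized divider really gives */
} usart_div_t;

/* Returns 0 on success, -1 if the requested baud rate is out of reach.
   over8 = 1 selects oversampling by 8, over8 = 0 oversampling by 16. */
static int usart_calc_brr(uint32_t f_ck, uint32_t baud, int over8,
                          usart_div_t *out) {
    uint32_t oversampling = over8 ? 8u : 16u;
    if (baud == 0u) {
        return -1;
    }
    /* USARTDIV * oversampling == f_ck / baud, rounded to the nearest step */
    uint32_t div = (uint32_t)(((uint64_t)f_ck + baud / 2u) / baud);
    uint32_t mantissa = div / oversampling;
    uint32_t fraction = div % oversampling; /* 3 bits with OVER8 = 1, else 4 */

    if (mantissa == 0u || mantissa > 0xFFFu) {
        return -1; /* USARTDIV must be >= 1; the mantissa field is 12 bits */
    }
    out->brr = (uint16_t)((mantissa << 4) | fraction);
    out->actual_baud = f_ck / div;
    return 0;
}
```

Under these assumptions, oversampling by 8 (in HAL, `Init.OverSampling = UART_OVERSAMPLING_8`) reaches exactly 10 Mbit/s with BRR = 0x11 and tops out at 11.25 Mbit/s with BRR = 0x10, while the default oversampling by 16 cannot go above 5.625 Mbit/s.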

The quest for 10+ Mbit RS-485 for STM32

I would like to move relatively large amounts of data from my STM32F427 device to a remote PC. Speed wouldn't be an issue if not for the distance, which is not known at this point but is assumed to be at least 15-20 meters and up to a couple of hundred meters.
For my project I decided that 4 to 10 megabits per second should be sufficient to send off all the generated data before the next measurement cycle.
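To sanity-check such numbers, remember that a UART spends extra bits on every byte: with 8N1 framing each byte costs 10 bit times on the wire. A small helper (hypothetical names; the 1000-byte block in the example is an illustrative size, not the project's real data volume) estimates the transfer time:

```c
#include <stdint.h>

/* Bits on the wire per byte: start bit + data bits + parity + stop bits. */
static uint32_t uart_bits_per_byte(uint32_t data_bits, uint32_t parity_bits,
                                   uint32_t stop_bits) {
    return 1u + data_bits + parity_bits + stop_bits;
}

/* Microseconds needed to send `bytes` back to back at `baud`, rounded up. */
static uint64_t uart_transfer_time_us(uint64_t bytes, uint32_t baud,
                                      uint32_t bits_per_byte) {
    return (bytes * bits_per_byte * 1000000u + baud - 1u) / baud;
}
```

For example, 1000 bytes with 8N1 framing take 1000 µs at 10 Mbit/s and 2500 µs at 4 Mbit/s.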

Custom STM32 board and USB re-enumeration issue

Lately I've been using ST's USB Virtual COM Port (VCP), a serial port implementation over USB, quite extensively. It works reasonably well with the Discovery board I used for prototyping, but not so much with my custom board.

FreeRTOS and Semihosting issues

Semihosting is a way of providing missing I/O resources to the target by borrowing them from the debugging host. Its most popular use is as a debug output console via standard printf(). Turns out, there are certain limitations to it.

STM32 + HAL + FreeRTOS Part V: SPI (with DMA)

The main flow of SPI (or any other communication, for that matter) is that the CPU generates data to send, passes it along to the peripheral (or to bit-banging logic, but that's out of scope) and then waits for the magic to happen.
There are multiple ways of "magic happening":
  • blocking mode - the CPU does nothing useful, it just keeps checking the peripheral's status until the transfer is done
  • interrupt mode - the CPU hands off the data and proceeds with whatever else it has to do. It gets interrupted once the transfer is complete and can then act upon it (store to memory, parse or whatever)
  • direct memory access (DMA) mode - the CPU hands off the data and proceeds. Meanwhile the peripheral sends and receives data directly from/to the defined memory region and notifies the CPU only when the transfer is done. At that point the data is already in memory, so the CPU does not have to fetch it from the peripheral anymore
DMA provides quite a few benefits: the CPU does not have to worry about arranging the transmission or storing the data. It can even go to sleep, thus saving power. Or, in data-intensive applications, it can process one data batch while the next one is on its way. So what we get is free clock cycles and possible power savings.

ST provides application note AN4031, in which the DMA functionality is described. STM32F4 has 2 DMA controllers, each responsible for its own set of peripherals. Each controller has 8 streams, and each stream can be connected to one of 8 channels (request sources), i.e. to specific peripherals. Tables 1 and 2 show the stream/channel mappings for each controller. We are interested in SPI5, since that's where the on-board gyro is connected. SPI5 is available on DMA2 Streams 3 and 4 (channel 2) and Streams 5 and 6 (channel 7). Transmit (TX) and receive (RX) are served by separate streams. Since streams also have a hardware priority (when two streams have the same software priority, the lower-numbered stream wins, so stream0 takes precedence over stream2), care must be taken when distributing streams in an application to avoid contention and bad states. But we don't care much about it now, because our application is tiny and we are not using any other streams that could affect us.
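The arbitration rule described in AN4031 has two stages: the software priority level (the PL field of each stream's configuration register) is compared first, and only on a tie does the lower stream number win. A host-side model of one arbitration round, as a sketch with hypothetical names (the real arbiter is hardware):

```c
#include <stdint.h>

/* Software priority levels, as encoded in the PL bits of DMA_SxCR. */
enum dma_pl { PL_LOW = 0, PL_MEDIUM = 1, PL_HIGH = 2, PL_VERY_HIGH = 3 };

/* One arbitration round on a single controller with 8 streams.
   Bit n of `pending` is set when stream n has a request; pl[n] is its
   software priority. Returns the winning stream, or -1 if none is pending. */
static int dma_arbitrate(uint8_t pending, const enum dma_pl pl[8]) {
    int winner = -1;
    for (int s = 0; s < 8; s++) {
        if (!(pending & (1u << s))) {
            continue;
        }
        /* Only a strictly higher software priority replaces the current
           winner, so on a tie the lower stream number (seen first) stays. */
        if (winner < 0 || pl[s] > pl[winner]) {
            winner = s;
        }
    }
    return winner;
}
```

With both SPI5 streams at DMA_PRIORITY_VERY_HIGH, as in the configuration in this post, simultaneous RX (stream 3) and TX (stream 4) requests resolve in favor of RX.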

OK, let's get to coding. First, we need to initialize DMA by enabling its clock and defining the streams. We'll do that in the same SPI5 configuration file (spi.c):
#include "spi.h"
#include "gpio.h"

SPI_HandleTypeDef hspi5;
DMA_HandleTypeDef hdma_rx;
DMA_HandleTypeDef hdma_tx;

/* SPI5 init function */
void MX_SPI5_Init(void) {

  hspi5.Instance = SPI5;
  hspi5.Init.Mode = SPI_MODE_MASTER;
  hspi5.Init.Direction = SPI_DIRECTION_2LINES;
  hspi5.Init.DataSize = SPI_DATASIZE_8BIT;
  hspi5.Init.CLKPolarity = SPI_POLARITY_HIGH;
  hspi5.Init.CLKPhase = SPI_PHASE_2EDGE;
  hspi5.Init.NSS = SPI_NSS_SOFT;
  hspi5.Init.BaudRatePrescaler = SPI_BAUDRATEPRESCALER_16;
  hspi5.Init.FirstBit = SPI_FIRSTBIT_MSB;
  hspi5.Init.TIMode = SPI_TIMODE_DISABLE;
  hspi5.Init.CRCPolynomial = 10;
  if (HAL_SPI_Init(&hspi5) != HAL_OK) {
    _Error_Handler(__FILE__, __LINE__);
  }
}

void HAL_SPI_MspInit(SPI_HandleTypeDef* spiHandle) {

  GPIO_InitTypeDef GPIO_InitStruct;

  if (spiHandle->Instance == SPI5) {

    /* SPI5 clock enable */
    __HAL_RCC_SPI5_CLK_ENABLE();
    /* DMA2 clock enable */
    __HAL_RCC_DMA2_CLK_ENABLE();

    /**SPI5 GPIO Configuration    
    PF7     ------> SPI5_SCK
    PF8     ------> SPI5_MISO
    PF9     ------> SPI5_MOSI 
    */
    GPIO_InitStruct.Pin = SPI5_SCK_Pin | SPI5_MISO_Pin | SPI5_MOSI_Pin;
    GPIO_InitStruct.Mode = GPIO_MODE_AF_PP;
    GPIO_InitStruct.Pull = GPIO_NOPULL;
    GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_LOW;
    GPIO_InitStruct.Alternate = GPIO_AF5_SPI5;
    HAL_GPIO_Init(GPIOF, &GPIO_InitStruct);

    /* SPI5_RX: DMA2 Stream 3, channel 2 */
    hdma_rx.Instance = DMA2_Stream3;
    hdma_rx.Init.Channel = DMA_CHANNEL_2;
    hdma_rx.Init.Direction = DMA_PERIPH_TO_MEMORY;
    hdma_rx.Init.PeriphInc = DMA_PINC_DISABLE;
    hdma_rx.Init.MemInc = DMA_MINC_ENABLE;
    hdma_rx.Init.PeriphDataAlignment = DMA_PDATAALIGN_BYTE;
    hdma_rx.Init.MemDataAlignment = DMA_MDATAALIGN_BYTE;
    hdma_rx.Init.Mode = DMA_NORMAL;
    hdma_rx.Init.Priority = DMA_PRIORITY_VERY_HIGH;
    hdma_rx.Init.FIFOMode = DMA_FIFOMODE_DISABLE;
    if (HAL_DMA_Init(&hdma_rx) != HAL_OK) {
        _Error_Handler(__FILE__, __LINE__);
    }

    __HAL_LINKDMA(&hspi5, hdmarx, hdma_rx);

    /* SPI5_TX: DMA2 Stream 4, channel 2 */
    hdma_tx.Instance = DMA2_Stream4;
    hdma_tx.Init.Channel = DMA_CHANNEL_2;
    hdma_tx.Init.Direction = DMA_MEMORY_TO_PERIPH;
    hdma_tx.Init.PeriphInc = DMA_PINC_DISABLE;
    hdma_tx.Init.MemInc = DMA_MINC_ENABLE;
    hdma_tx.Init.PeriphDataAlignment = DMA_PDATAALIGN_BYTE;
    hdma_tx.Init.MemDataAlignment = DMA_MDATAALIGN_BYTE;
    hdma_tx.Init.Mode = DMA_NORMAL;
    hdma_tx.Init.Priority = DMA_PRIORITY_VERY_HIGH;
    hdma_tx.Init.FIFOMode = DMA_FIFOMODE_DISABLE;
    if (HAL_DMA_Init(&hdma_tx) != HAL_OK) {
        _Error_Handler(__FILE__, __LINE__);
    }

    __HAL_LINKDMA(&hspi5, hdmatx, hdma_tx);

    /* Interrupts: a smaller number means a higher priority */
    HAL_NVIC_SetPriority(DMA2_Stream3_IRQn, 3, 0);
    HAL_NVIC_EnableIRQ(DMA2_Stream3_IRQn);

    HAL_NVIC_SetPriority(DMA2_Stream4_IRQn, 4, 0);
    HAL_NVIC_EnableIRQ(DMA2_Stream4_IRQn);

    HAL_NVIC_SetPriority(SPI5_IRQn, 1, 0);
    HAL_NVIC_EnableIRQ(SPI5_IRQn);
  }
}

void HAL_SPI_MspDeInit(SPI_HandleTypeDef* spiHandle) {

  if(spiHandle->Instance == SPI5) {
    /**SPI5 GPIO Configuration    
    PF7     ------> SPI5_SCK
    PF8     ------> SPI5_MISO
    PF9     ------> SPI5_MOSI 
    */
    HAL_GPIO_DeInit(GPIOF, SPI5_SCK_Pin | SPI5_MISO_Pin | SPI5_MOSI_Pin);

    /* SPI5 DMA DeInit (while the DMA2 clock is still running) */
    HAL_DMA_DeInit(spiHandle->hdmarx);
    HAL_DMA_DeInit(spiHandle->hdmatx);

    HAL_NVIC_DisableIRQ(DMA2_Stream3_IRQn);
    HAL_NVIC_DisableIRQ(DMA2_Stream4_IRQn);
    HAL_NVIC_DisableIRQ(SPI5_IRQn);

    /* Peripheral clock disable */
    __HAL_RCC_SPI5_CLK_DISABLE();
    /* DMA2 clock disable (only if no other stream uses DMA2) */
    __HAL_RCC_DMA2_CLK_DISABLE();
  }
}

/* Called by HAL once a transmit-receive transfer (DMA or IT) has completed */
void HAL_SPI_TxRxCpltCallback(SPI_HandleTypeDef *hspi) {
    (void)hspi;
    SPI_QueueItem_t *item = (SPI_QueueItem_t*)osPoolAlloc(spi_pool);
    if (item == NULL) {
        return; /* pool exhausted, drop this sample */
    }
    item->value = rxbuf[1];
    item->source = "CALLBACK";
    osMessagePut(xGyroQueue, (uint32_t) item, 0);
}
What is happening here is that we define 2 handles for the DMA configuration. We leave the actual SPI peripheral initialization code as is, but in HAL_SPI_MspInit() we enable the clock for the DMA2 controller and then configure the streams themselves. Since DMA on STM32 is quite flexible, you can use it for transmit only, receive only, or both; we need both, so we set up two DMA streams, one for RX and one for TX. Once DMA is configured, we initialize it and link it to the SPI handle itself: the SPI handle now holds a reference to the DMA handle, which HAL uses internally for transfer management.

Last, but not least, we should enable interrupts, so that we get some feedback on when the transfer has been completed.
In the cleanup function we should also deinitialize the DMA streams we were using and stop the DMA2 clock (if it's not used anywhere else).
HAL also provides callback functions for TX completed, RX completed and TXRX (transceive) completed, which are called once HAL's interrupt handler has done its housekeeping (clearing flags, etc.). Don't worry, that part is done by HAL internally.

You might notice some weirdness happening in the callback function. What I have done is define a custom structure, which I fill with data in the callback functions. This structure is defined in spi.h:
extern SPI_HandleTypeDef hspi5;
extern osMessageQId xGyroQueue;
extern uint8_t rxbuf[3];
extern uint8_t txbuf[3];
extern osPoolId spi_pool;

extern void _Error_Handler(char *, int);

typedef struct __SPI_QueueItem_t {
  uint32_t value;
  char *source;
} SPI_QueueItem_t;
The SPI_QueueItem_t structure has two members: a uint32_t value and a pointer to a character sequence, which I'll use to store a string with info on the origin of the structure.
Now it's time for a brief theoretical intermission.


I will use this structure to pass received data from the interrupt to a processing task using a queue. A queue is just that: a fixed-length list of items. A task can be assigned to monitor a queue and take items from it, while items are put into it by other tasks (or interrupts). Normally, each item can be taken only once. Initially the list is empty and the monitoring task is in the Blocked state; it gets woken up (transitions into the Ready state) once an item is available in the queue. As soon as the processing task gets some CPU time (there are no tasks of higher priority blocking it), it takes one item from the queue and processes it, thus freeing a slot in the queue. Queues are by nature first-in-first-out (FIFO), so care must be taken to give the processing task enough time, otherwise the data can get stale.
A CMSIS-RTOS message queue passes a single uint32_t value per message, which, conveniently enough, is just the right size to hold a pointer to a memory location that can store whatever, from a series of bits to bitmapped images. That is what we are going to do: pass a pointer instead of a value, since I want not just the value but also the source of the data. For this we need a mechanism for managing memory, and this is where memory pooling comes in.
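A host-side sketch of that idea, assuming a queue whose slots are one pointer wide. This is a simplified model, not FreeRTOS's queue implementation; `ptr_queue_t` and its functions are hypothetical. Only the pointer is copied into the queue, so the object it points to must stay valid until the consumer is done with it, which is what a memory pool takes care of:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-length FIFO of pointer-sized slots. uintptr_t keeps the sketch
   correct on 64-bit hosts; on Cortex-M a pointer fits in the uint32_t
   that osMessagePut() carries. */
#define QUEUE_LEN 10

typedef struct {
    uintptr_t slot[QUEUE_LEN];
    size_t head;  /* index of the oldest item */
    size_t count; /* number of items currently stored */
} ptr_queue_t;

static bool queue_put(ptr_queue_t *q, void *item) {
    if (q->count == QUEUE_LEN) {
        return false; /* full: an RTOS would block or fail the put here */
    }
    q->slot[(q->head + q->count) % QUEUE_LEN] = (uintptr_t)item;
    q->count++;
    return true;
}

static void *queue_get(ptr_queue_t *q) {
    if (q->count == 0) {
        return NULL; /* empty: an RTOS would block the reader instead */
    }
    void *item = (void *)q->slot[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count--;
    return item;
}
```

Items come out in the order they went in, and a full queue refuses further items rather than overwriting old ones.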

Memory pooling

A memory pool is a chunk of memory set aside for storing a number of objects of a particular type. Usually the application developer is able to predict the types and amount of data, and thus the amount of memory required for storing temporary data. As with a queue, a pool is fixed-size, except that one can retrieve any object, in any order, using a reference (pointer). An object stays in memory until it is explicitly freed.
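A minimal fixed-block pool along those lines might look like the sketch below. It illustrates the concept with hypothetical names and is not how CMSIS-RTOS implements osPoolAlloc()/osPoolFree(), nor is it safe to call concurrently from tasks and ISRs. The key properties are the same, though: a fixed number of equally sized blocks, allocation that returns NULL when the pool is exhausted, and blocks returned in any order:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t value;
    char *source;
} item_t; /* same shape as SPI_QueueItem_t */

#define POOL_BLOCKS 10

typedef struct {
    item_t block[POOL_BLOCKS];
    unsigned char used[POOL_BLOCKS]; /* 1 = handed out */
} item_pool_t;

static item_t *pool_alloc(item_pool_t *p) {
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (!p->used[i]) {
            p->used[i] = 1;
            return &p->block[i];
        }
    }
    return NULL; /* pool exhausted: callers must check for this */
}

/* Returns 0 on success, -1 for a foreign pointer or a double free. */
static int pool_free(item_pool_t *p, item_t *it) {
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (&p->block[i] == it) {
            if (!p->used[i]) {
                return -1;
            }
            p->used[i] = 0;
            return 0;
        }
    }
    return -1;
}
```

The NULL case is worth handling on the target as well, since osPoolAlloc() likewise returns NULL when all blocks are taken.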

So, back to the code. In spi.h we declare that we'll use an external queue and memory pool, as well as the TX/RX buffers. In the spi.c callback function we request a slot for a new object from spi_pool by calling osPoolAlloc(spi_pool). This function returns a pointer to the allocated space (or NULL if the pool is full). We cast this pointer to an SPI_QueueItem_t pointer and then fill the structure with values. Once it is populated, we put the pointer to it into the queue.
Let's look at how we define the memory pool and the queue for our data in freertos.c:
/* Variables -----------------------------------------------------------------*/
osThreadId defaultTaskHandle, blinkyTaskHandle, gyroTaskHandle, gyroPrinterHandle;
osMessageQId xGyroQueue;
uint8_t rxbuf[3] = {0x00, 0x00};
uint8_t txbuf[3] = {0x0F | 0x80, 0x00}; // 0x0F is WHO_AM_I register, 0x80 read bit, should return 0b11010100 or 0xD4
osPoolDef(spi_pool, 10, SPI_QueueItem_t); // room for 10 items
osPoolId spi_pool;

/* Function prototypes -------------------------------------------------------*/
void StartDefaultTask(void const * argument);
void vBlinkyTask(void const * argument);
void vGyroTesterTask(void const * argument);
void vGyroPrinterTask(void const * argument);
void MX_FREERTOS_Init(void); /* (MISRA C 2004 rule 8.1) */
/* Init FreeRTOS */
void MX_FREERTOS_Init(void) {

 spi_pool = osPoolCreate(osPool(spi_pool));
 osThreadDef(defaultTask, StartDefaultTask, osPriorityLow, 0, 1000);
 defaultTaskHandle = osThreadCreate(osThread(defaultTask), NULL);

 osThreadDef(blinkyTask, vBlinkyTask, osPriorityHigh, 4, 1000);
 blinkyTaskHandle = osThreadCreate(osThread(blinkyTask), NULL);

 osThreadDef(gyroPrinterTask, vGyroPrinterTask, osPriorityLow, 1, 1000);
 gyroPrinterHandle = osThreadCreate(osThread(gyroPrinterTask), NULL);

 // The queue carries pointers into spi_pool, so the message type is a pointer
 // (ST's CMSIS-RTOS wrapper sizes queue items as sizeof(type)): 10 pointers
 osMessageQDef(gyroPrinterQueue, 10, SPI_QueueItem_t *);
 xGyroQueue = osMessageCreate(osMessageQ(gyroPrinterQueue), NULL);

 // Put test data into the queue
 SPI_QueueItem_t *item = (SPI_QueueItem_t*)osPoolAlloc(spi_pool);
 if (item != NULL) {
  item->value = 0x33;
  item->source = "TEST";
  osMessagePut(xGyroQueue, (uint32_t) item, 0);
 }

 osThreadDef(gyroTask, vGyroTesterTask, osPriorityHigh, 0, 1000);
 gyroTaskHandle = osThreadCreate(osThread(gyroTask), NULL);
}

void vGyroPrinterTask(void const * argument) {
 osEvent event;
 uint8_t count = 0;
 SPI_QueueItem_t *item;
 while(1) {
  // Block until the first message arrives
  event = osMessageGet(xGyroQueue, osWaitForever);
  // Messages still waiting, not counting the one just received
  printf("Got %lu messages in queue\r\n", osMessageWaiting(xGyroQueue));
  // Drain the queue; the 1-tick timeout ends the loop once it is empty
  while (event.status == osEventMessage) {
   item = (SPI_QueueItem_t *)event.value.p;
   printf("Message %d: from %s: %lx\r\n", count, item->source, item->value);
   osPoolFree(spi_pool, item); // return the block to the pool
   count++;
   event = osMessageGet(xGyroQueue, 1);
  }
  count = 0;
 }
}
First, we use the osPoolDef() macro to define the pool: its name, depth and content type. Then we declare a global variable for passing the pool ID around. In MX_FREERTOS_Init() we create the actual pool before using it. To create an object in the pool, we once again use osPoolAlloc() to get a pointer and use that pointer to assign values. Here we create a test item with dummy data just to see that the pool and queue processing work correctly.

Up to now we can fill the pool, but have no way of handing its items over to the task that processes them. For this we'll create a queue to pass around pointers to items in our pool. osMessageQDef() and osMessageCreate() deal with that. We create the queue with the same depth as the pool itself and assign it to a global for use elsewhere.

vGyroPrinterTask() deals with processing the queue and freeing pool items. By default it sits in the Blocked state (waiting forever) until a message comes in. Messages in CMSIS-RTOS are delivered as a kind of event, so we have to check the event status before we start processing it. Once it's clear that it actually is a message, we can check the number of messages waiting in the queue and then process them until the queue is empty (the 1-tick timeout on the subsequent gets ends the loop). Otherwise our thread would process a single message and then go back to waiting for the next one.
In the processing loop we fish out the pointer and, since we don't have any other message types, cast it to an SPI_QueueItem_t pointer. At that point we get access to the members of the structure and can print them out. Once we are done with the object, we return its memory to the pool via osPoolFree().

Now we would like to populate the queue from DMA interrupts. The callback function defined in spi.c can be used for that. Alternatively, we can pass the data on directly from the ISRs, in the interrupt handlers in stm32f4xx_it.c:
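extern TIM_HandleTypeDef htim6;
extern SPI_HandleTypeDef hspi5;
extern DMA_HandleTypeDef hdma_rx;
extern DMA_HandleTypeDef hdma_tx;
extern osMessageQId xGyroQueue;
extern uint8_t rxbuf[3];
extern uint8_t txbuf[3];
extern osPoolId spi_pool; 
extern volatile uint8_t spiDone;

// SPI5 DMA receive done
// Runs for every enabled stream event, which may include half-transfer as well as transfer-complete
void DMA2_Stream3_IRQHandler(void) {
 SPI_QueueItem_t *item = (SPI_QueueItem_t*)osPoolAlloc(spi_pool);
 if (item != NULL) {
  item->value = rxbuf[1];
  item->source = "DMA IRQ";
  osMessagePut(xGyroQueue, (uint32_t) item, 0);
 }
 // Let HAL clear the stream flags and call HAL_SPI_TxRxCpltCallback()
 HAL_DMA_IRQHandler(&hdma_rx);
}

// SPI5 DMA transmit done
void DMA2_Stream4_IRQHandler(void) {
 // Nothing of our own to do here, but HAL still has to clear the stream flags
 HAL_DMA_IRQHandler(&hdma_tx);
}

// SPI5 interrupt (used by HAL_SPI_TransmitReceive_IT)
void SPI5_IRQHandler(void) {
 HAL_SPI_IRQHandler(&hspi5);
 // NB: this marks "done" on the first SPI interrupt, not at the end of the transfer
 spiDone = 1;
}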
Here we once again create an item in the memory pool and send pointer to the queue. I have also defined a handler for interrupt-based transmit, if anybody wants it, but it is not required for dealing with DMA.

NB! To use queues (or any other FreeRTOS API functionality) from within interrupts, an interrupt's priority has to be numerically equal to or higher than configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY (interrupt priorities are inverted, i.e. smaller numbers mean higher priority). Our DMA interrupts run at priorities 3 and 4, so I had to change that setting in FreeRTOSConfig.h from 5 to 3. I spent a bit of time debugging this until I found out about the issue. Turns out, STM32CubeMX does not keep these values consistent while generating code.
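The rule behind that NB fits in a one-line check. This sketch assumes the unshifted priority numbers passed to HAL_NVIC_SetPriority() (0 to 15 on the STM32F4) and the same numbering for configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY; the function name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* An ISR may call the FreeRTOS (FromISR) API only if it is no more urgent
   than the max-syscall level, i.e. its number is >= that setting. */
static bool isr_may_use_rtos_api(uint32_t nvic_prio, uint32_t max_syscall_prio) {
    return nvic_prio >= max_syscall_prio;
}
```

With the default of 5, neither DMA interrupt (priorities 3 and 4) may use the API; with 3, both may. SPI5 at priority 1 still may not, which is fine because its handler only sets a flag. Keeping the setting at 5 and moving the DMA interrupts to priority 5 or higher would be an alternative fix.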

Now the last touch - actually using all this setup in freertos.c:
volatile uint8_t spiDone = 0; // set from SPI5_IRQHandler() in stm32f4xx_it.c

void vGyroTesterTask(void const * argument) {
 HAL_StatusTypeDef response = HAL_ERROR; // default to error, so we can see if the value actually gets updated by HAL

 /* Transceive data with gyro in blocking mode */
 response = HAL_SPI_TransmitReceive(&hspi5, txbuf, rxbuf, 2, 1000);
 if (response == HAL_OK) {
  printf("Sent: %02x %02x Got: %02x %02x\r\n", txbuf[0], txbuf[1], rxbuf[0], rxbuf[1]);
 } else {
  printf("Got error response as %d\r\n", response);
 }

 /* Now do the same in DMA mode */
 memset(rxbuf, 0x00, sizeof rxbuf);
 printf("RX buffer reset to %02x %02x\r\n", rxbuf[0], rxbuf[1]);
 response = HAL_SPI_TransmitReceive_DMA(&hspi5, txbuf, rxbuf, 2);
 if (response != HAL_OK) {
  printf("Got error response as %d\r\n", response);
 }

 /* Print some stuff, just to keep the CPU busy and show that it's actually DMA performing the transfer */
 HAL_SPI_StateTypeDef state = HAL_SPI_GetState(&hspi5);
 while (state != HAL_SPI_STATE_READY) {
   state = HAL_SPI_GetState(&hspi5);
   printf("State is: %d\r\n", state);
 }
 printf("Sent via DMA: %02x %02x Got: %02x %02x\r\n", txbuf[0], txbuf[1], rxbuf[0], rxbuf[1]);

 /* Again, this time using interrupts */
 memset(rxbuf, 0x00, sizeof rxbuf);
 printf("RX buffer reset to %02x %02x\r\n", rxbuf[0], rxbuf[1]);
 spiDone = 0;
 response = HAL_SPI_TransmitReceive_IT(&hspi5, txbuf, rxbuf, 2);
 if (response != HAL_OK) {
  printf("Got error response as %d\r\n", response);
 }

 while (spiDone != 1) {
  printf("Not done yet!\r\n");
 }
 printf("Sent via IT: %02x %02x Got: %02x %02x\r\n", txbuf[0], txbuf[1], rxbuf[0], rxbuf[1]);

 osThreadTerminate(NULL); // single-shot task; a FreeRTOS task must not return
}

Here we first send data in blocking mode, as before. Afterwards we do the same in DMA mode. While DMA is working in the background, we print the peripheral state in a loop without delays (well, there actually is a delay while the UART is sending, since the UART is working in blocking mode right now). Anyhow, the first printout of the state should be "5" (HAL_SPI_STATE_BUSY_TX_RX); once the transfer is done, the printouts should stop. memset() is used to clear the receive buffer, to show that we are actually receiving the data and not just reusing values already there.
This setup should show the callback hierarchy: first the DMA IRQ item should be printed, and afterwards the one from the SPI general callback. And the result is as expected:
What we see in this screenshot is:
  1. Blocking mode still works
  2. DMA is transferring in background, while UART is blocking for printing 
  3. Once UART stops blocking, data is available in the same thread
  4. Sending in non-blocking mode with interrupts gets executed and the interrupt gets called (but no data for some reason). Too lazy right now to debug it, and I'm not particularly interested in such a use case. Might return to it eventually
  5. Messages are put from both ISR and callback in this order.
  6. Lowest priority tasks (queue processor and blinky) get executed last
  7. FIFO nature of message queue.
I'll end on this for now, post is becoming a bit bloated already.
As usual, sources are available on GitHub

STM32 + HAL + FreeRTOS Part IV: IDE (Eclipse) setup

Maybe I should've started with this a bit earlier, but IMO getting the project to build from Eclipse is pretty straightforward: just import it as a Makefile project. Even though Eclipse can't resolve some symbols, make is aware of them and compiles just fine.

So, for starters you'll need Eclipse itself, which is available on its downloads page. What you need is "Eclipse IDE for C/C++ Developers". I have an old Mars.2 version, but it shouldn't matter that much.

To run Eclipse, you also need a Java Runtime Environment (JRE) for your system. The JRE has to have the same architecture (x86, x64) as Eclipse itself (and preferably as the OS).

Once Eclipse is installed and started, you have to set up your workspace (the directory where it stores all the project-related data). It does not have to be your projects folder; I prefer to keep it separate from the actual code I commit, to avoid clutter. Existing code can easily be imported afterwards.

Once the workspace is set up, all you have to do is import the Makefile project by clicking File/New/Makefile Project with Existing Code. Click Browse... and navigate to the directory with our project source code, where the Makefile is located. It should fill in the project name based on the folder name. Toolchain selection can be left as <none> for now. Click Finish and enjoy your project.

It should be able to build as-is now. Open Src/main.c in the Project Explorer and hit CTRL+B. If you see the .hex and .bin file names in the output console, you're dandy.

Now there are some issues with unresolved symbols (which annoyingly get reported as bugs). Let's try to fix that.

The following steps configure the CDT build output parser to automatically discover symbols, include paths and compiler settings based on the output produced by the Makefile.
  • Right-click the project and open Properties/C/C++ General/Preprocessor Include Paths, Macros etc./Providers to get to the configuration window we're interested in
  • Click on CDT GCC Build Output Parser and change the compiler command pattern from (gcc)|([gc]\+\+)|(clang) to (.*gcc)|(.*[gc]\+\+) then apply changes.
  • Click on CDT GCC Built-in Compiler Settings, replace ${COMMAND} with $toolchain-path\arm-none-eabi-gcc and click Apply. Here $toolchain-path is the path to the toolchain binary folder, so the full command looks like this:
    # Windows
    C:\tools\arm-gcc\7-2017-q4-major\bin\arm-none-eabi-gcc ${FLAGS} -E -P -v -dD "${INPUTS}"
    # Linux
    ~/tools/arm-gcc/7-2017-q4-major/bin/arm-none-eabi-gcc ${FLAGS} -E -P -v -dD "${INPUTS}"
Note that the full path to the toolchain is needed only if you don't have it in the system PATH.

Now do Project/Clean and then rebuild the project, so that the indexer can read the console output and index everything mentioned there. Most of the warnings should go away now.

You can ignore make errors when running make clean; it's usually Windows complaining that it can't find the .dep directory. It should still clean up the build.

Debugging STM32 applications with Eclipse via ST-Link requires a more convoluted setup, so that will be covered some other time.

STM32 + HAL + FreeRTOS Part III: SPI (blocking)

Serial Peripheral Interface (SPI) is widely used in embedded systems for connecting all kinds of ICs: sensors, memories, screens, you name it. It seems to be somewhat less popular than I2C among the beginner/Arduino crowd, because of its relatively more complicated setup. But it does provide higher speeds and makes it possible to have several sensors of the same type on a single bus without addressing issues.

The STM32F429I-Discovery board has an L3GD20 3-axis gyroscope onboard, connected to SPI5. In this article we'll try to get it up and running. I will continue where we left off last time: a working example with blinking LEDs and UART.

So, let us look at the devkit connections, as defined in the Discovery board documentation:
  • MOSI is on PF9
  • MISO is on PF8
  • SCK is on PF7
  • Chip-select (CS) is on PC1
  • Interrupt 1 is on PA1
  • Interrupt 2 is on PA2
When we initially generated the code in STM32CubeMX in the first part of this series, I told you to include SPI5 in the list of peripherals for which to generate initialization code. If we take a look at the generated main.h header file, we can see that the SPI5_XXX_Pin and SPI5_XXX_GPIO_Port definitions are already there and point to the pins we need:

#define SPI5_SCK_Pin GPIO_PIN_7
#define SPI5_SCK_GPIO_Port GPIOF
#define SPI5_MISO_Pin GPIO_PIN_8
#define SPI5_MOSI_Pin GPIO_PIN_9
#define MEMS_INT1_Pin GPIO_PIN_1
#define MEMS_INT2_Pin GPIO_PIN_2
In addition to that, NCS_MEMS_SPI has been defined. That will be our CS. And the last bunch are the interrupt pins. Neat.

Another thing we should worry about (it's a pretty common source of headaches) is the SPI bus setup. The generated setup in spi.c states:

  hspi5.Instance = SPI5;
  hspi5.Init.Mode = SPI_MODE_MASTER;
  hspi5.Init.Direction = SPI_DIRECTION_2LINES;
  hspi5.Init.DataSize = SPI_DATASIZE_8BIT;
  hspi5.Init.CLKPolarity = SPI_POLARITY_HIGH;
  hspi5.Init.CLKPhase = SPI_PHASE_2EDGE;
  hspi5.Init.NSS = SPI_NSS_SOFT;
  hspi5.Init.BaudRatePrescaler = SPI_BAUDRATEPRESCALER_16;
  hspi5.Init.FirstBit = SPI_FIRSTBIT_MSB;
  hspi5.Init.TIMode = SPI_TIMODE_DISABLE;
Let's review it and compare it to the data in the gyro datasheet. Sadly, they don't just tell you which SPI mode the chip works in, so you have to read the signal timing chart and figure it out on your own:
  1. Instance: SPI5 is what we want, yes
  2. Mode: Master mode - we will be initiating the communications
  3. Direction: 2 Lines. That's what we have and want. Apparently, there's a support for single-wire half-duplex comms as well, but that's not for us
  4. Datasize: 8bit. True. All the registers are 8-bit; wider values are split across pairs of low/high byte registers
  5. Clock polarity (CPOL): Idle state is high, so polarity is high
  6. Clock phase (CPHA): Sampling seems to be done on the trailing edge, so this should be SPI_PHASE_2EDGE
  7. NSS (slave select) will be done in software for now, by manually pulling the CS pin low before a transfer and pulling it back up once the transfer is done. It is possible to use hardware NSS, but the STM32F4 SPI master keeps its NSS output low until the peripheral is disabled, so the SPI master would have to be disabled after each transfer to pull it back up
  8. Baud Rate Prescaler is a clock divider, which sets the transfer speed. Leave it be for now
  9. First Bit should be the most significant bit (MSB) as per gyro datasheet
  10. We are not interested in TI mode for now, so leave it disabled
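CPOL and CPHA are usually combined into a single SPI mode number, which is how many datasheets and driver APIs refer to them. A small sketch of that convention (the helper name is hypothetical):

```c
#include <stdint.h>

/* mode = (CPOL << 1) | CPHA. CPOL = 1: SCK idles high.
   CPHA = 1: data is sampled on the second (trailing) clock edge. */
static uint8_t spi_mode(uint8_t cpol, uint8_t cpha) {
    return (uint8_t)(((cpol & 1u) << 1) | (cpha & 1u));
}
```

SPI_POLARITY_HIGH together with SPI_PHASE_2EDGE therefore corresponds to SPI mode 3.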
Now that we have set up our comms with the chip, we can start abusing it. I created a small single-shot task, which reads the gyroscope's WHO_AM_I register and terminates itself. Let's add it to our freertos.c, where the other tasks are already defined:
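void vGyroTesterTask(void const * argument) {
 HAL_StatusTypeDef response = HAL_ERROR; // default to error 
 // 0x0F is WHO_AM_I register, 0x80 read bit, should return 0b11010100 or 0xD4
 uint8_t txbuf[3] = {0x0F | 0x80, 0x00}; 
 uint8_t rxbuf[3] = {0x00, 0x00};

 // Select the gyro: CS is active low
 HAL_GPIO_WritePin(NCS_MEMS_SPI_GPIO_Port, NCS_MEMS_SPI_Pin, GPIO_PIN_RESET);
 response = HAL_SPI_TransmitReceive(&hspi5, txbuf, rxbuf, 2, 1000);
 // Deselect the gyro, ending the transaction
 HAL_GPIO_WritePin(NCS_MEMS_SPI_GPIO_Port, NCS_MEMS_SPI_Pin, GPIO_PIN_SET);

 if (response == HAL_OK) {
  printf("Sent: %02x %02x Got: %02x %02x\r\n", txbuf[0], txbuf[1], rxbuf[0], rxbuf[1]);
 } else {
  printf("Got error response as %d\r\n", response);
 }

 // Single-shot task: delete itself (a FreeRTOS task must not return)
 osThreadTerminate(NULL);
}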
What is happening here, is:
  1. We define default response as error, just to be sure, that it gets changed to HAL_OK
  2. define the transmit buffer. Only its first 2 bytes are sent (the third element is just a spare zero; binary SPI data doesn't need a terminator). The first byte is the command we send: we want to read (0x80) register 0x0F. Then we send a zero byte as a dummy payload to give the chip a chance to respond
  3. define the receive buffer the same way. The first byte is a dummy receive, typically filled with 0xFF (no data); the second one will contain the response from the chip
  4. pull down our CS pin, to indicate to the slave, that we are talking to it
  5. Do the actual transceiving of the data
  6. pull back up the CS pin, to tell the chip, that we are done with it
  7. check the response status code and print out the data or code received
  8. kill the thread, for we are done with it
We could (and should) clean up resources as well, by de-initializing the SPI bus, but that's not necessary right now.
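To illustrate the command byte and the full-duplex exchange without hardware, here is a host-side sketch. The command layout (bit 7 read, bit 6 address auto-increment, bits 5..0 register address) follows the L3GD20 datasheet; `fake_gyro_transfer()` is a hypothetical stand-in for the chip, and its 0xFF dummy byte simply mirrors the description above:

```c
#include <stddef.h>
#include <stdint.h>

#define L3GD20_READ     0x80u /* bit 7: 1 = read, 0 = write */
#define L3GD20_AUTOINC  0x40u /* bit 6: auto-increment address */
#define L3GD20_WHO_AM_I 0x0Fu

static uint8_t l3gd20_cmd(uint8_t reg, int read, int autoinc) {
    return (uint8_t)((reg & 0x3Fu) | (read ? L3GD20_READ : 0u) |
                     (autoinc ? L3GD20_AUTOINC : 0u));
}

/* Hypothetical gyro model answering WHO_AM_I with 0xD4. SPI is full
   duplex: one byte comes back for every byte sent, and the byte clocked
   in together with the command carries no data. */
static void fake_gyro_transfer(const uint8_t *tx, uint8_t *rx, size_t len) {
    if (len == 0) {
        return;
    }
    uint8_t reg = (uint8_t)(tx[0] & 0x3Fu);
    int is_read = (tx[0] & L3GD20_READ) != 0u;
    rx[0] = 0xFF; /* the chip has nothing to send during the command byte */
    for (size_t i = 1; i < len; i++) {
        rx[i] = (is_read && reg == L3GD20_WHO_AM_I) ? 0xD4 : 0x00;
    }
}
```

The auto-increment bit becomes relevant when reading the six output registers in one go, e.g. a command of 0xE8 for OUT_X_L (0x28).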

All that remains is starting the task and letting it run. I will add a new handle and reduce the priority of the blinky task to keep it from interrupting our transfer, since in blocking mode that might mess things up:

osThreadId defaultTaskHandle, blinkyTaskHandle, gyroTaskHandle;


void MX_FREERTOS_Init(void) {

 osThreadDef(defaultTask, StartDefaultTask, osPriorityLow, 0, 1000);
 defaultTaskHandle = osThreadCreate(osThread(defaultTask), NULL);

 osThreadDef(blinkyTask, vBlinkyTask, osPriorityLow, 0, 1000);
 blinkyTaskHandle = osThreadCreate(osThread(blinkyTask), NULL);

 osThreadDef(gyroTask, vGyroTesterTask, osPriorityHigh, 0, 1000);
 gyroTaskHandle = osThreadCreate(osThread(gyroTask), NULL);
}
And voila:

As expected, we get our notification that the system is up. The high-priority SPI task goes first, and then the system continues with low-priority blinking. The printout shows that the MCU sent 0x8F to the gyro and got back its chip identification byte 0xD4, just as expected.

I'm not particularly interested in implementing full driver for the gyro, that can be left as an exercise for the reader. There should be plenty of those already available online. 

As usual, sources on GitHub have been updated to include all of the above. Next up: Part V: SPI with DMA

STM32 + HAL + FreeRTOS Part II: UART

Previously we started a blinky project on the STM32F429-Discovery board with HAL and FreeRTOS. I will continue building on it with the Universal Asynchronous Receiver-Transmitter, or UART.

If you remember, during code generation I told you to leave USART1 in the list of peripherals to initialize in the generated code. This was done so that we wouldn't have to figure out how to initialize it and provide it with clocks manually.

So STM32CubeMX generated usart.c/.h files for us. They contain the USART1 initialization function, which defines the UART parameters, and two functions which HAL expects user code to implement, HAL_UART_MspInit() and HAL_UART_MspDeInit(), for setting up and tearing down the hardware resources required by the peripheral.

With this setup, UART should already be working, and you could try to transmit data by calling HAL_UART_Transmit():

 HAL_UART_Transmit(&huart1, &charbuffer, size, timeout)

But most of the time we don't want to deal with bytes and buffers; we want a convenient printf() function, which does all the formatting for us. What we need is called retargeting printf(), meaning that we tell it to use USART1 as the default console. To accomplish that, all we have to do is (re)implement the __io_putchar() and _write() functions:
#include "usart.h"

/* (Re)Define stdio functionality, so printf would output to USART1 */
int __io_putchar(int ch) {
 uint8_t c[1];
 c[0] = ch & 0x00FF;
 HAL_UART_Transmit(&huart1, &c[0], 1, 10);
 return ch;
}

/* Called by newlib's stdio with a chunk of output; send it byte by byte */
int _write(int file,char *ptr, int len) {
 int DataIdx;
 (void)file; // stdout and stderr both go to USART1
 for(DataIdx= 0; DataIdx< len; DataIdx++) {
  __io_putchar(*ptr++);
 }
 return len;
}

We'll call it printf_retarget.c. We'll also make a companion header file, where we'll add include guards and prototypes:

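#ifndef PRINTF_RETARGET_H
#define PRINTF_RETARGET_H

#include "usart.h"

int __io_putchar(int ch);
int _write(int file,char *ptr, int len);

#endif /* PRINTF_RETARGET_H */
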
For this bit of code to be used, we have to add printf_retarget.c to C_SOURCES in the Makefile (after usart.c), and in main.c we can replace '#include "usart.h"' with '#include "printf_retarget.h"', since usart.h now comes in as a nested include.
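Since the chain printf() -> _write() -> __io_putchar() -> HAL_UART_Transmit() can be confusing at first, here is a host-side model of it with the UART replaced by a buffer. The names are prefixed with model_/fake_ (hypothetical helpers) so they don't collide with the C library on a desktop machine:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the UART: collected output ends up in this buffer. */
static char uart_log[64];
static size_t uart_log_len;

static void fake_uart_transmit(uint8_t byte) { /* plays HAL_UART_Transmit() */
    if (uart_log_len < sizeof uart_log - 1) {
        uart_log[uart_log_len++] = (char)byte;
        uart_log[uart_log_len] = '\0';
    }
}

static int model_io_putchar(int ch) {
    fake_uart_transmit((uint8_t)(ch & 0xFF));
    return ch;
}

/* Same contract as newlib's _write(): consume len bytes, report how many. */
static int model_write(int file, char *ptr, int len) {
    (void)file; /* stdout and stderr both go to the "UART" */
    for (int i = 0; i < len; i++) {
        model_io_putchar(*ptr++);
    }
    return len;
}
```

On the target, newlib's printf() does the formatting and then hands the resulting characters to _write(), which is why reimplementing these two small functions is all the retargeting needs.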

And now we can use the printf() function to our heart's content. Let us add some output to the startup sequence in main.c:

  printf("System configured!\r\n");

  /* Call init function for freertos objects (in freertos.c) */
  MX_FREERTOS_Init();

and to our task as well, so it informs us that it is toggling the LED:
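void StartDefaultTask(void const * argument) {

  for(;;) {
   printf("Toggling LED\r\n"); // illustrative message text
   HAL_GPIO_TogglePin(GPIOG, GPIO_PIN_14);
   osDelay(1000); // illustrative delay; keeps the loop from hogging the CPU
  }
}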
Now, if we connect USART1 to our PC (I use an FTDI232 adapter for that; PA9 and PA10 are 5 V tolerant, so the serial converter does not necessarily have to be running in 3V3 mode) by connecting PA9 (USART1 TX) to the adapter's RX, PA10 (USART1 RX) to the adapter's TX and the grounds together, we can get our printouts:
It speaks!

The serial port settings are defined in usart.c:
  • 115200 baud
  • 8-bit word length, 1 stop bit
  • Parity: none
  • Hardware flow control: none
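These settings also tell us how long printf() blocks. With 8N1, every character is a 10-bit frame on the wire (start bit, 8 data bits, no parity, 1 stop bit), so the 10 ms timeout used in __io_putchar() is generous. The sketch below computes the per-character time and the upper bound on throughput; the helper names are hypothetical.

```c
#include <stdint.h>

/* Bits on the wire per character:
   1 start bit + data bits + optional parity bit + stop bits. */
unsigned frame_bits(unsigned data_bits, int parity, unsigned stop_bits) {
    return 1u + data_bits + (parity ? 1u : 0u) + stop_bits;
}

/* Time to shift out one character, in microseconds. */
double char_time_us(uint32_t baud, unsigned bits_per_frame) {
    return (double)bits_per_frame * 1e6 / (double)baud;
}

/* Upper bound on throughput in characters per second, assuming
   back-to-back frames with no idle time between them. */
double max_chars_per_s(uint32_t baud, unsigned bits_per_frame) {
    return (double)baud / (double)bits_per_frame;
}
```

At 115200 baud this gives about 86.8 µs per character, or at most 11520 characters per second. A 10 ms timeout therefore covers roughly 115 character times, and printing "System configured!\r\n" (20 characters) keeps the caller busy for about 1.7 ms.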
That would be it for now. As before, sources are available on GitHub, notes and comments are welcome.

Next up: Part III: SPI in blocking mode

STM32 + HAL + FreeRTOS Part I: Setup (blinky)

It's alive!

As far as is reasonable, I prefer to use free tools in my development work. That includes compilers, IDEs and even the OS. So the software setup will be geared towards Linux, but with minimal changes it should work under Windows/macOS as well.

The first thing we need is a compiler. I will use the GNU Arm Embedded Toolchain. Setup is pretty simple: unpack it and leave it be. I'll unpack it into the "tools" directory in my home folder (~/tools). At the time of writing, the latest version is 7-2017-q4-major, so the final location of the toolchain is "~/tools/gcc-arm-none-eabi-7-2017-q4-major". You can test it by calling arm-none-eabi-gcc with the --version parameter; it should print version info:
~/tools/gcc-arm-none-eabi-7-2017-q4-major/bin$ ./arm-none-eabi-gcc --version
arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 7-2017-q4-major) 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204]
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
Following ARM's recommendation, I will not add the binary folder to PATH. We need the path only once anyway, and from then on it lives in the Makefile.

Next we need the ST-LINK/V2 utility (stlink) for flashing the MCU via the ST-LINK programmer present on all Discovery boards (Nucleo boards have one as well). Binaries are available for Windows and some Linux distros, but users of other distros (Debian in my case) will have to compile their own from source. Info on how to get it up and running is in the ReadMe.md of the project on GitHub. Note that you'll also need to grant access permissions to the USB device itself via udev. Once you have everything installed, connect your Discovery board and run st-util, which starts a local GDB server (for the GNU Debugger to connect to) and prints device info:
~/tools/gcc-arm-none-eabi-7-2017-q4-major/bin$ st-util
st-util 1.4.0-36-g0af68c0
2018-05-16T12:26:49 INFO common.c: Loading device parameters....
2018-05-16T12:26:49 INFO common.c: Device connected is: F42x and F43x device, id 0x10036419
2018-05-16T12:26:49 INFO common.c: SRAM size: 0x40000 bytes (256 KiB), Flash: 0x200000 bytes (2048 KiB) in pages of 16384 bytes
2018-05-16T12:26:49 INFO gdb-server.c: Chip ID is 00000419, Core ID is  2ba01477.
2018-05-16T12:26:49 INFO gdb-server.c: Listening at *:4242...
Don't worry if any of the numbers or hashes differ; as long as it runs without errors, it should be fine. Hit CTRL+C to stop the program: for now we don't need it running, we only use the output to verify that all is dandy.

And last, but not least, we need some actual code to compile and flash. Setting up all the clocks and peripherals on an STM32 is a notoriously laborious endeavor, and we need drivers anyway, so why not use STM32CubeMX (what a mouthful!) to generate the boilerplate? You can get it from ST's website (requires registration). Since it's a Java program, you need a Java Runtime Environment (JRE), but at least it's cross-platform.

We will start by generating a project for STM32F429I-Discovery board.

Let us set up the project output and other settings first. In the "Project" menu select Preferences. In the Project tab of the preferences window, define the project location. For me it'll be "~/projects". Don't create a separate directory for the project, because STM32CubeMX creates one based on the project name. I'll name this project "HAL-freeRTOS-test1" and refer to its folder path as $project. "Toolchain/IDE" should be set to Makefile; the rest can be left as-is, and we will change those in code as needed.
In the "Code Generator" tab, tell the program to copy all used libraries into the project folder, generate peripheral initialization as a pair of .c/.h files per peripheral, keep user code when re-generating, and delete previously generated files that are no longer re-generated. Click OK so the settings get saved.

In the Pinout tab of the main window, disable everything except FREERTOS, CRC, DMA2D, RCC, SYS, TIM1, SPI5, USART1 and USB_OTG_HS. Set USART1 to Asynchronous mode without hardware flow control.

Switch to the Clock Configuration tab and let it fix the clocks for you (it offered to do so for me).

If all is well, you can generate code by pressing CTRL+SHIFT+G or selecting Project/Generate Code. This should create output code in the project folder you defined previously.

Next we will edit the Makefile, which is the recipe that tells make what to compile and how to build the executable.
The problem with the STM32CubeMX code generator is that it's somewhat buggy; at the very least, the Makefile is unusable in the state the generator outputs it. First, it's missing the compiler path definition. Under the # binaries section you'll have to define the root path of the gcc binary folder. If you installed the toolchain in ~/tools as I did, it should look like this:
# binaries
BINPATH = ~/tools/gcc-arm-none-eabi-7-2017-q4-major/bin
PREFIX = arm-none-eabi-
CC = $(BINPATH)/$(PREFIX)gcc
AS = $(BINPATH)/$(PREFIX)gcc -x assembler-with-cpp
CP = $(BINPATH)/$(PREFIX)objcopy
SZ = $(BINPATH)/$(PREFIX)size
HEX = $(CP) -O ihex
BIN = $(CP) -O binary -S
Note that I have changed only the BINPATH variable; the rest is shown just for reference.
Now you can try to build the code as-is, but it will most likely fail with build errors, because there are a couple of issues in the generated code.

The first issue is the "__weak" attribute, which the compiler does not like. It is often used to define a default implementation of an interface function required by the libraries. Most of these implementations are empty and are meant to be overridden by application code elsewhere. Anyhow, we have to fix it first. The correct way to write it for GCC is "__attribute__((weak))", so we have to fix it manually: in $project/Src/freertos.c, change the hook definitions so that they start like this:
__attribute__((weak)) void vApplicationIdleHook( void ) {
__attribute__((weak)) void vApplicationStackOverflowHook(xTaskHandle xTask, signed char *pcTaskName) {
__attribute__((weak)) void vApplicationMallocFailedHook(void) {
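To see what the weak attribute buys us, here is a minimal sketch in plain GCC C: the linker uses a weak definition only when no other object file provides a strong (non-weak) definition of the same symbol. The hook name app_idle_hook() is hypothetical and merely stands in for the FreeRTOS hooks above; weak symbols are a GCC/Clang extension, not standard C.

```c
/* Counts how often the default hook ran (for demonstration only). */
int default_hook_calls = 0;

/* Weak default: used only if no other translation unit defines a strong
   app_idle_hook(). A strong definition in another file, for example
       void app_idle_hook(void) { enter_low_power(); }
   (enter_low_power() being a hypothetical function) replaces this one at
   link time, without any #ifdefs. */
__attribute__((weak)) void app_idle_hook(void) {
    default_hook_calls++;
}

/* Library-side caller, playing the role of the FreeRTOS idle task. */
void run_idle_once(void) {
    app_idle_hook();
}
```

Built on its own, this file uses the weak default; adding a second file with a strong app_idle_hook() to C_SOURCES would switch the call over without touching this code.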

If you are lucky, it might compile now. Compilation is done by running make in the directory containing the Makefile:
$ cd ~/projects/HAL-freeRTOS-test1
$ make

It is quite likely that it still won't compile and will complain about main() or other functions being already defined in main.c or other files. This indicates duplicate source file entries in the Makefile. Open the Makefile in your favorite text editor and, under the C_SOURCES definition, remove all duplicate entries (at least those starting with the Src path). Do not comment an entry out, because a commented line ending in a backslash turns the following lines into a comment as well; just delete the whole line.
While you are there, you can also remove all lines in the SOURCES_DIR definition that point to files rather than directories.
Please note that all lines in the SOURCES_DIR and C_SOURCES definitions, except the last one, have to end with a backslash ("\"). A backslash means the definition continues on the next line; its absence means the definition ends there.

Now it should compile. You could flash it as well, but it doesn't do much yet. Let's add our blinker functionality to the mix.

Src/freertos.c defines a StartDefaultTask() function, which currently does nothing. We can add our code there: all we have to do is replace the contents of its infinite for loop with what we want it to do:
void StartDefaultTask(void const * argument) {
  for(;;) {
    HAL_GPIO_TogglePin(GPIOG, GPIO_PIN_14); /* toggle the red LED on PG14 */
    osDelay(500);                           /* suspend the task for 500 ms */
  }
}
We should also add '#include "gpio.h"' at the top of the file, to use GPIO functionality here.

Now we can build the program and flash it to our dev board, and we should see the red LED toggle every half a second:
$ cd ~/projects/HAL-freeRTOS-test1
$ make
$ st-flash write build/HAL-freeRTOS-test1.bin 0x8000000

Let us go through the program code as it flows:
  1. We start in main() in main.c. The generated code:
    1. initializes HAL
    2. sets up all the clocks
    3. initializes GPIO
    4. initializes CRC
    5. initializes TIM1
    6. initializes USB functionality
    7. initializes USART1
  2. main() then calls MX_FREERTOS_Init() from freertos.c. There it has just one simple task definition (StartDefaultTask()), which gets assigned to a new thread
  3. main() starts the task scheduler by calling osKernelStart(), which controls execution of all the tasks defined in MX_FREERTOS_Init() function
  4. StartDefaultTask() runs, toggles the LED and is then suspended for 500 ms, after which it runs again
  5. Step 4 repeats ad infinitum
Working sources, including library dependencies, are available on GitHub. I will keep updating the repository as we go along.

Next entry in series is:
STM32 + HAL + FreeRTOS Part II: UART

Edit: compiling and flashing on Windows (tested on 10)

On Windows you can use the MinGW shell, the Windows Subsystem for Linux, or just GnuWin32 to provide the make tool. You also need Git for Windows to check out the sources; that is optional if you start from scratch.
I went with the standalone make tool, since it integrates reasonably well with the OS and is relatively minimalistic. After installing it, you'll need to add its binary folder (e.g. C:\Program Files (x86)\GnuWin32\bin) to the PATH for convenience.

Please note that the ARM Embedded toolchain installer by default wants to install into a folder with a space in its name. GNU make deals poorly with spaces, so please choose paths that are reasonably short and do not contain any special characters, including spaces. E.g. "C:\tools\arm\7-2017-q4-major" is fine, while "C:\Program Files (x86)\ARM\7 2017-q4-major" is not. I wonder why the installer developers did that.
There is another issue with path separators (Windows "\" vs POSIX "/"), but Windows seems to handle both, even mixed in one path:
C:\tools\arm-gcc\7-2017-q4-major\bin/arm-none-eabi-size build/HAL-freeRTOS-test1.elf
   text    data     bss     dec     hex filename
  18484     120   35040   53644    d18c build/HAL-freeRTOS-test1.elf
C:\tools\arm-gcc\7-2017-q4-major\bin/arm-none-eabi-objcopy -O ihex build/HAL-freeRTOS-test1.elf build/HAL-freeRTOS-test1.hex
C:\tools\arm-gcc\7-2017-q4-major\bin/arm-none-eabi-objcopy -O binary -S build/HAL-freeRTOS-test1.elf build/HAL-freeRTOS-test1.bin  

For flashing you still need the ST-LINK utility, which you can download from the ST website (requires email/registration). It comes as an installer, so it should be pretty straightforward.
Once installed (both the tool and the driver), start the tool, connect your dev board and press the plug button to connect to it. Once connected, you should see the memory contents. Then open the compiled .bin file from the build directory and flash it (the Program verify option in ST-Link Utility).
I updated the Makefile to define the Windows GCC path separately from the Linux one, so you can have multiple setups as I do (a Windows machine for playing computer games, a Windows machine at work and a Linux laptop for off-site coding sprees):

# binaries
ifeq ($(OS),Windows_NT)
BINPATH = C:\tools\arm-gcc\7-2017-q4-major\bin
else
BINPATH = ~/tools/gcc-arm-none-eabi-7-2017-q4-major/bin
endif