STM32 Dual-boot and firmware updates

Quite a few STM32 microcontrollers support a dual-boot mechanism by splitting flash memory into 2 banks. Recently I had to choose how to implement Over The Air (OTA) updates. Or rather, updates over any medium, as long as the new firmware gets to the device. Some musings on the topic follow.

There are multiple strategies for firmware updates, each with its own pros and cons:
  • A single stable bootloader that selects which firmware to boot and jumps to its start address. This is a purely software solution, but it requires at least two binaries: the bootloader and the actual application. On the positive side, you can keep as many application versions as fit in the flash. The bootloader resides at the firmware entry point and handles updates.
  • A RAM bootloader that gets loaded into memory and runs from there. It handles obtaining the actual application firmware and overwriting it on the flash. This is popular for large firmware images, where only one fits in the flash at a time.
  • Physically separate flash memories, each with its own application, orchestrated by the bootloader on the MCU.
  • Loads of variations on the above.

ST has an interesting blend. Flash memory is divided into n sectors, which can (but don't have to) be mapped into 2 equal-sized banks, as if the chip had 2 flashes, each half the size of the total. There is also a system bootloader that lives in a separate system ROM section. It is often used to recover "bricked" devices. Based on the BOOT0/1 pins and some bits in the flash registers, this system bootloader decides how to map the flash banks and where to boot from. A lot more on this can be found in application note AN2606.

Apart from just selecting the bank to boot, you also have the option to switch banks on the fly, meaning that you don't even have to reset your device to run the new firmware. You just have to be very, very careful with all the remaps. Application note AN4767 deals with that.

Since I don't need 100% uptime, I decided that occasional resets are fine. So the idea is to run the firmware from flash bank 1. Once an update is initiated, we save the new firmware to bank 2 and tell the system bootloader that we want to boot into it on the next reset.

My trusty Discovery board has an STM32F429ZIT6 on it, which means it has 2 MB of flash memory divided into 24 sectors, numbered 0 to 23. The first 4 sectors are 16 KB each, the 5th is 64 KB, and then come 7 sectors of 128 KB each. That's bank 1, exactly half of the flash, from address 0x0800 0000 to 0x080F FFFF. Bank 2 has exactly the same layout, starting at address 0x0810 0000. Table 6 in RM0090 shows this layout.
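The layout above can be turned into a small lookup. This is only a sketch under the sizes quoted in this post; the names `sector_of_address`, `FLASH_BASE_ADDR` and `BANK_SIZE` are made up for illustration and are not HAL identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_BASE_ADDR   0x08000000u
#define BANK_SIZE         0x00100000u   /* 1 MB per bank */
#define SECTORS_PER_BANK  12

/* Sector sizes in KB inside one bank: 4 x 16, 1 x 64, 7 x 128. */
static const uint16_t bank_sector_kb[SECTORS_PER_BANK] = {
    16, 16, 16, 16, 64, 128, 128, 128, 128, 128, 128, 128
};

/* Returns the absolute sector number (0..23) that contains addr,
 * or -1 if addr lies outside the 2 MB of flash. */
int sector_of_address(uint32_t addr)
{
    if (addr < FLASH_BASE_ADDR || addr >= FLASH_BASE_ADDR + 2u * BANK_SIZE)
        return -1;

    uint32_t offset  = addr - FLASH_BASE_ADDR;
    int      bank    = (int)(offset / BANK_SIZE);   /* 0 or 1 */
    uint32_t in_bank = offset % BANK_SIZE;

    uint32_t sector_end = 0;
    for (int i = 0; i < SECTORS_PER_BANK; i++) {
        sector_end += (uint32_t)bank_sector_kb[i] * 1024u;
        if (in_bank < sector_end)
            return bank * SECTORS_PER_BANK + i;
    }

    /* Not reached: the sector sizes add up to exactly BANK_SIZE. */
    assert(!"sector sizes must sum to BANK_SIZE");
    return -1;
}
```

For example, 0x0801 0000 falls into sector 4 (the 64 KB one) and 0x0810 0000 into sector 12, the first sector of bank 2.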

The FLASH_OPTCR control register has bit 4, "Dual-bank Boot option byte" or BFB2. From the reference manual and appnotes it is not exactly clear how this magic works, so I took my time to figure it out. Setting the BFB2 bit forces booting into system memory, where the system bootloader checks whether there is valid data (a reset vector) at the start of Bank2. If there is, it proceeds with booting from Bank2. If not, it checks Bank1 for the same. The option is stored in flash, so it is non-volatile.
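The decision described above can be modelled in a few lines of host-side C. Treat it as a sketch of my understanding, not as the actual ROM code. In particular, the validity test (first word of the bank, the initial stack pointer, must point into SRAM) and the SRAM window are assumptions, and the function names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define OPTCR_BFB2  (1u << 4)       /* bit 4 of FLASH_OPTCR, as described above */

/* Assumed window for a plausible initial stack pointer
 * (192 KB of main SRAM starting at 0x2000 0000 on the STM32F429). */
#define SRAM_START  0x20000000u
#define SRAM_END    0x20030000u     /* one past the last byte */

/* Assumption: a bank "looks valid" if its first word points into SRAM.
 * Erased flash reads 0xFFFFFFFF and therefore fails this test. */
static bool looks_bootable(uint32_t first_word)
{
    return first_word > SRAM_START && first_word <= SRAM_END;
}

/* Returns the bank to boot (1 or 2), or 0 if neither bank looks valid.
 * What the real bootloader does in the 0 case is not covered here. */
int select_boot_bank(uint32_t optcr, uint32_t bank1_first_word,
                     uint32_t bank2_first_word)
{
    if (!(optcr & OPTCR_BFB2))
        return 1;                   /* normal boot, no check involved */
    if (looks_bootable(bank2_first_word))
        return 2;
    if (looks_bootable(bank1_first_word))
        return 1;
    return 0;
}

/* Flips BFB2 in a copy of the register value. */
uint32_t toggle_bfb2(uint32_t optcr)
{
    return optcr ^ OPTCR_BFB2;
}
```

With BFB2 set and bank 2 erased, `select_boot_bank()` falls back to bank 1, which matches the behaviour described above.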

Aliasing vs remap

If booting happens from address 0x0000 0000, how does execution get to the flash start address 0x0800 0000? Well, the actual flash address gets aliased to the beginning of the address space. Apparently, this is done by the hardware. If the BFB2 bit is set, address 0x0810 0000 gets aliased.

Remap is the feature that defines which flash bank is reachable at the addresses 0x0800 0000 and 0x0810 0000. By default the first is Bank1 and the second is Bank2. If remap is enabled, it's the other way around: 0x0800 0000 is the Bank2 address and 0x0810 0000 is Bank1.
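To make the remap rule concrete, here is a tiny host-side model. It only encodes the mapping as described in the paragraph above; `physical_bank` and the constants are illustrative names, not part of any ST header.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOWER_WINDOW  0x08000000u   /* first 1 MB window  */
#define UPPER_WINDOW  0x08100000u   /* second 1 MB window */
#define WINDOW_SIZE   0x00100000u

/* Returns the physical bank (1 or 2) that answers at addr,
 * or 0 if addr is outside the two flash windows. */
int physical_bank(uint32_t addr, bool remap_enabled)
{
    if (addr < LOWER_WINDOW || addr >= UPPER_WINDOW + WINDOW_SIZE)
        return 0;

    bool upper = (addr >= UPPER_WINDOW);
    if (remap_enabled)
        upper = !upper;             /* the two windows trade places */

    return upper ? 2 : 1;
}
```

So `physical_bank(0x08000000u, true)` gives 2 and `physical_bank(0x08100000u, true)` gives 1.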

Write protection

Each bank and its option bytes are write-protected. To reprogram them, you need to unlock them by writing the correct keys to the key registers (FLASH_KEYR for the flash, FLASH_OPTKEYR for the option bytes). This is done to prevent accidental overwrites and bricking.

OTA procedure

OK, now my imagined procedure. We start in the normal operation mode, where the device does what it is intended to do. Then it receives a request for a firmware update:
  1. Disable all running processes and unnecessary interrupts
  2. Free enough RAM for the firmware
  3. Get the firmware and put it in RAM
  4. Check the firmware image in RAM against the CRC provided with it
  5. Calculate how many sectors the image takes
  6. Unlock flash
  7. Erase the sectors for the firmware
  8. Write the firmware to flash
  9. Verify the flash contents against the CRC
  10. Lock flash
  11. Unlock option bytes
  12. Toggle BFB2
  13. Lock option bytes
  14. Reboot
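Steps 4 and 5 are plain arithmetic and can be tried on a PC. The sketch below assumes the image carries a standard CRC-32 (the zlib/Ethernet variant). The STM32 hardware CRC unit computes a different variant by default, so pick whichever matches your image tooling. All function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, reflected polynomial 0xEDB88320 (assumed variant). */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Step 4: compare the image in RAM with the CRC delivered alongside it. */
bool image_matches_crc(const uint8_t *image, size_t len, uint32_t expected)
{
    return crc32_ieee(image, len) == expected;
}

/* Sector sizes of one bank in bytes: 4 x 16 KB, 1 x 64 KB, 7 x 128 KB. */
static const uint32_t bank_sector_bytes[12] = {
    16u * 1024u, 16u * 1024u, 16u * 1024u, 16u * 1024u, 64u * 1024u,
    128u * 1024u, 128u * 1024u, 128u * 1024u, 128u * 1024u,
    128u * 1024u, 128u * 1024u, 128u * 1024u
};

/* Step 5: number of sectors, counted from the start of a bank, needed
 * to hold image_size bytes. Returns -1 if the image exceeds one bank. */
int sectors_needed(uint32_t image_size)
{
    uint32_t covered = 0;
    for (int i = 0; i < 12; i++) {
        if (covered >= image_size)
            return i;
        covered += bank_sector_bytes[i];
    }
    return (covered >= image_size) ? 12 : -1;
}
```

A 100 KB image, for instance, needs 5 sectors: the four 16 KB ones plus the 64 KB one cover 128 KB.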
Seems simple and straightforward enough. I implemented it and it seems to work... for a single update. To be precise: updating the firmware in Bank2 works fine, but updating the firmware in Bank1 from Bank2 does not. Hmm...

Gotcha!

When booted from Bank2 (BFB2 set), Bank2 is aliased to the start of the address space, so we boot from it. The HAL_FLASHEx_Erase() function takes absolute sector numbers, meaning that the first sector of Bank1 is FLASH_SECTOR_0, but the first sector of Bank2 is FLASH_SECTOR_12.

What is not clearly stated anywhere is the fact that the banks are remapped when booting with BFB2 set. This means that whichever bank you boot from, it is always at address 0x0800 0000 and the other one is at 0x0810 0000. So, when performing an update, writing should always be done to 0x0810 0000.

Now that I think about it, it kind of makes sense: the code is always compiled against the Bank1 address space, and it has to be runnable from either bank. It seems obvious once you figure it out, but it might not be for first-timers.

Now I should figure out a few safeguards:
  • Firmware stability tracking: we don't want frequent crashes of the new flashy firmware. The system should gracefully revert to the older but more stable version. This probably needs a watchdog with some sort of uptime or crash-count tracking.
  • Ensure that we don't attempt to boot into a broken firmware slot (vector table present, but no actual firmware).
  • It would be nice to have one of the first small sectors for non-volatile data storage. It seems a bit of a waste to store a couple of bytes in a 128 KB sector, but that's OK for now.

1 comment:

  1. Hi! Thank you so much for the writeup.
    I was thinking of doing something similar, and it is good to hear I am not the only one thinking about this sort of task.

    Regarding the safety/recoverability, one option might be to reserve the initial flash sectors for a permanent bootloader. OTA then leaves those sectors intact.

    The ESP IDF provides a recovery mechanism. The first boot right after an OTA update is allowed, but a SECOND boot requires an "it's safe" flag to be set. If the system resets during the first OTA boot before this flag is set, the bootloader deems the image to be unbootable, and reverts to the previous image.

    In this case, both 16kB start sectors could feature a bootloader that checks this flag (which could be part of the later image), and when an unbootable or otherwise unstable image is loaded which cannot perform self-tests and set the "OK" flag, the start loader can map back to the previous partition.
    Firmware would then permanently be written to the later sectors.

    This might also provide a nice recovery mechanism. 16kB is juuust about enough space to squeeze in a small custom comms protocol. After an unexpected reset the bootloader could wait for a command over UART, I2C, whatever, and give us a way to reset it back...
