Demystifying Arm Cortex-M33 Bare Metal: Startup

October 13, 2023 (updated on July 24, 2024)

Introduction

I had a problem when developing code for an STM32 development board, and I realized there are things I did not fully understand regarding how the processor starts (or boots) and how a program is built (compiled, assembled and linked), and the relation between these two. I decided to dig deeper, and this and subsequent posts are the result of this study.

I initially started writing a single post covering all aspects of startup and build, as they are closely related. However, such a post becomes very long, so I have decided to split the topic into a few posts. This is the first one, about the startup. As I publish new posts in this series, I will list them here.

The second post is published as Demystifying Arm Cortex-M33 Bare Metal: Compile, Assembly and Link.

I assume you have some knowledge of Arm Cortex-M and/or the Armv8-M architecture and/or developing embedded software in general. You definitely do not need to be an expert; the information in this and subsequent posts is basic, but might not be simple. I am assuming the actual application is going to be written in C, not in C++, so I disregard any requirements of a C++ runtime.

I have explicitly named this post for Arm Cortex-M33, and I am using an STM32H563 microcontroller. However, only a few things are specific to the actual processor; most of the information should be valid for any Arm Cortex-M33 processor and useful for any Arm Cortex-M processor.

I am using the Arm GNU Toolchain, either with STM32CubeIDE or directly on the Linux command line. Some information in this post is only accurate for this toolchain. I did not check the Arm Compiler for Embedded toolchain or others.

Questions

When I create an STM32 project in STM32CubeIDE with the project type set to empty (not STM32Cube), a startup file startup_stm32h563zitx.s and a linker script STM32H563ZITX_FLASH.ld are copied to the project. These come from STM32CubeH5, which is ST’s CMSIS Device MCU component for the H5 series.

These are the questions for which I am looking for a complete and accurate answer:

1. What does the MCU do when it is powered on (cold reset) or after a (warm) reset?
2. What does the startup code do?
3. Is the startup code in CMSIS Core(M) the same as the one in STM32CubeH5? If not, what are the differences?
4. How is the program built (compiled, assembled and linked), and how is the build process related to the startup code?
5. Is the linker script in CMSIS Core(M) the same as the one in STM32CubeH5? If not, what are the differences?
6. What changes when TrustZone is enabled?

I will follow the same order when trying to give the answers. This post is about the questions (1), (2) and (3). I will probably write two more posts covering (4) and (5), and finally (6).

Reset

A reset can be a cold reset (power cycle) or a warm reset (no power cycle), but the difference between them is not important for this post.

The reset procedure is documented as pseudocode in the Armv8-M Architecture Reference Manual. I did not include everything below, but in short, during reset:

system registers are set to their default/reset values
LR is set to 0xFFFFFFFF
all exceptions are inactivated
stack limit registers (MSPLIM and PSPLIM) are cleared
first element of the vector table (stack pointer) is loaded as sp
second element of the vector table (Reset_Handler) is loaded as start
HardFault exception is activated
main stack pointer (MSP) is set to sp (first element of the vector table)
start (Reset_Handler) is called
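
The steps above can be sketched as a small host-side model in C. This is only an illustration under simplifying assumptions: the struct cpu_model and the function model_reset are hypothetical names, only a handful of registers are modelled, and the real pseudocode in the manual does considerably more.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified model of the core state touched by reset. */
struct cpu_model {
    uint32_t msp;     /* main stack pointer */
    uint32_t msplim;  /* main stack pointer limit */
    uint32_t psplim;  /* process stack pointer limit */
    uint32_t lr;      /* link register */
    uint32_t pc;      /* program counter */
};

/* Model of the listed reset steps; vtor points at the vector table. */
static void model_reset(struct cpu_model *cpu, const uint32_t *vtor)
{
    cpu->lr     = 0xFFFFFFFFu;   /* LR is set to 0xFFFFFFFF */
    cpu->msplim = 0u;            /* stack limit registers are cleared */
    cpu->psplim = 0u;

    uint32_t sp    = vtor[0];    /* first entry: initial stack pointer */
    uint32_t start = vtor[1];    /* second entry: address of Reset_Handler */

    cpu->msp = sp;               /* MSP is set to the first entry */
    cpu->pc  = start & ~1u;      /* bit 0 of the entry only marks Thumb state */
}
```

Nothing here runs on the target; it is only meant to make the order of the steps and the role of the first two vector table entries concrete.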

This raises an extra question: how does the processor know the location of the vector table ?

The Location of the Vector Table

The location of the vector table is in VTOR (Vector Table Offset Register). Under VTOR section (D1.2.265), the Armv8-M Architecture Reference Manual says: “This field resets to an IMPLEMENTATION DEFINED value on a Warm reset.”. So its value depends on the silicon vendor (such as ST or NXP).

The default/reset value should either be a fixed value (such as zero) or should be “told” to the processor just after reset before reset procedure described above starts. On Cortex-M33, its initial value is provided to the processor externally through INITVTOR signals.

On STM32H563, there is an SBS block that provides some signals to the processor such as:

sbs_init_vbr_s
sbs_init_vbr_ns
sbs_tz_state

The first two provide the entry point (s is for secure, ns is for non-secure), and the last one provides the TrustZone state (enabled/disabled).

SBS derives these signals from the option bytes (SECBOOTADD, NSBOOTADD, TZEN) which are stored in flash, thus in a non-volatile area, and from the BOOT0 pin of the processor.

For my particular example, on NUCLEO-H563ZI development board:

BOOT0 is set to low, thus it is not modifying the default behaviour, NSBOOTADD is used.
(I guess) NSBOOTADD option byte is read after reset by SBS, and this is provided as sbs_init_vbr_ns, thus providing INITVTOR to the processor.
The processor initializes VTOR to INITVTOR before or during reset but naturally before reading the vector table.

On STM32H563, default value of NSBOOTADD is 0x08000000. This is the start of flash memory.

Startup Code

CMSIS Core(M) defines the file structure of device templates and uses the name startup_<device>.s or startup_<device>.c for the startup file that performs the CMSIS device startup and contains interrupt vectors (vector table). Traditionally, an assembly file (.s) is used, however CMSIS deprecated the assembly startup files and also provides C files. Both assembly and C startup files have the same basic functionality. At minimum, the startup file should provide a vector table and the implementation for its reset handler.

CMSIS provides a reference implementation startup_ARMCM33.S and startup_ARMCM33.c.

STM32CubeH5 contains STM32CubeH5 CMSIS Device MCU Component which also provides specific startup files for STM32 H5 MCUs, such as startup_stm32h563xx.s. This file is, I think, copied as startup_stm32h563zitx.s to my project (there is no other h563xx related startup file in STM32CubeH5).

Each startup code uses a few externally defined variables such as _sdata in STM32CubeH5, __copy_table_start__ in CMSIS. These are defined by the linker, and I will clarify this in the next post. For this post, please disregard how these are defined.

STM32CubeH5 Startup Code

I will go line by line (without the comments, and reformatted for display purposes) of the file startup_stm32h563zitx.s:

.syntax unified

it uses the unified syntax (not divided). Before the Thumb-2 instruction set, the assembly syntax of Thumb instructions was different from that of Arm instructions. With Thumb-2, this changed and now they have a unified syntax. I think divided is the default, so this has to be specified.

.cpu cortex-m33
.fpu softvfp
.thumb

selects the target processor, floating point option and the instruction set to be generated. There is no special directive for Thumb-2 instruction set, but the actual instruction set to be generated is decided based on the target processor architecture. Since Cortex-M33 supports Thumb-2, it will be Thumb-2.

Floating point option specifies both the linkage (or ABI) and if hardware FPU is used. softvfp means both software linkage (soft-float ABI, FP registers are not used to pass floating-point arguments) and software floating-point library is used (FPU is not used). If you are using floating-point hardware, it makes sense to use the right option here, but it should be the same option in all source files (C etc.), because the linkage has to be compatible. For Arm Compiler Toolchain some information can be found here: Compiler options for floating-point linkage and computations. For GNU Toolchain, this blog post might be useful: https://embeddedartistry.com/blog/2017/10/11/demystifying-arm-floating-point-compiler-options/.

.global g_pfnVectors
.global Default_Handler

makes these global symbols, thus visible to ld. I am not sure exactly why, since ld does not refer to them.

.word _sidata
.word _sdata
.word _edata
.word _sbss
.word _ebss

allocates a word for each of these symbols. These symbols are defined in the linker script which I will explain next. It allocates space because the values will be loaded in the startup code, so they have to be stored somewhere.

.section .text.Reset_Handler

the following code will be in the .text.Reset_Handler section.

If you are reading this post, you probably know this already, but just in case, here is a quick summary. The object files and the final output (all in ELF format) have sections. Traditionally, the code section (read-only and executable) is called .text, the initialized data section is called .data, the uninitialized data section is called .bss, and the read-only data section is called .rodata. These are defined in the ELF format as special sections. Putting Reset_Handler under .text.Reset_Handler does not by itself mean it is under .text; that is done later, at link time. It is not uncommon to put each function into a separate section. The GNU C compiler gcc has the -ffunction-sections and -fdata-sections options that do the same by putting functions or data into separate sections like .text.function1 or .data.mydata1.

Normally you have to define the flags and type of a section, but because the name starts with .text, it inherits the default flags and type of .text, which include the executable flag and the type PROGBITS.

.weak Reset_Handler

sets the weak attribute for Reset_Handler symbol (and here also creates it because it does not exist yet). The symbol Reset_Handler is used by the linker, so the symbol has to be visible to outside. This can be done either by .global or by .weak directive. I am not sure why Reset_Handler is marked weak here, I do not think it is to support override since this is a very core function. I would use .global here.

If you have not heard of weak concept before, it means a normal or strong symbol can override it. Also, it is not an error to have an undefined weak symbol. This is often used for optional functions or default implementations. .weak directive makes a weak symbol or makes an existing symbol weak.
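
The "undefined weak symbol is not an error" part can be tried on a hosted GCC or Clang build as well. The following is a minimal sketch, assuming an ELF target; optional_hook and run_optional_hook are made-up names, not part of any startup file.

```c
#include <stddef.h>

/* Weak declaration only: no object file in this program defines the symbol. */
extern void optional_hook(void) __attribute__((weak));

/* Calls the hook if some object file provided it.
 * Returns 1 if the hook was called, 0 if the symbol was left undefined. */
static int run_optional_hook(void)
{
    if (optional_hook != NULL) {   /* an undefined weak symbol resolves to address 0 */
        optional_hook();
        return 1;
    }
    return 0;
}
```

The program still links even though optional_hook is never defined. If another object file later defines a normal (strong) optional_hook, the linker uses that one and the call is made.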

.type Reset_Handler, %function

sets the type of symbol Reset_Handler as function (which means it is a code).

Reset_Handler:

the actual function starts here. Reset_Handler: is called a label which sets the value of Reset_Handler symbol to the value of current location counter (location counter=PC). Thus, it marks the beginning (address) of the function.

  ldr r0, =_estack
  mov sp, r0

effectively sets sp (stack pointer) to _estack. _estack is also the first element of the vector table, since the Armv8-M architecture defines that entry as the initial value of the stack pointer. An important point here is that although Reset_Handler is an exception handler, it runs in thread mode like application code, because the PE (processing element, i.e. the processor) is in thread mode after reset. It is as if an unconditional branch (jump) is taken to Reset_Handler, which contains application code rather than an exception handler.

In thread mode, sp can be either msp (main stack pointer) or psp (process stack pointer), and this is controlled by CONTROL.SPSEL bit, which is 0 after reset, indicating sp is msp in thread mode. In handler mode, always msp is used. Thus, here in this code, msp is set to _estack, the initial value of the stack pointer.
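
The selection rule described above fits in a few lines of C. This is only a sketch of the rule as stated in the text; the function name active_sp and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the stack pointer value that "sp" refers to.
 * handler_mode: true in handler mode, false in thread mode
 * spsel:        value of the CONTROL.SPSEL bit */
static uint32_t active_sp(bool handler_mode, bool spsel, uint32_t msp, uint32_t psp)
{
    if (handler_mode) {
        return msp;              /* handler mode always uses MSP */
    }
    return spsel ? psp : msp;    /* thread mode follows CONTROL.SPSEL */
}
```

Right after reset the processor is in thread mode with SPSEL equal to 0, so the function returns msp, which is the case relevant for Reset_Handler.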

I do not understand why there is a need to set sp explicitly here, because the Armv8-M reset sequence dictates that the first entry of the vector table (_estack here) is copied to msp even before jumping to the second entry (Reset_Handler). I have checked this with the debugger and sp is, not surprisingly, equal to _estack when Reset_Handler is called (before the ldr and mov instructions above are executed). To me it looks like these two instructions are unnecessary.

  bl SystemInit

branches to/calls the SystemInit function. This is normally defined in C, for example in system_stm32h5xx.c in STM32CubeH5, and copied to the project. In a project with the empty project type, this file is not copied, so you have to define one as void SystemInit(void). This is a very early call; data and bss are not initialized yet, so it should not rely on them (i.e. do not use global variables). Typically, this function only accesses the system registers and does very early initialization like enabling the FPU, configuring power and clocks etc. The other initializations (for example of peripherals) can be done later. I think the main reason for having such a C function is that these things are easier to write in C using the register definitions and structures; otherwise it could also be done in assembly.

  ldr r0, =_sdata
  ldr r1, =_edata
  ldr r2, =_sidata
  movs r3, #0
  b LoopCopyDataInit

CopyDataInit:
  ldr r4, [r2, r3]
  str r4, [r0, r3]
  adds r3, r3, #4

LoopCopyDataInit:
  adds r4, r0, r3
  cmp r4, r1
  bcc CopyDataInit

using _sdata, _edata and _sidata, copies data segment (initializers) from flash to RAM. When the microcontroller is programmed, the binary program is stored in the flash memory since it is the non-volatile storage, and naturally flash memory is writable, however:

it has a finite lifetime for writes (typically around 10000 cycles), so it should not be used as a scratchpad where writes happen often (such as for program variables)
Cortex-M33 has a Harvard architecture, meaning the code and data fetches can happen on different buses simultaneously, and typically the flash is the code memory whereas RAM is the data memory.

Not surprisingly, initialized program data (initialized static variables) is copied to RAM before program actually starts, and the code above just does that.
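
For readers more comfortable with C than with assembly, the copy loop can be sketched as follows. This is a host-side equivalent and not the code that ships with STM32CubeH5; the function name copy_data is made up, and its parameters stand in for the linker symbols _sdata, _edata and _sidata.

```c
#include <stdint.h>

/* C sketch of the CopyDataInit loop.
 * sdata .. edata is the destination range in RAM, sidata is the source in flash. */
static void copy_data(uint32_t *sdata, const uint32_t *edata, const uint32_t *sidata)
{
    uint32_t *dst = sdata;
    const uint32_t *src = sidata;

    while (dst < edata) {    /* same unsigned "lower than" test as cmp + bcc */
        *dst++ = *src++;     /* one 32-bit word per iteration, like ldr/str + adds #4 */
    }
}
```

If _sdata equals _edata (no initialized data at all), the loop body never runs, which is also why the assembly version branches to the comparison first.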

  ldr r2, =_sbss
  ldr r4, =_ebss
  movs r3, #0
  b LoopFillZerobss

FillZerobss:
  str  r3, [r2]
  adds r2, r2, #4

LoopFillZerobss:
  cmp r2, r4
  bcc FillZerobss

using _sbss and _ebss, this code zeroes an area in RAM as big as the bss segment. Similar to initialized program data, uninitialized program data (uninitialized static variables) also lives in RAM. However, because there is no initializer, nothing is copied; the memory is just zeroed. Be aware that this is only for static variables; uninitialized local variables live on the stack, thus their initial value might not be zero. Because there are no initializers to store, the bss section consumes no space in flash memory.
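
The zeroing loop has an equally short C counterpart. Again this is only a sketch; zero_bss is a made-up name and its parameters stand in for the linker symbols _sbss and _ebss.

```c
#include <stdint.h>

/* C sketch of the FillZerobss loop: clears every word in [sbss, ebss). */
static void zero_bss(uint32_t *sbss, const uint32_t *ebss)
{
    for (uint32_t *p = sbss; p < ebss; ++p) {
        *p = 0u;
    }
}
```

This is what gives a C static variable without an initializer its guaranteed initial value of zero.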

  bl __libc_init_array

calls/branches to the __libc_init_array C function. This function is defined in the C runtime library that comes with the toolchain (it is not related to CMSIS). Since newlib-nano is used for this build, looking at the newlib source code, the function can be found in libc/misc/init.c:

void
__libc_init_array (void)
{
  size_t count;
  size_t i;

  count = __preinit_array_end - __preinit_array_start;
  for (i = 0; i < count; i++)
    __preinit_array_start[i] ();

  _init ();

  count = __init_array_end - __init_array_start;
  for (i = 0; i < count; i++)
    __init_array_start[i] ();
}

The mentioned arrays __preinit_array and __init_array contain pointers to functions. Thus, __libc_init_array calls the functions in each array, and there is also a call to a function named _init in between.

The first thing that comes to mind is that these functions can be, for example, static initializers and constructors that have to be run before the program starts. However, in a C application there are no such things, so I am not sure if this is needed at all. If you have a definite answer, please let me know.
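
One case where these arrays can be populated even in plain C is the constructor function attribute of GCC (and Clang). A function marked this way gets a pointer placed in the init array and is called before main. The sketch below can be tried on a hosted system too; early_setup and early_setup_ran are made-up names.

```c
/* Set by the constructor below; still 0 if the init array is never processed. */
static int early_setup_ran = 0;

/* GCC extension: run this function before main, via the init array. */
__attribute__((constructor))
static void early_setup(void)
{
    early_setup_ran = 1;
}
```

On a hosted system early_setup_ran is already 1 when main starts. On bare metal the same would only hold if the startup code calls __libc_init_array (or an equivalent) before main, so a pure C application that never uses this attribute, and links no library code that does, may not strictly need the call.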

_init (and also _fini) comes with libgcc in the toolchain. It is actually not a standalone function but a wrapper: it consists of a top part and a bottom part that sandwich anything placed between them. The top and the bottom parts are defined in crti.S and crtn.S respectively. If we look at the disassembly in the debugger, the _init function looks like this:

_init:
push    {r3, r4, r5, r6, r7, lr}
nop     
pop     {r3, r4, r5, r6, r7}
pop     {r3}
mov     lr, r3
bx      lr

crti.S contains only the first push instruction, which is the top part, and crtn.S contains the bottom part: the third instruction (the first pop) and everything after it, including the final bx lr. The nop instruction is sandwiched between these. It is a nop because no _init function is given to gcc. gcc allows one to define an _init (and _fini) function to be called before (and after) the application runs. Because this is an embedded application, it does not terminate, so _fini does not make sense, but _init may. The _init above is just a wrapper for a user-supplied _init function: it saves the registers, restores them and returns.

  bl main

finally, calls/branches main C function (in the application code).

LoopForever:
  b LoopForever

normally, the main function should not return, that is why there is always a while (1) loop there. However, for some reason, if main returns, this infinite loop runs and does nothing.

Maybe funny to discuss this but you may ask what happens if the code terminates totally (e.g. there is no LoopForever and main terminates). There is actually no concept of program termination because there is no operating system. The processor cannot pause/sleep unless it is explicitly requested, so it continues executing and if main terminates and there is no LoopForever, I think three things can happen when it tries to execute the next instruction (in next uninitialized memory location):

if the memory location is not in code memory (not in an executable memory region), then a fault (MemManage fault or HardFault) happens
if it is a valid instruction, then it executes that and continues
if it is not a valid instruction, then a fault (UsageFault or HardFault) happens

In order to prevent this ambiguous behavior, having a LoopForever is probably a must.

.size Reset_Handler, .-Reset_Handler

sets the size of Reset_Handler. The dot . means the current location counter. Subtracting the beginning (address) of the function from the current location (address) gives the size.

.section .text.Default_Handler,"ax",%progbits
Default_Handler:
Infinite_Loop:
  b Infinite_Loop
  .size Default_Handler, .-Default_Handler

defines a section .text.Default_Handler and a symbol Default_Handler containing an infinite loop, and sets its size. Default_Handler is used when no specific exception handler is defined for an exception or interrupt (soon you will see how). The section is defined with the ax flags, a for allocatable and x for executable, and with the type %progbits. %progbits marks a section that contains data; this does not mean it holds only data (it is executable after all), but that the section has actual content in the file, i.e. it is not empty. Because the name starts with .text, the default flags are already ax and the type is already %progbits, so I do not think these are necessary.

.section .isr_vector,"a",%progbits
.type g_pfnVectors, %object
.size g_pfnVectors, .-g_pfnVectors

defines a section .isr_vector as allocatable (a) and containing data (%progbits), sets its type to object and sets its size. Data type object actually means it is a data object (it does not contain code).

There is a mistake here. The position of the .size directive is not correct: the vector table entries follow it, so .-g_pfnVectors is always 0, which can be checked with objdump. In order to correct this, the .size directive should be moved after the vector table definitions. I have submitted a patch to STM32CubeH5 to correct this.

Symbol table entry when .size is before the vector table definitions:

08000000 g     O .isr_vector	00000000 g_pfnVectors

Symbol table entry after moving .size to end of the vector table definitions:

08000000 g     O .isr_vector	0000024c g_pfnVectors

g_pfnVectors:
  .word	_estack
  .word	Reset_Handler
  .word	NMI_Handler
  .word	HardFault_Handler
  .word	MemManage_Handler
  .word	BusFault_Handler
  .word	UsageFault_Handler
  .word	SecureFault_Handler
  .word	0
  .word	0
  .word	0
  .word	SVC_Handler
  .word	DebugMon_Handler
  .word	0
  .word	PendSV_Handler
  .word	SysTick_Handler
  .word	WWDG_IRQHandler
...
  .word	LPTIM6_IRQHandler

the listing above is truncated between the … marks because there are a lot of entries (147 entries) since there are a lot of external interrupts. The first entry here is the initial value of the stack pointer. All other entries starting with the second one are either an exception or an interrupt handler but some may be unused or reserved, which are set to 0 above.

Cortex-M33 supports up to 480 external interrupts through NVIC plus 16 internal lines. Each one of these has an exception number starting from 1 (not 0, which is logical since the first entry above, entry number 0, is not an exception but the initial stack pointer). The STM32H563 NVIC has 131 external interrupts. These interrupts start just after SysTick, whose exception number is 15 (so it is the last of the internal ones). The WWDG_IRQHandler above is external interrupt 0, and from there up to and including LPTIM6_IRQHandler there are 131 entries. Not all entries are populated; some are 0 since they are unavailable in this processor. For example, maskable interrupt 36 is for the SAES (Secure AES) co-processor, but STM32H563 does not have it, so this interrupt is not used.

The first 15 exceptions, starting with Reset_Handler and ending with SysTick_Handler, do not pass through NVIC; they have individual lines. Each exception (including interrupts) has an exception number as follows:

1-15: Reset to SysTick
16-495: IRQ0 to IRQ479 (NVIC interrupts)

Exception number 0 does not exist and it might be used to indicate thread mode (not in an exception).
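
The numbering translates directly into table offsets. The helper functions below are a small sketch of that arithmetic; their names are made up, and a vector size of 4 bytes per entry is assumed, as in the listings above.

```c
#include <stdint.h>

/* Entry 0 (initial stack pointer) plus exception numbers 1..15. */
#define SYSTEM_SLOTS 16u

/* Byte offset of the vector table entry for a given exception number. */
static uint32_t vector_offset(uint32_t exception_number)
{
    return 4u * exception_number;
}

/* Exception number of external interrupt IRQn (IRQ0 is exception 16). */
static uint32_t irq_to_exception(uint32_t irqn)
{
    return SYSTEM_SLOTS + irqn;
}

/* Size in bytes of a vector table covering num_irqs external interrupts. */
static uint32_t vector_table_size(uint32_t num_irqs)
{
    return 4u * (SYSTEM_SLOTS + num_irqs);
}
```

For the 131 external interrupts of STM32H563, vector_table_size(131) is 147 * 4 = 588 bytes, i.e. 0x24c, which agrees with the symbol size objdump reports once the .size directive is placed after the table.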

.weak	    NMI_Handler
.thumb_set  NMI_Handler,Default_Handler

.weak	    HardFault_Handler
.thumb_set  HardFault_Handler,Default_Handler
...
.weak	    LPTIM6_IRQHandler
.thumb_set  LPTIM6_IRQHandler,Default_Handler

this listing is truncated (… marks) in between since it repeats the same operation for each exception handler symbol (137 symbols). The first line of each entry marks the handler as a weak symbol with .weak directive. The second line with .thumb_set directive does two things:

creates a symbol as an alias to another
marks the created symbol as a thumb function

as expected, it is not meaningful to define all these handlers in the code. When a handler is not defined, Default_Handler is called. Thus, the linker, when an exception or interrupt handler symbol is not defined, replaces it with Default_Handler symbol.

This is the end of the file. To sum up, the startup code:

defines Reset_Handler function, which:
- unnecessarily sets the sp at the beginning
- calls SystemInit
- copies the data segment (the data of initialized static variables) to RAM
- zeroes the bss segment (location of uninitialized static variables) in RAM
- calls __libc_init_array
- calls main
defines Default_Handler function
defines the vector table but sets its size wrong
defines weak symbols for each exception handler and sets its default as Default_Handler

CMSIS Startup Code

CMSIS Core(M) also has a startup file named startup_ARMCM33.S. As I mentioned before CMSIS has deprecated the assembly startup files, and it also provides a startup_ARMCM33.c. I will first compare the assembly startup code to the one provided with STM32CubeH5. Then, I will describe the CMSIS C startup code.

I have removed the comments and the TrustZone related statements from the CMSIS startup code below and changed the indentation for display purposes when needed.

.syntax  unified
.arch    armv8-m.main

#define __INITIAL_SP     __StackTop
#define __STACK_LIMIT    __StackLimit

.section .vectors
.align   2
.globl   __Vectors
.globl   __Vectors_End
.globl   __Vectors_Size

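Like before, the syntax is unified, but the architecture is given as armv8-m.main, which should have the same result since Cortex-M33 implements this architecture. Then, it defines the vector table section called .vectors and aligns it to 4 bytes (for Arm targets, the operand of .align is a power of two, so .align 2 means 2^2 = 4 bytes).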

__Vectors:
  .long    __INITIAL_SP
  .long    Reset_Handler
  .long    NMI_Handler
  .long    HardFault_Handler
  .long    MemManage_Handler
  .long    BusFault_Handler
  .long    UsageFault_Handler
  .long    SecureFault_Handler
  .long    0
  .long    0
  .long    0
  .long    SVC_Handler
  .long    DebugMon_Handler
  .long    0
  .long    PendSV_Handler
  .long    SysTick_Handler

  .long    Interrupt0_Handler
  .long    Interrupt1_Handler
  .long    Interrupt2_Handler
  .long    Interrupt3_Handler
  .long    Interrupt4_Handler
  .long    Interrupt5_Handler
  .long    Interrupt6_Handler
  .long    Interrupt7_Handler
  .long    Interrupt8_Handler
  .long    Interrupt9_Handler

  .space   (470 * 4)

__Vectors_End:
  .equ     __Vectors_Size, __Vectors_End - __Vectors
  .size    __Vectors, . - __Vectors

This is also similar but the table is allocated fully with .space directive. The size is set correctly.

.thumb_func
.type  Reset_Handler, %function
.globl Reset_Handler
.fnstart
Reset_Handler:

No separate section is defined, and the .thumb_func directive is used. This is meant to generate correct code when Arm and Thumb are used at the same time, but I guess it is unnecessary since Cortex-M33 supports only Thumb. .thumb would be enough, as in the STM32CubeH5 startup code. .fnstart (and .fnend at the end of the function) is used to generate an unwind table entry. If stack unwinding is not used, this is also not necessary.

  ldr r0, =__INITIAL_SP
  msr psp, r0

This is I think a major difference. STM32CubeH5 startup code was loading the initial stack value to sp (which is msp) which is unnecessary because reset already does this. Here, not msp but psp is loaded with the same initial stack value. It is a bit strange the same value is used but at least this is not a redundant statement.

  ldr r0, =__STACK_LIMIT
  msr msplim, r0
  msr psplim, r0

I think this is also important and major difference. Both stack limit registers, MSPLIM and PSPLIM are also loaded.

Both __INITIAL_SP and __STACK_LIMIT are preprocessor definitions (for __StackTop and __StackLimit, see the #define lines above), and the values of those symbols are set by the linker.

  bl SystemInit

calls SystemInit as in STM32CubeH5.

  ldr   r4, =__copy_table_start__
  ldr   r5, =__copy_table_end__

.L_loop0:
  cmp   r4, r5
  bge   .L_loop0_done
  ldr   r1, [r4]
  ldr   r2, [r4, #4]
  ldr   r3, [r4, #8]
  lsls  r3, r3, #2

.L_loop0_0:
  subs  r3, #4
  ittt  ge
  ldrge r0, [r1, r3]
  strge r0, [r2, r3]
  bge   .L_loop0_0

  adds  r4, #12
  b     .L_loop0

.L_loop0_done:
  ldr   r3, =__zero_table_start__
  ldr   r4, =__zero_table_end__

.L_loop2:
  cmp   r3, r4
  bge   .L_loop2_done
  ldr   r1, [r3]
  ldr   r2, [r3, #4]
  lsls  r2, r2, #2
  movs  r0, 0

.L_loop2_0:
  subs  r2, #4
  itt   ge
  strge r0, [r1, r2]
  bge   .L_loop2_0

  adds  r3, #8
  b     .L_loop2

.L_loop2_done:

the implementation is slightly different but effectively it copies the initialization data to RAM and zeroes the uninitialized section as in STM32CubeH5 startup code.
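
Judging from the loads in the loops above, each copy table entry consists of three words (source, destination, length in words) and each zero table entry of two words (destination, length in words). A C sketch of the same processing could look like this; the struct and function names are made up, and unlike the assembly, which walks each block from its end to its start, this version simply copies forwards.

```c
#include <stdint.h>

/* Hypothetical C view of one copy table entry (three 32-bit words on the target). */
struct copy_entry {
    const uint32_t *src;   /* load address, e.g. in flash */
    uint32_t       *dst;   /* run address in RAM */
    uint32_t        wlen;  /* length in 32-bit words */
};

/* Hypothetical C view of one zero table entry (two 32-bit words on the target). */
struct zero_entry {
    uint32_t *dst;         /* start of the area to clear */
    uint32_t  wlen;        /* length in 32-bit words */
};

/* Copies every block described in [begin, end). */
static void run_copy_table(const struct copy_entry *begin, const struct copy_entry *end)
{
    for (const struct copy_entry *e = begin; e < end; ++e) {
        for (uint32_t i = 0; i < e->wlen; ++i) {
            e->dst[i] = e->src[i];
        }
    }
}

/* Clears every block described in [begin, end). */
static void run_zero_table(const struct zero_entry *begin, const struct zero_entry *end)
{
    for (const struct zero_entry *e = begin; e < end; ++e) {
        for (uint32_t i = 0; i < e->wlen; ++i) {
            e->dst[i] = 0u;
        }
    }
}
```

Compared with the single _sdata/_edata and _sbss/_ebss pairs of STM32CubeH5, the tables allow several separate RAM regions to be initialized with the same loop.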

  bl _start

and calls _start. That is quite different. It does not call __libc_init_array and then main but calls default C startup function _start.

.fnend
.size Reset_Handler, . - Reset_Handler

marks the end of the function with .fnend and sets the size of the function.

.thumb_func
.type    HardFault_Handler, %function
.weak    HardFault_Handler
.fnstart
HardFault_Handler:
  b .
.fnend
.size HardFault_Handler, . - HardFault_Handler

.thumb_func
.type Default_Handler, %function
.weak Default_Handler
.fnstart
Default_Handler:
  b .
.fnend
.size Default_Handler, . - Default_Handler

similar to Reset_Handler, HardFault_Handler and Default_Handler are defined. An interesting thing is that, although it is a weak definition, HardFault_Handler is defined in the startup code here. There was no such definition in the STM32CubeH5 startup code. I find defining HardFault_Handler in the startup code more correct because the reset sequence assumes there is a working HardFault_Handler. Effectively it does not matter if a default handler is used anyway, but it just looks better to me.

.macro   Set_Default_Handler  Handler_Name
.weak    \Handler_Name
.set     \Handler_Name, Default_Handler
.endm

CMSIS startup code makes a clever decision of creating a macro to not repeat these two lines so many times.

Set_Default_Handler  NMI_Handler
Set_Default_Handler  MemManage_Handler
Set_Default_Handler  BusFault_Handler
Set_Default_Handler  UsageFault_Handler
Set_Default_Handler  SecureFault_Handler
Set_Default_Handler  SVC_Handler
Set_Default_Handler  DebugMon_Handler
Set_Default_Handler  PendSV_Handler
Set_Default_Handler  SysTick_Handler

Set_Default_Handler  Interrupt0_Handler
Set_Default_Handler  Interrupt1_Handler
Set_Default_Handler  Interrupt2_Handler
Set_Default_Handler  Interrupt3_Handler
Set_Default_Handler  Interrupt4_Handler
Set_Default_Handler  Interrupt5_Handler
Set_Default_Handler  Interrupt6_Handler
Set_Default_Handler  Interrupt7_Handler
Set_Default_Handler  Interrupt8_Handler
Set_Default_Handler  Interrupt9_Handler

.end

as before, by using the macro just defined, it sets a default for all handlers, naturally except HardFault_Handler since it is already defined.

CMSIS Startup Code in C

The startup code explained below is also reformatted and parts related to TrustZone are omitted.

C startup code starts with:

extern uint32_t __INITIAL_SP;
extern uint32_t __STACK_LIMIT;

these are defined as __StackTop and __StackLimit, which are set by the linker.

extern __NO_RETURN void __PROGRAM_START(void);

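__PROGRAM_START is a compiler-specific macro. It is __main for Arm Compiler; for gcc, as far as I can tell, cmsis_gcc.h defines it by default as its own helper __cmsis_start, which processes the copy and zero tables and then calls _start. I believe the reason for abstracting this is to support different compilers.

__NO_RETURN is also compiler-specific (__declspec(noreturn) for Arm Compiler 5, __attribute__((__noreturn__)) for gcc), marking the function as one that will not return. I think it is mainly used for optimizations.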

__NO_RETURN void Reset_Handler  (void);
            void Default_Handler(void);

as in the assembly startup file, these are the Reset and Default handlers.

void HardFault_Handler      (void) __attribute__ ((weak));
void SysTick_Handler        (void) __attribute__ ((weak, alias("Default_Handler")));
void Interrupt0_Handler     (void) __attribute__ ((weak, alias("Default_Handler")));

then the exception handlers are declared. All of them are marked as weak, and all except HardFault_Handler are also declared as an alias to Default_Handler.

extern const VECTOR_TABLE_Type __VECTOR_TABLE[496];
       const VECTOR_TABLE_Type __VECTOR_TABLE[496] __VECTOR_TABLE_ATTRIBUTE = {
  (VECTOR_TABLE_Type)(&__INITIAL_SP),       /*     Initial Stack Pointer */
  Reset_Handler,                            /*     Reset Handler */
...
  SecureFault_Handler,                      /*  -9 Secure Fault Handler */
  0,                                        /*     Reserved */ 
...
  SysTick_Handler,                          /*  -1 SysTick Handler */

  /* Interrupts */
  Interrupt0_Handler,                       /*   0 Interrupt 0 */
...
};

(parts with … are truncated, it contains similar entries)

this defines the vector table similar to assembly startup code including the reserved entries with 0. The important thing here is __VECTOR_TABLE_ATTRIBUTE which is defined as __attribute__((used, section(".vectors"))), to put this table to section .vectors, which is, I think, used by the linker.

__NO_RETURN void Reset_Handler(void)
{
  __set_PSP((uint32_t)(&__INITIAL_SP));

  __set_MSPLIM((uint32_t)(&__STACK_LIMIT));
  __set_PSPLIM((uint32_t)(&__STACK_LIMIT));

  SystemInit();                             /* CMSIS System Initialization */
  __PROGRAM_START();                        /* Enter PreMain (C library entry point) */
}

The reset handler, similar to the assembly startup code, sets PSP, MSPLIM and PSPLIM, and then calls SystemInit. Unlike the assembly version, it does not perform any memory initialization itself; this is delegated to whatever __PROGRAM_START expands to (__main in the Arm Compiler C runtime; for gcc, as far as I can tell, the __cmsis_start helper in cmsis_gcc.h). I think this is the most important difference from the assembly startup file.

The functions starting with __set are defined in different places in CMSIS, some are compiler independent (like __set_PSP), others (like __set_MSPLIM) are compiler dependent because they use inline assembly.

void HardFault_Handler(void)
{
  while(1);
}

HardFault is just an infinite loop.

void Default_Handler(void)
{
  while(1);
}

Default handler is also just an infinite loop.

Summary

After reset, the processor assumes there is a valid vector table containing the initial value of stack pointer and the location of reset handler in the first two entries. Then, the reset handler is called and it takes over the startup process.

The startup code has a few basic responsibilities:

define the interrupt vector table
provide a Reset handler implementation (and probably also a HardFault handler implementation because it is activated by the processor after reset)
perform or delegate low level hardware initialization (power, clock etc.)
perform or delegate memory initialization for initialized (data) and uninitialized (bss) data

All three startup files mentioned above delegate the low level hardware initialization to a function called void SystemInit(void), which is called before memory initialization.

The CMSIS C startup file has almost the same structure and functionality as the CMSIS assembly startup file, but the memory initialization is delegated to __PROGRAM_START (for example __main in the Arm Compiler C runtime). The startup file is a C file, but it has to use non-standard features like attributes and inline assembly as there is no standard way to do these things. I am not yet convinced by the use of C for the startup code.

I find the following meaningful (correct, more correct or better looking) in the CMSIS assembly startup file compared to the STM32CubeH5 startup file:

the size of the vector table is set correctly
not necessary and consuming space but allocating the whole vector table might be a good idea
using stack limit register MSPLIM
defining the HardFault handler in the startup code
using a macro when defining the handlers

and these are not necessary, not necessarily correct or not a good idea for me:

no need to use .thumb_func
no need to use .fnstart and .fnend since stack unwinding is not used
I would not set PSP and PSPLIM if I am not going to use PSP

In STM32CubeH5 startup file:

it is unnecessary to load sp at the beginning
the vector table size is not properly set (always 0)

Also, CMSIS assembly code aligns the vector table explicitly, whereas STM32CubeH5 does not. The alignment can be further modified by the linker and I think it makes more sense to use the linker for this.

It is not very difficult to combine the best parts of each startup file which I will do in a later post. The most critical issue is I guess whether a C runtime initialization is required and if so which C runtime initialization function to call, __libc_init_array, _start or __main.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.