Demystifying Arm Cortex-M33 Bare Metal: Startup
Introduction
I had a problem when developing code for an STM32 development board, and I realized there are things I did not fully understand regarding how the processor starts (or boots) and how a program is built (compiled, assembled and linked), and the relation between these two. I decided to dig deeper, and this and subsequent posts are the result of this study.
I initially started writing a single post covering all aspects of startup and build, as they are closely related; however, such a post becomes very long. Thus, I have decided to split the topic into a few posts. This is the first one, about the startup. As I publish new posts in this series, I will list them here.
The second post is published as Demystifying Arm Cortex-M33 Bare Metal: Compile, Assembly and Link.
I assume you have some knowledge regarding Arm Cortex-M and/or Armv8-M architecture and/or developing embedded software in general. You definitely do not need to be an expert; the information in this and subsequent posts is basic, but might not be simple. I am assuming the actual application is going to be written in C, not in C++, so I disregard any requirements of a C++ runtime.
I have explicitly named this post for Arm Cortex-M33, and I am using an STM32H563 microcontroller. However, only a few things are specific to the actual processor; most of the information should be valid for any Arm Cortex-M33 processor and useful for any Arm Cortex-M processor.
I am using Arm GNU Toolchain either with STM32CubeIDE or directly on the Linux command line. Some information in this post is only accurate for this toolchain. I did not check the Arm Compiler for Embedded toolchain or others.
Questions
When I create an STM32 project in STM32CubeIDE with project type set to empty (not STM32Cube), a startup file startup_stm32h563zitx.s and a linker script STM32H563ZITX_FLASH.ld are copied to the project. These come from STM32CubeH5 which is ST’s CMSIS Device MCU component for H5 series.
These are the questions for which I am looking for a complete and accurate answer:
1. What does the MCU do when it is powered on (cold reset) or after a (warm) reset?
2. What does the startup code do?
3. Is the startup code in CMSIS Core(M) the same as the one in STM32CubeH5? If not, what are the differences?
4. How is the program built (compiled, assembled and linked), and how is the build process related to the startup code?
5. Is the linker script in CMSIS Core(M) the same as the one in STM32CubeH5? If not, what are the differences?
6. What changes when TrustZone is enabled?
I will follow the same order when trying to give the answers. This post is about the questions (1), (2) and (3). I will probably write two more posts covering (4) and (5), and finally (6).
Reset
A reset can be a cold reset (power cycle) or a warm reset (no power cycle), but the difference between them is not important for this post.
The reset procedure is documented as pseudocode in the Armv8-M Architecture Reference Manual. I did not include everything below, but put simply, during reset:
- system registers are set to their default/reset values
- LR is set to 0xFFFFFFFF
- all exceptions are inactivated
- stack limit registers (MSPLIM and PSPLIM) are cleared
- first element of the vector table (stack pointer) is loaded as sp
- second element of the vector table (Reset_Handler) is loaded as start
- HardFault exception is activated
- main stack pointer (MSP) is set to sp (first element of the vector table)
- start (Reset_Handler) is called
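The part of this sequence that involves the vector table can be modelled on a host in a few lines of C. This is only a sketch of my reading of the steps above, not the pseudocode from the manual: reset_state_t and model_reset are made-up names, stack pointer alignment is ignored, and I am assuming that bit 0 of the reset vector is the Thumb bit, which is cleared to get the actual start address.

```c
#include <stdint.h>

/* Hypothetical model of the register values right after reset. */
typedef struct {
    uint32_t msp;   /* main stack pointer, from vector table entry 0 */
    uint32_t pc;    /* start address, from entry 1 with bit 0 cleared */
    uint32_t lr;    /* link register reset value */
    int      thumb; /* 1 if bit 0 of entry 1 (Thumb bit) was set */
} reset_state_t;

/* vector_table points to a copy of the memory that VTOR points to. */
reset_state_t model_reset(const uint32_t *vector_table)
{
    reset_state_t s;
    s.msp   = vector_table[0];
    s.pc    = vector_table[1] & ~1u;
    s.thumb = (int)(vector_table[1] & 1u);
    s.lr    = 0xFFFFFFFFu;
    return s;
}
```

For example, with a table whose first two words are 0x200A0000 and 0x08000251 (made-up values), the model gives msp = 0x200A0000 and a start address of 0x08000250 in Thumb state.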
This raises an extra question: how does the processor know the location of the vector table ?
The Location of the Vector Table
The location of the vector table is in VTOR (Vector Table Offset Register). Under the VTOR section (D1.2.265), the Armv8-M Architecture Reference Manual says: “This field resets to an IMPLEMENTATION DEFINED value on a Warm reset.” So its value depends on the silicon vendor (such as ST or NXP).
The default/reset value should either be a fixed value (such as zero) or should be “told” to the processor just after reset before reset procedure described above starts. On Cortex-M33, its initial value is provided to the processor externally through INITVTOR signals.
On STM32H563, there is an SBS block that provides some signals to the processor such as:
- sbs_init_vbr_s
- sbs_init_vbr_ns
- sbs_tz_state
The first two provide the entry point (s is for secure, ns is for non-secure); the last provides the TrustZone state (enabled/disabled).
SBS derives these signals from the option bytes (SECBOOTADD, NSBOOTADD, TZEN) which are stored in flash, thus in a non-volatile area, and from the BOOT0 pin of the processor.
For my particular example, on NUCLEO-H563ZI development board:
- BOOT0 is set to low, thus it does not modify the default behaviour, and NSBOOTADD is used.
- (I guess) the NSBOOTADD option byte is read after reset by SBS, and this is provided as sbs_init_vbr_ns, thus providing INITVTOR to the processor.
- The processor initializes VTOR to INITVTOR before or during reset, but naturally before reading the vector table.
On STM32H563, default value of NSBOOTADD is 0x08000000. This is the start of flash memory.
Startup Code
CMSIS Core(M) defines the file structure of device templates and uses the name startup_<device>.s or startup_<device>.c for the startup file that performs the CMSIS device startup and contains the interrupt vectors (vector table). Traditionally, an assembly file (.s) is used; however, CMSIS has deprecated the assembly startup files and also provides C files. Both assembly and C startup files have the same basic functionality. At minimum, the startup file should provide a vector table and the implementation of its reset handler.
CMSIS provides a reference implementation startup_ARMCM33.S and startup_ARMCM33.c.
STM32CubeH5 contains the STM32CubeH5 CMSIS Device MCU Component, which also provides specific startup files for STM32 H5 MCUs, such as startup_stm32h563xx.s. This file is, I think, copied as startup_stm32h563zitx.s to my project (there is no other h563xx related startup file in STM32CubeH5).
Each startup code uses a few externally defined symbols, such as _sdata in STM32CubeH5 and __copy_table_start__ in CMSIS. These are defined by the linker, and I will clarify this in the next post. For this post, please disregard how these are defined.
STM32CubeH5 Startup Code
I will go line by line (without the comments, and reformatted for display purposes) through the file startup_stm32h563zitx.s:
.syntax unified
it uses the unified syntax (not divided). Before the Thumb-2 instruction set, the assembly syntax of Thumb instructions was different from that of Arm instructions. With Thumb-2, this has changed and now they have a unified syntax. I think divided is the default, so this has to be specified.
.cpu cortex-m33
.fpu softvfp
.thumb
selects the target processor, floating point option and the instruction set to be generated. There is no special directive for Thumb-2 instruction set, but the actual instruction set to be generated is decided based on the target processor architecture. Since Cortex-M33 supports Thumb-2, it will be Thumb-2.
The floating point option specifies both the linkage (or ABI) and whether the hardware FPU is used. softvfp means both software linkage (soft-float ABI, FP registers are not used to pass floating-point arguments) and a software floating-point library (the FPU is not used). If you are using floating-point hardware, it makes sense to use the right option here, but it should be the same option in all source files (C etc.), because the linkage has to be compatible. For Arm Compiler Toolchain some information can be found here: Compiler options for floating-point linkage and computations. For GNU Toolchain, this blog post might be useful: https://embeddedartistry.com/blog/2017/10/11/demystifying-arm-floating-point-compiler-options/.
.global g_pfnVectors
.global Default_Handler
makes these symbols global, thus visible to ld. I am not sure exactly why; these are not referred to by ld.
.word _sidata
.word _sdata
.word _edata
.word _sbss
.word _ebss
allocates a word for each of these symbols. These symbols are defined in the linker script which I will explain next. It allocates space because the values will be loaded in the startup code, so they have to be stored somewhere.
.section .text.Reset_Handler
the following code will be in the .text.Reset_Handler section.
If you are reading this post, you probably know this already, but just in case, here is a quick summary. The object files and the final output (all in ELF format) have sections. Traditionally, the code sections (meaning they are read-only and executable) are called .text, the initialized data section is called .data, the uninitialized data section is called .bss, and the read-only data section is called .rodata. These are defined in the ELF format as special sections. Putting Reset_Handler under .text.Reset_Handler does not mean it is under .text; that will be done later by the linker, but it is not uncommon to put each function into a separate section. The GNU C compiler gcc has the -ffunction-sections and -fdata-sections options that do the same by putting functions or data into separate sections like .text.function1 or .data.mydata1.
Normally you have to define the flags and type of a section, but because this name starts with .text, it inherits the default flags and type of .text, which include the executable flag and the type PROGBITS.
.weak Reset_Handler
sets the weak attribute for the Reset_Handler symbol (and here also creates it because it does not exist yet). The symbol Reset_Handler is used by the linker, so it has to be visible outside this file. This can be done either by the .global or by the .weak directive. I am not sure why Reset_Handler is marked weak here; I do not think it is to support overriding since this is a very core function. I would use .global here.
If you have not heard of the weak concept before, it means a normal (strong) symbol can override it. Also, it is not an error to have an undefined weak symbol. This is often used for optional functions or default implementations. The .weak directive creates a weak symbol or makes an existing symbol weak.
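The same two properties can be seen from C with gcc's weak attribute. This is a minimal sketch assuming gcc (or clang) on an ELF target; optional_hook, board_id and run_optional_hook are made-up names for illustration.

```c
#include <stddef.h>

/* Declared weak and never defined anywhere: this is not a link error,
 * the address of the symbol simply resolves to NULL. */
extern void optional_hook(void) __attribute__((weak));

/* Weak default definition: a strong definition of board_id() in another
 * file would replace this one at link time. */
__attribute__((weak)) int board_id(void)
{
    return 0; /* arbitrary default value for this sketch */
}

/* Calls the hook only if some object file provides it.
 * Returns 1 if the hook was called, 0 otherwise. */
int run_optional_hook(void)
{
    if (optional_hook != NULL) {
        optional_hook();
        return 1;
    }
    return 0;
}
```

Since nothing defines optional_hook here, run_optional_hook() returns 0, and board_id() returns the weak default. The override itself cannot be shown in a single file, because the strong definition has to come from another object file.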
.type Reset_Handler, %function
sets the type of the symbol Reset_Handler to function (which means it is code).
Reset_Handler:
the actual function starts here. Reset_Handler: is called a label, which sets the value of the Reset_Handler symbol to the current value of the location counter (the address at which the next instruction will be placed). Thus, it marks the beginning (address) of the function.
ldr r0, =_estack
mov sp, r0
effectively sets sp (stack pointer) to _estack. _estack is placed as the first element of the vector table because the Armv8-M architecture defines that entry as the initial value of the stack pointer. An important point here is that although Reset_Handler is listed among the exception handlers, it runs in thread mode like application code, because the PE (processing element, i.e. the processor) is in thread mode after reset. It is as if an unconditional branch (jump) is taken to Reset_Handler, which contains application code rather than an exception handler.
In thread mode, sp can be either msp (main stack pointer) or psp (process stack pointer), and this is controlled by the CONTROL.SPSEL bit, which is 0 after reset, indicating sp is msp in thread mode. In handler mode, msp is always used. Thus, in this code, msp is set to _estack, the initial value of the stack pointer.
I do not understand why there is a need to set sp explicitly here, because the Armv8-M reset sequence dictates that the first entry of the vector table (_estack here) is copied to msp before even jumping to the second entry (Reset_Handler). I have checked this with the debugger and sp is, not surprisingly, equal to _estack when Reset_Handler is called (before the ldr and mov instructions above are executed). To me, these two instructions look unnecessary.
bl SystemInit
branches to (calls) the SystemInit function. This is normally defined in C, for example in system_stm32h5xx.c in STM32CubeH5, and copied to the project. In a project with the empty project type, this file is not copied, so you have to define one as void SystemInit(void). This is a very early call; data and bss are not initialized yet, so it should not rely on them (i.e. do not use global variables). Typically, this function only accesses the system registers and does very early initialization like enabling the FPU, configuring power and clocks etc. The other initializations (for example of peripherals) can be done later. I think the main reason why there is a need for such a C function is that it is easier to write such things in C using the definitions and structures; otherwise it could also be done in assembly.
ldr r0, =_sdata
ldr r1, =_edata
ldr r2, =_sidata
movs r3, #0
b LoopCopyDataInit
CopyDataInit:
ldr r4, [r2, r3]
str r4, [r0, r3]
adds r3, r3, #4
LoopCopyDataInit:
adds r4, r0, r3
cmp r4, r1
bcc CopyDataInit
using _sdata, _edata and _sidata, copies the data segment (initializers) from flash to RAM. When the microcontroller is programmed, the binary program is stored in the flash memory since it is the non-volatile storage, and naturally flash memory is writable, however:
- it has a finite lifetime for writes (typically around 10000 cycles), so it should not be used as a scratchpad where writes happen often (such as program variables)
- Cortex-M33 has a Harvard architecture, meaning the code and data fetches can happen on different buses simultaneously, and typically the flash is the code memory whereas RAM is the data memory.
Not surprisingly, initialized program data (initialized static variables) is copied to RAM before program actually starts, and the code above just does that.
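The copy loop is easier to follow in C. This is a sketch of the same logic, not code from STM32CubeH5: copy_data_init is a made-up name, and the three pointers stand in for the linker symbols _sdata, _edata and _sidata.

```c
#include <stdint.h>

/* Copies the initial values of the data segment from their load address
 * (in flash) to their run address (in RAM), one 32-bit word at a time.
 * sdata and edata are the start and end of the region in RAM, sidata is
 * the start of the initializers in flash. */
void copy_data_init(uint32_t *sdata, const uint32_t *edata,
                    const uint32_t *sidata)
{
    while (sdata < edata) {
        *sdata++ = *sidata++;
    }
}
```

If _sdata equals _edata (no initialized data), the loop body never runs, which matches the assembly where the comparison is done before the first copy.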
ldr r2, =_sbss
ldr r4, =_ebss
movs r3, #0
b LoopFillZerobss
FillZerobss:
str r3, [r2]
adds r2, r2, #4
LoopFillZerobss:
cmp r2, r4
bcc FillZerobss
using _sbss and _ebss, this code zeroes an area in RAM as big as the bss segment. Similar to initialized program data, uninitialized program data (uninitialized static variables) also lives in RAM. However, because there are no initializers, nothing is copied; the memory is just zeroed. Be aware that this is only for static variables; uninitialized local variables live on the stack, thus their initial value might not be zero. Because there are no initializers, the bss section consumes no space in flash memory.
bl __libc_init_array
calls (branches to) the __libc_init_array C function. This function is defined in the C runtime library that comes with the toolchain (it is not related to CMSIS). Since newlib nano is used for this build, looking at the source code of newlib, the function can be found in libc/misc/init.c:
void
__libc_init_array (void)
{
  size_t count;
  size_t i;
  count = __preinit_array_end - __preinit_array_start;
  for (i = 0; i < count; i++)
    __preinit_array_start[i] ();
  _init ();
  count = __init_array_end - __init_array_start;
  for (i = 0; i < count; i++)
    __init_array_start[i] ();
}
The mentioned arrays __preinit_array and __init_array contain pointers to functions. Thus, __libc_init_array calls the functions in each array, and there is also a call to a function named _init in between.
The first thing that comes to mind is that these functions can be, for example, static initializers and constructors that have to be run before the program starts. However, in a plain C application there are normally no such things, so I am not sure if this is needed at all. If you have a definite answer, please let me know.
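One case I am fairly sure about is gcc's constructor attribute, which can be used in plain C: gcc puts a pointer to such a function into the .init_array section, so it is called by a loop like the one above before main. A minimal sketch, where early_init and the flag are made-up names:

```c
/* Set by the constructor below; only used to observe the call order. */
static int early_init_done = 0;

/* gcc places a pointer to this function in .init_array, so the C runtime
 * calls it before main() starts. */
__attribute__((constructor))
static void early_init(void)
{
    early_init_done = 1;
}

/* Reports whether the constructor has already run. */
int early_init_has_run(void)
{
    return early_init_done;
}
```

Called from main, early_init_has_run() returns 1. So if neither the application nor any linked library uses such constructors, the arrays are presumably empty and the call does nothing useful, but I would not remove it without checking the map file.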
_init (and also _fini) comes with libgcc in the toolchain. It is not actually a standalone function, but a wrapper: it contains a top part and a bottom part that sandwich anything in between. The top and the bottom parts are defined in crti.S and crtn.S respectively. If we look at the debugger disassembly, the _init function looks like this:
_init:
push {r3, r4, r5, r6, r7, lr}
nop
pop {r3, r4, r5, r6, r7}
pop {r3}
mov lr, r3
bx lr
crti.S contains only the first push instruction, which is the top part, and crtn.S contains the pop instruction on the third line and the instructions that follow, including the last bx lr. The nop instruction is sandwiched between these. It is a nop because no _init function is given to gcc. gcc allows one to define an _init (and _fini) function to be called before (and after) the application runs. Because this is an embedded application, it does not terminate, so _fini does not make sense but _init may. The _init above is just a wrapper for a user-supplied _init function, to save the registers, then load them back and return.
bl main
finally, calls (branches to) the main C function (in the application code).
LoopForever:
b LoopForever
normally, the main function should not return, which is why there is always a while (1) loop there. However, if main returns for some reason, this infinite loop runs and does nothing.
It may be funny to discuss this, but you may ask what happens if the code terminates totally (e.g. there is no LoopForever and main returns). There is actually no concept of program termination because there is no operating system. The processor cannot pause/sleep unless it is explicitly requested, so it continues executing, and if main returns and there is no LoopForever, I think one of three things can happen when it tries to execute the next instruction (in the next, uninitialized memory location):
- if the memory location is not in code memory (not in an executable memory region), then a fault (MemManage fault or HardFault) happens
- if it is a valid instruction, then it executes that and continues
- if it is not a valid instruction, then a fault (UsageFault or HardFault) happens
In order to prevent this unpredictable behavior, having a LoopForever is probably a must.
.size Reset_Handler, .-Reset_Handler
sets the size of Reset_Handler. The dot . means the current location counter. Subtracting the beginning (address) of the function from the current location (address) gives the size.
.section .text.Default_Handler,"ax",%progbits
Default_Handler:
Infinite_Loop:
b Infinite_Loop
.size Default_Handler, .-Default_Handler
defines a section .text.Default_Handler and a symbol Default_Handler containing an infinite loop, and sets its size. Default_Handler is used when no specific exception handler is defined for an exception or interrupt (soon you will see how). The section is defined with the ax flags, a for allocatable and x for executable, and with %progbits as its type, which marks a section that contains data. This does not mean it contains only data (it is executable); it means the section has content in the file, it is not empty. Because the name starts with .text, the default flags are already ax and the type is %progbits, so I do not think these are necessary.
.section .isr_vector,"a",%progbits
.type g_pfnVectors, %object
.size g_pfnVectors, .-g_pfnVectors
defines a section .isr_vector as allocatable (a) and containing data (%progbits), sets the type of g_pfnVectors to object and sets its size. The type object means it is a data object (it does not contain code).
There is a mistake here. The position of the .size directive is not correct because the vector table entries follow it, so .-g_pfnVectors is always 0; this can be checked with objdump. In order to correct this, the .size directive should be moved after the vector table definitions. I have submitted a patch to STM32CubeH5 to correct this.
Symbol table entry when .size is before the vector table definitions:
08000000 g O .isr_vector 00000000 g_pfnVectors
Symbol table entry after moving .size to the end of the vector table definitions:
08000000 g O .isr_vector 0000024c g_pfnVectors
g_pfnVectors:
.word _estack
.word Reset_Handler
.word NMI_Handler
.word HardFault_Handler
.word MemManage_Handler
.word BusFault_Handler
.word UsageFault_Handler
.word SecureFault_Handler
.word 0
.word 0
.word 0
.word SVC_Handler
.word DebugMon_Handler
.word 0
.word PendSV_Handler
.word SysTick_Handler
.word WWDG_IRQHandler
...
.word LPTIM6_IRQHandler
the listing above is truncated between the … marks because there are a lot of entries (147 entries) since there are a lot of external interrupts. The first entry here is the initial value of the stack pointer. All other entries starting with the second one are either an exception or an interrupt handler but some may be unused or reserved, which are set to 0 above.
Cortex-M33 supports up to 480 external interrupts through the NVIC, plus 16 internal lines. Each one of these has an exception number starting from 1 (not 0, which is logical since the first entry above, entry number 0, is not an exception but the initial stack pointer). The STM32H563 NVIC has 131 external interrupts. These interrupts start just after SysTick, whose exception number is 15 (so it is the last of the internal ones). The WWDG_IRQHandler above is external interrupt 0, and from it up to and including LPTIM6_IRQHandler there are 131 entries. Not all entries are populated; some are 0 since they are unavailable in this processor. For example, maskable interrupt 36 is for the SAES (Secure AES) co-processor, but STM32H563 does not have it, so this interrupt is not used.
The first 15 exceptions, starting with Reset_Handler and ending with SysTick_Handler, do not pass through the NVIC; they have individual lines. Each exception (including interrupts) has an exception number as follows:
- 1-15: Reset to SysTick
- 16-495: IRQ0 to IRQ479 (NVIC interrupts)
Exception number 0 does not exist and it might be used to indicate thread mode (not in an exception).
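The numbering can be turned into a few small helpers to double-check the table layout. This is just arithmetic on the numbers above, with made-up function names, assuming each vector table entry is one 32-bit word.

```c
#include <stdint.h>

/* Entries before the first external interrupt: the initial stack pointer
 * (entry 0) plus exceptions 1..15. */
#define NUM_SYSTEM_ENTRIES 16u

/* Exception number of external interrupt irqn (IRQ0 is exception 16). */
uint32_t irq_to_exception(uint32_t irqn)
{
    return irqn + NUM_SYSTEM_ENTRIES;
}

/* Byte offset of an exception's entry from the start of the vector table. */
uint32_t vector_offset(uint32_t exception_number)
{
    return exception_number * 4u;
}

/* Size in bytes of a vector table covering num_irqs external interrupts. */
uint32_t vector_table_size(uint32_t num_irqs)
{
    return (NUM_SYSTEM_ENTRIES + num_irqs) * 4u;
}
```

With the 131 interrupts of STM32H563, vector_table_size(131) is 588 bytes, which is 0x24c and matches the symbol size reported by objdump for the 147-entry table. WWDG (IRQ0) sits at offset 0x40, right after SysTick at 0x3C.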
.weak NMI_Handler
.thumb_set NMI_Handler,Default_Handler
.weak HardFault_Handler
.thumb_set HardFault_Handler,Default_Handler
...
.weak LPTIM6_IRQHandler
.thumb_set LPTIM6_IRQHandler,Default_Handler
this listing is truncated (… marks) in between since it repeats the same operation for each exception handler symbol (137 symbols). The first line of each entry marks the handler as a weak symbol with the .weak directive. The second line, with the .thumb_set directive, does two things:
- creates a symbol as an alias to another
- marks the created symbol as a thumb function
as expected, it is not meaningful to define all these handlers in the code. When a handler is not defined, Default_Handler is called: if no other (strong) definition of an exception or interrupt handler symbol exists, the linker resolves it to the Default_Handler symbol.
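The C counterpart of this .weak and .thumb_set pair is gcc's weak and alias attributes. Here is a sketch, assuming gcc on an ELF target; the SET_DEFAULT_HANDLER macro and the call counter are my own additions for illustration, and the Default_Handler here returns instead of looping forever so that the effect can be observed.

```c
/* Counts calls to the shared default handler (for this sketch only). */
static volatile int default_handler_calls = 0;

/* Stand-in for the real Default_Handler, which is an infinite loop. */
void Default_Handler(void)
{
    default_handler_calls++;
}

/* Declares a weak handler symbol that is an alias of Default_Handler.
 * A strong definition of the same name in another file overrides it. */
#define SET_DEFAULT_HANDLER(name) \
    void name(void) __attribute__((weak, alias("Default_Handler")))

SET_DEFAULT_HANDLER(NMI_Handler);
SET_DEFAULT_HANDLER(SysTick_Handler);

/* Returns how many times the default handler has run. */
int default_handler_call_count(void)
{
    return default_handler_calls;
}
```

Calling NMI_Handler() and SysTick_Handler() here both end up in Default_Handler, so the counter becomes 2. If another source file defined a normal SysTick_Handler, the linker would pick that one and only NMI_Handler would still point to the default.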
This is the end of the file. To sum up, the startup code:
- defines Reset_Handler function, which:
  - unnecessarily sets the sp at the beginning
  - calls SystemInit
  - copies the data segment (the data of initialized static variables) to RAM
  - zeroes the bss segment (location of uninitialized static variables) in RAM
  - calls __libc_init_array
  - calls main
- defines Default_Handler function
- defines the vector table but sets its size wrong
- defines weak symbols for each exception handler and sets its default as Default_Handler
CMSIS Startup Code
CMSIS Core(M) also has a startup file named startup_ARMCM33.S. As I mentioned before CMSIS has deprecated the assembly startup files, and it also provides a startup_ARMCM33.c. I will first compare the assembly startup code to the one provided with STM32CubeH5. Then, I will describe the CMSIS C startup code.
I have removed the comments and the TrustZone related statements from the CMSIS startup code below and changed the indentation for display purposes when needed.
.syntax unified
.arch armv8-m.main
#define __INITIAL_SP __StackTop
#define __STACK_LIMIT __StackLimit
.section .vectors
.align 2
.globl __Vectors
.globl __Vectors_End
.globl __Vectors_Size
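Like before, the syntax is unified, but the architecture is given as armv8-m.main instead of a CPU name, which should have the same result since Cortex-M33 implements this architecture. Then, it defines the vector table section called .vectors and aligns it with .align 2, which for Arm targets in GNU as means an alignment of 2^2 = 4 bytes.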
__Vectors:
.long __INITIAL_SP
.long Reset_Handler
.long NMI_Handler
.long HardFault_Handler
.long MemManage_Handler
.long BusFault_Handler
.long UsageFault_Handler
.long SecureFault_Handler
.long 0
.long 0
.long 0
.long SVC_Handler
.long DebugMon_Handler
.long 0
.long PendSV_Handler
.long SysTick_Handler
.long Interrupt0_Handler
.long Interrupt1_Handler
.long Interrupt2_Handler
.long Interrupt3_Handler
.long Interrupt4_Handler
.long Interrupt5_Handler
.long Interrupt6_Handler
.long Interrupt7_Handler
.long Interrupt8_Handler
.long Interrupt9_Handler
.space (470 * 4)
__Vectors_End:
.equ __Vectors_Size, __Vectors_End - __Vectors
.size __Vectors, . - __Vectors
This is also similar, but the table is allocated fully with the .space directive. The size is set correctly.
.thumb_func
.type Reset_Handler, %function
.globl Reset_Handler
.fnstart
Reset_Handler:
No separate section is defined, and the .thumb_func directive is used. This marks the following symbol as a Thumb function, which matters when Arm and Thumb code are mixed, but I guess it is unnecessary since Cortex-M33 supports only Thumb. .thumb would be enough as in the STM32CubeH5 startup code. .fnstart (and .fnend at the end of the function) is used to generate an unwind table entry. If stack unwinding is not used, this is also not necessary.
ldr r0, =__INITIAL_SP
msr psp, r0
I think this is a major difference. The STM32CubeH5 startup code loads the initial stack value to sp (which is msp), which is unnecessary because reset already does this. Here, not msp but psp is loaded with the same initial stack value. It is a bit strange that the same value is used, but at least this is not a redundant statement.
ldr r0, =__STACK_LIMIT
msr msplim, r0
msr psplim, r0
I think this is also an important and major difference. Both stack limit registers, MSPLIM and PSPLIM, are loaded as well.
Both __INITIAL_SP and __STACK_LIMIT are preprocessor definitions (for __StackTop and __StackLimit), and the values of those symbols are set by the linker.
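To make it concrete what a stack limit register buys, here is a very simplified host-side model of the check. It assumes the rule is that a fault is raised when the stack pointer would end up below the limit; push_within_limit is a made-up name, and the real hardware raises a fault instead of returning a value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if moving the stack pointer down by 'bytes' keeps it at or
 * above 'limit' (the value of MSPLIM or PSPLIM in this model). */
bool push_within_limit(uint32_t sp, uint32_t limit, uint32_t bytes)
{
    return bytes <= sp && (sp - bytes) >= limit;
}
```

With a limit of 0x20000000 (a made-up value), an 8 byte push from 0x20000008 is still fine, whereas the same push from 0x20000004 would go below the limit. Without a limit register, such an overflow silently overwrites whatever is placed below the stack.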
bl SystemInit
calls SystemInit as in STM32CubeH5.
ldr r4, =__copy_table_start__
ldr r5, =__copy_table_end__
.L_loop0:
cmp r4, r5
bge .L_loop0_done
ldr r1, [r4]
ldr r2, [r4, #4]
ldr r3, [r4, #8]
lsls r3, r3, #2
.L_loop0_0:
subs r3, #4
ittt ge
ldrge r0, [r1, r3]
strge r0, [r2, r3]
bge .L_loop0_0
adds r4, #12
b .L_loop0
.L_loop0_done:
ldr r3, =__zero_table_start__
ldr r4, =__zero_table_end__
.L_loop2:
cmp r3, r4
bge .L_loop2_done
ldr r1, [r3]
ldr r2, [r3, #4]
lsls r2, r2, #2
movs r0, 0
.L_loop2_0:
subs r2, #4
itt ge
strge r0, [r1, r2]
bge .L_loop2_0
adds r3, #8
b .L_loop2
.L_loop2_done:
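the implementation is slightly different, since it is driven by two tables instead of a single pair of symbols: each copy table entry holds a source address, a destination address and a length in words, and each zero table entry holds a destination address and a length in words. Effectively, it copies the initialization data to RAM and zeroes the uninitialized section as in the STM32CubeH5 startup code.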
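The two loops read more easily in C. This is my own sketch of what the assembly does, not code taken from CMSIS: the type and function names are made up, and the entry layout (source, destination, length in words for copying; destination, length in words for zeroing) is what I read from the load offsets in the assembly. On the target each field is 32 bits wide; on a 64-bit host the pointers are wider, but the logic is the same.

```c
#include <stdint.h>

/* One copy table entry: source, destination and length in 32-bit words. */
typedef struct {
    const uint32_t *src;
    uint32_t       *dest;
    uint32_t        wlen;
} copy_entry_t;

/* One zero table entry: destination and length in 32-bit words. */
typedef struct {
    uint32_t *dest;
    uint32_t  wlen;
} zero_entry_t;

/* Copies every region listed between start (inclusive) and end (exclusive). */
void process_copy_table(const copy_entry_t *start, const copy_entry_t *end)
{
    for (const copy_entry_t *e = start; e < end; ++e) {
        for (uint32_t i = 0u; i < e->wlen; ++i) {
            e->dest[i] = e->src[i];
        }
    }
}

/* Zeroes every region listed between start (inclusive) and end (exclusive). */
void process_zero_table(const zero_entry_t *start, const zero_entry_t *end)
{
    for (const zero_entry_t *e = start; e < end; ++e) {
        for (uint32_t i = 0u; i < e->wlen; ++i) {
            e->dest[i] = 0u;
        }
    }
}
```

The assembly walks each region from the highest word down to the lowest, while this sketch goes upwards; for regions that do not overlap the result is the same. The advantage of the table approach is that several data and bss regions (for example in different RAM banks) can be initialized without touching the startup code, only the linker script.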
bl _start
and calls _start. That is quite different. It does not call __libc_init_array and then main, but calls the default C startup function _start.
.fnend
.size Reset_Handler, . - Reset_Handler
marks the end of the function with .fnend and sets the size of the function.
.thumb_func
.type HardFault_Handler, %function
.weak HardFault_Handler
.fnstart
HardFault_Handler:
b .
.fnend
.size HardFault_Handler, . - HardFault_Handler
.thumb_func
.type Default_Handler, %function
.weak Default_Handler
.fnstart
Default_Handler:
b .
.fnend
.size Default_Handler, . - Default_Handler
similar to Reset_Handler, HardFault_Handler and Default_Handler are defined. An interesting thing is that, although it is a weak definition, HardFault_Handler is defined in the startup code here. There was no such definition in the STM32CubeH5 startup code. I find defining HardFault_Handler in the startup code more correct because the reset sequence assumes there is a working HardFault_Handler. Effectively it does not matter if a default handler is used, but it just looks better to me.
.macro Set_Default_Handler Handler_Name
.weak \Handler_Name
.set \Handler_Name, Default_Handler
.endm
CMSIS startup code makes a clever decision of creating a macro to not repeat these two lines so many times.
Set_Default_Handler NMI_Handler
Set_Default_Handler MemManage_Handler
Set_Default_Handler BusFault_Handler
Set_Default_Handler UsageFault_Handler
Set_Default_Handler SecureFault_Handler
Set_Default_Handler SVC_Handler
Set_Default_Handler DebugMon_Handler
Set_Default_Handler PendSV_Handler
Set_Default_Handler SysTick_Handler
Set_Default_Handler Interrupt0_Handler
Set_Default_Handler Interrupt1_Handler
Set_Default_Handler Interrupt2_Handler
Set_Default_Handler Interrupt3_Handler
Set_Default_Handler Interrupt4_Handler
Set_Default_Handler Interrupt5_Handler
Set_Default_Handler Interrupt6_Handler
Set_Default_Handler Interrupt7_Handler
Set_Default_Handler Interrupt8_Handler
Set_Default_Handler Interrupt9_Handler
.end
as before, by using the macro just defined, it sets a default for all handlers, naturally except HardFault_Handler since it is already defined.
CMSIS Startup Code in C
The startup code explained below is also reformatted and parts related to TrustZone are omitted.
C startup code starts with:
extern uint32_t __INITIAL_SP;
extern uint32_t __STACK_LIMIT;
these are defined as __StackTop and __StackLimit, which are set by the linker.
extern __NO_RETURN void __PROGRAM_START(void);
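__PROGRAM_START is defined per compiler in the CMSIS compiler headers: it is __main for Arm Compiler, while for gcc, as far as I can see in cmsis_gcc.h, it is __cmsis_start, an inline function that processes the copy and zero tables and then calls _start. I believe the reason for this abstraction is to support different compilers.
__NO_RETURN is also compiler dependent; for gcc it is __attribute__((__noreturn__)), marking that the function will not return. I think it is only used for optimizations.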
__NO_RETURN void Reset_Handler (void);
void Default_Handler(void);
as in the assembly startup file, these are the Reset and Default handlers.
void HardFault_Handler (void) __attribute__ ((weak));
void SysTick_Handler (void) __attribute__ ((weak, alias("Default_Handler")));
void Interrupt0_Handler (void) __attribute__ ((weak, alias("Default_Handler")));
then the exception handlers are declared; they are marked as weak and, except for HardFault_Handler which gets its own definition, as aliases of Default_Handler.
extern const VECTOR_TABLE_Type __VECTOR_TABLE[496];
const VECTOR_TABLE_Type __VECTOR_TABLE[496] __VECTOR_TABLE_ATTRIBUTE = {
(VECTOR_TABLE_Type)(&__INITIAL_SP), /* Initial Stack Pointer */
Reset_Handler, /* Reset Handler */
...
SecureFault_Handler, /* -9 Secure Fault Handler */
0, /* Reserved */
...
SysTick_Handler, /* -1 SysTick Handler */
/* Interrupts */
Interrupt0_Handler, /* 0 Interrupt 0 */
...
};
(parts with … are truncated; they contain similar entries)
this defines the vector table similar to the assembly startup code, including the reserved entries with 0. The important thing here is __VECTOR_TABLE_ATTRIBUTE, which is defined as __attribute__((used, section(".vectors"))) to put this table into the section .vectors, which is, I think, used by the linker.
__NO_RETURN void Reset_Handler(void)
{
__set_PSP((uint32_t)(&__INITIAL_SP));
__set_MSPLIM((uint32_t)(&__STACK_LIMIT));
__set_PSPLIM((uint32_t)(&__STACK_LIMIT));
SystemInit(); /* CMSIS System Initialization */
__PROGRAM_START(); /* Enter PreMain (C library entry point) */
}
The reset handler, similar to the assembly startup code, sets PSP, MSPLIM and PSPLIM, and then calls SystemInit. Unlike the assembly version, the reset handler itself does not perform any memory initialization; this is delegated to __PROGRAM_START. I think this is the most important difference from the assembly startup file.
The functions starting with __set are defined in different places in CMSIS; some are compiler independent (like __set_PSP), others (like __set_MSPLIM) are compiler dependent because they use inline assembly.
void HardFault_Handler(void)
{
while(1);
}
HardFault is just an infinite loop.
void Default_Handler(void)
{
while(1);
}
Default handler is also just an infinite loop.
Summary
After reset, the processor assumes there is a valid vector table containing the initial value of stack pointer and the location of reset handler in the first two entries. Then, the reset handler is called and it takes over the startup process.
The startup code has a few basic responsibilities:
- define the interrupt vector table
- provide a Reset handler implementation (and probably also a HardFault handler implementation because it is activated by the processor after reset)
- perform or delegate low level hardware initialization (power, clock etc.)
- perform or delegate memory initialization for initialized (data) and uninitialized (bss) data
All three startup files mentioned above delegate the low level hardware initialization to a function called void SystemInit(void), which is called before memory initialization.
The CMSIS C startup file has almost the same structure and functionality as the CMSIS assembly startup file, but the memory initialization is delegated to __PROGRAM_START. The startup file is a C file, but it has to use non-standard features like attributes and inline assembly as there is no standard way to do these things. I am not yet convinced about using C for the startup code.
I find the following meaningful (correct, more correct or better looking) in the CMSIS assembly startup file compared to the STM32CubeH5 startup file:
- the size of the vector table is set correctly
- allocating the whole vector table is not necessary and consumes space, but might be a good idea
- using the stack limit register MSPLIM
- defining the HardFault handler in the startup code
- using a macro when defining the handlers
and these are not necessary, not necessarily correct or not a good idea for me:
- no need to use .thumb_func
- no need to use .fnstart and .fnend since stack unwinding is not used
- I would not set PSP and PSPLIM if I am not going to use PSP
In the STM32CubeH5 startup file:
- it is unnecessary to load sp at the beginning
- the vector table size is not properly set (always 0)
Also, CMSIS assembly code aligns the vector table explicitly, whereas STM32CubeH5 does not. The alignment can be further modified by the linker and I think it makes more sense to use the linker for this.
It is not very difficult to combine the best parts of each startup file, which I will do in a later post. The most critical issue is, I guess, whether a C runtime initialization is required and, if so, which C runtime initialization function to call: __libc_init_array, _start or __main.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.