Bare Metal Raspberry Pi 3B+: Programming

May 13, 2019

Introduction

We will write a few bare-metal RPi programs in this post. I will use both Network Boot and JTAG, so I recommend reading the previous posts first if you have not already.

Prerequisites

  • Raspberry Pi 3 Model B+
  • A Linux computer; I am using my desktop computer running Ubuntu 18.04
  • USB Console Cable, between RPi and the computer
  • JTAG-USB Cable, between RPi and the computer

AArch64 Compiler

I am using the GNU Toolchain for the AArch64 ELF bare-metal target (aarch64-elf). The Makefiles in this post invoke its tools with the aarch64-none-elf- prefix.

Example 1 - infinite loop

https://github.com/metebalci/baremetal-rpi/tree/master/01-infinite-loop

The simplest example is an infinite loop. It is enough to teach us the basics, which I summarized in the previous JTAG post.

Bare-metal examples usually come with a linker script or memory map, but the simplest example does not need one. All we need is an assembled binary file. Since the toolchain (compiler and assembler) only produces ELF output (object or executable), we have to use objcopy to extract the raw machine code. You could also write the machine code bytes to a file by hand; it is a single instruction, 4 bytes.

infloop.s is:

l: b l

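l above is a label and b is the branch instruction. The branch is encoded relative to the address of the instruction itself, so the machine code does not depend on where it is loaded.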
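To make the relative encoding concrete, here is a minimal sketch in C of how the A64 B instruction is encoded. The function names encode_b and store_le32 are mine, not part of the repository. A branch to itself has an offset of zero, which gives the single word that infloop.bin consists of.

```c
#include <stdint.h>

/* Encode the A64 unconditional branch "B <label>".
 * byte_offset is the distance from the branch instruction to its target
 * in bytes; it must be a multiple of 4.
 * Layout: bits [31:26] = 0b000101, bits [25:0] = byte_offset / 4 (signed). */
static uint32_t encode_b(int32_t byte_offset)
{
    uint32_t imm26 = ((uint32_t)byte_offset >> 2) & 0x03FFFFFFu;
    return 0x14000000u | imm26;
}

/* Store an instruction word in little-endian byte order, which is how it
 * appears in the raw binary file. */
static void store_le32(uint32_t word, uint8_t out[4])
{
    out[0] = (uint8_t)(word & 0xFFu);
    out[1] = (uint8_t)((word >> 8) & 0xFFu);
    out[2] = (uint8_t)((word >> 16) & 0xFFu);
    out[3] = (uint8_t)((word >> 24) & 0xFFu);
}
```

encode_b(0) is 0x14000000, so the file contains the bytes 00 00 00 14. A backward branch of one instruction, encode_b(-4), is 0x17FFFFFF.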

The Makefile looks like this:

APP = infloop
all:
	rm -rf $(APP).elf $(APP).bin
	aarch64-none-elf-as -o $(APP).elf $(APP).s
	aarch64-none-elf-objcopy -O binary $(APP).elf $(APP).bin

It produces an ELF object, and objcopy then extracts the code from the ELF object into a raw binary file.

objdump can disassemble a raw binary file only if it is told the format and architecture explicitly (for example with -D -b binary -m aarch64), so I use my simple capstone-based A64 disassembler, a64dis, instead.

$ a64dis infloop.bin

0x80000:	b		#0x80000

infloop.bin itself knows nothing about 0x80000. If we look at a64dis.c, this is a base address I defined, because the Raspberry Pi 3B+ firmware loads kernel8.img at 0x80000 by default.
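The role of the base address can be shown with a small decoder. This is only a sketch of the arithmetic a disassembler performs for B, not the code of a64dis; the function name b_target is hypothetical.

```c
#include <stdint.h>

/* Return the absolute target of a "B" instruction word located at address pc.
 * The word stores a signed 26-bit offset counted in 4-byte instructions,
 * so the target is pc + offset * 4. */
static uint64_t b_target(uint32_t insn, uint64_t pc)
{
    int64_t imm26 = (int64_t)(insn & 0x03FFFFFFu);
    if (imm26 & 0x02000000)      /* sign bit of the 26-bit field */
        imm26 -= 0x04000000;     /* sign-extend to 64 bits */
    return pc + (uint64_t)(imm26 * 4);
}
```

The same word 0x14000000 decodes to a branch to 0x80000 when the assumed base is 0x80000, and to 0x81000 when the base is 0x81000. The printed address comes from the disassembler's base, not from the file.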

Example 2 - counter

https://github.com/metebalci/baremetal-rpi/tree/master/02-counter

This example is a simple counter: it uses the first 64-bit register, x0, starts from 0, and increments by 1 continuously.

mov x0, xzr
l:
add x0, x0, 1
b l

Instead of rebooting the board, we can use the load_image command in openocd to load this binary (counter.bin) into the board's memory and then jump to it. Very convenient.

I am not showing the openocd command that connects to the board through JTAG in this post; I will only show the telnet commands.

$ telnet localhost 4444

> help load_image
load_image filename address ['bin'|'ihex'|'elf'|'s19'] [min_address] [max_length]

> load_image 02-counter/counter.bin 0x81000 bin
12 bytes written at address 0x00081000
downloaded 12 bytes in 0.002915s (4.020 KiB/s)

The path is relative to where we started openocd.

When we resume execution from 0x81000, x0 is first reset to zero and then starts increasing very rapidly.

> resume 0x81000
> halt
target halted in AArch64 state due to debug-request, current mode: EL2H
cpsr: 0x000003c9 pc: 0x81004
MMU: disabled, D-Cache: disabled, I-Cache: disabled
> reg x0
x0 (/64): 0x0000000002AFB8D6

We need to halt the core in order to access registers; otherwise we cannot read a correct x0 value.

The other option is to use the step command.

> step 0x81000
target halted in AArch64 state due to single-step, current mode: EL2H
cpsr: 0x000003c9 pc: 0x81004
MMU: disabled, D-Cache: disabled, I-Cache: disabled
> reg x0
x0 (/64): 0x0000000000000000
> step
target halted ...
> reg x0
x0 (/64): 0x0000000000000001
> step
target halted ...
> reg x0
x0 (/64): 0x0000000000000001
> step
target halted ...
> reg x0
x0 (/64): 0x0000000000000002
> step
target halted ...
> reg x0
x0 (/64): 0x0000000000000002
> step
target halted ...
> reg x0
x0 (/64): 0x0000000000000003

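So step does exactly what its name says: it steps through the code one instruction at a time. Each x0 value appears twice in a row because every other step executes the branch, which does not change x0.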
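The step sequence above can be reproduced with a toy model of the three-instruction program. This is only an illustration written for this post: the struct cpu type and the step function are hypothetical and model nothing beyond pc and x0.

```c
#include <stdint.h>

#define BASE 0x81000u   /* load address used with load_image above */

struct cpu {
    uint64_t pc;
    uint64_t x0;
};

/* Execute exactly one instruction of the counter program. */
static void step(struct cpu *c)
{
    switch (c->pc - BASE) {
    case 0:  c->x0 = 0;  c->pc += 4;      break;  /* mov x0, xzr   */
    case 4:  c->x0 += 1; c->pc += 4;      break;  /* add x0, x0, 1 */
    case 8:  c->pc = BASE + 4;            break;  /* b l           */
    default:                              break;  /* outside the program: ignored in this model */
    }
}
```

Starting at pc = 0x81000 and stepping six times leaves x0 at 0, 1, 1, 2, 2, 3, the same values the reg x0 commands print in the session.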

Example 3 - cryptography extension

I wanted to make a simple example using the SHA instructions of the ARM core. However, the processor of the RPi 3B+ (BCM2837) does not support the sha1, sha2 and aes instructions. How can we check this?

According to the ARM Cortex-A53 Cryptography Extension Technical Reference Manual, we can check the ID_AA64ISAR0_EL1 system register.

The aarch64 mrs command below is not currently available in the openocd distribution. I submitted it as a patch, and if it is merged you will see it in a future release.

> aarch64 mrs ID_AA64ISAR0_EL1

ID_AA64ISAR0_EL1 [10_000_0000_0100_000]: 0x0000000000010000
.RNDR =0x0 =0b0000
.TLB =0x0 =0b0000
.TS =0x0 =0b0000
.FHM =0x0 =0b0000
.DP =0x0 =0b0000
.SM4 =0x0 =0b0000
.SM3 =0x0 =0b0000
.SHA3 =0x0 =0b0000
.RDM =0x0 =0b0000
.Atomic =0x0 =0b0000
.CRC32 =0x1 =0b0001
.SHA2 =0x0 =0b0000
.SHA1 =0x0 =0b0000
.AES =0x0 =0b0000

The SHA2, SHA1 and AES fields are 0x0, meaning these instructions are not implemented. The CRC32 field, on the other hand, is 0x1, so the CRC32 instructions are implemented.
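The same decoding can be done by hand, since each field is 4 bits wide. Below is a minimal sketch in C; the enum and the function name isar0_get are my own, and the bit positions (AES at bit 4, SHA1 at 8, SHA2 at 12, CRC32 at 16) follow the register layout in the ARMv8-A architecture manual.

```c
#include <stdint.h>

/* Least significant bit of some 4-bit fields in ID_AA64ISAR0_EL1. */
enum isar0_field {
    ISAR0_AES   = 4,
    ISAR0_SHA1  = 8,
    ISAR0_SHA2  = 12,
    ISAR0_CRC32 = 16
};

/* Extract one 4-bit field from a raw ID_AA64ISAR0_EL1 value. */
static unsigned isar0_get(uint64_t reg, enum isar0_field lsb)
{
    return (unsigned)((reg >> lsb) & 0xFu);
}
```

With the value 0x0000000000010000 read above, isar0_get returns 0 for AES, SHA1 and SHA2, and 1 for CRC32. On the target itself, the raw value could be read with GCC inline assembly such as __asm__ volatile("mrs %0, ID_AA64ISAR0_EL1" : "=r"(v)); I have left that out of the block so that it also compiles on a desktop.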

Example 4 - uart - hello world

https://github.com/metebalci/baremetal-rpi/tree/master/04-uart-hello-world

The last example prints something to the console through the UART, using the serial console cable that we connected from the RPi to the desktop.

First, we need to set up the UART, and then write a byte (char) to it whenever needed.

According to the BCM2837 Peripherals document, the I/O base is at 0x3F000000 in the physical address space. The MMU is not set up yet, so we can use that address directly.
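The peripherals document lists register addresses as bus addresses starting at 0x7E000000, so they have to be translated to the physical base before use. Here is a small sketch of that translation; the macro and function names are mine, and the example registers (the mini UART data register at bus address 0x7E215040, the GPIO block at 0x7E200000) are the ones I assume are relevant here.

```c
#include <stdint.h>

#define BUS_IO_BASE  0x7E000000u   /* peripheral base as printed in the datasheet */
#define PHYS_IO_BASE 0x3F000000u   /* peripheral base seen by the ARM cores on BCM2837 */

/* Translate a peripheral bus address from the datasheet to the physical
 * address the CPU must use while the MMU is off. */
static uint32_t bus_to_phys(uint32_t bus_addr)
{
    return bus_addr - BUS_IO_BASE + PHYS_IO_BASE;
}
```

For example, bus_to_phys(0x7E215040) gives 0x3F215040 and bus_to_phys(0x7E200000) gives 0x3F200000.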

At this point, writing programs in assembly is becoming complicated, so it is better to move to C. However, we cannot use normal hosted C, because there is no operating system and therefore no standard C library. The compiler CFLAGS and linker LDFLAGS to achieve this are:

CFLAGS = -Wall -O2 -ffreestanding
LDFLAGS = -nostdlib -T link.ld

With -ffreestanding, only the headers float.h, limits.h, stdarg.h, stddef.h, iso646.h, stdbool.h, stdint.h, stdalign.h and stdnoreturn.h can be used. -nostdlib does what it says: it links without the standard startup files and libraries.

link.ld contains:

ENTRY(main)
  
SECTIONS
{
        . = 0x80000;

        __start = .;

        __text_start = .;
        .text :
        {
                *(.text.startup)
                *(.text)
        }
        . = ALIGN(4096);
        __text_end = .;

        __rodata_start = .;
        .rodata :
        {
                *(.rodata)
        }
        . = ALIGN(4096);
        __rodata_end = .;

        __data_start = .;
        .data :
        {
                *(.data)
        }
        . = ALIGN(4096);
        __data_end = .;

        __bss_start = .;
        .bss :
        {
                bss = .;
                *(.bss)
        }
        . = ALIGN(4096);
        __bss_end = .;

        __end = .;

        /DISCARD/ :
        {
                *(.comment)
        }
}

This script places the code of the .text.startup section (which contains the main function) at 0x80000, then lays out the other sections one after another, each padded to a 4096-byte boundary, and discards the .comment section.
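The symbols defined in the script, such as __bss_start and __bss_end, are visible from C. One common use, which I am only sketching here and which the example source may or may not do, is clearing .bss at startup, because nothing else in this setup is described as doing it. The helper name zero_range is hypothetical, and it uses only freestanding headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Set every byte in [start, end) to zero and return the number of bytes
 * cleared. The pointers are volatile so that the loop stays a plain loop
 * and is not turned into a call to memset, which does not exist here. */
static size_t zero_range(volatile uint8_t *start, volatile uint8_t *end)
{
    size_t n = 0;
    while (start < end) {
        *start++ = 0;
        n++;
    }
    return n;
}
```

On the target, the linker symbols could be declared as extern uint8_t __bss_start[], __bss_end[]; and passed as zero_range(__bss_start, __bss_end) before any zero-initialized global is used.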

The source code first sets up the UART (and configures the GPIO pins for their UART function) and then sends ‘Hello World!’ character by character. The output can be seen on the desktop through the USB console cable.
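The character-by-character sending can be sketched as a polled transmit loop. This is not the repository's code: the struct uart type and function names are hypothetical, and I am assuming the mini UART, where bit 5 of AUX_MU_LSR_REG means the transmitter can accept a byte and AUX_MU_IO_REG takes the data. A PL011-based version would use different registers and flag bits. The registers are passed in as pointers, so the same code can be tried on a desktop with ordinary variables.

```c
#include <stddef.h>
#include <stdint.h>

#define TX_READY (1u << 5)   /* assumed: mini UART LSR bit 5, transmitter can accept a byte */

struct uart {
    volatile uint32_t *status;   /* assumed: AUX_MU_LSR_REG */
    volatile uint32_t *data;     /* assumed: AUX_MU_IO_REG  */
};

/* Wait until the transmitter has room, then write one byte. */
static void uart_putc(const struct uart *u, char c)
{
    while ((*u->status & TX_READY) == 0) {
        /* busy-wait */
    }
    *u->data = (uint32_t)(unsigned char)c;
}

/* Send a NUL-terminated string byte by byte; return the number of bytes sent. */
static size_t uart_puts(const struct uart *u, const char *s)
{
    size_t n = 0;
    while (*s != '\0') {
        uart_putc(u, *s++);
        n++;
    }
    return n;
}
```

On the board, status and data would point at the physical addresses of the two registers, and the UART and GPIO setup described above has to be done first.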

Final Notes

With this setup, anything on bare metal can be programmed easily. We can reboot the Raspberry Pi over the network, use JTAG for hardware debugging and for loading images without rebooting, and use the AArch64 toolchain to build programs that target bare-metal execution.