What is the GUID Partition Table (GPT)? I will demonstrate it on Linux.
Disclaimer: I am not an expert in this field. This is a basic overview, not a comprehensive one, and it may contain mistakes.
Warning: Some commands used in this article can badly harm your computer if used improperly (e.g. if you write to LBA 0 instead of reading from it), and you may lose data. If you would like to try these on your own, be careful not to write anything to the disk. The safest option is to install a fresh Linux on a VM and experiment there.
How does a computer boot? What happens before you see the login prompt on Linux? I am sure many people, including myself, do not know all the steps. A firmware loads the OS (Operating System): in the past (up to the 2000s) this was BIOS (Basic Input/Output System) or something compliant with it, and now (since the 2000s) it is its successor, UEFI (Unified Extensible Firmware Interface). For this, the firmware needs to know where the first executable code is on the storage media. This is exactly where the boot record and/or partition table enters the picture, and that is also why GPT (GUID Partition Table) is described in the UEFI specification.
If you are older than 30, like me, you probably remember the terms BIOS and MBR (Master Boot Record). I learned all about computers when BIOS and MBR were in use. Then they were replaced by UEFI and GPT, and I knew nothing about those for a long time. If you are in the same position, or younger and never learned about these, this article will help you.
In order to load the OS, its system files have to be found, accessed and read, and then execution has to be transferred to the OS. This becomes a chicken-and-egg scenario: you need to load a file, but the file is in a file system on a storage media, both of which require a driver to be loaded, and the driver is also a file.
The solution is to define something fixed that the firmware can depend on without thinking about individual files. The fixed convention is to start booting by loading data from the very start of the storage. This is the same for both BIOS and UEFI.
It is probably not very well known, so it is worth saying: in order to do things with the storage hardware, you send commands to it and it replies. So you can send a read or write command, together with a location, and it just does that (in a very complicated way). There are of course many other command types besides read and write. These are either ATA/ATAPI commands for SATA drives (specified in the ATA/ATAPI specs) or NVM commands for NVM Express drives (specified in the NVM Express spec).
We still need to know the size or length of the data to load. Here some hardware terms related to storage media enter the picture. In the past (before the 2000s), the storage layout followed C/H/S (Cylinder / Head / Sector). Basically you had to specify 3 numbers (C, H, S) to read something from or write something to the storage. At first these corresponded to the actual physical structure of the storage (e.g. floppy and hard drives). Later, once controllers were embedded into the storage devices (after the 80s), they became logical numbers that the device translated to physical ones before doing the actual work. Coming back to our question about the length of the data to be read: it is the size of one sector. So BIOS had to load 0/0/1 (Cylinder=0, Head=0, Sector=1). Cylinder and Head numbers start from 0, Sector numbers start from 1. The actual size (in bytes) of a sector depends on the storage media and can be queried from the hardware (e.g. with the ATA command Identify Device). A sector size of 512 bytes is a common value.
Instead of C/H/S, another system was developed in the 90s, and this is basically the system in use today in all computers. It is called Logical Block Addressing (LBA). LBA is simpler, as it uses a single number: the start of the storage media is LBA 0, then come LBA 1, LBA 2 and so on. Each individual number is said to address a Logical Block, whose size can also be queried from the hardware. 512 bytes is also a common Logical Block size.
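To see how the two addressing schemes relate, here is a small Python sketch of the commonly used conversion from a logical C/H/S triple to an LBA. The function name and the geometry defaults (16 heads, 63 sectors per track) are assumptions picked for illustration only; a real drive reports its own geometry.

```python
def chs_to_lba(c, h, s, heads_per_cylinder=16, sectors_per_track=63):
    """Convert a logical C/H/S address to an LBA.

    The geometry defaults are illustrative assumptions, not values
    read from any drive mentioned in this article.
    """
    if s < 1:
        raise ValueError("sector numbers start from 1")
    return (c * heads_per_cylinder + h) * sectors_per_track + (s - 1)


# The very first sector, C/H/S = 0/0/1, maps to LBA 0.
print(chs_to_lba(0, 0, 1))
```

The `s - 1` term is there because sector numbers start from 1 while cylinder, head and LBA numbers start from 0.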
You may wonder where LBA ends, or what maximum it can support. As initially specified in the IDE spec, it was a 22-bit number, so the maximum would be 2 GiB, which is very small by today’s standards. In the ATA-1 spec it was increased to 28 bits (max. 128 GiB), and then with the ATA-6 spec it was increased to 48 bits (max. 128 PiB).
A 128 PiB drive can only be fully used with GPT, because GPT stores 64-bit LBA values whereas MBR stores only 32-bit ones. So 2 TiB is the maximum partition size for MBR.
All these maximum values of course assume a sector / logical block size of 512 bytes. There are now hard drives with 4K sectors as well, which is called Advanced Format.
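The maximum sizes quoted above are simply 2^bits blocks multiplied by the block size. A minimal Python sketch of that arithmetic follows; the helper names are made up for this example.

```python
def max_capacity_bytes(lba_bits, block_size=512):
    """Largest addressable capacity for a given LBA width."""
    return (2 ** lba_bits) * block_size


def human(n):
    """Format a byte count using binary units (exact for powers of two)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


for bits in (22, 28, 32, 48):
    print(f"{bits}-bit LBA: {human(max_capacity_bytes(bits))}")
# 22-bit LBA: 2 GiB
# 28-bit LBA: 128 GiB
# 32-bit LBA: 2 TiB
# 48-bit LBA: 128 PiB
```

Passing `block_size=4096` shows how the limits grow with 4K blocks, e.g. 16 TiB for a 32-bit LBA.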
An NVM device may (if it supports this, not all do) offer several sector sizes / LBA formats. The format is chosen when the NVM device is formatted, and you can query which LBA formats the device supports.
Since I have an NVMe SSD, I realized it works quite differently from traditional SATA drives. NVM Express is a different standard, so it does not use the same commands or the same terminology. It requires a longer explanation, but very briefly (you need to install the nvme-cli package for the nvme commands):
    $ sudo nvme get-ns-id /dev/nvme0n1
    nvme0n1: namespace-id:1
/dev/nvme0n1 uses namespace id=1
    $ sudo nvme id-ns /dev/nvme0n1 -n 1
    NVME Identify Namespace 1:
    nsze : 0x3b9e12b0
    ncap : 0x3b9e12b0
    nuse : 0x12d5b870
    nsfeat : 0
    nlbaf : 0
    flbas : 0
    mc : 0
    dpc : 0
    dps : 0
    nmic : 0
    rescap : 0
    fpi : 0
    nawun : 0
    nawupf : 0
    nacwu : 0
    nabsn : 0
    nabo : 0
    nabspf : 0
    nvmcap : 0
    nguid : 00000000000000000000000000000000
    eui64 : 0025386b610046b6
    lbaf 0 : ms:0 ds:9 rp:0 (in use)
- nlbaf=0: Number of LBA Formats. This is a zero-based value, so 0 means there is one format, i.e. only LBA Format 0 is supported.
- flbas=0: Formatted LBA Size. Its lowest 4 bits give the index of the LBA Format in use, which is 0 here.
- lbaf 0 (at the bottom): LBA Format 0. Next to it, ds=9 (LBADS in the NVM spec) is the LBA Data Size as a power of two (2^ds), so here 2^9=512. That is the value we are looking for.
So to sum up, this SSD is formatted with LBA Format 0 (and also supports only LBA Format 0), and LBA Format 0 has a Logical Block size of 512 bytes.
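The arithmetic behind these fields can be written down in a few lines of Python. This is only a sketch: the values are copied by hand from the output above, the function names are made up, and it assumes the format index sits in the low four bits of flbas, as I read the NVM Express spec.

```python
def lba_format_index(flbas):
    # Assumption: the low 4 bits of FLBAS select the LBA format in use.
    return flbas & 0x0F


def logical_block_size(ds):
    # LBADS is an exponent: block size = 2^ds bytes.
    return 1 << ds


# Values copied from the "nvme id-ns" output above.
nsze = 0x3B9E12B0          # namespace size, in logical blocks
flbas = 0
lbaf_ds = {0: 9}           # "lbaf 0 : ms:0 ds:9"

index = lba_format_index(flbas)
block_size = logical_block_size(lbaf_ds[index])
print(block_size)          # 512
print(nsze)                # 1000215216 blocks
print(nsze * block_size)   # 512110190592 bytes
```

The last two numbers are the sector count and byte size of the whole drive, which other tools such as fdisk should report as well.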
Let's summarize now. In order to load the OS, we probably first need to query the storage media to learn its parameters (maybe not needed right now, but they will be needed anyway), and then request a read of LBA 0 from the storage media.
The data in LBA 0 is MBR and it can be:
- Legacy MBR or
- Protective MBR
The difference between the two is which partitions they contain.
- Legacy MBR contains max. 4 (primary/extended) partitions. If one of them is marked as an EFI System Partition (partition type 0xEF; you can think of this as a bootable flag), it is loaded by the UEFI firmware. A Legacy MBR may also contain no EFI System Partition at all and be a pure, now obsolete, MBR system. In that case there is boot code at the beginning of the MBR, which is loaded and executed. That is what BIOS was doing.
- Protective MBR contains (probably) only 1 partition, which spans the whole storage media and has the GPT Protective type 0xEE. I believe this is the current default in modern OSes. You will see an example of this in this article.
Now let's continue with real examples. All examples below are from a Linux desktop computer.
First question: what are my storage devices?
    $ lsblk
    NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    nvme0n1     259:0    0   477G  0 disk
    ├─nvme0n1p3 259:3    0  31.8G  0 part [SWAP]
    ├─nvme0n1p1 259:1    0   512M  0 part /boot/efi
    └─nvme0n1p2 259:2    0 444.7G  0 part /
nvme0n1 is the physical device (TYPE=disk), others are the partitions on it (TYPE=part). It starts with nvme because this is an NVMe SSD.
Let's also see the information from fdisk:
    $ sudo fdisk /dev/nvme0n1
    Welcome to fdisk (util-linux 2.27.1).
    Changes will remain in memory only, until you decide to write them.
    Be careful before using the write command.
    Command (m for help): p
    Disk /dev/nvme0n1: 477 GiB, 512110190592 bytes, 1000215216 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: gpt
    Disk identifier: D199E4F8-1D1C-4A56-BE5D-B279BB54EFA8
    Device             Start        End   Sectors   Size Type
    /dev/nvme0n1p1      2048    1050623   1048576   512M EFI System
    /dev/nvme0n1p2   1050624  933529599 932478976 444.7G Linux filesystem
    /dev/nvme0n1p3 933529600 1000214527  66684928  31.8G Linux swap
- Disk (nvme0n1) is 512110190592 bytes and 1000215216 sectors.
- Each sector / logical block is 512 bytes.
- There is a GPT on it (Disklabel type: gpt). I am actually not sure where fdisk derives this information from, as there is no single place where it is written. I think fdisk derives it from the Protective MBR on this drive, followed by the GPT data (the header at LBA 1 starts with the signature "EFI PART", as you will see later).
- There are 3 partitions, spanning from sector 2048 to sector 1000214527.
According to GPT spec.:
- LBA Block 0 is either Legacy MBR or Protective MBR
- LBA Block 1 is Primary GPT Header
- Last LBA Block is Backup GPT Header (a copy of Primary GPT Header)
Having a backup copy of the GPT Header and also of the GPT Partition Entries is a feature of GPT. The MBR only exists on LBA 0.
Let's look into these. First, let's dump their contents:
    sudo dd if=/dev/nvme0n1 bs=512 count=1 skip=0 status=none > lba.0
    sudo dd if=/dev/nvme0n1 bs=512 count=1 skip=1 status=none > lba.1
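If you prefer a script over dd, the same read can be sketched in Python. The function name `read_lba` is made up for this example, and the block size of 512 is an assumption that matches this drive; the file is opened read-only so nothing can be written by accident.

```python
def read_lba(path, lba, count=1, block_size=512):
    """Read `count` logical blocks starting at `lba` from a device or image.

    The file is opened in "rb" mode, so this never writes to the disk.
    """
    with open(path, "rb") as f:
        f.seek(lba * block_size)
        data = f.read(count * block_size)
    if len(data) != count * block_size:
        raise IOError(f"short read at LBA {lba}: got {len(data)} bytes")
    return data
```

For example, `read_lba("/dev/nvme0n1", 1)` would return the same 512 bytes as the second dd command above (it needs root privileges on a real device), and it works equally well on a disk image file.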
If you have an NVMe device, it is also possible to read data directly with the nvme-cli tools:
$ sudo nvme read /dev/nvme0n1 --start-block=0 --block-count=0 --data-size=512 --data=lba.0
This submits a Read command to the NVMe device, starting from LBA 0 for 1 block, and writes the data to the lba.0 file. It is strange, but block count is a 0-based value, so 0 means read one block.
If you provide block count 1 (so 2 blocks) and data size 512 (which is incorrect, it has to be 1024), the nvme app from the nvme-cli package in Ubuntu 16.04 crashes. The problem is fixed in the latest version on GitHub.
Looking into contents of lba.0, the MBR:
    $ cat lba.0 | lein run -m clojure.main/main -- src/antelabs/ext4/tools/dump-pmbr.clj
    Boot Code: 0x0 [len=440]
    Unique MBR Disk Signature: 0x5d3f5955 [len=4]
    Unknown: 0x0 [len=2]
    Partition Record:
    Boot Indicator: 0x0 [len=1]
    StartingCHS: 0x100 [len=3]
    OSType: 0xee [len=1] [GPT Protective]
    EndingCHS: 0xfeffff [len=3]
    StartingLBA: 1
    SizeInLBA: 1000215215
    Boot Indicator: 0x0 [len=1]
    StartingCHS: 0x0 [len=3]
    OSType: 0x0 [len=1] [Unknown Type]
    EndingCHS: 0x0 [len=3]
    StartingLBA: 0
    SizeInLBA: 0
    Boot Indicator: 0x0 [len=1]
    StartingCHS: 0x0 [len=3]
    OSType: 0x0 [len=1] [Unknown Type]
    EndingCHS: 0x0 [len=3]
    StartingLBA: 0
    SizeInLBA: 0
    Boot Indicator: 0x0 [len=1]
    StartingCHS: 0x0 [len=3]
    OSType: 0x0 [len=1] [Unknown Type]
    EndingCHS: 0x0 [len=3]
    StartingLBA: 0
    SizeInLBA: 0
    Signature: 0xaa55 [len=2]
    Remaning of Block Reserved as Zero: 0x0 [len=0]
I could not find a tool that parses the raw MBR and shows the raw information, so I wrote the one above by following the UEFI Spec (which also describes the MBR format).
This is a Protective MBR, as it contains a single partition, starting at LBA 1 and ending at LBA 1000215215, which covers the entire drive and is marked as GPT Protective (OSType=0xEE).
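For readers without a Clojure setup, the same fields can be pulled out with Python's struct module. This is not the tool used above, only a minimal sketch that follows the field lengths shown in its output (440 + 4 + 2 bytes, four 16-byte partition records, 2-byte signature); the function names are made up.

```python
import struct


def parse_mbr(block):
    """Parse the MBR fields from the first 512 bytes of a disk."""
    if len(block) < 512:
        raise ValueError("need at least 512 bytes")
    (signature,) = struct.unpack_from("<H", block, 510)
    if signature != 0xAA55:
        raise ValueError("missing 0xAA55 signature")
    (disk_signature,) = struct.unpack_from("<I", block, 440)
    partitions = []
    for i in range(4):
        # 1 + 3 + 1 + 3 + 4 + 4 = 16 bytes per partition record
        boot, start_chs, os_type, end_chs, start_lba, size = struct.unpack_from(
            "<B3sB3sII", block, 446 + 16 * i
        )
        partitions.append(
            {
                "boot_indicator": boot,
                "os_type": os_type,
                "starting_lba": start_lba,
                "size_in_lba": size,
            }
        )
    return {"disk_signature": disk_signature, "partitions": partitions}


def is_protective(mbr):
    """True if the only used record is a 0xEE one starting at LBA 1."""
    used = [p for p in mbr["partitions"] if p["os_type"] != 0]
    return len(used) == 1 and used[0]["os_type"] == 0xEE and used[0]["starting_lba"] == 1
```

With the dump from above, `is_protective(parse_mbr(open("lba.0", "rb").read()))` should return True for this drive.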
Looking into contents of lba.1, which contains the Primary GPT Header:
    $ cat lba.1 | lein run -m clojure.main/main -- src/antelabs/ext4/tools/dump-gpt-header.clj
    Signature: 0x4546492050415254 [len=8] [EFI PART]
    Revision: 0x100 [len=4] [1.0]
    HeaderSize: 92
    HeaderCRC32: 0x711f61b9 [len=4]
    Reserved as Zero: 0x0 [len=4]
    MyLBA: 1
    AlternateLBA: 1000215215
    FirstUsableLBA: 34
    LastUsableLBA: 1000215182
    DiskGUID: d199e4f8-1d1c-4a56-be5d-b279bb54efa8
    PartitionEntryLBA: 2
    NumberOfPartitionEntries: 128
    SizeOfPartitionEntry: 128
    PartitionEntryArrayCRC32: 0x652340cf [len=4]
    Remaning Block Reserved as 0x0 [len=420]
I also could not find a tool that parses the raw GPT structures and shows the raw information, so I wrote the one above looking at UEFI Spec.
The header says:
- This header is on LBA 1 (MyLBA)
- Its copy is on LBA 1000215215 (AlternateLBA)
- The first usable LBA is 34 (FirstUsableLBA)
- The last usable LBA is 1000215182 (LastUsableLBA)
- Partition Entries start on LBA 2 (PartitionEntryLBA)
- There are 128 Partition Entries (NumberOfPartitionEntries), each having size of 128 bytes (SizeOfPartitionEntry)
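The 92-byte header maps nicely onto a single struct format string. The Python sketch below is not the tool used above; the names `GPT_HEADER`, `FIELDS` and `parse_gpt_header` are made up, and the field order simply follows the dump shown earlier.

```python
import struct
import uuid

# 8 + 4*4 + 8*4 + 16 + 8 + 4*3 = 92 bytes, all little-endian
GPT_HEADER = struct.Struct("<8sIIIIQQQQ16sQIII")
FIELDS = (
    "signature", "revision", "header_size", "header_crc32", "reserved",
    "my_lba", "alternate_lba", "first_usable_lba", "last_usable_lba",
    "disk_guid", "partition_entry_lba", "number_of_partition_entries",
    "size_of_partition_entry", "partition_entry_array_crc32",
)


def parse_gpt_header(block):
    """Parse a GPT header from the start of a logical block."""
    header = dict(zip(FIELDS, GPT_HEADER.unpack_from(block, 0)))
    if header["signature"] != b"EFI PART":
        raise ValueError("not a GPT header")
    # GUIDs are stored with their first three fields little-endian.
    header["disk_guid"] = str(uuid.UUID(bytes_le=header["disk_guid"]))
    return header
```

For example, `parse_gpt_header(open("lba.1", "rb").read())["first_usable_lba"]` should give 34 for the dump above.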
The reason for the first usable LBA being 34 is simple. LBA 0 is the Protective MBR (PMBR), LBA 1 is the GPT Header, and the space required for the partition entries is:
128 partitions * 128 bytes/partition / 512 bytes/sector = 32 sectors
So 1 + 1 + 32 = 34 sectors are needed to store all GPT information, and the first usable LBA is at minimum 34. As you might realize, these numbers change if the logical sector size is not 512 bytes.
A logical sector size of 4K is possible now, on drives of the so-called Advanced Format generation. For example, the [WD Red Hard Drive Specs](http://products.wdc.com/library/SpecSheet/ENG/2879-800002.pdf) list Advanced Format as supported. As an example, let's repeat the calculation above for a logical block size of 4K. LBA 0 is the MBR and LBA 1 is the GPT Header. 128 partitions * 128 bytes/partition / 4096 bytes/block = 4 blocks, so the required blocks are 1 + 1 + 4 = 6. In this case FirstUsableLBA would be 6 instead of 34.
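The same calculation as a tiny Python function, in case you want to try other block sizes. The function name is made up, and the defaults of 128 entries of 128 bytes each are the values from the header above.

```python
def first_usable_lba(block_size=512, entries=128, entry_size=128):
    """Smallest possible FirstUsableLBA: PMBR + GPT header + entry array."""
    array_bytes = entries * entry_size
    # Round up in case the array does not fill its last block completely.
    array_blocks = (array_bytes + block_size - 1) // block_size
    return 1 + 1 + array_blocks


print(first_usable_lba(512))   # 34
print(first_usable_lba(4096))  # 6
```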
Let's also dump the backup copy of the GPT header at the last block:
$ sudo dd if=/dev/nvme0n1 bs=512 count=1 skip=1000215215 status=none > lba.last
Looking into contents of lba.last:
    $ cat lba.last | lein run -m clojure.main/main -- src/antelabs/ext4/tools/dump-gpt-header.clj
    Signature: 0x4546492050415254 [len=8] [EFI PART]
    Revision: 0x100 [len=4] [1.0]
    HeaderSize: 92
    HeaderCRC32: 0x2bec605e [len=4]
    Reserved as 0x0 [len=4]
    MyLBA: 1000215215
    AlternateLBA: 1
    FirstUsableLBA: 34
    LastUsableLBA: 1000215182
    DiskGUID: d199e4f8-1d1c-4a56-be5d-b279bb54efa8
    PartitionEntryLBA: 1000215183
    NumberOfPartitionEntries: 128
    SizeOfPartitionEntry: 128
    PartitionEntryArrayCRC32: 0x652340cf [len=4]
    Remaning Block Reserved as 0x0 [len=420]
It is basically the same, but of course MyLBA and AlternateLBA are swapped, PartitionEntryLBA points to the backup partition entries (LBA 1000215183) and HeaderCRC32 is different. Not only the header but also the partition entries are backed up like this.
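The positions at the end of the disk follow from the total number of blocks. A small Python sketch, assuming the backup entry array sits directly before the backup header (which is what the dump above shows for this drive); the function name is made up.

```python
def backup_layout(total_blocks, block_size=512, entries=128, entry_size=128):
    """Return (backup header LBA, backup entries LBA, last usable LBA)."""
    array_blocks = (entries * entry_size + block_size - 1) // block_size
    backup_header_lba = total_blocks - 1            # the very last block
    backup_entries_lba = backup_header_lba - array_blocks
    last_usable_lba = backup_entries_lba - 1
    return backup_header_lba, backup_entries_lba, last_usable_lba


# 1000215216 is the sector count reported by fdisk for this drive.
print(backup_layout(1000215216))
# (1000215215, 1000215183, 1000215182)
```

These three numbers match MyLBA, PartitionEntryLBA and LastUsableLBA in the backup header.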
As you see, there are also CRC32 error-detecting codes on the GPT data. This is also a feature of GPT; MBR has no error detection mechanism.
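Checking these CRCs yourself is straightforward, because Python's zlib.crc32 computes the same standard CRC-32. The sketch below assumes the header CRC is calculated over the first HeaderSize bytes with the HeaderCRC32 field itself set to zero, which is how I understand the UEFI Spec; the function names are made up.

```python
import struct
import zlib


def header_crc_ok(block):
    """Verify HeaderCRC32 of a GPT header block."""
    header_size, stored_crc = struct.unpack_from("<II", block, 12)
    if not 92 <= header_size <= len(block):
        return False
    header = bytearray(block[:header_size])
    header[16:20] = b"\x00\x00\x00\x00"   # zero the CRC field before hashing
    return zlib.crc32(bytes(header)) & 0xFFFFFFFF == stored_crc


def entries_crc_ok(entries, header_block):
    """Verify PartitionEntryArrayCRC32 against the raw entry array."""
    count, size, stored_crc = struct.unpack_from("<III", header_block, 80)
    return zlib.crc32(bytes(entries[: count * size])) & 0xFFFFFFFF == stored_crc
```

Here `block` would be the content of lba.1, and `entries` the 32 blocks (128 * 128 bytes) starting at PartitionEntryLBA.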
The figure above, from the original UEFI Spec, shows the structure quite nicely. So basically the usable space for the OS lies between the first usable block and the last usable block.
Now let's look at the GPT Partition Table entries. PartitionEntryLBA is 2, so the first partition entries are at LBA 2:
$ sudo dd if=/dev/nvme0n1 bs=512 count=1 skip=2 status=none > lba.2
Now let's look at the first partition entry:
    $ dd if=lba.2 bs=128 count=1 skip=0 status=none | lein run -m clojure.main/main -- src/antelabs/ext4/tools/dump-gpt-partition-entry.clj
    PartitionTypeGUID: c12a7328-f81f-11d2-ba4b-00a0c93ec93b [Type=EFI System Partition]
    UniquePartitionGUID: c08a4141-302f-4e69-8af1-c0c83cc3ab9e
    StartingLBA: 2048
    EndingLBA: 1050623
    Attributes:
    PartitionName: 0x4500460049002000530079007300.. [EFI System Partition]
    Remaning Partition Entry Reserved as Zero: 0x0
We are taking the first 128 bytes of LBA 2. The second and third partition entries are also as expected:
    $ dd if=lba.2 bs=128 count=1 skip=1 status=none | lein run -m clojure.main/main -- src/antelabs/ext4/tools/dump-gpt-partition-entry.clj
    PartitionTypeGUID: 0fc63daf-8483-4772-8e79-3d69d8477de4
    UniquePartitionGUID: c02b77ca-9ac3-4ad7-827e-dd51b91f34c2
    StartingLBA: 1050624
    EndingLBA: 933529599
    Attributes:
    PartitionName: 0x0
    Remaning Partition Entry Reserved as Zero: 0x0
    $ dd if=lba.2 bs=128 count=1 skip=2 status=none | lein run -m clojure.main/main -- src/antelabs/ext4/tools/dump-gpt-partition-entry.clj
    PartitionTypeGUID: 0657fd6d-a4ab-43c4-84e5-0933c84b4f4f
    UniquePartitionGUID: bff8c1c6-7ba6-4328-b1eb-40f3aaecf1a3
    StartingLBA: 933529600
    EndingLBA: 1000214527
    Attributes:
    PartitionName: 0x0
    Remaning Partition Entry Reserved as Zero: 0x0
These match the fdisk output, as expected. If we look at the 4th entry:
    $ dd if=lba.2 bs=128 count=1 skip=3 status=none | lein run -m clojure.main/main -- src/antelabs/ext4/tools/dump-gpt-partition-entry.clj
    PartitionTypeGUID: 00000000-0000-0000-0000-000000000000 [Type=Unused Entry]
    UniquePartitionGUID: 00000000-0000-0000-0000-000000000000
    StartingLBA: 0
    EndingLBA: 0
    Attributes:
    PartitionName: 0x0
    Remaning Partition Entry Reserved as Zero: 0x0
The GUID with all zeroes means this partition entry is not used. The same holds for all the remaining (128–4=124) entries. The space for partition entries is preallocated (max. 128 partitions) and in this case only 3 are used.
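A single 128-byte entry can be decoded in Python along the same lines. Again this is only a sketch and not the tool used above: the names are made up, and the small type table contains only the GUIDs that appear in this article, with names taken from the outputs shown here.

```python
import struct
import uuid

# 16 + 16 + 8 + 8 + 8 + 72 = 128 bytes
ENTRY = struct.Struct("<16s16sQQQ72s")

KNOWN_TYPES = {
    "00000000-0000-0000-0000-000000000000": "Unused Entry",
    "c12a7328-f81f-11d2-ba4b-00a0c93ec93b": "EFI System Partition",
    "0fc63daf-8483-4772-8e79-3d69d8477de4": "Linux filesystem",
    "0657fd6d-a4ab-43c4-84e5-0933c84b4f4f": "Linux swap",
}


def parse_partition_entry(raw):
    """Decode one GPT partition entry (the first 128 bytes of `raw`)."""
    type_guid, unique_guid, start, end, attrs, name = ENTRY.unpack_from(raw, 0)
    type_str = str(uuid.UUID(bytes_le=type_guid))
    return {
        "type_guid": type_str,
        "type_name": KNOWN_TYPES.get(type_str, "Unknown"),
        "unique_guid": str(uuid.UUID(bytes_le=unique_guid)),
        "starting_lba": start,
        "ending_lba": end,
        "attributes": attrs,
        # The name is UTF-16LE, padded with zeroes.
        "name": name.decode("utf-16-le").split("\x00", 1)[0],
    }
```

Calling it on each 128-byte slice of lba.2 gives the four entries shown above, and any entry whose type_name is "Unused Entry" can simply be skipped.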
Now you understand why it is called GUID Partition Table: partitions are identified by a GUID. Also, each partition has a name defined in the partition table. These are features of GPT and do not exist in MBR.
To combine this with the Protective MBR, we have this structure, again shown nicely in the original UEFI Spec:
You may ask what happens if the storage is larger than what the MBR can address. In this case the GPT Protective partition covers only up to the maximum size that can be addressed, but the GPT Partition Entries, as expected, can use the whole storage.
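In code, this capping is a one-liner. The sketch below assumes the rule as I understand it from the UEFI Spec: SizeInLBA is the disk size minus one block, or 0xFFFFFFFF if that does not fit into the 32-bit field. The function name is made up.

```python
def pmbr_size_in_lba(total_blocks):
    """SizeInLBA for the protective partition, capped at the 32-bit maximum."""
    return min(total_blocks - 1, 0xFFFFFFFF)


print(pmbr_size_in_lba(1000215216))  # 1000215215, as in the PMBR dump above
print(hex(pmbr_size_in_lba(2**33)))  # 0xffffffff, a 4 TiB disk with 512-byte blocks
```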
You may also ask why the first partition starts from LBA 2048 and not from LBA 34, even though 34 is the first usable LBA indicated in the GPT header. I think the reason is to align partition boundaries. Quoting directly from the GPT spec:
“To avoid the need to determine the physical block size and the optimal transfer length granularity, software may align GPT partitions at significantly larger boundaries. For example, assuming logical block 0 is aligned, it may use LBAs that are multiples of 2,048 to align to 1,048,576 byte (1 MiB) boundaries, which supports most common physical block sizes and RAID stripe sizes.”
fdisk on Linux was updated in version 2.17.1 to use 1 MiB partition boundaries. So by default the first partition starts at sector 2048 (512 bytes * 2048 = 1 MiB), and any partition sized in multiples of 1 MiB is aligned to 1 MiB boundaries.
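You can check the alignment of the three partitions on this drive with a few lines of Python. The start sectors are copied from the fdisk output earlier in the article; the function name is made up.

```python
def is_aligned(start_lba, block_size=512, boundary=1024 * 1024):
    """True if the partition's first byte falls on a `boundary` multiple."""
    return (start_lba * block_size) % boundary == 0


# Start sectors from the fdisk output earlier in this article.
starts = {"nvme0n1p1": 2048, "nvme0n1p2": 1050624, "nvme0n1p3": 933529600}
for name, lba in starts.items():
    print(name, is_aligned(lba))
# nvme0n1p1 True
# nvme0n1p2 True
# nvme0n1p3 True
```

For comparison, `is_aligned(34)` returns False, which is why LBA 34 is not used as the start of the first partition.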
I plan to write about ext4, NVMe or UEFI next.