A Quick Tour of NVM Express (NVMe)
Aug 24, 2017

Introduction

I will very briefly describe NVM Express (NVMe) with examples in Linux.

Disclaimer: I am not an expert in this field, and this is a basic overview, not a comprehensive one. If you try the commands used in this article on your own, please be extra careful not to write anything back to storage.

History

If you had a PC in the 90s, you will remember that the hard drive was connected to the mainboard through a wide (40 or 80 wires) Parallel ATA (PATA) cable based on the IDE (Integrated Drive Electronics) and ATA (AT Attachment; AT is short for Advanced Technology, from the IBM PC/AT era) standards. A few years later, the ATAPI (ATA Packet Interface) standard was developed alongside ATA to support devices like CD-ROMs on the same interface. In the following years, the ATA standard evolved to include Enhanced IDE (EIDE), Ultra DMA (UDMA) and Ultra ATA.

After 2000, Serial ATA (SATA) was developed and replaced PATA. SATA uses a smaller cable (only 7 wires) and has faster data transfer speeds. Meanwhile, the AHCI (Advanced Host Controller Interface) standard was developed to communicate with SATA controllers.

Form Factor vs. Interface vs. Protocol

Form Factor means the shape and the size of a device. Common form factors for storage devices are:

  • 2.5” or 3.5” Drive (defined in SFF standards)
  • M.2 and PCI Express (PCIe) (defined in PCI-SIG standards)

Interface means how the device communicates with the computer. Common interfaces for storage devices are:

  • SATA interface, used by 2.5” and 3.5” hard drives and some M.2 attached devices.
  • PCI Express (PCIe) interface, used by M.2 attached and PCIe devices.
  • SAS (Serial Attached SCSI) and FC (Fibre Channel) interfaces, which are used mostly in servers and data centers.

PCIe is much faster than SATA. SATA III has a maximum line rate of 6 Gb/s, whereas an M.2 connector using 4 PCIe 3.0 lanes has a maximum speed of almost 4 GB/s (about 32 Gb/s).
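These numbers can be checked with a little arithmetic. The following is a rough Python sketch. It assumes the usual line encodings: 8b/10b for SATA III, and 128b/130b for PCIe 3.0 at 8 GT/s per lane. It ignores protocol overhead, so real drives reach somewhat less.

```python
# Rough link-bandwidth arithmetic: raw line rate times encoding efficiency.
# Protocol overhead (packet headers, flow control) is ignored, so these are
# upper bounds rather than what a real drive will deliver.


def effective_gbps(line_rate_gbps, payload_bits, total_bits, lanes=1):
    """Usable Gb/s after line encoding, summed over all lanes."""
    return line_rate_gbps * payload_bits / total_bits * lanes


sata3 = effective_gbps(6.0, 8, 10)            # SATA III: 6 Gb/s, 8b/10b encoding
pcie3_x4 = effective_gbps(8.0, 128, 130, 4)   # PCIe 3.0: 8 GT/s per lane, 128b/130b, 4 lanes

print(f"SATA III : {sata3:.2f} Gb/s = {sata3 / 8 * 1000:.0f} MB/s")
print(f"PCIe3 x4 : {pcie3_x4:.2f} Gb/s = {pcie3_x4 / 8:.2f} GB/s")
```

This prints about 600 MB/s for SATA III and about 3.94 GB/s for PCIe 3.0 x4. That is the "almost 4 GB/s" mentioned above.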

Protocol specifies how to manage and transfer data from/to device. Common protocols are:

  • AHCI and ATA for the SATA interface. AHCI is the interface to the SATA host controller, and it exposes additional SATA features such as Native Command Queuing (NCQ).
  • NVMe for PCI Express interface.

To understand this better, we need to make an explicit distinction between the controller and the storage device. The storage device is the one that actually keeps the data. However, software does not communicate with the storage device directly; it communicates with the controller. So in the SATA case, the storage device may use ATA commands, but the controller is accessed through AHCI. In the PCIe case, on the other hand, NVMe specifies both.

So possible and common combinations are:

  • A 2.5” or 3.5” hard drive, connected to a SATA port, using the SATA interface and communicating over AHCI/ATA. These are the traditional rotating/magnetic hard drives.
  • A 2.5” SSD (Solid State Drive), connected to a SATA port, using AHCI/ATA just like a hard drive.
  • An M.2 SSD, connected to an M.2 port, using the SATA interface and communicating over AHCI/ATA.
  • An M.2 SSD, connected to an M.2 port, using the PCIe interface and communicating over NVMe.
  • A PCIe SSD, connected to a PCIe slot, using the PCIe interface and communicating over NVMe.

In this article, I will give examples using an M.2 SSD with a PCIe interface.

NVM Express (NVMe) Overview


NVMe was developed because the AHCI and ATA/ATAPI standards were designed for traditional rotating/magnetic hard drives. These drives are inherently slow due to their physical construction, compared to Non-Volatile Memory (NVM) devices such as SSDs (Solid State Drives).

The main advantage of NVMe over AHCI/ATA is that NVMe has many more command queues (max. 64K), compared to only one in AHCI, so parallel/concurrent I/O is possible. Also, each command queue can be very deep (max. 64K entries), compared to only 32 in AHCI.

So on a system with multiple CPU cores and an AHCI/ATA device, every core submits ATA commands (e.g. read/write) to a single queue, which requires synchronization and locking when submitting commands. With NVMe, each core can have its own queue, so there is no need for synchronization when submitting commands.

There are actually two types of queues in NVMe: submission queues and completion queues. Multiple submission queues may share the same completion queue, so there does not need to be a 1:1 correspondence. Queues reside in memory, and each submission queue entry (a command) is normally 64 bytes. A completion entry identifies the command it completes by two values: the submission queue identifier, and the command identifier assigned when the command was submitted.

There are two types of commands in NVMe:

  • Admin Commands are sent to the Admin Submission Queue, and their completions are posted to the Admin Completion Queue (there is only one such pair, with identifier 0).
  • I/O Commands (called NVM Commands) are sent to I/O Submission Queues, and their completions are posted to I/O Completion Queues. I/O queues have to be explicitly managed (created, deleted, etc.).

Recap: An Admin or NVM command is submitted to a Submission Queue as a new entry. The NVMe controller reads the command from there, performs the operation and puts an entry into the Completion Queue, so the host/requester software knows the command has completed (successfully or not). If data is read, it is transferred directly to the data buffers, not to the Completion Queue. Queues are large but limited in size, and they are organized as circular buffers.
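The following toy Python model makes the queue mechanics concrete. It is only an illustration, not how a driver is written. Real queues live in host memory shared with the controller, and head/tail updates go through doorbell registers. In this sketch, two submission queues share one completion queue, and each queue is a fixed-size circular buffer. Each completion carries the submission queue ID and command ID, so the host can match it to the original command. The class and function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical toy types: real entries are 64-byte SQEs and 16-byte CQEs.
Command = namedtuple("Command", "cid opcode")
Completion = namedtuple("Completion", "sqid cid status")


class RingQueue:
    """Fixed-size circular buffer with head (consumer) and tail (producer) indices."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0  # next entry the consumer will read
        self.tail = 0  # next free slot the producer will write

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        # As in the NVMe spec: full when head is one past tail, so N slots hold N-1 entries.
        return (self.tail + 1) % len(self.slots) == self.head

    def push(self, entry):
        if self.is_full():
            raise RuntimeError("queue full")
        self.slots[self.tail] = entry
        self.tail = (self.tail + 1) % len(self.slots)

    def pop(self):
        if self.is_empty():
            raise RuntimeError("queue empty")
        entry = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        return entry


def controller_process(sqs, cq):
    """Toy controller: drain every submission queue, post one completion per command."""
    for sqid, sq in sqs.items():
        while not sq.is_empty():
            cmd = sq.pop()
            # A real controller would execute the command and move data via DPTR here.
            cq.push(Completion(sqid=sqid, cid=cmd.cid, status=0))


# Two I/O submission queues (e.g. one per CPU core) sharing one completion queue.
sqs = {1: RingQueue(4), 2: RingQueue(4)}
cq = RingQueue(8)

sqs[1].push(Command(cid=10, opcode=0x02))  # Read
sqs[2].push(Command(cid=10, opcode=0x01))  # Write: same cid, but a different queue
sqs[1].push(Command(cid=11, opcode=0x00))  # Flush

controller_process(sqs, cq)

completions = []
while not cq.is_empty():
    completions.append(cq.pop())
print(completions)
```

The two commands with cid 10 do not clash, because a command is identified by the pair (sqid, cid), not by the command ID alone.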

NVMe has advanced/enterprise features like Multi-Path I/O and Namespace Sharing. As these are not usually used in desktops, I am not going to talk about them. We will also see later that these features are not supported on the SSD I am using.

In NVMe, namespace is an important concept. From the NVMe spec:

namespace is “A quantity of non-volatile memory that may be formatted into logical blocks. When formatted, a namespace of size n is a collection of logical blocks with logical block addresses from 0 to (n-1).”

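So on a typical desktop SSD, a namespace effectively corresponds to the whole storage device. The spec does allow a controller to support multiple namespaces (somewhat like partitions, but managed by the controller). The controller reports the maximum number in the nn field of Identify Controller. Multiple namespaces are mostly seen on enterprise-grade devices.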

On the M.2 SSD I have, the NVMe controller is built into the M.2 device itself, as opposed to SATA, where the SATA (AHCI) controller is on the mainboard. This is the usual arrangement for NVMe SSDs, since the SSD is attached directly to PCIe.

To sum up:

  • Because SSDs are a different technology than hard drives, a new standard was needed, and that standard is NVMe.
  • If you have a new SSD attached to a PCIe interface, you are most probably using NVMe.
  • The SSD has a PCIe NVMe controller on-board.
  • In order to manage the SSD, Admin Commands are sent to the Admin Submission Queue (ASQ) and results are collected from the Admin Completion Queue (ACQ).
  • At least one namespace has to be created and formatted before the SSD can be used. On desktop SSDs this is usually already done by the manufacturer, so you do not have to do it yourself.
  • At least one pair of I/O Submission and Completion Queues is created. This is done implicitly for you by the driver of the OS (Operating System) you are using.
  • In order to perform I/O operations such as read and write on the SSD, the OS sends NVM I/O Commands to the I/O Submission Queue(s) and collects results from the I/O Completion Queue(s).

Now let's go into some details with examples.

NVMe with Examples

I am using a Samsung M.2 attached NVMe SSD (MZVPV512HDGL - SSD SM951 M.2 512 GB PCIe 3.0) for this article, together with the nvme-cli tools. nvme-cli is packaged by Linux distributions, but I recommend getting it from GitHub, as that version is the most up-to-date.

First, let's see if we have any NVMe controllers on the PCI bus.

$ lspci -nn | grep NVMe

02:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller [144d:a802] (rev 01)

Are there any NVMe device nodes?

$ ls -l /dev/nvme*

crw------- 1 root root 247, 0 Aug 21 08:15 /dev/nvme0
brw-rw---- 1 root disk 259, 0 Aug 21 08:15 /dev/nvme0n1
brw-rw---- 1 root disk 259, 1 Aug 21 08:15 /dev/nvme0n1p1
brw-rw---- 1 root disk 259, 2 Aug 21 08:15 /dev/nvme0n1p2
brw-rw---- 1 root disk 259, 3 Aug 21 08:15 /dev/nvme0n1p3

There is one character device (nvme0) and four block devices.

$ lsblk

NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme0n1     259:0    0   477G  0 disk
├─nvme0n1p3 259:3    0  31.8G  0 part [SWAP]
├─nvme0n1p1 259:1    0   512M  0 part /boot/efi
└─nvme0n1p2 259:2    0 444.7G  0 part /

As you might guess, nvme0n1 is the disk (namespace 1 on controller nvme0) and the others are its partitions.
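Linux names these nodes nvme<controller>, nvme<controller>n<namespace> and nvme<controller>n<namespace>p<partition>. The namespace number in the name usually matches the namespace ID, but that is not guaranteed. Below is a small Python sketch that splits such a name into its parts. The function name is hypothetical, and it only looks at the string; it never opens the device.

```python
import re

# nvme<ctrl>             -> controller character device
# nvme<ctrl>n<ns>        -> namespace block device
# nvme<ctrl>n<ns>p<part> -> partition on that namespace
NVME_NAME = re.compile(r"^(?:/dev/)?nvme(\d+)(?:n(\d+)(?:p(\d+))?)?$")


def parse_nvme_name(name):
    """Return (controller, namespace, partition); missing parts are None."""
    m = NVME_NAME.match(name)
    if m is None:
        raise ValueError(f"not an NVMe device name: {name!r}")
    return tuple(int(g) if g is not None else None for g in m.groups())


for dev in ["/dev/nvme0", "/dev/nvme0n1", "/dev/nvme0n1p2"]:
    print(dev, parse_nvme_name(dev))
```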

Another way to see if we have NVMe controllers is:

$ sudo nvme list

Node             SN                   Model                                    Namespace Usage                      Format           FW Rev  
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     S27FNY0HB04880       SAMSUNG MZVPV512HDGL-000H1               1         163.17  GB / 512.11  GB    512   B +  0 B   BXW74H0Q

This is generated from sysfs (e.g. /sys/class/nvme/…).

Let's send the Identify admin command to the NVMe controller. It returns 4096 bytes of data, so the output is long (-H decodes the bit fields, making the output human friendly).

$ sudo nvme id-ctrl -H /dev/nvme0

NVME Identify Controller:
vid     : 0x144d
ssvid   : 0x144d
sn      : S27FNY0HB04880
mn      : SAMSUNG MZVPV512HDGL-000H1
fr      : BXW74H0Q
rab     : 2
ieee    : 002538
cmic    : 0
  [2:2] : 0 PCI
  [1:1] : 0 Single Controller
  [0:0] : 0 Single Port
mdts    : 5
cntlid  : 1
ver     : 0
rtd3r   : 0
rtd3e   : 0
oaes    : 0
  [8:8] : 0 Namespace Attribute Changed Event Not Supported
oacs    : 0x7
  [3:3] : 0 NS Management and Attachment Not Supported
  [2:2] : 0x1 FW Commit and Download Supported
  [1:1] : 0x1 Format NVM Supported
  [0:0] : 0x1 Sec. Send and Receive Supported
acl     : 7
aerl    : 3
frmw    : 0x6
  [4:4] : 0 Firmware Activate Without Reset Not Supported
  [3:1] : 0x3 Number of Firmware Slots
  [0:0] : 0 Firmware Slot 1 Read/Write
lpa     : 0x1
  [1:1] : 0 Command Effects Log Page Not Supported
  [0:0] : 0x1 SMART/Health Log Page per NS Supported
elpe    : 63
npss    : 4
avscc   : 0x1
  [0:0] : 0x1 Admin Vendor Specific Commands uses NVMe Format
apsta   : 0x1
  [0:0] : 0x1 Autonomous Power State Transitions Supported
wctemp  : 0
cctemp  : 0
mtfa    : 0
hmpre   : 0
hmmin   : 0
tnvmcap : 0
unvmcap : 0
rpmbs   : 0
 [31:24]: 0 Access Size
 [23:16]: 0 Total Size
  [5:3] : 0 Authentication Method
  [2:0] : 0 Number of RPMB Units
sqes    : 0x66
  [7:4] : 0x6 Max SQ Entry Size (64)
  [3:0] : 0x6 Min SQ Entry Size (64)
cqes    : 0x44
  [7:4] : 0x4 Max CQ Entry Size (16)
  [3:0] : 0x4 Min CQ Entry Size (16)
nn      : 1
oncs    : 0x1f
  [5:5] : 0 Reservations Not Supported
  [4:4] : 0x1 Save and Select Supported
  [3:3] : 0x1 Write Zeroes Supported
  [2:2] : 0x1 Data Set Management Supported
  [1:1] : 0x1 Write Uncorrectable Supported
  [0:0] : 0x1 Compare Supported
fuses   : 0
  [0:0] : 0 Fused Compare and Write Not Supported
fna     : 0
  [2:2] : 0 Crypto Erase Not Supported as part of Secure Erase
  [1:1] : 0 Crypto Erase Applies to Single Namespace(s)
  [0:0] : 0 Format Applies to Single Namespace(s)
vwc     : 0x1
  [0:0] : 0x1 Volatile Write Cache Present
awun    : 255
awupf   : 0
nvscc   : 1
  [0:0] : 0x1 NVM Vendor Specific Commands uses NVMe Format
acwu    : 0
sgls    : 0
  [0:0] : 0 Scatter-Gather Lists Not Supported
ps    0 : mp:9.00W operational enlat:5 exlat:5 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps    1 : mp:4.00W operational enlat:30 exlat:30 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps    2 : mp:3.00W operational enlat:100 exlat:100 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps    3 : mp:0.0700W non-operational enlat:500 exlat:5000 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps    4 : mp:0.0050W non-operational enlat:2000 exlat:22000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-

Some fields are optional and reported as 0. The meanings of some fields are:

  • vid is PCI Vendor ID (so Samsung in this case), 0x144d as in the lspci output.
  • ssvid is PCI Subsystem Vendor ID.
  • SN is Serial Number, MN is Model Number, FR is Firmware Revision.
  • cmic is Controller Multi-Path I/O and Namespace Sharing Capabilities.
Bit 2 = 0: Controller is associated with a PCI Function, not an SR-IOV Virtual Function.
Bit 1 = 0: Controller can only be connected to a single host.
Bit 0 = 0: Controller contains only a single NVM port.
These basically mean there is no Multi-Path I/O and Namespace Sharing support on this controller.
  • mdts is Maximum Data Transfer Size. It is reported as a power of two, in units of the minimum memory page size. Here mdts = 5, so 2⁵ = 32. We will see below that the minimum memory page size (MPSMIN) is 4K, so MDTS is 32 * 4K = 128K.
  • oacs is Optional Admin Command Support.
Bits 6:3 = 0: The NVMe-MI Send and Receive, Directive Send and Receive, Device Self-test, and Namespace Management and Attachment commands are not supported.
Bits 2:0 are all 1: The Firmware Commit and Firmware Image Download, Format NVM, and Security Send and Receive commands are supported.
  • frmw is Firmware Updates.
Bit 4 = 0: A reset is required to activate a firmware.
Bit 3:1 = 3: There are 3 firmware slots.
Bit 0 = 0: Firmware Slot 1 is read/write.
  • tnvmcap is Total NVM Capacity and it is an optional field. As it is reported as zero, it is not supported.
  • sqes is (Maximum and Required) Submission Queue Entry (SQE) Size. It is expected to be 64 bytes, but it can be larger with proprietary extensions. The value is 0x66 here.
Bits 7:4 = 6, indicates the maximum entry size as a power of two, so it is 2⁶ = 64 bytes.
Bits 3:0 = 6, indicates the required entry size as a power of two, so it is 2⁶ = 64 bytes.
  • cqes is (Maximum and Required) Completion Queue Entry (CQE) Size. The calculation is similar to the SQE size. The value is 0x44 here, so both the maximum and required completion queue entry sizes are 2⁴ = 16 bytes.
  • nn is the (maximum) Number of Namespaces supported by the controller. Although the theoretical maximum is high, here it is 1, so only one namespace is possible.
  • oncs is Optional NVM Command Support.
Bits 6:5 = 0: Timestamp feature and Reservation commands are not supported.
Bits 4:0 are all 1: The Save and Select fields of Set/Get Features are supported, and so are the Write Zeroes, Dataset Management, Write Uncorrectable and Compare commands.
  • fna is Format NVM Attributes.
Bit 2 = 0: Cryptographic erase is not supported as part of a secure erase.
Bit 1 = 0: Cryptographic erase (if supported) applies to a single namespace.
Bit 0 = 0: Format applies to a single namespace.
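The arithmetic in these bullets is plain bit masking and shifting. Below is a Python sketch that decodes a few of the values shown in the id-ctrl output above. The helper names are hypothetical. MPSMIN = 4096 is taken from the show-regs output later in this article.

```python
def bits(value, hi, lo):
    """Extract bits hi..lo (inclusive) from an integer field value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


MPSMIN = 4096  # CAP.MPSMIN in bytes, from the show-regs output

# Raw values copied from the `nvme id-ctrl` output above.
mdts, sqes, cqes, oacs, frmw, oncs = 5, 0x66, 0x44, 0x7, 0x6, 0x1F

max_transfer = (2 ** mdts) * MPSMIN   # power of two in MPSMIN units (mdts = 0 would mean no limit)
sq_entry_max = 2 ** bits(sqes, 7, 4)  # maximum SQ entry size
sq_entry_req = 2 ** bits(sqes, 3, 0)  # required SQ entry size
cq_entry_max = 2 ** bits(cqes, 7, 4)
cq_entry_req = 2 ** bits(cqes, 3, 0)
fw_slots = bits(frmw, 3, 1)
fw_activate_without_reset = bool(bits(frmw, 4, 4))
ns_mgmt_supported = bool(bits(oacs, 3, 3))
format_nvm_supported = bool(bits(oacs, 1, 1))
write_zeroes_supported = bool(bits(oncs, 3, 3))
reservations_supported = bool(bits(oncs, 5, 5))

print(f"MDTS: {max_transfer // 1024} KiB")
print(f"SQE: {sq_entry_req}..{sq_entry_max} bytes, CQE: {cq_entry_req}..{cq_entry_max} bytes")
print(f"firmware slots: {fw_slots}, activate without reset: {fw_activate_without_reset}")
```

The same bits() helper works for any bit range listed in the spec, for example bits(0x6, 0, 0) for frmw bit 0.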

Now, let's see if we have any namespaces by sending the Identify command that lists the namespaces:

$ sudo nvme list-ns /dev/nvme0

[   0]:0x1

There is one namespace, with id 0x1.

Let's look at the details of this namespace:

$ sudo nvme id-ns /dev/nvme0 --namespace-id=0x1

NVME Identify Namespace 1:
nsze    : 0x3b9e12b0
ncap    : 0x3b9e12b0
nuse    : 0x130892c0
nsfeat  : 0
nlbaf   : 0
flbas   : 0
mc      : 0
dpc     : 0
dps     : 0
nmic    : 0
rescap  : 0
fpi     : 0
nawun   : 0
nawupf  : 0
nacwu   : 0
nabsn   : 0
nabo    : 0
nabspf  : 0
nvmcap  : 0
nguid   : 00000000000000000000000000000000
eui64   : 0025386b610046b6
lbaf  0 : ms:0   ds:9  rp:0 (in use)

As in the Identify Controller output, some fields are optional. The meanings of some of them are:

  • nsze, Namespace Size (NSZE). Indicates the number of logical blocks, 0x3b9e12b0 = 1000215216. We can validate this by running lsblk -b: it shows the disk size as 512110190592 bytes. The block size is 512 B, so this is equal to NSZE * 512.
  • ncap, Namespace Capacity (NCAP). The maximum number of logical blocks that may be allocated in the namespace. If thin provisioning is used (disk space is allocated on demand), this value can be less than NSZE. That is not the case here, so its value is the same as NSZE.
  • nuse, Namespace Utilization (NUSE). The current number of logical blocks allocated in the namespace. A block becomes allocated when it is written. It can be deallocated again, e.g. by TRIM through the Dataset Management command. So this value can be less than NCAP. Here it is 0x130892c0 = 319328960 blocks, about 163.5 GB with 512-byte blocks, close to the usage reported by nvme list above.
  • nsfeat, Namespace Features (NSFEAT).
Bit 3 = 0, NGUID and EUI64 values may be reused by the controller after this namespace is deleted.
Bit 2 = 0, the controller does not support the Deallocated or Unwritten Logical Block error.
Bit 1 = 0, AWUN, AWUPF and ACWU fields should be used instead of NAWUN, NAWUPF, NACWU.
Bit 0 = 0, meaning the namespace does not support Thin Provisioning.
  • nlbaf, Number of LBA Formats (NLBAF). Indicates the number of supported LBA data size and metadata size combinations supported by the namespace. It is a zero based value and its value here is 0, so there is only 1 format supported.
  • flbas, Formatted LBA Size (FLBAS). Indicates the LBA data size and metadata size combination this namespace has been formatted with. Bits 3:0 = 0 are the format index, so it is formatted with Format 0. Bit 4 = 0 indicates that the metadata for a command is transferred as a separate buffer.
  • nguid, Namespace Globally Unique Identifier (NGUID), and eui64, IEEE Extended Unique Identifier (EUI64), are assigned when the namespace is created. They are preserved across namespace and controller operations (e.g. reset, format). Here NGUID is all zeros, which means the controller does not provide one.
  • lbaf 0 is the LBA Format 0.

LBA Format fields:

  • ms, Metadata Size (MS). Since its value is 0, metadata is not supported.
  • ds, LBA Data Size (LBADS). It is reported in terms of a power of two, so LBA data size is 2⁹ = 512 bytes.
  • rp, Relative Performance (RP). Indicates the performance of this format relative to other formats. A value of 0 indicates the best performance. It does not mean much here because only one LBA Format is supported.
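The size fields can be cross-checked against the lsblk -b and nvme list numbers in a few lines of Python. This sketch uses the nsze, nuse, flbas and lbaf 0 values from the id-ns output above.

```python
# Values copied from the `nvme id-ns` output above.
nsze = 0x3B9E12B0   # namespace size, in logical blocks
nuse = 0x130892C0   # namespace utilization, in logical blocks
flbas = 0           # bits 3:0 select the LBA format in use
lbaf_ds = {0: 9}    # LBA format index -> LBADS (log2 of the data size)

fmt = flbas & 0xF                     # index of the LBA format in use
block_size = 2 ** lbaf_ds[fmt]        # 2^9 = 512 bytes
metadata_inline = bool(flbas & 0x10)  # bit 4: metadata at the end of each LBA?

size_bytes = nsze * block_size
used_bytes = nuse * block_size

print(f"blocks: {nsze}, block size: {block_size} B, metadata inline: {metadata_inline}")
print(f"size: {size_bytes} bytes ({size_bytes / 1e9:.2f} GB)")
print(f"used: {used_bytes / 1e9:.2f} GB")
```

The size comes out as 512110190592 bytes (512.11 GB), matching lsblk -b and nvme list. The used figure (163.50 GB) is close to, but not exactly, the usage column of nvme list.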

Now, let's look at what sending these commands actually means. NVMe currently defines an Admin Command Set and an NVM Command Set. Above, I used the nvme utility as a helper to send the commands. We can also send the commands manually, and that is what we will do now.

NVMe Admin Commands

Here are the NVMe admin commands with opcodes:

  • Create and Delete I/O Submission Queue (01h, 00h)
  • Get Log Page (02h)
  • Create and Delete I/O Completion Queue (05h, 04h)
  • Identify (06h)
  • Abort (08h)
  • Get and Set Features (0Ah, 09h)
  • Asynchronous Event Request (0Ch)
  • Namespace Management (0Dh)
  • Firmware Commit and Image Download (10h, 11h)
  • Device Self-test (14h)
  • Namespace Attachment (15h)
  • Keep Alive (18h)
  • Directive Send and Receive (19h, 1Ah)
  • Virtualization Management (1Ch)
  • NVMe-MI Send and Receive (1Dh, 1Eh)
  • Doorbell Buffer Config (7Ch)
  • Format NVM (80h)
  • Security Send and Receive (81h, 82h)
  • Sanitize (84h)

As we have seen in the id-ctrl output, the submission queue entry (SQE) size is 64 bytes. What is in these 64 bytes? An SQE consists of:

  • Command Dword 0 (CDW0), 4 bytes: Includes Command Identifier (2 bytes) and Opcode (1 byte)
  • Namespace Identifier (NSID), 4 bytes. If it is not used, it should be cleared to zero. If it is set to 0xFFFFFFFF, the command is applied to all namespaces.
  • The following 8 bytes (CDW2 and CDW3) are reserved.
  • Metadata Pointer (MPTR), 8 bytes.
  • Data Pointer (DPTR), 16 bytes.
  • Command Dword 10 (CDW10), 4 bytes
  • Command Dword 11 (CDW11), 4 bytes
  • Command Dword 12 (CDW12), 4 bytes
  • Command Dword 13 (CDW13), 4 bytes
  • Command Dword 14 (CDW14), 4 bytes
  • Command Dword 15 (CDW15), 4 bytes
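This layout adds up to exactly 64 bytes. As an illustration, Python's struct module can pack an Identify Controller command into that layout. The example uses opcode 06h and CDW10 = 1, the same values used with admin-passthru below. It is a sketch of the byte layout only. The buffer address is a made-up placeholder, and DPTR is shown as two PRP entries. On Linux you never write queue entries yourself; the kernel driver does.

```python
import struct

# 64-byte submission queue entry, little-endian (NVMe structures are little-endian):
#   B   opcode         (CDW0 bits 7:0)
#   B   flags          (CDW0 bits 15:8: fused operation, PRP/SGL selection)
#   H   command id     (CDW0 bits 31:16)
#   I   NSID
#   II  CDW2, CDW3     (reserved here)
#   Q   MPTR           (metadata pointer)
#   QQ  DPTR           (data pointer, here as PRP entry 1 and PRP entry 2)
#   6I  CDW10..CDW15   (command specific)
SQE_FORMAT = "<BBHIIIQQQ6I"
assert struct.calcsize(SQE_FORMAT) == 64


def build_sqe(opcode, cid, nsid=0, prp1=0, prp2=0,
              cdw10=0, cdw11=0, cdw12=0, cdw13=0, cdw14=0, cdw15=0):
    """Pack one SQE; flags, CDW2, CDW3 and MPTR are left at zero."""
    return struct.pack(SQE_FORMAT, opcode, 0, cid, nsid, 0, 0, 0, prp1, prp2,
                       cdw10, cdw11, cdw12, cdw13, cdw14, cdw15)


# Identify (06h) with CNS = 01h in CDW10: return the Identify Controller data structure.
# 0x1000 is a placeholder buffer address, not a real one.
sqe = build_sqe(opcode=0x06, cid=1, prp1=0x1000, cdw10=0x1)
print(len(sqe), sqe.hex())
```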

Let's now send the Identify command directly. We use the nvme admin-passthru command for this.

$ sudo nvme admin-passthru /dev/nvme0 --opcode=0x06 --cdw10=0x0001 --data-len=4096 -r -d -s

opcode       : 06
flags        : 00
rsvd1        : 0000
nsid         : 00000000
cdw2         : 00000000
cdw3         : 00000000
data_len     : 00001000
metadata_len : 00000000
addr         : 563d781e5fe0
metadata     : 0
cdw10        : 00000001
cdw11        : 00000000
cdw12        : 00000000
cdw13        : 00000000
cdw14        : 00000000
cdw15        : 00000000
timeout_ms   : 00000000

The Identify command has opcode 06h (--opcode=0x06). CDW10 = 1 (--cdw10=0x0001) sets the CNS field to 01h, which selects the Identify Controller data structure. The command returns 4096 bytes, so we set the data length accordingly (--data-len=4096) and indicate that the command reads data (-r). We do not want to send the command now, so we use a dry run (-d), and we want to see the command fields first (-s).

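The output is a summary of the command fields. The data pointer (addr in the output) is set by the nvme utility, since it allocates the data buffer. Strictly speaking, addr is a buffer address in the nvme process. The Linux driver maps this buffer and fills in the actual Data Pointer (DPTR) of the submission queue entry.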
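The fields printed by admin-passthru follow the layout of struct nvme_passthru_cmd from the Linux header <linux/nvme_ioctl.h>. nvme-cli passes this struct to the kernel with the NVME_IOCTL_ADMIN_CMD ioctl. Below is a ctypes mirror of that struct plus the ioctl number. The ioctl number assumes the generic Linux _IOWR encoding used on x86 and ARM; a few architectures such as PowerPC encode ioctl numbers differently. identify_controller is a hypothetical helper and is not called here. If you run it, it needs root and sends a real, read-only Identify command, so be careful as with the other commands in this article.

```python
import ctypes
import fcntl


class NvmePassthruCmd(ctypes.Structure):
    """Mirror of struct nvme_passthru_cmd (alias nvme_admin_cmd) in <linux/nvme_ioctl.h>."""
    _fields_ = [
        ("opcode", ctypes.c_uint8),
        ("flags", ctypes.c_uint8),
        ("rsvd1", ctypes.c_uint16),
        ("nsid", ctypes.c_uint32),
        ("cdw2", ctypes.c_uint32),
        ("cdw3", ctypes.c_uint32),
        ("metadata", ctypes.c_uint64),
        ("addr", ctypes.c_uint64),        # user-space buffer; the driver builds DPTR from it
        ("metadata_len", ctypes.c_uint32),
        ("data_len", ctypes.c_uint32),
        ("cdw10", ctypes.c_uint32),
        ("cdw11", ctypes.c_uint32),
        ("cdw12", ctypes.c_uint32),
        ("cdw13", ctypes.c_uint32),
        ("cdw14", ctypes.c_uint32),
        ("cdw15", ctypes.c_uint32),
        ("timeout_ms", ctypes.c_uint32),
        ("result", ctypes.c_uint32),      # completion dword 0, filled in by the kernel
    ]


def iowr(type_char, nr, size):
    """Generic Linux _IOWR(type, nr, size) encoding (direction bits = read|write = 3)."""
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


NVME_IOCTL_ADMIN_CMD = iowr("N", 0x41, ctypes.sizeof(NvmePassthruCmd))


def identify_controller(dev="/dev/nvme0"):
    """Hypothetical helper: send Identify (CNS=01h) and return the 4096-byte result."""
    buf = ctypes.create_string_buffer(4096)
    cmd = NvmePassthruCmd(opcode=0x06, cdw10=0x1, data_len=len(buf),
                          addr=ctypes.addressof(buf))
    with open(dev, "rb") as f:
        fcntl.ioctl(f.fileno(), NVME_IOCTL_ADMIN_CMD, cmd)
    return buf.raw


if __name__ == "__main__":
    print(hex(NVME_IOCTL_ADMIN_CMD), ctypes.sizeof(NvmePassthruCmd))
```

The struct is 72 bytes, and the field order is the same as in the admin-passthru output above.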

NVM Commands

Here are the NVM commands with opcodes:

  • Flush (00h)
  • Write and Read (01h, 02h)
  • Write Uncorrectable (04h)
  • Compare (05h)
  • Write Zeroes (08h)
  • Dataset Management (09h)
  • Reservation Register and Report (0Dh, 0Eh)
  • Reservation Acquire and Release (11h, 15h)

NVM commands should only be sent when the controller is ready, as indicated by the Ready bit of the Controller Status register (CSTS.RDY). They should also only be sent after at least one I/O submission and completion queue pair has been created.

Let's look at the controller registers:

$ sudo nvme show-regs -H /dev/nvme0

cap     : f000203c013fff
 Memory Page Size Maximum      (MPSMAX): 134217728 bytes
 Memory Page Size Minimum      (MPSMIN): 4096 bytes
 Command Sets Supported           (CSS): NVM command set is supported
 NVM Subsystem Reset Supported  (NSSRS): No
 Doorbell Stride                (DSTRD): 4 bytes
 Timeout                           (TO): 30000 ms
 Arbitration Mechanism Supported  (AMS): Weighted Round Robin with Urgent Priority Class is not supported
 Contiguous Queues Required       (CQR): Yes
 Maximum Queue Entries Supported (MQES): 16384
version : 10100
 NVMe specification 1.1
intms   : 0
 Interrupt Vector Mask Set (IVMS): 0
intmc   : 0
 Interrupt Vector Mask Clear (IVMC): 0
cc      : 460001
 I/O Completion Queue Entry Size (IOSQES): 16 bytes
 I/O Submission Queue Entry Size (IOSQES): 64 bytes
 Shutdown Notification              (SHN): No notification; no effect
 Arbitration Mechanism Selected     (AMS): Round Robin
 Memory Page Size                   (MPS): 4096 bytes
 I/O Command Sets Selected          (CSS): NVM Command Set
 Enable                              (EN): Yes
csts    : 1
 Processing Paused               (PP): No
 NVM Subsystem Reset Occurred (NSSRO): No
 Shutdown Status               (SHST): Normal operation (no shutdown has been requested)
 Controller Fatal Status        (CFS): False
 Ready                          (RDY): Yes
nssr    : 0
 NVM Subsystem Reset Control (NSSRC): 0
aqa     : ff00ff
 Admin Completion Queue Size (ACQS): 256
 Admin Submission Queue Size (ASQS): 256
asq     : 813b18000
 Admin Submission Queue Base (ASQB): 813b18000
acq     : 814a13000
 Admin Completion Queue Base (ACQB): 814a13000
cmbloc  : 0
 Controller Memory Buffer feature is not supported
cmbsz   : 0
 Controller Memory Buffer feature is not supported

Meaning of a few are:

  • Memory Page Size Maximum CAP.MPSMAX = 128M (134217728 bytes) and Minimum CAP.MPSMIN = 4K
  • Timeout CAP.TO = 30000 ms, the worst-case time to wait for CSTS.RDY to change state from 0 to 1 and from 1 to 0.
  • Maximum Queue Entries Supported (CAP.MQES) = 16K. Each I/O submission and completion queue can have at most 16K entries.
  • Version (VS) = 1.1, the NVMe spec version controller supports. The latest version is 1.3 as of 22.08.2017.
  • Memory Page Size (CC.MPS) = 4K.
  • I/O Command Set Selected (CC.CSS)= NVM Command Set.
  • Ready (CSTS.RDY) = 1, the controller is ready to accept command submissions.
  • Admin Submission and Completion Queue Size (ASQS and ACQS) = 256
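These decoded values come straight from the raw register values printed on the left. Below is a Python sketch that decodes CAP, VS, CC and AQA from the hex values above, following the bit positions in the NVMe 1.x register definitions. Note that 0's based fields such as MQES, ASQS and ACQS get +1, and TO is in units of 500 ms.

```python
def bits(value, hi, lo):
    """Extract bits hi..lo (inclusive) from a register value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


# Raw register values copied from the `nvme show-regs` output above.
cap, vs, cc, aqa = 0xF000203C013FFF, 0x10100, 0x460001, 0xFF00FF

decoded = {
    "MQES (entries)": bits(cap, 15, 0) + 1,            # 0's based
    "CQR": bits(cap, 16, 16),
    "TO (ms)": bits(cap, 31, 24) * 500,                # units of 500 ms
    "DSTRD (bytes)": 2 ** (2 + bits(cap, 35, 32)),
    "CSS NVM": bits(cap, 37, 37),                      # CSS bit 0 = NVM command set
    "MPSMIN (bytes)": 2 ** (12 + bits(cap, 51, 48)),
    "MPSMAX (bytes)": 2 ** (12 + bits(cap, 55, 52)),
    "VS": f"{bits(vs, 31, 16)}.{bits(vs, 15, 8)}",
    "CC.EN": bits(cc, 0, 0),
    "CC.MPS (bytes)": 2 ** (12 + bits(cc, 10, 7)),
    "CC.IOSQES (bytes)": 2 ** bits(cc, 19, 16),
    "CC.IOCQES (bytes)": 2 ** bits(cc, 23, 20),
    "ASQS (entries)": bits(aqa, 11, 0) + 1,            # 0's based
    "ACQS (entries)": bits(aqa, 27, 16) + 1,           # 0's based
}

for name, value in decoded.items():
    print(f"{name:18}: {value}")
```

MPSMAX comes out as 134217728 bytes, i.e. 128M.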

This is why there are separate passthru commands (admin-passthru and io-passthru): Admin Commands have their own pair of submission and completion queues, and they cannot be sent to the I/O queues.

As the last example, we will issue a Read command both with the helper and manually. We will read LBA 0 and compare the result with what dd reads.

First, dump LBA 0 to the file lba.0.dd with dd:

$ sudo dd if=/dev/nvme0n1 of=lba.0.dd bs=512 count=1

Second, read LBA 0 through the nvme utility:

$ sudo nvme read /dev/nvme0n1 --start-block=0 --block-count=0 --data-size=512 --data=lba.0.read

Third, read LBA 0 by submitting a command manually through nvme io-passthru:

$ sudo nvme io-passthru /dev/nvme0 --opcode=0x02 --namespace-id=0x1 --data-len=512 --read --cdw10=0 --cdw11=0 --cdw12=0 -b > lba.0.io

Read is opcode 0x02. We send the command to namespace 0x1 and read 512 bytes. Command dwords 10 and 11 (cdw10 and cdw11) hold the 64-bit starting LBA (lower and upper 32 bits), so both are 0 to start reading from LBA 0. Bits 15:00 of command dword 12 (cdw12) are the number of blocks to read. This value is 0's based, so it is set to 0 to read 1 block.
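The same encoding can be written as a small Python function. This is a minimal sketch following the description above: the 64-bit starting LBA is split across CDW10 (low 32 bits) and CDW11 (high 32 bits), and NLB in CDW12 bits 15:00 is 0's based. The function name is hypothetical, and the other CDW12 bits (such as FUA) are left at zero.

```python
def read_cdws(start_lba, num_blocks):
    """Return (cdw10, cdw11, cdw12) for a Read of num_blocks starting at start_lba."""
    if not 1 <= num_blocks <= 0x10000:
        raise ValueError("NLB is 16 bits and 0's based: 1..65536 blocks")
    if not 0 <= start_lba < 2 ** 64:
        raise ValueError("starting LBA must fit in 64 bits")
    cdw10 = start_lba & 0xFFFFFFFF     # starting LBA bits 31:00
    cdw11 = start_lba >> 32            # starting LBA bits 63:32
    cdw12 = (num_blocks - 1) & 0xFFFF  # NLB, 0's based
    return cdw10, cdw11, cdw12


print(read_cdws(0, 1))  # the LBA 0 read above
```

For the read above, this gives (0, 0, 0), the same values passed to io-passthru. As another example, read_cdws(2048, 256) gives (2048, 0, 255): a 128K read of 512-byte blocks, which is exactly the MDTS of this controller.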

If we compare them:

$ cmp lba.0.dd lba.0.read
$ cmp lba.0.dd lba.0.io

They are the same (cmp prints nothing when the files are identical).

NVMe is probably going to become much more common with SSDs. I hope this article helps you understand the basics of this technology.



The best way to receive blog updates is to follow me on Twitter: @metebalci