Why Tape and Why LTO-5
Why tape backup, why LTO and why now? This started because I wanted to move from my QNAP NAS to a TrueNAS server. I thought I could use Google Cloud Storage for transferring the data, as I have a pretty fast WAN connection, but I realized the network egress (download from Google) cost is very high. In the end I purchased extra HDDs and used my PC for the transfer, but along the way I realized I could have used a tape backup. I was also thinking about having a backup other than the cloud backup solutions I am using, so it made me consider tape backup seriously.
I used tapes for backup in the late 90s when I was working at data centers, but I haven't used them since, so I had no idea what the current status was. It seems we now have LTO tape drives and media (cartridges), and the latest generation is LTO-9, which can store 18 TB uncompressed/45 TB compressed on a single cartridge. That is a lot, but also very expensive for home use. To give you an idea, an HP external SAS LTO-9 drive costs around 5000 USD, and a single cartridge is around 150 USD.
Many drives and media from previous LTO generations can still be found on the second-hand market. I did some research and found LTO-5 to be the sweet spot. It has 1.5 TB uncompressed/3 TB compressed capacity and supports encryption. LTO-5 media can also be found at acceptable prices. I purchased the drive for around 300 USD, new media for around 20 USD each, and used media for around 7 USD each.
There are a few different versions of the tape drives of the same generation from the same manufacturer: internal SAS drives and external SAS drives, in half height (HH) or full height (FH). Because I am going to use it only occasionally, I did not want an internal drive that is powered all the time (and might be noisy). External drives have a power on/off button, so they can be switched on only when needed. All these drives are SAS, so a SAS controller card is also needed. I purchased a used LSI 9207e for this purpose (around 50 USD) and SAS cables (around 30 USD for 2x 1.8 m cables). HH drives are cheaper, so I limited my search to external HH LTO-5 drives.
HP, IBM and Quantum branded products can be found second hand. I think the software and firmware of HP and IBM drives cannot be accessed freely (you need a support contract), so I decided to get a Quantum. The drives of the LTO-5 generation in particular look alike across vendors; I do not know if one vendor is the main manufacturer or if they are co-developed.
I purchased a used Quantum Ultrium HH5 Model C (3580 H5S) and later a used Model B (TC-L52BN). From the outside, Model C looks similar to the IBM models and Model B looks similar to the HP models. I actually like Model C, but its fan is noisier than Model B's. Model B is a little bigger; I think it might have a bigger fan and so be less noisy. So I am using Model B at the moment and keeping Model C just as a backup. From a usage point of view they are the same. The only difference I observed, other than noise, is that Model C returns more diagnostic data, but I do not know if that is really the case or if I am using the tool wrong, or using the wrong tool. Both units came with the latest firmware, so there was no need to upgrade it.
After some months of use, first the front panel of Model B broke, and then I started to experience errors, so I switched to Model C. I ran tests on Model B and the SCSI Interconnect test returned an error. As I am sure the cable and HBA are working fine, I think the unit itself has a problem.
For media, I first purchased a small box of 5 new Quantum LTO-5 cartridges. Then I purchased a used lot of 40 IBM LTO-5 cartridges. I also purchased a new Quantum tape head cleaning cartridge, which I haven't used yet. A disadvantage of used media is that they come without the plastic case, so you have to find a good container for storing them.
Instead of LTO-5, it is also possible to go with LTO-6. LTO-6 tape drives can be three times (or more) as expensive as LTO-5 drives, but if you find a good offer, the cost of new LTO-6 media is not very different (used media may be harder to find, I do not know), and you can still write to LTO-5 media if needed. LTO drives, at least up to LTO-7, can read two generations back and write one generation back, so after LTO-5 my upgrade path is LTO-7.
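The compatibility rule mentioned above can be written down as a small check. This is only a sketch: the function names are made up for illustration, and it assumes the "read two generations back, write one generation back" rule, which I would only rely on for drive generations up to LTO-7.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: drive and media are LTO generation numbers (e.g. 5, 6, 7).
# Assumption: the two-back read / one-back write rule, only meant for drives up to LTO-7.

# True if a drive of generation $1 can read media of generation $2.
can_read()  { local drive=$1 media=$2; (( media <= drive && media >= drive - 2 )); }

# True if a drive of generation $1 can write media of generation $2.
can_write() { local drive=$1 media=$2; (( media <= drive && media >= drive - 1 )); }

for drive in 5 6 7; do
  if can_read  "$drive" 5; then r=yes; else r=no; fi
  if can_write "$drive" 5; then w=yes; else w=no; fi
  echo "LTO-$drive drive with LTO-5 media: read=$r write=$w"
done
```

Under this rule an LTO-7 drive can still read LTO-5 cartridges but no longer write them, which is why LTO-7 is the last drive generation that keeps an LTO-5 archive readable.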
How I use the tape backup
I am still thinking about how to use the tape easily, but this is how I use it now. I know this is not how tape backups are used in enterprises; it is just my simple way of using the technology, at least for the moment.
I have a Linux VM that I use solely for the tape, so the LSI 9207e is assigned to this VM. I have installed the usual tape-related packages.
Tape is an interesting technology. There is a very long (really very long, 846 m) tape inside one LTO-5 cartridge. This tape has multiple bands, each holding many tracks, and the tape head can write several tracks at the same time (but not all of them) and in both directions. So the tape has to travel from end to end multiple times (I think 80 times to fill the whole tape). There is one spool inside the cartridge and another spool inside the drive, and the tape is wound back and forth between them while the drive is operating. So it is a lot of movement (~67 km for the full capacity). It is naturally very bad at random access, but it can sustain its sequential speed for a long time (until the whole tape is wound in one direction), and because it works in both directions, I think even the pause at the turnaround is very short.
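As a quick sanity check of the numbers above (846 m of tape and, as I assume, 80 end-to-end passes to fill it), the total tape travel can be computed like this. The function name is only for illustration.

```shell
#!/usr/bin/env bash
# Usage: tape_travel_km TAPE_LENGTH_M PASSES
# Prints the total distance the tape travels, in kilometres.
tape_travel_km() {
  awk -v len_m="$1" -v passes="$2" 'BEGIN { printf "%.1f\n", len_m * passes / 1000 }'
}

tape_travel_km 846 80   # prints 67.7
```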
For LTO-5, the uncompressed write speed is 140 MB/s. Uncompressed speed means that if the data cannot be compressed by the tape drive at all, then you need to supply data at this rate, so it is a minimum. This may not sound like much, but it is continuous (similar to sequential disk speed), and when you do not provide enough data the tape has to stop and might reposition, so you lose time. I think it is impossible to keep up without some type of buffer holding sequential data. That means it is not practical to just copy a set of files directly to the tape, because that requires random disk access and will be way slower than 140 MB/s.
I decided to make my (Windows 11) PC the consolidation point for the data to be sent to tape, primarily because most of the files I would like to back up are on my PC. I normally use PCIe M.2 NVMe drives, but these are small (1 TB and 2 TB), so I use two large (18 TB) HDDs configured as a RAID-1 mirror. I have a simple 7z script to create a tar file containing everything I would like to back up to tape.
7z -bb1 -spf2 -sse -ttar a tape.tar @%1
This creates tape.tar file from the files listed in the file given in the first argument (%1). I have different file lists for different backups (e.g. for myself, for my wife, for some projects only, for songs etc.)
Creating the tar file can take a long time, depending on how many files you have and therefore how much random access is involved.
Then I use Ubuntu in WSL on the PC for actually writing to tape. The reason I create the tar on Windows and not in Ubuntu is that it is less problematic for accessing some files in the cloud (Dropbox etc.) or on the network.
On Ubuntu (on WSL), I have a script that does the following:
- enable tape encryption:
ssh $HOST sudo stenc -f $TAPE -e on -d on --no-allow-raw-read --ckod -k tape.key -a 1
- enable tape compression:
ssh $HOST sudo mt -f $TAPE compression 1
- save the encryption and tape status (with stenc and tapeinfo)
- rewind the tape:
ssh $HOST sudo mt -f $TAPE rewind
- send tar to mbuffer and then tape:
cat tape.tar | ssh $HOST "mbuffer -f -m $MBUFFER_SIZE -T $MBUFFER_FILE -P 90 | dd of=$TAPE bs=1M iflag=fullblock" 2>&1
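Put together, the steps above could look roughly like the following script. This is a sketch, not my exact script: the host name, device paths and buffer file path are placeholders, SG_DEV is an assumption (tapeinfo talks to the SCSI generic device rather than the tape device), and the status step simply prints the output instead of saving it. By default it only prints the commands (DRY_RUN=1), so it can be tried without a drive.

```shell
#!/usr/bin/env bash
set -euo pipefail

HOST=tape-vm                        # placeholder: host name of the Linux VM
TAPE=/dev/nst0                      # assumption: non-rewinding tape device
SG_DEV=/dev/sg1                     # assumption: SCSI generic device of the drive
MBUFFER_SIZE=64G
MBUFFER_FILE=/mnt/nvme/mbuffer.tmp  # placeholder: buffer file on a local NVMe drive
DRY_RUN=${DRY_RUN:-1}               # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it through the shell.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $1"; else eval "$1"; fi
}

# Usage: write_tape TAR_FILE
write_tape() {
  local tar_file=$1
  # 1. encryption, 2. compression
  run "ssh $HOST sudo stenc -f $TAPE -e on -d on --no-allow-raw-read --ckod -k tape.key -a 1"
  run "ssh $HOST sudo mt -f $TAPE compression 1"
  # 3. encryption and drive status (printed here, redirect to a file to keep it)
  run "ssh $HOST sudo stenc -f $TAPE"
  run "ssh $HOST sudo tapeinfo -f $SG_DEV"
  # 4. rewind, 5. stream the tar through mbuffer to the tape
  run "ssh $HOST sudo mt -f $TAPE rewind"
  run "cat $tar_file | ssh $HOST \"mbuffer -f -m $MBUFFER_SIZE -T $MBUFFER_FILE -P 90 | dd of=$TAPE bs=1M iflag=fullblock\""
}

write_tape tape.tar
```

Running it with DRY_RUN=0 executes the same commands for real, so the placeholders have to be replaced first.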
As I said, the tape drive is actually connected to another Linux VM on the network, so I run all the commands through ssh. Naturally, the tape drive has to be powered on and a cartridge inserted before these are run. I have a 10G network, so raw network speed is not a bottleneck.
I think there is no need to enable encryption and compression every time (at least until the drive is powered down), but I do it every time anyway so that the script always does the same thing regardless of the state of the tape drive.
The compression and encryption on the tape drive have no performance penalty, so I enable both. Encryption is also a must, since anything written on the tape can be read by anyone who gets hold of it. The encryption is AES-256-GCM-128, so it is pretty strong and industry standard.
tape.key specified in the stenc command is a file containing a 256-bit (32-byte) key, stored as a hex string. You have to keep tape.key safe and backed up, otherwise you cannot read the tapes back.
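One way to produce such a key file is sketched below, assuming the format is just 64 hex characters on one line as described above. The function names are made up, and the example writes a throwaway key into a temporary directory rather than touching a real tape.key; the helper also refuses to overwrite an existing file, since overwriting a key in use would make the tapes unreadable.

```shell
#!/usr/bin/env bash
# Print 32 random bytes (256 bits) as 64 lowercase hex characters.
gen_key_hex() {
  od -An -tx1 -N32 /dev/urandom | tr -d ' \n'
  echo
}

# Usage: make_key_file PATH
# Writes a new key readable only by the owner; never overwrites a file.
make_key_file() {
  local path=$1
  if [ -e "$path" ]; then
    echo "refusing to overwrite $path" >&2
    return 1
  fi
  ( umask 077; gen_key_hex > "$path" )
}

# Example: a throwaway key in a temporary directory.
demo_dir=$(mktemp -d)
make_key_file "$demo_dir/tape.key"
ls -l "$demo_dir/tape.key"
```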
For the reasons I wrote above, I use mbuffer on the Linux VM. Basically I send (cat) the tar file over the network using ssh, mbuffer caches it (on a local NVMe drive), and when the cache is almost (90%) full it starts sending the data to the tape drive in 1M blocks using dd. I am using a 64G cache for mbuffer at the moment, but I experiment with other settings from time to time.
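To get a feeling for what the buffer buys, here is a rough estimate of how long a pre-filled buffer keeps the drive streaming at full speed when the source is slower than the drive. It is a simplification under assumptions: the buffer is counted in decimal MB (64G taken as 64000 MB), the drive drains at a constant 140 MB/s, and the source rate of 100 MB/s is only an example.

```shell
#!/usr/bin/env bash
# Usage: stream_seconds BUFFER_MB FILL_FRACTION DRIVE_MBPS SOURCE_MBPS
# Seconds until the buffer runs empty while the drive drains it faster
# than the source refills it.
stream_seconds() {
  awk -v buf="$1" -v fill="$2" -v drive="$3" -v src="$4" 'BEGIN {
    if (src >= drive) { print "unlimited"; exit }   # buffer never drains
    printf "%.0f\n", buf * fill / (drive - src)
  }'
}

stream_seconds 64000 0.9 140 100   # prints 1440 (24 minutes)
stream_seconds 64000 0.9 140 0     # prints 411 (source stalled completely)
```

So with a source that averages 100 MB/s, a 90% full 64G buffer covers about 24 minutes of full-speed writing before the drive would have to slow down or stop.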
It takes at minimum around 3 hours to write an LTO-5 cartridge at full capacity (1.5 TB). The actual time depends on how well you can supply the data, as I mentioned above. Using the above method I get over 100 MB/s; the last run was 120 MB/s.
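The 3 hour figure follows directly from capacity divided by speed. A small sketch, assuming decimal units (1.5 TB = 1,500,000 MB) and a constant write rate; the function name is only for illustration.

```shell
#!/usr/bin/env bash
# Usage: hours_to_fill CAPACITY_MB RATE_MBPS
hours_to_fill() {
  awk -v cap_mb="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", cap_mb / rate / 3600 }'
}

hours_to_fill 1500000 140   # prints 2.98 (drive's uncompressed speed)
hours_to_fill 1500000 120   # prints 3.47
hours_to_fill 1500000 100   # prints 4.17
```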
I sometimes check the tape write speed with tapestat on the Linux VM.
Because a single backup set I have is less than 1.5 TB, it fits on a single cartridge. To keep a few backups on different cartridges for redundancy, I rotate 3 or 4 cartridges for a backup set that changes. For the backups that do not change often, I keep 1-2 copies.
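A rotation like this can be as simple as picking the next cartridge round-robin from the run number. The sketch below is only an illustration; the cartridge labels and the function name are made up.

```shell
#!/usr/bin/env bash
# Usage: pick_media RUN_NUMBER LABEL...
# Prints the label of the cartridge to use for the given backup run.
pick_media() {
  local run=$1; shift
  local -a labels=("$@")
  echo "${labels[$(( run % ${#labels[@]} ))]}"
}

for run in 0 1 2 3 4; do
  echo "run $run -> $(pick_media "$run" SET-A SET-B SET-C)"
done
```

With three cartridges, run 3 goes back to SET-A, so the two most recent previous backups always stay untouched on the other cartridges.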
So what is the cost of this? Very roughly:
- LTO-5 tape drive, used ~300 USD
- A box of LTO-5 media (assuming you need only a few), new ~100 USD
- Not urgent, but eventually you will need a cleaning cartridge, new ~30 USD
- SAS controller card with external SAS ports, used ~50 USD
- SAS cable, used ~15 USD
So it is around 500 USD. For another 100 USD or so, depending on the quantity, you can purchase lots of used LTO-5 media if needed.
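The rough total, and what the media cost per TB at the prices I paid (around 20 USD new and 7 USD used per 1.5 TB cartridge), can be added up like this. The function names are only for illustration.

```shell
#!/usr/bin/env bash
# Sum a list of whole-USD amounts.
sum_usd() {
  local total=0 item
  for item in "$@"; do total=$(( total + item )); done
  echo "$total"
}

# Usage: usd_per_tb PRICE_USD CAPACITY_TB
usd_per_tb() {
  awk -v price="$1" -v tb="$2" 'BEGIN { printf "%.2f\n", price / tb }'
}

# drive, box of new media, cleaning cartridge, SAS card, SAS cable
echo "setup: ~$(sum_usd 300 100 30 50 15) USD"   # prints setup: ~495 USD
usd_per_tb 20 1.5   # new media: prints 13.33
usd_per_tb 7 1.5    # used media: prints 4.67
```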
An LTO cartridge is said to have a lifetime of around 250 full passes (in practice this might be higher). So even if you do a daily backup that fills one cartridge, it should last about 250 days, which is the better part of a year.
The cleaning cartridge has a lifetime of 50 cleanings. I think a cleaning is recommended after 50 full passes, so even if you write one full tape daily, one cleaning cartridge should last much longer than a year. Cleaning is requested by the drive, so you only do it when it is requested.
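Taking the figures above at face value (250 full passes per cartridge, 50 cleanings per cleaning cartridge, and my assumption of one cleaning per 50 full passes), the lifetimes at one full pass per day work out as follows.

```shell
#!/usr/bin/env bash
# Usage: days_of_use RATED_PASSES PASSES_PER_DAY
days_of_use() { echo $(( $1 / $2 )); }

days_of_use 250 1              # data cartridge: prints 250
days_of_use $(( 50 * 50 )) 1   # cleaning cartridge: prints 2500
```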
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.