Adventures with Linear Tape-Open (LTO) at Home in 2023
Why and Why LTO-5
Why tape backup, why LTO and why now? I started to think about this because I wanted to migrate from my QNAP NAS to a TrueNAS server. I thought I could use Google Cloud Storage for transferring the data as I have a pretty fast WAN connection, but I realized the network egress (download from Google) cost is very high. In the end, I purchased extra HDDs and used my PC for the transfer, but meanwhile I realized I could have used a tape backup. I was also thinking about having a backup in addition to the cloud backup I am using, so it made me consider tape backup seriously.
I used tapes for backup in the late 90s when I was working at data centers, but I have not used them directly since. So I had no idea what the current status was. It seems we now have LTO tape drives and media (cartridges), and the latest generation is LTO-9, which can store 18TB uncompressed/45TB compressed on a single cartridge. That is a lot, but also very expensive for home use. To give you an idea, an HP external SAS LTO-9 drive costs around 5000 USD, and a single cartridge is around 150 USD.
Many drives and media of previous LTO generations can still be found on the second-hand market. I did some research and found LTO-5 to be the sweet spot. It has 1.5TB uncompressed/3TB compressed capacity and encryption capability. LTO-5 media can also be found at acceptable prices. I purchased the drive for around 300 USD, new media for around 20 USD each, and used media for around 7 USD each.
There can be a few different versions of the tape drives within the same generation from the same manufacturer. For example, there are internal SAS drives and external SAS drives in half height (HH) or full height (FH). Because I am going to use it only occasionally, I did not want an internal drive that would be powered all the time (and it might be noisy). External drives have a power on/off button so they can be turned on and off at any time (SAS supports hot-plug). However, as all these drives are SAS, a SAS controller card is also needed. I purchased a used LSI 9207e for this purpose (around 50 USD) and SAS cables (around 30 USD for 2x 1.8m cables). HH drives are cheaper, so I limited my search to external HH LTO-5 drives. There are also Fibre Channel (FC) drives. I believe most of these are parts of a larger tape system/library, and I have not looked into them.
HP, IBM and Quantum branded products can be found second hand. I think the software and firmware of HP and IBM drives cannot be accessed freely (you need a support contract), so I decided to get a Quantum. The LTO-5 generation drives in particular look alike across vendors; I do not know if one vendor is the main manufacturer, or if they are co-developed.
I purchased a used Quantum Ultrium HH5 Model C (3580 H5S) and later a used Model B (TC-L52BN). Judging from the outside, the Model C is similar to IBM models, and the Model B is similar to HP models. I actually like the Model C, but its fan is noisier than the Model B's. The Model B is a little bit bigger, so I think it might have a bigger and therefore quieter fan. From the usage point of view, they are the same for me. The only difference I observed other than the noise is that the Model C returns more diagnostic data, but I do not know if that is really the case or if I am using the tool wrong or using the wrong tool. Both units I purchased came with the latest firmware, so there was no need to upgrade the firmware.
I started using Model B first. However, after some months of use, first the front panel of Model B broke, and then I started to experience errors. So I switched to Model C. I ran tests on Model B and the SCSI Interconnect test returned an error. As I am sure the cable and HBA are working fine, I think the unit has a problem. I got rid of the Model B (actually sold it as defective), and I am using Model C now.
For media, I first purchased a small box (5x) of new Quantum LTO-5 cartridges. Then, I purchased a used lot of 40x IBM LTO-5 cartridges. I also purchased a new Quantum tape head cleaning cartridge. A disadvantage of getting used media is that they might come without the plastic cover, and finding the plastic cover is more difficult than finding used media. A good container then becomes a must for storing them. I also made a 3D model of the case and printed some; it is not perfect but better than nothing. I have not experienced any issue with the used LTO-5 cartridges yet.
Update: As I could not resolve the problem of plastic covers, I purchased two boxes (5x) of LTO-2 cartridges for a small amount, disposed of the cartridges and kept only the plastic covers.
Instead of LTO-5, it is also possible to go with LTO-6. LTO-6 tape drives can be more than three times more expensive than LTO-5 drives, but sometimes good offers can be found. The cost of new LTO-6 media is not very different (maybe it is harder to find used media, I do not know) and you can still write to LTO-5 media if needed. LTO drives can read two generations back and write one generation back. So it makes sense to look at LTO-6 drives as well.
How I use the tape backup
I know this is not how the tape backups are used, it is just my simple way of using the tape technology at home, at least for the moment.
I have a Linux VM that I solely use for the tape, so the LSI 9207e is assigned to this VM. I have installed the usual tape related packages.
Tape is an interesting technology. There is a very long (really very long, 846m) tape inside one LTO-5 cartridge. This tape has multiple bands, and the tape head can also write to multiple bands at the same time (but not to all) and in both directions. So the tape has to be wound from end to end multiple times (I think 80 times to fill the whole tape). There is one spool inside the cartridge and another spool inside the drive, and the tape is wound back and forth between these while the drive is operating. So it is a lot of movement (~67km for the full capacity). It is naturally very bad at random access, but it can sustain a sequential speed for a long time (until all the tape is wound in one direction), and because it can work in both directions, I think even the pause at the end is very short.
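The travel figure can be checked with quick arithmetic. This is a minimal sketch, assuming the 80 end-to-end passes estimated above and the 846 m tape length; `tape_travel_m` is a hypothetical helper name.

```shell
#!/bin/sh
# Total tape travel for one full write, in metres.
# Arguments: number of end-to-end passes, tape length in metres.
tape_travel_m() {
    echo $(( $1 * $2 ))
}

# 80 passes (an estimate) over an 846 m LTO-5 tape
travel_m=$(tape_travel_m 80 846)
echo "travel: ${travel_m} m (~$(( travel_m / 1000 )) km)"
```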
For LTO-5, the uncompressed write speed is 140MB/s. Uncompressed speed means that if the data cannot be compressed further by the tape drive at all, then you need to supply data at this rate, so it is a minimum. This may not sound like much, but it is continuous (similar to sequential disk speed), and when you do not provide enough data, the tape has to stop and might rewind etc., so you will actually lose a lot of time. I think it is impossible to do this without some type of memory buffer on the machine where the tape drive is directly attached. That means it is not possible to just copy a set of files to the tape, because that requires random disk access and will be way slower than 140MB/s.
I decided to use my Windows PC as the consolidation point of the data to be sent to tape. This is primarily because most of the files I would like to back up are on my PC, or I can fetch or copy them to the PC easily. I use PCIe M.2 NVMe drives. These are small in capacity (4TB), but at the moment that is enough for the tar files I am creating (if not, I can use the HDDs). I have a simple 7z script to create a tar file containing everything I would like to back up to tape.
Update: I started using an HDD on my PC for the tape operations. If the tape operation, either making the tar or sending the tar to the Linux VM where the tape is attached, is the only operation, HDD speed is OK. However, if making and sending the tar are done at the same time, sending is not fast enough. So if you plan to use the disk for more than one operation at the same time, I recommend using an SSD.
7z -bb0 -spf2 -sse -ttar a tape.tar @%1
This creates tape.tar file from the files listed in the file given in the first argument (%1). I have different file lists for different backups (e.g. for myself, for my wife, for some projects only, for music library etc.)
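For comparison, the same list-driven approach could be sketched on a Linux machine with tar's `-T` option, which reads the paths to archive from a file (one per line). This is an illustration only, not the 7z script used here; `make_tar_from_list` is a hypothetical helper name.

```shell
#!/bin/sh
# Build a tar archive from a list file that holds one path per line.
# Arguments: list file, output archive.
make_tar_from_list() {
    tar -cf "$2" -T "$1"
}

# Example (hypothetical file names):
#   make_tar_from_list music-library.txt tape.tar
```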
Making the tar can take a long time depending on how many files you have. I also back up some folders in my Dropbox, and naturally it is much faster if these folders are kept offline. Also, if you have a folder with lots of files inside (>5000 or so) and you do not actively use it, it makes sense to zip it, not for compression but to have just one file instead of thousands.
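Finding such folders can be automated. The following is a minimal sketch, assuming it is run from a Linux shell (for example WSL) against the folders to be backed up; `crowded_dirs` is a hypothetical helper and the 5000 threshold is just the rough figure mentioned above.

```shell
#!/bin/sh
# Print directories under a root that directly contain more files than a threshold.
# Arguments: root directory, threshold (default 5000).
# Output: one "<file count> <directory>" line per matching directory.
crowded_dirs() {
    root=$1
    limit=${2:-5000}
    find "$root" -type d | while IFS= read -r dir; do
        # Count only regular files directly inside this directory
        count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
        if [ "$count" -gt "$limit" ]; then
            echo "$count $dir"
        fi
    done
}

# Example (hypothetical path):
#   crowded_dirs /mnt/c/Users/me/Projects
```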
Then, I use Ubuntu on WSL on PC for actually writing to tape. The reason I am using Windows for creating tar and not Ubuntu is it is less problematic to use Windows for accessing some files on cloud (Dropbox etc.) or on network.
On Ubuntu (on WSL), I have a script that does the following:
- enable tape encryption:
ssh $HOST sudo stenc -f $TAPE -e on -d on --no-allow-raw-read --ckod -k tape.key -a 1
- enable tape compression:
ssh $HOST sudo mt -f $TAPE compression 1
- save the encryption and tape status (with stenc and tapeinfo)
- rewind the tape:
ssh $HOST sudo mt -f $TAPE rewind
- send tar to mbuffer and then to the tape:
cat tape.tar | ssh $HOST "mbuffer -f -m $MBUFFER_SIZE -T $MBUFFER_FILE -P 90 | dd of=$TAPE bs=1M iflag=fullblock" 2>&1
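Put together, the steps above could be arranged roughly as follows. This is a sketch rather than the actual script: the host name, device path and buffer file path are placeholder assumptions, the status-saving step is left out, and a dry-run mode (on by default) only prints the commands instead of touching the drive.

```shell
#!/bin/sh
# Sketch of the write-to-tape steps. All default values are placeholders.
HOST=${HOST:-tapevm}                                   # hypothetical name of the tape VM
TAPE=${TAPE:-/dev/nst0}                                # assumed tape device on the VM
MBUFFER_SIZE=${MBUFFER_SIZE:-64G}
MBUFFER_FILE=${MBUFFER_FILE:-/var/tmp/mbuffer.cache}   # hypothetical cache file path
DRY_RUN=${DRY_RUN:-1}                                  # 1 = only print the commands

# Run a command on the tape VM, or just print it in dry-run mode.
remote() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "ssh $HOST $*"
    else
        ssh "$HOST" "$@"
    fi
}

# Enable encryption and compression, then rewind; stop at the first failure.
prepare_tape() {
    remote sudo stenc -f "$TAPE" -e on -d on --no-allow-raw-read --ckod -k tape.key -a 1 &&
    remote sudo mt -f "$TAPE" compression 1 &&
    remote sudo mt -f "$TAPE" rewind
}

# Stream the tar over ssh into mbuffer and then onto the tape.
write_tape() {
    pipeline="mbuffer -f -m $MBUFFER_SIZE -T $MBUFFER_FILE -P 90 | dd of=$TAPE bs=1M iflag=fullblock"
    if [ "$DRY_RUN" = "1" ]; then
        echo "cat tape.tar | ssh $HOST \"$pipeline\""
    else
        cat tape.tar | ssh "$HOST" "$pipeline" 2>&1
    fi
}

prepare_tape && write_tape
```

Running it with `DRY_RUN=0` would execute the commands for real, so the drive has to be powered on with a cartridge inserted first.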
As I said, the tape is actually connected to another Linux VM on the network. So I run all the commands through ssh. Naturally the tape drive should be powered on and a media has to be inserted before these are run. I have a 10G network at home, so the raw network speed is not a bottleneck.
I think there is no need to enable encryption and compression every time (the settings should stay until the drive is powered down), but I do it every time so that the script always does the same thing regardless of the state of the tape drive.
The compression and encryption on the tape drive have no performance penalty, so I enable both. Encryption is a must since anything written on the tape media can be read by anyone who has it. The encryption is AES-256-GCM-128, so it is pretty strong and industry standard. tape.key specified in the stenc command is a file containing a 256-bit (32 bytes) key (a hex string in the file). Obviously, you have to keep tape.key very safe, otherwise you cannot read the tapes back.
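A key of this shape (32 random bytes written as 64 hex characters) can be generated with standard tools. This is a minimal sketch; `gen_key_hex` is a hypothetical helper name, and the exact file layout stenc expects (for example a trailing newline) should be checked against its documentation.

```shell
#!/bin/sh
# Print a random 256-bit key as 64 lowercase hex characters (no newline).
gen_key_hex() {
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

key=$(gen_key_hex)
echo "key length: ${#key} hex characters"

# To store it, restrict permissions first, for example:
#   umask 077; gen_key_hex > tape.key
```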
Due to the sequential performance of the tape I mentioned before, I use mbuffer on Linux VM. Basically I send (cat) the tar file over the network using ssh, and mbuffer caches it (to a local NVMe drive) and when the cache is almost (90%) full, it starts sending it to the tape drive in 1M blocks using dd. I am using 64G cache for mbuffer at the moment.
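To get a feel for what this buffer buys, the following sketch estimates how long a 90% full buffer could keep the drive streaming if the incoming data stalled completely. It assumes 64G means 64 GiB and that the drive consumes 140 MB/s (decimal); `buffer_seconds` is a hypothetical helper name.

```shell
#!/bin/sh
# Seconds a partly filled buffer can feed the drive with no new input.
# Arguments: buffer size in GiB, fill level in percent, drive rate in MB/s.
buffer_seconds() {
    echo $(( $1 * 1073741824 * $2 / 100 / ($3 * 1000000) ))
}

echo "buffer covers about $(buffer_seconds 64 90 140) s of writing"
```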
It takes about 3 hours to write an LTO-5 cartridge at full capacity (1.5 TB). The actual time depends on how well you can supply the data, as I mentioned above. Using the above method, I get around 140MB/s, which is the maximum performance possible.
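A rough check of the best-case write time, assuming decimal units (1.5 TB = 1,500,000 MB), incompressible data and no stalls; `write_seconds` is a hypothetical helper name.

```shell
#!/bin/sh
# Seconds needed to write a given capacity at a sustained rate.
# Arguments: capacity in MB (decimal), rate in MB/s.
write_seconds() {
    echo $(( $1 / $2 ))
}

secs=$(write_seconds 1500000 140)   # 1.5 TB at 140 MB/s
printf '%dh %02dm\n' $(( secs / 3600 )) $(( secs % 3600 / 60 ))
```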
When needed, the tape write speed can be checked with tapestat on the Linux VM.
Because a single backup set I have is less than 1.5 TB, it fits on a single cartridge. In order to have backups on different cartridges for redundancy, I rotate 4 cartridges for the same backup set. Since I always take full backups, each tape is actually an independent copy.
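The rotation itself is simple modular arithmetic. A minimal sketch, assuming backup runs are numbered from 1 and cartridges are labeled 1 to 4; `cartridge_for_run` is a hypothetical helper name.

```shell
#!/bin/sh
# Pick which cartridge (1..count) to use for the n-th backup run (n starts at 1).
# Arguments: run number, number of cartridges in the rotation (default 4).
cartridge_for_run() {
    run=$1
    count=${2:-4}
    echo $(( (run - 1) % count + 1 ))
}

for run in 1 2 3 4 5 6; do
    echo "run $run -> cartridge $(cartridge_for_run "$run")"
done
```

With four cartridges, run 5 comes back to cartridge 1, so the oldest copy is always the one being overwritten.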
Cost
So what is the cost of this? Very roughly:
- LTO-5 tape drive, used ~300 USD
- Assuming you need only a few, then a box (5x) of LTO-5 media, new ~100 USD
- Not urgent but eventually you will need a cleaning cartridge, new ~30 USD
- SAS controller card with external SAS ports, used ~50 USD
- SAS cable, used ~15 USD
So it is around 500 USD. Add another 100 USD or so depending on the quantity if you purchase a lot of used LTO-5 media.
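Adding up the rough prices listed above (drive, box of media, cleaning cartridge, controller, cable):

```shell
#!/bin/sh
# Sum the approximate prices in USD from the cost list.
total=0
for cost in 300 100 30 50 15; do
    total=$(( total + cost ))
done
echo "total: ~${total} USD"
```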
An LTO cartridge is said to have a lifetime of around 250 full passes (in practice this might be higher). So if you do daily backups at the full capacity of one cartridge, it should last about 250 days, the better part of a year. When using one weekly, it means nearly 5 years.
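Taking the 250-pass figure literally (it is only a rating, and real media may last longer), and assuming one full backup uses up one pass, the arithmetic can be sketched as follows; `media_years` is a hypothetical helper name.

```shell
#!/bin/sh
# Years a cartridge lasts given its rated full passes and full backups per year.
# Arguments: rated passes, backups per year.
media_years() {
    awk -v passes="$1" -v per_year="$2" 'BEGIN { printf "%.1f\n", passes / per_year }'
}

echo "daily:  $(media_years 250 365) years"
echo "weekly: $(media_years 250 52) years"
```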
The cleaning cartridge has a life-time of 50 cleanings. A cleaning is recommended only when the drive asks for it, and I think after 50 full passes or so. Not sure how this is calculated but it is definitely not often, I still have not used the cleaning cartridge (after 6 months).
Update: I have used it once, I think in October 2023. As expected, the drive indicated the cleaning request and I just inserted the cleaning cartridge.
Hints for Scripting in Windows
Since I am using Windows to create the tar file, I am using a small BAT file to automate the process. However, I am not very familiar with Windows in terms of scripting. So here are a few useful things I have learned.
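It is sometimes required to check that an application is not running. The following code can be used for that (stm32cubeide.exe is used as an example). Note that "if errorlevel N" is true for any exit code of N or higher, so "if errorlevel 0" would always match; a successful qprocess (process found) is therefore tested with "if not errorlevel 1":
qprocess "stm32cubeide.exe" >NUL 2>&1
if not errorlevel 1 (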
echo "please close the STM32CubeIDE application !"
exit /b 1
)
echo "OK, STM32CubeIDE is not running"
I create a timestamp file when needed, and this code can be used for that:
echo %date%%time% > <timestamp_file_path>
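The format of %date% and %time% depends on the Windows regional settings. On the Ubuntu (WSL) side, a timestamp with a fixed format could be written like this; a small sketch, where `write_timestamp` is a hypothetical helper name.

```shell
#!/bin/sh
# Write the local time in a fixed, sortable format (e.g. 2023-10-15T21:30:00) to a file.
# Argument: path of the timestamp file.
write_timestamp() {
    date '+%Y-%m-%dT%H:%M:%S' > "$1"
}

# Example (hypothetical path):
#   write_timestamp /mnt/c/backup/last_run.txt
```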
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.