Benchmarking Kubernetes Storage Solutions
One of the most difficult subjects in the world of Kubernetes is storage. In our day-to-day operations we’ve often had to choose the best storage solution for our customers, but in a changing landscape of requirements and technical offerings, such a choice becomes a major task.
Faced with many options, we decided to benchmark storage solutions under real-life conditions, to generate the data required for a proper decision. In this article we’re going to share with you our methodology, our results, and our final choice.
The chosen storage providers for this evaluation were:
- Ceph on Rook and OpenShift Data Foundation (previously known as OpenShift Container Storage)
- Longhorn
- Gluster (benchmarked on APPUiO Exoscale OpenShift 3.11)
All of these benchmarks (except Gluster) were run on an OpenShift 4.7 cluster on Exoscale VMs.
We benchmarked Ceph with both unencrypted and encrypted storage for the OSDs (object storage daemons). We included Gluster in our evaluation for reference and comparison only, as that’s the solution we offered for storage on OpenShift Container Platform 3.x; we never intended to use Gluster as the storage engine for our new Kubernetes storage cluster product.
Methodology
We first created a custom Python script driving kubestr, which in turn orchestrates Fio. This script performed ten (10) iterations for each benchmark, each of which included the following operations in an isolated Fio run:
- Read IOPS
- Read bandwidth
- Write IOPS, with different frequencies of calls to fsync:
  - no fsync calls during the benchmark iteration
  - an fsync call after each operation (“fsync=1”)
  - an fsync call after every 32 operations (“fsync=32”)
  - an fsync call after every 128 operations (“fsync=128”)
- Write bandwidth, with the same four fsync frequencies (none, 1, 32, and 128)
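The benchmark matrix above can be sketched in a few lines of Python. The snippet below is an illustrative reconstruction, not our actual script: it enumerates the same combinations of operation, measurement, and fsync frequency, and shows how a kubestr invocation could be assembled (kubestr’s fio subcommand accepts a StorageClass via -s, a volume size via -z, and a custom job file via -f; the 20Gi size here is an assumed value).

```python
# Frequencies of fsync calls tested for write benchmarks (0 = no fsync).
FSYNC_FREQUENCIES = [0, 1, 32, 128]
ITERATIONS = 10  # each benchmark is repeated ten times


def build_benchmarks():
    """Enumerate every (operation, measurement, fsync) combination we run.

    Read benchmarks ignore fsync, so they appear once per measurement;
    write benchmarks appear once per fsync frequency.
    """
    benchmarks = []
    for measurement in ("iops", "bw"):
        benchmarks.append({"op": "read", "measurement": measurement, "fsync": 0})
        for fsync in FSYNC_FREQUENCIES:
            benchmarks.append({"op": "write", "measurement": measurement, "fsync": fsync})
    return benchmarks


def kubestr_command(storageclass, fiofile):
    # Assumed kubestr CLI shape: fio subcommand with StorageClass,
    # test volume size, and a custom fio job file.
    return ["kubestr", "fio", "-s", storageclass, "-z", "20Gi", "-f", fiofile]
```

With two measurements, this yields ten benchmark configurations per storage solution, each of which is then repeated for the ten iterations.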
This is the Fio configuration used for benchmarking:
[global]
randrepeat=0
verify=0
ioengine=libaio
direct=1
gtod_reduce=1
[job]
name=JOB_NAME (1)
bs=BLOCKSIZE (2)
iodepth=64
size=2G
readwrite=OP (3)
time_based
ramp_time=5s
runtime=30s
fsync=X (4)
1. We generate a descriptive fio job name based on the benchmark we’re executing: the operation (“read” or “write”) and the measurement (“bw” or “iops”) are concatenated as “OP_MEASUREMENT”, for example “read_iops”.
2. The block size for each operation executed by fio: 4K (4 kB) for IOPS benchmarks and 128K (128 kB) for bandwidth benchmarks.
3. The I/O pattern fio uses for the benchmark: randread for read benchmarks, randwrite for write benchmarks.
4. The number of operations to batch between fsync calls. This parameter has no influence on read benchmarks.
The Fio documentation describes the fsync parameter as follows:
“If writing to a file, issue an fsync(2) (or its equivalent) of the dirty data for every number of blocks given. For example, if you give 32 as a parameter, fio will sync the file after every 32 writes issued. If fio is using non-buffered I/O, we may not sync the file. The exception is the sg I/O engine, which synchronizes the disk cache anyway. Defaults to 0, which means fio does not periodically issue and wait for a sync to complete. Also see end_fsync and fsync_on_close.”
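The annotated configuration above can be turned into a small templating function. This is a hedged sketch of how our script could generate one fio job file per benchmark; the function name and signature are illustrative, but the rendered output matches the configuration and callouts shown above.

```python
def render_fio_job(op: str, measurement: str, fsync: int) -> str:
    """Render the fio job file for one benchmark run.

    op: "read" or "write"; measurement: "iops" or "bw";
    fsync: 0 (disabled), 1, 32, or 128.
    """
    job_name = f"{op}_{measurement}"                         # (1) e.g. "read_iops"
    blocksize = "4K" if measurement == "iops" else "128K"    # (2) 4K for IOPS, 128K for bandwidth
    readwrite = "randread" if op == "read" else "randwrite"  # (3) random I/O pattern
    return "\n".join([
        "[global]",
        "randrepeat=0",
        "verify=0",
        "ioengine=libaio",
        "direct=1",
        "gtod_reduce=1",
        "[job]",
        f"name={job_name}",
        f"bs={blocksize}",
        "iodepth=64",
        "size=2G",
        f"readwrite={readwrite}",
        "time_based",
        "ramp_time=5s",
        "runtime=30s",
        f"fsync={fsync}",                                    # (4) 0 means no periodic fsync
    ])
```

For example, `render_fio_job("read", "iops", 0)` produces a job named `read_iops` with `bs=4K` and `readwrite=randread`.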
Results
The following graph, taken from the full dataset for our benchmark (available for download and study), shows the type of comparison performed across all the solutions considered.

The table below gives an overview of the data gathered during our evaluation (each column shows mean ± standard deviation):
Storage solution | Read IOPS | Read bandwidth (MB/s) | Write IOPS, no fsync | Write bandwidth (MB/s), no fsync | Write IOPS, fsync=1 | Write bandwidth (MB/s), fsync=1 |
---|---|---|---|---|---|---|
OCS/Rook.io Ceph RBD (unencrypted OSDs) | 42344.21 ± 885.52 | 1585 ± 32.655 | 9549.14 ± 371.11 | 503.208 ± 12.544 | 305.18 ± 15.65 | 35.591 ± 1.349 |
OCS/Rook.io CephFS (unencrypted OSDs) | 44465.21 ± 1657.91 | 1594 ± 82.522 | 9978.00 ± 456.97 | 512.788 ± 8.049 | 8808.47 ± 357.87 | 452.086 ± 10.154 |
OCS/Rook.io Ceph RBD (encrypted OSDs) | 36303.06 ± 2254.87 | 1425 ± 59.720 | 6292.75 ± 424.91 | 310.520 ± 63.047 | 225.00 ± 12.11 | 22.804 ± 1.031 |
OCS/Rook.io CephFS (encrypted OSDs) | 36343.35 ± 1234.93 | 1405 ± 92.868 | 6020.49 ± 251.16 | 278.486 ± 49.101 | 5004.28 ± 152.01 | 291.729 ± 17.367 |
Longhorn (unencrypted backing disk) | 11298.36 ± 664.99 | 295.458 ± 25.458 | 5975.43 ± 697.14 | 111.197 ± 10.322 | 391.57 ± 26.11 | 29.993 ± 1.544 |
Gluster | 22957.87 ± 345.40 | 976.511 ± 45.268 | 2630.89 ± 69.21 | 531.88 ± 48.22 | 133.563 ± 11.455 | 43.549 ± 1.656 |
Unencrypted Rook/OCS numbers are from OCS; encrypted Rook/OCS numbers are from vanilla Rook.
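The mean ± standard deviation figures in the table can be reproduced from the per-iteration samples in the published dataset with Python’s standard statistics module. The sample values below are made up for illustration; only the aggregation method is what we describe above.

```python
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation) for one benchmark's iterations."""
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical read-IOPS readings from ten iterations (illustrative only).
iops_samples = [42000, 43100, 41800, 42500, 42900, 41500, 42300, 43400, 42100, 42800]
mean, stdev = summarize(iops_samples)
print(f"{mean:.2f} ± {stdev:.2f}")
```

Applying this to each benchmark’s ten iterations yields one table cell per storage solution and metric.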
Conclusion
After careful evaluation of the results shown above, we chose Rook to implement our APPUiO Managed Storage Cluster product. Rook allows us to have a single product for all the Kubernetes distributions we’re offering at VSHN.
We have released the scripts on GitHub for everyone to verify our results, and published even more data in our Products documentation. Feel free to check the data, run these tests in your own infrastructure, and of course, your pull requests are more than welcome.