
Benchmarking Kubernetes Storage Solutions

23. Jul 2021

One of the most difficult subjects in the world of Kubernetes is storage. In our day-to-day operations we’ve often had to choose the best storage solution for our customers, but in a changing landscape of requirements and technical offerings, making that choice becomes a major task.

Faced with so many options, we decided to benchmark storage solutions under real-life conditions, to generate the data required for a proper decision. In this article we’re going to share with you our methodology, our results, and our final choice.

The storage solutions chosen for this evaluation were:

  • Ceph (RBD and CephFS), deployed both via OpenShift Container Storage (OCS) and via Rook.io
  • Longhorn
  • Gluster

All of these benchmarks (except those for Gluster) were run on an OpenShift 4.7 cluster on Exoscale VMs.

We benchmarked Ceph with both unencrypted and encrypted storage for the OSDs (Object Storage Daemons). We included Gluster in our evaluation for reference and comparison only, as that’s the solution we offered for storage on OpenShift Container Platform 3.x. We never intended to use Gluster as the storage engine for our new Kubernetes storage cluster product.

Methodology

We first created a custom Python script driving kubestr, which in turn orchestrates Fio. This script performed ten (10) iterations of each benchmark, each of which included the following operations in an isolated Fio run (a sketch of the driver loop follows the list):

  • Read IOPS
  • Read bandwidth
  • Write IOPS, with different frequencies of calls to fsync:
    • no fsync calls during each benchmark iteration
    • an fsync call after each operation (“fsync=1”)
    • an fsync call after every 32 operations (“fsync=32”)
    • an fsync call after every 128 operations (“fsync=128”)
  • Write bandwidth, with different frequencies of calls to fsync:
    • no fsync calls during each benchmark iteration
    • an fsync call after each operation (“fsync=1”)
    • an fsync call after every 32 operations (“fsync=32”)
    • an fsync call after every 128 operations (“fsync=128”)
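
The following Python fragment sketches how such a driver loop can look. It assumes that kubestr’s fio subcommand accepts a storage class via -s and a Fio job file via -f, and that the job files have been rendered beforehand; the flag names, file naming scheme, and storage class name are our illustrative assumptions, not the published script.

# benchmark_driver.py -- illustrative sketch of the benchmark loop.
import subprocess

ITERATIONS = 10  # ten isolated Fio runs per benchmark

# (operation, measurement, fsync batching; 0 means no fsync calls)
BENCHMARKS = [
    ("read", "iops", 0),
    ("read", "bw", 0),
    *[("write", "iops", f) for f in (0, 1, 32, 128)],
    *[("write", "bw", f) for f in (0, 1, 32, 128)],
]

def run_fio_benchmark(storage_class: str, fio_file: str) -> str:
    """Run one isolated Fio benchmark through kubestr and return its output."""
    result = subprocess.run(
        ["kubestr", "fio", "-s", storage_class, "-f", fio_file],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    for operation, measurement, fsync in BENCHMARKS:
        # Job files like "write_iops_fsync32.fio" are assumed to exist already.
        fio_file = f"{operation}_{measurement}_fsync{fsync}.fio"
        for i in range(ITERATIONS):
            run_fio_benchmark("my-storage-class", fio_file)
            print(f"{operation}_{measurement} (fsync={fsync}): iteration {i + 1} done")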

This is the Fio configuration used for benchmarking:

[global]
randrepeat=0
verify=0
ioengine=libaio
direct=1
gtod_reduce=1
[job]
name=JOB_NAME     (1)
bs=BLOCKSIZE      (2)
iodepth=64
size=2G
readwrite=OP      (3)
time_based
ramp_time=5s
runtime=30s
fsync=X           (4)
  1. We generate a descriptive fio job name based on the benchmark we’re executing. The job name is generated by taking the operation (“read” or “write”) and the measurement (“bw” or “iops”) and concatenating them as “OP_MEASUREMENT”, for example “read_iops”.
  2. The blocksize for each operation executed by fio. We use a blocksize of 4K (4 KiB) for IOPS benchmarks and 128K (128 KiB) for bandwidth benchmarks.
  3. The I/O pattern which fio uses for the benchmark: randread for read benchmarks, randwrite for write benchmarks.
  4. The number of write operations to batch between fsync calls; this parameter has no influence on read benchmarks. (A sketch of how these placeholders are filled in follows the fio documentation excerpt below.)

If writing to a file, issue an fsync(2) (or its equivalent) of the dirty data for every number of blocks given. For example, if you give 32 as a parameter, fio will sync the file after every 32 writes issued. If fio is using non-buffered I/O, we may not sync the file. The exception is the sg I/O engine, which synchronizes the disk cache anyway. Defaults to 0, which means fio does not periodically issue and wait for a sync to complete. Also see end_fsync and fsync_on_close.

Fio documentation
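
To make the parameterization concrete, here is a minimal sketch of how such a job file could be rendered from the operation, measurement, and fsync setting described in the callouts above; the template constant and helper name are ours, not taken from the published scripts.

# render_fio_job.py -- illustrative sketch: fill in the placeholders (1)-(4).
FIO_TEMPLATE = """[global]
randrepeat=0
verify=0
ioengine=libaio
direct=1
gtod_reduce=1
[job]
name={name}
bs={bs}
iodepth=64
size=2G
readwrite={op}
time_based
ramp_time=5s
runtime=30s
fsync={fsync}
"""

def render_job(operation: str, measurement: str, fsync: int = 0) -> str:
    """Render a Fio job file for one benchmark run."""
    name = f"{operation}_{measurement}"                       # (1) e.g. "read_iops"
    bs = "4K" if measurement == "iops" else "128K"            # (2) 4K for IOPS, 128K for bandwidth
    op = "randread" if operation == "read" else "randwrite"   # (3) IO pattern
    return FIO_TEMPLATE.format(name=name, bs=bs, op=op, fsync=fsync)  # (4) fsync batching

if __name__ == "__main__":
    print(render_job("write", "iops", fsync=32))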

Results

The following graph, taken from the full dataset of our benchmark (available for download and study), shows the type of comparison performed across all considered solutions.

Figure 1. Read IOPS (higher is better)

The table below gives an overview of the data gathered during our evaluation; each cell shows mean ± standard deviation over the ten iterations (a short script for computing such summaries follows the table):

Storage solution | read IOPS | read bandwidth (MB/s) | write IOPS, no fsync | write bandwidth (MB/s), no fsync | write IOPS, fsync=1 | write bandwidth (MB/s), fsync=1
OCS/Rook.io Ceph RBD (unencrypted OSDs) | 42344.21 ± 885.52 | 1585 ± 32.655 | 9549.14 ± 371.11 | 503.208 ± 12.544 | 305.18 ± 15.65 | 35.591 ± 1.349
OCS/Rook.io CephFS (unencrypted OSDs) | 44465.21 ± 1657.91 | 1594 ± 82.522 | 9978.00 ± 456.97 | 512.788 ± 8.049 | 8808.47 ± 357.87 | 452.086 ± 10.154
OCS/Rook.io Ceph RBD (encrypted OSDs) | 36303.06 ± 2254.87 | 1425 ± 59.720 | 6292.75 ± 424.91 | 310.520 ± 63.047 | 225.00 ± 12.11 | 22.804 ± 1.031
OCS/Rook.io CephFS (encrypted OSDs) | 36343.35 ± 1234.93 | 1405 ± 92.868 | 6020.49 ± 251.16 | 278.486 ± 49.101 | 5004.28 ± 152.01 | 291.729 ± 17.367
Longhorn (unencrypted backing disk) | 11298.36 ± 664.99 | 295.458 ± 25.458 | 5975.43 ± 697.14 | 111.197 ± 10.322 | 391.57 ± 26.11 | 29.993 ± 1.544
Gluster | 22957.87 ± 345.40 | 976.511 ± 45.268 | 2630.89 ± 69.21 | 133.563 ± 11.455 | 531.88 ± 48.22 | 43.549 ± 1.656

Unencrypted Rook/OCS numbers are from OCS; encrypted Rook/OCS numbers are from vanilla Rook.
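
Each cell in the table can be reproduced from the ten per-iteration samples with a few lines of Python, as sketched below; the sample values are made up purely to illustrate the computation.

# aggregate.py -- illustrative sketch: format per-iteration results
# as "mean ± standard deviation" as in the table above.
from statistics import mean, stdev

def summarize(samples: list[float]) -> str:
    """Format benchmark samples as 'mean ± stddev' (sample standard deviation)."""
    return f"{mean(samples):.2f} ± {stdev(samples):.2f}"

# Ten hypothetical read-IOPS samples (made-up values, for illustration only).
read_iops_samples = [42010.3, 43999.7, 41675.2, 42301.8, 42890.1,
                     41788.9, 42555.6, 43560.0, 41100.5, 42344.2]
print("read IOPS:", summarize(read_iops_samples))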

Conclusion

After careful evaluation of the results shown above, we chose Rook to implement our APPUiO Managed Storage Cluster product. Rook allows us to have a single product for all the Kubernetes distributions we offer at VSHN.

We have released the scripts on GitHub for everyone to verify our results, and published even more data in our Products documentation. Feel free to check the data and run these tests on your own infrastructure; and of course, pull requests are more than welcome.

Simon Gerber

Simon Gerber is a DevOps engineer at VSHN.
