How to Benchmark Object Storage Solutions: The CPU Bottleneck Case Study

Lately, we in the Cloud Modeling and Data Analytics group at Seagate have been developing a methodology for benchmarking object storage solutions. Today, I would like to introduce our methods and offer one concrete illustration of what our tests can tell us.

We use Intel’s COSBench as one building block of our object storage benchmarking platform. On top of COSBench, we built an elaborate script to extend its capabilities and to automate complicated benchmarks. The script creates a series of complex workloads that vary in object size, worker count, and read:write ratio, and then submits them sequentially to our Swift-based object storage system. For each workload, we monitor the bandwidth, throughput, success ratio, and response time via COSBench. Simultaneously, we use the monitoring software Ganglia to record hardware statistics such as network traffic, drive temperature, and CPU and RAM usage. Our scripts then automatically correlate the COSBench and Ganglia data, providing an in-depth, per-workload performance report.
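To make the idea of a workload series concrete, here is a minimal Python sketch of how such a sweep could be enumerated. It is not our actual script: the function name `build_workloads`, the dictionary keys, and the example sizes, worker counts, and read percentages are all hypothetical, chosen only for illustration. The sketch orders the workloads so that object size never decreases as the workload number goes up.

```python
from itertools import product


def build_workloads(object_sizes_kb, worker_counts, read_percentages):
    """Enumerate one workload per combination of the three parameters.

    Object size is the slowest-varying parameter, so it never decreases
    as the workload number increases.
    """
    combos = product(sorted(object_sizes_kb), worker_counts, read_percentages)
    workloads = []
    for number, (size_kb, workers, read_pct) in enumerate(combos, start=1):
        workloads.append({
            "id": number,
            "object_size_kb": size_kb,
            "workers": workers,
            "read_pct": read_pct,
            "write_pct": 100 - read_pct,
        })
    return workloads


# Hypothetical parameter values, for illustration only.
workloads = build_workloads(
    object_sizes_kb=[4, 64, 1024],
    worker_counts=[1, 8],
    read_percentages=[0, 50, 100],
)
for w in workloads[:3]:
    print(w)
```

Each dictionary would then be translated into whatever format the benchmarking tool expects and submitted one at a time; that translation step is deliberately left out here.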

Let me give you an example of what we can learn from this benchmarking method. I ran our benchmark on a single server, swapping in three different CPUs. Figure 1 shows the bandwidth observed for 200 different workloads on the server with each of the three processors. (At the moment, the details of the workloads aren’t important; just know that the object size increases as the workload number increases.) CPU 2 and CPU 3 provide the same bandwidth, but CPU 1 reaches only half of this performance.

Figure 1: Bandwidth observed for 200 different workloads for the same server using three different CPUs.

What is causing the reduced bandwidth?

Because we monitor and correlate the hardware statistics, I can also examine the CPU usage during each workload, which is illustrated in Figure 2. CPU 1 is clearly working much harder than the other two, and it is probably the system bottleneck behind the reduced bandwidth. This makes sense because CPU 1 was the least powerful of the three (CPU 1 had 4 cores, 4 threads, and ran at 1.8 GHz; CPU 2 had 6 cores, 12 threads, and ran at 2.0 GHz; CPU 3 had 8 cores, 16 threads, and ran at 2.2 GHz). Another interesting observation is that even though CPU 2 is less powerful than CPU 3, both give the same overall bandwidth. This indicates that with either of these two processors, a system component other than the CPU becomes the bottleneck.

Figure 2: CPU utilization for the 200 workloads.
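The per-workload CPU view comes from lining up hardware samples with the time window of each workload. The Python sketch below shows one simple way this could be done. It is an assumption-laden illustration, not our production code: the function names, the `(workload_id, start, end)` and `(timestamp, cpu_pct)` tuple layouts, the sample values, and the 90% threshold are all hypothetical.

```python
from statistics import mean


def mean_cpu_per_workload(workload_windows, cpu_samples):
    """Average the CPU samples that fall inside each workload's time window.

    workload_windows: list of (workload_id, start_ts, end_ts) tuples
    cpu_samples:      list of (timestamp, cpu_pct) tuples
    Returns {workload_id: mean CPU %}, or None if no samples were recorded.
    """
    report = {}
    for workload_id, start, end in workload_windows:
        in_window = [pct for ts, pct in cpu_samples if start <= ts < end]
        report[workload_id] = mean(in_window) if in_window else None
    return report


def flag_cpu_bound(report, threshold_pct=90.0):
    """List workloads whose mean CPU usage meets an (assumed) threshold."""
    return [
        workload_id
        for workload_id, pct in report.items()
        if pct is not None and pct >= threshold_pct
    ]


# Made-up timestamps (seconds) and CPU readings, for illustration only.
windows = [(1, 0, 60), (2, 60, 120)]
samples = [(0, 40.0), (30, 50.0), (60, 95.0), (90, 97.0)]

report = mean_cpu_per_workload(windows, samples)
print(report)                  # {1: 45.0, 2: 96.0}
print(flag_cpu_bound(report))  # [2]
```

A workload flagged this way is only a candidate for a CPU bottleneck; confirming it still means comparing bandwidth against a faster processor, as in Figure 1.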

There you have it: by using a combination of COSBench for object storage testing and Ganglia for hardware monitoring, we are able to determine performance and identify bottlenecks in object storage systems. Stay tuned for more results!

Author: Kelsie Betsch