What is the best way to predict system-level performance accurately and cost-efficiently?

System-level performance is one of the most important metrics IT architects weigh when planning capacity expansions. Several approaches are available for estimating whether a storage system will provide acceptable performance to the user. Modeling system performance requires significant domain knowledge, but carefully interpreted and validated modeling results allow for a quick and cost-efficient evaluation of a wide range of possible solutions. On the other end of the spectrum, system-level testing provides the most accurate results possible. Carrying out system-level performance tests is costly and time-consuming, however, which limits the practicality of this method when evaluating a large number of possible solutions. A hybrid approach that tests key components and uses modeling to interpolate and extrapolate from those results can provide the best of both worlds. In this post we’ll discuss some of the benefits of benchmarking OpenStack Swift at the server level, as well as some of the challenges to reproducible results and predictive modeling of system-level performance.

There are several questions that anyone testing system-level performance must answer before they can start comparing systems. The most important question is what the present and future system-level workloads look like. The workload has a temporal aspect, often with 24-hour and 7-day frequency components, and it’s usually sufficient to provision for the maximum workload measured or predicted over the expected lifetime of the system. This workload then has to be broken down into its put (write) and get (read) components for all the object sizes observed. When decomposed in this fashion, the workload can be distilled down to a succinct representation that can be used for testing. Combining workload requirements with knowledge of customer requirements provides a powerful framework for solution optimization.
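The decomposition described above can be sketched in a few lines. This is a minimal illustration, not Swift tooling: the size buckets and log format are assumptions, and in practice the records would come from proxy-server access logs.

```python
from collections import Counter

# Hypothetical object-size buckets (bytes) used to summarize the workload.
SIZE_BUCKETS = [4 * 1024, 128 * 1024, 1024 * 1024, 10 * 1024 * 1024]

def bucket(size):
    """Map an object size to the smallest bucket that holds it."""
    for b in SIZE_BUCKETS:
        if size <= b:
            return b
    return SIZE_BUCKETS[-1]

def distill(records):
    """Reduce a request log of (operation, size) pairs to a
    (operation, size-bucket) -> fraction-of-requests mix."""
    counts = Counter((op, bucket(size)) for op, size in records)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}

# Toy log: one small PUT, one small GET, two medium GETs.
sample = [("PUT", 3000), ("GET", 3000), ("GET", 500_000), ("GET", 500_000)]
mix = distill(sample)
# mix maps e.g. ("GET", 1048576) to the fraction of such requests (0.5 here)
```

A mix like this, taken at the predicted peak, is the succinct workload representation that drives the benchmarks.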

With the workload data and customer requirements established, we have to decide at what scale we want to test. Testing the full system provides the most accurate results but is often not practical for comparison purposes. It’s also far more difficult to standardize enough parameters at the system level to ensure a meaningful comparison. Because of the difficulty of comparing differently architected and configured systems, we carry out object storage performance testing at the object server level, where we can explore various HDD configurations, among other aspects (Fig. 1). In a well-designed system, load balancers, proxy servers and authentication servers shouldn’t constrain the workhorse of the system – the object server. If they do, then a less capable, and more affordable, object server can be used to match the overall performance of the system. By testing object servers we can offer easy comparison of various configurations while keeping all other aspects of the system constant.
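How do raw timing samples from an object-server benchmark become the throughput curves plotted in Fig. 1? A simple, hedged sketch (the latency samples and concurrency level below are illustrative, not measured results from the post):

```python
def throughput(latencies_s, object_size_bytes, concurrency):
    """Estimate steady-state throughput from per-request latency samples.

    Assumes `concurrency` requests are kept in flight against the object
    server and that the samples are representative of steady state.
    """
    mean_latency = sum(latencies_s) / len(latencies_s)
    # Little's law: requests in flight = arrival rate * mean latency
    requests_per_s = concurrency / mean_latency
    mb_per_s = requests_per_s * object_size_bytes / 1e6
    return requests_per_s, mb_per_s

# Illustrative numbers: 128 KB objects, 8 concurrent requests,
# latency samples in seconds.
rps, mbps = throughput([0.010, 0.012, 0.008], 128 * 1024, concurrency=8)
# rps = 800 requests/s, mbps ~ 104.9 MB/s
```

Repeating this per object size and per operation (put vs. get) for each drive configuration yields exactly the kind of comparison shown in Fig. 1.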

Figure 1. In red, get (read) throughput, and in blue, put (write) throughput of the object server. Darker colors on the right are for the faster 7.2K RPM Constellation ES drive, which outperforms the 5.9K RPM Terascale drive (lighter colors on the left) for object sizes smaller than 1MB.

Object server performance results alone are rarely sufficient to predict system-level performance. By using detailed models of the full system that take into account network bottlenecks, we can predict system-level object storage performance based on the measured object-server performance. For example, if we were to use 21 object servers similar to the one benchmarked above with the faster 7.2K RPM drives, and design a reasonable system around them, we can expect up to 13K email archival requests to be serviced successfully with better than 17ms latency (Fig. 2). By leveraging benchmark results and combining them with modeling, we can quickly prototype the optimal system for our workloads and choose the best architecture and object storage servers.
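One simple form such a model can take is a bottleneck analysis: the system serves requests at the rate of its tightest constraint. The sketch below is an assumption-laden illustration, not the detailed model used in the post; the per-server request rate, replica count, and NIC speed are hypothetical parameters.

```python
def predicted_system_rps(n_servers, server_rps, object_size_bytes,
                         nic_gbps, replicas=3):
    """Predict system-level requests/s as the tightest of the object-server
    and network bottlenecks. All parameter values are assumptions."""
    # Aggregate object-server capacity, discounted for replica traffic.
    server_cap = n_servers * server_rps / replicas
    # Per-server NIC limit, converted to requests/s for this object size.
    nic_bytes_per_s = nic_gbps * 1e9 / 8
    network_cap = n_servers * nic_bytes_per_s / object_size_bytes
    return min(server_cap, network_cap)

# Hypothetical inputs: 21 servers, 2000 requests/s each at this object
# size, 128 KB objects, 10 Gbps NICs, 3 replicas.
capacity = predicted_system_rps(21, 2000, 128 * 1024, 10)
# Here the object servers, not the network, are the bottleneck: 14000 req/s
```

A real model would add queueing behavior to turn this capacity estimate into the latency-versus-load curve of Fig. 2, but even a bottleneck check like this quickly rules out unbalanced designs.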


Figure 2. Assuming email archival workloads and 21 object storage servers with 7.2K RPM Constellation ES drives and 10Gbps network interfaces we can expect to service 13K requests concurrently before the response time of the system becomes unacceptable (>17ms).


Author: Dimitar Vlassarev

2014-04-01T17:34:24+00:00
