Understanding Data Center Workloads with Virtual Instruments

Abstract

Virtual Instruments combines a storage workload modeling application, WorkloadWisdom, with purpose-built load generation appliances and data capture probes to help storage architects and engineers accurately characterize storage performance. Step one is to acquire and analyze production data to characterize the workload model. In 2015, Virtual Instruments changed the industry by automating the analysis of production workload data via the Workload Data Importer feature of WorkloadWisdom. This document summarizes how to characterize network storage workloads and how to determine the level of data required for a good, better, or best outcome. The more complete the data reported by an array or performance monitor, and the higher the resolution of the reporting period, the more accurately a production application can be modeled for testing. We describe what should be monitored and provided to produce a superior workload model from a production workload.

The goals of this document are two-fold:

1. Outline ways to characterize workloads for Enterprise data center applications. The goal is to cover the most common characteristics that are readily available and that matter when developing a good emulation of the workload. It is not intended to cover corner-case characteristics that would be required to test the full functionality of a storage array. The intent is to enable the development of a workload realistic enough to compare different devices, configurations, or firmware versions, or to detect degraded components in an infrastructure. Characterizing a workload completely enough to perform a full storage-manufacturer regression test of a storage subsystem is outside the scope of these recommendations.

2. Describe how the Workload Data Importer simplifies and automates the analysis of production storage workloads. While understanding the section “What to Characterize” is useful, it is no longer necessary, as the Importer now does the heavy lifting.

Virtual Instruments Services

As the leader in infrastructure performance optimization, the Virtual Instruments Professional Services organization helps teams characterize their workloads and model configurable workloads to test ‘what if’ scenarios against their most common workloads.

What to Characterize

There are four basic areas to consider when characterizing a workload for a storage environment (sketched as a simple data structure after this list):

1. A description of the size, scope, and configuration of the environment, including the number of servers, LUNs, volumes, and shares.
2. The patterns that describe when, how frequently, and in what ways data are accessed.
3. IOPS and throughput rates during the time period the data is gathered.
4. The impact on the network subsystem, in addition to the patterns observed on the array itself. Though this information is not used directly to model the workload, it enables an impact comparison between the emulated traffic and the actual production traffic, which helps measure how representative the emulated traffic is of real-world traffic.
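These four areas can be thought of as the minimum schema for a workload characterization. As an illustration only, the Python sketch below shows one way such a characterization might be recorded; the class and field names are our own assumptions and are not part of WorkloadWisdom or its Importer.

```python
# A minimal sketch, assuming Python 3.7+ dataclasses. All names here
# (WorkloadCharacterization, EnvironmentDescription, etc.) are
# hypothetical and not part of any Virtual Instruments product.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class EnvironmentDescription:
    """Area 1: size, scope, and configuration of the environment."""
    num_servers: int
    luns: List[str] = field(default_factory=list)
    volumes: List[str] = field(default_factory=list)
    shares: List[str] = field(default_factory=list)

@dataclass
class AccessPatterns:
    """Area 2: when, how frequently, and in what ways data are accessed."""
    read_pct: float                       # fraction of operations that are reads
    request_size_dist: Dict[int, float]   # request size (bytes) -> fraction of I/Os
    random_pct: float                     # random vs. sequential access

@dataclass
class RateSample:
    """Area 3: IOPS and throughput observed during one reporting interval."""
    timestamp: float          # seconds since epoch
    iops: float
    throughput_mbps: float

@dataclass
class WorkloadCharacterization:
    environment: EnvironmentDescription
    patterns: AccessPatterns
    rates: List[RateSample]
    # Area 4 (network-subsystem impact) is recorded for comparing emulated
    # traffic against production traffic, not to build the model itself.
    network_utilization_pct: float = 0.0
```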
The Importance of Granularity in Workload Characterization

Besides knowing what to characterize, it is important to understand the granularity of the collected data. Data granularity determines the quality of the resulting emulated workloads in terms of a good, better, or best characterization. It is important to consider:

1. How often and for how long the data is collected.
2. How completely the data represents workload elements such as LUNs, volumes/mount points, shares, directories/files, and LBAs.
3. How detailed the access patterns are in terms of data and metadata protocol command coverage.
4. How detailed the information is in terms of access patterns and request sizes.

The effect of the first consideration, the collection interval, is illustrated with a short example after the guidelines below.

High Level Guidelines

The realism of the Workload Model that can be created from the analyzed Workload Data depends heavily on the fidelity of the workload data provided by the storage arrays and performance monitoring tools. Because there is no industry standard for the data structures provided by the various storage arrays and performance monitoring tools, and because different data sources arrive at the same metrics in different ways, only a high-level guideline can be provided here as a starting point; a deeper technical discussion must be held to define the exact data set needed to create a parser / tool that simplifies workload modeling.

Protocol / General | Description                                          | Model Quality (1)
FC / iSCSI         | Per-LUN R/W, block size distribution, KPIs (2)       | Best
FC / iSCSI         | Per-LUN R/W, KPIs                                    | Good
FC / iSCSI         | Per-array R/W, KPIs                                  | Minimum
NFS / SMB          | Per-volume R/W, metadata buckets, block sizes, KPIs  | Best
NFS / SMB          | Per-array commands AND per-volume R/W/Other, KPIs    | Good
NFS / SMB          | Per-array commands OR per-volume R/W/Other, KPIs     | Minimum
General            | Data interval: 1s – 1min                             | Best
General            | Data interval: 1min – 5min                           | Good
General            | Data interval: 5min – 10min                          | Minimum

(1) Model Quality is assessed only by the Virtual Instruments Product Management team, in the context of creating realistic Workload Models that sufficiently represent the observed Production Workload Data.
(2) Key Performance Indicators (KPIs) are Throughput, IOPS, and Latency.

Most commercial storage arrays and performance monitoring tools provide data sources that fall under the Good category. The VirtualWisdom solution from Virtual Instruments and a few commercial solutions provide data sources that fall under the Best category.
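The data-interval tiers in the table matter because averaging over a long interval hides bursts that the model should reproduce. The sketch below is a minimal, hypothetical illustration (the numbers and the downsample helper are our own invention, not output from any tool) of how a 30-second burst that is plainly visible at 1-second resolution nearly vanishes at 5- to 10-minute averaging:

```python
# A minimal sketch, assuming a per-second IOPS series; illustrates why the
# "Best" 1s-1min data interval preserves bursts that coarser "Minimum"
# 5-10 min intervals average away. All numbers are invented for illustration.

def downsample(series, interval_s):
    """Average a 1-sample-per-second series into buckets of interval_s seconds."""
    return [
        sum(series[i:i + interval_s]) / len(series[i:i + interval_s])
        for i in range(0, len(series), interval_s)
    ]

# One hour of mostly idle traffic (200 IOPS) with a 30-second burst at 20,000 IOPS.
iops_per_sec = [200.0] * 3600
iops_per_sec[1800:1830] = [20000.0] * 30

print(f"true peak at 1s resolution: {max(iops_per_sec):,.0f} IOPS")
for interval in (60, 300, 600):          # 1 min, 5 min, 10 min buckets
    peak = max(downsample(iops_per_sec, interval))
    print(f"{interval:>4}s interval -> observed peak {peak:,.0f} IOPS")

# Expected output:
#   true peak at 1s resolution: 20,000 IOPS
#     60s interval -> observed peak 10,100 IOPS
#    300s interval -> observed peak 2,180 IOPS
#    600s interval -> observed peak 1,190 IOPS
```

At 10-minute averaging the observed peak is roughly seventeen times lower than the true 20,000 IOPS burst, so a workload model built from such data would drastically understate the load the storage subsystem must actually absorb.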
Workload Details

What to Characterize

Storage environment

Understanding the storage environment differs for file, block, and object storage. Each environment has unique characteristics that must be understood in order to create and map an emulated workload similar to the observed production environment.

File (NAS)