Essential Insights into the Modern Virtualized Data Center

Executive summary

This document is addressed to Infrastructure Managers (looking after mission-critical applications) responsible for monitoring SLAs (Service Level Agreements) for virtualised infrastructure; their staff; and (for the executive summary only, probably) associated business managers.

The nub of the business issue we spotlight is that performance affects infrastructure risk just as much as availability (probably more so), but vendors of virtualised infrastructure platforms and services (and even an in-house technology provider) are loath to provide effective performance SLAs. Even if they do provide performance information, can it be relied on to be unbiased? And is it siloed rather than holistic? In the increasingly virtualised world, in order to deliver services reliably, companies need seamless access to performance metrics across their entire infrastructure, whether on-premises or in private/hybrid/public cloud.

We recognise that many organisations are still using on-premises legacy infrastructure, but we believe that the trend is firmly towards virtualised infrastructure and Cloud-enabled interfaces, even if a business case for Cloud cannot be made at present in a particular situation. In any case, the availability of robust performance metrics is important, even for legacy infrastructure.

An essential enabler for the journey towards Cloud is Big Software, which means the use of software-defined, virtualised infrastructure. In other words, general-purpose computers configured to act as different hardware devices by the software programs running on them. Deployment of tools can be programmed, the metrics they deliver can be programmed, and the metrics themselves can be measured (usage metrics are key; if a metric isn’t being used to benefit the business, it is a wasteful overhead).
Big Software is inclusive – if a less sophisticated start-up, say, is thinking in terms of appliances, Big Software lets you place a software-defined tool onto bare metal – and you have a software-defined appliance which can evolve into something more sophisticated if the business needs it. Big Software also promotes end-to-end traceability, so whole-system dashboards can present metrics and alerts in purely business terms and still support drill-down to, say, packets on a fibre-channel link for the technicians in the organisation.

Performance starts with data moving between storage (whether virtualised or not) and processing. If that is too slow, performance tuning higher up the stack can’t help much. What is needed then, in part, is technology- and vendor-agnostic performance monitoring, ideally at the ethernet and fibre-channel fabric level, so no vendor can withhold information. The tool (or integrated tools) providing such monitoring should, ideally:

• Place no load on the system – so it should be agent-less and “out of band” (an agent is a piece of software running in the background on a device, collecting management information; out-of-band communications do not share bandwidth with the business workload);
• Operate in real-time, with the lowest possible latency;
• Be highly secure, so it should operate on packet headers only, not the payload, and offer tested/certified error-proof security;
• Be usable by all stakeholders, so with role-based dashboard access and a drill-down interface (from traffic light to engineering detail);
• Provide exception reporting in terms of variations from a “normal” workload pattern;
• Document and learn from the fixing of problems (machine learning);
• Offer “root cause analysis”, and avoid error message storms;
• Be part of an active ecosystem/partnership program, which reduces risk for its customers and ensures holistic integration with other monitoring tools.
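To make the exception-reporting requirement concrete, the sketch below shows one simple way a monitoring tool might flag a metric sample that deviates from a “normal” workload pattern: a baseline of recent readings and a z-score threshold. The metric name, sample values and threshold are illustrative assumptions, not taken from any particular product.

```python
from statistics import mean, stdev

def classify(sample, baseline, threshold=3.0):
    """Flag a metric sample that deviates from the 'normal' workload pattern.

    baseline: recent history of the metric (here, illustrative
    fibre-channel link latencies in ms). A sample more than
    `threshold` standard deviations from the baseline mean is
    reported as an exception; everything else stays quiet.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return "exception" if sample != mu else "normal"
    z = abs(sample - mu) / sigma
    return "exception" if z > threshold else "normal"

# Hypothetical link latencies (ms) observed over the last hour.
history = [1.1, 0.9, 1.0, 1.2, 0.95, 1.05, 1.0, 1.1]

print(classify(1.05, history))  # within the normal band -> "normal"
print(classify(9.5, history))   # far outside it -> "exception"
```

Real tools use far more sophisticated baselining (seasonality, learned workload profiles), but the principle is the same: report variations from normal, not raw numbers.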
A Bloor Spotlight Paper – © 2018 Bloor

Such low-level infrastructure monitoring, though necessary, is not sufficient. Agents still have a place, perhaps addressing specific problems. Once you’ve addressed storage issues and bottlenecks, other issues and bottlenecks can appear elsewhere, so it is important that companies take a holistic approach to monitoring performance SLAs throughout the stack. So, data infrastructure monitoring tools, which deal with the issue this Spotlight focuses on, must have APIs for integration with other, more specialist, software-defined tools that allow holistic management of performance.

It is also important that performance metrics, even from the low-level infrastructure, can be related back to real business issues. There is little point in devoting resources to fixing a potential infrastructure bottleneck which isn’t actually impacting the business if customer service is being immediately impacted by issues further up the stack. In other words, it is hard for the business to prioritise issues if they are only described in technical terms, and (usually) the business should be prioritising issues in terms of business impact.

In practice, this means that performance metrics should be aggregated and prioritised in business-level dashboards, with “traffic light” service indicators (OK–warning–serious); with role-based access throughout the company; and with drill-down to technical metrics. This is sometimes called “Application-centric Infrastructure Performance Management” (IPM).

These days, you can find vendor-agnostic tools to deliver the required performance metrics and SLAs, but you generally have to look for them.
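The traffic-light aggregation described above can be sketched in a few lines: each technical metric maps to OK/warning/serious, and a business service shows the worst light among its underlying metrics while the per-metric detail remains available for drill-down. The metric names and thresholds below are invented for illustration only.

```python
# Illustrative thresholds: (warning_at, serious_at), higher is worse.
THRESHOLDS = {
    "san_latency_ms":       (5.0, 20.0),
    "link_utilisation_pct": (70.0, 90.0),
}

def light(metric, value):
    """Classify one technical metric as a traffic-light colour."""
    warn, serious = THRESHOLDS[metric]
    if value >= serious:
        return "serious"
    if value >= warn:
        return "warning"
    return "OK"

def service_light(readings):
    """Business-level view: the service shows the worst light of its
    underlying metrics; the detail dict supports drill-down."""
    order = {"OK": 0, "warning": 1, "serious": 2}
    detail = {m: light(m, v) for m, v in readings.items()}
    overall = max(detail.values(), key=order.get)
    return overall, detail

overall, detail = service_light(
    {"san_latency_ms": 7.2, "link_utilisation_pct": 45.0}
)
print(overall)  # "warning" -- latency has crossed its warning threshold
print(detail)
```

Role-based access would then decide who sees only `overall` and who may drill into `detail` (and beyond, to the underlying packets).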
They are often not available by default on a platform.