This was a great “put your architecture cap on” session by two well known VCDX’s, Wade Holmes and Rawlinson Rivera. Software defined <insert virtualization food group here> is all the rage these days. Be it SDN (networking), SDS (storage), SDDC (datacenter) or software defined people. Well maybe not quite at the people stage but some startup is probably working on that.
Given the explosion of SDS solutions, or those on the near horizon, you can’t put on your geek hat and just throw some new software storage product at the problem and expect good results. As an engineer myself “cool” new products always get my attention. But an IT architect has to look at SDS from a very different perspective.
This session gave an overview of the VCDX Way for SDS. I took a different approach to this session’s blog post from most other ‘quick publish’ VMworld session notes. Given the importance of SDS and the new VMware products, I’ve made this post a lot longer and tried to really capture the full breadth of the information presented by Rawlinson and Wade.
Session Introduction
How do you break down the silos in an organization? How do you align application and business requirements to storage capabilities? In the “old” days you matched up physical server attributes such as performance, high availability and performance to a specific workload. Big honking database servers, scale out web servers, or high IOPS email systems.
In the virtual era you gained flexibility, and can better match up workloads to pools of compute resources. Now it is much easier to implement various forms of high availability, scale-out performance, and greatly increase provisioning speed. But some subsystems like storage, even with tools like storage IO control, VASA, and storage DRS were blunt instruments trying to solve complex problems. Did they help? Absolutely, are they ideal? Not at all.
The final destination on the journey in this session is software defined storage (SDS). The remainder of this session covered the “VCDX Way” to SDS. This methodology enables efficient technology solution design, implementation and adoption to meet business requirements. I’ve heard from several people this week the array of storage solutions is nearly bewildering and so following the methodology can help you make your way through the SDS maze and ultimately be very successful in delivering solid solutions.
VCDX Way
- Gather business requirements
- Solution Architecture
- Engineering specifications
- Features: Availability, Manageability, Performance, Recoverability, Security
Software-Defined Storage
Software defined storage is all about automation with policy-driven storage provisioning backed by SLAs. To achieve this storage control logic is abstracted into the software layer. No longer are you tied to physical RAID sets, or using blunt instruments like a VMFS datatore to quasi match up application requirements with performance, availability, and recovery requirements.
The control plane needs to be flexible, easy to use and automatable like crazy. The presentation slide below shows Storage Management with SDS of “tomorrow”. At the top level is the policy-based management engine, better known as the control plane. Various data servers are then offered, such as replication, deduplication, security, performance, and availability. In the data plane you have the physical hardware which would be a traditional external storage array, or the new fangled JBOD scale-out storage tier.
Three Characteristics of SDS
- Policy-Driven control plane – Automated placement, balancing, data services, provisioning
- App-centric data services – Performance SLAs, recoverability, snapshots, clones, replication
- Virtualized data plane – Hypervisor-based pooling of physical storage resources
Solution Areas – Availability
Availability is probably one of the first storage properties that pops to mind for the average IT when you think about storage. RAID level and looking at the fault domain within an array (such as shelf/cage/magazine availability) are simple concepts. But those are pre-SDS concepts that force VMs to inherit the underlying datastore and physical storage characteristics. The LUN-centric operational model is an operational nightmare and the old way of attempting to meet business requirements.
If you are a vSphere administrator then technologies such as VAAI, storage IO control, storage DRS, and storage vMotion are tools in your toolbox to enable meeting application availability and performance requirements. Those tools are there today for you to take advantage of, but were only the first steps VMware took to provide a robust storage platform for vSphere. You also need to fully understand the fault domains for your storage.
Take into account node failures, disk failures, network failures, and storage processor failures. You can be assured that at some point you will have a failure and your design must accommodate it while maintaining SLAs. SDS allows the defining of fault domains on a per-VM basis. Policy based management is what makes VM-centric solutions possible.
Instead of having to define characteristics at the hardware level, you can base it on software. VM storage profiles (available today) is an example of a VM-centric QoS capability. But those are not widely used. Think about how you scale a solution and the cost. Cost constraints are huge, and limit selection. Almost nobody has an unlimited budget, you carefully need to initial capital costs, as well as future expansion and operational costs.
Solution Areas – Management
Agility and simplified management are a hallmark of SDS, enabling easy management of large scale-out solutions. The more complex a solution is, the more costly it will be over the long term to maintain. In each release of vSphere VMware has been introducing building blocks for simplified storage management.
The presenters polled the audience and asked how many were using VASA. Only a couple of people raised there hand. They acknowledged that VASA has not seen wide adoption. In the graphic below you can see VMware’s progression from a basic set of tools (e.g. VASA 1.0), to the upcoming VSAN product (VASA 1.5), to the radically new storage model of vVOLs (VASA 2.0). No release data for VVOLs was mentioned, but I would hope they come to fruition in the next major vSphere release. VSAN is a major progression in the SDS road map, and should be GA in 1H 2014.
The speakers ran through the VSAN VM provisioning process, and highlighted the simple interface and the ability to define on a per-VM level the availability, performance and recoverability characteristics you require. As stated earlier, we are now at the stage where we can provide VM-centric, not datastore or LUN centric, solutions. Each VM maintains its own unique policies in the clustered VSAN datastore.
Management is not just about storage, but about the entire cloud. Think about cloud service provisioning which is policy-based management for compute, networking and storage resources. Too many options can become complex and difficult to manage. Personally, I think VMware still has room for improvement in this area. VSAN, Virsto, vVOLS, plus the myriad of third-party SDS solutions like PernixData, give customers a lot of options but can also be confusing.
Solution Areas – Performance
Clearly storage performance is a big concern, and probably the most common reason for slow application performance in a virtualized environment. Be it VDI or databases or any other application key performance indicators are  IOPS, latency and throughput. Applications have widely varying characteristics, and understanding them is critical to matching up technologies with applications. For example, is your workload read or write intensive? What is the working set size of the data? Are the IOs random or sequential? Do you have bursty activity like VDI boot storms?
With VMware VSAN you can reserve SSD cache on a per-VM basis and tune the cache segment size to match that of the workload. These parameters are defined at the VM layer, not a lower layer, so they are matched to the specific VM workload at hand. VMware has recently introduced new technologies such as Virsto and Flash Read Cache to help address storage performance pain points. Virsto helps address the IO blender effect by serializing writes to the back-end storage, and remove the performance penalty of snapshots, among other features. The VMware VSAN solution is a scale-out solution which lets you add compute and storage node in blocks. There were several sessions at VMworld on VSAN, so I won’t into more details here.
Solution Area – Disaster Recovery
Disaster recovery is extremely important to most businesses, but is often complex to configure, test, and maintain. Solutions like SRM, which use array-based replication, are not very granular. All VMs on a particular datastore have the same recovery profile. This LUN-centric method is not flexible, and complex to manage. In contrast, future solutions based on vVOLS or other technologies enable VM-level recovery profile assignment. Technologies such as VMware NSX could enable pre-provisioning of entire networks at a DR site, to exactly match those of the production site. The combination of NSX and VM-level recovery profiles will truly revolutionize how you do DR and disaster avoidance.
Solution Area – Security
Security should be of concern in a virtual environment. One often overlooked area is security starting at the platform level by using a TPM (trusted platform module). TPM enables trusted and measured booting of ESXi. Third party solutions such as Hytrust can provide an intuitive interface to platform security and validate that ESXi servers only boot using known binaries and trusted hardware.
I make it a standard practice to always order a TPM module for every server, as they only cost a few dollars. How does this relate to SDS? Well if you use VSAN or other scale-out storage solutions, then you can use the TPM module to ensure the platform security of all unified compute and storage blocks. On the policy side, think about defining security options on a per-VM basis, such as encryption, when using vVOLs. The speakers recommended that if you work on air-gapped networks, then looking a fully converged solutions such as Nutanix or Simplivity can increase security and simplify management.
Example Scenario
At the end of this session Wade and Rawlinson quickly went through a sample SDS design scenario. In this scenario they have a rapidly growing software company, PunchingClouds Inc. They have different application tiers, some regulator compliance requirements, and short staffed with a single storage admin.
The current storage design looks like the model fibre channel SAN with redundant components. The administrator has to manage VMs at the LUN/datastore level.
At this point you need to do a full assessment of the environment. Specifications such as capacity, I/O profiles, SLAs, budget and a number of other factors need to be thoroughly documented and agreed upon by the stakeholders. Do you have databases that need high I/O? Or VDI workloads with high write/read ratios? What backup solution are they currently using?
After assessing the environment you need to work with the project stakeholders and define the business requirements and constraints. Do you need charge back? Is cost the primary constraint? Can you hire more staff to manage the solution? How much of the existing storage infrastructure must you re-use? All of these questions and more need to be thoroughly vetted.
After thorough evaluation of all available storage options, they came up with the solution design as shown in the slide below. It consists of a policy-based management framework, using two isolated VSAN data tiers, but also incorporates the existing fibre channel storage array.
Summary
The SDS offers a plethora of new ways to tackle difficult application and business requirements. There are several VMware and third-party solutions on the market, with many more on the horizon. In order to select the proper technologies, you need a methodical and repeatable process, “The VCDX Way”, to act as your guide along the SDS path. Don’t just run to the nearest and shiniest cool product on the market and just hope that it works. That’s not how an enterprise architect should approach the problem, and your customers deserve the best-matched solution as possible so that you become a trusted solution provider solving business critical needs.
Great write-up!
Well written Derek. I'm still personally waiting for software defined software :-). It's "defined" everything else so far woo why not itself.
In the penultimate slide image it says "Provide differentiated storage tiers to match application profiles”.
Would differentiated storage tiers really be a “business requirement”? Not nit-picking here as the session was very good but would the business be specifying this kind of detail?