This was a good session presented by Proximal Data, focused on using flash-based cache in your VMware environment. They have a product called AutoCache that boosts storage performance for VMware environments. Sounds like an interesting product. The session went by very fast, but here are some of the notes I took:
Proximal Data Company profile:
- Vision: I/O Intelligence in the hypervisor is a universal need
- Near term value is in making use of flash in virtualization
Overview: Proximal Data AutoCache
- I/O caching software for ESXi 4.x to 5.x
- Up to 2-3x VM density improvement
- Business critical apps accelerated
- Transparent to ESXi features like vMotion, DRS, etc.
- Converts a modest amount of flash into a significant performance boost
- Simple to deploy: Single “VIB” installed on each ESXi host
- vCenter plug-in: Caching effectiveness, cache utilization by guest VM
Case Study
- Month end processing report now takes 6.5 hours instead of 36.5 hours
- Eliminated need to vMotion other guests off during month end processing
- Tripled VM density on database servers
- Decreased SAS analytics report time by 85%
Flash – The Good
- Much faster than disks for random I/O – Sequential I/O performance difference is not as dramatic
- Cheaper than RAM
Flash – The Bad
- More expensive than spinning disks
- Slower than RAM
- Asymmetric read/write characteristics – Reads are much faster, writes cause a lot of wear
- Wears out/limited lifespan
Flash – The Ugly
- Must be erased to be written
- Erase granularity is not the write granularity
- Typical write granularity is 512 bytes, typical erase granularity is 32K, 64K or 128K
- Write/erase characteristics have led to complexity (flash translation layers, fragmentation, garbage collection, write amplification) – see the sketch after this list
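To make the granularity mismatch concrete, here is a quick back-of-the-envelope sketch. This is my own illustration, not from the session; the 512-byte write and 128K erase figures come straight from the bullets above, and the "worst case" framing assumes no help from the flash translation layer:

```python
# Worst-case write amplification from the write/erase granularity mismatch.
# A small logical write can force the controller to relocate a whole erase
# block: read it, merge in the new bytes, erase, and rewrite everything.

WRITE_GRANULARITY = 512        # bytes, typical write granularity
ERASE_BLOCK = 128 * 1024       # bytes, typical erase granularity (32K-128K)

def worst_case_write_amplification(write_size, erase_block):
    """Bytes physically rewritten per byte logically written, assuming
    every small write lands in an already-programmed erase block."""
    return erase_block / write_size

print(worst_case_write_amplification(WRITE_GRANULARITY, ERASE_BLOCK))
# 256.0 -> up to 256 bytes of wear per byte written, which is why FTLs,
# garbage collection, and write coalescing exist.
```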
Flash – Not all are equal
- Steady state performance of controllers – as much as 50% performance loss in steady state vs. new (stay with Intel, Micron, LSI, SandForce; avoid third-tier controllers)
- MLC is much cheaper and denser and is the future, but it is less robust and wears out faster than SLC
Flash – Ideal Usage
- Random I/O requests – greatest performance gains
- A lot more reads than writes
- Write in large chunks
- Avoid small writes to the same logical locations (a coalescing sketch follows this list)
- If data is critical, use SLC
- Read caching is an ideal use of flash
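Here is a minimal sketch of the "write in large chunks" advice above. The buffer size and the `device_write` callback are my own assumptions for illustration, not anything Proximal Data described:

```python
# Write coalescing: buffer small writes and push them to flash in
# erase-block-sized chunks instead of issuing each 512-byte write
# individually (which multiplies wear, per the granularity notes above).

ERASE_BLOCK = 128 * 1024  # assumed flush unit, in bytes

class CoalescingWriter:
    def __init__(self, device_write):
        self.device_write = device_write  # callback doing the real I/O
        self.buffer = bytearray()

    def write(self, data: bytes):
        self.buffer += data
        while len(self.buffer) >= ERASE_BLOCK:   # flush full chunks only
            self.device_write(bytes(self.buffer[:ERASE_BLOCK]))
            del self.buffer[:ERASE_BLOCK]

    def flush(self):
        if self.buffer:                           # flush any tail on close
            self.device_write(bytes(self.buffer))
            self.buffer.clear()

chunks = []
w = CoalescingWriter(chunks.append)
for _ in range(512):          # 512 small 512-byte writes...
    w.write(b"x" * 512)
w.flush()
print(len(chunks))            # ...become 2 erase-block-sized device writes
```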
Caching is Everywhere
- Caches exist in disks, array/RAID controllers, HBAs, the OS, and applications
Caching Basics
- The working set is likely a small subset of the total data
- Caches are used to manage the “working set” in a resource that is smaller, faster and more costly than the main storage resource
- Cache works best when data flows from a slower device to a faster one
- Read caches primarily help read-bound systems (see the minimal read-cache sketch after this list)
- Write-back caches primarily help bursty environments
- Caches will continue to exist in all layers of the infrastructure
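Since read caching keeps coming up, here is a minimal sketch of the basic idea: keep the hot working set in a small, fast tier in front of slower main storage. The LRU policy, sizes, and `backing_read` callback are my own illustration, not how AutoCache actually works:

```python
from collections import OrderedDict

class ReadCache:
    """Minimal LRU read cache: the small/fast/costly tier holds the
    working set; misses fall through to the slower backing store."""

    def __init__(self, capacity, backing_read):
        self.capacity = capacity
        self.backing_read = backing_read   # slow path, e.g. a SAN read
        self.cache = OrderedDict()         # block -> data, in LRU order
        self.hits = self.misses = 0

    def read(self, block):
        if block in self.cache:
            self.hits += 1
            self.cache.move_to_end(block)  # mark most recently used
            return self.cache[block]
        self.misses += 1
        data = self.backing_read(block)    # fetch from slow storage
        self.cache[block] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False) # evict least recently used
        return data

cache = ReadCache(capacity=2, backing_read=lambda b: f"data-{b}")
for b in [1, 2, 1, 3, 1]:
    cache.read(b)
print(cache.hits, cache.misses)  # 2 hits (hot block 1), 3 misses
```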
Flash in a Hypervisor
- Most caching algorithms developed for RAM caches – No consideration for device asymmetry
- Hypervisors have very dynamic I/O patterns
- Hypervisors are I/O blenders – sequential streams from many VMs interleave into a random-looking stream at the datastore (see the sketch after this list)
- Must consider shared environment (latency, allocations, etc.)
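The "I/O blender" point is easiest to see with a toy example (entirely made up: three VMs, round-robin scheduling). Each guest issues a perfectly sequential stream, but the datastore sees the interleaved result:

```python
import itertools

# Each VM reads sequential blocks from its own region of the datastore.
streams = {
    "vm1": itertools.count(0),        # sequential blocks 0, 1, 2, ...
    "vm2": itertools.count(100000),
    "vm3": itertools.count(200000),
}

blended = []
for _ in range(4):                     # a few scheduling rounds
    for vm, stream in streams.items():
        blended.append((vm, next(stream)))

print(blended)
# [('vm1', 0), ('vm2', 100000), ('vm3', 200000), ('vm1', 1), ...]
# Each VM was sequential; the merged stream jumps all over the disk.
```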
Complications of Write-Back Caching
- Writes from VMs fill the cache
- Cache ultimately flushes to disk
- The cache overruns when disk flushes can’t keep up (see the sketch after this list)
- If you are truly write-bound, a cache will not help
- Write-back cache handles write bursts and benchmarks well but is not a panacea
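A toy simulation of the overrun point, with made-up rates and capacity. The cache absorbs anything below the disk's sustained flush rate, but once the write rate exceeds it, the cache fills and the excess runs at disk speed; that is, if you are truly write-bound, the cache doesn't help:

```python
# Assumed numbers for illustration only.
CACHE_CAPACITY = 500    # cacheable writes
FLUSH_RATE = 50         # writes drained to disk per tick

def simulate(write_rate, ticks):
    cached = overruns = 0
    for _ in range(ticks):
        cached += write_rate
        cached -= min(cached, FLUSH_RATE)   # disk drains what it can
        if cached > CACHE_CAPACITY:         # cache over-ran
            overruns += cached - CACHE_CAPACITY
            cached = CACHE_CAPACITY         # excess proceeds at disk speed
    return overruns

print(simulate(write_rate=40, ticks=100))   # below flush rate: 0 overruns
print(simulate(write_rate=60, ticks=100))   # truly write-bound: 500 overruns
```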
Disk Coherency
- Cache flushes MUST preserve write ordering to preserve disk coherency (see the sketch after this list)
- Hardware copies require a cache flush first
- Hardware snapshots do not reflect current system state without a cache flush
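To illustrate the write-ordering requirement, here is a contrived journaling example (structures and names are hypothetical): data is written before its commit record, and an ordered flush keeps the disk coherent even if a crash interrupts it:

```python
from collections import OrderedDict

dirty = OrderedDict()              # preserves the order writes arrived in
dirty["data@block7"] = "new row contents"
dirty["commit@block0"] = "txn 42 committed"

def flush_in_order(dirty_blocks, survive_first_n):
    """Flush dirty blocks oldest-first; simulate a crash after n writes."""
    disk = {}
    for i, (block, value) in enumerate(dirty_blocks.items()):
        if i >= survive_first_n:
            break                  # crash: remaining writes are lost
        disk[block] = value
    return disk

# Ordered flush + crash after 1 write: data landed, commit did not.
# The disk is behind but coherent; recovery simply replays the txn.
print(flush_in_order(dirty, survive_first_n=1))

# An unordered flush could have written the commit record first, leaving
# a disk state the application could never have produced on its own.
```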
Evaluating Caching
- Results are entirely workload dependent (see the sketch after this list)
- Benchmarks are terrible for characterizing devices. You can make IOmeter say anything you want.
- Run your real storage configuration for meaningful results
- Beware of caching claims of 100x or 1000x improvements
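A quick demonstration of how workload-dependent the results are (and why you can make IOmeter say anything): the same cache, same size, run against a skewed workload with a hot working set versus a uniformly random one. All numbers and distributions here are invented:

```python
import random

random.seed(1)
CACHE_SIZE, BLOCKS, IOS = 1000, 100_000, 50_000

def hit_rate(workload):
    cache, order, hits = set(), [], 0
    for block in workload:
        if block in cache:
            hits += 1
            continue
        cache.add(block)
        order.append(block)
        if len(cache) > CACHE_SIZE:
            cache.remove(order.pop(0))   # simple FIFO eviction
    return hits / len(workload)

# Skewed workload: 90% of I/Os hit a hot 500-block working set.
skewed = [random.randrange(500) if random.random() < 0.9
          else random.randrange(BLOCKS) for _ in range(IOS)]
# Uniform workload: every block equally likely.
uniform = [random.randrange(BLOCKS) for _ in range(IOS)]

print(f"skewed:  {hit_rate(skewed):.0%}")   # high: cache looks great
print(f"uniform: {hit_rate(uniform):.0%}")  # low: same cache looks useless
```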
Flash Caching Perspective
- Flash will be pervasive in the enterprise
- Choose the right amount (as little as 200GB can provide a large boost)
- The closer the cache to the processors, the better the performance