Storage DRS with vSphere 6.0

Storage Distributed Resource Scheduler (SDRS)

 SDRS is a vSphere feature that allows VMware administrators to apply distributed-resource rules to storage in much the same way that DRS manages CPU and memory resources. Independent datastores are grouped together and placed under SDRS control to simplify virtual disk management and improve storage resource utilization in vSphere environments.

  • SDRS relies on a new storage object called the datastore cluster (see the sketch after this list).
  • An SDRS cluster is configured by adding existing VMFS or NFS datastores; however, each cluster must contain either VMFS or NFS volumes, never a mix of both. Clusters are resized quickly by adding or removing datastores through vCenter SDRS management.
  • Datastore clusters can include LUNs from multiple VNX systems, although this is not recommended: VAAI offload only works between LUNs accessed from the same storage system, so Storage vMotion performance suffers when LUNs reside on different systems.
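
As a rough illustration of working with datastore clusters programmatically, here is a minimal pyvmomi (Python SDK for the vSphere API) sketch that lists each datastore cluster and its member datastores. The vCenter hostname and credentials are placeholders for your own environment.

```python
# Minimal pyvmomi sketch: enumerate datastore clusters (StoragePod objects)
# and their member datastores. Hostname/credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()           # lab use only: skip cert checks
si = SmartConnect(host="vcenter.example.local",  # hypothetical vCenter
                  user="administrator@vsphere.local",
                  pwd="password",
                  sslContext=ctx)
content = si.RetrieveContent()

# Datastore clusters are represented by the StoragePod managed object.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.StoragePod], True)
for pod in view.view:
    free_gb = pod.summary.freeSpace / (1024 ** 3)
    cap_gb = pod.summary.capacity / (1024 ** 3)
    print(f"{pod.name}: {free_gb:.0f} GB free of {cap_gb:.0f} GB")
    for ds in pod.childEntity:                   # member datastores
        print(f"  - {ds.name}")

Disconnect(si)
```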

What Does Storage DRS Solve?

  • Without Storage DRS

  • Manually identify the datastore with the most disk space and lowest latency.

  • Manually validate which virtual machines are placed on the datastore and ensure there are no conflicts.

  • With Storage DRS

  • Automatic selection of the best placement for the VM.

  • Advanced balancing mechanism to avoid storage performance bottlenecks or “out of space” problems.

  • VM or VMDK Affinity rules

 

What Does Storage DRS Provide?

  • Storage DRS provides the following:

  1. Initial placement of VMs and VMDKs based on available space and I/O capacity.

  2. Load balancing between datastores in a datastore cluster via Storage vMotion based on storage space utilization.

  3. Load balancing via Storage vMotion based on I/O metrics, i.e., latency.

  4. Storage DRS has affinity/anti-affinity rules for VMs and VMDKs:

  5. VMDK Affinity – Keep a VM’s VMDKs together on the same datastore. This is the default affinity rule (see the sketch after this list).

  6. VMDK Anti-Affinity – Keep a VM’s VMDKs separate on different datastores.

  7. Virtual Machine Anti-Affinity – Keep VMs separate on different datastores.

  8. Affinity rules cannot be violated during normal operations.
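
The default "keep VMDKs together" behavior described in item 5 corresponds to the intra-VM affinity setting in the datastore cluster's pod configuration. The pyvmomi sketch below shows one way it could be toggled; it assumes a datastore cluster object `pod` and a service instance `si` obtained as in the earlier listing, and the spec/property names follow the public vSphere API, so verify them against your SDK version.

```python
from pyVmomi import vim

# Sketch: disable the default intra-VM (VMDK) affinity rule on a datastore
# cluster so a VM's disks may be spread across member datastores.
# 'pod' (vim.StoragePod) and 'si' are assumed from the earlier listing.
pod_spec = vim.storageDrs.PodConfigSpec()
pod_spec.defaultIntraVmAffinity = False          # default is True (keep VMDKs together)

spec = vim.storageDrs.ConfigSpec()
spec.podConfigSpec = pod_spec

task = si.content.storageResourceManager.ConfigureStorageDrsForPod_Task(
    pod=pod, spec=spec, modify=True)
```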


Datastore Cluster

  • An integral part of SDRS is to create a group of datastores called a datastore cluster.

  • Datastore Cluster without Storage DRS – Simply a group of datastores.

  • Datastore Cluster with Storage DRS – Load Balancing domain similar to a DRS Cluster.

  • A datastore cluster without SDRS is just a datastore folder; it is the functionality provided by SDRS that makes it more than a folder.


Storage DRS Operations

Initial Placement

  •  Initial Placement – VM/VMDK create/clone/relocate.
  • Select a datastore cluster rather than an individual datastore and let SDRS choose the appropriate datastore (see the sketch after this list).

  • Storage DRS selects datastore based on space utilization and I/O load.

  • By default, all VMDKs of a VM will be placed on the same datastore within a datastore cluster (VMDK affinity rule).
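
In API terms, "select a datastore cluster and let SDRS choose" means asking the StorageResourceManager for placement recommendations and applying the one it proposes. The sketch below shows the relocate flavour for an existing VM; `vm`, `pod`, and `si` are assumed to come from the earlier listings, and create/clone placements take the same pod-selection spec plus a full config or clone spec.

```python
from pyVmomi import vim

# Sketch: ask Storage DRS to pick a datastore within a datastore cluster for
# an existing VM ('vm') and apply the first recommendation it returns.
# 'pod' and 'si' are assumed from the earlier listings.
placement_spec = vim.storageDrs.StoragePlacementSpec(
    type="relocate",
    vm=vm,
    podSelectionSpec=vim.storageDrs.PodSelectionSpec(storagePod=pod),
    relocateSpec=vim.vm.RelocateSpec(),
)

result = si.content.storageResourceManager.RecommendDatastores(
    storageSpec=placement_spec)

if result.recommendations:
    keys = [result.recommendations[0].key]
    si.content.storageResourceManager.ApplyStorageDrsRecommendation_Task(key=keys)
```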


Load Balancing 

 

Load Balancing – SDRS triggers on space usage and latency thresholds.

  • Algorithm makes migration recommendations when I/O response time and/or space utilization thresholds have been exceeded.

  • Space utilization statistics are constantly gathered by vCenter; the default threshold is 80%.

  • The I/O load trend is currently evaluated every 8 hours based on the past day's history; the default threshold is 15 ms (the current settings can be read programmatically, as shown after this list).

  • Storage DRS will do a cost/benefit analysis.

  • For I/O load balancing, Storage DRS leverages Storage I/O Control (SIOC) functionality.
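
The current thresholds can be read straight off the datastore cluster's pod configuration; here is a short pyvmomi sketch, assuming the same `pod` object as before (property names per the public vSphere API).

```python
# Sketch: inspect the Storage DRS configuration of a datastore cluster.
# 'pod' is a vim.StoragePod obtained as in the earlier listing.
cfg = pod.podStorageDrsEntry.storageDrsConfig.podConfig

print("SDRS enabled:         ", cfg.enabled)
print("Automation level:     ", cfg.defaultVmBehavior)    # 'manual' or 'automated'
print("I/O metric enabled:   ", cfg.ioLoadBalanceEnabled)
print("Space threshold (%):  ", cfg.spaceLoadBalanceConfig.spaceUtilizationThreshold)
print("I/O latency (ms):     ", cfg.ioLoadBalanceConfig.ioLatencyThreshold)
print("Imbalance timer (min):", cfg.loadBalanceInterval)  # 480 minutes = 8 hours
```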



Setting Thresholds



Datastore Maintenance Mode

  • Evacuates all VMs and VMDKs from the selected datastore (see the sketch after this list).

  • Note that this action will not move VM Templates.

  • Currently, SDRS only handles registered VMs.
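
Below is a minimal sketch of triggering this from the API, assuming `ds` is a vim.Datastore that belongs to an SDRS-enabled cluster; how the returned placement result is handled depends on whether the cluster is in manual or fully automated mode, and the field usage follows the public API.

```python
# Sketch: put a datastore that belongs to an SDRS-enabled cluster into
# maintenance mode. 'ds' is assumed to be a member vim.Datastore.
result = ds.DatastoreEnterMaintenanceMode()

# In fully automated mode the evacuation task can be monitored directly;
# in manual mode the recommendations must be applied before the datastore
# actually enters maintenance mode.
if result.task:
    print("Evacuation task started:", result.task)
for rec in result.recommendations or []:
    print("Pending recommendation:", rec.key)
```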


Rules


Scheduling



So What Does It Look Like?

Initial Placement


Load Balancing

 

  • The Storage DRS tab will show “Utilization before” and “after”.

  • There’s always the option to override the recommendations.


What does SDRS monitor?

SDRS monitors the capacity and response time of each datastore within the cluster. It applies policy rules to determine virtual machine initial placement and relocation within the clustered datastores.

Virtual machine placement simplifies resource planning, which has traditionally required performance monitoring and analysis. Instead of running tools to identify hot spots and performing manual migrations, we can simply create an SDRS cluster.

Relocation moves the virtual machine from the existing datastore to one of the other datastores in the cluster. SDRS relocation recommendations can be configured for manual or automated Execution.

When does SDRS make recommendations for VM relocation?

 

SDRS monitors available capacity and, optionally, device latency for each datastore within the cluster. SDRS makes recommendations for VM relocation when:

  • An individual datastore exceeds its defined threshold.

  • A change occurs in the environment.

  • The administrator selects the SDRS button.

  • A capacity or service level imbalance exists between the datastore where the VM resides and another datastore in the cluster.

Storage DRS is not meant to be a highly reactive solution. It can be tuned for aggressive relocations, but the relocation policy requires 8-24 hours of activity. SDRS continuously collects datastore capacity and, optionally, I/O latency information.

 SDRS policies

  • No Automation: Presents a recommendation each time a VM relocation is triggered.
  • Fully Automated: Performs initial placements and VM relocation without user intervention.


SDRS Policy Metrics

  • Utilized space: The amount of space consumed within a datastore. The default value for this parameter is 80%.
  • I/O latency: The datastore response time, measured in milliseconds (ms). The default value is 15 ms.

  • Imbalance timer value: This value defines the interval at which the SDRS policy is applied to the datastore cluster. The default value is 8 hours (the sketch after this list shows how these settings map onto the API).
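
Assuming the same pyvmomi objects as in the earlier sketches, the three policy metrics map onto the datastore cluster's pod config spec roughly as shown below; the type and attribute names follow the public vSphere API, so treat this as a sketch and verify against your SDK version.

```python
from pyVmomi import vim

# Sketch: apply the default policy metrics explicitly to a datastore cluster.
# 'pod' and 'si' are assumed from the earlier listings.
pod_spec = vim.storageDrs.PodConfigSpec()
pod_spec.enabled = True
pod_spec.defaultVmBehavior = "manual"       # "manual" (No Automation) or "automated"
pod_spec.loadBalanceInterval = 480          # imbalance timer: 8 hours, in minutes

space_cfg = vim.storageDrs.SpaceLoadBalanceConfig()
space_cfg.spaceUtilizationThreshold = 80    # utilized-space threshold, percent
pod_spec.spaceLoadBalanceConfig = space_cfg

io_cfg = vim.storageDrs.IoLoadBalanceConfig()
io_cfg.ioLatencyThreshold = 15              # I/O latency threshold, ms
pod_spec.ioLoadBalanceConfig = io_cfg

spec = vim.storageDrs.ConfigSpec(podConfigSpec=pod_spec)
task = si.content.storageResourceManager.ConfigureStorageDrsForPod_Task(
    pod=pod, spec=spec, modify=True)
```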

    EMC VNX Recommendations for SDRS

  • Use LUNs of equal size and storage type.

  • Add LUNs in pairs to distribute LUN ownership between the VNX storage processors.

  • Disable I/O metrics when using FAST VP pool LUNs.

  • Configure the migration policy to manual mode until you have assessed the environment for a period of time.

  • Assign multiple Storage vMotion connections to reduce migration times.

  • Do not use SDRS with datastore LUNs that are protected with VNX synchronous replication technologies.

    Supported SDRS LUN configurations



 vSphere 6.0 Storage Features: Storage DRS and SIOC

1. Deduplication Interoperability

 In the Storage DRS interoperability white paper, one of the items discussed is using Storage DRS in conjunction with deduplication. It states that you must determine whether the deduplication process will be as efficient—that is, able to deduplicate the same amount of data—after the migration as it was before the migration. There is always a chance that this is not the case, and this might be a reason not to apply a recommendation to migrate a virtual machine with a large virtual disk.

If Storage DRS moves a virtual disk from datastore A to datastore B, and the datastores share a common backing pool for deduplication, the array may simply inflate the virtual disk contents at datastore A and re-index them at datastore B without any real impact on actual space usage.

The main issue for Storage DRS is that a datastore appears to store more data than it has capacity for. How does Storage DRS do placement on this datastore?

With Storage DRS in vSphere 6.0, VASA 2.0 now exposes whether a datastore is being deduplicated and identifies whether one or more datastores share a common backing pool for deduplication. These enhancements enable Storage DRS to avoid moving VMs between datastores that are not in the same deduplication pool, while still allowing Storage DRS to manage logical space for virtual disks that stay within the same dedupe pool.

2. Thin Provisioned Datastore Interoperability

Let’s look once more at what the white paper says about thin provisioned datastores. Storage DRS by itself does not detect whether the LUN is thin or thick provisioned; it detects only the logical size of the LUN. However, this logical LUN size could be much larger than the actual available capacity of the backing pool of storage on the storage array.

In previous versions of vSphere, Storage DRS leveraged the VMware vSphere APIs – Storage Awareness (VASA) thin-provisioning threshold. If the datastore exceeds the thin-provisioning threshold of 75 percent, VASA raises the thin-provisioning alarm. This causes Storage DRS to mark the datastore and prevent any virtual disk deployment or migration to that datastore to avoid running out of capacity.

However, this did not address the situation where multiple datastores could be backed by the same pool of storage capacity on the array. In Storage DRS in vSphere 6.0, using VASA 2.0, the following changes were made to thin provisioned datastore interop:

  • Discover the common backing pool being shared by multiple datastores
  • Report the available capacity in the common backing pool

This allows Storage DRS to avoid migrating VMs between two thin provisioned datastores that share the same backing pool. Knowing the available capacity allows Storage DRS to make recommendations based on the actual available space in the shared backing pool rather than the reported capacity of the datastore (which may be larger).

Storage DRS can also provide remediation when the free space in the backing pool is running out by moving VMs away from datastores sharing the same common backing pool.

3. Array-based auto-tiering Interoperability

The interoperability white paper states that by default, Storage DRS is invoked every 8 hours and requires performance data captured over more than a 16 hour period to generate I/O load balancing decisions. However there are multiple storage vendors offering auto-tiering storage arrays, which move hot and cold chunks of data between the different pools of storage. Each array uses different time cycles to collect and analyze workload before moving LUN segments. Some auto-tiering solutions move chunks based on real-time workload; other arrays move chunks after collecting performance data for 24 hours. The misalignment of Storage DRS invocation and auto-tiering algorithm cycles makes it unpredictable when LUN segments might be moved, potentially conflicting with Storage DRS recommendations.

In Storage DRS in vSphere 6.0, changes were made to array-based auto-tiering datastore interop so that datastores with auto-tiering can now be identified via VASA 2.0 and treated differently for performance-modeling purposes.

4. Site Recovery Manager (SRM) and vSphere Replication (vR) Interoperability

These have been known limitations for some time and have been a top ask from our customers. I wrote about SRM and SDRS interop issues on the vSphere Storage blog back in the day, and I also wrote a post about vSphere Replication interop issues. Now, vSphere 6.0 fixes these interop issues, and you can use Storage DRS and SRM/vR together.

Storage DRS recommendations are now also in line with replication policies. For instance, some datastores might use asynchronous replication while others use synchronous replication, and it was previously possible for a VM to be moved to a datastore with an incorrect replication type. Storage DRS will now make sure that a VM is placed and balanced on a datastore with the same policy. If there is a recommendation to move a VM to another datastore that does not have the same policy (e.g. for maintenance mode), Storage DRS will warn that moving the VM to that datastore may result in a (temporary) loss of replication.

In the case of vSphere Replication, replica disks are instantiated on the secondary site. Storage DRS now understands the space usage of replica disks, and Storage DRS can be used for the replica disks on the secondary site. Previously, Storage DRS did not recognize these files. These can now be balanced in the same way as standard VM files.

5. Fix to limit the number of concurrent Storage vMotions

Storage DRS has some hard limits for concurrent Storage vMotions:

  • Maximum number of Storage vMotions for I/O load balancing (default:3, max:10)
  • Maximum number of moves per host (default:8, max:unlimited)

These limits did not appear to be adhered to, and we received many reports of large numbers of Storage vMotion operations running concurrently when Storage DRS was enabled. This had a negative effect on the overall performance of the datastore in question. In vSphere 6.0, both of these settings are now honored, and customers have full control over the number of concurrent Storage vMotions that are initiated.

6. Storage DRS & SIOC support for IOPS reservations

IOPS reservations are something new, introduced with the mClock scheduler in vSphere 5.5. Previously, we could only set limits and shares on a VMDK; with the mClock scheduler, we can also set an IOPS reservation. SIOC and Storage DRS both honor IOPS reservations. The I/O injector mechanism, previously used to automatically determine the latency threshold of a given datastore, has been enhanced to also determine the IOPS (4K, random read) of a given datastore. Storage DRS uses this information to determine the best datastore for initial placement to satisfy a VM’s IOPS reservations, and also uses it for ongoing load balancing if there is an IOPS reservation violation. A nice new feature.
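
For context, an IOPS reservation is set per virtual disk through the disk's storage I/O allocation. The pyvmomi sketch below edits the first virtual disk of a hypothetical VM (`vm`, obtained as in the earlier listings) to reserve 500 IOPS; the field names follow the public API's StorageIOAllocationInfo, and the value is only a placeholder.

```python
from pyVmomi import vim

# Sketch: set an IOPS reservation on the first virtual disk of an existing VM.
# 'vm' is assumed from the earlier listings; the allocation object is assumed
# to be populated on the disk (it normally carries the shares/limit settings).
disk = next(d for d in vm.config.hardware.device
            if isinstance(d, vim.vm.device.VirtualDisk))
disk.storageIOAllocation.reservation = 500   # IOPS reservation (mClock scheduler)

dev_spec = vim.vm.device.VirtualDeviceSpec(
    operation=vim.vm.device.VirtualDeviceSpec.Operation.edit,
    device=disk)
task = vm.ReconfigVM_Task(vim.vm.ConfigSpec(deviceChange=[dev_spec]))
```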

You may have noticed that many of these enhancements to Storage DRS require VASA 2.0. The storage vendor will need to provide an updated VASA provider, and the administrator will need to register the VASA provider for the datastore before SDRS will perform the checks outlined above.
