Bandwidth and IOPS Limit on a Virtual Disk


OK!! Happy to be back with a small blog post here. I actually wanted to keep this for my first book, but decided to post it since the book will be delayed by a couple of months.

Not sure how many of you care about noisy neighbor issues, but anyone running shared infrastructure really should.

We all know that VMware has a really great option in the GUI to limit the IOPS on a Virtual Disk! But will that resolve the noisy neighbor issue? Well!! Let the results speak for themselves.

Explanation


Let's say you have a datastore that can deliver 2000 IOPS and 8 MBps of bandwidth, and on it you have VMDK1 (32K block size) from VM1 and VMDK2 (4K block size) from VM2.

Now VMDK1 (VM1) has an IOPS limit of 1000, and VMDK2 (VM2) also has a limit of 1000 IOPS (per the snap below).

SNAP1


Note1 - Just for the examples and to keep things simple, I am using small numbers like 1000 IOPS and 8 MBps as limits (just don't think I am too crazy!). In the actual test results the bandwidth will come out roughly 1 to 3 MBps above the configured limit.
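
To see why the block size matters so much, work out the bandwidth each IOPS limit actually allows (bandwidth = IOPS x block size):

VMDK1 - 1000 IOPS x 32 KB = ~32,000 KB/s, roughly 32 MBps
VMDK2 - 1000 IOPS x 4 KB = ~4,000 KB/s, roughly 4 MBps

So even with identical IOPS limits, VMDK1 can legitimately pull about eight times the bandwidth of VMDK2 - and far more than the 8 MBps the example datastore can deliver.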


So now I have a test environment with two RHEL VMs configured as mentioned above (with a 1000 IOPS limit per VMDK). VMDK1 from VM1 and VMDK2 from VM2 are both on DatastoreA.

Both RHEL machines are configured with vdbench (my favorite I/O benchmarking tool).

TESTING

Test Environment

vCenter Version - 5.5U2, 6.0U2, 6.5U1
ESXi version - 5.5U2, 6.0U2, 6.5U1
2 RHEL Servers with 50 GB Thick Provisioned Disk for testing.

Vdbench Config for VM1 and VM2

VM1(VMDK1)

* sd = storage definition: raw device /dev/sdb, opened with O_DIRECT to bypass the guest page cache
sd=sd1,lun=/dev/sdb,openflags=o_direct
* wd = workload definition: 32 KB (32768-byte) transfers, 70% reads
wd=wd1,sd=sd1,xfersize=32768,rdpct=70
* rd = run definition: 1000 IOPS, 5-second reporting interval, 400-second run
rd=run1,wd=wd1,iorate=1000,interval=5,elapsed=400

VM2(VMDK2)
* same layout as VM1, but with 4 KB (4096-byte) transfers at 999 IOPS
sd=sd1,lun=/dev/sdb,openflags=o_direct
wd=wd1,sd=sd1,xfersize=4096,rdpct=70
rd=run1,wd=wd1,iorate=999,interval=5,elapsed=400

Tool used for Testing

vdbench - You can download the tool here: http://www.oracle.com/technetwork/server-storage/vdbench-downloads-1901681.html

Vdbench is a controlled I/O load simulation tool for storage systems. It gives you control over workload parameters such as I/O rate, LUN or file size, transfer size and cache-hit percentage, and it generates reports that are web accessible in HTML format. It is an open-source tool from Oracle, with roughly 99 percent of it written in Java and 1 percent in C. It works on all Linux and Windows operating systems and is designed to execute workloads against a storage system.

In a simple equation, f(workload, Config) = Power + Performance. With a set of workloads defined through a configuration file, you can measure the performance of any storage array. It runs as two or more Java virtual machines. The one we invoke is the master, which takes care of parsing and reporting. The actual workload is executed by one or more slaves. A slave can run along with the master or in a different host. Vdbench can be useful for identifying data corruption on the storage system. A combination of data validation and journaling allows you to identify data corruption issues across executions. In short, it is a truly enterprise-grade storage benchmark that is free and open source.
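
If you want to reproduce the runs, vdbench is driven entirely by the parameter file; a typical invocation from each RHEL guest looks like this (the parameter file and output directory names are just placeholders I picked):

# Run the workload described in vm1_workload.txt and write the HTML reports to /tmp/vdbench_out
./vdbench -f vm1_workload.txt -o /tmp/vdbench_out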


Scenario 1 - Without Bandwidth Limit


Let's run an I/O test here with the below config -

IOPS Limit (on both VMs/VMDKs) - 1000
Bandwidth Limit - NOT SET
I/O Block Size on VMDK1 a.k.a VM1 - 32k
I/O Block Size on VMDK2 a.k.a VM2 - 4k
Test Run Time - 400 seconds to 60 minutes

Results!

Below SNAP is for VM1 a.k.a VMDK1

VMDK1-32k

Below SNAP is for VM2 a.k.a VMDK2

VMDK2-4k


There!!!!! Did you see that? VMDK1 a.k.a VM1, which was generating 32k I/O, got most of the bandwidth, while VMDK2 a.k.a VM2, which was generating 4k I/O, struggled to get the IOPS it needs! That culprit VM1 (VMDK1) is what we call a NOISY NEIGHBOR. And through no fault of its own, VMDK2 still had to suffer because VMDK1 kept grabbing more bandwidth.

Question! Can we limit the block size generated by a guest OS at the hypervisor layer? The truth is NO! You cannot do that, and it is not even practical. Would you tell your customers to run only 4k or 8k I/O block sizes in your environment? NO!

Note2 - There is a hypervisor limit of 4294967294 KB/s (2^32 - 2 KB/s, i.e. effectively unlimited) on the bandwidth value.

So what can we do? 

There is a hidden feature (I would love to call it hidden as it is not available from the GUI, although it is really not *hidden*) called BANDWIDTH LIMIT on a Virtual Disk.

Yes, you heard it right. You can set the Bandwidth limit on a Virtual Disk by using the advanced parameter "sched.scsi0:0.bandwidthCap" for a VMDK on SCSI controller slot 0:0, "sched.ide0:0.bandwidthCap" for an IDE VMDK on slot IDE 0:0, "sched.sata0:0.bandwidthCap" for SATA slot 0:0, and so on!

Note3 - We have tested all the disk types - SCSI, IDE and SATA. The Bandwidth limit works on all of them.
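
To give an idea of how the parameter is applied, here is a minimal sketch from the ESXi shell that appends a 4 MBps cap (matching Scenario 2 below) to a powered-off VM's .vmx file. The datastore path, VM name and the "4096KBps" value syntax are my assumptions - verify the accepted format for your ESXi version. The same key/value pair can also be added through the VM's advanced configuration parameters in the vSphere client.

# Append a bandwidth cap for the disk on SCSI 0:0 while the VM is powered off
# (the path and the "4096KBps" value format are assumptions - adjust for your environment)
echo 'sched.scsi0:0.bandwidthCap = "4096KBps"' >> /vmfs/volumes/DatastoreA/VM1/VM1.vmx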

Scenario 2 - With Bandwidth Limit


IOPS Limit (on both VMs/VMDKs) - 1000
Bandwidth Limit (on both VMs/VMDKs) - 4 MBps
I/O Block Size on VMDK1 a.k.a VM1 - 32k
I/O Block Size on VMDK2 a.k.a VM2 - 4k
Test Run Time - 400 seconds to 60 minutes


Results!!


Below SNAP is for VM1 a.k.a VMDK1

32k-VMDK1

Below SNAP is for VM2 a.k.a VMDK2

4k-VMDK2

That proves that we can tackle NOISY NEIGHBOR issues by putting a limit on both IOPS and Bandwidth.

mClock Scheduler


AND IT'S NOT OVER YET! (I like to keep it bold as this is very important.)

In order to have both the Bandwidth and IOPS limits work properly, you need to disable the mClock scheduler. Yes, again you heard it right - you need to DISABLE THE MCLOCK SCHEDULER on the ESXi hosts.

The Bandwidth limit will not work with the mClock scheduler. I do not know what to say here! A BUG!? Or a feature? Well! It is what it is.

Below is the VMware KB article which confirms that the Bandwidth and IOPS limits do not work when the mClock scheduler is enabled.
https://kb.vmware.com/s/article/2059192
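
As a quick sketch of how to flip it from the ESXi shell - assuming the host advanced option Disk.SchedulerWithReservation is the one that controls mClock on your build (that is my assumption here; verify it against the KB above for your version):

# Show the current disk IO scheduler setting (1 = mClock, 0 = the legacy scheduler)
esxcli system settings advanced list -o /Disk/SchedulerWithReservation

# Disable mClock - per the FAQ below, no host reboot is required
esxcli system settings advanced set -o /Disk/SchedulerWithReservation -i 0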

Questions

What is the mClock scheduler?


mClock is an IO scheduler that was introduced in vSphere 5.5. It adds support for IO reservations on top of the existing scheduler's shares and limit support. mClock is a tag-based IO scheduler that achieves resource allocation based on the user's QoS requirements for reservation (throughput lower bound), limit (throughput upper bound) and shares. Every IO request is assigned three tags - a reservation-based tag, a limit-based tag and a shares-based tag - when it is added to the issue queue. The scheduler then selects requests to dispatch from the different VMs by comparing their tags and deciding whether to use the reservation tags or the shares-based tags.

What happens when mClock IO Scheduler is disabled?


Disabling mClock reverts the host to the default IO scheduler, SFQ.

Does disabling or enabling mClock need a host reboot?


No, disabling mClock does not need a host reboot.

Why do we care about bandwidth? Isn't an IOPS limit alone sufficient?


Since we cannot limit the I/O block size from within the guest OS, and every storage device has finite bandwidth as well as finite IOPS, it is important to limit both. As the arithmetic earlier showed, the same 1000 IOPS limit allows roughly 32 MBps at a 32k block size but only about 4 MBps at 4k.

What if I am using shares or reservations in my environment? Can I still disable the mClock scheduler?


If you are using shares or reservations, DO NOT DISABLE the mClock scheduler.

Do you need to reboot/reset the Virtual Machines once the IOPS and Bandwidth limits are set?

VMware best practices say that you need to power OFF the Virtual Machine (https://kb.vmware.com/s/article/1038241) for the IOPS limit to take effect, but we have seen the IOPS limit work fine without powering OFF the Virtual Machine as well. HOWEVER, for the Bandwidth limit to take effect, you do need to reboot/reset the Virtual Machine if the limit was set while the VM was powered ON.

 
Last but not the least! I would like to thank my colleagues Mike Karcz and Ganesh Palanisamy for helping me out with the testing scenarios. This was not a very straightforward scenario to test :)