RIMSS SQL Server Virtualization Best Practices
December 12, 2017 | Ken Leoni
Understanding the fundamentals of how CPU, memory, disk, and network interact in a virtual infrastructure will go a long way towards helping IT properly allocate virtual resources and make the most effective use of SQL licensing.
One of the driving forces behind SQL Server virtualization has been IT's ability to leverage the scaling and resiliency inherent with virtualized infrastructures while simultaneously reducing hardware costs. Additionally, a well-planned virtualization initiative can consolidate SQL servers and reduce the number of required SQL licenses.
The path of least resistance might seem to be configuring a VM that hosts SQL with as many processors as is allowed in SQL licensing - i.e. configuring SQL Server 2016 Standard edition with its maximum of 4 sockets/24 cores, and then deploying as many database instances as needed.
However: overcommitting a virtual SQL Server should be avoided. SQL Server is optimized for performance at the expense of resource utilization, and will use as much of its allocated resource capacity as possible. Allocating more resources than SQL needs will not appreciably improve SQL performance and will mean that resources are not available for other VMs.
Getting the best overall performance from both virtualization and SQL means keeping in mind SQL specific criteria for CPU, memory, disk and network configurations when creating VMs.
CPU Considerations
Resource Allocation
SQL Server is resource intensive and should not be forced to contend for resources with other VMs, so you’ll want to ensure that the CPU resources on the host are not overallocated. An increase in wait time for CPU due to overallocation can cause SQL latency issues that in turn can have a cascading effect on all the applications dependent on SQL Server.
Hyperthreading
In addition, do not take hyperthreading into account when calculating the number of available vCPUs (virtual CPUs). Hyperthreading adds a second processor thread per CPU that is available for use when the CPU would otherwise be idle. While this can increase overall processor performance by decreasing processor idle time, it is possible that two threads competing for a processor's resources will adversely affect SQL performance.
CPU Ready (VMware) or CPU Wait Time Per Dispatch (Hyper-V)
If you do overcommit CPU resources on a host, the CPU Ready or CPU Wait Time Per Dispatch metrics will measure the time that a VM has to wait for available CPU resources from the host.
The metrics show the level of CPU overcommitment, where the current demand for vCPUs exceeds the available supply and causes VMs to wait for vCPUs to become available.
For example, if you’re operating a host with a total capacity of 16 vCPU’s and they are all busy executing for the currently running VMs, while at the same time you have a virtual SQL Server configured with 8 vCPUs that is waiting for CPU resources, then the server must wait for 8 vCPU’s to free up before it has its turn to access the host CPU resources. The hypervisor won’t allow a multi-processor VM to operate until the number of available vCPUs match what is allocated to the VM.
Non Uniform Memory Architecture (NUMA)
NUMA is a computer architecture designed to improve a processor's access to memory by grouping memory and processors into nodes. Memory that is in the same node as a processor is "local" and can be accessed with lower latency than the "remote" or "foreign" memory on another node.
The most straight-forward scenario for allocating resources for a SQL VM is if its resource requirements fit within the resources available in a NUMA node. For example: a host has 4 sockets/8 cores per socket (for 32 total cores) and 256 GB RAM, providing 4 NUMA nodes of 8 cores and 64 GB RAM each. In this configuration, you can create a VM with up to 8 cores and 64 GB RAM within one NUMA node.
If you need to create a SQL VM with more resources than exist in a single NUMA node, you can use virtual NUMA. Virtual NUMA is supported on SQL2005 and higher, and on Windows 2012 and higher, and will present an emulated NUMA architecture to the VM that reflects the underlying hardware. In our previous example of an 8 core/64 GB NUMA node, if the SQL VM required 8 cores and 96 GB RAM, then it would be configured across 2 NUMA nodes, with 4 vCPU per node, and 48 GB memory per node.
Guidelines for configuring VMs with virtual NUMA include assigning an even number of vCPUs to a VM and distributing them evenly across the physical NUMA nodes. However, there are some potential problems with VMs that span NUMA nodes:
Enabling vCPU Host Add in VMware, or Dynamic Memory in Hyper V will disable virtual NUMA. VMs that span NUMA nodes without virtual NUMA will suffer performance degradation when their CPUs access memory on remote nodes.
If a VM using virtual NUMA migrates to a new host and the underlying NUMA architecture on the new host differs, the VM's performance could suffer.
SQL Server Virtualization CPU Best Practice
1. If the total number of vCPUs allocated to all the VMs on a host are too high, the resulting symptoms would be high CPU Ready or CPU Wait Time Per Dispatch and low windows CPU busy %. CPU Ready (VMware) or CPU Wait Time Per Dispatch (Hyper-V) should be < 5%
2. When building SQL Server VMs that span multiple NUMA nodes, enable virtual NUMA and assign CPU and Memory so that they can be evenly distributed among physical NUMA nodes.
3. If virtual NUMA is disabled, create VMs with memory and vCPU resources that fit within a NUMA node. Performance may be better for multiple smaller SQL VMs that fit within a NUMA node than for larger SQL VMs that span NUMA nodes without virtual NUMA.
4. For SQL VMs, do not use hyperthreading to calculate the number of available vCPU and do not overallocate CPU resources.
Memory Considerations
Host Memory
SQL Server consumes lots of memory! The good news, when configuring and purchasing virtualization hardware - memory is one of the least expensive components. However, if you are under budgetary constraints when purchasing your virtualization hosts, you can get the most out of later upgrades by configuring your server with larger memory modules in fewer slots, providing open slots for additional memory modules later. Check the server specs carefully to determine compatible memory and how new modules need to be balanced across memory slots.
Memory Overcommitment
Virtualization allows for the overcommitment of memory in which the total amount of memory allocated for use by the VMs exceeds the physical memory of the host. This works as long as the memory actively used across all VMs does not approach the physical limit of the hosts. When that happens, memory reclamation techniques reclaim memory from the VMs, and this can cause severe performance degradation, especially for memory sensitive applications like SQL. Therefore - it is strongly recommended that you do not overcommit memory on hosts running SQL VMs.
To get the most use out of host memory without overcommitment it is important to determine how much memory is actively being used by all the VMs and to adjust the size of their memory allocations appropriately. This can be done by collecting memory usage metrics for existing VMs and using capacity planning to model memory on the host. Ideally you will want to try out different scenarios as part of your capacity planning exercise so that you can see happens when additional VMs with defined workloads are deployed on the host.
Memory Reservation (VMware)
In VMware, VMs are not given the memory they are allocated when they are powered on, but rather are assigned memory as it is used. Setting up a memory reservation for a VM sets aside the specified amount of memory for the exclusive use of the VM. This is often used to ensure that memory reclamation due to overcommitment doesn't take memory from a VM, but it can be used to ensure that a VM will be guaranteed a minimum amount of memory before it can be powered on.
SQL Server Virtualization Memory Best Practice
1. Do not overcommit memory on hosts running SQL VMs
2. Use capacity planning to determine how much memory is in use and adjust VM memory allocations based on workload.
3. In VMware, memory reservations can ensure that a SQL VM has a fixed amount of memory when it powers on. Reservations are not required to ensure minimum memory use in Hyper-V.
Disk Considerations
SQL Databases are inherently I/O bound, and using low latency, high throughput storage can lower the risk of I/O performance bottlenecks. The storage arrays used for VMs typically have the increased IO speed, capacity and reliability needed for standalone SQL servers.
While the underlying arrays are abstracted into virtual disks for the VMs, the types of drives in the array and their RAID configuration influence their performance characteristics. This matters because SQL writes to several different types of files using different access patterns - for example, transaction logs are sequential writes verses random reads/writes for data files. Separating different access patterns onto different storage arrays can speed up SQL I/O.
SQL Server Virtualization Disk Best Practice
1. Use files hosted on different storage arrays
Putting logs and databases on files in separate storage arrays allows the logs to be written independently from database updates, and scheduling can be optimized to minimize latency.
2. Solid State Drives vs Traditional Hard Drives
Solid state drives (SSDs), especially drives connected via NVMe, provide I/O rates that are orders of magnitude greater than traditional hard drives, especially for random reads/write. SSD prices are becoming more affordable and are certainly worth close consideration. Evaluate your workload to determine if you need the I/O rates provided by SSD, if fast traditional hard drives will meet your needs, or if a combination of SSD and hard drives would work best.
3. Load balance across storage devices
Keep an eye on the I/O rate on your storage devices to make sure that they can accommodate the I/O load over all the VMs they are supporting. If necessary, move files to different storage devices to balance the load.
4. Redundancy
Using drives configured in redundant arrays can compensate for a disk failure, but redundancy itself can incur a performance penalty. Microsoft recommends RAID 10 (mirrored striped sets) for disks with high read and write rates, while RAID 5 (striped sets with parity) is suitable for databases with read-only data.
Network Considerations
Configuring the network for SQL VMs needs to consider not only I/O associated with the SQL application, but also workloads for VM maintenance - that is, failover, load balancing, migration, etc.
SQL Server Virtualization Network Best Practice
1. Paravirtualized network adapters, e.g. VMXNET3 in VMware, offer better performance for VMs than emulated adapters.
2. Jumbo frames can transfer data packets up to 9000 bytes vs the standard ethernet frame size of 1500 bytes, and should be enabled on network channels used for storage traffic (e.g. iSCSI) or VM migration. Using a smaller number of larger data transmissions can increase network performance and reduce CPU load, but needs to be enabled on all network interfaces between the endpoints.
3. Use multinic vMotion to speed up vMotion for high memory VMs.
Conclusion
Sizing SQL Servers in a virtualized infrastructure is a very different exercise compared to sizing in a physical infrastructure. It is important for SQL DBAs to work closely with IT Operations teams as operations has the best understanding of the physical attributes and performance of host, storage, and network resources and how differences between physical and virtualized resources affect server performance. However, IT Operations needs input from DBAs to ensure they know the resources required by SQL Server for it to function properly.
Ultimately the key is to understand the capabilities and physical limitations of the virtual infrastructure. Only then can IT properly assign the resources to support workloads for both SQL Server and any other VMs on the same host:
1. Baseline SQL Server workloads to determine resource usage for CPU, memory, disk, and network, and use that as a basis for assigning resources to a SQL VM.
2. View resource usage from a host level, as aggregate usage over all VMs. Try to eliminate resource contention and balance resource usage over all available host resources.
3. Recognize that not all SQL Servers are equal and that some will have different availability and recovery requirements, affecting how and where your VMs can be deployed