Today I was called in to check what seemed to be a SQL Server performance problem at a client-owned datacenter in Asia. I will share some of the details, to help you understand the context and how we finally arrived at the solution.

The context
The application load is distributed among three Microsoft SQL Servers that sit in the back, stoically and patiently waiting for requests to respond to. Only one of them faced the performance problem. That host also holds a central database, which does some proprietary work for all the rest.
At first, the IT team at the datacenter could not locate the problem, and since everything seemed to check out fine with the infrastructure, it was escalated to the local software vendor.
The ISV looked into what seemed to be a disk performance issue, based on some Performance Monitor metrics on the server, and then called me in to verify the findings.
The findings
Compared to the other two SQL Server hosts, Disk Transfers/sec was almost double. OK… so what? It means this host performs more IOPS than the other two. Is that a problem? It could be if the number were high, but it was low, around 160 (a snapshot reading). Even ordinary SAS drives can sustain that.
Hint: check the LogicalDisk counters rather than the PhysicalDisk ones, since it is rare for a volume to be backed by a single physical drive; most of the time it is a RAID subsystem or something equivalent, made up of many physical drives presented as one volume, hence a “logical” drive.
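Besides PerfMon, you can cross-check latency from inside SQL Server itself. Here is a minimal sketch (assuming VIEW SERVER STATE permission; the column aliases are my own) that reads the cumulative per-file I/O statistics:

    -- Average I/O stall per database file, as SQL Server itself measures it.
    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.physical_name,
           vfs.num_of_reads,
           vfs.num_of_writes,
           vfs.io_stall / NULLIF(vfs.num_of_reads + vfs.num_of_writes, 0) AS avg_stall_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
        ON mf.database_id = vfs.database_id
       AND mf.file_id = vfs.file_id
    ORDER BY avg_stall_ms DESC;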
Anyway, we connected to the SQL host to check a couple of things. We started by opening Task Manager to look at the current CPU load and memory consumption. By the way, this is a VM running on Hyper-V. We noticed the following: 8 sockets / 16 cores, RAM was OK, and network activity was relatively quiet. Hmm… almost everything checked out, apart from one thing.
The CPU core load
Although the overall average CPU load was approximately 30%, it was evident that something was wrong, and it was not the disks: the load was skewed unevenly across the CPU cores.
Then I opened Performance Monitor and saw that SQL Server’s tempdb system database was reporting huge latency, somewhere around 250 ms.
So, two things to note:
- only 4 of the cores were consistently above 95% CPU, and
- tempdb was dragging its feet all over (a quick way to confirm this from inside SQL Server is sketched below).
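When tempdb is the suspect, a handy live check is to see whether sessions are latch-waiting on its allocation pages, which points to contention rather than a genuinely slow disk. A minimal sketch, run while the workload is active (database_id 2 is tempdb):

    -- Sessions currently latch-waiting on tempdb pages (resource is dbid:fileid:pageid).
    SELECT session_id, wait_type, wait_duration_ms, resource_description
    FROM sys.dm_os_waiting_tasks
    WHERE wait_type LIKE 'PAGELATCH%'
      AND resource_description LIKE '2:%';

A steady stream of PAGELATCH_UP/PAGELATCH_EX waits on pages such as 2:1:1 (PFS) or 2:1:2 (GAM) is the classic sign of tempdb allocation contention.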
Next steps
I had a couple of things in mind, but decided to start with the easy ones:
- check the SQL instance properties for CPU affinity and SQL Server version
- check tempdb properties
Affinity was at default settings, meaning it could make use of all cores assigned to the host. CHECK!
SQL Server instance was SQL Server 2014 Standard Edition. CHECK!
tempdb had 8 files. CHECK!
Well… almost “CHECK!”
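For reference, those “easy ones” can be checked straight from T-SQL instead of clicking through the instance properties; a minimal sketch:

    -- SQL Server version and edition.
    SELECT SERVERPROPERTY('Edition') AS edition,
           SERVERPROPERTY('ProductVersion') AS product_version;

    -- CPU affinity settings (0 = default, i.e. use everything available).
    SELECT name, value_in_use
    FROM sys.configurations
    WHERE name LIKE 'affinity%';

    -- Number of tempdb data files (type_desc = 'ROWS' excludes the log).
    SELECT COUNT(*) AS tempdb_data_files
    FROM tempdb.sys.database_files
    WHERE type_desc = 'ROWS';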
The solution
From the above, I kept two (2) things that did not match a proper configuration:
1. Based on the features supported by SQL Server 2014 Standard Edition, listed on the Microsoft Docs page Features Supported by the Editions of SQL Server 2014, this edition is “Limited to lesser of 4 Sockets or 16 cores”.
“OK, and what’s your point? We have 16 cores.”, one could argue.
Well, with 16 cores we are OK, but if one pays close attention to Task Manager, as well as to the SQL Server log (but who reads that {sarcasm}), it reads “8 sockets”. That means that the other 4 sockets are ignored by SQL Server because of licensing!
That is why only 4 sockets were getting hammered while the rest were sleeping. A waste of resources.
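You can see this straight from the scheduler DMV: the CPUs SQL Server is not licensed to use are reported as VISIBLE OFFLINE. A minimal sketch:

    -- Schedulers map to CPUs; VISIBLE OFFLINE ones are ignored by SQL Server.
    SELECT parent_node_id, scheduler_id, cpu_id, status
    FROM sys.dm_os_schedulers
    WHERE status LIKE 'VISIBLE%'
    ORDER BY scheduler_id;

On a host hit by this edition limit, the cores beyond the licensing cap show up as VISIBLE OFFLINE instead of VISIBLE ONLINE.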
2. Since only 4 sockets were available, tempdb should start with 4 files, not 8, which evidently led to contention and the disk latency we saw. You can read more in this Microsoft KB: Recommendations to reduce allocation contention in SQL Server tempdb database.
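For completeness, trimming tempdb down to the desired number of data files would look roughly like the sketch below. The logical file name temp5 is hypothetical; take the real names from tempdb.sys.database_files, and keep in mind a file can only be removed once it is empty (often easiest right after a service restart):

    USE tempdb;
    -- Move any allocated pages out of the surplus file, then drop it.
    DBCC SHRINKFILE (temp5, EMPTYFILE);        -- temp5 is a hypothetical file name
    ALTER DATABASE tempdb REMOVE FILE temp5;
    -- Repeat for each extra file until the recommended count is reached.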
Conclusion
So, the first thing is to make sure that you don’t jump to conclusions before you have all the facts. Most of the time, starting with the basics can give you hints about what is wrong, without bringing in third-party tools or the heavy artillery. Read the SQL Server log; if something is happening on your SQL Server, start from there, then move on in a layered approach from the OS configuration to the SQL Server configuration, and act accordingly. Index fragmentation is likely not the culprit but the easy victim.
This fast ‘n’ dirty investigation took less than 15 minutes to complete.
Another day, another happy customer. 🙂
Cheers!