What are the best tools to collect and baseline wait statistics? Should you write your own? Watch the 18-minute video or read the episode transcript below.
Dear SQL DBA…
I am getting into performance troubleshooting on SQL Server. I hear you talk about wait stats a lot, and how important they are to the process of troubleshooting.
What ways are there to check the wait stats for a given time? How would you go about creating a baseline for a system you have just taken over?
Waiting on Stats
I do love wait stats!
If you listened to the performance tuning methodology I outlined in an earlier episode, you saw how important I think wait stats are for troubleshooting performance.
If you missed that episode, it’s called Lost in Performance Tuning. (I’ve got an outline of the discussion in the blog post, as always.)
If I’m going to manage the system for a long time, I would buy a vendor tool to baseline wait stats
SQL Server is a mature database. There are a lot of vendors out there who have tapped into the need to track and baseline wait stats.
They’ve honed tools to:
- Collect the wait stats in a lightweight manner
- Store them in a repository and groom the data over time, so it doesn’t explode
- Build reports for you to see big picture data
- Build fancy UIs for you to zoom in on a point in time
- Find queries that were running when the waits were occurring
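To make the collection step concrete: SQL Server exposes wait stats in `sys.dm_os_wait_stats` as counters that accumulate since the last restart (or since they were cleared), so seeing the waits for a given time window means taking two snapshots and subtracting. Here is a minimal sketch of that subtraction in Python. The field names mirror columns of that DMV, but `WaitRow`, `wait_delta`, and all of the numbers are made up for illustration, and the sketch does not connect to a server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WaitRow:
    """One row of a wait stats snapshot (hypothetical helper type)."""
    wait_type: str
    waiting_tasks_count: int
    wait_time_ms: int
    signal_wait_time_ms: int


def wait_delta(before, after):
    """Return the waits accumulated between two snapshots, largest first."""
    earlier = {row.wait_type: row for row in before}
    deltas = []
    for row in after:
        prev = earlier.get(row.wait_type, WaitRow(row.wait_type, 0, 0, 0))
        # Counters that went backwards suggest a restart or a manual clear,
        # so treat the later snapshot as the whole delta.
        if (row.wait_time_ms < prev.wait_time_ms
                or row.waiting_tasks_count < prev.waiting_tasks_count):
            prev = WaitRow(row.wait_type, 0, 0, 0)
        delta = WaitRow(
            row.wait_type,
            row.waiting_tasks_count - prev.waiting_tasks_count,
            row.wait_time_ms - prev.wait_time_ms,
            row.signal_wait_time_ms - prev.signal_wait_time_ms,
        )
        if delta.wait_time_ms > 0:  # skip wait types with no activity
            deltas.append(delta)
    return sorted(deltas, key=lambda d: d.wait_time_ms, reverse=True)


# Made-up snapshots taken some minutes apart.
before = [
    WaitRow("PAGEIOLATCH_SH", 1000, 50000, 400),
    WaitRow("CXPACKET", 200, 9000, 100),
    WaitRow("WRITELOG", 500, 3000, 50),
]
after = [
    WaitRow("PAGEIOLATCH_SH", 1600, 80000, 700),
    WaitRow("CXPACKET", 200, 9000, 100),
    WaitRow("WRITELOG", 900, 4500, 80),
    WaitRow("LCK_M_X", 3, 12000, 1),
]

window = wait_delta(before, after)
for d in window:
    print(d.wait_type, d.wait_time_ms, "ms across", d.waiting_tasks_count, "waits")
```

Storing each delta with its sample time, and pruning old rows, is the repository and grooming work in the list above. That is the part that takes real effort to get right.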
Example vendors – I’m listing three that I’ve used before to solve problems:
- SQL Sentry Performance Advisor
- Idera Diagnostic Manager
- Dell Software (formerly Quest) Spotlight on SQL Server Enterprise
I haven’t listed these in order of preference. I know people who swear by each of them.
Since monitoring systems for SQL Server are pretty mature, the differences are in the details.
Details can be very important, of course. Research and trials will help you find which one is the best fit for your team, processes, and applications.
Should DBAs write their own tools?
There are some people out there who think you should roll your own tools, and that doing so makes you more legitimate.
I’ve written a lot of my own tools. It takes a lot of time.
To get feature parity with what vendors are offering, we’re talking years of investment.
It’s really easy to negatively impact performance with your tools. Tool vendors work very hard to avoid this, and it even happens to them sometimes.
The difference is that the vendor has a bunch of engineers who can quickly fix the issue and release a new version.
It’s only worth it to write your own tools when nobody offers a solution that fits you.
It’s a little bit like monitoring your heart rate for your own health
I wear a heart rate monitor to help me estimate how active I am during the day, and how hard I work during my workouts. Heart rate monitors are pretty affordable, and you can choose between wearing them on your wrist and wearing a chest strap. Some are more accurate than others, and they have different reporting tools.
I could learn to take my own heart rate and sample and record it myself. I could probably build some reports off it. But I’m really happy having spent $150 for a device that does it for me.
This leaves me free to spend my time interpreting the heart rate and calorie burn data it gives me, and customizing my activity to fit my health plan.
How to get budget for a performance monitoring tool
Do two things:
- Outline the business cases that a performance monitoring tool will help with. Link to specific customer incidents that it would help resolve.
- Pick the top 2 or 3 vendor tools you’d like to test, and map their features to the business cases.
Bam, your request is looking a lot more legitimate.
Test them one at a time. Start with a non-production server.
Your best bet is to write some code to reproduce performance problems against that server.
Ideally these map to your business cases.
- Find sample code with searches to simulate blocking and deadlocks, if you’d like to start there.
- Modify my sample code for testing inserts for race conditions with Microsoft’s free ostress tool for more fun
- Write some queries that read a lot of data and possibly run them from an Agent job (maybe it calls ostress)
Review how your use cases all look in the tool you’re testing.
Are the wait stats recorded and displayed well? Are they useful to you?
How easy is it for you to find the queries related to the wait stats?
Reach out to the vendor during your trial if you’re having problems. Sometimes the tools are smart in ways that aren’t obvious. This also gives you some insight into their support processes.
Tip: check whether the tool you’re testing sends monitoring data to the cloud. If so, make sure you get that approved by management before putting the tool into production. In sensitive environments, get that approved before you test it, too.
If I’m troubleshooting a system for a short time, or if there’s no budget, I’ll use and contribute to an open source tool
Sometimes there are good reasons for budgetary limitations: maybe you work for a non-profit and that money is literally feeding children.
Or maybe you’re doing a short term analysis and you just need to collect information over a couple of days, and there’s no time to test and deploy a more robust tool.
In that case, I’d start with sp_BlitzFirst from Brent Ozar Unlimited:
- It’s free
- It’s open source
- It’s got some documentation to get you started
- It’s already wired up to support different sample lengths, and write to tables
- It looks at running queries as well as some system metrics to help point out critical information related to the wait stats
You can start with what others have built, and slowly contribute on your own as well. Much nicer than starting from scratch.
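Once samples are being written to a table, whichever tool wrote them, the baseline itself can be quite simple. As a rough sketch, assuming each stored sample has already been reduced to "wait time in milliseconds per wait type for one interval", the Python below takes the median of past samples as the baseline and flags wait types in a new sample that are well above it. The function names, the `ratio` and `min_ms` thresholds, and all of the numbers are assumptions chosen for illustration. None of this is how sp_BlitzFirst or any vendor tool does it.

```python
from statistics import median


def build_baseline(samples):
    """Median wait ms per wait type across past samples (missing counts as 0)."""
    wait_types = {wt for sample in samples for wt in sample}
    return {wt: median(s.get(wt, 0) for s in samples) for wt in wait_types}


def unusual_waits(baseline, current, ratio=2.0, min_ms=1000):
    """Return (wait_type, current_ms, typical_ms) for waits well above baseline.

    ratio and min_ms are arbitrary starting points, not recommendations.
    """
    flagged = []
    for wait_type, ms in current.items():
        typical = baseline.get(wait_type, 0)
        if ms >= min_ms and ms > ratio * typical:
            flagged.append((wait_type, ms, typical))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up history: three earlier samples of the same interval length.
history = [
    {"PAGEIOLATCH_SH": 30000, "WRITELOG": 1500},
    {"PAGEIOLATCH_SH": 28000, "WRITELOG": 1700, "LCK_M_X": 200},
    {"PAGEIOLATCH_SH": 35000, "WRITELOG": 1600},
]
baseline = build_baseline(history)

# A new sample in which lock waits have jumped.
current = {"PAGEIOLATCH_SH": 33000, "WRITELOG": 1400, "LCK_M_X": 45000}
flagged = unusual_waits(baseline, current)
for wait_type, ms, typical in flagged:
    print(f"{wait_type}: {ms} ms now vs. {typical} ms typical")
```

In practice you would probably want separate baselines for different times of day or days of the week, since a nightly batch window looks nothing like business hours.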