Finally, A SQL Server Monitoring System that Leverages Query Store

By Kendra Little on May 15, 2024

I’ve spent a bit of time with Microsoft’s new database watcher tool for Azure SQL recently.

There are a lot of things I like about database watcher (which is currently in preview and which refuses to Capitalize Its Name), but it does one big thing that I really, really like: it collects data from Query Store. You can access that Query Store data from built-in database watcher dashboards, query it using KQL, or (something something) in Microsoft Fabric if you’ve got money to burn on your monitoring data.

Query Store has been available since SQL Server 2016, but I haven’t yet heard of monitoring tools that truly take advantage of it. It’s about time.

Three major sources of query performance data in SQL Server

Most monitoring tools for SQL Server collect fairly similar information:

  • What queries are running at a given time?
  • What were their basic metrics, like duration, CPU time, logical reads, number of writes, and query workspace memory used?
  • Were the queries waiting on resources for a significant amount of time, like lock waits?
  • What query execution plan was in use?
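
To make that list concrete, here’s a minimal sketch of the most common way tools gather it: periodically sampling the execution DMVs. This is illustrative only – it snapshots whatever happens to be running at the moment it executes.

    -- A rough sketch of DMV sampling: what's running right now, with
    -- duration, CPU, reads, writes, memory grant, and the current wait.
    SELECT
        r.session_id,
        r.start_time,
        r.status,
        r.wait_type,
        r.wait_time,                -- ms in the current wait
        r.cpu_time,                 -- ms of CPU so far
        r.logical_reads,
        r.writes,
        r.granted_query_memory,     -- workspace memory grant, in 8 KB pages
        t.text AS query_text
    FROM sys.dm_exec_requests AS r
    JOIN sys.dm_exec_sessions AS s
        ON s.session_id = r.session_id
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE s.is_user_process = 1;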

This data can be gathered from SQL Server in a variety of ways, which have different tradeoffs. Most major monitoring tools for SQL Server use a combination of Plan Cache and Query Stats DMV Sampling as well as some tracing to collect information:

Plan Cache and Query Stats DMV Sampling
  • Pros: Most lightweight polling.
  • Cons: Limited or missing data for queries with recompile hints; misses some data whenever the plan cache is cleared.
  • Availability: Since SQL Server 2005.

XEvents or SQL Trace
  • Pros: Misses few events, high accuracy.
  • Cons: Heaviest performance impact of the options. Some things, like execution plans, will really slow performance if collected this way.
  • Availability: Since SQL Server 2005.

Query Store
  • Pros: Data is kept updated by SQL Server itself and is the basis for many recent SQL Server intelligent query processing (IQP) features.
  • Cons: May not work well with default settings for some workloads, especially if allocated minimal space. Only aggregate performance metrics over a specified collection interval. Misses queries that are cancelled before they finish compiling.
  • Availability: Since SQL Server 2016.
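
For contrast with the DMV snapshot above, here’s a minimal sketch of what reading Query Store looks like: aggregate metrics per plan, per collection interval, persisted in the user database itself. The TOP / ORDER BY is just one illustrative way to surface the heaviest CPU consumers.

    -- A rough sketch of reading Query Store's interval-aggregated metrics.
    -- Durations and CPU time are stored in microseconds.
    SELECT TOP (20)
        q.query_id,
        qt.query_sql_text,
        p.plan_id,
        i.start_time AS interval_start,
        rs.count_executions,
        rs.avg_duration,
        rs.avg_cpu_time,
        rs.avg_logical_io_reads
    FROM sys.query_store_runtime_stats AS rs
    JOIN sys.query_store_runtime_stats_interval AS i
        ON i.runtime_stats_interval_id = rs.runtime_stats_interval_id
    JOIN sys.query_store_plan AS p
        ON p.plan_id = rs.plan_id
    JOIN sys.query_store_query AS q
        ON q.query_id = p.query_id
    JOIN sys.query_store_query_text AS qt
        ON qt.query_text_id = q.query_text_id
    ORDER BY rs.avg_cpu_time DESC;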

Why isn’t trace data or plan cache data enough?

Plan cache and trace data are certainly useful. Query Store still only collects data from writable primary instances, so for readable secondaries, plan cache and trace data are the only options we’ve got. (Query Store for Secondary Replicas, a sort-of-maybe feature from SQL Server 2022, remains in preview here in May 2024. It is not available even to preview in Azure SQL Database or Managed Instance. The longer it lingers, the more it feels like a lot might be wrong with it.)

But Query Store data is, well, just more reliable. When I use a monitoring tool that polls the SQL Server plan cache for a lot of its information (which is most of the time these days), I often find that in real-world troubleshooting situations that data is just not quite right. Some things are missing. Some things are confusing. This is especially true for performance issues that slow the whole system down – external polling for monitoring information is impacted too, and it may fail completely during the most problematic periods.

I find that in these cases, Query Store contains the information I need to figure out what was going on. Query Store doesn’t have complete information – the monitoring from plan cache is still useful. But when I want a “source of truth” about query execution plans and aggregate performance metrics, Query Store data is what I want to look at. And if there were major waits — from tempdb metadata waits to resource_semaphore waits to threadpool waits — Query Store gathered the data when not much else worked.
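
Query Store records those waits as aggregate wait categories per plan and interval (available since SQL Server 2017 and in Azure SQL). A minimal sketch of pulling them out:

    -- A rough sketch: wait categories recorded by Query Store per plan
    -- and collection interval. RESOURCE_SEMAPHORE waits roll up into the
    -- Memory category; THREADPOOL waits roll up into Worker Thread.
    SELECT
        ws.plan_id,
        i.start_time AS interval_start,
        ws.wait_category_desc,
        ws.total_query_wait_time_ms,
        ws.avg_query_wait_time_ms
    FROM sys.query_store_wait_stats AS ws
    JOIN sys.query_store_runtime_stats_interval AS i
        ON i.runtime_stats_interval_id = ws.runtime_stats_interval_id
    ORDER BY ws.total_query_wait_time_ms DESC;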

However, I don’t want to (and can’t) keep Query Store data in every database forever, and I don’t want it to grow so large that it hits size-based cleanup.
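
A quick way to see how close a database is to that problem is the Query Store options view – a minimal sketch, run in the context of the database you care about:

    -- A rough check: how much space Query Store is using vs. its cap,
    -- and whether it has flipped to read-only (readonly_reason <> 0
    -- means it has stopped collecting new data).
    SELECT
        actual_state_desc,
        desired_state_desc,
        readonly_reason,
        current_storage_size_mb,
        max_storage_size_mb
    FROM sys.database_query_store_options;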

And I don’t love querying production to get performance data, either. I’d rather it go into a monitoring system.

Why aren’t all monitoring companies doing this for SQL Server?

I haven’t heard of other SQL Server monitoring tools leveraging Query Store. I did some searching to see if this had happened without my noticing, and found a set of very funny forum posts where users ask if a SQL Server monitoring tool is going to use Query Store, and a sales rep responds with a version of, “Hey, we collect all this query data! Look at it! Here’s a picture! Please don’t notice that I’m not answering your question.”

I don’t think any of the major SQL Server monitoring tools is likely to treat Query Store as a major source of data collection for a while, largely for timing reasons:

  • Most of these products have been established for a while, and they already built a foundation on the plan cache and DMVs, with some trace data sprinkled in.
  • Enterprise sales depend on a tool being able to cover many versions of SQL Server, and lots of folks still run SQL Servers older than 2016.
  • Query Store isn’t enabled by default until SQL Server 2022, and even then only for newly created databases – existing databases aren’t enabled automatically on upgrade. There are plenty of SQL Servers where Query Store isn’t running.
  • Query Store does need to be sized appropriately to avoid cleanup problems or slipping into read-only mode, where it stops collecting data (see the sizing sketch after this list). The plan cache and trace approaches don’t have this weakness.
  • See my above rude commentary on Query Store for readable secondaries growing old while still stuck in preview.
  • Many users don’t need highly accurate Query Store data – they aren’t specialists. They need simpler, higher-level metrics to act on.
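
The sizing sketch mentioned above: Query Store has to be given enough room and allowed to clean up, or it goes read-only. This is illustrative only – the numbers are placeholders, not recommendations.

    -- A rough sketch of keeping Query Store out of trouble: cap its size,
    -- let it clean up automatically, and skip insignificant ad hoc noise.
    -- MAX_STORAGE_SIZE_MB = 2048 is a placeholder; size it for your workload.
    ALTER DATABASE CURRENT
    SET QUERY_STORE = ON
        (
          OPERATION_MODE = READ_WRITE,
          MAX_STORAGE_SIZE_MB = 2048,
          SIZE_BASED_CLEANUP_MODE = AUTO,
          QUERY_CAPTURE_MODE = AUTO
        );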

Given all of this, I don’t really blame monitoring companies for not rewriting their tooling to rely on Query Store immediately.

However, I still want Query Store data to be collected into a datastore. I’m part of the portion of specialist users who need this data to ensure I’m diagnosing a situation accurately and recommending the right course of action. Isn’t it about time we got something in this area?

Why this works for database watcher

Database watcher currently supports Azure SQL Managed Instance and Azure SQL Database. Query Store is available to all users of these services, and it’s enabled by default when you create a new database.

While there have been a variety of monitoring-ish tools for Azure SQL over the years, none of them have gotten much traction. They’ve all been so similarly named that I never learned how to tell them apart, to be honest. But basically, there was no “this has been around for 10 years”, robust, widely adopted SQL Server engine monitoring system to displace in Azure.

Database watcher is a fresh start, and when you’re making a fresh start with a tool specifically for current Azure SQL engine versions, using Query Store as a major data source is a clearer choice.

For me, it’s a very refreshing one. I’ve already found data in database watcher that has helped solve tough problems.

One final note: having a monitoring system collect data from Query Store helps you balance Query Store size and data retention settings – if the data is being collected by database watcher, you don’t need to retain nearly as much history in the production database. That makes managing Query Store size easier, and things work better all around.
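
For example (illustrative value only – pick a window that matches how your collector retains history), shrinking the in-database retention once something else keeps the long-term copy might look like:

    -- A rough sketch: keep a short history window in the database itself
    -- once a monitoring system retains the long-term history elsewhere.
    ALTER DATABASE CURRENT
    SET QUERY_STORE
        (
          CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 7)
        );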

Hopefully other monitoring solutions add Query Store as a data source before another 8 years go by.