Adaptive Query Processing (Dear SQL DBA Episode 21)

on November 3, 2016

Episode update, April 2017: Learn About Adaptive Query Processing from Joe Sack

Microsoft shared a great video about Adaptive Query Processing, and you can learn about this new feature from Microsoft Program Manager Joe Sack. My original post below is full of speculation. Joe’s video is full of actual facts!

Note: I’ve embedded Joe’s video, which is published on the Microsoft Cloud Platform’s YouTube Channel. I had less than nothing to do with making it. Go give it a thumbs up to show them some love.

And now for my original episode, back when this feature was first announced

I’m mixing things up a bit in this episode. I want to talk about a question that keynotes and sessions at the SQL PASS Summit got me thinking about last week. Let’s talk about Adaptive Query Processing.

Watch the 24 minute video, scroll down to read the transcript, or subscribe to the podcast.

This post includes a lot of speculation

I’m headed to the Microsoft MVP Summit next week. The cool thing about the MVP Summit is that you get to learn some things that aren’t public yet. The downside is that once you get some secret info, that closes off your ability to speculate a bit… because you don’t want to speculate too close to something that is “secret”.

Everything I’m talking about today was either revealed publicly last week at the Summit, or is speculation on my part (and is pure speculation, I have no privileged insights on this stuff).

I’ll do my best to be completely clear about what’s speculation and what isn’t here.

Keynote focus: predicting the future

Perhaps speculation feels like the right topic today because Microsoft folks talked a lot about the importance of prediction in the keynotes at the PASS Summit last week.

SQL Server 2016 features R Services. This brings the ability to learn patterns and make predictions into the database engine.

Using this new feature came up a lot in the keynote. And not just for making predictions for a user application, either: there were quite a few references to using SQL Server's predictive powers to make SQL Server itself smarter.

So what might that mean?

We’re used to SQL Server optimizing a query before it runs

When you execute a query, the SQL Server optimizer has to quickly analyze what all its options are for executing the query with the data structures it has at hand. It uses ‘statistics’ to help it estimate how much data it will get back from each structure.

It has a lot to figure out: what types of joins should it choose? Should it use a single core or multiple cores? How much memory should it allocate for operators that need to do things like sorting and creating hash tables in memory?

It has to figure this out fast. Every microsecond spent optimizing a query is a microsecond the user spends waiting.
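If you want to poke at the raw material the optimizer works from, you can look at a statistics object directly and then compare estimated rows against actual rows. This is just a sketch — the table, index, and literal values here are hypothetical placeholders, so substitute your own:

```sql
-- Show the histogram and density info the optimizer uses for cardinality
-- estimates. 'dbo.Posts' and 'IX_Posts_OwnerUserId' are hypothetical names.
DBCC SHOW_STATISTICS ('dbo.Posts', 'IX_Posts_OwnerUserId');

-- Request the actual execution plan so you can compare the optimizer's
-- estimated row counts against the rows the query really returned.
SET STATISTICS XML ON;
SELECT COUNT(*) FROM dbo.Posts WHERE OwnerUserId = 26837;
SET STATISTICS XML OFF;
```

When the estimated and actual row counts in that plan are wildly different, you're looking at exactly the kind of situation an adaptive optimizer would want to react to.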

Once a query starts executing, SQL Server doesn’t (currently) do “re-optimization”

Once the optimizer chooses a plan, the query is off to the races. SQL Server has no option to turn back.

Some of us have wondered for a while if we might get a feature where SQL Server can change a query plan after it starts running if it looks like estimates from statistics weren’t accurate.

Oracle has a feature called “Adaptive Query Optimization” which stretches the optimization process out into the query execution phase. Oracle can start a query with a “default plan.”

I’m no Oracle expert, but here’s how their docs describe Adaptive Query Optimization:

  • Once the query is running, if it looks like estimates were way off, it can change portions of the plan based on what it’s finding.
  • It can change joins, parallelism, and even create “dynamic statistics” to get more detailed information where things looked fishy.
  • Oracle can also use what it learns about rowcounts after a query executes to help it optimize future executions of that query.

I’m not going through this to suggest that SQL Server will implement the same features. But it can be useful to think about what competitors are doing in terms of optimization to open up our view a little when we’re thinking about what’s possible. And of course, SQL Server can go beyond this.

Things have been changing in Azure with automatic index tuning in the SQL Database Advisor

This isn’t your old Database Tuning Advisor. Azure SQL Database has a newer, similarly named option: the SQL Database Advisor.

The SQL Database Advisor in hosted databases can recommend indexes to create and drop, and it’ll flag queries that aren’t parameterized or that rack up a lot of recompiles only to end up with the same plan.

You have the option to tell the SQL Database Advisor to automatically manage indexes. In this case, it’ll not only apply the index changes but watch performance after it makes the change. If things get slower, it’ll revert the change.

How well does this work in practice?

Honestly, I have no idea :)

But I’m starting to get really curious after the Summit this year, so I’m planning to start exploring this more.

Announced last week: Adaptive Query Processing

I attended a session called “What’s New in Azure SQL Database?” at PASS last week. This was presented by Lindsey Allen and other program managers on the SQL Server Engineering team.

There was a lot of cool stuff discussed in the session, but two bullet points in particular jumped out at me:

  • Performance insight and auto-tuning
  • Adaptive query processing

Adaptive query processing is basically a subset of what’s being called “performance intelligence”. We saw a very cool demo video that explained that Adaptive Query Processing is focusing on three things:

  1. Row estimates for “problematic subtrees”
  2. Adjusting memory grants
  3. Fixing join types when needed

How is Adaptive Query Processing going to work?

I have no idea. This is a totally new area, and it was a fast-moving session that quickly moved on to other new features.

I got two somewhat conflicting ideas about how this might work, and I’m looking forward to sorting it out in the future.

Count this all as pure speculation, because I may have a very skewed understanding of what I heard at this point.

  1. This might be based on collecting information by observing a workload of queries – say, queries collected in Query Store – and using R Services to find queries where optimization needs to be improved, then giving feedback for future runs of the query.
    • Simple example I can think of when it comes to memory grants: if SQL Server always requests a lot more memory than it actually uses for a frequent query, this could be learned and the grant could be reduced. That could help avoid low-workspace-memory situations on very busy systems (aka RESOURCE_SEMAPHORE waits).
  2. This might also involve some dynamic optimization at runtime. One slide I saw was talking about joins, and used the phrase “Defer the join choice until after the first join input has been scanned.”
    • That sounds a lot like optimization may be stretching out into the execution of the query, right?
    • I also saw the sentence “Materialize estimates for problematic subtrees”, which sounds like getting extra statistics for parts of the plan where estimated rows and actual rows differ. But no idea yet if this could happen on first execution of the query or would be observed across a workload after a bunch of things have run.
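On the memory grant point: you can already eyeball the requested-versus-used gap today with a DMV, which is exactly the kind of signal a feedback loop could learn from. A quick sketch (the column choice is mine, not anything announced):

```sql
-- Compare requested vs actually used workspace memory for queries that
-- currently hold (or are waiting on) a memory grant. Large gaps between
-- requested_memory_kb and max_used_memory_kb suggest an oversized grant.
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       max_used_memory_kb,
       wait_time_ms           -- time spent waiting for the grant (RESOURCE_SEMAPHORE)
FROM sys.dm_exec_query_memory_grants
ORDER BY requested_memory_kb DESC;
```

This DMV only shows in-flight queries, though — learning a better grant for a *frequent* query would require tracking that gap over time, which is where a workload store comes in.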

Speculation: to optimize a “herd” (workload) of queries, wouldn’t Query Store need wait stats?

If I did understand correctly that Adaptive Query Processing at least in part requires using data collected from a workload of queries and analyzing it in R, then the 2016 Query Store feature seems like it’d be a big part of the picture. Query Store collects runtime statistics and execution plans.

But to do this well, wouldn’t the analysis also need to know why a query was slow? Perhaps it just couldn’t get started because it was waiting on a lock. That doesn’t necessarily mean it needs to have different joins or its memory grant changed.

This is pure speculation, but if Adaptive Query Processing uses Query Store data, this makes me think we might see Query Store collecting Wait Statistics sometime soon.
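For context, here’s roughly what Query Store exposes today: runtime statistics aggregated per plan, but nothing that says why an execution was slow. A sketch of the kind of query you can run against its catalog views:

```sql
-- Top plans by average duration from Query Store's runtime statistics.
-- Note what's missing: duration is high whether the query burned CPU or
-- just sat blocked on a lock -- there's no wait breakdown here.
SELECT TOP (10)
       q.query_id,
       p.plan_id,
       rs.avg_duration,        -- microseconds, averaged per stats interval
       rs.avg_cpu_time,
       rs.count_executions
FROM sys.query_store_query AS q
JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON rs.plan_id = p.plan_id
ORDER BY rs.avg_duration DESC;
```

A plan with high average duration but low CPU time hints at waiting rather than working — but to tell *what* it was waiting on, you’d need wait statistics tied back to the query.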

Will Adaptive Query Processing be Cloud-Only, or part of “boxed” SQL Server?

The session I was attending was specifically on Azure SQL Database.

I didn’t hear an announcement about whether this feature might be available outside of the cloud. But I also didn’t hear anything that sounded like it would prevent this feature from working in the “install and manage it yourself” boxed version of SQL Server.

A lot of times we don’t get a clear answer on this until they start to ship previews of new major versions of SQL Server – so treat anything you hear as speculation unless it’s directly from a Microsoft Program Manager.

You can sign up for the preview of Adaptive Query Processing

Check it out yourself!

Got your own speculations? Or even (gasp) some facts?

Tell me all about it in the comments!