Tag Archives | Execution Plans

Are Bad Statistics Making My Query Slow? (Dear SQL DBA Episode 39)

An important query is suddenly slow. Is it because statistics are out of date? This is tricky to figure out, and updating statistics right away can make troubleshooting even harder. Learn how to use query execution plans to get to the heart of the question and find out if stats are really your problem, or if it’s something else.

In this 35 minute episode:

  • 00:39 SQL Server 2017 Announced
  • 01:10 New video from Microsoft’s Joe Sack demonstrating Adaptive Query Processing
  • 03:05 This week’s question: Are bad stats making my query slow?
  • 05:26 Demo of finding plan in cache and analyzing stats begins
  • 28:17 What to do when stats ARE the problem

Code samples: https://gist.github.com/LitKnd/f07848d59cedc61fd057d12ab966f703

Audio-only version (downloadable)

Video version

Related links

SQL Server 2017 Adaptive Query Processing video by Joe Sack

Michael J Swart on finding Dark Matter Queries

Slow in the Application, Fast in SSMS? An SQL text by Erland Sommarskog

Got a question for Dear SQL DBA? Ask!

What’s that Garbage in my Execution Plan? (Dear SQL DBA Episode 27)


Today I was working on some code samples for a user question, and I hit a weird roadblock.

There was a bunch of garbage in my execution plan that I couldn’t explain. And by ‘garbage’, I mean a nested loop to a whole branch of code that I hadn’t asked SQL Server to run — and a warning about an implicit conversion possibly causing problems with the quality of my execution plan.

It took me a while to figure out the issue, and along the way I asked the following questions:

Am I really querying a table, or am I accidentally querying a view? (It’s a table! I checked at least three times.)

Is there some weird computed column in the table that I don’t know about? (Nope, nothing involved was a computed column, I checked that twice.)

Am I awake right now? (Pinched myself, appeared awake. I have had weird dreams about SQL Server before, though.)

Do I actually know anything about SQL Server? (Just about everyone has imposter syndrome sometimes.)

Having gone through this checklist, I decided that I was awake, and that I could figure things out by looking through the execution plan carefully.

Sure enough, calming down and stepping through the plan did the trick. Just like it always has.

Watch the 18 minute video to find out the mysteries in this execution plan, or scroll on down to read about what I discovered. If you enjoy the video, you might like to subscribe to the podcast. A review on iTunes will help other folks find out about the show.

Here’s what the weirdness looked like

I was using the WideWorldImporters sample database from Microsoft. And I was running a stupid simple query.

Stupid simple like this:

SELECT 
    CustomerName
FROM Sales.Customers;
GO

I’m selecting one column. There are no predicates at all. Sales.Customers is a normal old rowstore table, and CustomerName is nvarchar(100).

For a query like this, I’d expect a very simple plan: an index seek or scan operator to pull back the data, and a SELECT operator.

Instead, I saw an execution plan with 20 nodes, and a big old warning on the SELECT operator.

Here’s what it looked like in SQL Sentry Plan Explorer:


If you’d like to play along in a more interactive version, here’s the query over at Paste the Plan. (The website view doesn’t show all the operator properties I’m going to talk about here, but you can grab the XML and use it in SSMS or Plan Explorer to see the detail if you’d like.)

Hovering over the warning on the SELECT operator, here’s the warning (this is the tooltip from SSMS):

Weird, it’s warning about the SalesTerritory column. I didn’t ask for it to do anything at all with SalesTerritory. Why is it doing a type conversion on that column?

Let’s start at the top right of the query

When I’m in doubt about an execution plan, I like to just start at the top right of the query plan and work my way through. I think of that top right operator as the “driver”.

In this case, it’s a clustered index scan of Sales.Customers. That makes sense: I asked for all the customer names. Hovering over that operator, though, there is something funny. When I look at the ‘output’ columns, it is outputting not only CustomerName, but also DeliveryCityID!

So what’s it doing with DeliveryCityID?

Moving one step to the left in the plan, there’s a nested loop operator. Hovering over that operator, it says that it outputs the CustomerName column to the select operator. (Good, because that’s what we asked for!)

It also says that the Outer References for the nested loop are based on DeliveryCityID. OK, so it’s pulling back that column because it needs it to run the nested loop. We still don’t know why, but if we hunt around in that branch of the plan, maybe there’ll be a clue.

At this point, I started hovering over operators in that branch of the plan

As in life, when you’re lost in an execution plan, move around slowly and carefully, observe your surroundings, and look for your mom. I mean, look for inspiration.

I could see that the query was pulling from the Cities and StateProvinces tables. And there were a bunch of filter operators as well.


Here’s what the filters are doing:

  • is_rolemember(N'db_owner')<>(0)
  • is_rolemember([Expr1011]+N' Sales')<>(0)
  • [Expr1012]=session_context(N'SalesTerritory')
  • original_login()=N'Website'

This is security garbage! Err… a security feature!

Aha! This is a definite clue. Some sort of security wizardry has been applied to this table, so that when I query it, a bunch of junk gets tacked onto my query.

I have no shame in admitting that I couldn’t remember at all what feature this was and how it works. A lot of security features were added in SQL Server 2016, and the whole point of a sample database like this is to kick the tires of the features.

I did a little remembering, and a little searching, and figured out that this is the Row Level Security feature (RLS) in SQL Server 2016.

Row Level Security (RLS) adds predicates to your query execution plans

Here’s how row level security works. You create a table valued function that can be “inlined” (meaning merged) into your query execution plan. That function determines who can see the data.

Then you create a Security Policy for Row Level Security which defines when the table valued function will be applied to queries against a table.

The whole point of Row Level Security is, actually, that it adds these predicates to your execution plans.
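Here’s a minimal sketch of what that looks like, using hypothetical table, column, and function names (this is not the actual WideWorldImporters definition, just the general shape of the feature):

CREATE FUNCTION dbo.TerritoryAccessPredicate (@SalesTerritory nvarchar(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    /* Return a row (meaning "you can see this row") only when the predicate passes */
    SELECT 1 AS access_allowed
    WHERE IS_ROLEMEMBER(N'db_owner') <> 0
        OR @SalesTerritory = CAST(SESSION_CONTEXT(N'SalesTerritory') AS nvarchar(50));
GO

/* The security policy is what causes the function to be inlined into plans
   for queries against the table */
CREATE SECURITY POLICY dbo.TerritoryFilter
    ADD FILTER PREDICATE dbo.TerritoryAccessPredicate(SalesTerritory)
    ON dbo.SalesData
    WITH (STATE = ON);
GO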

How do I tell if Row Level Security changed my plan?

There’s a really easy way to tell if your plan was modified by RLS; I just didn’t know to look for it.

Click on the ‘SELECT’ operator in the plan and look down in the properties pane. If you see ‘SecurityPolicyApplied’ = True, then parts of your execution plan may have come from a table valued function that Row Level Security added in.
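If you’d rather not hunt through the plan, you can also list the security policies and predicates defined in the database from the catalog views. A quick sketch:

SELECT
    sp.name AS policy_name,
    sp.is_enabled,
    OBJECT_NAME(pr.target_object_id) AS table_name,
    pr.predicate_definition
FROM sys.security_policies AS sp
JOIN sys.security_predicates AS pr ON pr.object_id = sp.object_id;
GO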

Should you use row level security?

Wellll…. maybe. If you’re interested, read up on the possible loopholes in RLS as it stands now, and consider if those would impact you or not. Aaron Bertrand has a great article to get you started: read SQL Server 2016 Row Level Security Limitations, Performance and Troubleshooting.

Everyone feels dumb looking at Execution Plans sometimes

I look at plans a lot, and still, they had me questioning my sanity today. When I first started doing performance tuning in SQL Server, I understood so little about plans that I gave up pretty easily.

I’m really happy that I kept going, though. Because as confusing as they are, most of the time the answers you’re looking for are right there in the plan. Somewhere.

Just keep swimming.

Parallelism and tempdb data file usage in SQL Server

I’m sometimes asked if the number of CPU cores used by a query determines the number of tempdb files that the query can use.

Good news: even a single threaded query can use multiple tempdb data files.

First, tempdb is a tricksy place!

My first version of this post used a flawed methodology: I configured an instance of SQL Server 2016 with four equally sized, small tempdb files. I then tested a set of queries that qualify for parallelism alternately using 4 processors, and running them single threaded (using option maxdop 1 as a query hint).

I observed that the queries always made all four files grow.

However, in that first version of the post, I forgot that the default behavior in SQL Server 2016 is to grow all files in tempdb simultaneously when one grows. Basically, one small feature of SQL Server 2016 is that trace flag 1117 is always enabled by default for tempdb. Just because all the data files grew doesn’t mean they all got used!

(For more on tempdb behavior changes in SQL Server 2016, read Aaron Bertrand’s post here.)

Can we just turn off TF 1117 for tempdb?

For most databases, you can control whether all the files in a filegroup grow at once by changing a setting on the filegroup.

But not tempdb. If you run this command:

ALTER DATABASE tempdb MODIFY FILEGROUP [PRIMARY] AUTOGROW_SINGLE_FILE;
GO

You get the error:

Msg 5058, Level 16, State 12, Line 1
Option 'AUTOGROW_SINGLE_FILE' cannot be set in database 'tempdb'.

Let’s use science. Otherwise known as Extended Events.

I admit: I didn’t do this in the first place because it was a bunch of work. The devil is in the detail with stuff like this. A lot of events seem like they’d work, but then you try to use them and either you don’t get the data you’d expect, or the trace generates such a massive amount of data that it’s hard to work with.

I used the sqlserver.file_read event and a histogram target

I love the histogram target because it doesn’t generate a huge file of data to confuse myself with. I set up my new test this way…

Extended Events trace for sqlserver.file_read

  • Collect file paths (just to make it easy to see the data file names)
  • filter on database_id=2 (tempdb)
  • filter on session_id = 57 (the session I happened to be testing under)
  • The histogram target was set to “bucket” file_read by each file path
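Here’s roughly what that session looks like in T-SQL. This is a sketch: the session name is made up, you’d swap in your own session_id, and it’s worth double-checking the exact name of the file path field for your build in sys.dm_xe_object_columns.

CREATE EVENT SESSION [tempdb_file_reads] ON SERVER
ADD EVENT sqlserver.file_read
(
    WHERE sqlserver.database_id = 2         /* tempdb */
        AND sqlserver.session_id = 57       /* my test session */
)
ADD TARGET package0.histogram
(
    SET filtering_event_name = N'sqlserver.file_read',
        source = N'path',                   /* bucket by data file path */
        source_type = 0                     /* 0 = event field, 1 = action */
);
GO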

I set up a bunch of queries in a batch, and for each query:

  1. Dumped everything out of memory (ran a checkpoint in tempdb and dbcc dropcleanbuffers)
  2. Ran a short waitfor in case the checkpoint took a bit of time
  3. Started the trace
  4. Ran the query
  5. Dumped the data from the histogram into a table
  6. Stopped the trace (this clears the histogram)
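In T-SQL, one pass through that loop looked something like this (a sketch; the results table is just a placeholder you’d create yourself):

USE tempdb;
GO
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
GO
WAITFOR DELAY '00:00:05';
GO
ALTER EVENT SESSION [tempdb_file_reads] ON SERVER STATE = START;
GO

/* run the test query here, once with OPTION (MAXDOP 1) and once allowing parallelism */

/* copy the histogram buckets out before stopping the session (stopping clears them) */
INSERT dbo.TempdbFileReadResults (histogram_xml)
SELECT CAST(t.target_data AS xml)
FROM sys.dm_xe_session_targets AS t
JOIN sys.dm_xe_sessions AS s ON s.address = t.event_session_address
WHERE s.name = N'tempdb_file_reads'
    AND t.target_name = N'histogram';
GO

ALTER EVENT SESSION [tempdb_file_reads] ON SERVER STATE = STOP;
GO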

I ran the whole thing a couple of times to make sure I got consistent results. And I did!

Results: single threaded queries CAN use multiple tempdb data files

One of my queries does a large sort operation. Using maxdop 4 and maxdop 1, it still evenly used my tempdb data files.

Sort operator: multiple tempdb files used

I saw similar patterns with a query using a merge join, a query using a lazy spool, and a query with a sort that was engineered to be under-estimated (and has a memory spill).
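(For reference, here’s a stand-in for the kind of query used in the sort test; the table and column names are made up, not the actual test query.)

SELECT *
FROM dbo.SomeBigTable
ORDER BY SomeUnindexedColumn
OPTION (MAXDOP 1);   /* repeat with MAXDOP 4 for the parallel run */
GO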

Results: queries may not use tempdb files in the same ways

As you might guess, things may not always get evenly accessed, even if you have evenly sized tempdb files. One of my queries did a select into a temp table. Although it used all four tempdb files whether or not it went parallel, there were more file_read events against the first tempdb file than against the other three.

Multiple tempdb files: different access patterns

Could my methodology still have problems?

It’s possible. I haven’t used the sqlserver.file_read event much, or tested it super thoroughly. It seemed to give me consistent results.

So while it’s possible that I’ll suddenly realize that there’s a better way to measure this, I think at least my results are better than they were the first time I wrote this post!

Columnstore Indexes and Computed Columns in SQL Server 2016

You can’t do everything with a columnstore index — but SQL Server’s optimizer can get pretty creative so it can use a columnstore index in ways you might not expect.

You can’t put a computed column in a columnstore index

If you try to create a nonclustered columnstore index on a computed column, you’ll get error message 35307:

Msg 35307, Level 16, State 1, Line 270

The statement failed because column 'BirthYear' on table 'FirstNameByBirthDate_1976_2015' is a computed column. Columnstore index cannot include a computed column implicitly or explicitly.

But SQL Server may still decide to use a Columnstore index for a query specifying a computed column!

I went ahead and created a nonclustered columnstore index on the other columns in my table, like this:

CREATE NONCLUSTERED COLUMNSTORE INDEX col_dbo_FirstNameByBirthDate_1976_2015
    on dbo.FirstNameByBirthDate_1976_2015 
    ( FakeBirthDateStamp, FirstNameByBirthDateId, FirstNameId, Gender);
GO

Then I ran this query against the table, which groups rows by the computed column, BirthYear:

SELECT TOP 3
    BirthYear,
    COUNT(*) as NameCount
FROM dbo.FirstNameByBirthDate_1976_2015
WHERE BirthYear BETWEEN 2001 and 2015
GROUP BY 
    BirthYear
ORDER BY COUNT(*) DESC;
GO

Looking at the execution plan, SQL Server decided to scan the non-clustered columnstore index, even though it doesn’t contain the computed column BirthYear! This surprised me, because I have a plain old non-clustered index on BirthYear which covers the query as well. I guess the optimizer is really excited about that nonclustered columnstore.


The columnstore index isn’t the best choice for this query:

  • Duration using nonclustered rowstore index on computed BirthYear: 2.722 seconds
  • Duration using nonclustered columnstore index: 5.5 seconds

Where’s BirthYear? Let’s look at the Compute Scalar farthest to the right

Clicking on that compute scalar operator and looking at the properties window, we can see that SQL Server looked up the definition for the computed column, and figured out that the computation is based on columns in our nonclustered index– so it could scan that index, then run the computation for each row.

Compute Scalar operator: Defined Values

SQL Server is waiting until the third operator, a filter, to filter out the rows for BirthYear between 2001 and 2015:

Execution plan: first three steps

The cost estimate on that Compute Scalar is waaaayyy low…

This is an actual execution plan, so I have Actual Time Statistics, and I can see exactly how much CPU was burned to compute BirthYear for every row. Scrolling up in the properties window, I find that this took almost five seconds for each thread that worked on the compute scalar. That’s more than 80% of the query’s duration just to figure out BirthYear.

Oops!

Compute Scalar operator: Actual Time Statistics

 

I can rewrite my query a bit to push that filter down…

My original query has the predicate "BirthYear BETWEEN 2001 and 2015". Let’s change that predicate to reference a non-computed column instead:

SELECT TOP 3
    BirthYear,
    COUNT(*) as NameCount
FROM dbo.FirstNameByBirthDate_1976_2015 
WHERE FakeBirthDateStamp >= CAST('2001-01-01' AS DATETIME2(0))
    and FakeBirthDateStamp < CAST('2016-01-01' AS DATETIME2(0))
GROUP BY 
    BirthYear
ORDER BY COUNT(*) DESC;
GO

I’m still using the computed column BirthYear in my SELECT and GROUP BY.

SQL Server still chooses the columnstore index for this query, but now there is a predicate on the columnstore index scan itself:

Predicate on the columnstore index scan

This means far fewer rows are flowing into the compute scalar operator — we don’t have to calculate BirthYear for any of the rows from 1976 through the end of 2000.

Sure enough, it’s faster

Making this change to the query text makes our nonclustered columnstore index highly competitive with Ye Olde covering rowstore b-tree index:

  • Duration using nonclustered rowstore index on computed BirthYear: 2.722 seconds
  • Duration using nonclustered columnstore index with original query: 5.5 seconds
  • Duration using nonclustered columnstore index with predicate re-written to not reference computed column: 2.2 seconds

If we couldn’t re-write the predicate easily for whatever reason, we might choose to keep the non-clustered rowstore index on BirthYear around and use OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX) in our query.
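With that hint, the original query stays as-is and just picks up the OPTION clause:

SELECT TOP 3
    BirthYear,
    COUNT(*) as NameCount
FROM dbo.FirstNameByBirthDate_1976_2015
WHERE BirthYear BETWEEN 2001 and 2015
GROUP BY 
    BirthYear
ORDER BY COUNT(*) DESC
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);
GO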

Be careful with computed columns and columnstore

I had assumed the optimizer would be reluctant to create a plan for a computed column, since that column can’t be in the columnstore index. But it turned out to be pretty eager to do it.

If you’ve got computed columns and are testing out columnstore, look carefully at your queries and check to make sure you don’t have any super-expensive compute scalar operators showing up in your plans where you might not want them.

Vote to allow computed columns in columnstore indexes

Wouldn’t this all be easier if you could just put the computed column in the columnstore, anyway? Vote up this Connect item.

 

Why is My Query Faster the Second Time it Runs? (Dear SQL DBA Episode 23)

Today’s question is about why a query might be slow at first, then fast the next time you run it.

Watch the 26 minute video or scroll on down and read the written version of the episode instead. If you enjoy the podcast, I’d appreciate your review on iTunes! Info on how to ask questions and subscribe is here.

Dear SQL DBA,

Whenever I create a complex view, the first time I use it in a query, the query is slow. The same thing happens after I alter the view. What is the reason behind this?

This is a great question — because when you ask this question, you’re about to discover a lot of interesting, useful things about how SQL Server runs queries.

There are two ways that SQL Server can use memory to make a query faster the second time you run it. Let’s talk about how the magic happens.

1) It takes CPU time to figure out how to run a query. SQL Server uses memory to cache execution plans to save time the next time you run the query.

The first time you run a query using the view, SQL Server has to ‘compile’ an execution plan to figure out the best way to run the query.

SQL doesn’t compile the plan when you create the view– it compiles a plan when you first use the view in a query. After all, you could use the view in a query in different ways: you might select only some columns, you could use different ‘where’ clauses, you could use joins.

Secretly, it doesn’t matter too much that you’re using a view. When you run a query that references it, behind the scenes SQL Server will expand the TSQL in the view just like any other query, and it will look for ways to simplify it.
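For example, with a simple (hypothetical) view, the query below compiles as if you had written the underlying join yourself, and the optimizer can throw away anything it doesn’t need:

CREATE VIEW dbo.CustomerOrderDates
AS
SELECT c.CustomerName, o.OrderDate
FROM Sales.Customers AS c
JOIN Sales.Orders AS o ON o.CustomerID = c.CustomerID;
GO

SELECT CustomerName
FROM dbo.CustomerOrderDates
WHERE OrderDate >= '2016-01-01';
GO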

So SQL Server waits to compile a plan for the exact query you run.

  • Depending on how many joins are in the view and how many ways SQL Server could run the query, it may take it a while to compile the query execution plan.
  • SQL Server tries to come up with a decent plan quickly. But I have seen some cases where query compile time took 5+ seconds, and query execution time was much smaller.

SQL Server is designed to store the execution plan for a query in memory in the execution plan cache, in case you run it again. It would be very costly for the CPUs to generate a plan for every query, and people tend to re-run many queries.

If you re-run a query and there is already an execution plan in the plan cache, SQL Server can reuse it and save all that compile time.

When you alter a view, this will force SQL Server to generate a new execution plan the first time a query uses the view afterward. Something has changed, so SQL Server can’t use any plans that were in cache before.

Restarting the SQL Server, taking the database offline and online, memory pressure, and many server level settings changes will also clear execution plans out of cache, so you have to wait for a compile.
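If you want to check whether a plan for your query is currently sitting in cache, one way is to peek at the plan cache DMVs. A sketch (the LIKE filter is just a placeholder for text from the query you care about):

SELECT TOP 20
    qs.creation_time,
    qs.execution_count,
    st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE N'%YourViewName%'
ORDER BY qs.creation_time DESC;
GO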

How much time are you spending compiling?

There are a few ways to see this:

  • If you are testing and want to see how much time is spent on compiling, you can run in your session: SET STATISTICS TIME ON; After that, SQL will show you the ‘parse and compile time’ for each query in your ‘Messages’ window for that session.
  • If you’re looking at execution plans, ‘compile time’ is stored in the XML. You can see it in the properties on the upper left-most operator. It’s reported in milliseconds and is the same as the ‘elapsed time’ that appears under parse and compile time from SET STATISTICS TIME.

Compile time in the execution plan

  • Query Store collects compilation time statistics, including compile duration. You can see some of the details in this post I wrote on Query Store and recompile hints.
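Putting the first and last options above together, here’s a quick sketch (Query Store durations are reported in microseconds):

SET STATISTICS TIME ON;
GO
/* run your query; the Messages tab will show "SQL Server parse and compile time" */

SELECT TOP 20
    qsq.query_id,
    qsq.count_compiles,
    qsq.avg_compile_duration / 1000. AS avg_compile_ms,
    qsqt.query_sql_text
FROM sys.query_store_query AS qsq
JOIN sys.query_store_query_text AS qsqt
    ON qsqt.query_text_id = qsq.query_text_id
ORDER BY qsq.avg_compile_duration DESC;
GO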

Views aren’t really a problem. Sometimes, lots of joins are a problem, but SQL Server still has tricks to compile a plan quickly.

When people use complex views, particularly nested views, they often end up with a LOT of joins in each query.

When SQL Server has a lot of joins, optimization gets harder. There are a ton of ways the query could be executed.

The SQL Server optimizer doesn’t want you to wait a long time while it looks at every single thing it could possibly do. It takes some shortcuts. It wants to get to a good plan fast.

Generally, SQL Server tries to keep optimization times short, even when you have a lot of joins.

But there are cases where sometimes compilation takes longer than normal.

What if you see multiple seconds of parse and compile time?

Usually compilation time is a number of milliseconds, but I have seen some cases where it’s seconds. This could be for a few reasons:

  1. SQL Server had to wait for CPU when it was trying to compile the query. It’s not the query’s fault, there just weren’t processors available. I would look at SQL Server’s wait statistics to identify this. If there were notable SOS_SCHEDULER_YIELD waits in that time period, the issue is more server load than the specific query.
  2. You’ve hit an identified bug for compile time. Some cumulative updates for SQL Server fix long compilation times. It’s worth looking at the version of SQL Server you’re on, setting up a fully patched test instance, and checking if applying updates might improve your compilation times.
  3. You’ve hit an unidentified bug for compile time. SQL Server works pretty hard to compile plans quickly, so multi-second compile time usually looks like a bug to me, if it’s not due to server load. For views and complex queries, I would:
    • Look to see if I could simplify the query where-ever possible, to make things easier for the optimizer.
    • Check if indexes might simplify the plan. Good indexes can make choosing a good plan easier and faster.
    • Try query hints as a last resort. The problem is that hints are really directives, and force a behavior — what’s beneficial to force today may not be so great after the next upgrade, or even  if my data sizes change over time.

2) It takes time to read data from disk. SQL Server uses memory to cache data in the Buffer Pool so it doesn’t have to go to disk the next time you use that data.

There are more reasons that the second run of a query might be faster.

The first time you run the query it may be using data that is on disk. It will bring that into memory (this memory is called the “buffer pool”). If you run the query again immediately afterward and the data is already in memory, it may be much faster — it depends how much memory it had to go read from disk, and how slow your storage system is.

When you are testing, you can see if your query is reading from disk (physical reads and read ahead reads) by running: SET STATISTICS IO ON;
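For example (the query is just a placeholder, and the numbers shown are made up):

SET STATISTICS IO ON;
GO
SELECT CustomerName FROM Sales.Customers;   /* your query here */
GO
/* The Messages tab shows something like:
   Table 'Customers'. Scan count 1, logical reads 40, physical reads 12, read-ahead reads 35, ...
   Physical reads and read-ahead reads are the reads that had to come from disk. */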

One difference with this type of memory is that your buffer pool memory is not impacted by ALTERING the view. SQL Server does not dump data from the buffer pool when you change a view or procedure. Instead, it keeps track of how frequently different pages of data are used, and ages out the least useful pages from memory over time.

So this might be part of the answer in your case, but it wouldn’t necessarily explain why the query would be slower on the first run after an ALTER — unless the data pages that you’re using just hadn’t been queried in a while and were no longer in the buffer pool cache by chance.

Takeaways for testing queries

I usually tune queries with a “warm cache”

Most queries that reference commonly used tables have a good chance of the data they need being in cache.

To test against a warm cache, I run the query once, and don’t pay a ton of attention to that first run.

I run the query again and measure the duration with the execution plan cached and the data pages in the buffer pool.

You can tune with a “cold cache”, but be careful

If I have a reason to believe that the data won’t be in cache when the query is run, then I will test it against a “cold cache”. I might need to do this if it’s a query that runs nightly, and the data it uses isn’t relevant at all to the daytime workload– so the pages are likely to not be in the buffer pool when it’s time for the job to run that night.

To test against a cold cache, you have to do some things that are NOT friendly for a production server — so make sure you only use this approach against test instances:

  • Run DBCC DROPCLEANBUFFERS to flush unmodified pages from the buffer pool. (This will slow down your whole instance because EVERYBODY then has to start reading from disk)
  • If modifications have been happening as part of the test, run a manual CHECKPOINT to flush dirty pages to disk from the buffer pool
  • Use RECOMPILE hints or a variant of DBCC FREEPROCCACHE to make sure I measure compilation time each time
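On a test instance, that prep looks something like this (the query itself is just a placeholder):

CHECKPOINT;              /* flush dirty pages so DROPCLEANBUFFERS can evict everything */
DBCC DROPCLEANBUFFERS;   /* empty the buffer pool */
GO

SELECT CustomerName
FROM Sales.Customers     /* your query here */
OPTION (RECOMPILE);      /* measure compile time on every run */
GO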

Estimated vs. Actual Number of Rows in Nested Loop Operators

This is one of those little details that confused me a ton when I was first working with execution plans.

One problem with learning to work with plans is that there’s just SO MUCH to look at. And it’s a bit spread out.

So, even when looking at a single tooltip, things can be confusing.

Let’s talk about the nested loop operator, who can be particularly weird to understand.

Meet our nested loop

Here’s a nested loop from a SQL Server Execution plan:

Nested Loop

For every row that comes in from the top right index seek, SQL Server goes and does the bottom right index seek. Like this:

Nested Loop-inner and outer

I think the best way to explain this was tweeted by Andy Mallon:

But when you hover over that bottom index seek (the inner input), things may look at first like they’re wrong with our nested cheese and crackers.

We’re trained early to compare estimated vs actual rows in plans

One of the first things we often learn when we’re looking at plans is that SQL Server uses estimates. And sometimes, those estimates are wrong. At first glance, this looks really wrong– it estimated 11.5 rows, and actually got 20,825 rows!

Estimated Vs Actual Rows

The highlighted numbers look waaaay off

Similarly, we see these same wrong-looking numbers if we hover over the line between the nested loop operator and the “inner” seek:

Another view- hovering over the line

Read “estimated number of rows” as the estimate per execution

With a nested loop, you have to remember to also look at the number of executions, and do a little math. The number of executions is on the tooltip of the seek itself, but I often have to do a double take to find it, because it’s so crowded. Here it is:

Estimated Vs Actual Rows With Estimated Executions

The estimate here is 11.5692 rows per execution * 2,055.56 executions = 23,782.22598 estimated rows.

And that’s not terribly far off from the 29,969 rows that it ended up reading.

When you see what looks like a bad estimate, take a second look!

Check the estimated number of executions and do a little math. SQL Server may have known exactly what it was doing, after all.